Cirrascale to offer on-prem Google Gemini models

Credit: Network World

Cirrascale Cloud Services has announced it will make artificial intelligence models available for on-premise use through Google Distributed Cloud, a move aimed at organizations that want advanced AI capabilities while keeping data inside their own firewall.

The company said enterprises and public-sector agencies will be able to run Gemini models either on-premises or in Cirrascale data centers, including in connected or fully air-gapped deployments, to address data sovereignty and regulatory requirements.

Cirrascale said the offering expands its inference platform to support Gemini on Google Distributed Cloud, positioning the service for industries such as government, defense, finance, healthcare and higher education.

Cirrascale runs on-prem Gemini on a Dell-made appliance with Intel CPUs and Nvidia GPUs, rather than Google’s vaunted Tensor Processing Unit (TPU). The company takes the appliance from Dell, installs Gemini and Google Distributed Cloud (GDC) software on it, and delivers the result as a service to clients.

Dave Driggers, CEO of Cirrascale, said customers won’t get the same performance they would with a TPU, but that performance is more than adequate. “They’re really the only other training platform separate of Nvidia, where you’ve got a full stack, you’ve got the processors, the networking, the software stack is all integrated top to bottom,” he said.

Cirrascale said the deployment model is designed for customers with strict data residency rules or low-latency needs by keeping computing resources close to where data is stored and processed.

Google Distributed Cloud can be deployed in customer-controlled environments, including installations that are disconnected from the Internet, which is a key requirement for some government and critical-infrastructure users.

One of the big challenges is that these models are incredibly valuable and they need to be delivered in a trusted, secure environment, said Driggers. “That’s what’s really the most important thing to Google, is this model. So they need to be delivered in a confidential compute manner,” he said.

The model is not stored on a hard drive; it is stored in memory. If there’s any intrusion to the machine, the machine basically turns itself off, and the model is gone, so it cannot be stolen, according to Cirrascale.

Cirrascale said it will provide the hardware configurations, performance tuning and support needed to run Gemini inference at scale as part of its Cirrascale Inference Platform.

The company said the service is aimed at customers that want a production environment without rebuilding existing infrastructure and includes what it described as optimized systems for Gemini inference and ongoing operational support.

“It’s Google’s model. Our secret sauce is being a trusted partner to be able to deliver that model to the clients,” said Driggers. “It’s part of our inference as a service offering. So for our customers, we have a software layer on top of the model that allows them to tailor how they use it, so they can set user queues up and set user limitations.”

This allows subscribers to manage token economics: a knowledge worker can be assigned a different token rate than, say, a high-end programmer who needs to get a job done quickly.
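Cirrascale has not published details of its software layer, but a per-tier token budget like the one described is commonly implemented as a token-bucket rate limiter. The sketch below is purely illustrative — the class, tier names, and rates are assumptions, not Cirrascale’s actual API:

```python
import time

class TokenRateLimiter:
    """Hypothetical token-bucket limiter: each user tier gets its own
    sustained tokens-per-second rate and a maximum burst allowance."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec       # sustained tokens/sec for this tier
        self.capacity = burst          # maximum tokens that can accumulate
        self.tokens = burst            # bucket starts full
        self.last = time.monotonic()

    def allow(self, n_tokens: int) -> bool:
        """Return True if a request consuming n_tokens fits in the budget."""
        now = time.monotonic()
        # Refill the bucket in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n_tokens:
            self.tokens -= n_tokens
            return True
        return False

# Illustrative tiers: a knowledge worker gets a lower sustained rate
# than a programmer who needs large jobs finished quickly.
tiers = {
    "knowledge_worker": TokenRateLimiter(rate_per_sec=50, burst=500),
    "programmer": TokenRateLimiter(rate_per_sec=500, burst=5000),
}
```

Under this scheme, a 1,000-token request would pass immediately for the programmer tier but exceed the knowledge-worker tier’s burst allowance.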

The service can also distribute Gemini when a customer’s operations span multiple regions, with the company handling load balancing for the end user, according to the vendor.
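The vendor does not describe its load-balancing mechanism; the simplest form of the behavior described is round-robin routing across regional endpoints. The snippet below is a minimal sketch under that assumption, with endpoint names invented for illustration:

```python
import itertools

# Hypothetical regional Gemini inference endpoints (names are illustrative).
endpoints = ["us-west.example", "us-east.example", "eu-central.example"]
_cycle = itertools.cycle(endpoints)

def route_request() -> str:
    """Pick the next regional endpoint in round-robin order."""
    return next(_cycle)
```

A production balancer would typically weight regions by latency or load rather than cycling blindly, but the round-robin case shows the basic request-spreading idea.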

The service is entering previews now, with general availability planned for late June or early July.
