
Google bets on workload-specific TPUs with 8t and 8i launch

Credit: Network World

Google on Tuesday unveiled two distinct eighth‑generation TPUs, one for training and one for inference, reviving a split‑chip strategy as cloud providers race to tailor AI hardware to sharply different performance and cost demands.

The company has experimented with differentiated TPU variants before, notably with its fifth-generation V5p and V5e chips, but recent generations such as Trillium and Ironwood largely followed a single-design approach.

The split designs of the new chips, according to HFS Research's Phil Fersht, are a strategic move by Google to align its hardware more closely with the different stages of the AI lifecycle across enterprises, potentially improving utilization and cost efficiency in production environments.

“Training and inference are now diverging in economics, memory behavior, networking needs, and buying patterns. Customers increasingly want the right price-performance curve for each stage of the model lifecycle, not a one-size-fits-all accelerator,” Fersht said.

In practical terms, the ability to choose between two TPU designs would help enterprises avoid paying training-grade costs for inference-heavy workloads, said Charlie Dai, principal analyst at Forrester.

Echoing Dai, TrendForce analyst Fion Chiu pointed out that the more cost-efficient 8i chip would help enterprises deploy larger models at a lower price point.

For model providers, such as OpenAI and Anthropic, the choice of chips, Dai pointed out, enables a clearer separation between training and serving fleets, while still allowing reuse of common tools and code paths, in turn lowering total costs, improving fleet efficiency, and simplifying model lifecycle transitions.

In fact, Google is not the only chip provider that is walking the split-design path, said Stephen Sopko, analyst at HyperFRAME Research, giving the example of AWS, which has two distinct chips — Trainium and Inferentia — for different AI workloads.

How are 8t and 8i better than Ironwood?

While the design split reflects changing economics, the two chips are also built for distinct technical advantages over their predecessor, Ironwood.

TPU 8t, the training-focused variant, according to Google, offers nearly 3x compute performance per pod, larger superpods, and double the interchip bandwidth when compared to Ironwood.

While Ironwood delivers 42.5 exaflops across a 9,216-chip pod, TPU 8t scales to 121 exaflops across 9,600 chips, alongside a doubling of bidirectional scale-up bandwidth to 19.2 Tbps per chip and a fourfold increase in scale-out networking bandwidth to 400 Gbps, the company said in a statement.
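Dividing the pod-level figures quoted above by chip counts gives a rough per-chip comparison (a back-of-the-envelope sketch only; it assumes compute is spread evenly across the pod, which the company has not stated):

```python
# Back-of-the-envelope per-chip compute, using the pod-level
# figures quoted in the article (assumes compute is distributed
# evenly across all chips in a pod).

def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Convert a pod-level exaflop figure to petaflops per chip."""
    return pod_exaflops * 1000 / chips  # 1 exaflop = 1,000 petaflops

ironwood = per_chip_petaflops(42.5, 9216)  # Ironwood: ~4.6 PFLOPS/chip
tpu_8t = per_chip_petaflops(121, 9600)     # TPU 8t:   ~12.6 PFLOPS/chip

print(f"Ironwood: {ironwood:.1f} PFLOPS/chip")
print(f"TPU 8t:   {tpu_8t:.1f} PFLOPS/chip")
print(f"Ratio:    {tpu_8t / ironwood:.2f}x")
```

On these numbers the per-chip gain works out to roughly 2.7x, broadly consistent with the "nearly 3x" per-pod figure Google cites, with the remainder coming from the larger 9,600-chip pod.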

The boost to performance and bandwidth between racks, according to Omdia principal analyst Alexander Harrowell, will support training of even larger models with shorter runs compared to Ironwood.

By contrast, TPU 8i, the inference-focused variant, represents a more substantial departure from Ironwood's design priorities, offering at least three times the memory, Harrowell said.

TPU 8i introduces 288GB of high-bandwidth memory combined with 384MB of on-chip SRAM, which Harrowell pointed out brings TPUs closer to the memory footprint of leading GPUs.

The expanded on-chip SRAM, the analyst noted, will help keep active model data closer to the processor, in turn, reducing latency during inferencing, particularly as models grow in size and complexity.

The architectural changes in 8i reflect the industry's incremental shift towards Mixture of Experts (MoE) and long-context models, which are expected to keep growing in size, said HyperFRAME Research's Sopko.

“Large RAM and pod sizes are now required to keep trillion-parameter models and million-token context windows memory-resident during serving,” Sopko said.

When compared directly to Ironwood, whose 256-chip pods deliver 1.2 exaflops, TPU 8i offers pods that scale up to 1,152 chips, generating 11.6 exaflops per pod.
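The same per-chip arithmetic applied to the inference pod figures quoted above (again a rough sketch that assumes even distribution of compute across the pod):

```python
# Per-chip inference compute derived from the pod figures quoted
# in the article (assumes compute is spread evenly across chips).

def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Convert a pod-level exaflop figure to petaflops per chip."""
    return pod_exaflops * 1000 / chips  # 1 exaflop = 1,000 petaflops

ironwood_pod = per_chip_petaflops(1.2, 256)   # Ironwood: ~4.7 PFLOPS/chip
tpu_8i = per_chip_petaflops(11.6, 1152)       # TPU 8i:   ~10.1 PFLOPS/chip

print(f"Ironwood: {ironwood_pod:.1f} PFLOPS/chip")
print(f"TPU 8i:   {tpu_8i:.1f} PFLOPS/chip")
print(f"Ratio:    {tpu_8i / ironwood_pod:.2f}x")
```

So the roughly 10x pod-level jump comes from a combination of a pod that is 4.5 times larger and per-chip throughput that is a little over twice Ironwood's.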

Beyond scale and memory, the new chips, according to Google, improve system efficiency over Ironwood, delivering 2x better performance per watt and tighter integration with Google’s Axion Arm-based CPU hosts.

TPU 8t and 8i, the company added, will be made generally available later this year as part of Google’s AI Hypercomputer platform.
