
AI demand is so high, AWS customers are trying to buy out its entire capacity
Credit: Network World

The Amazon Web Services (AWS) chip business is “on fire,” Trainium offers better price-performance than Nvidia, and customers are so eager for AI compute capacity that they’re looking to buy up all that’s currently available.

These are the takeaways shared by Amazon CEO Andy Jassy in his eight-page letter to shareholders in the tech giant’s 2025 annual report.

Jassy’s comments underscore how all-in enterprises are for AI, and Amazon’s ambitions to dominate a technology that, as he described it, will be as transformative as electricity.

Noted Scott Bickley, advisory fellow at Info-Tech Research Group, “pulling it all together, AWS is diving deeper to control the AI stack comprehensively through every layer: power, data center, custom silicon in the middle, and training and inference at the top.”

Big inference asks from customers

AWS added 3.9GW of new power capacity in 2025 and expects to double its total power capacity by the end of 2027, Jassy wrote to shareholders. “Yet we still have capacity constraints that yield unserved demand,” he said.

Notably, he revealed that two large customers are in such need of AI compute that they asked to buy all available 2026 instance capacity for AWS’ custom CPU chip, Graviton. He emphasized that AWS can’t agree to those kinds of requests, given other customer needs.

Matt Kimball, VP and principal analyst at Moor Insights & Strategy, noted, “two large customers asking to buy all of AWS’s Graviton capacity for 2026 says everything we need to know about where the market is.”

It’s not necessarily just a supply chain story, though, he said; it’s more of a “strategic dependency” story. Enterprises aren’t just shopping for compute; they’re trying to lock up capacity before a competitor does. “The risk for AWS isn’t failing to build fast enough. It’s more along the lines of constrained customers maybe hedging toward Azure or Google Cloud Platform (GCP),” he pointed out.

This also indicates how popular Graviton has become, and suggests that AWS might be struggling to meet demand. Rather than “lightweight chips supporting lightweight workloads,” Graviton is being used across workloads “with a variety of computational profiles,” said Kimball.

As they mature, Azure Cobalt and Google Cloud Axion processors will likely see the same kind of demand, which will make for an “interesting market dynamic” between Arm and x86 technologies, he said.

Info-Tech’s Bickley agreed that the impact of supply chain constraints is “broad and deep” in its effect on AI buildout. Even in the midst of reports that 50% of planned AI data center capacity will not materialize in 2026, “everything is sold out across the board.”

Trainium’s competitive edge

Going into 2026, Jassy described Amazon’s chip business as “on fire.” While AWS has a strong partnership with Nvidia and uses its semiconductors, there is what he called a “new shift” in the processor landscape as customers seek out better price-performance.

Notably, Amazon released the second generation of its custom AI silicon, Trainium2, in late 2024, and Bedrock now runs most of its inference on these next-generation accelerators. Jassy claimed Trainium2 offers roughly 30% better price-performance than comparable GPUs, and is “largely sold out.”

Meanwhile, Trainium3, which just began shipping, offers 30% to 40% better price-performance than Trainium2, and is already “nearly fully-subscribed,” he said. Further, a significant chunk of Trainium4 capacity, which is still about 18 months from broad availability, has been reserved.

“There’s so much demand for our chips that it’s quite possible we’ll sell racks of them to third parties in the future,” Jassy said.

Info-Tech’s Bickley pointed out that Amazon is not necessarily trying to eliminate Nvidia so much as reduce its dependence on the chip leader’s technology in areas “where AWS can win on economics.”

While AWS remains a strong Nvidia partner, it can provide a differentiated value proposition based on price-performance, he said. AWS brings a “holistic package” via tight integration with Bedrock, AWS-designed interconnects, more efficient token economics, and a software stack built on standard PyTorch/JAX/vLLM workflows.

Trainium’s prime use cases are training and inference for large language models (LLMs), multimodal models, and diffusion transformers in the hundreds of billions to trillion-plus parameter range, Bickley explained.

Marquee names like Anthropic and Uber are “putting AWS’s efficiency claims to the test,” he noted; on the other hand, customers like Cohere and Stability AI prefer Nvidia’s mature tooling framework and “superior chip designs,” citing AWS service and availability issues.

Moor’s Kimball pointed out that another factor to consider is AWS’ partnership with Cerebras. Trainium is optimized for prefill and Cerebras CS-3 is optimized for decode, allowing the two to deliver what they claim is the best inference performance with no user intervention required. “This is the kind of ‘point-and-click’ simplicity enterprise users are looking for,” he said.

Ultimately, Jassy is drawing a direct line from what Graviton did to x86 to what Trainium is doing to Nvidia, he said. Inference is the “fastest-growing and most cost-sensitive workload in enterprise AI, and that’s exactly where Trainium is gaining the most ground.”

Learning from the Mantle scale-up

Jassy also emphasized the importance of being able to go back to the starting line to “redirect the trajectory.” For instance, Amazon Bedrock was built rapidly and scaled “faster than expected,” and the team realized it required a whole different type of inference engine, not just a tweak.

The Bedrock team quickly spun up a group of six “very skilled engineers” using AWS’ agentic coding service, Kiro, to deliver a new engine, Mantle, in 76 days. Mantle has since become the backbone of Bedrock, which processed more tokens in Q1 2026, Jassy claimed, than had been processed in all prior years combined.

The ability for a small team to accomplish such a large rebuild in such a short time frame, while also adding features such as stateful conversation management, asynchronous inference, and higher default quotas, is “impressive at first blush,” noted Info-Tech’s Bickley.

“The takeaway is that Mantle should be considered a key product for inference in its own right,” he said. A separate AWS engineering post aims to build confidence in Mantle’s security and governance, Bickley explained.

Moor’s Kimball called the genesis of Mantle “really two stories.” One is operational (Bedrock needed a new architecture); the other is productivity compression.

“If six engineers with agentic tools can do what 40 couldn’t have done faster, the calculus on team size, project timelines, and build-vs-buy decisions shifts fundamentally,” he said. “The token volume numbers make the outcome clear and compelling.”

But Mantle isn’t just a rebuild; it’s yet another proof point that AI-assisted development is changing what’s possible. “Not just in theory or some marketing slogan,” Kimball said, “but in production.”

Jassy noted, “progress will not be linear. There will be moments of acceleration and moments where we adjust course. We will experiment, invest disproportionately behind what matters, and pull back when something isn’t working.”
