AWS raises AgentCore runtime quotas by up to 5x to help enterprises scale AI agents

AWS has increased key Amazon Bedrock AgentCore runtime quotas by up to fivefold, enabling enterprises to support more concurrent AI agents and user interactions without going through the quota-increase process that often slows production deployments.

Quota-increase requests are themselves free, but the added capacity is likely to translate into higher underlying compute and runtime consumption as enterprises expand AI deployments.

“The new default limits support up to 5,000 active concurrent sessions in US East (N. Virginia) and US West (Oregon), and 2,500 in all other supported Regions (previously 1,000 and 500 respectively),” AWS wrote in its release notes.

The hyperscaler has also increased the rate of interactions each AI agent can handle from 25 transactions per second (TPS) to 200 TPS across all supported Regions, which it says will enable enterprises to support more simultaneous user requests.

Further, to help enterprises scale AI applications faster during periods of peak demand, the hyperscaler quadrupled the rate at which new AI agent sessions can be created for container deployments, increasing the limit from 100 transactions per minute (TPM) to 400 TPM.
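The session-related figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and function names are made up for this example, and the numbers are simply the previous and new defaults as quoted in this article.

```python
# Session-related default quotas as quoted in the article: (previous, new).
# The key names are hypothetical labels, not AWS quota identifiers.
QUOTAS = {
    "concurrent_sessions_us_east_and_us_west": (1_000, 5_000),
    "concurrent_sessions_other_regions": (500, 2_500),
    "session_creation_per_minute_containers": (100, 400),
}


def multiplier(previous: int, new: int) -> float:
    """Return how many times larger the new limit is than the previous one."""
    return new / previous


def summarize(quotas: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each quota label to its increase factor."""
    return {name: multiplier(prev, new) for name, (prev, new) in quotas.items()}


if __name__ == "__main__":
    for name, factor in summarize(QUOTAS).items():
        print(f"{name}: {factor:g}x")
```

Under these figures, the concurrent-session limits grow fivefold and the container session-creation rate grows fourfold.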

Why the higher quotas matter for enterprise AI deployments

The change in AgentCore Runtime quotas, according to Charlie Dai, principal analyst at Forrester, is the hyperscaler’s response to enterprises rapidly shifting AI-agent experiments to production deployments: “In our client conversations, the bigger change is not the number of agents but the move from single-task copilots to multiple production-grade agents serving larger user populations.”

That means that AWS is seeing higher concurrency, longer-running agents, and more complex orchestration patterns that exceed earlier default assumptions, Dai said.

For enterprises making that transition, the higher default quotas, according to Ashish Banerjee, senior principal analyst at Gartner, will help reduce the operational friction of scaling AI agents from pilot projects to production deployments.

Large-scale AI deployments, especially multi-agent systems, quickly outgrow default runtime quotas, which forces enterprises to seek quota increases and turns quota management into an operational concern, said Amit Chandak, chief analytics officer at IT consulting firm Kanerika.

“That quota increase request in an enterprise environment means a support ticket, a business justification, and a review cycle. That’s days or weeks of overhead on something that shouldn’t block a deployment,” Chandak said.

“Beyond the process cost, teams design architectures around whatever the default ceiling is. Higher defaults change what teams are willing to attempt without triggering an exceptions process, and that shapes architectural decisions, not just day-to-day operations,” Chandak added.

The benefits extend beyond reducing administrative overhead, Chandak further added, as exhausting runtime quotas in production can interrupt customer-facing applications and multi-agent workflows.

“Agent sessions are stateful. When a session gets throttled mid-task, the agent can lose intermediate context, and reconstructing that state is significantly harder than retrying a stateless API call,” Chandak pointed out.

“In multi-agent pipelines, one rejected session stalls the entire workflow. You get orphaned sessions, incomplete tool calls, and gaps in monitoring that are hard to diagnose after the fact,” Chandak added.
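One common way to reduce the risk of rejected sessions is to pace session creation on the client side so that requests stay under the service limit. The following is a minimal sketch of a sliding-window limiter under stated assumptions: the class name `SlidingWindowLimiter` is hypothetical, it does not call any AWS API, and the 400-per-minute figure in the usage note is simply the container session-creation limit quoted in this article.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `window_seconds` span.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """

    def __init__(
        self,
        max_calls: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._stamps: deque[float] = deque()  # timestamps of granted calls

    def try_acquire(self) -> bool:
        """Return True and record the call if the window has room, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

A caller might construct `SlidingWindowLimiter(max_calls=400)` and start a new agent session only when `try_acquire()` returns `True`, queuing the request otherwise. This only smooths the client's own request rate; it does not restore state for a session that has already been throttled mid-task.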

These gains, however, are unlikely to be uniform across enterprises. Enterprises running high-concurrency, transaction-intensive AI workloads, according to Gaurav Dewan, research director at Avasant, stand to benefit the most from the higher default quotas.

These include customer service and contact centers, software engineering and DevOps automation, IT operations, financial services process automation, healthcare administration, supply chain coordination, and security operations, where AI agents often operate simultaneously at scale, Dewan added.

Hyperscalers are taking different paths to production AI

AWS, however, is not alone in adapting its infrastructure to help enterprises scale AI agents in production. Rival hyperscalers, such as Microsoft and Google, are approaching the challenge in different ways.

Microsoft’s approach with the Azure Foundry Agent Service, according to Chandak, differs from AWS’s: “Many of its agent runtime limits are fixed by design; they cannot be increased even on request.”

“Instead, Microsoft puts the scaling flexibility at the model deployment layer, where quotas are adjustable, rather than at the agent runtime layer. That’s a deliberate architectural difference from what AWS is doing with AgentCore: raising the floor on concurrent sessions at the runtime level,” Chandak pointed out.

The updated quota limits for Bedrock AgentCore will automatically apply to all enterprise accounts, AWS said.

Sources: InfoWorld
Published: Jul 2, 2026, 7:31:10 AM EDT