Google has only one way to measure the phenomenal AI growth it’s seen: in tokens.
The company processes 3.2 quadrillion tokens per month, Google CEO Sundar Pichai said during this week’s I/O keynote, adding, “never imagined I’d say quadrillion…, but here we are.”
Tokens are the unit of measure that large language models (LLMs) use to process data.
Tokens, which have been called the “new oil” fueling the AI revolution, are also a way AI vendors can meter usage and price their services. Enterprises are lusting for tokens, and spending billions of them to grab compute time.
As with oil, the demand for tokens is seemingly insatiable — and it is straining an already short GPU supply, which in turn is increasing the cost of running AI tools.
What exactly is a token?
Similar to the way humans think, LLMs grasp the meaning of a sentence by breaking words down into tokens. Pichai described them as “the fundamental units of data our models process, many representing a problem being solved.”
The fundamental unit could be in the form of a word, a sub-word, or a string of letters, symbols, or phrases. Compound words can be split into multiple tokens.
For example, in the prompt “I am running after a car,” the word “running” could be split into “run” as one token and “ing” as a second, because the suffix changes the meaning of the word. “Car” would be its own token.
“On average, one token is about three-quarters of a word, so 100 words works out to roughly 135 tokens,” said Deepak Seth, senior director analyst at Gartner.
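Seth's rule of thumb can be turned into a quick back-of-the-envelope estimate. The Python sketch below assumes exactly 0.75 words per token and counts words by splitting on whitespace; the function name and constant are illustrative, not part of any vendor's API, and real tokenizers will give different counts depending on the model and the text.

```python
# Rough token estimate from a word count.
# Assumption: about 0.75 words per token, the rule of thumb quoted above.
# Real tokenizers differ by model, language, and content.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its whitespace-separated words."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    sample = "word " * 100  # a stand-in for any 100-word text
    print(estimate_tokens(sample))  # 100 / 0.75 = 133.3, printed as 133
```

At exactly three-quarters of a word per token, 100 words comes to about 133 tokens, which is in line with the rounded figure Seth cited.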
Token prices can vary
Not all tokens are priced the same. Tokens uploaded to an AI system (input tokens) are cheaper, while tokens downloaded from it (output tokens) are more expensive. A user, for instance, might pay to upload a resume, then pay even more to download the resume polished by an LLM.
“The upload cost is less expensive than the download cost because the AI has done some work,” explained Max Leaming, head of data science and AI solutions at ManpowerGroup.
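To see how the split between uploaded (input) and downloaded (output) tokens affects a bill, consider the minimal sketch below. The per-million-token prices are hypothetical placeholders chosen for illustration, not any vendor's actual rates, and the function covers only token charges, not separate compute costs.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the token charge in dollars for one request.

    Input and output tokens are billed at different per-million rates.
    """
    input_cost = input_tokens * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical resume example: 2,000 tokens uploaded, 2,000 tokens returned.
    # Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
    cost = request_cost(2_000, 2_000, 1.0, 4.0)
    print(f"${cost:.4f}")
```

With these made-up prices, the polished resume coming back accounts for four-fifths of the charge even though it is the same length as the original.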
Token-based pricing is mainly used for enterprises and power users such as coders. Anthropic’s Claude Code and OpenAI’s Codex are priced in tokens, and Microsoft’s GitHub is adopting a form of token-based pricing starting June 1.
The final AI bill includes the costs of tokens and computing expenses (such as GPU time).
ManpowerGroup pays the token cost to the model provider, Leaming said, while compute costs ring up in parallel. (The company uses Microsoft Azure, which offers multiple LLMs, with Snowflake as its database.)
Some LLMs can be smarter and token friendly
Some AI models give better responses, which might represent a more efficient use of a token budget. Pichai said Google’s new Gemini 3.5 Flash, which is priced in tokens, delivers “frontier-level capabilities at less than half the price of comparable frontier models.”
“We’ve heard that many companies are already blowing through their annual token budgets…,” Pichai said. “If companies use a mix of [Gemini 3.5] Flash and other frontier models, they could save a lot of money.”
Prompt efficiency matters
Using tokens inefficiently is wasteful spending, Gartner’s Seth said. One coder might use up 10,000 tokens to get his or her work done, while another might use only 1,000. But there’s no tool to measure efficiency, Seth said.
“Some companies are moving towards outcome-based pricing because when people start realizing the real cost of tokens, companies will start looking at token efficiency,” Seth said.
With that in mind, ManpowerGroup developed a dashboard that cuts the number of steps clients need to get data, Leaming said. New users of an internal labor-market data tool initially needed 10 follow-up questions to drill into a query. A year later, those same users averaged four follow-ups.
“They’re using fewer tokens and they’re simply more efficient,” he said. “And that, in large part, has to do with your ability to prompt efficiently.”
But there’s a flip side. AI tools such as Anthropic’s controversial Mythos LLM — which isn’t available publicly yet — might be priced astronomically high, though its superior reasoning could make it more efficient.
“Even though the per-token costs may go up, we may see overall costs go down,” Leaming said.
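The arithmetic behind that trade-off is simple to sketch. In the Python example below, both the prices and the token counts are hypothetical: a cheaper model is assumed to need 10,000 tokens to finish a task, while a pricier model with stronger reasoning is assumed to need only 1,000.

```python
def task_cost(tokens_used: int, price_per_million: float) -> float:
    """Return the dollar cost of a task given tokens used and a per-million price."""
    return tokens_used * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    cheaper_model = task_cost(tokens_used=10_000, price_per_million=2.0)
    pricier_model = task_cost(tokens_used=1_000, price_per_million=10.0)

    print(f"cheaper per token: ${cheaper_model:.2f}")
    print(f"pricier per token: ${pricier_model:.2f}")
```

Under these assumptions, the model that costs five times more per token finishes the task for half the total, because it uses one-tenth as many tokens.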
AI vendors and the ‘drug dealer strategy’
Top AI vendors are spending trillions to build out AI infrastructures, but they’re not charging enough on tokens, Seth said. “I feel like the OpenAIs, the Googles and the Anthropics of the world are following a drug dealer strategy: Get people addicted to AI, and then raise the price of a token,” he said.
AI vendors could also use free tokens as a way to lock in customers, Leaming said. Free tokens from AI vendors could incentivize companies to build processes and workflows around proprietary LLMs and agents. And as if to reinforce the effort, major AI vendors are now sending out engineers to deploy AI models at customer sites.
The engineers, better known as forward-deployed engineers, or FDEs, are more or less hired guns for AI deployments. They focus on helping customers roll out AI projects successfully.
FDEs can study and help set strategies, put battle plans in place, build agentic frameworks, and roll out AI in conjunction with customers’ own domain experts and engineers. They also evaluate AI models, resolve context and reasoning problems, and handle security issues.
OpenAI, Google, and Microsoft are moving away from LLMs as the product. “Now they want to get inside of the firm and build your infrastructure for you,” Leaming said.
Free tokens, the next worker perk
Tokens are now sometimes offered as a job perk to engineers, Nvidia CEO Jensen Huang has said. Experts compare that to when companies cover cell phone bills for their workers.
Leaming, who said he hasn’t seen instances of that yet, found the idea odd. But if it is happening, much depends on who is offering free tokens.
Employers offering free OpenAI or Microsoft tokens could represent an indirect form of vendor lock-in, he said. “Then I’m incentivized. The more I’m familiar with the product, the more I’m gonna use it.”
Free tokens are also a way to spur the adoption of emerging AI technologies that are not yet safe for work. Many top tech leaders, for example, are exploring the possibilities of OpenClaw — considered a breakthrough AI technology — on their own dime because the technology is considered risky for enterprise environments.
Alex Spinelli, ARM’s senior vice president for AI and developer platforms, is one such person experimenting with OpenClaw at his own cost.
“In my OpenClaw, when I had it configured wrong, I got a bill for $500 in one weekend, and I was like, what the hell happened here? There’s no free lunch. Tokens are expensive,” Spinelli said.
Gartner’s Seth compared the free-token tactic to a cigarette company in India that once gave employees boxes of cigarettes alongside their salaries. “In addition to their salaries, they used to get a couple of boxes of cigarettes. The whole intent was they will…distribute them out and just make them more popular,” he said.
“If you give it to them, they will use it, because now it’s in lieu of money.”