AI Token Costs Are Out of Control. The Industry Finally Has a Plan.

Uber burned through its entire 2026 AI coding budget by April. One engineer ran up $40,000 in tokens in a single month. A company forgot to set usage limits and got a $500 million Claude bill. These aren’t horror stories — they’re this week’s news.

The AI token cost crisis has arrived, and it’s forcing the industry to do something it’s resisted for years: build a FinOps discipline for language models.

The Numbers Don’t Lie

The scale of overspend is becoming visible because companies are now forced to look. A study by engineering operations platform Faros, covering 20,000 developers over two years, found that engineers who use AI heavily are roughly twice as productive as those who use it less, but they consume 10x as many tokens to get there. Per-developer token consumption at companies using AI heavily has risen 18.6x in nine months, according to Jellyfish.

Goldman Sachs projects global token usage will multiply 24x by 2030, reaching 120 quadrillion tokens per month. The inference market alone is forecast to expand from $106 billion in 2025 to $255 billion by 2030 — a sector that barely existed four years ago.
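A quick back-of-the-envelope check of those projections, as a sketch only. It assumes the 24x multiple is measured against today's monthly volume and that the market forecast spans five years (2025 to 2030). The function names are illustrative.

```python
def implied_baseline(projected: float, multiple: float) -> float:
    """Current volume implied by a projected volume and a growth multiple."""
    return projected / multiple


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the paragraph above.
tokens_now = implied_baseline(projected=120e15, multiple=24)
market_cagr = compound_annual_growth(start=106e9, end=255e9, years=5)

print(f"Implied current volume: {tokens_now:.1e} tokens per month")
print(f"Implied market growth: {market_cagr:.1%} per year")
```

Under those assumptions, the figures imply roughly 5 quadrillion tokens per month today and market growth of about 19% a year.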

Meanwhile, per-token prices have stopped falling. After years of declining costs, new frontier models are priced higher than their predecessors. The era of cheap inference is over, and companies that built their AI strategies around perpetually cheaper tokens are now reaping what they sowed.

“We started hearing existential crises,” said J.R. Storment, executive director of the FinOps Foundation. “In April and May, companies told us: ‘Oh my god, we are 3x over our entire 2026 token budget and it’s only April.’”

Enter the Tokenomics Foundation

This week, the Linux Foundation announced its intent to launch the Tokenomics Foundation, a new standards body focused on open specifications, benchmarks, and best practices for AI infrastructure economics. The formal launch is set for July, with backing from Google Cloud, Oracle, Microsoft, Salesforce, IBM, JPMorganChase, and others. It will operate in close partnership with the existing FinOps Foundation, which already applies cost discipline to cloud spend.

The goal is to create a common language for AI token economics: canonical definitions for what a token costs, what it produces, and how to compare spend across vendors. The foundation plans to define metrics like cost-per-intelligence and tokens-per-watt, and to extend the FOCUS specification (an existing cloud billing standard) to cover token-based spending models.

“Token economics is fundamentally more abstract and opaque than anything we’ve managed at this scale before,” said Nishant Gupta, chief availability officer at Salesforce. “It requires a different operational muscle than the one the industry built for cloud.”

Why Current Tools Fall Short

Tracking cloud costs is a problem measured in hundreds of millions of rows per month. Tracking token costs is a problem measured in trillions of rows per month. You can’t put that in a spreadsheet.

The problem is compounded by billing opacity that mirrors what happened in telecom and early cloud: companies routinely find discrepancies between the usage their vendor reports and what their own systems show. At Priceline, senior director of IT finance Chris Reed has already flagged such inconsistencies, a pattern familiar from his telecom expense management background.
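One way to surface that kind of discrepancy is a simple reconciliation pass. The sketch below assumes you can export daily token totals from both the vendor invoice and your own logs as dictionaries keyed by date. The 2% tolerance, the `Discrepancy` type, and the `reconcile` function are illustrative choices, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    key: str
    vendor_tokens: int
    internal_tokens: int
    relative_gap: float


def reconcile(vendor: dict[str, int], internal: dict[str, int],
              tolerance: float = 0.02) -> list[Discrepancy]:
    """Return the keys where vendor and internal counts differ by more
    than `tolerance`, measured relative to the larger of the two counts."""
    flagged = []
    for key in sorted(set(vendor) | set(internal)):
        v = vendor.get(key, 0)      # a missing key counts as zero usage
        i = internal.get(key, 0)
        larger = max(v, i)
        if larger == 0:
            continue
        gap = abs(v - i) / larger
        if gap > tolerance:
            flagged.append(Discrepancy(key, v, i, gap))
    return flagged


# Made-up numbers for illustration only.
vendor_usage = {"2026-04-01": 1_000_000, "2026-04-02": 1_200_000}
internal_usage = {"2026-04-01": 995_000, "2026-04-02": 1_000_000,
                  "2026-04-03": 50_000}

for d in reconcile(vendor_usage, internal_usage):
    print(d.key, d.vendor_tokens, d.internal_tokens, f"{d.relative_gap:.1%}")
```

Days that appear on only one side are flagged too, since usage that one system never saw is a discrepancy in its own right.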

There’s also the model-routing illusion. When you call “Claude Opus,” you’re not running only Opus. Frontier model providers route tasks to smaller, cheaper models when those models can handle the job, so a meaningful chunk of your “Opus spend” is actually Sonnet or Haiku. Gordon, CEO of Faros AI, describes this as already showing up on enterprise Anthropic bills. As models get smarter about delegation, this effect will only grow.
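If your usage records show which model actually served each request, the size of that effect can be measured directly. Whether a given vendor exposes this is an assumption here, and the record fields (`requested_model`, `served_model`, `tokens`) are hypothetical. A minimal sketch:

```python
from collections import defaultdict


def served_model_share(records: list[dict], requested: str) -> dict[str, float]:
    """For calls that asked for `requested`, return the fraction of tokens
    handled by each model that actually served them."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        if record["requested_model"] == requested:
            totals[record["served_model"]] += record["tokens"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {model: tokens / grand_total for model, tokens in totals.items()}


# Made-up records for illustration only.
usage = [
    {"requested_model": "opus", "served_model": "opus", "tokens": 600},
    {"requested_model": "opus", "served_model": "sonnet", "tokens": 300},
    {"requested_model": "opus", "served_model": "haiku", "tokens": 100},
    {"requested_model": "sonnet", "served_model": "sonnet", "tokens": 500},
]

for model, share in served_model_share(usage, "opus").items():
    print(f"{model}: {share:.0%}")
```

In this invented sample, 40% of the tokens billed under the "opus" label were served by smaller models.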

The Emerging Tooling Stack

The market is responding, if unevenly. Factory launched a model router this week that automatically selects the right model for every task — a direct answer to the cost problem. Jellyfish, Waydev, and Faros offer AI agent monitoring to demonstrate developer tool ROI. Ramp moved into AI spend management; Datadog and New Relic added token-level observability to their existing APM platforms. AWS is expected to introduce new AI financial management features at the FinOps X conference next week.

The Tokenomics Foundation’s first deliverable is still months away. Companies that need answers today have to build their own — or pay someone who has.

What You Should Do Now

The foundation’s standards will matter, but they’re not ready yet. Here’s what works today:

  1. Set hard token budgets per task or per engineer. One developer spending $40,000 a month without oversight is a process failure, not a productivity win. Budget limits with alerts catch this before it becomes a line-item crisis.

  2. Track costs per feature, not just aggregate spend. If you can’t attribute token costs to a feature or a team, you can’t make intelligent decisions about where to cut or expand.

  3. Use model routing intentionally, not accidentally. When a vendor routes your “frontier” calls to cheaper models, that’s fine — but you should be the one deciding when to use which model. Tools like Factory or OpenRouter let you set that policy explicitly.

  4. Watch the Tokenomics Foundation’s technical roadmap. It will define the measurement standards your tooling will eventually need to conform to. Understanding the direction early means you build toward it instead of retrofitting later.
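The first three steps can be sketched together in a few dozen lines. This is an illustration under stated assumptions, not a production design: the prices, the model names "small" and "frontier", the 80% alert threshold, the complexity score, and the `TokenLedger` and `choose_model` names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices in USD per million tokens. Real price sheets differ
# and usually charge input and output tokens at separate rates.
PRICE_PER_MILLION = {"small": 1.0, "frontier": 15.0}


class BudgetExceeded(Exception):
    """Raised when a request would push an engineer past the hard limit."""


@dataclass
class TokenLedger:
    monthly_limit_usd: float
    alert_fraction: float = 0.8
    spend_by_engineer: dict = field(default_factory=lambda: defaultdict(float))
    spend_by_feature: dict = field(default_factory=lambda: defaultdict(float))
    alerts: list = field(default_factory=list)

    def record(self, engineer: str, feature: str, model: str, tokens: int) -> float:
        """Charge a request to an engineer and a feature, enforcing the budget."""
        cost = tokens / 1_000_000 * PRICE_PER_MILLION[model]
        previous = self.spend_by_engineer[engineer]
        new_total = previous + cost
        if new_total > self.monthly_limit_usd:
            # Step 1: hard limit. Reject before anything is recorded.
            raise BudgetExceeded(f"{engineer} would reach ${new_total:.2f}")
        alert_line = self.alert_fraction * self.monthly_limit_usd
        if previous < alert_line <= new_total:
            # Alert once, at the moment the threshold is crossed.
            self.alerts.append(engineer)
        self.spend_by_engineer[engineer] = new_total
        # Step 2: attribute the same cost to the feature that incurred it.
        self.spend_by_feature[feature] += cost
        return cost


def choose_model(task_complexity: float, threshold: float = 0.7) -> str:
    """Step 3: an explicit routing policy. `task_complexity` is a score in
    [0, 1] that the caller supplies; how to estimate it is out of scope."""
    return "frontier" if task_complexity >= threshold else "small"


ledger = TokenLedger(monthly_limit_usd=100.0)
ledger.record("ana", "search", choose_model(0.9), 4_000_000)  # $60
ledger.record("ana", "chat", choose_model(0.8), 2_000_000)    # $30, crosses 80%
print(dict(ledger.spend_by_feature), ledger.alerts)
```

Checking the limit before recording the spend means a rejected request leaves the ledger unchanged, which keeps per-feature totals consistent with per-engineer totals.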

The FinOps discipline took a decade to mature for cloud. The AI version doesn’t have that luxury — token bills are arriving monthly and budgets are already blown.

