LLM Token Prices Dropped 97% in Three Years. Here's What Actually Changed.
GPT-4-level performance cost $30 per million tokens in 2023. Today you can get it for under $1. The LLM cost collapse is real — but the implications for how you build are more complicated than 'it's all cheap now.'
Here’s a number worth remembering: $30 per million tokens in 2023. One dollar per million tokens today.
That’s a 97% collapse in the cost of GPT-4-level intelligence. Three years. And the pace hasn’t slowed.
The Price History in One Table
The numbers are stark. GPT-4-level performance — the kind that cost serious money in early 2023 — is now available at commodity pricing:
| Year | Cost of GPT-4-level performance (per 1M tokens) | Representative models |
|---|---|---|
| 2023 | $30–$60 | GPT-4, Claude 2 |
| 2024 | $3–$7 | GPT-4o, Claude 3 Sonnet |
| 2025 | $0.60–$3 | Claude 3.5 Sonnet, GPT-4.5 |
| 2026 | $0.30–$1.20 | GPT-5.5 Instant, Gemini 3.5 Flash, Qwen 3.7 Max |
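The headline percentage can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the $30 and $1 endpoints quoted above; the function names are illustrative, and the "implied yearly decline" assumes a constant rate of decline, which the table does not claim.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Percentage decrease from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def annualized_decline(old_price: float, new_price: float, years: float) -> float:
    """Constant yearly percentage decline that takes old_price to new_price."""
    yearly_factor = (new_price / old_price) ** (1 / years)
    return (1 - yearly_factor) * 100


# Endpoints from the headline: $30/M tokens in 2023, $1/M tokens in 2026.
total = percent_drop(30.0, 1.0)
per_year = annualized_decline(30.0, 1.0, years=3)

print(f"total drop: {total:.1f}%")
print(f"implied yearly decline: {per_year:.1f}%")
```

Under those assumptions the total drop comes out just under 97%, which implies prices falling by roughly two thirds each year.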
The driving forces are real: competition between labs is intense, inference infrastructure has improved dramatically, and the economic model of AI-as-a-service creates constant downward price pressure. Google, OpenAI, Anthropic, DeepSeek, and Alibaba are all competing on price in addition to capability.
Why the Cost Collapse Doesn’t Mean “It’s All Free”
The headline price is the trap. The real cost of LLM-powered systems includes several layers that don’t show up in per-token pricing:
Context window overhead. Larger context windows invite more tokens per query. A 200K-token query at $0.50/M tokens costs $0.10, which sounds cheap until you realize the request is 100 tokens of actual input and 199,900 tokens of mostly wasted context padding. You are billed for every token you send, so cost per query is driven by how much context you put in the window, not by the headline rate.
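The gap between a padded and a lean request is easy to quantify. The sketch below assumes input tokens are billed at a flat $0.50/M and ignores output-token pricing entirely; `query_cost` is a hypothetical helper, not part of any provider SDK.

```python
def query_cost(input_tokens: int, price_per_million: float) -> float:
    """Dollar cost of one request, billed on the tokens actually sent."""
    return input_tokens * price_per_million / 1_000_000


PRICE = 0.50  # dollars per 1M input tokens (the figure used in the text)

padded = query_cost(200_000, PRICE)  # request that fills a 200K window
lean = query_cost(100, PRICE)        # only the 100 tokens that matter

print(f"padded request: ${padded:.5f}")
print(f"lean request:   ${lean:.5f}")
print(f"padding multiplies the cost by about {padded / lean:,.0f}x")
```

The per-token price is identical in both cases. Only the token count differs, and trimming context changes that count directly.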
Agentic overhead. Single-query use cases are cheap. Multi-step agents (the kind that browse, call tools, and loop) multiply token consumption in ways that per-token pricing obscures. A coding agent that makes 50 calls at $0.50/M tokens is not paying a higher rate; it is paying the same rate on far more tokens. If each of those calls carried 1M tokens of context, the task would cost $25, and dollars per task is a completely different unit-economics conversation from dollars per million tokens.
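To make the per-task framing concrete, here is a rough sketch of how an agent loop accumulates tokens. It assumes the agent resends its whole context on every call and that the context grows by a fixed amount per step. The step count matches the text, but the context sizes are hypothetical, and output tokens and prompt caching are ignored.

```python
def agent_task_cost(
    steps: int,
    base_context: int,
    tokens_added_per_step: int,
    price_per_million: float,
) -> tuple[int, float]:
    """Return (total input tokens, dollar cost) for one agent task."""
    total_tokens = 0
    context = base_context
    for _ in range(steps):
        total_tokens += context           # the whole context is resent each call
        context += tokens_added_per_step  # tool output and replies are appended
    return total_tokens, total_tokens * price_per_million / 1_000_000


# Hypothetical task: 50 calls, 20K starting context, 2K tokens added per step.
tokens, cost = agent_task_cost(50, 20_000, 2_000, price_per_million=0.50)

print(f"{tokens:,} input tokens -> ${cost:.3f} per task")
```

Because the context is resent on every call, total tokens grow roughly with the square of the step count. That growth is why per-task cost deserves its own tracking.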
Service-tier limits. The cheapest API tier often comes with rate limits, no SLA, or usage caps that make production deployment risky. “Under $1/M tokens” doesn’t mean “under $1/M tokens with guarantees.”
Where the Real Value Has Shifted
The cost collapse has been absorbed by the market in a predictable way: lower per-query costs drove higher query volumes, which drove more complex agentic architectures, which consumed the savings. The productivity gains from cheaper LLM inference are real, but they’ve largely been cycled back into more capable — and more expensive in aggregate — systems.
What changed structurally is the floor, not the ceiling. The minimum viable intelligence for a given task is now dramatically cheaper. That means:
- Commodity tasks are genuinely cheap. Summarization, classification, extraction, rewriting: tasks that don’t require frontier model capability can run on $0.10–$0.30/M token models with minimal quality loss.
- Local inference is now competitive. With capable 12B–26B models running at 85 tokens/second on consumer hardware, the economics for on-device use cases have flipped. For certain workloads, the choice between API and local isn’t about quality anymore; it’s about privacy, latency, and infrastructure cost.
- The frontier is still expensive. The models topping the benchmark tables (Claude Opus 4.8, GPT-5.5 Pro, Claude Fable 5) are not $0.30/M tokens. They’re $5–$15/M tokens. Capability at the frontier still costs real money. The price collapse was absorbed by the mid-tier, not the top.
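The local-inference point can be given a rough scale. The sketch below converts the 85 tokens/second figure into a daily token volume and prices that volume at the commodity API rates quoted above. It assumes the machine generates continuously at full utilization and leaves out hardware, power, and maintenance costs, so it is an upper bound on displaced API spend, not a full cost comparison.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_tokens(tokens_per_second: float, utilization: float = 1.0) -> float:
    """Tokens generated per day at a given throughput and utilization."""
    return tokens_per_second * SECONDS_PER_DAY * utilization


def api_equivalent_cost(tokens: float, price_per_million: float) -> float:
    """What the same token volume would cost at an API price."""
    return tokens * price_per_million / 1_000_000


tokens = daily_tokens(85)                 # throughput figure from the text
low = api_equivalent_cost(tokens, 0.10)   # cheapest commodity rate quoted
high = api_equivalent_cost(tokens, 0.30)  # upper commodity rate quoted

print(f"{tokens:,.0f} tokens/day")
print(f"API-equivalent value: ${low:.2f} to ${high:.2f} per day")
```

At commodity prices the displaced API spend per machine is small. That is consistent with the case for local inference resting on privacy, latency, and infrastructure considerations, not on raw token price.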
Practical Implication: Build a Routing Layer
If you’re running LLM workloads today, the cost collapse has one practical consequence that most teams haven’t fully acted on: model routing is now worth the engineering cost.
A simple task router that sends commodity tasks (classification, summarization, extraction) to a cheap model (Qwen 3.7 Max at ~$0.60/M, or Gemini 3.5 Flash at ~$0.30/M) and reserves expensive frontier models only for tasks that actually need them can cut your API bill by 60–80% without meaningful quality degradation on the commodity tier.
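A quick way to sanity-check the savings range is to compute a blended price. This sketch assumes the baseline is sending every token to a frontier model at $10/M (a hypothetical figure inside the $5–$15 range quoted earlier) and that the cheap tier costs $0.30/M. The traffic shares are illustrative.

```python
def blended_price(cheap_share: float, cheap_price: float, frontier_price: float) -> float:
    """Average $/M tokens when cheap_share of token volume goes to the cheap tier."""
    return cheap_share * cheap_price + (1 - cheap_share) * frontier_price


def savings_pct(cheap_share: float, cheap_price: float, frontier_price: float) -> float:
    """Percent saved versus sending every token to the frontier model."""
    blended = blended_price(cheap_share, cheap_price, frontier_price)
    return (1 - blended / frontier_price) * 100


CHEAP, FRONTIER = 0.30, 10.00  # $/M tokens; the frontier price is an assumption

for share in (0.65, 0.80):
    print(f"{share:.0%} of tokens routed cheap -> "
          f"{savings_pct(share, CHEAP, FRONTIER):.1f}% saved")
```

Under these assumptions, savings track the share of token volume that can safely move to the cheap tier. Routing about 65% to 80% of tokens there lands in the 60–80% savings range.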
The engineering cost of routing is low. The potential savings aren’t. For any team spending more than a few hundred dollars per month on LLM inference, a routing layer with per-task routing logic is now worth building.
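As a starting point, per-task routing can be as small as a lookup on task type. The sketch below is one possible shape, not a production design: the tier names, prices, task labels, and workload volumes are all hypothetical placeholders, and a real router would also need fallbacks, quality checks, and actual provider clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # placeholder label, not a real model identifier
    price_per_million: float  # dollars per 1M tokens (assumed)


CHEAP = ModelTier("cheap-tier-model", 0.30)
FRONTIER = ModelTier("frontier-tier-model", 10.00)

# Task types treated as commodity work; anything else goes to the frontier tier.
COMMODITY_TASKS = {"classification", "summarization", "extraction", "rewriting"}


def route(task_type: str) -> ModelTier:
    """Pick a tier from the task type alone."""
    return CHEAP if task_type.lower() in COMMODITY_TASKS else FRONTIER


def estimate_bill(workload: list[tuple[str, int]]) -> float:
    """Dollar cost of a workload given as (task_type, token_count) pairs."""
    return sum(
        tokens * route(task_type).price_per_million / 1_000_000
        for task_type, tokens in workload
    )


# Hypothetical monthly workload: 100M tokens in total.
workload = [
    ("classification", 30_000_000),
    ("summarization", 25_000_000),
    ("extraction", 20_000_000),
    ("multi_step_planning", 25_000_000),
]

routed = estimate_bill(workload)
all_frontier = sum(t for _, t in workload) * FRONTIER.price_per_million / 1_000_000

print(f"routed: ${routed:,.2f}  all-frontier: ${all_frontier:,.2f}")
```

Defaulting unknown task types to the frontier tier is a deliberately conservative choice: a misrouted task costs extra money instead of losing quality.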
The 97% cost collapse didn’t make intelligence free. It made it cheap enough to use everywhere. The teams that win are the ones that build the infrastructure to take advantage of that.
Sources
- LMSys Arena: AI Model Leaderboard June 2026 (June 2026)
- LMSys AI Trends: Prices and Value (June 2026)
- Lushbinary: Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro comparison (June 10, 2026)
- BenchLM.ai: LLM Agent Benchmarks 2026 (accessed June 10, 2026)