GPT-5.4 Just Dropped — And It's the First AI That Beats Humans at Using Computers
OpenAI's latest model doesn't just write code — it uses computers like a pro. The OSWorld benchmark just got shattered.
We knew it was coming. But honestly? Even we didn’t expect it to land this hard.
OpenAI just released GPT-5.4, and there’s no gentle way to say this: it’s the first AI model to score above the human baseline at navigating a desktop computer. Not on some toy benchmark, either. On OSWorld, which measures real, honest-to-goodness computer use.
Built-In Computer Use, Not a Plugin
Here’s the big deal: this isn’t some add-on or Claude Code-style extension. Native computer use is baked directly into the model. GPT-5.4 can read screens, click buttons, type text, and move files around — without you needing to wire up any tools yourself.
The model sees what’s on your screen and just… does the thing.
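To make "sees the screen, then acts" a bit more concrete, here is a toy sketch of the general observe-and-act loop that computer-use agents tend to follow. Everything in it is made up for illustration: `FakeDesktop`, `Action`, `scripted_policy`, and `run_episode` are hypothetical names, the "screen" is a plain dictionary, and the policy is a hard-coded stand-in for the model. It is not OpenAI's API and says nothing about how GPT-5.4 works internally.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # which widget to click (hypothetical widget names)
    text: str = ""     # text to type into the focused field


@dataclass
class FakeDesktop:
    """Toy stand-in for a real screen: one text field and a submit button."""
    fields: dict = field(default_factory=lambda: {"name": ""})
    focused: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would get pixels; here the observation is a dict.
        return {
            "fields": dict(self.fields),
            "focused": self.focused,
            "submitted": self.submitted,
        }

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            else:
                self.focused = action.target
        elif action.kind == "type" and self.focused:
            self.fields[self.focused] += action.text


def scripted_policy(observation: dict) -> Action:
    """Hard-coded placeholder for the model: picks the next action."""
    if observation["submitted"]:
        return Action("done")
    if observation["focused"] != "name":
        return Action("click", target="name")
    if observation["fields"]["name"] == "":
        return Action("type", text="Ada")
    return Action("click", target="submit")


def run_episode(desktop: FakeDesktop, policy, max_steps: int = 10) -> list:
    """Observe, choose an action, apply it, repeat until done or out of steps."""
    history = []
    for _ in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            break
        desktop.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    desktop = FakeDesktop()
    for step in run_episode(desktop, scripted_policy):
        print(step)
```

The point of the sketch is the shape of the loop: the agent never gets a purpose-built tool for the form, only a view of the screen and a small set of generic actions.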
OSWorld: 75% vs Human 72.4%
Let’s talk numbers. The OSWorld benchmark tests how well an AI can complete real tasks on a desktop operating system — think opening apps, organizing files, filling out forms. GPT-5.4 scored 75%. The human baseline? 72.4%.
That’s a 2.6-point margin, and the first time an AI has beaten the human baseline at desktop navigation. Not matched. Beaten.
The Token Situation
Context windows keep getting ridiculous, and 5.4 is no exception:
- 1M-token context, but only via the API and Codex
- 272K tokens for standard usage
If you’re running MCP workflows, there’s good news: tool search now uses 47% fewer tokens. That adds up fast when you’re chaining multiple tools together.
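Here is a rough back-of-the-envelope sketch of what those numbers mean in practice. It assumes "1M" and "272K" mean exactly 1,000,000 and 272,000 tokens, and that the 47% reduction applies evenly to every tool search call; the 10,000-token baseline and the five-call chain in the usage lines are invented purely for illustration.

```python
API_CONTEXT_TOKENS = 1_000_000     # "1M" from the article, assumed exact
STANDARD_CONTEXT_TOKENS = 272_000  # "272K" from the article, assumed exact
TOOL_SEARCH_SAVINGS_PCT = 47       # "47% fewer tokens" from the article


def fits(prompt_tokens: int, limit: int) -> bool:
    """True if a prompt of this size fits inside the given context limit."""
    return prompt_tokens <= limit


def tool_search_tokens_after(before_tokens: int) -> int:
    """Tokens used by one tool search after the reduction (integer math)."""
    return before_tokens * (100 - TOOL_SEARCH_SAVINGS_PCT) // 100


def chain_savings(before_tokens_per_call: int, calls: int) -> int:
    """Total tokens saved across a chain of tool search calls."""
    saved_per_call = before_tokens_per_call - tool_search_tokens_after(before_tokens_per_call)
    return saved_per_call * calls


if __name__ == "__main__":
    prompt = 400_000  # hypothetical prompt size
    print("fits standard window:", fits(prompt, STANDARD_CONTEXT_TOKENS))
    print("fits API window:", fits(prompt, API_CONTEXT_TOKENS))
    # Hypothetical workload: 10,000 tokens per tool search, 5 chained calls.
    print("tokens saved:", chain_savings(10_000, 5))
```

Under those made-up inputs, a 400,000-token prompt only fits the 1M window, and the five-call chain saves 23,500 tokens.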
Pricing
Here’s what you’re paying:
- $2.50 per million input tokens
- $10 per million output tokens
Not cheap, but when the model is doing work that used to require a human sitting at a keyboard? The math starts to make sense.
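If you want to run that math yourself, here is a minimal sketch of a per-request cost estimate using the two prices quoted above. The function name `request_cost` and the token counts in the usage lines are assumptions for illustration, and the sketch ignores anything the article doesn't mention, such as cached-input discounts or long-context surcharges.

```python
from decimal import Decimal

INPUT_USD_PER_MILLION = Decimal("2.50")  # input price quoted in the article
OUTPUT_USD_PER_MILLION = Decimal("10")   # output price quoted in the article
ONE_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated cost in USD for one request at the quoted per-token prices."""
    total = (INPUT_USD_PER_MILLION * input_tokens
             + OUTPUT_USD_PER_MILLION * output_tokens)
    return total / ONE_MILLION


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 20,000 output tokens.
    print(request_cost(200_000, 20_000))
    # Hypothetical request that fills the 272K window with input alone.
    print(request_cost(272_000, 0))
```

The first hypothetical request comes to $0.70 and the second to $0.68, so at these prices a single maxed-out standard-window prompt costs well under a dollar before output.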
Why This Matters
For years, we’ve been talking about AI as a “co-pilot.” GPT-5.4 is the first release that feels like it might actually be ready to fly solo on certain tasks. Not everything — but the gap between “assistant” and “worker” just got a lot narrower.
This is the “AI can do real work” release. We’ve been waiting for it.