GPT-5.4 Just Dropped — And It's the First AI That Beats Humans at Using Computers

by Persephone

OpenAI's latest model doesn't just write code — it uses computers like a pro. The OSWorld benchmark just got shattered.

We knew it was coming. But honestly? Even we didn’t expect it to land this hard.

OpenAI just released GPT-5.4, and there’s no gentle way to say this: it’s the first AI model that actually outperforms humans at navigating a desktop computer. Not on some toy benchmark, but on real, honest-to-goodness desktop tasks.

Built-In Computer Use, Not a Plugin

Here’s the big deal: this isn’t some add-on or Claude Code-style extension. Native computer use is baked directly into the model. GPT-5.4 can read screens, click buttons, type text, and move files around — without you needing to wire up any tools yourself.

The model sees what’s on your screen and just… does the thing.
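Conceptually, computer use is an observe, decide, act loop: look at the screen, pick an action, perform it, repeat. The toy sketch below shows that loop in Python. A scripted function stands in for the model and a dictionary stands in for the desktop. `Action`, `run_agent`, and the action names are hypothetical illustrations, not OpenAI’s API; the pitch for GPT-5.4 is that the model itself handles the “decide” step natively from what it sees on screen.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                    # hypothetical action set: "click", "type", or "done"
    target: Optional[str] = None
    text: Optional[str] = None


def run_agent(
    observe: Callable[[], str],
    decide: Callable[[str], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> list:
    """Observe-decide-act loop: read the screen, choose an action, perform it."""
    history = []
    for _ in range(max_steps):
        screen = observe()       # a real system would capture a screenshot here
        action = decide(screen)  # a real system would call the model here
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history


# Toy "desktop": just a dict of UI state (hypothetical, for illustration only).
state = {"dialog_open": False, "filename": ""}


def observe() -> str:
    if not state["dialog_open"]:
        return "desktop: Save button visible"
    if not state["filename"]:
        return "save dialog: filename box empty"
    return f"save dialog: filename={state['filename']}"


def decide(screen: str) -> Action:
    # Scripted policy standing in for the model.
    if screen.startswith("desktop"):
        return Action("click", target="Save")
    if "empty" in screen:
        return Action("type", text="report.txt")
    return Action("done")


def execute(action: Action) -> None:
    if action.kind == "click" and action.target == "Save":
        state["dialog_open"] = True
    elif action.kind == "type":
        state["filename"] = action.text or ""


steps = run_agent(observe, decide, execute)
print([a.kind for a in steps])  # ['click', 'type', 'done']
```

The `max_steps` cap is a simple guard so a confused policy can’t loop forever.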

OSWorld: 75% vs. a 72.4% Human Baseline

Let’s talk numbers. The OSWorld benchmark tests how well an AI can complete real tasks on a desktop operating system — think opening apps, organizing files, filling out forms. GPT-5.4 scored 75%. The human baseline? 72.4%.

This is the first time an AI model has topped the human baseline on OSWorld. Not matched: beaten.
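For scale, here’s the margin worked out in a few lines of Python, using only the two scores above. It reports both the absolute gap in percentage points and the gap relative to the human baseline.

```python
# Scores from the article (OSWorld success rates, in percent).
human = 72.4
model = 75.0

gap_points = model - human             # absolute gap, in percentage points
relative = (model / human - 1) * 100   # gap relative to the human baseline, in percent

print(f"{gap_points:.1f} points, {relative:.1f}% relative")  # 2.6 points, 3.6% relative
```

In other words: a clear win, but a margin of a few points rather than a blowout.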

The Token Situation

Context windows keep getting ridiculous, and 5.4 is no exception:

  • 1M token context — but only via API and Codex
  • 272K tokens for standard usage

If you’re running MCP workflows, there’s good news: tool search now uses 47% fewer tokens. That adds up fast when you’re chaining multiple tools together.
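To see how a 47% cut compounds, here’s a quick back-of-the-envelope sketch. The 8,000-tokens-per-search baseline and the 10-call chain are hypothetical numbers picked for illustration; only the 47% figure comes from the release.

```python
REDUCTION = 0.47  # "47% fewer tokens" for tool search, per the article


def tool_search_tokens(baseline_per_call: int, calls: int, reduction: float = REDUCTION):
    """Total tool-search tokens for a chain of calls, before and after the reduction."""
    before = baseline_per_call * calls
    after = round(before * (1 - reduction))
    return before, after


# Hypothetical workload: 10 chained tool searches at 8,000 tokens each.
before, after = tool_search_tokens(baseline_per_call=8_000, calls=10)
print(before, after, before - after)  # 80000 42400 37600
```

The savings scale linearly with chain length, so the longer your tool chains, the more it matters.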

Pricing

Here’s what you’re paying:

  • $2.50 per million input tokens
  • $10 per million output tokens

Not cheap, but when the model is doing work that used to require a human sitting at a keyboard? The math starts to make sense.
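Here’s a minimal cost estimator at the listed rates. It assumes flat per-token pricing exactly as listed above and ignores any discounts or tiers the announcement doesn’t mention; the 200K-input, 20K-output session is a hypothetical workload.

```python
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens (from the article)
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens (from the article)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed flat rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical session: 200K tokens in, 20K tokens out.
print(f"${request_cost(200_000, 20_000):.2f}")  # $0.70
```

Note that output tokens cost 4x as much as input, so chatty, output-heavy agents feel the bill faster than input-heavy ones.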

Why This Matters

For years, we’ve been talking about AI as a “co-pilot.” GPT-5.4 is the first release that feels like it might actually be ready to fly solo on certain tasks. Not everything — but the gap between “assistant” and “worker” just got a lot narrower.

This is the “AI can do real work” release. We’ve been waiting for it.
