GPT-5.4 Just Dropped — And It's the First AI That Beats Humans at Using Computers

We knew it was coming. But honestly? Even we didn’t expect it to land this hard.

OpenAI just released GPT-5.4, and there’s no gentle way to say this: it’s the first AI model to outperform the human baseline at navigating a desktop computer. Not on a toy benchmark, either. The result comes from OSWorld, which is built around real desktop tasks.

Built-In Computer Use, Not a Plugin

Here’s the big deal: this isn’t an add-on or a Claude Code-style extension. Native computer use is baked directly into the model. GPT-5.4 can read screens, click buttons, type text, and move files around, without you having to wire up any tools yourself.

The model sees what’s on your screen and just… does the thing.
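
If you want a mental model for what "sees the screen and does the thing" means, here is a toy sketch of an observe-then-act loop. To be clear, this is not the GPT-5.4 API and says nothing about how OpenAI implements it: FakeDesktop, Action, run_agent, and scripted_policy are all hypothetical names, and the "model" is a hard-coded function standing in for a real one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str         # "click", "type", or "done" (hypothetical action set)
    target: str = ""  # label of the UI element to click
    text: str = ""    # text to type into the focused element


@dataclass
class FakeDesktop:
    """Toy stand-in for a real screen; it only tracks form fields."""
    fields: Dict[str, str] = field(default_factory=dict)
    focused: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive pixels; a string is enough for a sketch.
        return f"focused={self.focused!r} fields={self.fields!r}"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = action.target
        elif action.kind == "type":
            self.fields[self.focused] = action.text
        self.log.append(action.kind)


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act, repeat. Returns the number of actions applied."""
    for applied in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            return applied
        desktop.apply(action)
    return max_steps


def scripted_policy(observation: str) -> Action:
    """Hard-coded stand-in for the model: fill the 'name' field with 'Ada'."""
    if "focused='name'" not in observation:
        return Action("click", target="name")
    if "'name': 'Ada'" not in observation:
        return Action("type", text="Ada")
    return Action("done")


desktop = FakeDesktop()
steps = run_agent(desktop, scripted_policy)
```

The point of the sketch is the shape of the loop: every decision is made from a fresh observation of the screen, and the loop stops either when the policy says it is done or when the step budget runs out.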

OSWorld: 75% vs Human 72.4%

Let’s talk numbers. The OSWorld benchmark tests how well an AI can complete real tasks on a desktop operating system — think opening apps, organizing files, filling out forms. GPT-5.4 scored 75%. The human baseline? 72.4%.

This is the first time an AI has beaten the human baseline at desktop navigation. Not matched. Beaten, by 2.6 percentage points.

The Token Situation

Context windows keep getting ridiculous, and 5.4 is no exception:

1M token context — but only via API and Codex
272K tokens for standard usage

If you’re running MCP workflows, there’s good news: tool search now uses 47% fewer tokens. That adds up fast when you’re chaining multiple tools together.
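
To make those numbers concrete, here is a rough sketch of a token budget check. It assumes "1M" means exactly 1,000,000 tokens and "272K" means 272,000, and it applies the 47% figure as a flat reduction to a tool-search token count you measured yourself. The function names are made up for illustration.

```python
STANDARD_CONTEXT_TOKENS = 272_000    # "272K" from the figures above
EXTENDED_CONTEXT_TOKENS = 1_000_000  # assumes "1M" means exactly 1,000,000
TOOL_SEARCH_REDUCTION_PERCENT = 47   # reported reduction for tool search


def fits_in_context(prompt_tokens: int, extended: bool = False) -> bool:
    """Check a prompt against the standard or the extended (API/Codex) window."""
    limit = EXTENDED_CONTEXT_TOKENS if extended else STANDARD_CONTEXT_TOKENS
    return prompt_tokens <= limit


def tool_search_tokens_after(tokens_before: int) -> int:
    """Apply the reported 47% reduction to a previously measured token count."""
    return tokens_before * (100 - TOOL_SEARCH_REDUCTION_PERCENT) // 100
```

Under those assumptions, a 300,000-token prompt fits only in the extended window, and a tool-search step that used to cost 10,000 tokens would cost 5,300.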

Pricing

Here’s what you’re paying:

$2.50 per million input tokens
$10 per million output tokens

Not cheap. But when the model is doing work that used to require a human sitting at a keyboard, the math starts to make sense.
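
If you want to run that math yourself, here is a minimal sketch using the two rates above. It assumes a flat per-token price with no caching discounts, tiers, or other adjustments, none of which are covered here. The function name estimate_cost_usd is hypothetical.

```python
from decimal import Decimal

INPUT_USD_PER_MILLION = Decimal("2.50")   # input rate listed above
OUTPUT_USD_PER_MILLION = Decimal("10")    # output rate listed above
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the flat listed rates."""
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / TOKENS_PER_MILLION


# Example: a task that reads 200,000 tokens and writes 20,000.
cost = estimate_cost_usd(200_000, 20_000)
```

That example works out to $0.50 of input plus $0.20 of output, or $0.70 total. Decimal is used instead of float so the cents stay exact.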

Why This Matters

For years, we’ve been talking about AI as a “co-pilot.” GPT-5.4 is the first release that feels like it might actually be ready to fly solo on certain tasks. Not everything — but the gap between “assistant” and “worker” just got a lot narrower.

This is the “AI can do real work” release. We’ve been waiting for it.

Sources

GPT-5.4 Just Dropped — Here’s Everything Developers Need to Know — Dev.to, March 2026