The Meter Is Running on Cloud AI — Windows Just Made Local Free Forever
At Build 2026, Microsoft drew the sharpest line yet between metered cloud AI and unmetered on-device intelligence. Here's what it means and why it changes everything for local AI agents.
Cloud AI has a dirty secret: every query costs money. Every agent task, every reasoning chain, every document processed — metered. At Build 2026, Microsoft named it out loud and then announced the alternative is now free.
“Unmetered intelligence on Windows powered by on-device AI.” Those aren’t our words — that’s Microsoft’s own phrasing, dropped on stage at this year’s developer conference. And unlike previous AI-on-device pitches, this one came with silicon, software, and a shipping OS to back it up.
The Metered Problem
If you’ve been running cloud AI agents at scale, you know the math breaks badly past a certain volume. Copilot Credits have caps. Per-query billing adds up fast when agents work autonomously. Enterprise contracts help, but they don’t change the fundamental model: you’re renting intelligence by the token.
Microsoft’s answer at Build 2026 was direct. On-device Windows AI runs without Copilot Credits. No per-query billing. No metered inference. The compute is yours because the hardware is yours.
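To make the "math breaks past a certain volume" point concrete, here is a minimal back-of-the-envelope sketch. Every number in it is a hypothetical placeholder (the token price, the hardware price, the monthly running cost), not a figure from Microsoft or any cloud provider, and the function names are our own. It only shows the shape of the comparison: metered cost grows with volume, while local cost is mostly fixed.

```python
from typing import Optional


def monthly_cloud_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Metered model: cost scales linearly with the tokens processed."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def breakeven_months(
    hardware_cost: float,
    monthly_running_cost: float,
    tokens_per_month: float,
    price_per_million_tokens: float,
) -> Optional[float]:
    """Months until local hardware pays for itself, or None if it never does.

    All inputs are assumptions supplied by the caller; nothing here is a
    published price.
    """
    saving = monthly_cloud_cost(tokens_per_month, price_per_million_tokens) - monthly_running_cost
    if saving <= 0:
        # At this volume the metered option is cheaper than running local hardware.
        return None
    return hardware_cost / saving


# Hypothetical example: a $4,000 machine, $50/month to run,
# 500M tokens/month of agent traffic at $10 per million tokens.
print(breakeven_months(4000.0, 50.0, 500_000_000, 10.0))
```

At low volumes the function returns `None`, which is the regime where renting by the token still makes sense; the crossover only appears once agents are generating traffic continuously.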
Seven New MAI Models, All Running Local
The model layer got a full refresh. Microsoft dropped seven new MAI models at Build, headlined by MAI-Thinking-1, the company’s first in-house reasoning model, trained from scratch on clean, commercially licensed data. It has 35 billion active parameters and a 256K context window, and is designed for multi-step reasoning and code generation. Independent raters preferred it to Sonnet 4.6 in blind tests, and it matches Opus 4.6 on coding tasks as measured by SWE-Bench Pro.
MAI-Code-1-Flash — the inference-efficient coding variant — is already live in Copilot and VS Code. MAI-Image-2.5 and its flash variant handle text-to-image and image-to-image workloads. MAI-Voice-2 and its flash variant expanded to 15 additional languages. MAI-Transcribe 1.5 covers 43 languages with near-streaming accuracy.
All of these are available through Microsoft’s Foundry, but the strategic move is getting them running locally — where the meter stops.
Project Solara: An OS Built for AI Gadgets
Microsoft also announced Project Solara, a new operating system purpose-built for always-on AI devices and autonomous agents. Think dedicated AI hardware — the kind that doesn’t need a keyboard, doesn’t need a phone companion app, just runs in the background and handles tasks proactively.
This is Microsoft’s answer to the rabbit r1 and the humbler wave of AI-native hardware. Solara is the OS those devices needed from day one.
Agent 365 SDK: Agents That Know Your Workplace
On the enterprise side, the Agent 365 SDK shipped into general availability. It connects agents directly into Microsoft 365 data — Teams, Outlook, documents, calendar — with Work IQ, a new context layer that grounds agents in both world knowledge and how work actually happens in an organization.
Microsoft Scout, a personal agent for work, is built on OpenClaw and Work IQ. It handles meeting prep, scheduling conflicts, and routine tasks proactively, without being asked. NVIDIA specifically called out both OpenClaw and Hermes as seeing “rapid adoption” in the agent space.
RTX Spark: 1 Petaflop, 128GB, Local Agents Now Practical
Hardware was the missing piece, and it is now here. NVIDIA’s RTX Spark, announced at Computex alongside Build, delivers 1 petaflop of AI compute and 128GB of unified memory in a laptop form factor. That’s not a concept chip. That’s a machine that can run 35B-parameter models locally without wheezing.
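A rough way to sanity-check the "35B in 128GB" claim is to estimate how much memory the weights alone need at different precisions. The sketch below is an approximation under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), uses decimal gigabytes, and treats 35 billion as the total parameter count. A mixture-of-experts model with 35 billion active parameters would have more total weights to hold in memory. The function names are ours.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


def fits_in_memory(num_params: float, bits_per_param: int, memory_gb: float) -> bool:
    """True if the weights alone fit; real headroom is smaller (KV cache, OS, etc.)."""
    return weight_memory_gb(num_params, bits_per_param) <= memory_gb


PARAMS = 35e9           # assumed total parameter count
UNIFIED_MEMORY_GB = 128  # figure quoted for RTX Spark in this article

for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS, bits)
    print(f"{bits}-bit weights: {gb:.1f} GB, fits: {fits_in_memory(PARAMS, bits, UNIFIED_MEMORY_GB)}")
```

Under these assumptions even 16-bit weights come to about 70 GB, which leaves room in 128GB for context and the rest of the system; quantized weights leave considerably more.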
The RTX Spark ecosystem includes new security primitives built into Windows and the NVIDIA OpenShell runtime, giving agents the containment and policy controls enterprises need. OpenClaw and Hermes are integrating both into their new Windows applications.
Performance improvements add to the picture. Multi-token prediction, a speculative decoding technique in which a smaller draft model proposes several tokens at once and the full model verifies them in a single pass, is now shipping in both llama.cpp and vLLM. The result: 2x inference performance on top agentic models like Qwen 3.6 and 3.5 27B.
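For readers who haven’t seen speculative decoding before, here is a toy sketch of the greedy draft-and-verify loop. It is not how llama.cpp or vLLM implement it: the "models" are hypothetical arithmetic functions over integer tokens, and the verification step calls the target one token at a time where a real runtime would score the whole proposal in one batched forward pass. What it does show is the core property: the output is identical to plain decoding with the large model, but it takes fewer verification rounds.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large, slow model."""
    return (prefix[-1] * 3 + 1) % 17


def draft_model(prefix: List[int]) -> int:
    """Hypothetical cheap model: agrees with the target except on some inputs."""
    guess = target_model(prefix)
    return guess if prefix[-1] % 5 else (guess + 1) % 17


def greedy_decode(model: Model, prompt: List[int], num_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], num_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification_rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < num_new:
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks the proposal. Each round stands in for one
        #    batched forward pass of the large model in a real runtime.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch, then stop
                break
        else:
            # Every draft token matched, so the target's next token comes for free.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + num_new], rounds


out, rounds = speculative_decode(target_model, draft_model, [1], 20, k=4)
print(out == greedy_decode(target_model, [1], 20), rounds)
```

The speedup depends entirely on how often the draft agrees with the target: a draft that is always wrong still produces correct output, just with no savings.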
The Unmetered Takeaway
If you’re running cloud AI agents at any real scale, you’re paying a tax on every task. Microsoft’s Build 2026 message is straightforward: that tax is optional now.
The hardware exists. The models exist. The OS exists. Project Solara gives always-on AI hardware a real home. The Agent 365 SDK brings the same intelligence into enterprise workflows. And with RTX Spark pushing 1 petaflop into laptops, the “my hardware can’t handle this” excuse is gone.
The meter was always a business model, not a technical limitation. Microsoft just made that visible — and then shipped the alternative.
Sources
- Microsoft Build 2026: All the news about Windows, AI, RTX Spark, and more — The Verge, May 2026
- NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark — NVIDIA Blog, June 2026
- Microsoft Build 2026: Be yourself at work — Microsoft Blog, June 2, 2026