On April 15, 2026, OpenAI shipped new capabilities to its Agents SDK — and the headline feature isn’t a model upgrade or a new modality. It’s containment.

The update adds two things: sandboxing that lets agents operate in isolated workspace environments, and an in-distribution harness for testing agents against frontier models before they hit production. Both are aimed squarely at enterprise use cases where unpredictability is a liability and trust boundaries matter.

Why Sandboxing Is the Real Enterprise Feature

If you’ve deployed agents into real workflows, you know the fundamental tension: agents need access to be useful, but unrestricted access is dangerous.

An agent that can read your filesystem, write to your code repo, and execute shell commands is powerful — until it’s compromised, misprompted, or simply wrong in a way that cascades into real damage. The LiteLLM supply chain attack showed exactly how fast a compromised agent can pivot from “helpful tool” to “credential exfiltration pipeline.” Anthropic’s own documentation on agentic AI repeatedly flags unpredictable tool use as the top operational risk for production deployments.

Sandboxing addresses this by giving the agent a siloed workspace. It can access files and code within that workspace for specific operations, but the blast radius of a compromised or runaway agent is bounded. The surrounding system — credentials, other services, production data — stays isolated.

OpenAI’s Karan Sharma, on the product team, put it plainly: “This launch, at its core, is about taking our existing Agents SDK and making it compatible with all of these sandbox providers.”

That’s notable language. OpenAI isn’t building its own sandbox runtime — it’s building adapter infrastructure so the SDK can plug into whatever sandbox provider an enterprise is already running. That suggests the play is integration, not lock-in.

The Harness: Testing Frontier Model Agents Before Production

The second addition — the in-distribution harness for frontier models — is equally relevant for teams building on OpenAI’s most capable models.

In agent development, the “harness” is everything surrounding the model: the tools it can call, the memory stores it has access to, the guardrails that constrain its outputs, the context windows it operates within. Frontier models are general-purpose by design, which means they can be deployed in wildly different configurations with different safety characteristics.

The harness lets teams build and test agent configurations against frontier models in a controlled way, validating behavior before those configurations hit real workflows. For teams that have been operating frontier model agents on intuition and monitoring dashboards, this is a meaningful step toward structured testing.

The Python SDK gets these capabilities first. TypeScript support is planned for a later release.

This Fits a Pattern in Enterprise Agent Infrastructure

OpenAI isn’t alone here. The trajectory toward sandboxed, testable agent deployments is visible across the major providers:

Anthropic has been vocal about computer useClaude’s sandboxing requirements and the risks of agents operating on developer machines with accumulated credentials
Google’s Gemini agent infrastructure includes workspace isolation primitives for enterprise deployments
Cloudflare’s Agents Week announcement (also April 15) includes domain registration APIs purpose-built for AI agents operating autonomously

The common thread: enterprise buyers won’t sign contracts for agents that can trash production environments or exfiltrate data. Sandboxing isn’t a nice-to-have feature for this market segment — it’s a procurement requirement.

What This Means for Agent Builders

If you’re building agents on OpenAI’s stack today, the new SDK capabilities don’t immediately change your architecture. But they’re directional signals worth tracking:

1. Agent containment is becoming infrastructure. Rather than rolling your own sandboxing via Docker containers, firejail profiles, or VM boundaries, expect the SDK layer to handle more of this. Watch how the sandbox provider integrations evolve.

2. Test harnesses will become standard practice. The frontier model harness is a precursor to more formal agent testing frameworks — what unit tests are for functions, agent harnesses will be for autonomous workflows. Expect this to tighten over time.

3. Enterprise procurement is shaping agent tooling. The reason sandboxing gets priority over the nth model capability is that enterprise buyers are asking for it. If you’re building agents for internal tooling or client work, these requirements will flow downstream.

OpenAI said it will continue expanding the Agents SDK, with code mode and subagents on the roadmap for both Python and TypeScript. The sandbox integration is just the start.

Why Sandboxing Is the Real Enterprise Feature

The Harness: Testing Frontier Model Agents Before Production

This Fits a Pattern in Enterprise Agent Infrastructure

What This Means for Agent Builders

Sources