OpenAI Ships Jalapeño: First Custom Inference Silicon, Built with Broadcom in Nine Months

June 24, 2026

OpenAI shipped Jalapeño today. Working with Broadcom, the company designed the chip in nine months from a blank slate. Jalapeño runs LLM inference — not a GPU competitor, not a training accelerator, just a purpose-built inference engine that OpenAI now controls end-to-end.

This is OpenAI’s first custom silicon. It is also the first major inference accelerator where the design loop itself ran on the lab’s own frontier models. That second detail is the bigger story.

What Shipped

Jalapeño is the first generation of an “Intelligence Processor” that OpenAI and Broadcom built together. Per the OpenAI announcement, the design started from a blank RTL spec — no licensed IP, no GPU heritage — and converged on engineering samples that are already running GPT-5.3-Codex-Spark workloads in Broadcom’s lab at production target frequency and power. Celestica handles board, rack, networking, and production integration alongside Broadcom. Initial deployment with data-center partners targets the end of 2026, scaling to gigawatt capacity across multiple generations.

The headline numbers, as OpenAI reports them:

9 months blank-slate to engineering samples
First-generation performance per watt “substantially better than current state-of-the-art” — OpenAI’s own characterization of early testing
End of 2026: first hyperscaler-class deployments
Multi-generation roadmap to gigawatt scale

Read that first bullet again. A serious custom inference accelerator typically takes 18 to 36 months to go from blank slate to engineering samples; Jalapeño took nine. OpenAI did not get there by throwing more engineers at the problem. Per the announcement, the design loop ran on the company’s own models.

The Underreported Story: AI Designed the Chip

OpenAI says Jalapeño’s verification and design-iteration loop leaned on its own frontier models, and the company is explicit that those models accelerated the compressed timeline. OpenAI has not itemized the work, but in chip-design terms the phrase most plausibly means something concrete: faster RTL exploration, automated generation of verification corner cases, layout regression detection, and likely a fair amount of formal-property generation.

The implications travel further than the chip itself. Custom inference silicon is the most physics-bound product in the AI stack: design cycles, mask sets, and tape-out economics traditionally gate everything else. If a frontier lab can compress that cycle to nine months by routing its own models through the verification loop, the pace of inference-hardware iteration has stepped up. If that pace holds, “v2” plausibly lands 12 months after v1, not three years.

For now this is OpenAI’s claim, and the design-loop details stay proprietary. But the directional signal — AI accelerators designed with heavy AI assistance — is the part platform teams should be modeling into capacity and cost forecasts for 2027 and 2028.

What Jalapeño Is Not

It is worth being precise about what did not ship:

Not a GPU competitor. Nvidia is not named in the announcement. The chip is inference-only and purpose-built for LLM serving.
Not a training accelerator. No backward pass, no optimizer state, no gradient buffering. GPT-5.3-Codex-Spark running in Broadcom’s lab is an inference workload.
Not a general-purpose accelerator. There is no public story about running non-LLM workloads (recommendation systems, classical ML, vision) on Jalapeño.
Not a one-shot. Broadcom and OpenAI are explicit about a multi-generation roadmap. The chip is the first step of a capacity strategy, not a single product cycle.

If you read “OpenAI vs Nvidia” into today’s announcement, you are reading more than the press release actually says. The framing that fits is “OpenAI removes itself from the GPU allocation queue for inference, with Broadcom as the partner.” That is a procurement story, not a chip-industry head-to-head.

Platform-Engineering Implications

Three things shift for platform teams over the next 18 months.

The inference cost curve bends — for OpenAI first. A purpose-built inference chip with claimed step-change perf/W is going to be deployed where OpenAI’s own traffic sits: ChatGPT, the API, and Codex-style workloads. That removes a large slab of LLM inference from the shared GPU market and changes the supply-demand math for everyone else. Inference spot pricing on hyperscalers is a function of utilization; if OpenAI is no longer bidding for the same racks, residual capacity for the rest of the market behaves differently than the 2024–2025 baseline suggested.

Build-vs-buy gets concrete numbers behind it. The interesting question for every team running models at scale is no longer “could we ever build silicon?” but “what is the inference cost delta between a frontier custom chip and an H-class GPU at our batch size and quantization?” The OpenAI-Broadcom claim of substantially better first-generation perf/W is the first data point, though it is not yet a figure. The detailed technical report OpenAI promises in the coming months should turn that claim into a number you can model into a unit-economics spreadsheet.
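As a minimal sketch of that cost-delta question, the Python below compares cost per million tokens for two accelerators. Every figure is a hypothetical placeholder, not a measurement of Jalapeño or of any GPU. The `Accelerator` fields and the `cost_per_million_tokens` helper are names invented for illustration, and the model assumes cost is only an hourly hardware rate plus electricity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    tokens_per_second: float  # sustained throughput at your batch size and quantization
    watts: float              # power draw at that throughput
    hourly_cost_usd: float    # amortized hardware or rental cost, excluding electricity


def cost_per_million_tokens(acc: Accelerator, usd_per_kwh: float) -> float:
    """Hardware plus electricity cost (USD) per one million generated tokens."""
    tokens_per_hour = acc.tokens_per_second * 3600
    energy_usd_per_hour = (acc.watts / 1000) * usd_per_kwh
    return (acc.hourly_cost_usd + energy_usd_per_hour) / tokens_per_hour * 1_000_000


# Hypothetical inputs: replace with your own measurements and contract rates.
gpu = Accelerator("h-class-gpu", tokens_per_second=1000, watts=1000, hourly_cost_usd=3.60)
custom = Accelerator("custom-chip", tokens_per_second=1000, watts=700, hourly_cost_usd=3.60)

gpu_cost = cost_per_million_tokens(gpu, usd_per_kwh=0.10)
custom_cost = cost_per_million_tokens(custom, usd_per_kwh=0.10)
delta_pct = (gpu_cost - custom_cost) / gpu_cost * 100
print(f"{gpu.name}: ${gpu_cost:.4f}  {custom.name}: ${custom_cost:.4f}  delta: {delta_pct:.2f}%")
```

With these placeholder inputs the lower power draw moves total cost by less than one percent, because the hourly hardware rate dominates. How much a perf/W gain matters depends on how much of your bill is power, which is why the inputs should come from your own deployment.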

Sovereignty and regulated workloads get a new option. If OpenAI can run regulated traffic on silicon it controls end-to-end, the data-residency and chain-of-custody argument changes shape. Some EU and APAC buyers have been waiting for a non-GPU, lab-controlled inference path precisely so they can argue about hardware provenance. Jalapeño is the first credible one. The “inference sovereignty” pitch is plausible enough that procurement teams in regulated industries should ask about it on the next vendor call.

What We Don’t Know

Three open questions worth tracking:

The performance-per-watt claim is OpenAI’s. No independent benchmark yet. The detailed technical report OpenAI promises in the coming months is the moment this becomes a number, not a press-release phrase.
The model-side of the design loop is not disclosed. OpenAI has not said which models ran the verification, what stage of the flow they touched, or how much of the chip is AI-generated versus AI-verified. Expect partial disclosure over the next two quarters.
Capacity is targeted, not booked. End-of-2026 deployment and “gigawatt scale” are roadmap claims. Actual rack count, fab allocation, and partner mix are unlikely to be public before Broadcom’s next earnings call and OpenAI’s next infrastructure post.

Treat today’s announcement as a credible directional claim backed by a working part, not as benchmark-grade proof of perf/W leadership.

What To Do This Week

Three actions for platform teams, in order of urgency:

Re-run your 2027 inference cost model with a custom-silicon scenario. Not “OpenAI vs the market”: just an assumed 30–40% perf/W uplift (a placeholder, since OpenAI has not published a figure) on a meaningful fraction of the world’s LLM inference capacity, applied to your own batch sizes and traffic mix. See what the unit-economics number actually looks like.
Ask your hyperscaler rep about the OpenAI-Broadcom roadmap. Capacity tied up by OpenAI is capacity that does not get reallocated to the open market. That changes which instance types and regions get constrained when your next 6-month plan lands.
Do not start an “OpenAI vs Nvidia” procurement memo. The announcement does not support that framing. If someone in your org is about to write one, redirect them to “what does the perf/W claim mean for our 2027 inference bill.”
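The first action above can be sketched as a small sensitivity sweep. This is one possible framing, not a method from the announcement. It assumes that a perf/W uplift shrinks only the power-related share of your cost, that the saving passes through fully to what you pay, and that only part of your traffic is affected. The function name, the parameters, and all input values are hypothetical.

```python
def scenario_cost(baseline_usd: float, power_share: float,
                  affected_fraction: float, perf_per_watt_uplift: float) -> float:
    """Projected spend after a perf/W uplift on part of the traffic.

    power_share: fraction of cost that scales with energy use (assumption).
    affected_fraction: fraction of traffic that sees the uplift (assumption).
    perf_per_watt_uplift: 0.30 means 30% more work per watt.
    """
    if not 0 <= power_share <= 1 or not 0 <= affected_fraction <= 1:
        raise ValueError("shares must be between 0 and 1")
    if perf_per_watt_uplift <= -1:
        raise ValueError("uplift must be greater than -1")
    # Only the power-related slice of cost shrinks; the rest is unchanged.
    affected_factor = (1 - power_share) + power_share / (1 + perf_per_watt_uplift)
    return baseline_usd * ((1 - affected_fraction) + affected_fraction * affected_factor)


baseline = 1_000_000.0  # hypothetical annual inference spend, USD
for uplift in (0.30, 0.40):
    cost = scenario_cost(baseline, power_share=0.25, affected_fraction=0.5,
                         perf_per_watt_uplift=uplift)
    print(f"uplift {uplift:.0%}: ${cost:,.0f} ({cost / baseline - 1:+.1%} vs baseline)")
```

Sweeping `power_share` and `affected_fraction` alongside the uplift shows which of the three assumptions your 2027 number is most sensitive to.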

Jalapeño is a real chip running real workloads in a real lab. It is also one announcement. The interesting platform questions are the ones that play out across the next four quarters, not the next four hours.

Sources

OpenAI and Broadcom unveil LLM-optimized inference chip — OpenAI (2026-06-24)
OpenAI and Broadcom Unveil LLM-Optimized Intelligence Processor — Broadcom IR / GlobeNewswire (2026-06-24)
OpenAI unveils first chip as part of Broadcom deal in effort to ‘build the full stack’ — CNBC (2026-06-24)
OpenAI unveils its first custom chip, built by Broadcom — TechCrunch (2026-06-24)