February 20, 2026

MiniMax M2.5 vs Claude Opus 4.6: The $0.15 Model That Changes Everything

AI · Cost · Coding

The numbers don't lie: an AI model that matches Anthropic's best coding model at 1/20th the cost just landed. Here's what that means for your budget—and why enterprise AI buyers should pay attention.

The Headline Numbers

SWE-Bench Verified Score: 80.2% (vs Claude Opus 4.6: 80.8%)

Cost Per Coding Task: $0.15 (vs Claude Opus 4.6: $3.00)

On February 11, 2026, Shanghai-based AI startup MiniMax released M2.5—an open-source model that achieves 80.2% on SWE-Bench Verified, the industry's most respected coding benchmark. That's within 0.6 percentage points of Anthropic's Claude Opus 4.6 (80.8%), at approximately one-twentieth the price.

Run continuously at 100 tokens per second, the model costs about $1 per hour; four AI agent instances can run year-round for roughly $10,000 total. Pricing is $0.30 per million input tokens and $1.20 per million output tokens—compared to Claude Opus 4.6's $60–$75 per million output tokens.
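To see where a figure like $1 per hour can come from, here is a back-of-the-envelope sketch using the published per-token prices. The input-token rate is an assumption (agentic workloads re-read large contexts on every step), not a number from the release.

```python
# Back-of-the-envelope hourly cost at a sustained generation speed.
# Prices are the M2.5 list prices quoted in the article; the input-token
# rate passed below is an illustrative assumption.

M25_INPUT_PER_M = 0.30    # USD per 1M input tokens
M25_OUTPUT_PER_M = 1.20   # USD per 1M output tokens

def hourly_cost(output_tps: float, input_tps: float = 0.0) -> float:
    """USD cost of one hour of sustained generation at the given token rates."""
    output_tokens = output_tps * 3600
    input_tokens = input_tps * 3600
    return (output_tokens * M25_OUTPUT_PER_M + input_tokens * M25_INPUT_PER_M) / 1_000_000

print(f"Output only:        ${hourly_cost(100):.2f}/hour")                 # ≈ $0.43
print(f"With assumed input: ${hourly_cost(100, input_tps=500):.2f}/hour")  # ≈ $0.97
```

At 100 output tokens per second the output bill alone is about $0.43 per hour; the rest of the "$1 per hour" headline is the input side, which depends entirely on how much context your agent re-reads.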

How This Works: The MoE Advantage

M2.5 uses a Mixture-of-Experts (MoE) architecture. While the model contains 230 billion total parameters, only 10 billion—roughly 4.3%—activate during any single inference pass. This sparse activation pattern delivers frontier-tier responses without frontier-tier compute costs.
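For readers unfamiliar with MoE, the sketch below shows the basic mechanism: a router scores the experts and only the top-k expert networks run for each token. The layer sizes, expert count, and k are illustrative assumptions, not MiniMax's actual configuration.

```python
# Minimal top-k Mixture-of-Experts layer (illustrative sketch, not MiniMax's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=32, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (num_tokens, d_model)
        scores = self.gate(x)                        # (num_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, -1)  # keep only k experts per token
        weights = F.softmax(top_vals, dim=-1)        # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e         # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(8, 512)).shape)  # torch.Size([8, 512]); only 2 of 32 expert MLPs ran per token
```

The total parameter count grows with the number of experts, but per-token compute only grows with k—which is why a 230B-parameter model can serve requests at roughly the cost of a 10B-parameter dense model.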

The model was trained using MiniMax's proprietary Forge reinforcement learning framework across more than 200,000 real-world environments spanning code repositories, web browsers, and office applications. A notable behavioral feature: M2.5 develops what MiniMax calls an "Architect Mindset"—the model decomposes and plans features before writing code, rather than diving directly into implementation.

Benchmark Breakdown

Benchmark                        MiniMax M2.5      Claude Opus 4.6
SWE-Bench Verified               80.2%             80.8%
Multi-SWE-Bench                  51.3% (#1)        50.3%
BFCL Tool Calling                76.8%             63.3%
Output Price (per 1M tokens)     $1.20–$2.40       $60–$75

What This Means for Enterprise AI

The release confirms a structural shift in AI economics. According to Epoch AI research, GPT-4-level capability now runs at roughly 1/100th of what it cost two years ago. Additional analysis found that the cost of achieving comparable benchmark scores on challenging AI tasks dropped from $4,500 per task to $11.64 over the course of 2025 alone.

But M2.5 isn't for everyone. It lags significantly on general reasoning (AIME 2025: 45%) and complex terminal operations (Terminal-Bench 2: 52% vs Opus 4.6's 65.4%). Think of it as a specialist that matches the generalist on its specialties—coding and agentic workflows—not an across-the-board replacement.

💡 Practical Takeaway

Build a hybrid routing architecture now: use cost-efficient models like M2.5 for high-volume coding and agentic tasks, and reserve frontier closed models for complex reasoning, knowledge work, and tasks requiring maximum reliability.
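As a rough illustration of what such a router might look like, here is a minimal policy sketch. The model identifiers and keyword heuristic are placeholders, not real endpoint names or a production-grade classifier.

```python
# Minimal hybrid-routing policy sketch; model names and the keyword
# heuristic are illustrative placeholders, not real API identifiers.
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reason: str

CODING_HINTS = ("refactor", "stack trace", "unit test", "pull request", "def ", "import ")

def route(task: str) -> Route:
    """Send high-volume coding/agentic work to the cheap specialist,
    and general reasoning or high-stakes work to the frontier model."""
    text = task.lower()
    if any(hint in text for hint in CODING_HINTS):
        return Route(model="minimax-m2.5", reason="coding / agentic workload")
    return Route(model="claude-opus-4.6", reason="general reasoning or maximum reliability")

print(route("Refactor this module and add unit tests"))
print(route("Summarize the regulatory risk in this contract"))
```

In practice the classification step would be a small model or an explicit task type set by the calling application, and the routing table would live in configuration so that price or benchmark changes don't require code changes.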

Should Your Team Adopt M2.5?

✓ Yes, If...

  • Coding tasks dominate your AI workload
  • You need to run AI agents at scale
  • Data sovereignty requires self-hosting
  • Budget constraints are real

✗ Wait, If...

  • General reasoning is your priority
  • Complex terminal workflows are needed
  • You need premium support/SLAs
  • Branding requirements are strict

The Bottom Line

MiniMax M2.5 represents a pivotal moment in enterprise AI economics. When a model achieving 80.2% on SWE-Bench is available at $1.20 per million output tokens, the justification for paying $60–$75 per million for comparable coding performance requires clear articulation of what the premium buys—data privacy guarantees, SLA commitments, regulatory compliance support, and superior general reasoning.

The era of AI "too expensive to deploy at scale" is ending. Whether that's good news for your organization depends on how quickly you can adapt your architecture to take advantage of it.



Want to see how M2.5 performs against other models? Check our AI agent rankings and methodology for detailed benchmark breakdowns.