Qwen3.5-35B-A3B: First Local LLM That Passes Real Coding Tests

Published March 4, 2026

Local AI • Coding • Open Source

Alibaba's Qwen3.5-35B-A3B just scored 37.8% on SWE-bench Verified Hard — nearly matching Claude Opus 4.6 at 40%. The kicker: it's a 3-billion-active-parameter model you can run on your own GPU.

What This Means

For years, the narrative has been: if you want serious coding capability, you need GPT-4 or Claude Opus. Those require API calls, subscription fees, and sending your code to third-party servers. Qwen3.5 changes that equation.

The model uses a novel "verify-on-edit" agent strategy that breaks down coding tasks into smaller, verifiable steps. Instead of generating one large block of code and hoping it works, the model:

  1. Generates a small edit
  2. Verifies it works
  3. Builds on success
  4. Rolls back failures immediately

The Numbers

  • 37.8% on SWE-bench Verified Hard (full model)
  • 3B active parameters (far smaller than frontier models)
  • ~35B total parameters with Mixture of Experts architecture
  • Runs on consumer hardware — 24GB VRAM recommended

Why It Matters for Local AI

This isn't just another benchmark win. It's validation that small models + good prompting strategies can compete with frontier models on real coding tasks. For developers who care about:

  • Privacy — your code stays on your machine
  • Cost — no per-token API fees
  • Control — customize system prompts and behavior
  • Speed — local inference with dedicated GPU

Qwen3.5-35B-A3B delivers the closest experience to Claude/GPT-4 for local development workflows.

How to Run It

You'll need:

  • GPU with 24GB+ VRAM (RTX 3090, RTX 4090, or equivalent)
  • LM Studio, Ollama, or text-generation-webui
  • Q4_K_M or Q5_K_S quantization for best quality/performance balance
# Example with LM Studio
# Search for "Qwen3.5-35B-A3B" in the model browser
# Recommended: Q4_K_M or better quantization
# Set context length to 8192+ for complex files

The Bottom Line

Qwen3.5-35B-A3B represents a inflection point for local AI coding assistants. You can now build a coding companion that:

  • Handles real SWE-bench level tasks
  • Runs entirely offline
  • Costs nothing after hardware purchase
  • Keeps your proprietary code private

The gap between local and API-based models is closing fast. If you've been waiting for a local model that can actually help with production code — this is it.


Sources