DeepSeek R1 vs V3: The Complete Local Running Guide
Here's the deal: DeepSeek dropped two monsters within weeks of each other (V3 in December 2024, R1 in January 2025), and both are open-weight and free to run locally. But R1 and V3 are built for completely different jobs. Picking the wrong one will waste your GPU cycles.
This guide cuts through the noise with real benchmarks, actual hardware requirements, and zero marketing fluff.
What You're Actually Comparing
DeepSeek R1 is a reasoning model. It thinks out loud: working through problems step by step, checking its own logic, backtracking when it hits dead ends. Its precursor, R1-Zero, was trained with pure reinforcement learning and no supervised reasoning traces; R1 itself adds a small cold-start fine-tuning stage before the RL phase. The resulting chain-of-thought capabilities match or narrowly beat OpenAI's o1 on math benchmarks.
DeepSeek V3 is the general-purpose flagship. Think of it as the smarter, cheaper alternative to GPT-4o and Claude 3.5 Sonnet. It doesn't "think" — it just answers. Fast.
DeepSeek themselves concede in the R1 paper that R1 falls short of V3 on tasks such as function calling, multi-turn conversation, complex role-play, and JSON output. But R1 destroys V3 on reasoning. Here's the data.
Benchmark Showdown
Reasoning & Math
| Benchmark | DeepSeek R1 | DeepSeek V3 | Notes |
|---|---|---|---|
| AIME 2024 | 79.8% | 39.2% | R1 = o1 tier |
| MATH-500 | 97.3% | 90.2% | R1 beats o1 here |
| Codeforces Elo | 2,029 | 1,134 | R1 at Candidate Master level |
| MMLU | 90.8% | 88.5% | R1 edges ahead on general knowledge |
| GPQA Diamond | 71.5% | 59.1% | R1 clearly ahead on graduate-level science |
Coding
R1 excels at algorithmic coding and competitive programming. V3 (and especially V3.1) handles real-world software engineering better. For coding tasks, check our updated local coding model rankings.
The Hardware Reality Check
This is where most people get tripped up.
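DeepSeek R1 (full 671B):
- MoE architecture: 671B total params, 37B active per token
- At Q4 quantization: ~336GB VRAM+RAM at exactly 4 bits per weight; real Q4 files land closer to ~400GB
- Practical? Only on a Mac Studio with M3 Ultra (512GB) or multi-GPU setups
DeepSeek V3 (full 671B):
- Same MoE architecture and parameter count, so the same memory footprint as R1
- At Q4 quantization: ~336-400GB, depending on the quantization format
- At FP16: ~1,340GB (weights only)
- Not practical for any consumer hardware.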
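The footprint figures above come down to simple arithmetic: parameter count times bits per weight. Here is a minimal sketch, assuming weights-only memory (no KV cache or runtime overhead). The helper name `weight_memory_gb` is made up for illustration, and the 4.8 bits-per-weight value is an assumption standing in for mixed-precision Q4 formats, not a measured number.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


TOTAL_PARAMS_B = 671  # total parameters (billions), from the spec lists above

q4_ideal = weight_memory_gb(TOTAL_PARAMS_B, 4)    # idealized: exactly 4 bits per weight
q4_mixed = weight_memory_gb(TOTAL_PARAMS_B, 4.8)  # ASSUMPTION: average for a mixed-precision Q4 file
fp16 = weight_memory_gb(TOTAL_PARAMS_B, 16)       # unquantized half precision

print(f"Q4 at 4.0 bits/weight: {q4_ideal:.1f} GB")
print(f"Q4 at 4.8 bits/weight: {q4_mixed:.1f} GB")
print(f"FP16:                  {fp16:.1f} GB")
```

Only 37B parameters are active per token, but every expert still has to sit in memory, which is why the total parameter count drives the footprint.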
Here's the key: run the distilled versions. DeepSeek distilled R1's reasoning into manageable sizes:
| Model | VRAM (Q4) | AIME 2024 | MATH-500 | Ollama |
|---|---|---|---|---|
| R1-Distill-7B (Qwen) | ~6 GB | 55.5% | 92.8% | ollama pull deepseek-r1:7b |
| R1-Distill-14B (Qwen) | ~11 GB | 69.7% | 93.9% | ollama pull deepseek-r1:14b |
| R1-Distill-32B (Qwen) | ~22 GB | 72.6% | 94.3% | ollama pull deepseek-r1:32b |
| R1-Distill-70B (Llama) | ~43 GB | 70.0% | 94.5% | ollama pull deepseek-r1:70b |
GPU recommendation (speeds are rough and vary by card):
- 8 GB VRAM → 7B model (~50 tok/s)
- 12 GB VRAM → 14B model (sweet spot, ~30 tok/s)
- 24 GB VRAM → 32B model (~20 tok/s)
- 48 GB+ VRAM → 70B model (~10 tok/s)
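The recommendation list above is effectively a lookup table, which can be sketched as a small helper. The VRAM thresholds and Ollama tags come from this guide; the function name `pick_r1_distill` is hypothetical, and treating each tier as a hard minimum is an assumption (partial CPU offload can stretch these limits at the cost of speed).

```python
from typing import Optional

# (minimum VRAM in GB, Ollama tag), largest tier first; values from the list above
TIERS = [
    (48, "deepseek-r1:70b"),
    (24, "deepseek-r1:32b"),
    (12, "deepseek-r1:14b"),
    (8, "deepseek-r1:7b"),
]


def pick_r1_distill(vram_gb: float) -> Optional[str]:
    """Return the largest distilled R1 tag whose tier fits in the given VRAM."""
    for min_vram, tag in TIERS:
        if vram_gb >= min_vram:
            return tag
    return None  # below 8 GB: no tier in this guide fits fully on the GPU


for vram in (6, 8, 16, 24, 80):
    print(vram, "GB ->", pick_r1_distill(vram))
```

A 16 GB card falls between tiers, so the helper returns the 14B tag rather than trying to squeeze in the 32B model.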
When to Use Each
Use R1 when:
- Math homework or competition problems
- Algorithm design
- Multi-step logic puzzles
- You need to see the model's reasoning (transparency)
- Budget reasoning API calls ($0.55 input / $2.19 output per 1M tokens)
Use V3 when:
- General chat and Q&A
- Writing assistance
- Code generation (real-world, not algorithms)
- Information retrieval
- Cheapest API rates ($0.28 input / $0.42 output per 1M tokens)
Don't use R1 for:
- Simple questions — it overthinks. "What's the capital of France?" gets 200+ thinking tokens before the answer.
- Creative writing — it's functional, not engaging.
- Speed-critical applications — reasoning tokens add latency.
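To put numbers on the overthinking problem, the sketch below compares per-request API cost using the rates quoted in the lists above. It assumes thinking tokens are billed at the output rate, and the token counts in the example (20 in, 10 out, 200 thinking) are illustrative rather than measured. The names `PRICES` and `request_cost_usd` are made up for this example.

```python
# USD per 1M tokens as (input, output), taken from the rates quoted above
PRICES = {
    "r1": (0.55, 2.19),
    "v3": (0.28, 0.42),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int,
                     thinking_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    ASSUMPTION: thinking tokens are billed at the same rate as output tokens.
    """
    in_price, out_price = PRICES[model]
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * in_price + billed_output * out_price) / 1_000_000


# Illustrative "capital of France" style query: short prompt, short answer
r1_cost = request_cost_usd("r1", input_tokens=20, output_tokens=10, thinking_tokens=200)
v3_cost = request_cost_usd("v3", input_tokens=20, output_tokens=10)

print(f"R1: ${r1_cost:.7f} per request")
print(f"V3: ${v3_cost:.7f} per request")
print(f"R1 costs about {r1_cost / v3_cost:.0f}x more for this query")
```

The absolute amounts are tiny, but the ratio is what matters at volume: for trivial questions, most of what you pay R1 for is reasoning you did not need.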
Running Locally: Quick Setup
Ollama (Recommended)
# Pull the 14B distilled version (~11 GB VRAM at Q4)
ollama pull deepseek-r1:14b
# Important: increase the context length (the default of 4096 is too small for long reasoning traces)
# Write a Modelfile, then build a new model tag from it:
cat > Modelfile <<'EOF'
FROM deepseek-r1:14b
PARAMETER num_ctx 16384
EOF
ollama create deepseek-r1-14b-16k -f Modelfile
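Once the model is created, you can query it from a script through Ollama's local HTTP API. The sketch below is one way to do it with only the standard library, assuming Ollama is listening on its default port and that the model wraps its reasoning in `<think>...</think>` tags inside the response text (some Ollama versions may return reasoning in a separate field instead). The helper names `build_payload`, `split_reasoning`, and `ask` are hypothetical.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "deepseek-r1-14b-16k",
                  num_ctx: int = 16384) -> dict:
    """Build a non-streaming generate request; num_ctx mirrors the Modelfile setting."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def split_reasoning(text: str) -> tuple:
    """Split raw output into (reasoning, answer).

    ASSUMPTION: reasoning arrives inline inside a single <think>...</think> block.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


def ask(prompt: str) -> tuple:
    """Send one prompt to the local Ollama server and return (reasoning, answer)."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return split_reasoning(body["response"])
```

With the server running, `reasoning, answer = ask("What is 17 * 24?")` returns the two parts separately, so you can log the reasoning trace for debugging and show only the answer to users.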
LM Studio
Download from lmstudio.ai, search for "DeepSeek R1", and adjust the GPU layers slider. For a 14B model on 12GB VRAM, start with ~28 layers.
The Bottom Line
DeepSeek R1 and V3 aren't competitors; they're complements. R1 is your reasoning engine. V3 is your general assistant. Run R1 locally in its distilled forms and reach for V3 through the API:
- 7B/14B R1 → Consumer GPU, strong math reasoning
- API V3 → When you need GPT-4 class intelligence without the GPT-4 price
The reported $5.6 million compute bill for V3's final training run (a figure that excludes earlier research and experiments, set against $100M+ for proprietary alternatives) is why open-source AI is eating the world. See how they stack up in our full rankings.