If you’ve been sleeping on AMD GPUs for local AI inference, now’s the time to wake up.

AMD’s RX 9000 series launched in early 2025 with the RDNA4 architecture, and the ROCm software stack, historically the Achilles’ heel of AMD GPU computing, has finally matured enough that you can run local LLMs on it without wrestling with WSL workarounds or praying to the driver gods.

What’s Changed

The headline: ROCm 6.4.4 and ROCm 7.x (the release line highlighted at CES 2026) support RDNA3 and RDNA4 cards natively on Windows. No more WSL-only nightmares. If you have an RX 7000 or RX 9000 series card, you can run local inference with PyTorch’s native AMD support, no workarounds required.

AMD’s own GPUOpen guide walks through setting up PyTorch with AMD acceleration on Windows, targeting the RX 7000 and 9000 series plus the Ryzen AI 300 and AI Max APUs.
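Once PyTorch is installed, a quick sanity check is to ask it what it can see. The snippet below is a minimal sketch, not taken from AMD's guide: it relies on the fact that ROCm builds of PyTorch expose the GPU through the usual torch.cuda API and set torch.version.hip, while other builds leave that attribute as None. The function name describe_gpu_backend is made up for illustration.

```python
def describe_gpu_backend():
    """Report whether PyTorch sees a GPU and whether this is a ROCm (HIP) build.

    Hypothetical helper for illustration; returns a plain dict so it also
    works on machines where PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return {
            "torch_installed": False,
            "gpu_available": False,
            "hip_version": None,
            "device_name": None,
        }

    # ROCm builds reuse the torch.cuda namespace for AMD GPUs.
    gpu_available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "gpu_available": gpu_available,
        # A version string on ROCm builds, None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        "device_name": torch.cuda.get_device_name(0) if gpu_available else None,
    }


if __name__ == "__main__":
    print(describe_gpu_backend())
```

If gpu_available comes back True and hip_version is not None, PyTorch is talking to the card through ROCm. What device_name reports depends on the driver, so treat it as informational only.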

This is a meaningful shift from 12 to 18 months ago, when AMD GPU owners were essentially locked out of the local LLM game.

The RX 9070 XT Benchmarks

Ivan Angelov posted detailed benchmarks running local LLMs on a 16GB RX 9070 XT — RDNA4’s flagship consumer card. He tested Vulkan vs ROCm across different model sizes and quantization levels.

Key findings from his testing:

16GB of VRAM handles 7B models at Q4 comfortably and 13B models at Q3 usably; 70B models require aggressive offloading
ROCm 7.x outperforms Vulkan for compute-heavy attention mechanisms
vLLM support for RDNA4 (gfx1201) requires patching in v0.20.0; context length limits are a known issue being addressed

The CraftRigs article confirms the broader picture: ROCm 7.x at CES 2026 marked the inflection point where AMD went from “theoretically possible” to “actually works for normal humans.”

What You Can Actually Run

Model	Quantization	RX 9070 XT (16GB)
Llama 3 8B	Q4_K_M	✅ Smooth
Mistral 7B	Q5_K_M	✅ Good
Qwen 2.5 14B	Q4_K_M	⚠️ Tight but usable
Llama 3 70B	Q4_K_M	❌ Needs more VRAM

A 16GB card will serve most people fine. If you’re targeting 70B+, you’re looking at multi-GPU setups or waiting for the 24GB variants.
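The arithmetic behind the table is easy to sketch: weight size is roughly parameter count times bits per weight, divided by 8. The example below is a back-of-the-envelope estimate, not a measurement. The bits-per-weight values are rough averages for llama.cpp-style quantizations, the 2 GB reserve for KV cache and runtime overhead is an assumption, and the parameter counts are the nominal sizes from the model names.

```python
# Rough average bits per weight for common GGUF quantizations.
# Assumption: approximate values, not exact per-model numbers.
APPROX_BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
}


def weight_size_gb(params_billion, quant):
    """Estimate the size of the quantized weights in (decimal) gigabytes."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # 1e9 parameters * bits / 8 bits-per-byte = gigabytes
    return params_billion * bits / 8


def fits_in_vram(params_billion, quant, vram_gb=16.0, reserve_gb=2.0):
    """Check whether weights plus an assumed reserve fit in VRAM.

    reserve_gb is a placeholder for KV cache and runtime overhead; the real
    figure grows with context length.
    """
    return weight_size_gb(params_billion, quant) + reserve_gb <= vram_gb


if __name__ == "__main__":
    # Nominal sizes taken from the model names in the table above.
    models = [
        ("Llama 3 8B", 8, "Q4_K_M"),
        ("Mistral 7B", 7, "Q5_K_M"),
        ("Qwen 2.5 14B", 14, "Q4_K_M"),
        ("Llama 3 70B", 70, "Q4_K_M"),
    ]
    for name, params, quant in models:
        size = weight_size_gb(params, quant)
        verdict = "fits" if fits_in_vram(params, quant) else "does not fit"
        print(f"{name} {quant}: ~{size:.1f} GB of weights, {verdict} in 16 GB")
```

By this estimate a 70B model at Q4_K_M needs more than 40 GB for weights alone, which is why it lands in offloading or multi-GPU territory. The 14B case fits, but the remaining headroom shrinks quickly as context length grows.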

The Catch

AMD still trails NVIDIA on the software ecosystem side. The tooling is rougher, some frameworks don’t have first-class AMD support yet, and you’ll occasionally hit edge cases that require patching (like the vLLM context length issue on RDNA4).

But the trajectory is clear. ROCm 6.x was “barely works.” ROCm 7.x is “works reliably.” The gap between AMD and NVIDIA for local LLM inference has shrunk dramatically in the past 12 months.

Is It Worth Building Around?

If you’re speccing a local AI rig in 2026, an RX 9070 XT at ~$550 is competitive with an RTX 4070 Ti Super on price-to-VRAM ratio — and AMD’s ray tracing performance has caught up enough that you’re not sacrificing gaming to run AI workloads.

The AMD advantage: more VRAM per dollar than NVIDIA’s consumer lineup, especially at the $500-$700 price point.
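To run that comparison with current street prices, the metric is just price divided by VRAM. Here is a tiny sketch using the ~$550 figure quoted above for the RX 9070 XT; the helper name dollars_per_gb is made up, and the price for any other card is something you would look up yourself.

```python
def dollars_per_gb(price_usd, vram_gb):
    """Return the price per gigabyte of VRAM (lower is better)."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    return price_usd / vram_gb


if __name__ == "__main__":
    # Price and VRAM figures for the RX 9070 XT as quoted in this article.
    rx_9070_xt = dollars_per_gb(550, 16)
    print(f"RX 9070 XT: ${rx_9070_xt:.2f} per GB of VRAM")
```

Call the same function with the price and VRAM of whichever NVIDIA card you are weighing it against. Prices move often enough that hardcoding a competitor figure here would go stale quickly.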

The AMD disadvantage: you’ll move slower on the bleeding edge of framework updates. NVIDIA’s CUDA ecosystem is still years ahead in tooling maturity.

Bottom line: If you’re already on AMD, ROCm is finally a viable path. If you’re building from scratch and want maximum flexibility, NVIDIA is still the safe bet. But the AMD option is real now in a way it wasn’t 18 months ago.

Sources

AMD ROCm Compatibility Matrix — Official AMD ROCm support timeline
AMD GPUOpen: PyTorch Windows LLM Guide — Official AMD instructions for running local LLMs on RX 7000/9000 series
CraftRigs: AMD ROCm in 2026 — Full ROCm 6.4.4 / 7.x overview for RDNA3 and RDNA4
Ivan Angelov: RX 9070 XT Local LLM Benchmarks — Vulkan vs ROCm on RDNA4 with actual performance numbers
dasroot: Patching vLLM for AMD RDNA4 Context Limits — Technical details on the vLLM RDNA4 patching process