Qwen’s official models come with safety training baked in — refusals for certain prompts, content filters, behavioral guardrails. The community variant? Rip all that out and see what happens.
That’s how you end up with models like Qwen 3.6 35B A3B Uncensored Heretic: a variant that strips the safety training but keeps the underlying capability. The result is a model that rarely refuses prompts and reportedly scores better than the base model on usefulness-oriented benchmarks.
When a model is described as “uncensored,” the safety/RLHF training has been either removed or overwritten. The model retains most of its base capability on standard tasks (coding, reasoning, analysis) but stops refusing prompts it would otherwise refuse.
This matters for two use cases:
The tradeoff: uncensored models can be more “helpful” on edge cases but they also lose some of the alignment benefits. You’re trading safety for capability on a specific axis.
Reddit posts about these uncensored variants keep citing a metric called KLD (KL divergence). The “Heretic” variant reports a KLD of 0.0015 with 10/100 refusals, versus 100/100 for the base model.
KLD measures how much the uncensored variant diverges from the base model’s probability distributions. Lower KLD = closer to the original model’s behavior, just without the refusal mechanism. A KLD of 0.0015 is described as “incredible” in the LocalLLaMA thread — meaning the uncensoring barely changed the model’s core reasoning patterns.
For context: as a rough rule of thumb, a KLD above 0.01 starts to show meaningfully different output distributions. At 0.0015, you’re getting the uncensored behavior with minimal behavioral drift.
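To make the metric concrete, here is a minimal sketch of how a per-token KL divergence between two models could be computed. The logits are toy values, and the direction (base relative to variant) and the simple averaging over positions are assumptions for illustration, not necessarily how the posted numbers were produced.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(base_logits, variant_logits):
    """Average per-position KL(base || variant) over a sequence of logit vectors."""
    per_token = [
        kl_divergence(softmax(b), softmax(v))
        for b, v in zip(base_logits, variant_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy example: two positions, three-token vocabulary (made-up numbers).
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
variant = [[2.0, 1.05, 0.1], [0.5, 0.45, 3.0]]

identical = mean_kld(base, base)   # same model compared with itself
drift = mean_kld(base, variant)    # slightly perturbed logits

print(f"identical: {identical:.6f}, perturbed: {drift:.6f}")
```

Comparing a model with itself gives zero, and small logit perturbations give a small positive value. A real measurement would run both models over the same set of prompts and compare full-vocabulary distributions.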
Based on what’s active on HuggingFace and r/LocalLLaMA right now:
Qwen3.6-35B-A3B-Uncensored-Heretic-Native-MTP-Preserved: the one getting the most buzz. “Native MTP Preserved” means all 19 MTP heads are reported as retained. MTP stands for Multi-Token Prediction, an addition to the transformer architecture that Qwen uses to speed up inference. Claims include:
Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP — another variant, similarly uncensored with MTP architecture preserved.
DavidAU’s Qwen 3.5 Collection: a HuggingFace collection of both regular and uncensored Qwen 3.5 variants from 0.8B through 35B, with a minimum of 256K context and image support. Both “Reg” and “Heretic” fine-tunes are available.
The 35B parameter class requires serious VRAM for meaningful context:
| Quantization | Approx. VRAM | Context (tokens) |
|---|---|---|
| Q4_K_M | ~20GB | 128K |
| IQ4_XS + Q8 KVcache | ~24GB | 262K |
| FP16 (full) | ~70GB | 128K |
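For comparison: a 24GB RTX 3090 or RTX 4090 can run the IQ4_XS variant at 262K context, though with essentially no headroom. A 16GB RTX 4080 Super falls below even the ~20GB needed for Q4_K_M at 128K, so it would need a smaller quant, a shorter context, or partial offload to system RAM.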
The people running these successfully are using 24-48GB VRAM configurations — either single high-end consumer cards or professional cards (A6000 48GB, etc.).
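As a back-of-the-envelope check on numbers like these, weight memory is roughly total parameters times bits per weight. The sketch below assumes 35B total parameters (taken from the model name) and approximate bits-per-weight figures for each quant; those figures are rough assumptions, and the estimate ignores KV cache and runtime overhead, which grow with context length.

```python
# Approximate average bits per weight; rough assumptions, real GGUF files vary.
APPROX_BITS_PER_WEIGHT = {
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "FP16": 16.0,
}

def weight_memory_gb(n_params, bits_per_weight):
    """Memory for the weights alone, in decimal GB (no KV cache, no overhead)."""
    return n_params * bits_per_weight / 8 / 1e9

def fits_in_vram(n_params, quant, vram_gb, reserve_gb=0.0):
    """True if the weights plus a caller-chosen reserve fit in the given VRAM."""
    needed = weight_memory_gb(n_params, APPROX_BITS_PER_WEIGHT[quant]) + reserve_gb
    return needed <= vram_gb

N_PARAMS = 35e9  # total parameters; all experts must be resident, not just the ~3B active

for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
    print(f"{quant}: ~{weight_memory_gb(N_PARAMS, bpw):.1f} GB for weights")

print("Q4_K_M on 16 GB:", fits_in_vram(N_PARAMS, "Q4_K_M", 16))
print("IQ4_XS on 24 GB:", fits_in_vram(N_PARAMS, "IQ4_XS", 24))
```

Note that for a mixture-of-experts model the "A3B" active-parameter count affects speed, not how much memory the weights occupy.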
The honest answer from the Reddit threads:
The tradeoff is purely on alignment. If you were using Qwen 3.6 35B for coding assistance and it was refusing on some legitimate technical questions about security or adult topics, the uncensored variant solves that. If you needed the alignment for safety reasons, you probably shouldn’t be running uncensored variants.
The main sources:
llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GGUF (and similar)
collections/DavidAU/qwen-35-08-2-4-9-27-35b-regular-uncensored
These aren’t on the model API services. You need to run them locally or on a dedicated GPU instance.
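For local use, one common route for GGUF files is llama.cpp's `llama-server`. The sketch below only assembles a command line matching the "IQ4_XS + Q8 KV cache" configuration. The model path is a hypothetical placeholder, the helper name is made up for illustration, and the flags should be checked against `llama-server --help` for your build (some builds need flash attention enabled for a quantized V cache).

```python
import shlex

def build_llama_server_cmd(model_path, ctx_size=262144, gpu_layers=99, kv_cache_type="q8_0"):
    """Assemble a llama-server argument list (illustrative helper, not part of llama.cpp)."""
    return [
        "llama-server",
        "-m", model_path,                  # path to the downloaded GGUF file
        "-c", str(ctx_size),               # context window in tokens
        "-ngl", str(gpu_layers),           # layers to offload to the GPU
        "--cache-type-k", kv_cache_type,   # quantize the K cache
        "--cache-type-v", kv_cache_type,   # quantize the V cache
    ]

# Hypothetical local filename; use whichever quant file you actually downloaded.
cmd = build_llama_server_cmd("models/heretic-IQ4_XS.gguf")
print(shlex.join(cmd))
```

On a card with less VRAM, lowering `gpu_layers` or `ctx_size` trades speed or context length for memory.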
If you’re running Qwen 3.6 35B locally and hitting refusal walls on legitimate technical questions, the uncensored variants are worth trying. The KLD scores on the best ones show minimal behavioral drift from the base model — you’re mostly just removing the refusal mechanism.
For users who don’t need uncensored access, the official Qwen 3.6 models remain the better choice. The uncensored space is specifically for developers and researchers who need that door open.