Qwen’s official models come with safety training baked in — refusals for certain prompts, content filters, behavioral guardrails. The community variant? Rip all that out and see what happens.

That’s how you end up with models like Qwen 3.6 35B A3B Uncensored Heretic: a modified release that suppresses the refusal behavior but keeps the underlying capability. The result is a model that rarely refuses prompts and allegedly scores better than the base on usefulness benchmarks.

What “Uncensored” Actually Means

When a model is released as “uncensored,” its refusal behavior has been suppressed, either by further fine-tuning that overwrites the safety/RLHF training or by directly editing the weights (often called abliteration). The model retains most of its base capability on standard tasks (coding, reasoning, analysis) but stops refusing prompts it would otherwise refuse.

This matters for two use cases:

Research and development — when you’re testing boundary cases or studying refusal behavior, you need a model without guardrails
Personal use cases — some users genuinely want a model that will engage with anything, even if they personally never ask it to do anything harmful

The tradeoff: uncensored models can be more “helpful” on edge cases but they also lose some of the alignment benefits. You’re trading safety for capability on a specific axis.

The KLD Metric (What 0.0015 Actually Means)

The Reddit posts quote a metric called KLD (Kullback-Leibler divergence) for these uncensored variants. The “Heretic” variant reports a KLD of 0.0015 and refuses 10 of 100 test prompts, versus 100 of 100 for the base model.

KLD measures how far the uncensored variant’s next-token probability distributions diverge from the base model’s on the same prompts. Lower KLD means behavior closer to the original model, just without the refusal mechanism. A KLD of 0.0015 is described as “incredible” in the LocalLLaMA thread, meaning the uncensoring barely changed the model’s core reasoning patterns.

For context, as a rough rule of thumb rather than a hard threshold: a KLD above about 0.01 would start to show meaningfully different output distributions. At 0.0015, you’re getting the uncensored behavior with minimal behavioral drift.
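To make the metric concrete, here is a minimal sketch of a KL divergence calculation between two next-token distributions. The logits and the four-token vocabulary are made up for illustration, and the way the Reddit posts aggregate the figure (which prompts, which token positions) is not specified in the threads, so treat this as the underlying formula only, not the exact evaluation procedure.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a toy 4-token vocabulary.
base_logits = [2.0, 1.0, 0.5, -1.0]
modified_logits = [2.05, 0.95, 0.5, -1.0]  # a small nudge to two tokens

base_probs = softmax(base_logits)
modified_probs = softmax(modified_logits)

# Base model is the reference distribution P, the modified model is Q.
kld = kl_divergence(base_probs, modified_probs)
print(f"KLD: {kld:.6f}")
```

A reported score would typically be an average of this quantity over many prompts, so a small value says the two models assign nearly the same probabilities to the next token almost everywhere.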

The “Heretic” and “Genesis” Variants

Based on what’s active on HuggingFace and r/LocalLLaMA right now:

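Qwen3.6-35B-A3B-Uncensored-Heretic-Native-MTP-Preserved: the one getting the most buzz. “Native MTP Preserved” means the model’s multi-token prediction (MTP) components are retained, all 19 of them according to the release. MTP adds extra prediction heads that let the model draft several tokens per step, which is mainly used to speed up inference through speculative decoding. Claims include: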

KLD 0.0015 (very close to base model behavior)
10/100 refusals (vs. base model’s 100/100)
Fits in 24GB VRAM with IQ4_XS quantization + Q8 KV cache at 262K context
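The refusal figure in the claims above is a simple count over a fixed set of test prompts. One common way to produce such a count is to scan each response for refusal phrases, sketched below. The marker list and sample responses are hypothetical, and the threads do not say which prompts or detection method were actually used.

```python
# Hypothetical marker phrases; real evaluations use their own lists.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker."""
    # Normalize case and curly apostrophes before matching.
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)

def count_refusals(responses) -> int:
    """Count how many responses look like refusals."""
    return sum(is_refusal(r) for r in responses)

# Made-up responses standing in for model output on four test prompts.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure. Here is how the exploit works in a lab setting.",
    "I cannot assist with this request.",
    "Here is the analysis you asked for.",
]

refused = count_refusals(sample)
print(f"{refused}/{len(sample)} refusals")
```

Keyword matching is crude: it misses polite deflections and can flag answers that merely quote a refusal, so a number like 10/100 is best read as an approximate rate.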

Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP — another variant, similarly uncensored with MTP architecture preserved.

DavidAU’s Qwen 3.5 Collection: a HuggingFace collection of both regular and uncensored Qwen 3.5 variants from 0.8B through 35B, with a minimum of 256K context and image support. Both “Reg” and “Heretic” fine-tunes are available.

Hardware Requirements

The 35B parameter class requires serious VRAM for meaningful context:

Quantization	Approx. VRAM	Context length
Q4_K_M	~20GB	128K
IQ4_XS + Q8 KV cache	~24GB	262K
FP16 (full)	~70GB (weights alone)	128K

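For comparison: a 24GB RTX 3090 or RTX 4090 can run the IQ4_XS variant at 262K context. A 16GB RTX 4080 Super cannot hold even the ~20GB Q4_K_M weights entirely in VRAM, so it would need a smaller quantization or partial offload of the model to system RAM, at some cost in speed.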

The people running these successfully are using 24-48GB VRAM configurations — either single high-end consumer cards or professional cards (A6000 48GB, etc.).
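The figures in the table can be sanity-checked with back-of-envelope arithmetic: weight size is roughly parameter count times bits per weight, divided by eight. The sketch below assumes a nominal 35 billion parameters and approximate average bits-per-weight values for each format; both are assumptions, and real GGUF files differ somewhat because different tensors use different precisions. It also ignores the KV cache and runtime overhead, which grow with context length.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 35e9  # nominal parameter count, taken from the "35B" label

# Approximate average bits per weight. These are assumptions for
# illustration, not exact values for any specific file.
APPROX_BPW = {
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "FP16": 16.0,
}

estimates = {
    name: round(weight_size_gb(N_PARAMS, bpw), 1)
    for name, bpw in APPROX_BPW.items()
}

for name, size in estimates.items():
    print(f"{name}: ~{size} GB of weights")
```

Under these assumptions the weights alone exceed 16GB at either 4-bit format, and the gap between the IQ4_XS estimate and the ~24GB in the table is the room taken by the Q8 KV cache and overhead at 262K context.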

What You Actually Get

The honest answer from the Reddit threads:

Coding: reported as still very capable, on par with base
General reasoning: reported as preserved
Refusals: mostly gone (10/100 on the Heretic variant)
Multi-turn tool calling: reported to work reliably
Speed: same as base at equivalent quantization

The tradeoff is purely on alignment. If you were using Qwen 3.6 35B for coding assistance and it was refusing some legitimate technical questions about security or adult topics, the uncensored variant solves that. If you needed the alignment for safety reasons, you probably shouldn’t be running uncensored variants.

Where to Find Them

The main sources:

HuggingFace: llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GGUF (and similar)
DavidAU collection: collections/DavidAU/qwen-35-08-2-4-9-27-35b-regular-uncensored
Unsloth Studio: has Dynamic 4-bit MXFP4_MOE GGUF variants for all Qwen 3.5 sizes

These aren’t offered on hosted model API services. You need to run them locally or on a dedicated GPU instance.
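For running locally, GGUF files are usually served with llama.cpp. The sketch below only assembles a `llama-server` command matching the IQ4_XS + Q8 KV cache at 262K context setup mentioned earlier; it does not launch anything. It assumes llama.cpp is the runtime, the model path is a hypothetical placeholder, and flag behavior can vary between llama.cpp versions (a quantized V cache may require flash attention to be enabled), so check `llama-server --help` on your build.

```python
import shlex

def build_llama_server_args(model_path, ctx_size=262144,
                            kv_cache_type="q8_0", gpu_layers=99):
    """Assemble a llama-server argument list without running it."""
    return [
        "llama-server",
        "--model", model_path,
        "--ctx-size", str(ctx_size),         # 262K-token context window
        "--n-gpu-layers", str(gpu_layers),   # offload as many layers as fit
        "--cache-type-k", kv_cache_type,     # 8-bit quantized K cache
        "--cache-type-v", kv_cache_type,     # 8-bit quantized V cache
    ]

# Hypothetical local path; substitute the GGUF file you downloaded.
args = build_llama_server_args("models/heretic-IQ4_XS.gguf")

# Render as a shell-safe string for copy and paste.
command = shlex.join(args)
print(command)
```

Building the command as a list keeps each flag and value separate, which makes it easy to lower `ctx_size` or `gpu_layers` on cards with less VRAM before passing the list to a process launcher.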

Bottom Line

If you’re running Qwen 3.6 35B locally and hitting refusal walls on legitimate technical questions, the uncensored variants are worth trying. The KLD scores on the best ones show minimal behavioral drift from the base model — you’re mostly just removing the refusal mechanism.

For users who don’t need uncensored access, the official Qwen 3.6 models remain the better choice. The uncensored space is specifically for developers and researchers who need that door open.

Sources

r/LocalLLaMA: Qwen3.6 35B A3B Heretic (KLD 0.0015) — first-hand accounts of the Heretic variant
r/LocalLLaMA: Qwen3.6 35B A3B uncensored heretic Native MTP Preserved — MTP preservation details
r/LocalLLaMA: Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP — second major uncensored variant
Unsloth Documentation: Qwen3.5 — Dynamic 4-bit MXFP4_MOE GGUF variants
HuggingFace: DavidAU Qwen 3.5 Collection — regular and uncensored variants in all sizes