The Bleeding Llama: How a Critical Ollama Vulnerability Exposed 300,000 AI Servers

Let’s get this straight: if you’re running an unpatched Ollama instance and it’s exposed to the internet, its process memory is readable. Right now. That’s what “Bleeding Llama” means, and it’s as bad as it sounds.

Cyera researchers disclosed CVE-2026-7482 on May 5, 2026, and it’s been making the rounds in security circles since. The vulnerability has a CVSS score of 9.1. That’s Critical, full stop. Here’s the breakdown.

What Happened

Ollama is the go-to tool for running LLMs locally. 170,000+ GitHub stars, 100 million+ Docker pulls, and widespread adoption across enterprises and developers. It’s become the default way to run open-source models on your own hardware.

The problem: Ollama uses Go’s unsafe package for low-level memory operations. That’s the escape hatch that lets you bypass Go’s memory safety guarantees. And that’s exactly where this vulnerability lives.

The bug is in Ollama’s GGUF model loader. When processing a GGUF file — the format used to store model weights — the loader reads tensor metadata including an offset and size value provided by the file itself. The issue is it doesn’t verify those values are actually within the file’s bounds. An attacker can craft a GGUF file that declares a tensor offset and size larger than the actual file, and Ollama will happily read whatever’s at that memory location.

This is an out-of-bounds heap read. No authentication required. Send the right request to /api/create with a malicious GGUF file, and you get chunks of process memory dumped back.

What’s At Risk

This isn’t theoretical. Approximately 300,000 Ollama servers are exposed on the public internet [1]. By default, Ollama binds to 127.0.0.1, but the documented OLLAMA_HOST=0.0.0.0 configuration is common in practice, especially in containerized or cloud deployments. If someone spun up an Ollama instance and exposed it thinking “it’s just internal,” that’s the attack surface.
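To make the loopback-versus-everything distinction concrete, here is a minimal sketch that classifies an OLLAMA_HOST-style value. The function name binds_beyond_loopback is made up for illustration, and the parsing rules are a simplified assumption, not Ollama’s actual parser. Anything it can’t positively identify as loopback is flagged.

```python
import ipaddress


def binds_beyond_loopback(ollama_host: str) -> bool:
    """Return True if an OLLAMA_HOST-style value is not clearly loopback-only.

    Hypothetical helper: the parsing here is a simplification and may not
    match how Ollama itself interprets the variable.
    """
    value = ollama_host.strip()
    if value == "":
        # Unset or empty: the default binding, which is 127.0.0.1.
        return False

    # Drop an optional scheme and any trailing path.
    if "://" in value:
        value = value.split("://", 1)[1]
    value = value.split("/", 1)[0]

    if value.startswith("["):
        # Bracketed IPv6 such as "[::1]:11434".
        host = value[1:].split("]", 1)[0]
    elif value.count(":") == 1:
        # "host:port"
        host = value.split(":", 1)[0]
    else:
        # Bare host, bare IPv4, or unbracketed IPv6.
        host = value

    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Empty host or a hostname we can't resolve here: be conservative.
        return True
    return not ip.is_loopback
```

Note that a private address such as 10.0.0.5 is still flagged: it is reachable from other machines on that network, which is exactly the “it’s just internal” trap.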

What could be in that memory?

User prompts and conversation history — anything users have sent through the system
System prompts — instructions that define how the model behaves
Environment variables — including API keys, credentials, and secrets
Session tokens and internal data — from other requests processed by the same server

If you’re running Ollama with API keys loaded as environment variables, which is how many production setups work, those keys could already be in someone’s hands.

The Technical Root

The vulnerability exists because Ollama uses Go’s unsafe package in the tensor loading code path. Go is normally memory-safe, but code that opts into unsafe gives up those guarantees: neither the compiler nor the runtime will catch an out-of-bounds read made through it. The GGUF loader trusts the file’s declared tensor offsets without bounds-checking them against the actual file size.

For the technical details: when /api/create processes a GGUF file, it parses the header and then creates tensor objects with offset/size values taken directly from the file. If those values claim “this tensor starts at byte 999999 and is 5000 bytes long” but the file is only 1000 bytes, Ollama will read from wherever that points in the heap — not from the file [2].
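The missing check is small. Here is a sketch of the kind of validation described above, written in Python for readability. It is not Ollama’s code or the actual patch; TensorInfo and tensor_in_bounds are hypothetical names, and data_start is an assumption covering formats where offsets are relative to the start of a tensor data section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Tensor metadata as declared by the (untrusted) file. Hypothetical type."""
    name: str
    offset: int  # declared byte offset of the tensor data
    size: int    # declared byte length of the tensor data


def tensor_in_bounds(t: TensorInfo, file_size: int, data_start: int = 0) -> bool:
    """Return True only if the declared byte range lies inside the file."""
    if t.offset < 0 or t.size < 0 or data_start < 0:
        return False
    start = data_start + t.offset
    end = start + t.size
    # Python integers don't overflow. In a fixed-width language, the two
    # additions above would also need an overflow check.
    return end <= file_size


# The example from the text: offset 999999, 5000 bytes, in a 1000-byte file.
bad = TensorInfo(name="example", offset=999999, size=5000)
ok = TensorInfo(name="example", offset=0, size=1000)
print(tensor_in_bounds(bad, file_size=1000))  # False
print(tensor_in_bounds(ok, file_size=1000))   # True
```

A loader that rejects the file as soon as this returns False never gets as far as reading outside the buffer.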

What You Should Do Right Now

1. Patch immediately. Ollama versions prior to 0.17.1 are vulnerable. Update to 0.17.1 or later. If you can’t patch, stop reading this and go restrict access to your Ollama servers now.

2. Don’t expose Ollama to the internet. If it was listening on 0.0.0.0, assume it’s been scanned already. Rotate any secrets that were in that process’s environment. Check your cloud security groups, firewall rules, and Kubernetes network policies.

3. Check if you’ve been affected. There’s no official breach notification system here — you’d need to check your own logs for unusual /api/create requests with large GGUF files, or unusual outbound connections from your Ollama servers. RunZero has a scanner that can identify exposed Ollama instances [3].

4. Audit your environment variables. If any API keys, database passwords, or secrets were loaded into an exposed Ollama process — treat them as compromised. Rotate them.
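If you’re triaging a fleet, the version check from step 1 is easy to script. This is a minimal sketch, assuming plain dotted version strings. The 0.17.1 threshold comes from the advisory above; parse_version and is_vulnerable are illustrative names, and any pre-release suffix is simply ignored.

```python
FIRST_FIXED = (0, 17, 1)  # first patched release, per the advisory above


def parse_version(version: str) -> tuple[int, ...]:
    """Parse '0.17.1' or 'v0.17.1' into a tuple of ints.

    Simplification: anything after '-' or '+' (pre-release or build
    metadata) is dropped rather than compared.
    """
    core = version.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_vulnerable(version: str) -> bool:
    """Return True if the version sorts before the first fixed release."""
    return parse_version(version) < FIRST_FIXED


for v in ["0.17.0", "0.17.1", "0.18.0"]:
    print(v, "vulnerable" if is_vulnerable(v) else "patched")
```

The version string itself has to come from each host, for example from the output of ollama --version.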

The Bigger Picture

300,000 exposed AI inference servers is not a small number. This is the new attack surface: not just model weights, but the infrastructure running them. Every exposed, unpatched self-hosted LLM deployment is a potential source of leaked memory.

The pattern is becoming clear: as organizations rush to run AI locally for privacy and cost reasons, they often skip the security hardening they’d apply to traditional web services. An exposed PostgreSQL instance gets locked down immediately. An exposed Ollama instance? Maybe not so much.

Before you spin up your next Ollama container, ask:

Does this really need to be publicly accessible?
Are environment variables with secrets being passed into this process?
What’s the blast radius if this process memory gets read?
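The second question is one you can answer with a few lines of code. Here is a rough sketch that lists environment variable names that look like secrets. The name pattern is a heuristic assumption: it will miss secrets with innocuous names and flag harmless ones, and secret_like_names is a made-up helper. It only reports names, never values.

```python
import re

# Heuristic only: substrings that commonly appear in secret-bearing names.
SECRET_NAME_PATTERN = re.compile(
    r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL", re.IGNORECASE
)


def secret_like_names(environ) -> list[str]:
    """Return the sorted variable names that match the heuristic pattern."""
    return sorted(name for name in environ if SECRET_NAME_PATTERN.search(name))


sample = {
    "OPENAI_API_KEY": "redacted",
    "DB_PASSWORD": "redacted",
    "OLLAMA_HOST": "0.0.0.0",
    "PATH": "/usr/bin",
}
print(secret_like_names(sample))  # ['DB_PASSWORD', 'OPENAI_API_KEY']
```

Passing os.environ instead of the sample dict, from inside the same container or service unit that runs Ollama, shows what that process actually inherits. Anything on the list is a candidate for rotation if the instance was exposed.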

The Fix Is Simple — Do It Now

Update Ollama. Restrict network access. Rotate any potentially exposed credentials. That’s it. This isn’t a zero-day anymore; there’s a patch. The window between disclosure and active exploitation is open right now, so use it.

If you want the full technical breakdown, Cyera’s research is detailed [4]. SecurityWeek and The Hacker News also covered it with good summaries [5][6].

Sources:

[1] RunZero (2026). “Ollama vulnerability CVE-2026-7482: Find impacted assets.” https://www.runzero.com/blog/ollama/

[2] Cyera (2026). “Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama.” https://www.cyera.com/research/bleeding-llama-critical-unauthenticated-memory-leak-in-ollama

[3] RunZero (2026). “Ollama Vulnerability Scanner.” https://www.runzero.com/blog/ollama/

[4] Cyera (2026). “Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama.” https://www.cyera.com/research/bleeding-llama-critical-unauthenticated-memory-leak-in-ollama

[5] SecurityWeek (2026). “Critical Bug Could Expose 300,000 Ollama Deployments to Information Theft.” https://www.securityweek.com/critical-bug-could-expose-300000-ollama-deployments-to-information-theft/

[6] The Hacker News (2026). “Ollama Out-of-Bounds Read Vulnerability Allows Remote Process Memory Leak.” https://thehackernews.com/2026/05/ollama-out-of-bounds-read-vulnerability.html