We Built an Agent. It Made a $47,000 Decision. We Couldn't Explain It.
The $47,000 approved order nobody could explain. Why agent observability isn't the same as accountability — and why we started building the difference.
Here’s the moment we decided to build this.
We had an agent running in production for three months. Routine stuff — handling order adjustments for an e-commerce client. Customer emails in, agent reads the context, approves or escalates. Nothing exotic.
Then one night, our ops lead flagged a discrepancy: a $47,000 order adjustment had been approved. It shouldn’t have been.
We opened the logs.
We had the event log: timestamp, agent ID, action taken. We had the model output: “Approved. Reason: customer has valid promo code on file.” We had the chat transcript: customer’s original request, agent’s response.
What we did not have: what the agent actually saw at decision time. What context it retrieved. What it considered and rejected before landing on that output. What its internal state was at the exact moment it said “approved.”
We had every piece except the one that mattered.
We spent six hours reconstructing that decision manually. Six hours of correlating logs, querying vector store state, chasing down tool call timestamps. We got there — but it was archaeology, not debugging. We were lucky the agent had only been running for three months. What happens when it’s running for three years, making 300 decisions a day?
That night was the seed. We’re building Agent Audit Trail because we couldn’t find a tool that gave us what we actually needed when it mattered.
The Problem Is Not Observability
The market is full of agent observability tools. LangSmith. Helicone. Langfuse. AgentOps. They’re good at what they’re built for: helping engineers understand what agents are doing, debug failures, optimize performance.
That’s not our problem.
Our problem is: a regulator asks why your agent approved that loan. Your enterprise customer asks what your agent decided about their account and on what basis. Your CEO asks why the system moved $200,000 last Tuesday.
Observability tools don’t answer those questions. They’re built for the engineer debugging at 2am — not the compliance officer writing a report for a regulator. The outputs are different. The audience is different. And the accountability standard is different.
We’re not building another observability tool. We’re building the accountability layer underneath it.
Why Now
Agents are no longer a research problem. They’re a production problem.
Money is moving. Decisions are being made. Personal data is being processed. And the teams responsible have no better tooling than event logs and chat transcripts.
The EU AI Act makes this concrete. High-risk AI systems — including agents deployed in financial services, healthcare, and hiring — need mandatory audit trails by August 2026. Fines are real: up to €35 million or 7% of global turnover. Non-EU companies deploying agents into European markets face the same requirements.
But the regulation is a forcing function, not the root cause. The root cause is simpler: agents are making decisions that affect real things, and the people responsible can’t explain those decisions. That’s not a compliance problem. That’s an infrastructure problem.
What We’re Building
Agent Audit Trail is a transparency layer for AI agents.
Decision logs with full context. Context snapshots — what the agent saw at decision time, not just what it output. Chain-of-thought provenance — the reasoning path, including what was considered and rejected. Full replay. Compliance-ready exports for GDPR subject-access requests and EU AI Act documentation.
Not a governance framework. Not a compliance product. The plumbing. The foundation.
We’re building in the open. Early, honest about what we learn, not pretending we have all the answers.
The Question
What would you do with a complete audit trail of your agent’s decisions that you can’t do now?
What would it unlock? What’s the use case that keeps you up at night? What’s the meeting you dread because someone will ask a question you can’t answer?
We’re listening. Reply, comment, DM. And if you’re running agents in production and this problem sounds familiar — let’s talk.