Never start with a negative, especially in a headline, brand name or event title… or so they say. Unless, perhaps, you happen to be Undo, a technology designed to give developers the runtime context they need to solve codebase debugging and execution problems. Undo is now extending its AI toolset to support its services, which are built to uncover the root cause (so developers can go back and undo, get it?) of even the most elusive bugs, automatically.
The organisation’s new AI capability gives agents a complete recording of runtime execution, so developers (and indeed root-cause agents) can find the root cause of complex software problems instead of guessing.
By capturing a full recording of how a program executed and making that behaviour queryable, Undo lets agents automatically pinpoint the cause of failures that would otherwise defeat them, and gives engineers the visibility to review, and answer for, the code their agents produce.
Lumpy large codebases
Ask any software application development team of a reasonable size and they will tell you that it can take skilled engineers days, weeks, or sometimes even years to track down complex problems in large codebases.
But, it appears, AI coding agents alone are “unequal” to the task. Why is this so?
Because the cause of these problems often lies in the precise details of what the program does at runtime, which neither source code analysis nor logs can reveal. As things stand today, coding agents have been unable to see this detail, because they rely on static context from source code and documentation.
Greg Law, co-founder and CEO of Undo, says that as a result, AI agents “fill the gap with a plausible guess” and frequently hallucinate an answer that looks right but is not. The hardest problems (the intermittent failures and state-dependent bugs that live in complex, multi-threaded, multi-process systems) therefore remain unsolved, no matter how capable the model is.
The problem with AI agents: context
Law reminds us of something that is by now widely understood: AI agents don’t have an intelligence problem; they have a context problem.
“No matter how capable a model is, it can only reason about what it can see,” said Law. “When it cannot see what the program did at runtime, it’s left to guess, which is where hallucinations come from. We’ve spent years building deterministic recording technology that captures precisely what happened inside a running program. AI has now made that capability essential. With Undo AI, engineers can hand their agent that ground truth, so it can stop guessing and start reasoning about runtime behaviour the same way it can reason about static code.”
A deterministic recording of runtime execution
With Undo AI, agents can now combine static context (what the code says) with dynamic, runtime context (what the program does). They work from a deterministic recording of runtime execution that contains the information needed to solve problems with certainty.
Undo records exactly what a program did as it ran: every variable, line of code, and instruction, so the agent can query that information to work out why something happened. Instead of guessing, or making up the cause of a problem, the agent reasons from what took place.
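To make the idea of “querying a recording” concrete, here is a deliberately tiny Python sketch. It is not Undo’s API or mechanism (Undo records at the level of every instruction, whereas this only snapshots Python locals line by line); `record`, `who_set`, `Step` and the buggy `average` function are hypothetical names invented for illustration. The point it shows is the workflow: run once under a recorder, then ask the recording “which line made this variable wrong?” instead of guessing from the source.

```python
import sys
from dataclasses import dataclass


@dataclass
class Step:
    """One recorded step: the function and line about to run, plus a snapshot of its locals."""
    func: str
    lineno: int
    values: dict


def record(fn, *args):
    """Run fn(*args) under a line tracer and return (result, recorded steps).

    Toy illustration (hypothetical): shallow-copies Python locals at every
    'line' and 'return' event, far cruder than a real execution recorder.
    """
    steps: list[Step] = []

    def tracer(frame, event, arg):
        if event in ("line", "return"):
            steps.append(Step(frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep tracing lines inside every called frame

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, steps


def who_set(steps, name, value):
    """Query the recording: return the step whose line first made `name` equal `value`.

    Locals are captured *before* each line runs, so the culprit is the step
    just before the first snapshot in which the variable holds the value.
    """
    for prev, cur in zip(steps, steps[1:]):
        if (prev.func == cur.func
                and cur.values.get(name) == value
                and prev.values.get(name) != value):
            return prev
    return None


def average(xs):
    total = 0
    for x in xs:
        total += x
    count = len(xs) - 1  # deliberate off-by-one bug
    return total / count


result, steps = record(average, [2, 4, 6])
culprit = who_set(steps, "count", 2)
# Offset of the culprit line from the `def` line (4 = the `count = ...` line).
print(result, culprit.func, culprit.lineno - average.__code__.co_firstlineno)
```

The answer comes from what actually happened during the run (the wrong result is 6.0 rather than 4.0, and the recording points at the `count` line), not from reading the code and hoping to spot the bug.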
The technology enables engineering teams to identify the root cause of long-standing production bugs that have gone unsolved for months or years; resolve failing tests quickly and automatically, so engineers can close the tickets and move on to work that drives more value; track down the source of bad data flowing through systems with many interacting processes; and diagnose hard-to-reproduce issues such as intermittent failures, memory corruptions or concurrency issues in highly complex environments.
Claude Code, Codex, Cursor & GitHub Copilot
Undo AI plugs into any coding agent an engineering team already uses, including Claude Code, Codex, Cursor and GitHub Copilot in Visual Studio Code. Once the agent is connected to the Undo MCP server, it decides for itself when to investigate, when to capture a recording, and which tools to call. Each recording is deterministic, self-contained and portable, so a problem captured in development, test or production can be handed to the agent and replayed anywhere, behaving identically every time.
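Why can a recording be “replayed anywhere, behaving identically every time”? The general record/replay technique (shown here as a hypothetical Python sketch, not as Undo’s implementation or file format) is to log every nondeterministic value a program consumes, such as random numbers or clock readings, and feed exactly those values back on replay. `Recorder`, `Replayer` and `flaky_job` are illustrative names only.

```python
import json
import random
import time


class Recorder:
    """Hypothetical: logs every nondeterministic value the program consumes."""
    def __init__(self):
        self.log = []

    def random(self):
        v = random.random()
        self.log.append(("random", v))
        return v

    def now(self):
        v = time.time()
        self.log.append(("now", v))
        return v


class Replayer:
    """Hypothetical: feeds logged values back in order, so execution repeats exactly."""
    def __init__(self, log):
        self._events = iter(log)

    def _next(self, kind):
        logged_kind, value = next(self._events)
        if logged_kind != kind:
            # The program asked for a different input than was recorded.
            raise RuntimeError(f"divergence: recorded {logged_kind}, program asked for {kind}")
        return value

    def random(self):
        return self._next("random")

    def now(self):
        return self._next("now")


def flaky_job(env):
    """An 'intermittent failure': the outcome depends on nondeterministic inputs."""
    start = env.now()
    retries = [env.random() for _ in range(3)]
    failed = any(r < 0.2 for r in retries)
    return {"start": start, "retries": retries, "failed": failed}


rec = Recorder()
original = flaky_job(rec)
portable = json.dumps(rec.log)  # self-contained: can be shipped to another machine
replayed = flaky_job(Replayer(json.loads(portable)))
print(replayed == original)
```

Because Python's `json` module serialises floats with their shortest round-tripping representation, the replayed run sees bit-identical inputs and reproduces the same outcome, including a failure that might otherwise appear only occasionally.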
“As AI writes more of the code running in critical systems, someone still has to be accountable for it. However, you can’t be accountable for something you don’t understand,” continued Law. “Undo lets engineers see exactly what AI-generated code did and why, so they can put agents to work on solving their hardest problems with confidence rather than blindly trusting them. It makes Sonnet work like Opus, closes the gap between Opus 4.8 and Fable 5, and supercharges Mythos’ ability to analyze and understand code behavior. This means teams get to answers faster and stay firmly in control of how their software behaves.”
Undo AI is generally available today as part of the Undo Suite and works with any coding agent that supports MCP.
What developers should think next…
Feeding agents deterministic execution recordings via MCP feels like it could be a sound (as in substantial, robust and worthy) fix for hard bugs such as intermittent failures, concurrency issues and various other production mysteries. Teams considering a pilot of a technology like this would be well advised to watch overhead and storage costs, and to measure how well Undo’s “accountability” claims hold up once agents act autonomously on captured data. Developers should perhaps view this technology not as an excuse to blindly trust agent-driven fixes, but as a tool that empowers them to verify, understand, and ultimately own the code their agents produce.