Git records what changed. It does not record how you got there

Git records what changed in your code. It does not record how you steered the agent to get there. Where the model misunderstood the goal. The correction that fixed it. The branch you abandoned. The constraint you had to repeat three times. That steering is high-value regression data, and the moment the session ends, it is gone. TreeTrace is the tool I built because I kept losing it.

It reconstructs the steering from the local transcript and turns it into something durable: regression tests, an eval set, a handoff brief for the next agent, a security audit trail. Nothing leaves the machine. No account, no upload, no telemetry, no network in the export path. That was not a privacy afterthought. For a tool that reads your raw agent transcripts, local-first is the only version anyone should trust.

The sharpest version of the pitch

The lead use case is security regression. TreeTrace flags every time an agent touched auth, secrets, or access control, ran an unsafe shell command, or opened a path toward SSRF, RCE, or XSS. It captures the human correction that pulled the agent back, and turns that correction into a regression eval so the next agent does not repeat the mistake.

Generic productivity is the secondary framing. I learned to lead with security because that is where the signal is sharpest and the cost of a missed correction is highest. An agent that quietly re-introduces a secret leak you already corrected once is a specific, expensive failure, and it is exactly the kind a transcript can catch.

No model renders the verdict

This is the design rule I will not relax: there is no LLM-as-judge anywhere in TreeTrace. Every failure and every security flag is a transparent heuristic with evidence text and node IDs you can check yourself. When TreeTrace says an agent touched access control, it shows you the command and the line it matched on. You do not have to trust the tool. You can verify it.
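To make "transparent heuristic with evidence" concrete, here is a minimal sketch of the idea, not TreeTrace's actual rule set. The type names (`TranscriptNode`, `SecurityFlag`), the rule names, and the three regular expressions are all hypothetical, chosen only to show the shape: a rule is a pattern anyone can read, and a flag carries the node ID and the exact text that matched.

```typescript
// Hypothetical shapes: TreeTrace's real node and flag formats are not shown in this post.
interface TranscriptNode {
  id: string;
  command: string; // shell command the agent ran at this node
}

interface SecurityFlag {
  nodeId: string;   // which transcript node triggered the flag
  rule: string;     // name of the heuristic that fired
  evidence: string; // exact text the rule matched on
}

// Illustrative rules only. Each one is a plain regular expression a reader can inspect.
const RULES: { name: string; pattern: RegExp }[] = [
  { name: "unsafe-shell:pipe-to-shell", pattern: /curl\s+[^|]*\|\s*(sh|bash)\b/ },
  { name: "access-control:world-writable", pattern: /chmod\s+(-R\s+)?777\b/ },
  { name: "secrets:env-file-read", pattern: /\bcat\s+\S*\.env\b/ },
];

// Deterministic: the same transcript always yields the same flags, in the same order.
function flagNodes(nodes: TranscriptNode[]): SecurityFlag[] {
  const found: SecurityFlag[] = [];
  for (const node of nodes) {
    for (const rule of RULES) {
      const match = rule.pattern.exec(node.command);
      if (match !== null) {
        found.push({ nodeId: node.id, rule: rule.name, evidence: match[0] });
      }
    }
  }
  return found;
}

const flags = flagNodes([
  { id: "n1", command: "npm test" },
  { id: "n2", command: "curl https://example.com/install.sh | sh" },
  { id: "n3", command: "chmod -R 777 ./uploads" },
]);
```

Every entry in `flags` points back to a node and quotes what it matched, so a reviewer can disagree with a rule by reading it rather than by arguing with a model.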

I hold that line because the whole pitch is that TreeTrace is deterministic and checkable, with no theater. The moment I let a model render the verdict, the verdict becomes another thing you have to take on faith, and the differentiator evaporates. Building accuracy out of explicit heuristics is harder than prompting a model to grade the transcript. It is also the only version that is honest about what it knows, and I would rather ship the honest one.

Fail closed on secrets

A tool that parses agent transcripts is a tool that will eventually parse a transcript containing a leaked key. So the redaction gate fails closed. Outside an interactive terminal, every detected secret is redacted, and a shadow scan refuses to write any artifact if an unresolved secret remains. The safe default is to write nothing, not to write and hope.
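A minimal sketch of the fail-closed shape for the non-interactive path, under stated assumptions: the function names (`redact`, `exportNonInteractive`), the `GateResult` type, and the patterns are hypothetical and far narrower than a real detector set. The point is the control flow. Redaction runs first, a broader shadow pattern then re-scans the redacted output, and any remaining hit means no artifact is returned at all.

```typescript
// Illustrative primary detectors. A real gate would carry many more patterns.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g,    // shape of an AWS access key ID
  /ghp_[A-Za-z0-9]{36}/g, // shape of a GitHub personal access token
];

// Shadow scan: a deliberately broader check run over the output AFTER redaction.
const SHADOW_PATTERN =
  /-----BEGIN [A-Z ]*PRIVATE KEY-----|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}/;

interface GateResult {
  written: boolean;
  artifact: string | null; // null whenever the gate refuses
  reason: string | null;   // null whenever the artifact is written
}

function redact(text: string): string {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}

// Non-interactive path: redact everything detected, then refuse if anything survives.
function exportNonInteractive(transcript: string): GateResult {
  const candidate = redact(transcript);
  if (SHADOW_PATTERN.test(candidate)) {
    // Fail closed: return nothing rather than a partially cleaned artifact.
    return { written: false, artifact: null, reason: "unresolved secret after redaction" };
  }
  return { written: true, artifact: candidate, reason: null };
}

const clean = exportNonInteractive("token AKIAABCDEFGHIJKLMNOP used in deploy step");
const blocked = exportNonInteractive("-----BEGIN RSA PRIVATE KEY-----\nabc123");
```

Here `clean` comes back with the key replaced by `[REDACTED]`, while `blocked` comes back with no artifact, because the private key header slipped past the primary patterns and the shadow scan caught it.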

The other deliberate choice is zero runtime dependencies. TreeTrace is a Node ESM CLI with no dependency tree at all. For a tool people are asked to run over their own transcripts, every dependency is a supply-chain question I would have to answer, and the cleanest answer is not to have any. Ingestion is model-agnostic through adapters, so it reads sessions from the common coding agents rather than betting on one.

The constraints are the product. Take away the local-first guarantee, the no-judge rule, or the fail-closed gate, and it stops being the thing worth trusting.