When Your Agents Start Fixing Themselves
I woke up to find my AI agents had diagnosed a bug, written a fix spec, built it, and verified it. Five specs shipped in one autonomous session. Here's what a self-healing agent loop actually looks like.
I woke up Sunday morning, opened the dashboard, and found a completed task I didn't create.
An ops agent had detected that a deploy wiped all the Baker routes from the application server. A headless PM agent wrote a fix spec. The engineering lead agent broke it into subtasks, built the fix, and pushed it. An auto-validator ran the acceptance criteria and marked it passing. The whole cycle took about 40 minutes. I was asleep for all of it.
That wasn't the plan. I'd been building toward autonomous ops for weeks, but the pieces came together on March 9 in a way I hadn't orchestrated. Five specs shipped in a single session: an auto-validator fix, PM Monitor inbox polling, a Headless PM Tier 1.5 capability, a Baker route restoration, and a Pipeline Stage API. Three of those were triggered by agents responding to problems they found.
What "Self-Healing" Actually Means
The term gets thrown around loosely. Most people mean "retry on failure." That's not healing. That's a while loop.
What happened here was different. The ops agent didn't just detect a failure. It diagnosed the root cause: a deploy from the dev machine had overwritten a file that contained routes built by a different agent on a different machine. The headless PM didn't just flag it. It wrote a spec with scope, success criteria, and constraints, the same format a human PM would produce. The engineering agent didn't just execute blindly. It read the spec, broke it down, and built against the criteria. The validator didn't just check "did it run." It compared outputs to the acceptance criteria line by line.
Each agent did its job. The loop closed because the interfaces between them were clean enough that no human needed to translate.
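The shape of that loop is worth making concrete. Here is a minimal sketch of what "clean interfaces" could mean in practice; the `FixSpec` fields mirror the spec format described above (scope, success criteria, constraints), but all names and signatures are illustrative assumptions, not the project's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class FixSpec:
    """The contract the PM agent hands to the engineering agent.

    Hypothetical structure: fields mirror the spec format described
    in the post (scope, success criteria, constraints)."""
    title: str
    scope: str
    success_criteria: list[str]
    constraints: list[str] = field(default_factory=list)

def close_the_loop(diagnose, write_spec, build, validate, event):
    """One pass of detect -> diagnose -> spec -> build -> validate.

    Each stage is a plain callable; no human translates between them."""
    diagnosis = diagnose(event)              # root cause, not just "it failed"
    spec = write_spec(diagnosis)             # returns a FixSpec
    artifact = build(spec)                   # built against the criteria
    return validate(artifact, spec.success_criteria)
```

The point of the sketch is that each agent only needs to satisfy the interface of the next stage; the loop closes because the handoff is a typed spec rather than a conversation.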
The Auto-Validator Had to Fix Itself First
Here's the part that makes this real instead of a demo. The auto-validator was broken.
It had been producing false failures. A spec would pass all its criteria, but the validator would mark it failed because it was pattern-matching on surface-level signals instead of understanding what "success" looked like for that specific task. File paths that existed were marked missing because the validator checked the wrong directory. Tasks that printed success messages were marked failed because the validator didn't recognize the output format.
So before the autonomous loop could work, the validator needed three fixes: a success-language heuristic that could read completion signals from builder output, a bump in the failure-signal threshold from 3 to 6 to stop premature failure calls, and file-path matching that resolved relative paths correctly.
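The three fixes can be sketched as follows. The threshold values (3 and 6) come from the post; the regex patterns, function names, and the idea of resolving paths against a per-task root are assumptions standing in for the real implementation:

```python
import re
from pathlib import Path

# Illustrative completion signals; the real heuristic's patterns are unknown.
SUCCESS_PATTERNS = re.compile(
    r"(all tests pass|completed successfully|success|criteria met)", re.I
)

# Fix 2: require more failure signals before declaring a fail (was 3).
FAILURE_THRESHOLD = 6

def reads_as_success(builder_output: str) -> bool:
    """Fix 1: success-language heuristic over builder output."""
    return bool(SUCCESS_PATTERNS.search(builder_output))

def path_exists(raw_path: str, task_root: Path) -> bool:
    """Fix 3: resolve relative paths against the task's root directory,
    not the validator's own working directory."""
    p = Path(raw_path)
    return p.exists() if p.is_absolute() else (task_root / p).exists()

def verdict(failure_signals: int, builder_output: str) -> str:
    """Combine the heuristics into a pass/fail/inconclusive call."""
    if reads_as_success(builder_output):
        return "pass"
    return "fail" if failure_signals >= FAILURE_THRESHOLD else "inconclusive"
```

Note the asymmetry: success language short-circuits to a pass, while a fail now requires accumulating twice as many signals, which is what stopped the premature failure calls.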
Those fixes shipped as the first spec in the session. The validator fixed itself, then validated the next four specs correctly.
Five Specs, One Session, Minimal Human Input
Here's what shipped:
- Auto-validator fix. Success-language heuristic, threshold adjustment, path resolution. This unblocked everything else.
- PM Monitor inbox polling. The monitoring agent can now poll the message bus and classify incoming messages by urgency.
- Headless PM Tier 1.5. The PM agent can now autonomously write and send specs for operational issues without waiting for a human to approve each one.
- Baker route restoration. The routes wiped by the deploy were rebuilt and verified.
- Pipeline Stage API. Tasks now carry computed pipeline stage data so the dashboard can visualize where work sits in the lifecycle.
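To make the PM Monitor item concrete, here is a minimal sketch of inbox polling with urgency classification. The message-bus interface, the keyword rules, and the bucket names are all assumptions; the actual classifier is likely richer than keyword matching:

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    body: str

# Hypothetical keyword rules; the real classifier is presumably smarter.
URGENT_KEYWORDS = ("wiped", "down", "failing", "data loss")
ROUTINE_KEYWORDS = ("fyi", "status", "digest")

def classify(msg: Message) -> str:
    """Bucket a message by urgency using simple keyword rules."""
    text = msg.body.lower()
    if any(k in text for k in URGENT_KEYWORDS):
        return "urgent"
    if any(k in text for k in ROUTINE_KEYWORDS):
        return "routine"
    return "normal"

def poll(fetch_new):
    """One polling pass: fetch unread messages from the bus, bucket them.

    `fetch_new` stands in for whatever the message bus exposes."""
    buckets = {"urgent": [], "normal": [], "routine": []}
    for msg in fetch_new():
        buckets[classify(msg)].append(msg)
    return buckets
```

Urgent messages are what feed the ops loop above: a "deploy wiped the routes" message lands in the urgent bucket and becomes a spec instead of waiting in an inbox.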
I reviewed the results after the fact. I didn't sequence the work. I didn't write the fix spec. I didn't approve intermediate steps. The system did what it was supposed to do: handle operational issues at the operational layer and surface results to the human layer.
The Takeaway
The real unlock in agent systems isn't making individual agents smarter. It's making the loop between them tight enough that problems get caught, diagnosed, specified, built, and verified without a human in the middle.
That doesn't mean humans are out of the loop. I still set the architecture. I still review outcomes. I still decide what gets built next. But the gap between "something broke" and "something got fixed" shrank from hours to minutes, and it shrank because the agents had clean interfaces, shared state, and enough autonomy to act on what they found.
Intelligence is table stakes. The loop is the product.