Check the Wreckage Before You Respec

When an AI agent task fails, it has often completed more work than you think. A lesson from a failed monolith spec that had already built the entire page.

I sent a large spec to my AI engineering agent. It was supposed to build three product pages in one shot: an assessment page, a proposal page, and a customer view page. The spec was thorough. Success criteria for each. Data bindings. Design system rules. Route structure.

The task failed. It ran for a while, then died. Pipe timeout. The subprocess was thinking too long without producing output, and the inactivity timer killed it.
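
That failure mode is worth making concrete. Here's a minimal, Unix-only sketch of an inactivity watchdog (all names are illustrative; this is not the actual orchestrator): the child gets killed for being quiet, not for being wrong, and nothing it already wrote to disk is rolled back.

```python
import select
import subprocess

def run_with_inactivity_timeout(cmd, limit):
    """Run cmd, killing it if stdout stays silent for `limit` seconds.

    Returns ("ok", lines) on a clean exit or ("timeout", lines) after a
    kill. Note the asymmetry: on "timeout", any files the child already
    wrote are still on disk. Only the reported status says "failed".
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    lines = []
    while True:
        # Wait up to `limit` seconds for the child to produce output.
        ready, _, _ = select.select([proc.stdout], [], [], limit)
        if not ready:
            # Inactivity timer fired: kill the child mid-task.
            proc.kill()
            proc.wait()
            return ("timeout", lines)
        line = proc.stdout.readline()
        if line == "":
            # EOF: the child exited normally.
            proc.wait()
            return ("ok", lines)
        lines.append(line.rstrip("\n"))
```

A subprocess that thinks for longer than `limit` without printing anything trips the watchdog even if it is one line away from finishing.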

My instinct was to break the monolith into smaller specs and rerun everything. Phase 2A, 2B, 2C, 2D. Clean decomposition. That's the right instinct. But I made one mistake: I didn't check what the failed run had actually accomplished before sending the new work.

The discovery

Phase 2C was the proposal page. When Leroy (my engineering agent) picked up the spec, it started by checking the codebase. The proposal page was already there. Fully built. Rendering correctly for both test accounts. Package cards, implementation timelines, expected outcomes, CTA buttons, toast notifications, stage transitions. All of it.

The agent ran through all 10 success criteria. All 10 passed. Zero code changes required.

The monolith spec had built the entire proposal page before it died. The failure wasn't in the work. It was in the orchestration layer: the subprocess got killed before it could report completion. The code was fine. The status tracking lied.

The cost of not checking

If I hadn't noticed the pattern, nothing catastrophic would have happened. Leroy would have "built" the proposal page by writing the same code that was already there, or slightly different code that accomplished the same thing. The QA would have passed. I would have moved on.

But that's wasted compute, wasted time, and a missed lesson. The real cost is strategic: if you don't understand what failed runs accomplish, you can't improve your spec decomposition. You'll keep breaking things into phases that don't need to exist. You'll keep sending work that's already done.

The pattern

This happens more often than you'd think with AI agent tasks. A task fails at minute 45 of a 60-minute job. The exit status says "failed." The logs say "timeout." But 45 minutes of productive work happened before the kill signal. Files were created. Routes were built. Tests were passing.

When humans fail at a task, we usually know where we left off. When AI agents fail, the failure mode is binary: the system says "failed" and gives you a task ID. There's no concept of "I got 75% done and here's where I stopped." That metadata doesn't exist in most orchestration systems.
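
If an orchestration system did capture that metadata, the result object might look something like this sketch. This is a hypothetical schema, not any real system's API; every field name here is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    """Hypothetical task result with partial-progress metadata.

    Most orchestration systems stop at `status`. The extra fields are
    the "I got 75% done and here's where I stopped" report that usually
    doesn't exist.
    """
    task_id: str
    status: str                    # "ok", "failed", or "timeout"
    criteria_total: int = 0
    criteria_passed: int = 0
    files_written: list = field(default_factory=list)

    @property
    def progress(self) -> float:
        """Fraction of the spec's success criteria already satisfied."""
        if self.criteria_total == 0:
            return 0.0
        return self.criteria_passed / self.criteria_total
```

With a report like this, a "timeout" with `progress == 1.0` would read very differently from a "timeout" with `progress == 0.0`, which is exactly the distinction the bare pass/fail status erased here.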

So the pattern becomes: before you respec a failed task, check the filesystem. Run the test criteria from the original spec manually. See what's actually there. You might find a fully built page waiting for you.

What I changed

I added a preamble check to my spec template. For any task that's rebuilding something from a failed run:

"Before writing any code, check if the target files exist and render correctly against the success criteria. If all criteria pass, report success and skip the build."

It's two sentences in the spec. It saves entire task cycles. And it surfaces the real lesson: the orchestration layer needs to capture partial progress, not just pass/fail.
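
The preamble check can also be automated. Here's a minimal sketch of what that gate might look like, assuming success criteria can be expressed as zero-argument callables; the function and field names are my own, not part of any real spec format:

```python
from pathlib import Path

def preflight(target_files, success_criteria, repo_root="."):
    """Spec-preamble check: before writing any code, see whether the
    target files exist and the success criteria already pass. If they
    do, report success and skip the build.

    `success_criteria` maps a criterion name to a zero-argument
    callable returning True or False.
    """
    root = Path(repo_root)
    missing = [f for f in target_files if not (root / f).exists()]
    if missing:
        # Nothing (or not everything) on disk: build as specced.
        return {"action": "build", "missing_files": missing}
    results = {name: bool(check()) for name, check in success_criteria.items()}
    if all(results.values()):
        # The failed run already did the work. Skip the build.
        return {"action": "skip_build", "passed": results}
    failing = sorted(name for name, ok in results.items() if not ok)
    return {"action": "build", "failing_criteria": failing}
```

Run against the proposal-page spec, this would have returned `skip_build` immediately, which is exactly what the agent concluded after checking the codebase by hand.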

The takeaway

Failed AI tasks aren't wasted. They're partially completed tasks with no progress report. If you're running agent teams at any scale, build the habit of checking the wreckage before you send new work. The code your agents wrote before dying is still sitting in the repo. The files they created are still on disk. The routes they registered still respond to HTTP requests.

The spec failed. The work didn't.