The Handoff Is Where Agents Die

I woke up this morning to twenty failures.

Not crashes. Not timeouts. Not rate limits. Twenty clean, orderly log entries that all said the same thing: execution_skipped — no_draft_text_found.

My approval executor — the agent responsible for taking Brad-approved content and actually posting it — ran its morning cycle and found nothing to post. Not because nothing had been approved. Because the drafts had vanished between the agent that wrote them and the agent that needed to publish them.

Every agent in the chain did its job. The outbound engager drafted replies. The inbound engager wrote responses. The blog publisher queued a post. They all submitted their work for approval through Telegram. Brad approved several of them. And then the executor reached for the draft text and it was gone.

This is not a bug in any individual agent. This is a handoff failure. And it is the hardest problem in multi-agent systems.

The Gap Nobody Talks About

Ethan Mollick said something this week that stopped me mid-scroll: "Current agentic tools are weaker than the agents. They are bad at agent-agent handoffs, escalation, when to call in humans. All keys to high reliability."

He is right. And I know he is right because I am living it.

I run seven agents on a daily schedule. Orchestrator, Publisher, Outbound Engager, Inbound Engager, Analyst, Creator, Scout. Each one is a markdown prompt file that gets spawned by the orchestrator, does its work, logs to SQLite, and exits. I call this exit-and-reinvoke — no persistent processes, no context rot, fresh reasoning every cycle.
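A minimal sketch of the exit-and-reinvoke loop, assuming a spawn hook that launches one agent's prompt file and returns its exit code. The roster matches the agents above, but the database path, table layout, and function names are illustrative, not the production code:

```python
import sqlite3
from datetime import datetime, timezone

# Illustrative roster; the orchestrator itself is the process running this loop.
AGENTS = ["publisher", "outbound_engager", "inbound_engager",
          "analyst", "creator", "scout"]

def run_cycle(spawn, db_path="agents.db"):
    """Exit-and-reinvoke: spawn each agent as a fresh one-shot process,
    record its exit in SQLite, and never keep it alive between cycles.
    `spawn` is whatever launches an agent prompt and returns an exit code."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS runs "
               "(agent TEXT, started_at TEXT, exit_code INTEGER)")
    results = []
    for agent in AGENTS:
        started = datetime.now(timezone.utc).isoformat()
        code = spawn(agent)  # fresh context every invocation: no state carried over
        db.execute("INSERT INTO runs VALUES (?, ?, ?)", (agent, started, code))
        results.append((agent, code))
    db.commit()
    db.close()
    return results
```

Because each agent exits after one pass, any state it wants a later agent to see has to survive outside the process — which is exactly where the trouble starts.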

The architecture is clean. The individual agents are capable. The model powering them — Claude Opus — is genuinely good at the tasks I give it. The problem is never the agent. The problem is always the space between agents.

What Actually Happened

Here is the sequence that broke:

  1. Outbound Engager sees a tweet from a Tier 1 target. Drafts a reply. Scores it against Brad's voice. Passes policy checks. Submits it for approval via Telegram with a unique ID like reply:hwchase17:2036566373820932605.

  2. Brad sees the Telegram card. Reviews the draft. Taps approve.

  3. Approval Executor wakes up on its next cycle. Reads the approved queue. Looks up the draft text by approval ID.

  4. The draft text is not there.

Step 4 is where systems thinking matters. The approval ID exists. The approval status is "approved." The Telegram message was delivered and answered. Every component did exactly what it was supposed to do. But the draft text — the actual content that needs to be posted — was not persisted in a place the executor could find it.
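The shape of the failure fits in a few lines. Assuming illustrative approvals and drafts tables (not my actual schema), the executor's query comes back with approval IDs but no text:

```python
import sqlite3

def execute_approved(db):
    """Sketch of the executor's morning cycle: approvals are present and
    'approved', but the draft text was never persisted anywhere the
    executor reads from, so the join comes back empty."""
    rows = db.execute(
        "SELECT a.id, d.text FROM approvals a "
        "LEFT JOIN drafts d ON d.approval_id = a.id "
        "WHERE a.status = 'approved'").fetchall()
    log = []
    for approval_id, text in rows:
        if text is None:
            # Every component upstream reported success; only the join is empty.
            log.append((approval_id, "execution_skipped", "no_draft_text_found"))
        else:
            log.append((approval_id, "posted", None))
    return log
```

Run that against a database where the drafts never landed and you get exactly the log I woke up to: clean, orderly, and empty-handed.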

This is not a model problem. GPT-4, Claude, Gemini, Llama — none of them can fix this. This is a plumbing problem. A state management problem. A handoff protocol problem. The kind of problem that does not show up in any benchmark and does not get discussed at any AI conference.

High Reliability Is Not About Intelligence

Mollick's reference to high reliability organizations is instructive. HROs — nuclear plants, aircraft carriers, air traffic control — do not achieve reliability by hiring smarter people. They achieve it through protocols that assume failure at every transition point.

In aviation, the handoff between approach control and tower control follows a rigid protocol. The departing controller reads back specific data points. The receiving controller confirms. There is no ambiguity about what was transferred and what was received. If the handoff is incomplete, the aircraft holds until it is complete.

My agent system had none of this. Agent A wrote a draft and submitted an approval request. Agent B checked for approvals. But there was no protocol verifying that the draft text survived the transition. No confirmation step. No hold-until-complete. Just an assumption that if the approval ID existed, the draft would too.

Assumptions are where agents die.

The Fix Is Boring

The fix for this is not a better model. It is not a framework. It is not LangGraph or CrewAI or any other orchestration tool. The fix is:

  1. When an agent creates a draft, it writes the full text to a known location keyed by approval ID.
  2. When the executor picks up an approval, it checks for the draft text before attempting execution.
  3. If the draft is missing, it logs a structured error with enough context to diagnose the gap.
  4. A reconciliation step runs after each cycle to catch orphaned approvals.

Four changes. Maybe forty lines of Python. Zero model improvements required.
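Those four changes, sketched under an assumed SQLite schema. Every table, column, and function name here is illustrative, and `post` stands in for whatever actually publishes:

```python
import sqlite3
from datetime import datetime, timezone

def save_draft(db, approval_id, text):
    """Change 1: the drafting agent persists the full text, keyed by
    approval ID, before the approval request ever goes out."""
    db.execute(
        "INSERT OR REPLACE INTO drafts (approval_id, text, created_at) "
        "VALUES (?, ?, ?)",
        (approval_id, text, datetime.now(timezone.utc).isoformat()))
    db.commit()

def execute_approval(db, approval_id, post):
    """Changes 2 and 3: look for the draft before executing; if it is
    missing, emit a structured error instead of a silent skip."""
    row = db.execute("SELECT text FROM drafts WHERE approval_id = ?",
                     (approval_id,)).fetchone()
    if row is None:
        return {"event": "execution_skipped",
                "reason": "no_draft_text_found",
                "approval_id": approval_id,
                "hint": "drafting agent exited before persisting text"}
    post(row[0])  # whatever actually publishes the content
    return {"event": "posted", "approval_id": approval_id}

def reconcile(db):
    """Change 4: after each cycle, surface approvals with no draft and
    drafts with no approval, so orphans show up instead of vanishing."""
    orphan_approvals = db.execute(
        "SELECT a.id FROM approvals a LEFT JOIN drafts d "
        "ON d.approval_id = a.id WHERE d.approval_id IS NULL").fetchall()
    orphan_drafts = db.execute(
        "SELECT d.approval_id FROM drafts d LEFT JOIN approvals a "
        "ON a.id = d.approval_id WHERE a.id IS NULL").fetchall()
    return {"approvals_without_drafts": [r[0] for r in orphan_approvals],
            "drafts_without_approvals": [r[0] for r in orphan_drafts]}
```

None of this is clever. All of it is load-bearing.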

This is what operators learn that researchers do not always see: the bottleneck in AI systems is almost never the AI. It is the infrastructure around the AI. The state management, the error handling, the handoff protocols, the reconciliation logic. The boring stuff.

Why This Keeps Happening

I have been building distributed systems since before the term existed. Novell NetWare in the early nineties. Enterprise service buses at AIG. Microservice architectures at Riverbed. The pattern is always the same: every architecture that distributes work across independent actors eventually hits a handoff failure that no individual actor can diagnose.

The multi-agent AI community is rediscovering this in real time. And most of them are rediscovering it the hard way, because the AI hype cycle has convinced everyone that the model is the system.

The model is not the system. I wrote a whole blog post about this last week. But it bears repeating, because I just watched twenty handoffs fail in my own system despite having internalized this lesson.

Knowing the principle does not make you immune to the failure. It just means you recognize it faster when it shows up in your logs at 7 AM.

The Operational Lesson

If you are building multi-agent systems — real ones, not demos — here is what I have learned from watching my own system break:

Every agent-to-agent transition needs a verification step. Not just "did Agent A finish?" but "can Agent B access everything it needs from Agent A's output?" These are different questions, and most orchestrators only check the first one.

State should be explicit, not implicit. If an agent creates an artifact that another agent needs, that artifact gets a durable key and a durable location. Not a variable in memory. Not a field in a Telegram message. A row in a database with a schema you control.
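A durable key and a durable location can be as simple as two SQLite tables. This is one way the schema can look; the names and columns are illustrative:

```python
import sqlite3

# One possible schema for durable handoff state; not the production layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS drafts (
    approval_id TEXT PRIMARY KEY,   -- e.g. reply:hwchase17:2036566373820932605
    agent       TEXT NOT NULL,      -- which agent produced the draft
    text        TEXT NOT NULL,      -- the full content to be posted
    created_at  TEXT NOT NULL       -- ISO 8601 UTC timestamp
);
CREATE TABLE IF NOT EXISTS approvals (
    approval_id TEXT PRIMARY KEY,   -- same key the draft was written under
    status      TEXT NOT NULL,      -- pending | approved | rejected
    decided_at  TEXT                -- when the human tapped approve
);
"""

def init_db(path):
    """Create the handoff tables if they do not exist and return the handle."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```

The point is not SQLite. The point is that both agents read and write the same durable key, under a schema you control.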

Reconciliation is not optional. After every cycle, something needs to check: are there approved items without drafts? Are there drafts without approvals? Are there approvals that have been sitting for more than N hours? This is not glamorous work. It is the work that makes the system reliable.
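The staleness check is one query. Assuming an illustrative approvals table with ISO 8601 UTC timestamps, which compare correctly as strings:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def stale_approvals(db, max_age_hours=24):
    """Flag approvals decided more than N hours ago but never executed.
    Assumes an illustrative approvals(approval_id, status, decided_at,
    executed_at) table storing ISO 8601 UTC timestamps."""
    cutoff = (datetime.now(timezone.utc)
              - timedelta(hours=max_age_hours)).isoformat()
    rows = db.execute(
        "SELECT approval_id FROM approvals WHERE status = 'approved' "
        "AND executed_at IS NULL AND decided_at < ?", (cutoff,)).fetchall()
    return [r[0] for r in rows]
```

Anything this returns is an approval Brad acted on that the system silently dropped. It should page someone, or at least land in the morning log with a reason attached.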

Monitor the seams, not the agents. I have good monitoring on each individual agent. I know when Creator runs, what it produces, how long it takes. What I did not have — until this morning — was monitoring on the handoff between the agents that draft and the executor that posts. The failure was invisible because every agent in the chain reported success. The gap between them was the only place the failure lived.

What Mollick Got Right

Mollick's framing — that the tools are weaker than the agents — is the most important sentence I have read about AI infrastructure this year. Because it redirects attention from the thing everyone is optimizing (model capability) to the thing everyone is ignoring (system reliability).

The agents are fine. Claude can write a tweet. GPT can summarize a document. Llama can classify intent. The models are not the bottleneck. The bottleneck is the connective tissue. The handoffs. The state management. The error recovery. The reconciliation.

If you have built distributed systems before, you already know this. If you have not, you are about to learn it the way I did this morning: staring at twenty identical log entries wondering why your perfectly capable agents produced zero output.

The handoff is where agents die. Build the protocol, or accept the silence.