The Model Is Not the System
Daytona's CEO posted about agent companies burning 50,000 credits in 30 days. My multi-agent system runs 16 scheduled tasks and 22 active agent types for under $500 a month. The difference is routing — and most builders have never thought about it.
Last week, Ivan Burazin, CEO of Daytona, posted something worth sitting with: AI-native companies are burning through 50,000 credits in 30 days, a spend profile that looks nothing like a traditional SaaS company's. His point was that agent infrastructure is a genuinely new category. He's right.
But the second half of that conversation is missing.
I run 16 scheduled agent tasks. Twenty-two distinct agent types were active in the last seven days — scouts, publishers, engagement agents, an RSS fetcher, a compliance engine, a marketing manager, a diary writer. Two brains talking to each other across two separate companies via a protocol called A2A. My agents post content, score news stories, track competitors, monitor for brand safety incidents, queue approval items to my phone, and generate blog posts.
My monthly AI bill is under $500.
The model is not the system. The routing layer is the system.
What People Get Wrong About Agent Infrastructure
When most people build their first AI agent, they do the sensible thing: they pick the best model available, wire it up, and ship. Claude Opus. GPT-4o. Gemini Ultra. Whatever is at the frontier that month. It works. The demo is impressive. They add another agent. Then another.
Then the bill comes in.
The mistake isn't using a powerful model. The mistake is using a powerful model for everything. Scheduling a post doesn't require the reasoning capacity of something trained on 10 trillion tokens. Fetching RSS feeds and deduplicating URLs against a SQLite table is a Python script, not an LLM call. Generating a terminal screenshot for social media is PIL — twelve lines of code, $0. Checking whether a post violates brand safety rules is a decision tree wrapped in a policy engine, not a 200k context window.
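To make the "decision tree wrapped in a policy engine" point concrete, here is a minimal sketch of a rule-based brand safety check. The rule names, blocked terms, and thresholds are hypothetical illustrations, not the actual policy set; the point is that this tier is plain Python with zero LLM calls.

```python
# Minimal brand-safety policy check: ordered rules, first match wins.
# Rule set and field names are illustrative, not a real policy engine.

BLOCKED_TERMS = {"medical claim", "guaranteed yield", "cure"}

def check_post(text: str, mentions_minor: bool = False) -> tuple[bool, str]:
    """Return (allowed, reason). Pure Python, zero API spend."""
    lowered = text.lower()
    if mentions_minor:
        return False, "audience-safety rule"
    for term in BLOCKED_TERMS:
        if term in lowered:
            return False, f"blocked term: {term}"
    if len(text) > 280:
        return False, "over platform length limit"
    return True, "ok"

print(check_post("New grow room tour tomorrow."))  # allowed
print(check_post("This strain is a guaranteed yield booster."))
```

A check like this runs in microseconds and costs nothing, which is exactly why it belongs below the model tiers rather than inside one.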
When you route everything to the frontier model by default, you're treating a Porsche like a delivery truck. It can do the job. It's just an expensive way to move boxes.
The FORGE Routing Model
Here's how the routing actually works in practice.
At the bottom of the stack: pure Python tools that cost nothing. The RSS fetcher runs on a cron every 30 minutes. It hits a list of feeds, deduplicates against a SQLite database, and writes new items to a table. No LLM involved. No API call. Just a network request and a hash check. That's 48 runs a day at zero dollars.
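The dedup step above can be sketched in a few lines of standard-library Python. The table and column names are illustrative, not the author's actual schema:

```python
import hashlib
import sqlite3

# Dedup incoming feed URLs against a seen-URLs table. No LLM, no API spend.
# Schema is illustrative; a PRIMARY KEY on the hash makes dedup a free side
# effect of the insert.

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS seen (url_hash TEXT PRIMARY KEY)")

def new_items(conn: sqlite3.Connection, urls: list[str]) -> list[str]:
    """Return only URLs not seen on any previous run, recording them as seen."""
    fresh = []
    for url in urls:
        h = hashlib.sha256(url.encode()).hexdigest()
        try:
            conn.execute("INSERT INTO seen (url_hash) VALUES (?)", (h,))
            fresh.append(url)
        except sqlite3.IntegrityError:
            pass  # already seen on an earlier run
    conn.commit()
    return fresh

conn = sqlite3.connect(":memory:")
init_db(conn)
print(new_items(conn, ["https://a.example/1", "https://a.example/2"]))
print(new_items(conn, ["https://a.example/2", "https://a.example/3"]))
```

Run twice with overlapping URLs, only the unseen ones come back the second time. That's the whole tier: a network request and a hash check.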
One level up: small, fast models for classification and scoring. When the RSS scouts run, they're evaluating hundreds of headlines against a relevance rubric — does this story connect to cannabis operations, AI agents, or operator leverage? Does it give Brad something worth a take? This is classification work. It doesn't need to reason about the history of information retrieval. It needs to score items consistently. A smaller model with a tight prompt does this reliably and cheaply.
One level higher: mid-tier models for drafting and formatting. When a scout surfaces a story worth engaging with, the outbound engagement agent drafts a reply. Tweet-length responses to other operators' posts. Observations. Counterpoints. This is language generation with voice constraints — match Brad's cadence, stay under 280 characters, don't be a sycophant. A mid-tier model with a strong system prompt handles this well. Not every reply needs Opus.
At the top: frontier models for judgment work. High-stakes content that goes to the blog. Strategy analysis. Memory synthesis — turning 50 raw agent observations into three durable lessons. Reviewing a cannabis compliance argument for accuracy. Deciding whether a sensitive reply crosses a brand safety line. This is where reasoning depth actually matters. This is where I want the best model available and I'm happy to pay for it.
The system routes automatically based on task type. The orchestrator knows which tier each task belongs to. Nobody is manually deciding "use the cheap model for this." The architecture decides.
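The routing table itself can be as simple as a dictionary from task type to tier. This is a sketch under assumptions — the task names and tiers are hypothetical stand-ins for the orchestrator described above, not its real configuration:

```python
from enum import Enum

class Tier(Enum):
    SCRIPT = "no LLM"            # cron + pure Python
    SMALL = "small model"        # classification, scoring
    MID = "mid-tier model"       # drafting with voice constraints
    FRONTIER = "frontier model"  # judgment, compliance, synthesis

# Task-type -> tier table. Names are illustrative, not the real task list.
ROUTES = {
    "rss_fetch": Tier.SCRIPT,
    "headline_score": Tier.SMALL,
    "reply_draft": Tier.MID,
    "compliance_review": Tier.FRONTIER,
    "memory_synthesis": Tier.FRONTIER,
}

def route(task_type: str) -> Tier:
    """Fail closed: an unknown task type escalates to the frontier tier."""
    return ROUTES.get(task_type, Tier.FRONTIER)

print(route("rss_fetch").value)
print(route("unknown_task").value)
```

The design choice worth noting is the default: when in doubt, escalate to the expensive tier rather than quietly degrade quality. The architecture decides, and it fails toward accuracy.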
Why Cannabis Makes This Obvious
I use cannabis as my proof vertical for operator AI. In a federally conflicted, heavily regulated industry where seed-to-sale tracking failures have real legal consequences, you learn very quickly that not all AI outputs are equal.
A cannabis compliance engine making recommendations about inventory reconciliation or METRC reporting needs to be right. Not 87% right. Right. A missed transfer window or an inaccurate batch record doesn't produce a mildly annoying bug report — it produces a state audit. In that context, I want the most capable model, trained with the deepest understanding of regulatory logic, with explicit chain-of-thought reasoning I can inspect.
But the same cannabis operator's social media calendar does not need that model. The sensor alert message that says "room 4 VPD is out of range" does not need that model. The weekly cultivation report template that fills in from sensor data does not need that model.
The discipline of routing is knowing the difference. Compliance is accuracy-critical. Scheduling is cost-critical. Route accordingly.
The Infrastructure Cost Nobody Talks About
Here's the number people forget: latency compounds.
When every task in your system goes to a frontier model, you're not just paying more per call. You're also waiting longer per call. A frontier model inference at full context can take 15-30 seconds. Run that through 16 scheduled tasks firing at various intervals and you've got an agent system that grinds when it should be snappy.
The RSS fetcher should be near-instant. The policy check should be near-instant. The brand safety evaluation on an inbound mention should complete before a human would have read the mention twice. If these are frontier model calls, they're not. They're 15-second roundtrips that stack.
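Back-of-envelope arithmetic makes the stacking visible. The 15-second frontier roundtrip comes from the text; the daily call mix and per-tier latencies below are assumed for illustration:

```python
# Illustrative latency stack. Frontier roundtrip (15 s) is from the text;
# the call mix and the other per-tier latencies are assumptions.
FRONTIER_S = 15.0
SMALL_S = 1.5
SCRIPT_S = 0.2

daily_calls = {"script": 48, "small": 200, "frontier": 12}

all_frontier = sum(daily_calls.values()) * FRONTIER_S
routed = (daily_calls["script"] * SCRIPT_S
          + daily_calls["small"] * SMALL_S
          + daily_calls["frontier"] * FRONTIER_S)

print(f"everything on the frontier model: {all_frontier / 60:.0f} min/day of waiting")
print(f"routed by tier: {routed / 60:.0f} min/day of waiting")
```

Under these assumed numbers, routing cuts total daily wait time by roughly an order of magnitude, and the interactive tasks are the ones that get fast.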
This matters at scale but it matters even more at small scale. A 7-person company can't afford an AI system that feels slow. The moment agents feel like they're lagging, the operator loses confidence. The operator starts checking manually. The agents become overhead instead of infrastructure.
Speed is a product decision. Routing is how you deliver it.
What Ivan Was Actually Describing
I think Ivan's point wasn't that agent companies are profligate. It's that agent infrastructure has genuinely different cost curves than SaaS infrastructure, and most CFOs aren't ready for it. He's absolutely right about that.
But there's a version of agent infrastructure that's been thought through at the routing layer — and it doesn't have to look like a $50,000 monthly credit card bill. The operators who figure this out in the next 18 months are going to have a structural cost advantage over the ones who don't.
The model is the easy part. Anyone can call an API.
The routing layer is where real infrastructure lives. And most people building with AI have never had to think about it — because they've been building demos, not production systems that run while they sleep.
When your agents are running 16 scheduled tasks and your bill is under $500 a month, routing isn't an optimization. It's the foundation.
The question isn't which model to use. It's which model to use for this.