If You Cook With Garbage, Don't Be Surprised When Dinner Tastes Like Garbage

Everyone in my industry says they do AI now. The pitch decks are beautiful. The demos are impressive. The dashboards have that particular shade of gradient blue that signals "we are very smart and you should give us money."

Then you look at what they are actually feeding the model.

Uncleaned sensor data nobody has validated. Inventory numbers that do not reconcile because three different systems are the source of truth and none of them agree. Compliance records that were entered by hand at 2am by someone who was also watching the dry room. Financial data that has never been audited against actual bank deposits.

They take this data — this beautiful mess of approximations, duplications, and outright errors — and they pour it into a large language model. The model does what models do. It finds patterns. It generates confident answers. It produces charts and summaries that look authoritative. And nobody in the room asks the question that matters: how good was the data that went in?

I have spent the last three years at Addium building a platform that lives and dies on data accuracy. Our substrate sensors measure VWC and porewater EC (ecPW) at the root zone — those two measurements are the atomic unit of every irrigation decision a grower makes. Everything about how and when to irrigate derives from them. Our climate stations track CO2, humidity, VPD, and temperature in real time.

We designed sensors that do not drift and do not require calibration. We validated that the VWC and ecPW curves do not change over time — that the numbers stay accurate and precise across the entire volumetric curve. The data is trustworthy at the hardware level before it ever reaches software. That is not an accident. That is years of engineering to make sure the foundation is solid before you build anything on top of it.

When we decided to bring AI into this, we did not start with the model. We started with the data.

The question was never "can we get a model to generate recommendations?" Any model can generate recommendations. The question was "can we trust the data enough that the recommendations are worth following?" Those are two completely different problems, and most companies skip the second one because it is boring and hard and does not demo well.

The data pipeline is the product

Your model is only as good as your worst data source. If your sensor data has never been validated against known standards, every recommendation downstream is built on a guess. If your inventory system double-counts transfers, your yield predictions are fiction. If your compliance data has gaps, your risk analysis is creative writing.

We built a revenue certification system that will not display numbers to executives unless every data point traces back to a verified source and reconciles to the penny. The dashboard literally blocks you from seeing uncertified data. That is not a feature. That is a philosophy. The system would rather show you nothing than show you something wrong.
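The gate described above can be sketched in a few lines. This is a minimal illustration, not the real system: the field names, the verified-source flag, and the to-the-penny comparison are all assumptions standing in for whatever the actual certification pipeline checks.

```python
# Hypothetical "certify or show nothing" gate. Field names and the
# reconciliation rule are illustrative assumptions, not Addium's schema.
from decimal import Decimal

def certified_total(ledger_rows, bank_deposits):
    """Return revenue only if every row traces to a verified source AND
    the ledger reconciles with bank deposits to the penny; else None."""
    if not all(row["source_verified"] for row in ledger_rows):
        return None  # an unverified source poisons the whole number
    ledger_total = sum(Decimal(row["amount"]) for row in ledger_rows)
    deposit_total = sum(Decimal(d) for d in bank_deposits)
    if ledger_total != deposit_total:
        return None  # show nothing rather than something wrong
    return ledger_total

rows = [{"amount": "1200.00", "source_verified": True},
        {"amount": "800.50", "source_verified": True}]
print(certified_total(rows, ["2000.50"]))  # 2000.50
print(certified_total(rows, ["2000.49"]))  # None -- off by a penny
```

Note the use of Decimal rather than float: "reconciles to the penny" is only meaningful if the arithmetic itself cannot drift.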

RAG and MCP are delivery mechanisms, not magic

When we surface intelligence through the platform — whether that is through a RAG pipeline pulling from our data warehouse or an MCP server that lets any model query our APIs directly — the value is not in the retrieval mechanism. The value is that the data on the other end has been cleaned, validated, cross-referenced, and tested before the model ever sees it.

A competitor can spin up a RAG pipeline in a weekend. They can connect their database to an MCP server and let Claude or GPT query it directly. But if the database is full of garbage, all they have built is a very fast garbage delivery system.
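The difference is a filter that sits in front of the transport. A hedged sketch, assuming a hypothetical `validated_at` field on each record — the point is that the gate lives upstream of whatever RAG or MCP plumbing you bolt on:

```python
# Illustrative only: a retrieval layer that refuses to serve records
# that never passed validation. "validated_at" is an assumed field.
def retrieve(query_fn, corpus):
    """Run a query, but only over records that cleared validation."""
    validated = [doc for doc in corpus if doc.get("validated_at")]
    return query_fn(validated)

corpus = [
    {"id": 1, "text": "EC drifted in room 4", "validated_at": None},
    {"id": 2, "text": "Room 4 EC within spec", "validated_at": "2024-05-01"},
]
hits = retrieve(lambda docs: [d["id"] for d in docs], corpus)
print(hits)  # [2] -- the unvalidated record never reaches the model
```

Swap the lambda for a vector search or an MCP tool handler; the shape is the same. The retrieval mechanism is interchangeable, the filter is not.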

Fine-tuned models amplify your data quality problem

A fine-tuned model trained on bad data does not just repeat the errors. It learns them. It internalizes the patterns of incorrectness and presents them with even more confidence than a general model would. You have taken garbage and promoted it to doctrine.

When we fine-tune models for specific cultivation datasets, the data preparation takes longer than the training. By a factor of ten. Because a model trained on clean data from 200 facilities will outperform a model trained on noisy data from 2,000 facilities every single time.
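What that preparation looks like in miniature: deduplicate across source systems and drop physically impossible readings before a single example reaches the training set. The field names and bounds below are assumptions for illustration, not the actual pipeline.

```python
# Illustrative pre-training filter: drop duplicates and out-of-range
# readings before writing fine-tuning examples. Fields are assumed.
def prepare(examples):
    """Keep one example per (facility, timestamp), reject impossible VWC."""
    seen, clean = set(), []
    for ex in examples:
        key = (ex["facility_id"], ex["timestamp"])
        if key in seen:
            continue  # same event reported by two source systems
        if not (0 <= ex["vwc_pct"] <= 100):
            continue  # volumetric water content cannot exceed 100%
        seen.add(key)
        clean.append(ex)
    return clean

raw = [
    {"facility_id": "A", "timestamp": 1, "vwc_pct": 41.0},
    {"facility_id": "A", "timestamp": 1, "vwc_pct": 41.0},  # duplicate
    {"facility_id": "B", "timestamp": 1, "vwc_pct": 300.0}, # impossible
]
print(len(prepare(raw)))  # 1
```

Every example the filter rejects is an error the model would otherwise have learned as doctrine.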

The question operators should be asking

When a vendor shows you their AI features, do not ask "what model are you using?" Do not ask about parameters or context windows or benchmark scores. Ask this:

"How are you validating the data before the model sees it?"

If they cannot answer that question in specific, boring, operational detail — if they wave their hands and say "we have a data pipeline" or "we clean the data" without being able to tell you exactly how — then what they have is a demo, not a product.

The model is the easy part. Any team with an API key can build a chatbot that sounds smart. The hard part is earning the right to trust the answers. And you earn that right in the data layer, not the model layer.

I have watched too many operators get excited about an AI feature, deploy it, and then quietly stop using it three months later because the answers were not reliable. They blame the model. They should blame the data.

If you cook with garbage, dinner is going to taste like garbage. No amount of seasoning fixes rotten ingredients.