The Receipts: How I Built a Certified Revenue System in 16 Hours
The build story behind the certified revenue reporting system. Four AI sessions, five context windows, and every architectural decision that turned a Thursday night crisis into a production platform.
Yesterday I told you why I built a revenue system that certifies to the penny. The embarrassing ELT meeting. The phone call from my boss. The decision to fix it myself, that night, no committee required.
Today is the how. The tech stack, the session management, the architectural decisions, and the moments where the AI did something I did not expect. I said I keep receipts. Here they are.
The Problem Was Not Visualization. The Problem Was Trust.
Before I wrote a single prompt, I had to diagnose correctly. We had a BI tool. We had data. We had dashboards. What we did not have was a pipeline that could tell you whether the number on screen was actually right.
Our existing setup was a classic SaaS reporting stack. Salesforce as the system of record. A BI tool pulling data through an extraction layer that multiple people were modifying daily. The extraction logic was fragile. The transformation logic was implicit. And the certification logic did not exist, because nobody had thought to build one.
The root cause was not bad data. The root cause was that nobody had separated the concerns. Extraction, transformation, certification, and presentation were all tangled together in a single layer that was doing too many things and guaranteeing none of them.
I needed to decompose the problem before I could solve it.
The Stack Decision
I made the technology choices before I started prompting. This matters. If you let the AI pick your stack, you will get whatever it was trained on most recently. I wanted specific things for specific reasons.
Python FastAPI for the API layer. Lightweight, async-capable, and the typing system lets you enforce contracts at the schema level. When your API serves certified revenue data, the contract between the API and the consumer is not optional. It is the product.
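The contract idea can be sketched with a stdlib dataclass (the real system would express it as a FastAPI/Pydantic response model, and the names here are illustrative, not the system's actual code):

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical response contract for a certified-revenue endpoint.
# The shape of the payload is enforced in code, not documented in a wiki:
# a row that is not certified cannot even be constructed.
@dataclass(frozen=True)
class CertifiedRevenue:
    month: str            # e.g. "2024-03"
    business_unit: str
    mrr: Decimal          # fixed-precision, never float
    certified: bool       # only certified rows are ever served

    def __post_init__(self):
        if not self.certified:
            raise ValueError("refusing to construct an uncertified revenue row")

row = CertifiedRevenue(month="2024-03", business_unit="EMEA",
                       mrr=Decimal("125431.07"), certified=True)
```

The point is that the schema is the gate: a consumer of the API cannot receive a payload the type system has not already validated.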
PostgreSQL for the data layer. Not a data warehouse. Not a lakehouse. Postgres. Because I needed ACID transactions, I needed deterministic query behavior, and I needed something that would run identically in development and production without a six-figure annual bill. Cloud SQL gave me managed Postgres with zero ops overhead.
Next.js and ECharts for the dashboard. Server-side rendering for the initial load, client-side interactivity for drilldowns. ECharts because it handles the waterfall charts and time series that revenue reporting demands without fighting you on every axis label.
Cloud Run for hosting. Serverless containers. Deploy a Docker image, get an endpoint. No Kubernetes. No cluster management. No infrastructure team required. For a system built and operated by one person, the deployment model has to be as simple as the code.
The entire stack runs on managed services with no standing infrastructure. When nothing is running, nothing costs money. When the certification pipeline fires, it spins up, does its work, and shuts down.
Session Architecture: Why Context Windows Matter
Here is something nobody talks about when they describe building with AI: context window management is an engineering discipline, not an afterthought.
Revenue accounting has nuance that compresses badly. The difference between a contraction and a churn event. How chain renewals work when a controller has approved a specific outcome methodology. What happens when a subscription spans a month boundary and you need daily proration. These are not details you can summarize. They are load-bearing walls. Compress them and the whole structure fails.
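To make the month-boundary case concrete, here is a minimal proration sketch. It assumes calendar-day proration; the actual controller-approved methodology may differ (30-day months, anniversary billing), so treat this as an illustration of the shape of the problem, not the system's rule:

```python
from datetime import date
from decimal import Decimal
import calendar

def prorated_mrr(monthly_price: Decimal, start: date, end: date,
                 month: date) -> Decimal:
    """Recognize only the days of `month` the subscription was active.

    Illustrative calendar-day proration; `month` is any date in the
    target month.
    """
    days_in_month = calendar.monthrange(month.year, month.month)[1]
    first = date(month.year, month.month, 1)
    last = date(month.year, month.month, days_in_month)
    active_from = max(start, first)
    active_to = min(end, last)
    if active_to < active_from:
        return Decimal("0.00")
    active_days = (active_to - active_from).days + 1
    return (monthly_price * active_days / days_in_month).quantize(Decimal("0.01"))

# A $100/mo subscription starting March 16 recognizes 16 of 31 days in March.
march = prorated_mrr(Decimal("100"), date(2024, 3, 16), date(2024, 12, 31),
                     date(2024, 3, 1))
```

Even this toy version shows why the detail is load-bearing: drop the `+ 1` or the month-boundary clamp and every mid-month upgrade is wrong by a day.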
So I did not try to build the entire system in one session. I architected the sessions the same way I would architect the code: separated concerns, clear interfaces between them, and explicit handoffs.
Session one: Schema design and extraction logic. The goal was to get Salesforce CPQ data into a clean staging layer. Raw data in, structured tables out. No transformation. No calculation. Just faithful extraction with an audit trail showing exactly what was pulled and when.
This session established the most important architectural pattern in the entire system: schema layering. Four layers, each with a specific job.
- Staging holds the raw extracted data. Untouched. Auditable. If someone asks "what did Salesforce actually say," staging answers that question.
- Mart holds the transformed data. This is where revenue recognition logic lives. Daily proration, movement classification, cohort assignment. Every calculation is deterministic and reproducible.
- Viz holds the API-optimized views. Pre-joined, pre-aggregated, shaped for dashboard consumption. The API never touches staging or mart directly.
- Analytics holds ad-hoc analysis views. Separate from the production path so exploratory queries never interfere with certified reporting.
This is not a novel pattern. It is a standard data engineering practice. But it is remarkable how many revenue reporting systems skip it entirely and try to do extraction, transformation, and presentation in the same query. That is how you end up with numbers that cannot be audited.
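The layering can be sketched end to end with SQLite standing in for Postgres (Postgres would use real schemas: `staging.*`, `mart.*`, `viz.*`; the table names and columns here are illustrative, not the system's actual DDL):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
  -- staging: raw extraction, untouched, with an audit column
  CREATE TABLE staging_subscriptions (
      sf_id TEXT, mrr_raw TEXT, status TEXT, extracted_at TEXT);

  -- mart: deterministic transformation output
  CREATE TABLE mart_mrr (
      sf_id TEXT, month TEXT, mrr_cents INTEGER, movement TEXT);

  -- viz: pre-aggregated, certified data only; the API reads here and
  -- never touches staging or mart directly
  CREATE VIEW viz_mrr_by_month AS
      SELECT month, SUM(mrr_cents) AS mrr_cents
      FROM mart_mrr GROUP BY month;
""")
conn.execute(
    "INSERT INTO staging_subscriptions VALUES (?, ?, ?, ?)",
    ("SF001", "99.00", "active", datetime.now(timezone.utc).isoformat()))
conn.execute("INSERT INTO mart_mrr VALUES ('SF001', '2024-03', 9900, 'new')")
total = conn.execute("SELECT mrr_cents FROM viz_mrr_by_month").fetchone()[0]
```

Each layer answers exactly one question, which is what makes the pipeline auditable: staging answers "what did the source say," mart answers "what did we compute," viz answers "what may be shown."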
Session two: Revenue recognition and movement classification. This was the hardest session. Revenue recognition for subscription businesses has edge cases that will eat you alive if you do not handle them explicitly.
What happens when a customer upgrades mid-month? That is expansion revenue, prorated from the upgrade date. What happens when a contract ends and a new one starts the same day at a lower price? That is a contraction, not a churn-and-new. What happens when a renewal is processed by a different team and the contract ID changes but the subscription is continuous? You need a chain resolution algorithm that traces the lineage.
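A stripped-down decision table for those cases might look like this. The rules are illustrative only; the real, controller-approved table has more cases (winbacks, chain renewals where the contract ID changes), and the `continuous` flag stands in for the output of the chain resolution algorithm:

```python
from decimal import Decimal

def classify_movement(prev_mrr: Decimal, new_mrr: Decimal,
                      continuous: bool) -> str:
    """Classify a subscription transition between two periods."""
    if prev_mrr == 0 and new_mrr > 0:
        return "new"
    if new_mrr == 0:
        return "churn"
    if not continuous:
        # A replacement contract whose lineage has not been resolved yet:
        # not churn-and-new, and not safe to guess.
        return "unclassified"
    if new_mrr > prev_mrr:
        return "expansion"
    if new_mrr < prev_mrr:
        return "contraction"
    return "flat"

# Contract ends, new one starts the same day at a lower price, lineage traced:
# a contraction, not churn-and-new.
move = classify_movement(Decimal("100"), Decimal("80"), continuous=True)
```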
I spent significant time in this session establishing the business rules with the AI, making sure it understood that these were not suggestions. They were controller-approved methodologies that had to be implemented exactly. The AI pushed back on a few edge cases, which was useful. It identified two scenarios where my initial logic would have produced incorrect movement classifications. We resolved them in the session, documented the resolution, and moved on.
This is the session where the "Failing Loudly" philosophy was born. The AI suggested that rather than handling edge cases with fallback logic, we should make the system refuse to produce a number when it encountered something it could not classify with certainty. I had been thinking about certification as a post-processing step. The AI reframed it as a design constraint that should permeate the entire pipeline. If the movement identity does not balance, the pipeline fails. If a subscription cannot be classified, the pipeline fails. If the calculated total does not match the source total, the pipeline fails.
Failing Loudly is not about catching errors. It is about making errors impossible to ignore.
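In code, the difference between the two philosophies is one line. A hypothetical sketch (names are mine, not the system's):

```python
from typing import Optional

class CertificationError(Exception):
    """Raised whenever the pipeline cannot vouch for a number."""

def classify_or_fail(row_id: str, movement: Optional[str]) -> str:
    # The tempting version is `return movement or "other"` -- a silent
    # fallback that lets an unclassified row flow into a certified total.
    # Failing Loudly means the run halts instead.
    if movement is None:
        raise CertificationError(f"row {row_id}: movement cannot be classified")
    return movement
```

The fallback version produces a number every time and is wrong some of the time. This version produces a number only when it can defend it.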
Session three: Certification logic and the dashboard. By this point I had extraction and transformation working. Session three was about building the gate.
The certification check is straightforward in concept: compare the calculated revenue against an independent derivation from the source data. If they match to the penny, certify. If they do not, block.
The implementation required careful thinking about what "match" means in practice. Floating point arithmetic in SQL can introduce rounding errors that are technically discrepancies but not real disagreements. We used fixed-precision decimal types everywhere. No floats. No rounding. The revenue number is either exactly right or it is wrong, and there is no gray area to hide in.
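Python's `decimal` module makes the float problem easy to demonstrate, which is the same reason the pipeline bans floats in SQL (the function name here is illustrative):

```python
from decimal import Decimal

def penny_certify(calculated: Decimal, independent: Decimal) -> bool:
    """Exact comparison of two fixed-precision totals. With Decimal there
    is no tolerance band: one cent off is a failed certification."""
    return calculated == independent

# Why floats are banned: binary floating point cannot represent 0.1 or 0.2
# exactly, so their sum drifts. Decimal does not.
float_drifts = (0.1 + 0.2 != 0.3)
decimal_holds = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

With exact decimal arithmetic end to end, "match" means `==`, and there is no epsilon parameter for a discrepancy to hide behind.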
The dashboard was built in the same session. This was deliberate. I wanted the certification gate and the presentation layer to be designed together, because the whole point of certification is that it controls what you see. The dashboard does not query the mart layer. It queries the viz layer, which only contains certified data. If certification has not passed for a given month, that month simply does not appear. No asterisks. No warnings. No stale numbers with a timestamp. Absence is the signal.
Session four: Deployment, automation, and the QA run that caught fifty thousand dollars. The final session was about putting the system into production and running the first real certification against live data.
The deployment pattern is simple. Cloud Build watches the repository. Push to main, it builds a container image, deploys to Cloud Run, and optionally runs the migration job. The certification pipeline runs on a schedule, twice daily. By the time anyone opens the dashboard in the morning, the number has already survived or failed.
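A minimal `cloudbuild.yaml` for that pattern looks roughly like this. The service name, image path, and region are placeholders, and the optional migration step is omitted:

```yaml
steps:
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "gcr.io/$PROJECT_ID/revenue-api:$COMMIT_SHA", "."]
  - name: gcr.io/cloud-builders/docker
    args: ["push", "gcr.io/$PROJECT_ID/revenue-api:$COMMIT_SHA"]
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args: ["run", "deploy", "revenue-api",
           "--image", "gcr.io/$PROJECT_ID/revenue-api:$COMMIT_SHA",
           "--region", "us-central1"]
```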
The QA run against live data was the moment that justified the entire build. The first certification attempt surfaced a significant discrepancy. Not a rounding error. A real discrepancy caused by a specific data quality issue in the source system that had been quietly producing incorrect revenue numbers in every report that had come before.
We traced it. We fixed it. We re-ran certification. It passed.
That discrepancy had been there for months. Every dashboard, every board report, every ELT meeting that used the old reporting pipeline had included that error. Nobody knew. That is the cost of systems that do not certify.
What the AI Did That I Did Not Expect
Three things happened during the build that I had not planned for.
First, the certification gate concept. I described above how the AI suggested formalizing certification as a design constraint rather than a post-processing check. That suggestion changed the architecture of the system. It was not a feature request. It was a reframe that made the whole system fundamentally more trustworthy.
Second, the shadow view pattern. When we needed to add new analytical capabilities without risking the certified production path, the AI proposed shadow views: parallel SQL views that use the same source data but implement alternative calculations. They can be tested against the production views for consistency without ever touching the certified pipeline. This is how we added new metrics without introducing regression risk.
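The shadow pattern can be shown in miniature with SQLite (the schema is illustrative; Postgres would host both views side by side in the viz or analytics schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
  CREATE TABLE mart_mrr (month TEXT, mrr_cents INTEGER);
  INSERT INTO mart_mrr VALUES ('2024-03', 9900), ('2024-03', 5100);

  -- production view: the certified path
  CREATE VIEW viz_mrr AS
      SELECT month, SUM(mrr_cents) AS total FROM mart_mrr GROUP BY month;

  -- shadow view: same source data, alternative formulation
  CREATE VIEW viz_mrr_shadow AS
      SELECT month, COUNT(*) * AVG(mrr_cents) AS total
      FROM mart_mrr GROUP BY month;
""")
prod = conn.execute("SELECT total FROM viz_mrr").fetchone()[0]
shadow = conn.execute("SELECT total FROM viz_mrr_shadow").fetchone()[0]
# The consistency check runs outside the certified path; only after the
# shadow agrees with production does it get promoted.
assert prod == shadow
```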
Third, the movement identity invariant. The AI insisted that for any given month and business unit, the sum of all revenue movements (new, expansion, contraction, churn, winback) must equal the delta between starting and ending MRR. Not approximately. Exactly. And that this identity should be checked on every pipeline run, with the pipeline failing if it does not hold. This is a hard mathematical invariant that makes entire categories of bugs impossible. If you add a new movement type and forget to account for it, the identity breaks and the pipeline tells you immediately.
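The invariant is small enough to state in a few lines. This is a hypothetical helper, with contraction and churn carried as signed negatives so the identity reads as a plain sum:

```python
from decimal import Decimal

def check_movement_identity(movements: dict, start_mrr: Decimal,
                            end_mrr: Decimal) -> None:
    """new + expansion + contraction + churn + winback must equal
    end_mrr - start_mrr, exactly. Fail the run otherwise."""
    delta = sum(movements.values(), Decimal("0"))
    if delta != end_mrr - start_mrr:
        raise RuntimeError(
            f"movement identity broken: {delta} != {end_mrr - start_mrr}")

check_movement_identity(
    {"new": Decimal("1000"), "expansion": Decimal("250"),
     "contraction": Decimal("-100"), "churn": Decimal("-400"),
     "winback": Decimal("50")},
    start_mrr=Decimal("10000"), end_mrr=Decimal("10800"))
```

Add a sixth movement type and forget to include it in the dictionary, and the sum no longer equals the MRR delta: the check raises on the very next run instead of letting the error age quietly in a dashboard.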
These were not hallucinations or generic suggestions. They were context-specific architectural decisions that came from the intersection of the AI's knowledge of revenue accounting and the very specific constraints I had established in the earlier sessions. The context windows mattered. Without the careful session management that preserved the nuance of the business rules, the AI would not have had the foundation to make these suggestions.
The Overnight Build in Context
Sixteen hours. Four sessions. A production revenue intelligence platform that extracts from Salesforce, computes certified MRR through daily-prorated revenue recognition, and serves interactive dashboards that only display numbers after certification passes.
Every AI model I asked estimated twelve months for a traditional implementation. Scoping, vendor selection, build-out, testing, training. A year before you see a certified number.
The difference is not that AI writes code faster than humans. It does, but that is not the point. The point is that the person directing the AI knew exactly what the system needed to do and why. The architectural decisions, the business rules, the edge cases, the certification philosophy. That knowledge came from thirty years in business and twenty in revenue operations. The AI was the accelerant. The expertise was the fuel.
If I had handed this to a junior developer with the same AI tools, they would have built a dashboard. They would not have built a certification gate, because they would not have known to want one. They would not have insisted on movement identity invariants, because they would not have understood why revenue movements must balance. They would not have separated staging from mart from viz, because they would not have lived through the consequences of not doing so.
The 16 hours was possible because I had spent years learning what the system needed to be. The AI compressed the build. It did not compress the thinking.
What Happened After
Finance audited the system. They traced every calculation back to the source data. They confirmed what I already knew: the numbers are right. They adopted it.
The CEO walks into board meetings with a number that has already survived a certification gauntlet before it ever appeared on screen. No hedging. No footnotes. No "let me get back to you on that."
The system now has its own monitoring agent that watches data quality, catches extraction anomalies before they reach the pipeline, and learns from every issue it encounters. It compounds its own operational knowledge over time.
This is what I build. Not dashboards. Not reports. Systems where the number earns the right to exist before anyone sees it.
Yesterday was the why. Today was the how. The receipts are on the table.