What Two Decades of Shipping Taught Me About Trusting AI-Generated Code
My first paid software work was an ecommerce site for a pharmacy in my hometown of El Reno, Oklahoma, back in 2003. In the years that followed I built ecommerce apps like that one for a string of small companies, and quietly kept a lot of them running for years, usually alongside a 9-to-5 engineering job. I left El Reno after junior college and spent most of the years since elsewhere in Oklahoma, with some time in China and a while in Ohio mixed in. Now I’m working my way back home, and back to contracting full-time, which is honestly the work I care about most. Almost nothing I built in 2003 survives, which is normal; almost nothing from 2003 does. But one thing has held up the whole time: you don’t really know what you’ve built until it’s running in front of real people.
A demo shows the software working when everything lines up. Production is the rest of the time, which is most of the time, and it’s where you find out what you actually built. Whatever it gets wrong eventually surfaces in front of someone real, usually at a bad moment, and “it worked on my machine” has never once helped.
The maintainer’s view
Most of what I know about software I learned by being responsible for it after it shipped.
At HomeTown Ticketing I designed and built four production native ticketing apps nearly from scratch: iOS and Android, the gate-side apps that scan you into the stadium and the customer-facing ones in people’s pockets, with hardware ticket printers and QR codes in the mix. At Quantum Health I spent three and a half years as the lead mobile developer (and the mobile architect for part of that), and for the whole time the 2.0 platform was being built, I was the only person keeping the production 1.0 iOS and Android apps alive for the people who depended on them every day.
Years of that gives you a particular way of reading code. You stop asking whether it looks right and start asking what happens when it’s wrong. Looking right is the easy part. Just about any working programmer can produce code that looks right, the good ones and the bad ones both. The differences tend to show up later: under load, at the edges, six months in, where two systems meet. That’s the part a demo never tests.
Then the machines started writing code
I build with AI every day now, and I want to say that without hedging into mush: the tools are good, better than most of our industry expected a few years ago. A solo engineer in 2026 can responsibly carry an amount of work that would have needed a small team in 2016. I founded a company on that premise.
But look at how it fails. AI-generated code is fast and it’s plausible. Most of the time it compiles, demos fine, and reads as correct, and after two decades of maintaining production systems that is exactly what puts me most on edge. Plausible and correct aren’t the same, and nothing on the surface of the code tells you which one you’re holding.
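Here is a small sketch of that gap, in Python. The function names and the price-parsing scenario are invented for illustration; they are not from any project mentioned here. Both functions turn a price string into integer cents, both read as correct, and both pass the demo input. Only one survives an ordinary price like 4.35.

```python
from decimal import Decimal


def to_cents_plausible(price: str) -> int:
    """Hypothetical 'looks right' version: passes the obvious demo inputs."""
    return int(float(price) * 100)


def to_cents_checked(price: str) -> int:
    """Same job with exact decimal arithmetic; rejects sub-cent input."""
    cents = Decimal(price) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"price has more than two decimal places: {price!r}")
    return int(cents)


# The demo input agrees in both versions.
assert to_cents_plausible("10.00") == to_cents_checked("10.00") == 1000

# A real-world price does not: binary floats cannot represent 4.35
# exactly, so the product lands just under 435 and int() truncates it.
assert to_cents_plausible("4.35") == 434
assert to_cents_checked("4.35") == 435
```

Nothing about the first function looks suspicious in review. The difference only appears once a check feeds it the input the demo never used.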
This isn’t an argument against AI. Plenty of human code has the same problem; every maintainer has inherited a confident, fluent codebase that fell over the first time real data hit it. AI didn’t invent that. It just made a lot more of it, a lot faster. There’s more code than ever, it costs almost nothing to generate, and the gap between something that demos and something that holds up is easier than ever to hide.
It also raised the bar for what counts as a real product. A few years ago, just getting a working demo in front of users was the hard part, and that alone could be a head start. Now anyone can describe an app and have a working-looking version live by dinner. When the demo is nearly free and everyone has one, it stops being the thing that sets you apart. Software got cheaper to produce, so the easy, generated layer is worth less than it used to be, and the part that was always hard is worth more: an app that’s actually thought through and that holds up when real people and real data hit it. The first version has to be better than we used to accept, precisely because everyone can now ship the shallow one.
Trust is a process property
It sounds almost rude said out loud, but you never really trusted human developers either. Not directly. You trusted what was around them: the code review, the tests, the types, the small diffs, and the staging environment. A junior engineer’s code gets merged because it survived the checks, not because they seem sharp. Trust at a working software shop was never a feeling about who wrote the thing. It was about whether the change held up under the checks.
Once you see it that way, the AI question stops being philosophical. AI-generated code doesn’t earn a lower bar because it was fast to produce, and it doesn’t deserve a boycott because a human didn’t type it. It gets the same bar as everything else, and given how much of it there is, that bar has to be held up by things that scale better than me paying attention. In my own work that means small changes I can actually review, tests on the parts that carry weight before I trust them, types I lean on instead of leaving for decoration, and a standing rule that anything the model hands me is a draft until it’s checked.
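As a rough sketch of types and tests doing that work, here is a toy gate scanner in Python. Everything in it (`Gate`, `ScanResult`, the in-memory ID sets) is hypothetical and far simpler than a real scanning app. The point is only the shape: the result is an enum rather than a bare boolean, so "already used" cannot be confused with "unknown ticket", and the test goes straight at the case that carries weight, the second scan of the same ticket.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ScanResult(Enum):
    ADMITTED = "admitted"
    ALREADY_USED = "already_used"
    UNKNOWN_TICKET = "unknown_ticket"


@dataclass
class Gate:
    """Hypothetical in-memory gate; a real one would sync with a server."""

    valid_ids: frozenset[str]
    used_ids: set[str] = field(default_factory=set)

    def scan(self, ticket_id: str) -> ScanResult:
        if ticket_id not in self.valid_ids:
            return ScanResult.UNKNOWN_TICKET
        if ticket_id in self.used_ids:
            return ScanResult.ALREADY_USED
        # Record the ticket so a second scan is refused.
        self.used_ids.add(ticket_id)
        return ScanResult.ADMITTED


def test_gate() -> None:
    gate = Gate(valid_ids=frozenset({"A1", "B2"}))
    assert gate.scan("A1") is ScanResult.ADMITTED
    # The edge a demo rarely exercises: the same ticket, scanned twice.
    assert gate.scan("A1") is ScanResult.ALREADY_USED
    assert gate.scan("ZZ") is ScanResult.UNKNOWN_TICKET
    assert gate.scan("B2") is ScanResult.ADMITTED


test_gate()
```

Whether a person or a model wrote `scan`, it is a draft until `test_gate` passes, and the change is small enough to review in one sitting.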
Where this ended up for me
I take that rule seriously enough that I eventually built a programming language around it. Arcana is built so that AI-generated code gets checked for correctness before it ever runs, which moves a whole class of “the AI got it wrong” problems to compile time instead of to your customers. It’s an open, public language spec you can read for yourself, and the story of how I backed into building it (I set out to build websites for small businesses, not a language) is on arcanalang.org in the origin essay.
I won’t oversell it. A language can’t make this stuff safe on its own, and you still have to use your head. What it does is move the checking earlier and make it automatic, so the question I’ve been asking for twenty years, what happens when this is wrong, gets asked before the code runs instead of after.
The standard didn’t move
So, do I trust AI-generated code? That was never the right question. I didn’t trust my own code in 2003 either, which is why I learned to test it. It comes back to the process, not who typed it. That was true at the pharmacy website, and the new tools haven’t changed it: ship things that work, that someone else can maintain, and that actually belong to the people who paid for them.