“A model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software”
A lot of code is being written by AI, but what does it mean?
The title here, a paraphrased quote from me, pretty much says it all. It appeared in TNW today, in a piece evaluating a claim by the “OpenAI president [who] says AI is now writing 80% of the company’s code”.
Great to see a nuanced point reported correctly in the media, by Ana Maria Constantin, and great also to see OpenAI’s President Greg Brockman sorta kinda acknowledging the point I was making, in a rare note of realism from OpenAI.
After all, it’s only with realism that we can hope to make progress.
Realism re AI coding is knowing that next-word prediction gets us a surprisingly long way in writing code, but less far in making sure that code is robust. Coders (especially vibe coders with little experience) beware.
And all you OpenClaw devotees, that goes 10x, if not 100x, for you.


I'm the chief software architect of a non-profit, open source, decentralized messaging application, and I recently incorporated AI coding into my workflow. I've been a fairly big AI skeptic, but after hearing a lot of reports about how good AI coding has become over the past few months, I decided to give it a chance to see whether it could change my workflow. At first there was a learning curve: I absolutely did not like the AI's first attempts at writing code. It had dumb defaults (for example, a built-in tendency not to refactor cleanly and to be far too willing to duplicate code rather than reuse it). That results in the sort of bad tech debt you are describing.
But as I continued the experiment, I managed to temper those behaviours, and ended up in a situation where I could use AI coding to significantly increase my output at a level of code quality that I am happy to sign off on and maintain going forward. In other words, I manage to massage it into code that I would have been happy with had I written it myself.
Now don't get me wrong: AI still isn't close to producing the right code on the first try, or the second, but interfaces like Claude Code let me tell it why what it's doing is wrong as it proposes changes. I frequently end up having to correct things ("No, didn't I see similar code earlier that should be refactored to reuse this?" or "No, that is not an efficient way to approach this") -- often the same sorts of corrections I would give a junior dev (back when junior devs were a thing in this industry). But *with* continual corrections and guidance, and without blindly accepting proposed implementations and changes, I've found that the AI-augmented approach substantially increases my output of *high quality* code.
With a micro focus on my particular needs, this is great: I can double or triple my output -- even with all the time spent giving feedback on nearly every proposed piece of code -- without sacrificing code quality or maintainability. That's a real gain for me, personally.
But backing out to the macro focus, it all becomes more than a little worrisome. Yes, *I* am prompting the AI and refining its output to massage it into good code, but I'm a developer with many years of experience who can tell good code from bad a mile away. When I think of inexperienced devs applying AI, and of what happens after senior devs like me retire (or start forgetting everything because we've become so dependent on AI), I wonder who will be left: AI jockeys without much ability to distinguish maintainable code from unmaintainable but-at-least-this-one-feature-I-am-adding-works spaghetti. That's a recipe for disaster because of just how much risk it creates. Would you want to log in to a bank, for example, whose claim to fame is lower fees, achieved by cutting costs: one junior dev driving a vibe-coding AI at a breakneck pace to implement the entire banking stack, complete with a really great web interface? Will that be the only option for a company looking for software developers in 10, 20, or 30 years' time?
I don't have a solution to that problem, and it's a terrifying one. The hope of some seems to be that AI coding models will get a lot better at producing good code the first time around, so that "just accept everything" AI jockey output ends up being high quality and doesn't produce spaghetti on the first pass. I'm not sure that's particularly feasible, though, given the diminishing returns we seem to be getting from AI models. The focus now seems to be shifting to "marginal improvements in quality, but significantly cheaper", which makes me think that expectation isn't realistic.
💯 Anybody who's developed real-world, large-scale, commercial software and who had to maintain and evolve said code knows this (your argument) to be true. Compiling and passing tests is a very low bar. I'd maybe expect this from an undergraduate coding exercise, but even that's being generous.