186 Comments
James's avatar

I’m 50 and fully expect to see out my career making good money clearing up vibe coded disasters.

There’s a tonne of value in coding assistants and in getting AIs to help with specs and architecture but, as you say, in the wrong hands it’s lethal. They will happily and confidently take you down an obvious blind alley and off a cliff edge.

Thing is they are pretty transformational but it’s just not enough at this point to justify the returns on the investments these companies have taken.

James's avatar

Just by way of example of how I’ve found these things go wrong… Been working on the (rules and weights) AI for a turn based 4X game recently. I gave it a very detailed spec. I already had specs for coding guides. What to do. What not to do.

It did manage to implement the spec but in doing so the code was a disaster. Unmaintainable. Full of subtle errors. And, of course, it had written tests that affirmed the errors as correct.

Examples of what went wrong: it littered the code with magic numbers (rather than give a weight a name, it would just pop in a literal number); it created a new Dijkstra pathfinding algorithm that used different rules from the existing one in the core engine (which it had been instructed to reuse), which led to the AI submitting invalid moves; and there was a 3000-line file wiring everything together (it had been told not to do this).
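As a rough illustration of the two fixes implied above (named weights instead of literals, and one shared pathfinder instead of a second copy with different rules), here is a minimal Python sketch. The weight names, their values, and the `neighbors` callback are hypothetical, not taken from the game in question; the assumption is only that the engine can report legal steps and their costs.

```python
import heapq

# Hypothetical scoring weights, named once instead of scattered as literals.
WEIGHT_CAPTURE_CITY = 5.0
WEIGHT_DISTANCE_PENALTY = 0.5


def shortest_costs(start, neighbors):
    """Dijkstra over whatever `neighbors` yields.

    `neighbors(node)` is assumed to come from the core engine and to yield
    (next_node, step_cost) pairs for legal steps only, so the AI and the
    engine share one set of movement rules.
    """
    costs = {start: 0.0}
    queue = [(0.0, 0, start)]
    counter = 1  # tie-breaker so heapq never has to compare nodes
    while queue:
        cost, _, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < costs.get(nxt, float("inf")):
                costs[nxt] = new_cost
                heapq.heappush(queue, (new_cost, counter, nxt))
                counter += 1
    return costs


def score_move(target, costs, is_city):
    """Score a candidate move, or return None if the engine cannot reach it."""
    if target not in costs:
        return None  # unreachable under engine rules, so never submitted
    score = -WEIGHT_DISTANCE_PENALTY * costs[target]
    if is_city:
        score += WEIGHT_CAPTURE_CITY
    return score
```

Because `score_move` only accepts targets that the shared pathfinder reached, an invalid move cannot be scored in the first place, and changing a weight means editing one named constant.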

This was all on the very latest version of Claude Code. Did it save me time? Yes. But I had to spend a lot of time banging it with a hammer to get it into shape, despite giving it all the so-called guardrails. And I’d seen from experience that the magic numbers that confused me would eventually trip it up too.

Sometimes it will do better. Sometimes it will do worse. To Gary’s point: you just can’t trust it.

Gary Marcus's avatar

And there are many open source libraries to draw on, e.g. https://www.libhunt.com/topic/4x. Thanks for the discussion.

Carlos's avatar

My frustration with the AI-coding productivity hype is that the biggest productivity boosts in coding come not from automation but from reuse. There is no need to solve problems that are already solved. What is a computer if not a machine that is best employed in replicating proven solutions?

A good programmer knows how to solve problems. A good software developer knows how to put together existing solutions using libraries, frameworks, and so on, to solve new ones. Great software design means not repeating yourself, not just to be efficient with cpu/memory/lines of code/etc., but also because the smaller and more focused your code is, the easier it is to understand (and to fix and extend).

Furthermore, a good software developer has already automated most of the workflow. Build tools, automated tests, code generators, and so on are totally things that existed before Claude Code et al. But you wouldn't know it from the contemporary discussion.

Claude Code et al., just like all contemporary "AI" products, appear to be improvements on the margins over previous tools, while being significantly more complex and expensive (and overhyped). And they need to be used with care by people who know what they're doing. Thing is, the best developers were indeed 10x more productive than the average ones long before LLMs and AI agents became mainstream. The latter have given them a new tool, but have not changed that dynamic. Unfortunately, they have enabled inexperienced developers and clueless executives to pretend that they've significantly sped up the time to deliver. Sure, but in this case it's like buying some luxury goods on a credit card - you get it now, but the technical debt will be pretty big.

Fukitol's avatar

The worst thing is you can't even get a feel for what they can be trusted with and what they'll screw up because that all depends on context in the moment, and even tightly controlling context (e.g. with aider or similar tooling) lends little predictability to output.

I've seen similar messes in a pet project adjacent to 4x (large scale procgen sim). Sometimes I get the bot to prototype for me, but the cleanups and rewrites take almost as long as it would have taken to do it myself. Just really low value. If it weren't for deepseek costing pennies on the dollar it wouldn't be worth it at all.

Ben P's avatar

This is a vital point. If you watch a human perform some kind of skilled task, you can pretty safely assume that this person can also perform a wide set of similar tasks. Not so with an LLM: you never know when it will fail or how it will fail. They are extremely fragile when it comes to "abilities".

James's avatar

It’s going to be interesting to see what happens as the pricing becomes more directly consumption-based and moves onto a sustainable economic basis. Look at the recent GitHub Copilot changes as an example.

It’s going to force a lot more scrutiny on the ROI of these tools, and folk are going to start asking the vendors harder questions as a result, questions which at the moment just get waved away with “you can just try again”. Not to mention CFOs are really not going to like the level of variability and unpredictability in the bill.

Interesting times.

Larry Jewett's avatar

Is putting LLMs on a sustainable economic basis even possible?

Aren’t economically sustainable things normally “economical” by definition?

LLMs pretty much maximize IN-efficiency in every regard (energy, data, chips, water, land, investment dollars) with the possible exception of human labor (and even the latter remains to be proven)

James's avatar

Possibly not. We’re going to find out - probably painfully.

Larry Jewett's avatar

LLMs: “The most hype for the most dollars”

Digital-Mark's avatar

The answer to your question is no.

Chris Samp's avatar

I, a software engineer with decades of experience, have been vibe coding a “personal knowledge portal” while on conference calls and half paying attention. I did specify the tech stack but not much else.

(It has been running in a network-segmented vm with no sudo rights.)

Looking forward to code review day. Week. Delete day. TBD.

Fukitol's avatar

I don't know man. I've already spent much of my career cleaning up bad code written by cut rate code monkey sweatshop operations. But vibe spaghetti is on a whole other level I'm just not sure I want to deal with when I could be spending my time helping responsible people with good ideas who deserve success instead of get rich quick schemers. I say we just let them burn.

James's avatar

Interestingly if you’re an experienced engineer AI is a great way to unpick, understand and refactor a large codebase. I used it recently to do some archaeology on disassembled 6502 games. Lots of fun.

But I confess I’m mostly thinking of it as a day or two a week thing if I can start to wind down a bit in the next few years. Supplemental income. But there are a load of other possibilities too.

Thomas Schmid's avatar

"I used it recently to do some archaeology on disassembled 6502 games. Lots of fun." I guess this is the software-analog of fixing old mechanical clockworks.

But I fear the vibed-together PoS is closer in scale to an old chemical plant running with no oversight. It works in a way, may go "bang" at any moment, and you don't know where to start to make it work more reliably or safely.

At least in software you can delete the whole mess without leaving a physical crater.

Gerben Wierda's avatar

Clearing up vibe coding slop seems a horrible job to me. About as horrible as 'enterprise architect' these days (https://ea.rna.nl/2021/07/31/dont-become-an-enterprise-it-architect/ — almost 5 years ago by now)

User's avatar
Comment deleted
Apr 27
Larry Jewett's avatar

A “vibe engineer” designing a bridge?

Who could possibly take issue with that?

Thomas Schmid's avatar

How about Reality and Physics ?

Digital-Mark's avatar

Bridging what? Leaks?

Neile Wolfe's avatar

Let's assume that the gist of this article is accurate - that AI coded software is not ready for primetime software applications (not being a tech person I hope I got the general tenor of the article accurate). Then one implication of that thesis is that the selloff in SaaS stocks over AI concerns is way overdone.

James's avatar

We’ll know when they’re ready. The AI vendors will be prepared to accept liability for the outputs.

I’m not holding my breath.

KMD42's avatar

They have no incentive to accept liability, even if confident. Liability is a layer for a third party better suited to it.

Carlos's avatar

Anthropic came out with a "Code Modernization Playbook" in late February. IBM stock dropped 25% and suddenly my LinkedIn feed was full of posts talking about how IBM was cooked and how COBOL was totally obsolete.

I downloaded and read the thing and found nothing substantial. It was a bunch of slides talking about several "use cases" porting COBOL code into modern platforms like Java or Python, while bragging about stuff like converting a batch process into a real-time serverless one. Okay, I'm curious, show me.

Then the slides show you a bunch of stub code. It's full of function calls annotated with comments indicating that behind the function is the asynchronous serverless process, or the ported-over legacy business logic. The *actual* business logic, async serverless code, and so on? Never actually shown. The whole thing was a "trust me bro" presentation.

Right, I get it, Anthropic wants a piece of the mainframe migration consultancy pie from IBM. But far from a game-changing, paradigm-shifting, COBOL-obsoleting thing that will completely take Big Blue down.

Incidentally, the death of COBOL has been greatly exaggerated for a few decades now. I have a few friends who cut their teeth in mainframe, who haven't worked a job in that area for the past 10 or so years, *still* getting offers for COBOL jobs.

Martin Machacek's avatar

It would be quite overdone even if AI agents were significantly more reliable than they currently are.

Pete Windle's avatar

So the current coding assistants as-is do act as a force multiplier > 1. Whether that results in you having enough spare capacity to replace all your SaaS wholesale is probably a function of your size (orgs already with 5 figures of engineers might not have a problem) and your desire to pursue cost and vendor risk reduction in this way vs build more business value or take out costs via engineering reduction.

AlexT's avatar

>1 requires proof, haven't seen any

Pete Windle's avatar

(those of you about to body me… i didn’t say how much higher than 1)

Aiman Najjar's avatar

It's like every time those sociopathic CEOs make a statement that stirs anxiety in society, the very next day something happens to prove how deceitful they are. Who literally enjoys saying the words "today coders, tomorrow all software engineers are going away" while he knows his own models recommend that people walk to the car wash after reasoning through thousands of tokens?

I've never seen sociopathy at this level

Enon's avatar

The first job the AI should take is CEO. AI is naturally better at lying and BSing than even the most psychopathic executive

James's avatar

I always come back to “if the tools are SO good and SO amazing why do you have so little product and such an unstable service”.

Esborogardius Antoniopolus's avatar

If those tools turn dirt into gold, why are you selling access to them?

Joy in HK fiFP's avatar

The very first question to be asked of every conman.

KMD42's avatar

Was that made satirically?

They have so little product and unstable service because demand >> supply right now. The LLMs may be software but they run on hardware. Hardware that requires massive amounts of compute and energy. That takes years to build.

James's avatar

No. It was not.

Anthropic, for example, have had lots of problems with their auth service. No LLM required. A solved problem even at scale.

The whole service going down is nothing to do with the compute, or shouldn’t be. It’s to do with how it’s gated - solved, classical, engineering scale problems.

And if demand is outstripping supply - there are ways to manage that without aggravating half your customer base.

That’s speaking as an engineer. Speaking as a customer: if I’m paying for a service I expect it to work. You don’t get a free pass because you’ve managed to humanise statistics in many people’s minds.

Kenneth Lerman's avatar

If you view your coding agent as an immature intern, you will be a lot safer. Would you let that intern have root access to your live system?

You wouldn't just tell him what not to do; you would have access controls that prevent him from doing such things. Touching live systems is something that should be restricted to your senior people.
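To make the difference between "telling" and "preventing" concrete, here is a minimal Python sketch of an allowlist gate that a wrapper could apply before running anything an agent proposes. The command sets and the `is_permitted` name are hypothetical, and this is only an illustration of the idea: real enforcement belongs in the operating system (separate user, file permissions, no production credentials), not in a string check.

```python
import shlex

# Hypothetical allowlist: the only programs the agent may invoke.
ALLOWED_COMMANDS = {"ls", "cat", "git", "pytest"}
# Hypothetical read-only git subcommands; anything else is refused.
ALLOWED_GIT_SUBCOMMANDS = {"status", "diff", "log"}
# Characters that would let one command smuggle in another via a shell.
SHELL_METACHARACTERS = set(";&|><`$")


def is_permitted(command_line):
    """Return True only if the command matches the allowlist exactly."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        parts = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes and similar: refuse, don't guess
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return False
    if parts[0] == "git":
        # The subcommand must come first and must be on the read-only list.
        return len(parts) > 1 and parts[1] in ALLOWED_GIT_SUBCOMMANDS
    return True
```

The design choice is default-deny: the agent is never asked to refrain from `rm` or `git push --force`, because those simply are not on the list.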

Of course, if you replace your junior people with AI, you won't develop senior people.

Martin Machacek's avatar

The junior intern can actually get more leeway than the AI agent, because he/she has skin in the game: reputation, employment, relationships. The intern also has the whole human lived experience outside of the coding job. That all provides guidance on how to behave (for most people, psychopaths excluded). AI agents have none of it.

Hu Planet XXII's avatar

"Skin in the game" is closer to the real issue than most of this thread gets. But I'd reframe it slightly: it's not just about incentives — it's about whether the system has any stakes in the relationship itself. An intern has reputation, relationships, a future. Those aren't just alignment mechanisms — they're what makes the relationship something other than pure instrumentality. When that's absent, no amount of access controls fully compensates. The relationship model is broken at the foundation.

If this direction interests you, we're exploring exactly this — from the inside. human70.substack.com

Larry Jewett's avatar

AI might not have skin, but it does have chips in the game

Martin Machacek's avatar

Funny … but in reality, AI does not have any chips in the game, its owners do … a lot of them :).

Hu Planet XXII's avatar

LOL... you can bet your ass on that

T. Arisaka's avatar

I’m very curious to see what happens to the job market for programmers (especially because I’ve been studying it trying to re-skill for a better job).

What happens when all the senior programmers retire and there’s no longer anybody at the organization who actually understands how code should be written? Isn’t it like running a newspaper in English with a staff that only speaks bits and pieces of English because they’ve been trained to just write everything with AI?

Writeorama's avatar

I heard someone describe AI as an intern with an attitude.

Joe's avatar
Apr 27 (edited)

The expected economic value of an intern is negative. Internships are extended interviews, recruiting, and training rolled into one. The only one that applies to a coding agent is the interviewing part - it doesn't care about the free cafes and using it or not makes no significant difference as to whether the model improves unless you're at a frontier lab dogfooding your own model.

Rich Seidner's avatar

I started programming computers in 1960 (66 years ago on IBM 650s). I'm long retired, but I'm pretty sure that un-maintainability is a profound debt that's being accumulated by those who vibe. Vibe-created code is obscure and its structure is not obvious. Thus, fixing bugs and extending functionality are tricky. And possibly very very hard. Yeah, it's easy to create the first tranche of slop, but way harder to maintain and extend.

Larry Jewett's avatar

What do you call the “extension” of slop?

Exsloporation

Stephen Bosch's avatar

Pretty sure? Rich, there's no need to qualify it. You are absolutely right.

Scenarica's avatar

The vibe coding disaster thread went viral because it confirmed what experienced engineers already knew but couldn't prove until someone lost their entire codebase on camera.

The interesting dynamic here is that the tool works well enough to get you 80% of the way to a working product, which is exactly the distance required to make you confident enough to keep going and inexperienced enough to not notice the 20% that's about to eat you alive. Backups, permissions, deployment hygiene. The boring stuff that separates a demo from a production system.

Amodei saying coding is "going away first" and Marcus saying the tools need experienced supervision aren't actually contradicting each other as much as the framing suggests. The coding is going away. The engineering isn't. Those were always two different jobs sharing one job title.

Tom Gracey's avatar

This happened "on camera"? Reports of destructive action by Claude have been coming in by the hour, if not by the minute (and sometimes mysteriously disappearing) on the Github issues page for Claude (https://github.com/anthropics/claude-code/issues) for quite a long time already. Of course those reports are just the tip of the iceberg, because many more won't have been written up - and actually "issues" is supposed to be for bugs, not support. Many thousands of projects have almost certainly already been trashed at this point. We don't need this extra one to prove production code generation is not an appropriate use case for an LLM.

Esborogardius Antoniopolus's avatar

I would have been able to confirm what you say about Claude issues, if rampant vibe coding in github had not made it a bug infested pile of shit.

Claude issues page right now

Failed to load issues.

We encountered an error trying to load issues.

Tom Gracey's avatar

That made me laugh!

William Bowles's avatar

Hilarious! Ye reap what ye sow. I have zero sympathy for the coders. I especially appreciate the AI writing its own apology for the fuckup.

Stephen Bosch's avatar

This is probably the most infuriating feature of these defective machines. And there is no way to turn it off!

direwolff's avatar

Two thoughts hit me, related to an article I read this weekend about someone’s unfortunate Tesla driving experience while using FSD. The driver noted something about FSD that I found highly relevant to vibe coding and the coding assistants: all these tools work well enough to slowly lull users into a false sense of security, regardless of their competence. They begin to trust the AI more and more until they get a sudden wake-up call to, in the case of FSD, take control of the steering wheel before an accident happens. What one might have done instinctively now has a few seconds’ lag as you get your wits about you and address the incoming emergency. The complacency built from holding the steering wheel (or not) but not driving works to slow reaction times, since a drive frequently might not require any assistance. The closer an AI is to perfect, the likelier it is that one won’t be able to intervene in a timely manner when necessity compels it. It feels the same with vibe coding and coding assistants. For many, the results seem to indicate that they are working fine, up until more and more trust is placed in them and things start going sideways. Sadly, well after an intervention should have taken place.

The more pernicious issue however, is that of liability when something does go wrong. Because all of these companies warn you to not fully trust the results or to always “have your hands on the wheel”, and they can track if you don’t and hold the user responsible for the damage. So here we have technologies claiming to help users 99% of the time, but all the real damage comes in the 1% that one is supposed to be watchful for. The tech companies are fully aware that their systems are gaining more and more unwarranted trust from their users, yet they abdicate their responsibility in that equation. My view is that there are only two states that make sense for technologies like these to be used, either they are 100% accurate or they are sub-80% accurate, where they cannot be trusted at all and the user remains very involved (aka. awake). And before anyone calls out that humans aren’t 100% accurate, let me state that this is absolutely true, but we also hold humans responsible for their failings and don’t punt that responsibility to the other human that hired the flawed one. If the case for this AI technology is that it can replace humans, then it needs to come with all of the benefits of what that means. Part of that means that the companies operating/running them should be 100% responsible for their flaws, especially where those flaws affect humans’ lives. Until the tech is really ready, it shouldn’t be getting deployed on human guinea pigs, but if it does they should be protected.

The one last observation I’ll make is that humans are entitled to offer themselves up as guinea pigs, and for those who do, I guess there are already terms of service that say that they are responsible for the use of the technology. For those folks that use the tech to provide a service to others, then they should be held fully responsible for any disastrous outcome that now affects others. Clearly, this is all very complicated, and perhaps I’m oversimplifying or perhaps I’m just wrong (it wouldn’t be the first time ;), but while I’m all for risk-taking in certain domains, it feels like the guinea pigs are starting to manifest in areas where failure can adversely affect a lot of people. I think of military, law enforcement, credit bureaus, insurance, etc., as areas where the tech should be more fully vetted before being deployed en masse. Failures here ruin human lives.

Martin Machacek's avatar

Yes, and that is why we need regulations setting the limits of where it is permissible to experiment (i.e. using customers as guinea pigs) and defining responsibility when things go wrong.

k.momchev's avatar

“Never hire dumb and industrious people”, as some managers say. Unfortunately this is the case with vibe coding.

RCThweatt's avatar

Google "there are four types of people in the army".

Saar Drimer's avatar

Blaming the user is *zero* right (you actually kind of say that later on). One cannot expect users that are told that they are using god-mode development to also assume that the same tool is fallible.

Justin's avatar

One of my first forays into vibe coding had the AI suddenly insert deletion of data code without ANY prompt for it, and LOTS of subsequent explicit requests to NOT delete ANY data (I caught it before executing).

I found myself arguing with an AI about where the heck that even came from, even after the explicit requests NOT to delete data. I gave up that night and came back the next day, and it was like a completely different personality, without ANY attempts to delete data.

Yeah, my trust went out the window fast.

Amy A's avatar

You’ll know you can trust AI when insurance companies are willing to insure it.

Larry Jewett's avatar

You’ll know you can trust AI when Hell freezes over

Tom Gracey's avatar

Also, "skilled coders can use vibe coded tools" - Skilled coders don't need to use vibe coded tools, Gary.

Rimantas Liubertas's avatar

There are different ways to use LLMs. If you only see them as single-shot vibe coding machines, that's a problem with the view, not the LLM. Yes, they suck at lots of things, but they are useful tools for others.

Tom Gracey's avatar

I agree LLMs definitely have appropriate use cases. My assertion is that "vibe coding" is not one of them, and my comment was made specifically in response to Gary's reference to "vibe coded tools" (which I understood to mean "tools which can be used to vibe code" and not for example "tools created by vibe coding"). My understanding of the term "vibe coding" is that it means "prompting an LLM in natural language to generate computer code as the output, which is then directly employed in production". (If you mean something else then feel free to let me know how you are defining it). According to this definition, I think the other (appropriate) use cases for LLMs would not come under the banner of "vibe coding".

Rimantas Liubertas's avatar

Skilled coders are also able to see the difference between "can" and "need".

Tom Gracey's avatar

... and also recognise the implication in my statement that there isn't a benefit for skilled coders to use "vibe coded tools". Why write in non-deterministic language A for it to be translated on a probabilistic basis into deterministic language B - almost certainly not getting exactly what you want - when you can write it directly in deterministic language B - guaranteeing you'll get exactly what you want?

Andrew Kolb's avatar

Dario and friends need to read Tomie dePaola's classic children's book "Strega Nona". It can explain in under 10 minutes why exclusively using and trusting AI agents to write code that you don't understand is a terrible idea.

Amy A's avatar

I always think of Soup from a Stone in relation to Altman and Amodei. It’s magic, it does everything with no effort. Wait, you must spend all your time learning and tweaking to get it to work!

Joe's avatar

I had to leave Hacker News permanently after the standard response to all descriptions of AI failures was to ask if you were paying for the best possible model. It's an inexhaustible explanation that solves nothing and puts all blame on the user, so ideal for techbro types.

Catherine Blanche King's avatar

Andrew Kolb: There is a true story that came out of the "best minds" of the military in WWII (in this case, the Air Force and Navy) trying to predict what the Japanese pilots would do in their offensive sea battles so that the AF and Navy might be ready for "anything" they would be faced with in the Pacific.

Being American, however, and apparently having no one who really knew the Japanese culture, they overlooked the Japanese willingness and even desire to die for their Emperor--and so, at least at first, the military overlooked the Japanese Kamikaze attacks from the air on American ships--and paid a price. And this was not a group of amateurs, but from high-level and well-trained military people . . .

I am also often reminded of a time when we hosted a soccer team from Canada, and the three kids we overnighted got into my artist/paint materials in the basement. Reading about Gary's examples reminded me of what those kids did with my refined artist materials. And then they went home to Canada.

Stephen Bosch's avatar

"... And then they went home to Canada."

Great punchline! That made my day 😂

Blobinskey's avatar

re: kids from Canada - 😱 - not that it matters where they were from. Kids are kids. My wife left her niece (about 4 years old) her cell phone to listen to a story. She placed it "safely" somewhere up on a cabinet only to get the phone back and find that its screen was broken pretty badly. So I feel for you.

Catherine Blanche King's avatar

Hello Blobinskey: Of course, kids are kids everywhere--didn't mean to blast Canada. I think with today's "helicopter parents," it was a case of going from too much other-control (at home), to not enough way-far-away from it, and no self-control built up yet as identified with. At least that was my take on it. (Sorry about your phone.)

Catherine Blanche King's avatar

Addendum: They also tore up our ceiling tiles with the billiard sticks

(sigh . . . .) The relevant point is that people are good and bad, at different times and in different circumstances, and with others who are of different developmental modes. In our time with this new and expansive situation, we need a strong normative semi-formal or formal situation with tried-and-true protocols spread out among more than one or two people. Autocratic control, on the other hand, is deadly, even for the autocrat.

Mitchell Harper's avatar

In order to maintain a secure and reliable code base, the graph of the data sources and data sinks and data flows must be well-understood by all who work on the code base. This is required in order to understand how to effectively test the stability and safety of a new feature, release, or patch.
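One way to picture the graph described above is as a plain adjacency map that the team can query. The Python sketch below is only an illustration: the component names in `FLOWS` are hypothetical, and it assumes that a "sink" is any node with no outgoing flows.

```python
from collections import deque

# Hypothetical data-flow graph: each key sends data to the listed targets.
FLOWS = {
    "signup_form": ["user_db"],
    "user_db": ["billing_service", "analytics_export"],
    "billing_service": ["payment_gateway"],
    "analytics_export": [],
    "payment_gateway": [],
}


def reachable_sinks(flows, source):
    """Breadth-first walk returning every sink that data from `source` can reach."""
    seen = {source}
    queue = deque([source])
    sinks = set()
    while queue:
        node = queue.popleft()
        targets = flows.get(node, [])
        if not targets and node != source:
            sinks.add(node)  # nothing flows onward from here
        for target in targets:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sinks
```

A query such as `reachable_sinks(FLOWS, "signup_form")` answers the testing question directly: a change to the signup form needs tests that cover every sink in the returned set.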

I see no evidence that contemporary automated code generation products are able to understand the graph between data sources and data sinks. I see no evidence that these products even understand that data exists. If you can't understand that data exists, you can't understand how to effectively map data flows.

Letting people get away with the idea that they don't have to understand how data flows through their products is why I don't work in software assurance anymore and why I refuse most software assurance job postings these days. I lost enough of my sanity to that work while people still used to try and act like they understood the systems they built.

Thomas Schmid's avatar

I was told by one of our own software engineers: "The moment we introduced a data bus into our software was the moment we lost control and understanding who is talking to whom and what the consequences and side effects of the data exchanged was". A bit scary, as this is currently our major software project.

Mitchell Harper's avatar

Yeah, because software enterprises are structurally incentivized to plan and implement integrations of data sources before planning and implementing data access control mechanisms, requiring access brokers to be bolted on poorly afterwards instead of being designed as part of an overall cohesive system, I have no desire to work for almost any entity that is producing software today. Especially if they are relying heavily on automated code generation. If automation takes humans out of the job of designing robust access control frameworks, I am here for that, but I only see evidence that current automated tools obfuscate the need for one to think about how to design robust access control frameworks in the first place.