AI hype might be a smokescreen to justify layoffs of expensive programmers, with authors like Roose as unwitting dupes (i.e., the productivity gains don't actually have to exist). And then the military-industrial complex might be conspiring to get money thrown at AI to improve warfare; they don't really care about hallucinations in war, but they do want AI swarms on the battlefield. Finally, our overfinancialized, debt-based system needs something for people to believe in to keep the financial bubbles going.

Most of AI might be a smokescreen.

Most of the "agentic AI" development that large tech companies claim delivers huge cost savings is just natural language interfaces into their legacy APIs. In other words, these AI agents are constrained in what they can do by the millions of lines of legacy code, written by human programmers over a long period of time. And the "cost savings" these companies say they're getting are the result of layoffs, not AI. I wouldn't go so far as to say it's a scam - AI is a big productivity enhancer in software development. But it is nowhere near the disruptive force that Roose and other tech writers make it out to be.
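
A minimal sketch of that pattern (all names and endpoints hypothetical): the "agent" is a thin dispatch layer, and every capability it exposes is a legacy endpoint that predates the model.

```csharp
// Sketch only: an "agent" whose entire power is a routing table from
// model-chosen tool names to pre-existing legacy endpoints.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public class LegacyApiAgent
{
    private readonly HttpClient _http = new HttpClient();

    // Each tool the model may call is just a URL into code humans wrote
    // long before the LLM arrived.
    private readonly Dictionary<string, string> _tools = new()
    {
        ["create_invoice"]  = "https://legacy.example.com/api/v1/invoices",
        ["lookup_customer"] = "https://legacy.example.com/api/v1/customers",
    };

    public async Task<string> InvokeAsync(string toolName, string jsonArgs)
    {
        if (!_tools.TryGetValue(toolName, out var url))
            throw new ArgumentException($"Unknown tool: {toolName}");

        // The model can only do what this endpoint already did.
        var response = await _http.PostAsync(url,
            new StringContent(jsonArgs, Encoding.UTF8, "application/json"));
        return await response.Content.ReadAsStringAsync();
    }
}
```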

If you want to know more about how generative AI is a tool for employers to wield against workers (and the powerful to wield over people more broadly), tech journalist Brian Merchant has a great Substack called Blood in the Machine.

I have to disagree on the military aspect. Hallucinations on the battlefield could be a ginormous no-no.

Of course, for the military industrial complex, the resulting chaos, death and destruction may be considered a good thing...

Compare killings in civil society to killings in war. In civil society, killings are usually at least investigated, often in great depth by police, detectives, etc., and sometimes litigated. Killings in war are rarely even investigated (except for high profile cases). This incentivizes the military to cut corners on safety as long as a technology basically works and achieves overall battlefield success, so hallucinations are low down on the list. Military thinking is brutish.

Probably no responsibility, therefore no culpability. Sounds like a military-type of win-win.

I couldn't agree more with this piece. I've been using Copilot for quite some time and it's great as a Stack Overflow replacement (most of the time), much less so as an actual coder. Prime example - in a function that calculates profit margin, do I need to calculate using the variable totalCharges or totalChargesInScope? If I get it wrong, every test still passes and my users think it's correct because it ALWAYS HAS BEEN. No one notices until they sit down with a calculator and add up a couple hundred invoices by hand, and by then...boy, aren't I popular! If I need a reminder of C#'s date string formats, it's great. Otherwise, not only can it be wrong, but it's often wrong in complex, difficult-to-debug ways. Any coder who's never written systems in the real world where real money is at stake probably wouldn't even think about that type of problem, much less an NYT reporter. If it's what you get paid to do, you take shortcuts at your own peril.
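
For anyone who hasn't lived this, here is a minimal sketch (types and names invented to mirror the example above) of how the wrong variable compiles cleanly and sails past naive tests:

```csharp
// Hypothetical invoice model; the two similarly named totals are the trap.
public class Invoice
{
    public decimal TotalCharges { get; set; }        // every charge on the invoice
    public decimal TotalChargesInScope { get; set; } // only charges that count toward margin
    public decimal TotalCost { get; set; }

    public decimal ProfitMargin()
    {
        // Compiles, runs, and passes any test that merely checks for a
        // plausible-looking number. If the business rule calls for
        // TotalChargesInScope, every report is silently wrong until someone
        // adds up a few hundred invoices by hand.
        return (TotalCharges - TotalCost) / TotalCharges;
    }
}
```

Both choices typecheck; only a domain-aware test (or that hand audit) can tell them apart.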

The point about debugging is very important and it is often overlooked in discussions of AI coding abilities. Most software is dynamic and constantly evolving. In codebases with millions of lines of code, new code can often interact with existing code or features in unexpected ways. As a software developer who started coding as a kid almost 40 years ago and went on to study CS, I find that the overwhelming majority of my time these days is spent in debugging rather than coding.

People think GenAI tools spitting out code makes a big difference, when in fact the actual writing of the code is a small part of the overall process. Furthermore, to me, it does not seem that current AI tools have the capability to properly analyze large bodies of code, form "mental models" of their inner workings, then form hypotheses and perform tests, which are often required when debugging large projects.

Yes - even aside from actual debugging, software always has to be adapted and modified...

Yes. AI, the machine that can sometimes give us answers to the questions we have already answered.

And that always gives us questions about its answers.

Great article. Not sure I agree with your point about novelty - you're not wrong, but I don't think that's the most important part: 90% of commercial coding doesn't do anything particularly novel, and 90% of the "novel" coding that is done is really new arrangements of what has come before. AI will be useless for that last bit, but in my experience, so are most programmers these days.

I do roll my eyes each time I see the "no experience required" claims. I used AI in coding pretty extensively for a few weeks to try it out and it was a massive productivity enhancer; as mentioned earlier, a good Stack Overflow replacement. I've been coding for decades, though, so when it made mistakes, and it did sometimes in weird places, I could catch it and either prompt it to recognise the mistake or just fix it as I went.

It's less like a superstar coder who will build your app for you and more like an enthusiastic but green junior, who is incredibly fast but occasionally makes dumb mistakes due to lack of experience.

As a hobbyist when it comes to coding, with access to a GH Copilot license from work, I still find Stack Overflow has merit in the sense that for any given problem (mostly basic ones), you get other people providing context as to why the most upvoted solution might not always be the best one. And you get to discern why some lesser-voted or even downvoted suggestions aren't really addressing the issue at hand, go about it in way too complex a fashion, or simply rubbed a moderator the wrong way (and people piled on). Sometimes they're even better or take into account more modern libraries, and just have fewer upvotes because they haven't been available as long as the previous solution. I find that extremely valuable for someone still testing their own knowledge, but that is my perspective not being a dev by trade, of course.

I feel like Copilot gives me strikingly little in that regard, as it's tuned to be much less of a conversationalist I can bounce ideas off of; it will often take prompts at face value, where I have to rein in its misguided enthusiasm (to be fair, the misguidance is partly mine for bringing up suggestions or options without knowing how it tends to answer).

There's always gonna be a source it synthesized most of its answer from. I just guess that graphing all of those sources out wasn't accounted for when training these LLMs, and it feels off, to say the least, to get unsourced information in chat mode (I can see why that'd be overkill for auto-completion).

Definitely! I'm not trying to say there's no need for help from other devs anymore, far from it :D Where I have enjoyed the most benefit from it is in the small things. For instance, I had a model suggest a library I'd never used before that greatly simplifies setting up arguments for a command line program in .NET. Normally, with a brand new library, I'd spend some time trawling through the documentation for examples, figuring out how it works, a bit of trial and error.

With my model, I basically asked it for an example and it created one. Then I asked it some more questions about the functionality and got some more details. I ended up digging into the documentation later because I wanted to know more about its functionality, but I probably spent about half as much time getting it up and going as I would have otherwise. Additionally, the recommendation to use it at all is the sort of thing you'd get from a colleague - I'm in a position now where I'm working on my own, not in a team of like-minded devs. A model isn't a replacement for a skilled colleague and friend, but it's something when you've got nothing.
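
For concreteness, the kind of example I mean might look like the sketch below, using System.CommandLine (my guess at a comparable library; this is illustrative, not the code from that session):

```csharp
// Sketch against the System.CommandLine 2.0 beta API (signatures as of beta4).
using System;
using System.CommandLine;
using System.IO;

class Program
{
    static int Main(string[] args)
    {
        var fileOption = new Option<FileInfo>("--file", "The file to process");
        var verboseOption = new Option<bool>("--verbose", "Enable chatty output");

        var rootCommand = new RootCommand("Demo: argument parsing without hand-rolled string juggling");
        rootCommand.AddOption(fileOption);
        rootCommand.AddOption(verboseOption);

        rootCommand.SetHandler((FileInfo file, bool verbose) =>
        {
            if (verbose) Console.WriteLine($"Processing {file.FullName}...");
        }, fileOption, verboseOption);

        return rootCommand.Invoke(args); // parsing, --help, and error messages come free
    }
}
```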

There seem to be two main opinions regarding coding assistants. Either they're the best thing ever and we don't need coders anymore, or they're the work of the devil and will ruin our developers and turn them into data entry potatoes. I don't think either is true - the real benefit, at the moment at least, is as a productivity-enhancing tool for already skilled developers, and as a learning-enhancement tool for junior developers.

And if a model leads a junior astray the consequence is much the same as when they copy public domain code and plug it in, which juniors have been doing for many years. They learn a valuable lesson about the dangers of using code you don't understand.

Thank you for elegantly pushing back on Roose and the Times' continued mischaracterization of the state of AI. Unfortunately, in spite of your efforts, your well-reasoned arguments and presentation of facts will simply be ignored and they will continue to push their support of the hype meisters. Roose will continue his "golly gee Mr. Wizard look what I can do" essays and the readership will be no better informed.

I expect the New York Times thinks it is a feature, not a bug, to not run Roose's piece past software and AI experts. They like the unadulterated everyman aspect of his experience. Just like Roose himself, they probably don't want to know the truth as it is inevitably a downer. These authors and editors wouldn't think of portraying history so inaccurately but they don't mind at all when it comes to technology and science. The result of a liberal arts education?

"These authors and editors wouldn't think of portraying history so inaccurately..." I think there quite a few places, and times, when the NYT, and others had/have no problem in passing known dross for gold in their pages. WMD, and Russiagate, both heavily promoted, come to immediately to mind, but there are plenty more.

I don't disagree with your statement in general, but I do with your examples. Russiagate ended with a report saying that while nothing could be proven, there were a lot of unanswered questions. Given Trump's complete capitulation to Putin on Ukraine on Friday, these questions are still significant and unanswered.

Russiagate scepticism is aging pretty badly these days.

Brian Kernighan once said "Debugging is twice as hard as writing the code correctly in the first place." He was talking about debugging your own code. Debugging code written by someone else, even when you're an experienced programmer, is even harder than that. Debugging it when you don't know how to code at all is probably impossible.

I don't see the gain in replacing the medium-difficulty problem of writing standard apps like this with the much harder problem of debugging apps like this, especially when your understanding of code is not great.

Code is faaaaaaaar too brittle (sensitive to small and infrequent errors) to be done by LLMs in any serious way.

Note: that "1/3 of answers contain hallucinations" might be a good example of 'antihype' (just invented that term myself; it probably already exists). The benchmark OpenAI created does show that, but the benchmark is far from representative, I gather from the abstract. I.e., it was created initially from 'answers GPT-4 gets wrong' and is about a very specific kind of answer where LLMs are probably weaker.

(How on earth did I get in the position that I am defending LLMs against a wrongly extrapolated *negative* result???)

I said "one measure" because it's a whole other kettle of worms to really nail such a context-dependent number, but I should maybe have footnoted with some color on that. Agree with your note.

Your analysis is accurate, though it lacks some key details from the trade.

I am a successful software engineer with over 20 years of experience, including a few years at Amazon Web Services. For the past five years, I have been focused on AI.

I shared my thoughts at length and uncovered more damning insights, some even more striking than yours, such as the story of Dane Korsi. With no coding experience, he lost at least $27,000 and spent nine months trying to build a startup based on these new AI coding tools.

You can listen to his gripping story, along with key reasons why learning the fundamentals is crucial before diving into coding with nothing but enthusiasm.

Read the full article and Dane’s personal story here.

https://ai-cosmos.hashnode.dev/the-ai-code-companion-a-double-edged-sword-for-developers

Money loss is the least of the worries when people with little or no knowledge and experience use LLMs to write code.

I don't understand why you would need software to come up with ideas for lunch based on a photo of the fridge contents ... shouldn't anyone just be able to look inside the fridge and know this based on their own knowledge? Why would you have ingredients in the refrigerator so exotic that you don't know what foods they go into?

ikr? It's as if you moved into someone else's home after throwing them out and didn't have a clue what to do with the kind of food they had in their refrigerator. I think AI might come in handy in both getting them out of the house, and you into it, so it might as well assist with the unfathomable food.

Excellent question! Especially considering how energy-intensive generative AI is.

Right. So at the end of the day Roose still doesn't know how to code, has no idea how "his" code works, doesn't know how to cook, has no idea whether what "his" code has "cooked up" will be tasty, nutritious, or appealing to his son, and has completely failed as a parent. But he "feels" he's actually done something when in fact he has done nothing.

"The untrained mind quivers with excitement at everything it hears." — Herakleitos, 500 b.c.e. or so.

It's an excellent example of a solution that asks the wrong question, or where a possibly sharp mind comes to the wrong conclusion by using inappropriate, heavyweight tools. Expiration dates are the kind of thing you wouldn't be able to get from a single image but could if you painstakingly took inventory (I was scrawny for all of my twenties, so it isn't that much of a step up compared to counting calories). Sure, I could throw expiry dates and caloric/macro targets at an LLM to take a gander at what I'd need while keeping food waste minimal, but my meal composition and what I need for my fitness goals don't change so much that I can't just query a DB in ways that are more on rails.

(I use Grocy to do that sort of thing, in case anyone's wondering. Basically a pantry ERP. I could also just memorize the things I bring home, but we do like self-hosted systems in this household. :D)
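
To make "more on rails" concrete, here is a minimal sketch under an invented pantry schema (not Grocy's actual API): "what expires soon" is a deterministic query, no model required.

```csharp
// Hypothetical pantry schema for illustration.
using System;
using System.Collections.Generic;
using System.Linq;

public record PantryItem(string Name, DateTime BestBefore, double Quantity);

public static class Pantry
{
    // Items to cook first if food waste is to stay minimal.
    public static IEnumerable<PantryItem> ExpiringSoon(
        IEnumerable<PantryItem> stock, int withinDays = 5) =>
        stock.Where(i => i.BestBefore <= DateTime.Today.AddDays(withinDays))
             .OrderBy(i => i.BestBefore);
}
```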

?

Oh, Gary. Gary, Gary, Gary.

Journalism switched to a pay-to-play bribe model beginning 25 years ago. You saw this develop in the form of the alts like MintPress, etcetera, where the Pulitzer Prize refugees went to die. Bribery of editors, bribery of journalists, and back-room influence wielded upon the conglomerated media are in full force. This is in tandem with the evolutionary algorithm that makes journalists who don't write the stories the bosses want lose their jobs.

The elephant 🐘 in the room is... Venture Capital! That's who is paying for all these stories from the most brilliant writers in the most respected media. The reason is obvious: they have sunk a trillion dollars all told so far. Remember that the primary job of a VC is not to build a great company on great technology; this hasn't been true for at least 15 years. The primary job is to cash out positive. On that score Uber led the way. Uber, without a business plan to ever be profitable (and never having been), was a smashing success for the VCs. It is now the model for selling a dead whale.

Dead fish portfolios are portfolios of stinking junk that died years ago and is kept on life support. A dead fish portfolio justifies the management continuing to take their 2% fee. Since 70% or so of VCs are inglorious bozos who will never win a hand, the dead fish portfolio is quite popular. A dead whale 🐋 is what Uber was. In times past such a monster's stench would pervade the hallowed halls of the NYSE! Traders would expire in New York catching the scent from Chicago! But that dead whale was sold to the public! And Uber has so much cash that it can go on losing billions for decades! What self-disrespecting MBA drawing a high six-figure salary could in conscience shut down such a gravy train just to be fair and honest to the... stockholders? Perish the thought! Tie that one to yon mizzenmast! Make him walk the plank! Yippee. Yo ho ho. Etcetera!

To accomplish that, there had to be some capacity to boggle-goggle the public. And of such skin-of-their-teeth beginnings are born new disinformation ecosystems.

Ta da! Roose loose upon a field of gore!

New heraldry beckons! My and thy escutcheons are as dust in the wind against the royal order of codswallop!

Methinks I like the way you talk, even if I don't know what it all means!

But please don't disregard the commercial value of a dead whale. (Dead fish are practically worthless; dead whales, OTOH, have always been valuable.) There's a reason the world was once a whaling planet, and even today there are three countries still hunting whales commercially.

I shall blame the mushy metaphor of the dead fish portfolio. "Dead" is quite unfortunately used where "rotten dead fish" 🐟 was meant. Hence, "rotten dead whale" 🐋 would be more correct. This was implied by the use of the word "stinking" with it. But your point is gratefully accepted.

Alas! For the metaphor is imperfect even in concept. Rotten fish and rotted whale carcass also have value as fertilizer. (Not just for their rendered oil.)

After using DeepSeek for the past two months I have not seen much of a problem at all with hallucinations. What am I missing?

But DeepSeek is a very different kind of chatbot; it tries to be an intelligent research assistant. Bolt is a sophisticated code generator with a user-friendly text interface. If DeepSeek hallucinates, you get bad facts. When Bolt does so, it creates buggy software, which is a lot more problematic.

In 30 years in software, working my way from programmer to engineer to architect, I developed a few rules of thumb about the nature of software development. Forgetting about the woodpecker* for a moment, one useful rule is that a typical program devotes about half of its code to runtime error handling, more in the case of complex UIs or mission-critical operational requirements**. My experience with Copilot makes me believe that AI coders are not up to that at all, and that's the tedious and error-prone sort of code you really would like to relieve a human coder of.

* “If building construction were done like software development, the first woodpecker to come along would destroy civilization.” — apocryphal saying

** you really don’t want an unhandled divide-by-zero error in nuclear reactor control code.
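
A minimal sketch of that ratio (all names invented; not real reactor code): a single line of arithmetic wrapped in the defensive checks a mission-critical context demands.

```csharp
using System;

public static class FlowMonitor
{
    public static double CoolantFlowRatio(double flowIn, double flowOut)
    {
        // The defensive code outnumbers the "real" logic, which is typical.
        if (double.IsNaN(flowIn) || double.IsNaN(flowOut))
            throw new ArgumentException("Sensor returned NaN; check instrumentation.");
        if (flowIn < 0 || flowOut < 0)
            throw new ArgumentOutOfRangeException(
                flowIn < 0 ? nameof(flowIn) : nameof(flowOut), "Negative flow reading.");
        if (flowOut == 0)
            return 0.0; // degraded-mode fallback instead of an unhandled divide-by-zero

        return flowIn / flowOut; // the one line the function is actually "about"
    }
}
```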

And this is exactly why real software pros like you will not have to worry about their jobs!

My question for any AI system is: does it have access to ground truth, and if so, is there a reason to believe that it is superior to a human expert?

In radiology, the systems are trained on images and the opinions of radiologists - not whether the person actually had cancer. So, at best, the AI knows as much as the crowd. It can’t spot a cancer that radiologists couldn’t spot. People assume the magic boxes know more, and often they know the same or less.

There seems to be such a large amount of credulity involved with these 'reviewers.' Do you ever feel like it's a constant game of whack-a-mole?

They are not so much AI "reviewers" as they are hAIgiographers.

SAInt LLMo

From Wikipedia: "Erasmus of Formia, also known as Saint LLMo (died c. 303), was a Christian saint and martyr. He is venerated as the patron saint of sailors and abdominal pain."
