About that Matt Shumer post that has nearly 50 million views
Something big is allegedly happening
All morning, people have been asking me about a blog post by Matt Shumer that has gone viral, with nearly 50 million views on X.
It’s a masterpiece of hype, written in the style of the old direct marketing campaigns, with bold-faced callouts like “I know this is real because it happened to me first” and “I am no longer needed for the actual technical work of my job”. It’s chock-full of singularity vibes:
As I told a journalist who asked me about the post, I wouldn’t take it so seriously:
Shumer’s blog post is weaponized hype that tells people what they want to hear, but stumbles on the facts, especially with respect to reliability. He gives no actual data to support the claim that the latest coding systems can write whole complex apps without making errors. Similarly, when he describes how AIs are doing longer and longer tasks on METR’s famous task-time benchmark, he neglects to say that the criterion on that benchmark is 50% correct, not 100%, and that the benchmark is only about coding, not tasks in general. No AI system can reliably do every five-hour task humans can do without error, or even close, but you wouldn’t know that reading Shumer’s blog, which largely ignores the hallucinations and boneheaded errors that are so common in everyday experience. And of course Shumer didn’t cite the new Caltech/Stanford article that reviews a wide range of reasoning errors in so-called reasoning models [or the Apple reasoning paper or the ASU mirage paper, etc]. The picture he sells just isn’t realistic, however much people might wish it were true. I should add that Shumer is the guy who was once famous for apparently exaggerated claims about a big model of his that didn’t replicate and that many people saw as fraudulent; he likes to sell big. But that doesn’t mean we should take him seriously.
In hindsight, I should have made five other points, too:
Shumer made no reference to a different METR study, which showed that coders sometimes imagine big productivity gains when they have actually lost productivity (even though he selectively mentioned METR’s other, well-known study).
He also didn’t acknowledge that other users’ experience is certainly not “it’s usually perfect.” Take the often AI-optimistic Kelsey Piper, who reported a few weeks back that Claude Code was sometimes perfect, and other times maddening. (Example: “Sometimes, Claude is absolutely the worst coworker you’ve ever had. At one point, it deleted every single one of the phoneme files [that she was working with in her app] of each English sound pronounced absolutely correctly, which I had personally emailed an English teacher to secure permission to use, and replaced them with AI-generated sounds which were all subtly wrong.”) Shumer glosses over that kind of experience.
LLMs write code wicked fast, but some coders are starting to report burnout, and only modest gains relative to that burnout, as a new story from Connie Loizos at TechCrunch illustrates.
To his credit, Shumer is actually correct that something has changed recently. You really can let things rip more in the most recent systems. But he misses something subtle but essential, captured well by a friend of mine who is very skilled in the art of coding. After reading a draft of this essay, he texted me:
“[Shumer’s blog] is representing my experience as well… Something happened a couple of months ago where you can truly give it a description and let it go and - sometimes! - it will come out with the right answer. … Sometimes… [But] ultimately, I think this makes it more dangerous…. Generally, the closer these systems are to appearing right, the more dangerous they become, because people become increasingly at ease just trusting them [when they shouldn’t].”
The security of the autogenerated code is very much in question, as, for example, this article from last week noted (precisely as this Substack has long warned):
The kicker, though, might be the accidental self-own at the end.
Quoting another friend, “Why did [Shumer] need these guys to review the drafts, why didn’t he have the AI do it? Heck, why didn’t he have the AI write his little missive to begin with?” Hmm…
The bottom line is this: LLMs are certainly coding more, but it’s not clear that the code they are creating is secure or trustworthy. Shumer’s presentation is completely one-sided, omitting lots of concerns that have been widely expressed here and elsewhere.
A lot of people may have taken his post seriously, but they shouldn’t have.