GenAI has been embedded in all sorts of symbolic logic almost from the start. For instance, the way that they checked for 'bad stuff' in GPT-3 was based on simple string matching, funnily enough: including on their own generated text (so you would get a reply by GPT that was marked as 'potentially in conflict with OpenAI's rules...'). I labeled such things at the time as 'AoD' (Admission of Defeat) for the pure GenAI/scaling approach (we're talking 2023 here). These days a tool like Claude Code mixes all kinds of elements, which makes its code patterns very usable, but as I have been finding out while using it on its best setting (Opus Extended), it may be able to code, but that doesn't mean it is able to program very well (as things like a single function of >3000 lines illustrate).
Claude Code is magical from the perspective of a programmer, and there are certainly situations where this is going to improve productivity, but it still has no understanding of what it is doing, neurosymbolic or not. And the way this is exploding in compute gives quite a few warnings about how far this approach really scales. Producing working code patterns is not the same as programming/designing, and the symbolic parts are, for instance, geared to produce 'anything that works' by preventing wrong stuff from reaching the user (a filter/feedback loop that cleans up after very wasteful generation).
The writing style of that post (which is familiar in this software world of ours) is so annoying. I hate to make this kind of complaint, but those short, punchy sentences: Sometimes four words long. They're irksome. "The story isn't the leak. It's the code." Yeah, sounds like this guy had a fun time chatting with a text extruder to get this shit out into the world. The fact that they're "writing" this software with their own tools and that tool use likely resulted in the leak is a story in and of itself.
I don't think it's right to call this "neurosymbolic", nor am I really impressed with these ~agentic coding~ tools. Yes, being trained on all the code that the LLM vendors could gobble up, along with tons and tons of code that was commissioned by these companies for correctness, efficiency, and style, it isn't surprising that they'll be able to have workable outputs (release the training data, y'all). They are after all *language models* and computer programs are written in programming *languages*. I think one of the reasons people are so amazed by the fact that language models can output working code at all is because computer programming as a field has been over-hyped in terms of the cognitive ability you need to do it. Programmers are smart fellers, that's why they're paid the big bucks, right? It never really occurred to anyone that you don't have to be that smart to do it and that most programmers aren't really that smart to begin with. Most programming problems in a business context follow the same patterns. "I need a REST API with the following endpoints, and a UI that uses these endpoints." The fact that LLMs can generate programs of this type speaks for how statistically regular these problems are. What happens when you feed in problems that aren't well-represented in the training configuration? It outputs crap.
Outputting lots of potentially dodgy text, feeding it to deterministic tools (linters, syntax checkers, etc.) to check for code quality, and then feeding those results back into the model when things go wrong so it can try again (drawing on the immense training corpora, which contain these exact types of back and forth), well, that's a close approximation of what we do when we write computer programs. Does this work for any other domain? In math, which is another area where such tools exist, it doesn't seem like there's been a sudden speed-up in discovery from brute forcing examples of mathematical logic in the training corpora. I've got a bluesky mutual who's been spending some months using this stuff to try and automate some accounting/assurance drudgery, and it doesn't seem like he's very close to releasing a useful product (and by his own admission the reliability of models at simple line-item depreciation is only 96%, which sounds great, until you consider what one-nine reliability actually means in practice -- wouldn't it be smarter to have the bot write code that just does this deterministically, instead of relying on fuzzy model outputs?)
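To put a rough number on the 96% figure above, here is a minimal sketch. It assumes errors on each line item are independent (the comment does not say this), and it uses straight-line depreciation as a stand-in, since the actual depreciation method is not given. Both function names are made up for illustration.

```python
def straight_line_depreciation(cost: float, salvage: float, life_years: int) -> float:
    """Annual straight-line depreciation: same input, same output, every time."""
    if life_years <= 0:
        raise ValueError("life_years must be positive")
    return (cost - salvage) / life_years


def prob_all_correct(per_item_accuracy: float, n_items: int) -> float:
    """Chance that every one of n_items is right, assuming independent errors."""
    return per_item_accuracy ** n_items


if __name__ == "__main__":
    # Deterministic rule: a 10,000 asset with 1,000 salvage value over 5 years.
    print(straight_line_depreciation(10_000, 1_000, 5))
    # A 50-line schedule where each line is right 96% of the time.
    print(round(prob_all_correct(0.96, 50), 3))
```

Under those assumptions, a 50-line schedule comes out fully correct only about 13% of the time, whereas the deterministic function has no per-line error rate to compound.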
Gary is correct that the improvements with the "agents" are in the deterministic software "scaffolding" which hems in the probabilistic retrieval algorithms we call "large language models". Sometimes the results are good and useful, sometimes they're not (speaking from experience as you do as well). But the victory lap that this is an advancement in "neurosymbolic AI" is unearned. This is definitely not "neurosymbolic", this is just what the software industry has decided is necessary to channel the probabilistic retrieval outputs to something workable. As with previous attempts at incorporating these into software teams' tooling, any productivity improvements from them are offset by what happens when the outputs are crap or unhelpful, combined with the skill atrophy that comes from using them.
Super informative response for someone who’s been wondering where the purported step up in the quality of these coding tools suddenly came from, considering that all the flaws attributable to language models must still apply. Thank you for this!
I haven't written a real program for more than 25 years, but a "single function spanning 3,167 lines with 486 branch points and 12 levels of nesting" would have gotten me a "D" from my professor during my university days (1988).
I also don't understand why Gary is so elated about this function. The linked article is pretty scathing about the code quality and the work mentality at Anthropic which fuels such badly written code. Not that I am overly shocked by the revelation, mind you.
I can only support the author's feelings: "And if a 3,167-line function with 486 branch points is what “100% AI-written” looks like at the company building the future, the future needs better engineering. Not faster engineering. Better." Fully agree!
I have read the footnote, but as I am out of my depth AI-technology-wise, it's unclear to me why this 3k spaghetti code merits the big headline. But if you think this is a significant change of direction in this near-theological battle about the correct way to do AI, so be it.
I am not sure it does. Marcus has long been a believer in NeSys as a path to AGI and I think is overly excited about what seems like a very obvious application of it within Claude Code, which is neither a path to AGI nor a breakthrough. The field of AI has been very hybrid for a while now - Transformers themselves are a combination of Attention heads, MLPs and human labeled RL. Others like myself are skeptical of NeSys - https://huggingface.co/blog/dodgyQKVattention1/differentiable-intelligence-and-nesys
From the perspective of a programmer: it’s not magical. Rather I find it an interesting application of statistics. Kind of sums up the whole field of “AI” really (what a pretentious and misleading title for something that is basically mathematics).
Useful, time saving, is helping my arthritic fingers, but my goodness is it error prone. Must be watched like a hawk.
Well, no, Claude Code is not really magical, of course. But the fact that over-the-top-volume statistics-with-checks can be reasonably coherent is impressive, and that can give a human the impression that it is much more than what it really is. And having written a lot of code (mostly a long time ago), it makes for a really interesting experience.
I must say, I only use the absolute top configuration, because anything less than that has been so error-prone that it is useless except for the smallest, simplest scripts.
I think 'Approximate Intelligence' is a better term than 'Artificial Intelligence', because it isn't real intelligence, but the results are an approximation of what real intelligence would do (can be worthless, can be useful). And I wonder how much will be left when people will have to pay the actual cost+ (which will be several times what you pay for it now; my experiments cost me too much too fast, I can say).
"100% AI-written" might be technically true for the 'lines of code' but that is something else than 'independently written'. The code text may be 100% AI-*generated* (I am doing something like that in a serious experiment now) but my role in this is pretty key, even if I do not type code. All these queries I do to correct and steer it make "100% AI-written" a misleading statement for my stuff (and theirs too, I expect). "What is writing?", one should ask.
As far as my experience goes: as a programmer using this, you need to pay very careful attention, you sometimes have to be maddeningly exact in what you want, and even then you will get a working (but piss poor) solution, you are fixing wrong directions Claude Code takes a lot of the time, it introduces weird errors in stuff that was already OK, etc. But having said that, so far it is (apart from cost) not a negative experience yet.
@Gerben would you mind sharing what specifically your top config is? If my boss needs to force Claude Code upon me, at least I’d feel a bit better about it, knowing it’s (sort of) working for you :)
It is more interesting to use multiple agents across models. Within model families I find with C++ they generally converge on the same approach. But where it gets interesting is to see how Gemini vs Sonnet vs Opus vs Codex implements something. It isn't really about a "top config" at least in my workflow. It is guiding the agent towards what you want and preserving your guidance for future problems (like a top config). Take the print.ts file Marcus referenced--that is 5,600 lines of mostly code. At my work, we strive to have our software readable by humans and agents. An agent will happily produce a 1,000 line main function that works for the general case. But the human doing the same thing would probably break it up unless it was a throwaway problem simply because they instinctively know they'll be returning to it again.
To get started, **really** pin down the "plan." Have it written and reviewed by Opus high, Gemini, Codex. I've found if the plan is solid, Sonnet medium can produce pretty good code without a code review (yet, still not acceptable to me or my coworkers). I'll easily spend an hour to an hour and a half attempting to get a good plan that has been agent reviewed and human read (for the most part). If any of this doesn't make sense, just cut-and-paste into Gemini and have it create an "assignment" for you to work on. I find it mostly enjoyable.
I am using Claude Code with Opus 4.6 Extended in medium effort setting. I'm a bit wary about a higher effort setting for 'thinking' (not) models because of the risk of a model getting lost in the woods. But I'm definitely not the best source to ask, other people have much more experience. @oleg is probably a better source.
So "Producing working code patterns is not the same as programming/designing" is true, but may be good enough to (more or less) reliably spot problematic code areas and sequences? I would count this as a big win, especially with regard to turnaround times and possibly *reducing* the review load of developers.
Sadly it increases the review load - though it can be useful in that process too.
The fundamental issue is that they don’t understand anything and so will take alarming curveballs during both the authoring of code and the verification of it.
One of my favourite coding assistant anti patterns is the “fix the failing test pattern”. A test case is failing and you ask the AI to look into it. It might find the bug and fix it but equally it might (mistakenly) just decide that the test is wrong and change that instead. Test fixed.
Folk will talk about better prompts, guidelines etc. and they help - but fundamentally you have these tools that have no real understanding of what they are being asked to do generating large volumes of code in a probabilistic manner. And guess what - sometimes the dice roll is a one when you needed a 6!
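One deterministic guard against the "fix the failing test" pattern described above is to flag any assistant change that touches test files, so a human looks at it before it lands. This is a minimal sketch: the file-name patterns and the function name are hypothetical conventions, not something from the thread or from any particular tool.

```python
from fnmatch import fnmatch

# Hypothetical naming conventions for test files; adjust to the project.
TEST_PATTERNS = ("test_*.py", "*_test.py", "*.test.ts", "*.spec.ts")


def modified_tests(changed_paths: list[str]) -> list[str]:
    """Return the changed paths whose file name matches a test pattern."""
    flagged = []
    for path in changed_paths:
        # Compare only the file name, whatever the directory separator.
        name = path.replace("\\", "/").rsplit("/", 1)[-1]
        if any(fnmatch(name, pattern) for pattern in TEST_PATTERNS):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    changed = ["src/parser.py", "tests/test_parser.py"]
    flagged = modified_tests(changed)
    if flagged:
        print("Needs human review, test files were edited:", flagged)
```

It does not tell you whether the test or the code was wrong; it only makes sure the "test fixed" outcome cannot slip through unnoticed.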
"The fundamental issue is that they don’t understand anything": True, that did not change.
"just decide that the test is wrong and change that instead. Test fixed" So just like scientists cherry-picking the measurement results they like -> AGI achieved :-))
As a non-software engineer, I have to trust the experts, and as a fellow of the Cult-of-Linus, I have tremendous respect for the Linux core software engineers ;-)
That said, my first impression was that more AI-generated reviews would increase the load. This is true for the standalone project mentioned in the article (cURL), but the Linux kernel people seem to have found a way to handle the flood with improved tooling which again is AI-based (Shashiko, now with the Linux foundation). Go figure...
So the software Claude Code writes is still mediocre, but its hints for improving human-written code are worth following up on ?
PS: I like this quote: "Smaller projects, he implied, have far less capacity to absorb a sudden flood of plausible AI-generated bug reports and security findings – at least now they're real bugs and not garbage ones"
Sometimes it comes up with decent suggestions of rewrites or it automatically refactors in a good way. When that happens I am amazed. Those 'good approximations of intelligence' get mixed with some really drop dead stupid stuff.
They are not just growing in sophistication. Part of the sophistication is throwing insane amounts of compute at it far below cost, and handling the results of that (the 'dark tokens' issue). That is not sophistication, it is a way to throw more volume at it.
And the play? Doesn't the play (as analogy to scads of other interests and concerns) deliver what counts but is not counted; or if counting is "all there is," the counter has missed the complex and gradient import of the play itself?
puzzled by some of the reactions below. i wonder how many of the commenters read this line “Still, Claude Code ain’t perfect, or even close” or the footnote, complaining about the software engineering.
i was clear too that i see this as an important step, with many more to go.
This post comes across like you are amazed by one of the most poorly engineered parts of a sloppily vibe-coded LLM wrapper, as if it's some revelation.
There are tons of llm wrappers for coding. Do you think opencode, which is far superior to claude code in basically every way, is also the biggest advance in ai since the LLM?
It seems like either you are completely out of the loop when it comes to all the llm wrappers out there for coding, or you are, for some reason, glazing Anthropic undeservedly. If this post was instead in general about all the things that LLM coding wrappers do it would make way more sense.
We read it. To many, who recall that long ago you noted that when e.g. a calculator was accessed by an LLM platform it was (a) no longer a pure LLM and (b) scaling was not seen to be the holy grail. The question has never been whether only neural would be enough. It is what else needs to be integrated or bolted on. If you want the LLM to play great chess bolt on Deep Blue. Unfortunately, the history of symbolic AI produced nothing very useful. And we know why.
you obviously didn’t follow the hinton link, symbolic AI gets used everyday eg in web search and you but i am not advocating for it, and where were you for last quarter century when i was saying deep learning was not enough and getting hammered for it? 🙄
Gary, your proposed AGI model with cognitive and linguistic reasoning showed no understanding of why symbolic AI failed to deliver what smarter people than us believed it would, despite huge sums invested over more than half a century. Symbolic AI promised mountains for decades and delivered molehills. Ultraintelligence by 1980! 45 years later "it is helping with search." (Since 2023 LLM platforms integrated search, so neurosymbolic all along.) AI failed because cognitive reasoning could be a minor player. Look at the state of the world. Cognitive reasoning is not in high demand.
Deep learning made real progress in useful stuff: object recognition and categorization, speech recognition and translation, and so on. I worked on symbolic AI at MIT AI Lab in the late 70s and in or along with symbolic AI groups in the years since. We got nowhere. I worked on and alongside CYC with Doug, Guha, Gupta--incredibly brilliant people. Did CYC get used for anything useful in its 35+ year well-funded life? I don't know. Sure, symbolic can be used for a few things, but it failed at the promises, which then as now were the only things that justified the costs. JCR Licklider in his brilliant 1960 essay Man-Computer Symbiosis wrote "It seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone." Since his AI friends put ultraintelligence at 1980, he compressed the steps into 5 for making computers useful tools and then 15 to get speech recognition, language understanding, and ultraintelligence. He concluded, "The 15 may be 10 or 500, but those years should be intellectually the most creative and exciting in the history of mankind." It is now 60 years and counting, and not everyone is in a hurry to see it arrive.
DL is not enough for what though? AGI? Well what is AGI? what is general intelligence? How does it come together? Does it come together because we recursively reflect on experience with the conscious mind? The neurons in the brain are a bit of a mystery but they have absolute flexibility in learning - but the neurons in neural nets are much more limited. i understand and agree with being sceptical of absolutely absurd claims about AGI but at the same time i think the NeSys case is very uninspired, because it feels like we are attempting to stick some code to the weights of a model and hoping it will work. maybe i just haven't met the right DL engineer but i've yet to see a convincing case for why NeSys solves the DL issues with regard to AGI
my personal (and maybe even professional take) is that AGI will likely (if it is possible) be the result of advanced work in RL (Dreamer 4 for instance) or LeCun's work regarding JEPA (LeWorldModel etc)
I don't get the sudden euphoria. Claude Code has some if statements? Is that it? Most agents wrap LLM chatbots and can have some if statements. What's the big deal?
My own markdown files that instruct Claude Sonnet have plain English if statements. What's new in Claude Code that is so groundbreaking?
The article that Gary linked to doesn't sound like symbolic AI to me. Claude does some pattern matching on profane words, probably for efficiency, but it's not like this is the "secret sauce" that makes Claude Code good. Claude code is *just* an agentic front end that calls the LLM and organizes it. It is not the brains of the operation. Gary suddenly claiming vindication sounds like he has finally realized that LLMs are far, far more powerful than he could have ever imagined. It seems that he wants to claim to be right all along, so he says that a regex is the secret sauce to Claude Code's performance. I'm open to a counter argument, but as it is, Gary's claim is extremely weak.
All it took for him to go from "Claude sucks and everyone who keeps saying it doesn't is dumb" to "Claude is amazing" is a slight hint of neurosymbolism
I agree with the gist of your comment, but I think it's far too generous to Gary. It's stretching the meaning of neurosymbolism to the breaking point to call checking for swear words and navigating an API even a "slight hint of neurosymbolism". If that's neurosymbolism, then neurosymbolism has always been a part of LLMs from the very beginning. E.g. checking usernames and passwords when logging into ChatGPT.
Part of it is jumping on the "Claude Code is the biggest thing ever in the history of AI" out of literally nowhere, and without building a case for that other than just asserting it outright. (After months of silence or near silence on the topic.)
Part of it is the thin reed of "hey there's some actual code in Claude Code, not just LLM API call-outs, who'da thunk it, must be neurosymbolism so I was right all along". I hadn't known that ordinary software developers going about their day jobs are in fact neurosymbolic engineers!
Part of it is the overuse of classic hype/clickbait phrases "this changes everything", "biggest advance in AI since...", ...
And the last part that rankles me is that it seems to be derivative, based on secondary sources: I don't see any evidence of reading the leaked source code. It seems instead that the post is based on analysis in the linked article, https://techtrenches.dev/p/the-snake-that-ate-itself-what-claude, rather than doing the work of reading and analyzing the code. Which seems like a lazy approach for an article that, based on that code, declares "this changes everything".
This is a ridiculous takeaway from the Claude Code source leak. From a software perspective, the leak showed Anthropic’s engineering is a dumpster fire. Thousands upon thousands of lines of spaghetti code, redundant flows, wasteful paths, moronic decisions, endless hardcoding…
A PDF parser that ingests the entire file instead of just the first few bytes to check the header. A JPEG processor that may compress the same image dozens of separate times for a single job. A JSON validator that just reruns the model again and again until it’s close enough to the correct syntax. Desperate prompt engineering blurbs to counter prior vibe-coding errors. These were all in the Claude Code source leak.
It’s a terribly flawed model that needs a big ugly mess of non-AI duct tape plastered on top, praying it will mitigate the piss-poor performance inherent to all LLMs. How Marcus could see this and think it’s an “advance” or a “victory” is beyond me.
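For the PDF case mentioned above, checking the header only needs the first few bytes of the file. Here is a minimal sketch, assuming a strict check that the stream starts with the `%PDF-` marker; some real-world files carry junk before the marker, so a production check might scan a little further. The function name is made up, and this is not taken from the leaked code.

```python
import io
from typing import BinaryIO

PDF_MAGIC = b"%PDF-"  # PDF files start with this marker, e.g. b"%PDF-1.7"


def looks_like_pdf(stream: BinaryIO) -> bool:
    """Read only the first few bytes and compare them with the PDF marker."""
    return stream.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    # About 1 MB of padding after the header; only 5 bytes are consumed.
    fake_pdf = io.BytesIO(b"%PDF-1.7\n" + b"\x00" * 1_000_000)
    print(looks_like_pdf(fake_pdf), fake_pdf.tell())
```

The point is that the cost of the check does not grow with file size.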
This is interesting - I couldn't find the actual "complete source code" when I looked, only mirrors of the "source map" (which seemed to contain only 39k lines of typescript, rather than the purported 512k). This made me suspect it wasn't a genuine "leak" and was more of a deliberate PR stunt. But if it really did leak, I would not be at all surprised if what you're saying is true. Is there any way you can evidence the above? (You can PM me if it's easier.)
People have used the leaked source map to reconstruct the source code, all public copies of which I think have been DMCA’ed by Anthropic at this point.
Thanks for the reference. However, to be honest I'm still not convinced this leak actually happened. The thread you link to is indeed a long one - however, the majority of it seems to be a discussion about the GPL license, with only a few code snippets posted towards the top of the thread.
I think you may be following down a different thread. (The website’s UI seems very unintuitive for long chains.) If you follow down only the first author’s chain of replies (by @jonny), you can click the last reply and see the thread continues: https://neuromatch.social/@jonny/116325622817542311
Ok you got me - it's really very interesting in fact, now I look closer. Not to mention hilarious! The reason I suspected the leak was fake and a PR stunt was because as soon as it was reported the media started saying how amazing the code was, which I immediately didn't believe. Also I couldn't seem to find the code. And who has ever heard of a major tech company leaking their complete source code before? It's without precedent. But what I'm now thinking is: it seems these guys really have been using Claude to code Claude - which on its own is bonkers - and so this leak is no doubt just a symptom of their incredibly low quality development process. Claude leaked itself essentially. Hardly surprising given it's just a chatbot with no fundamental understanding of what it's saying or doing. I'd really love to see that code myself though - I'm intending to write an article about the inappropriateness of LLMs for software development, and the ammo I could get from that would be pure gold. (Or perhaps I'll try to get in touch with the dev who wrote that thread?) Anyway thanks for putting me straight on that - you've been a big help.
I’ve shared many of GM’s great posts with friends and colleagues - as a very healthy counter-balance to the frothy, self-promotional narrative pushed across the AI spectrum. But this post by Marcus is an ironic mirror image. CC 4.6 has been in our hands for many months - and NOW it’s brilliant bc it’s got an element of neurosymbolic makeup? Where have you been?
Gary: A classic case of the reading problem of "reading into" OR "overlooking" what "we" do or do not want to understand. Just so you (Gary) don't miss my point, I am referring to the oversights many undergo when keeping up with your work.
A technical name for the occurrence in my field and branch is to be involved in a "flight from understanding." Of course, we all miss a lot of meaning, especially in technical reading, which is why several readings are often necessary, even of our own work.
BTW, see my references to a "Language and Self-Presence" paper in my interchange above with Gerald Harris.
Very interesting. Then again, Anthropic plagiarized my uncle’s book, among many others.
But I doubt it explains this turn of events.
I’m happy for your vindication. It’s the vindication of Chomskyans over Skinnerians. Scaling is just another empty vessel theory: the idea that you pour in data and out come truth and reason just doesn’t work.
Stole is a more accurate term than plagiarized in the case of AIs trained on entire copyrighted works for the express purpose of profit by the AI companies.
Yeah, it’s an ugly reminder of who we are, even when we think we’re not. His book is devotional, so I doubt it was useful for coding. But the payout will be good for my uncle for the second it lasts.
CNP Slagle: I still don't think this, but I have found myself thinking about these people who do such things (and apparently with an unquestioned sense of privilege) as different species, or at least as historically regressive . . . so thoughtless and self-amazed/serving are they as to never step outside of themselves to understand or even think about, much less care about, what they are doing to other people in the world.
I am presuming, of course, that, like me, your uncle spent a few minutes, at least, writing his book and other works.
I do understand the sentiment and I am in no way a fan of the way LLMs are trained, but objectively scaling has worked and scaling is the first major breakthrough that led to the higher levels of performance in these models. It is hitting a point of slowing progress and now the real potential is clearly Fine-Tuning. But scaling has been one of the frontier discoveries in DL the last decade. That is an objective truth of the field.
Scaling has always been a technique to marginally improve model performance—DL by no means premiered the idea. It’s very easy though to construct problems (and most “AI” types fall in this category) which lie far outside that frontier.
Intelligence as we understand it has nothing to do with scaling. Children learn language with almost no examples. They create their own if withheld (a horror but such cases exist).
Teaching ChatGPT to ape human language shouldn’t require a trillion dollars. That’s not an advancement.
Are people better off because of the bubble? Not even high tech is safe—hundreds of thousands lose their jobs to machines that can’t replace them.
I will confess my own real fear: it isn’t that I believe the current AI craze really can replace me in the diverse set of problems that I can solve. It is that hiring managers and owners will believe that they can replace me. After all, if artificial intelligence could do everything that I can do, ostensibly, I could simply summon it to solve those problems so that I could work on other ones. I don’t believe that’s the case, even for a second. But you have to understand the complexity of the problems I can solve in order to know the difference. That is something that owners and hiring managers often can’t do.
The economy is in bad shape. We have no reason to believe that tools constructed by these companies will do anything other than cause harm. Resources are limited: a simple solution for transportation is high speed rail and better public transportation. Instead we are spending all of our money on driverless cars when we have barely scratched the surface, and can only manufacture cameras to spy on human drivers to torture them into peeing in bottles. This is just one example of how this technology has sarcastically helped us.
I wrote a piece on this which labels Gary Marcus as one of the four Horsemen of the apocalypse. Not a great metaphor, but a metaphor. There are more examples like this. The four books I review are very good.
Yes and I think its impact on coding will be profound. What I can't understand is why so many people are still insisting that we're about to get asteroid mining or unlimited clean energy or radical life extension - those are not problems of code
Yes, Claude is impressive - it is helping us both in design and coding.
And, yes, the logic scaffolding is crucial to this breakthrough.
But, 'The Best of both Worlds?' -- hardly.
From my article:
"As pure scaling of Generative AI is yielding diminishing returns (ref)(ref)(ref), we’re increasingly seeing efforts to add symbolic and other systems to LLMs (ref)(ref)(ref). These add-ons may take the form of reasoning engines, knowledge graphs, or world models (ref)(ref)(ref). These additions are obvious incremental improvements that go beyond just RAG.
While these efforts can clearly help overcome some of the limitations of LLMs for certain applications, not only do they require specific custom setup and engineering, they also suffer several inherent limitations and thus cannot be the right path towards true fluid, adaptive human-level AI or AGI."
"In a way these hybrid systems offer the worst of both worlds: The brittleness of symbolic AI, the hallucination and massive data and compute requirements of Generative AI, plus the inability of these back-prop systems to learn incrementally in real-time, i.e. to update their core model."
Good to see you being proved right Gary.👏
feels good, damn good :)
I love your threads, but being overly excited about being proved right about something that makes bad outcomes seem more likely and sooner is a little much. I appreciate you have been dismissed by many and are being vindicated, but focus on impacts on humanity and the world?
my view is that we can’t possibly build alignment without neurosymbolic approaches, so it is actually a step in the right direction
My view is that we can’t possibly build alignment if we can’t even all agree on a proper (specific) definition.
AI Alignment according to whom?
Some undoubtedly consider alignment “that which benefits them” to the exclusion of others.
If the endless warring between different human groups teaches anything it is that humans can’t all even align among themselves.
How can we expect to achieve AI alignment with humans when everyone has a different idea of what that means?
yes i worry about this too
Alignment that assumes all human beings are intrinsically equal. No racial supremacy is a great starter. Most of the issues you raise are eliminated right there.
I think you are being unintentionally dishonest.
Alignment should be less impossible than you are inferring. A lot less.
The maximum sustainable prosperity for all, the minimization of suffering for all.
Most of the time, there is just the correct position.
Look at the world happiness report if you want further tunnelling in this direction.
Cool..fingers crossed
Gary: Changing concepts to say the same thing in order to claim originality for it, which has apparently also occurred, is probably as old as language itself.
I am reminded (again) of Socrates' attitude that, in lieu of honoring vindictive feelings, one should thank those who rightly corrected one's error or who thought of it first. (I wouldn't hold my breath, if I were you.)
congrats 🎉
By the existence of a thicket of `if` statements?
Better than a thicket of goto’s, but that ain’t saying much
GenAI has been embedded in all sorts of symbolic logic almost from the start. For instance, the way that they checked for 'bad stuff' in GPT-3 was based on simple string matching, funnily enough: including on their own generated text (so you would get a reply by GPT that was marked as 'potentially in conflict with OpenAI's rules...'). I labeled such things at the time as 'AoD' (Admission of Defeat) for the pure GenAI/scaling approach (we're talking 2023 here). These days a tool like Claude Code mixes all kinds of elements which makes its code patterns very usable, but as I have been finding out while using it on its best setting (Opus Extended), it may be able to code, but that doesn't mean it is able to program very well (which things like a single function of >3000 lines illustrate).
Claude Code is magical from the perspective of a programmer, and there are certainly situations where this is going to improve productivity, but it still has no understanding of what it is doing, neurosymbolic or not. And the way this is exploding in compute gives quite a few warnings about how far this approach really scales. Producing working code patterns is not the same as programming/designing, and the symbolic parts are for instance geared to produce 'anything that works' by preventing wrong stuff from reaching the user (a filter/feedback loop that cleans up after very wasteful generation).
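The filter/feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fake_model`, `check_syntax` and `generate_until_valid` are hypothetical names, the "model" is a stub, and the only deterministic check is a syntax parse. It is not a description of how Claude Code is actually built.

```python
import ast
from typing import Callable, Optional, Tuple


def check_syntax(source: str) -> Optional[str]:
    """Deterministic filter: return an error message, or None if the code parses."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def generate_until_valid(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Tuple[Optional[str], int]:
    """Generate, filter, and feed errors back until a candidate passes the check."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt + feedback)
        error = check_syntax(candidate)
        if error is None:
            return candidate, attempt  # only passing output reaches the user
        feedback = f"\nPrevious attempt failed: {error}"
    return None, max_attempts


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model: wrong first, right after feedback."""
    if "Previous attempt failed" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b) return a + b\n"


code, attempts = generate_until_valid(fake_model, "Write add(a, b).")
print(attempts)  # the stub passes on its second attempt
```

Note what the loop does and does not buy: the user never sees the failed first attempt, but the generation was still paid for, and a candidate that parses is not thereby a good design.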
The link you shared is worthwhile to read in full: https://techtrenches.dev/p/the-snake-that-ate-itself-what-claude
The writing style of that post (which is familiar in this software world of ours) is so annoying. I hate to make this kind of complaint, but those short, punchy sentences: Sometimes four words long. They're irksome. "The story isn't the leak. It's the code." Yeah, sounds like this guy had a fun time chatting with a text extruder to get this shit out into the world. The fact that they're "writing" this software with their own tools and that tool use likely resulted in the leak is a story in and of itself.
I don't think it's right to call this "neurosymbolic", nor am I really impressed with these ~agentic coding~ tools. Yes, being trained on all the code that the LLM vendors could gobble up, along with tons and tons of code that was commissioned by these companies for correctness, efficiency, and style, it isn't surprising that they'll be able to have workable outputs (release the training data, y'all). They are after all *language models* and computer programs are written in programming *languages*. I think one of the reasons people are so amazed by the fact that language models can output working code at all is because computer programming as a field has been over-hyped in terms of the cognitive ability you need to do it. Programmers are smart fellers, that's why they're paid the big bucks, right? It never really occurred to anyone that you don't have to be that smart to do it and that most programmers aren't really that smart to begin with. Most programming problems in a business context follow the same patterns. "I need a REST API with the following endpoints, and a UI that uses these endpoints." The fact that LLMs can generate programs of this type speaks for how statistically regular these problems are. What happens when you feed in problems that aren't well-represented in the training configuration? It outputs crap.
Outputting lots of potentially dodgy text, feeding them to deterministic tools (linters, syntax checkers, etc.) to check for code quality, and then feeding those results back into the model when things go wrong to tell you (based on the immense training corpora which contains these exact types of back and forth), well, that's a close approximation of what we do when we write computer programs. Does this work for any other domain? In math, which is another area where such tools exist, it doesn't seem like there's been a sudden speed-up in discovery from brute forcing examples of mathematical logic in the training corpora. I've got a bluesky mutual who's been spending some months using this stuff to try and automate some accounting/assurance drudgery, and it doesn't seem like he's very close to releasing a useful product (and by his own admission the reliability of models at simple line-item depreciation is only 96%, which sounds great, until you consider what one 9 reliability actually means in practice -- wouldn't it be smarter to have the bot write code that just does this deterministically, instead of relying on fuzzy model outputs?)
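To put numbers on the 96% point, here is a small Python sketch. It rests on assumptions not in the comment above: errors are treated as independent across line items, the depreciation method is straight-line, and the function names are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def straight_line_depreciation(
    cost: Decimal, salvage: Decimal, useful_life_years: int
) -> Decimal:
    """Annual straight-line depreciation, rounded to cents. Same input, same output."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    annual = (cost - salvage) / useful_life_years
    return annual.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def chance_all_correct(per_item_accuracy: float, n_items: int) -> float:
    """Probability that every item is right, assuming independent errors."""
    return per_item_accuracy ** n_items


annual = straight_line_depreciation(Decimal("10000"), Decimal("1000"), 5)
p50 = chance_all_correct(0.96, 50)
print(annual)         # 1800.00
print(round(p50, 2))  # about 0.13
```

Under those assumptions, a schedule with 50 line items comes out fully correct only about 13% of the time at 96% per-item accuracy, while the deterministic function is right every time for the method it implements.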
Gary is correct that the improvements with the "agents" are in the deterministic software "scaffolding" which hems in the probabilistic retrieval algorithms we call "large language models". Sometimes the results are good and useful, sometimes they're not (speaking from experience as you do as well). But the victory lap that this is an advancement in "neurosymbolic AI" is unearned. This is definitely not "neurosymbolic", this is just what the software industry has decided is necessary to channel the probabilistic retrieval outputs to something workable. As with previous attempts at incorporating these into software teams' tooling, any productivity improvements from them are offset by what happens when the outputs are crap or unhelpful, combined with the skill atrophy that comes from using them.
Terrific post, M.E. Black.
Super informative response for someone who’s been wondering where the purported step up in the quality of these coding tools suddenly came from, considering that all the flaws attributable to language models must still apply. Thank you for this!
I haven't written a real program for more than 25 years, but a "single function spanning 3,167 lines with 486 branch points and 12 levels of nesting" would have gotten me a "D" from my professor during my university days (1988).
I also don't understand why Gary is so elated about this function; the linked article is pretty scathing about the code quality and the work mentality at Anthropic which fuels such badly written code. Not that I am overly shocked by the revelation, mind you.
I only can support the author's feelings: "And if a 3,167-line function with 486 branch points is what “100% AI-written” looks like at the company building the future, the future needs better engineering. Not faster engineering. Better." Fully agree!
i am not elated by the function per se; see the footnote - i am elated by the travel away from groupthink
I have read the footnote, but as I am out of my depth AI-technology-wise, it's unclear to me why this 3k-line spaghetti code merits the big headline. But if you think this is a significant change of direction in this near-theological battle about the correct way to do AI, so be it.
When AIs make it, it’s spagAItti code.
Sometimes called LLMguini code
AI-talian chefs excel at those dishes
I am not sure it does. Marcus has long been a believer in NeSys as a path to AGI and I think is overly excited about what seems like a very obvious application of it within Claude Code, which is neither a path to AGI nor a breakthrough. The field of AI has been very hybrid for a while now - Transformers themselves are a combination of Attention heads, MLPs and human labeled RL. Others like myself are skeptical of NeSys - https://huggingface.co/blog/dodgyQKVattention1/differentiable-intelligence-and-nesys
“a single function spanning 3,167 lines with 486 branch points and 12 levels of nesting"
I pity the human who has to maintain Claude’s code
Or maybe ChatGPT will maintain it?
If the latter Is the case, I PT G.
From the perspective of a programmer: it’s not magical. Rather I find it an interesting application of statistics. Kind of sums up the whole field of “AI” really (what a pretentious and misleading title for something that is basically mathematics).
Useful, time saving, is helping my arthritic fingers, but my goodness is it error prone. Must be watched like a hawk.
Well, no Claude Code is not really magical, of course. But that that over-the-top-volume statistics-with-checks can be reasonably coherent is impressive and that can give a human the impression that it is much more than what it really is. And having written a lot (mostly a long time ago) it makes for a real interesting experience.
I must say, I only use the absolute top configuration, because anything less than that has been so error-prone that it is useless except for the smallest, simplest scripts.
I think 'Approximate Intelligence' is a better term than 'Artificial Intelligence', because it isn't real intelligence, but the results are an approximation of what real intelligence would do (can be worthless, can be useful). And I wonder how much will be left when people will have to pay the actual cost+ (which will be several times what you pay for it now; my experiments cost me too much too fast, I can say).
"100% AI-written" might be technically true for the 'lines of code' but that is something else than 'independently written'. The code text may be 100% AI-*generated* (I am doing something like that in a serious experiment now) but my role in this is pretty key, even I do not type code. All these queries I do, to correct and steer it makes "100% AI-written" a misleading statement for my stuff (and theirs too, I expect). "What is writing?", one should ask.
As far as my experience goes: as a programmer using this, you need to pay very careful attention, you sometimes have to be maddeningly exact in what you want, and even then you will get a working (but piss poor) solution, you are fixing wrong directions Claude Code takes a lot of the time, it introduces weird errors in stuff that was already OK, etc. But having said that, so far it is (apart from cost) not a negative experience yet.
@Gerben would you mind sharing what specifically your top config is? If my boss needs to force Claude code upon me, at least I’d feel a bit better about it, knowing its (sort of) working for you :)
It is more interesting to use multiple agents across models. Within model families I find with C++ they generally converge on the same approach. But where it gets interesting is to see how Gemini vs Sonnet vs Opus vs Codex implements something. It isn't really about a "top config" at least in my workflow. It is guiding the agent towards what you want and preserving your guidance for future problems (like a top config). Take the print.ts file Marcus referenced--that is 5,600 lines of mostly code. At my work, we strive to have our software readable by humans and agents. An agent will happily produce a 1,000 line main function that works for the general case. But the human doing the same thing would probably break it up unless it was a throwaway problem simply because they instinctively know they'll be returning to it again.
To get started, **really** pin down the "plan." Have it written and reviewed by Opus high, Gemini, Codex. I've found if the plan is solid, Sonnet medium can produce pretty good code without a code review (yet, still not acceptable to me or my coworkers). I'll easily spend an hour to an hour and a half attempting to get a good plan that has been agent reviewed and human read (for the most part). If any of this doesn't make sense, just cut-and-paste into Gemini and have it create an "assignment" for you to work on. I find it mostly enjoyable.
I am using Claude Code with Opus 4.6 Extended in medium effort setting. I'm a bit wary about a higher effort setting for 'thinking' (not) models because of the risk of a model getting lost in the woods. But I'm definitely not the best source to ask, other people have much more experience. @oleg is probably a better source.
* is mathematics
Being a big skeptic about everything these AI companies tout, but not a software engineer either, I just read this: "AI bug reports went from junk to legit overnight, says Linux kernel czar" (https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_kernel/).
So "Producing working code patterns is not the same as programming/designing" is true, but may be good enough to (more or less) reliably spot problematic code areas and sequences? I would count this as a big win, especially with regard to turnaround times and possibly *reducing* the review load of developers.
Sadly it increases the review load - though it can be useful in that process too.
The fundamental issue is that they don’t understand anything and so will take alarming curveballs during both the authoring of code and the verification of it.
One of my favourite coding assistant anti patterns is the “fix the failing test pattern”. A test case is failing and you ask the AI to look into it. It might find the bug and fix it but equally it might (mistakenly) just decide that the test is wrong and change that instead. Test fixed.
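One cheap guard against this anti-pattern is to flag any change that touches test files when the task was to fix the code under test. A minimal Python sketch, assuming pytest-style naming (`test_*.py`, `*_test.py`, a `tests/` directory); the function names and example paths are hypothetical:

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_test_file(path: str) -> bool:
    """Heuristic: pytest-style file names, or any file under a tests/ directory."""
    p = PurePosixPath(path)
    in_tests_dir = "tests" in p.parts[:-1]
    return p.name.startswith("test_") or p.name.endswith("_test.py") or in_tests_dir


def touched_tests(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that look like tests, for a human to review."""
    return [path for path in changed_paths if is_test_file(path)]


# Hypothetical change set produced by an assistant asked to "fix the failing test".
changed = ["src/calc.py", "tests/test_calc.py"]
flagged = touched_tests(changed)
print(flagged)  # ['tests/test_calc.py']
```

A reviewer or a CI step can then insist on a human look at anything in the flagged list. It does not stop the tool from changing a test, it only makes sure someone notices.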
Folk will talk about better prompts, guidelines etc. and they help - but fundamentally you have these tools that have no real understanding of what they are being asked to do generating large volumes of code in a probabilistic manner. And guess what - sometimes the dice roll is a one when you needed a 6!
An even more fundamental issue is that many of the people using the bots don’t understand that the bots don’t understand anything.
It’s “no understanding” all the way down
"The fundamental issue is that they don’t understand anything": True, that did not change.
"just decide that the test is wrong and change that instead. Test fixed" So just like scientists cherry-picking the measurement results they like -> AGI achieved :-))
As a non-software engineer, I have to trust the experts, and as a fellow of the Cult-of-Linus, I have tremendous respect for the Linux core software engineers ;-)
That said, my first impression was that more AI-generated reviews would increase the load. This is true for the standalone project mentioned in the article (cURL), but the Linux kernel people seem to have found a way to handle the flood with improved tooling which again is AI-based (Shashiko, now with the Linux foundation). Go figure...
So the software Claude Code writes is still mediocre, but its hints for improving human-written code are worth following up on ?
PS: I like this quote: "Smaller projects, he implied, have far less capacity to absorb a sudden flood of plausible AI-generated bug reports and security findings – at least now they're real bugs and not garbage ones"
Sometimes it comes up with decent suggestions of rewrites or it automatically refactors in a good way. When that happens I am amazed. Those 'good approximations of intelligence' get mixed with some really drop dead stupid stuff.
Oleg: Building a bigger stage does not automatically make for a better play or better actors?
The empirical accumulation of "lots" still equates to "a bigger stage"?
They are not just growing in sophistication. Part of the sophistication is throwing insane amounts of compute at it far below cost, and handling the results of that (the 'dark tokens' issue). That is not sophistication, it is ways to throw more volume at it.
And the play? Doesn't the play (as analogy to scads of other interests and concerns) deliver what counts but is not counted; or if counting is "all there is," the counter has missed the complex and gradient import of the play itself?
puzzled by some of the reactions below. i wonder how many of the commenters read this line “Still, Claude Code ain’t perfect, or even close” or the footnote, complaining about the software engineering.
i was clear too that i see this as an important step, with many more to go.
This post comes across like you are amazed by one of the most poorly engineered parts of a sloppily vibe-coded LLM wrapper, as if it's some revelation.
There are tons of llm wrappers for coding. Do you think opencode, which is far superior to claude code in basically every way, is also the biggest advance in ai since the LLM?
It seems like either you are completely out of the loop when it comes to all the llm wrappers out there for coding, or you are, for some reason, glazing Anthropic undeservedly. If this post was instead in general about all the things that LLM coding wrappers do it would make way more sense.
We read it. To many, who recall that long ago you noted that when e.g. a calculator was accessed by an LLM platform it was (a) no longer a pure LLM and (b) scaling was not seen to be the holy grail. The question has never been whether only neural would be enough. It is what else needs to be integrated or bolted on. If you want the LLM to play great chess bolt on Deep Blue. Unfortunately, the history of symbolic AI produced nothing very useful. And we know why.
you obviously didn’t follow the hinton link, symbolic AI gets used everyday eg in web search and you but i am not advocating for it, and where were you for last quarter century when i was saying deep learning was not enough and getting hammered for it? 🙄
Gary, your proposed AGI model with cognitive and linguistic reasoning showed no understanding of why symbolic AI failed to deliver what smarter people than us believed it would, despite huge sums invested over more than half a century. Symbolic AI promised mountains for decades and delivered molehills. Ultraintelligence by 1980! 45 years later "it is helping with search." (Since 2023 LLM platforms integrated search, so neurosymbolic all along,) AI failed because cognitive reasoning could be a minor player. Look at the state of the world. Cognitive reasoning is not in high demand.
Deep learning made real progress in useful stuff: object recognition and categorization, speech recognition and translation, and so on. I worked on symbolic AI at MIT AI Lab in the late 70s and in or along with symbolic AI groups in the years since. We got nowhere. I worked on and alongside CYC with Doug, Guha, Gupta--incredibly brilliant people. Did CYC get used for anything useful in its 35+ year well-funded life? I don't know. Sure, symbolic can be used for a few things, but it failed at the promises, which then as now were the only things that justified the costs. JCR Licklider in his brilliant 1960 essay Man-Computer Symbiosis wrote "It seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone." Since his AI friends put ultraintelligence at 1980, he compressed the steps into 5 for making computers useful tools and then 15 to get speech recognition, language understanding, and ultraintelligence. He concluded, "The 15 may be 10 or 500, but those years should be intellectually the most creative and exciting in the history of mankind." It is now 60 years and counting, and not everyone is in a hurry to see it arrive.
rebooting AI has an extensive critique of symbolic AI
i am going to exit the conversation until you base your arguments on facts
DL is not enough for what though? AGI? Well what is AGI? what is general intelligence? How does it come together? Does it come together because we recursively reflect on experience with the conscious mind? The neurons in the brain are a bit of a mystery but they have absolute flexibility in learning - but the neurons in neural nets are much more limited. i understand and agree with being sceptical of absolutely absurd claims about AGI but at the same time i think the NeSys case is very uninspired because it feels like we are attempting to stick some code to the weights of a model and hoping it will work. maybe i just haven't met the right DL engineer but i've yet to see a convincing case for why NeSys solves the DL issues with regard to AGI
my personal (and maybe even professional take) is that AGI will likely (if it is possible) be the result of advanced work in RL (Dreamer 4 for instance) or LeCun's work regarding JEPA (LeWorldModel etc)
I don't get the sudden euphoria. Claude Code has some if statements? Is that it? Most agents wrap LLM chatbots and can have some if statements. What's the big deal?
My own markdown files that instruct Claude Sonnet have plain English if statements. What's new in Claude Code that is so groundbreaking?
The article that Gary linked to doesn't sound like symbolic AI to me. Claude does some pattern matching on profane words, probably for efficiency, but it's not like this is the "secret sauce" that makes Claude Code good. Claude code is *just* an agentic front end that calls the LLM and organizes it. It is not the brains of the operation. Gary suddenly claiming vindication sounds like he has finally realized that LLMs are far, far more powerful than he could have ever imagined. It seems that he wants to claim to be right all along, so he says that a regex is the secret sauce to Claude Code's performance. I'm open to a counter argument, but as it is, Gary's claim is extremely weak.
All it took for him to go from "Claude sucks and everyone who keeps saying it doesn't is dumb" to "Claude is amazing" is a slight hint of neurosymbolism
I agree with the gist of your comment, but I think it's far too generous to Gary. It's stretching the meaning of neurosymbolism to the breaking point to call checking for swear words and navigating an API even a "slight hint of neurosymbolism". If that's neurosymbolism, then neurosymbolism has always been a part of LLMs from the very beginning. E.g. checking usernames and passwords when logging into ChatGPT.
i didn’t say claude writ large was great; i said that the specific application to code was impressive. which by all accounts it is
I had the same thoughts. Gary should explore his feelings - the desire to be right vs. the truth.
Fully agree with your take here.
Something feels deeply off about this post.
Part of it is jumping on the "Claude Code is the biggest thing ever in the history of AI" out of literally nowhere, and without building a case for that other than just asserting it outright. (After months of silence or near silence on the topic.)
Part of it is the thin reed of "hey there's some actual code in Claude Code, not just LLM API call-outs, who'da thunk it, must be neurosymbolism so I was right all along". I hadn't known that ordinary software developers going about their day jobs are in fact neurosymbolic engineers!
Part of it is the overuse of classic hype/clickbait phrases "this changes everything", "biggest advance in AI since...", ...
And the last part that rankles me is that it seems to be derivative, based on secondary sources: I don't see any evidence of reading the leaked source code. It seems instead that the post is based on analysis in the linked article, https://techtrenches.dev/p/the-snake-that-ate-itself-what-claude, rather than doing the work of reading and analyzing the code. Which seems like a lazy approach for an article that, based on that code, declares "this changes everything".
The post is probably co-written with AI and yes the AI understanding on display is a bit questionable.
This is a ridiculous takeaway from the Claude Code source leak. From a software perspective, the leak showed Anthropic’s engineering is a dumpster fire. Thousands upon thousands of lines of spaghetti code, redundant flows, wasteful paths, moronic decisions, endless hardcoding…
A PDF parser that ingests the entire file instead of just the first few bytes to check the header. A JPEG processor that may compress the same image dozens of separate times for a single job. A JSON validator that just reruns the model again and again until it’s close enough to the correct syntax. Desperate prompt engineering blurbs to counter prior vibe-coding errors. These were all in the Claude Code source leak.
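For contrast, the cheap versions of two of those checks look roughly like this in Python. This is an illustration of the general technique, not a reconstruction of anything in the leaked code: the function names are hypothetical, the PDF check is deliberately strict (some readers tolerate a few leading bytes before the marker, which this sketch does not), and the JSON check only answers "valid or not", it does not repair anything.

```python
import json
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # a PDF file begins with this marker, followed by the version


def looks_like_pdf(path: str) -> bool:
    """Check the header by reading only the first few bytes of the file."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


def is_valid_json(text: str) -> bool:
    """Deterministic syntax check: parse once, no model call involved."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def _write_temp(data: bytes) -> str:
    """Helper for the demo: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Demo: a large fake "PDF" and a plain text file; only 5 bytes of each are read.
pdf_path = _write_temp(b"%PDF-1.7\n" + b"0" * 1_000_000)
txt_path = _write_temp(b"hello, not a pdf")
pdf_ok = looks_like_pdf(pdf_path)
txt_ok = looks_like_pdf(txt_path)
os.remove(pdf_path)
os.remove(txt_path)
print(pdf_ok, txt_ok)  # True False
```

The cost of the header check is independent of file size, and the JSON check gives the same answer every time for the same input.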
It’s a terribly flawed model that needs a big ugly mess of non-AI duct tape plastered on top, praying it will mitigate the piss-poor performance inherent to all LLMs. How Marcus could see this and think it’s an “advance” or a “victory” is beyond me.
💯
This is interesting - I couldn't find the actual "complete source code" when I looked, only mirrors of the "source map" (which seemed to contain only 39k lines of typescript, rather than the purported 512k). This made me suspect it wasn't a genuine "leak" and was more of a deliberate PR stunt. But if it really did leak, I would not be at all surprised if what you're saying is true. Is there any way you can evidence the above? (You can PM me if it's easier.)
People have used the leaked source map to reconstruct the source code, all public copies of which I think have been DMCA’ed by Anthropic at this point.
Here is an analysis I liked from a software engineering perspective: https://neuromatch.social/@jonny/116324676116121930 (it’s a very long thread)
Thanks for the reference. However, to be honest I'm still not convinced this leak actually happened. The thread you link to is indeed a long one - however, the majority of it seems to be a discussion about the GPL license, with only a few code snippets posted towards the top of the thread.
I think you may be following down a different thread. (The website’s UI seems very unintuitive for long chains.) If you follow down only the first author’s chain of replies (by @jonny), you can click the last reply and see the thread continues: https://neuromatch.social/@jonny/116325622817542311
Ok you got me - it's really very interesting in fact, now I look closer. Not to mention hilarious! The reason I suspected the leak was fake and a PR stunt was because as soon as it was reported the media started saying how amazing the code was, which I immediately didn't believe. Also I couldn't seem to find the code. And who has ever heard of a major tech company leaking their complete source code before? It's without precedent. But what I'm now thinking is: it seems these guys really have been using Claude to code Claude - which on its own is bonkers - and so this leak is no doubt just a symptom of their incredibly low quality development process. Claude leaked itself essentially. Hardly surprising given it's just a chatbot with no fundamental understanding of what it's saying or doing. I'd really love to see that code myself though - I'm intending to write an article about the inappropriateness of LLMs for software development, and the ammo I could get from that would be pure gold. (Or perhaps I'll try to get in touch with the dev who wrote that thread?) Anyway thanks for putting me straight on that - you've been a big help.
He Protests Too Much
I’ve shared many of GM’s great posts with friends and colleagues - as a very healthy counter-balance to the frothy, self-promotional narrative pushed across the AI spectrum. But this post by Marcus is an ironic mirror image. CC 4.6 has been in our hands for many months - and NOW it’s brilliant bc it’s got an element of neurosymbolic makeup? Where have you been?
i didn’t say it’s brilliant; i said it’s a real advance, and that it was flawed and not AGI.
it’s the first thing that really impressed me in a few years.
Gary: A classic case of the reading problem of "reading into" OR "overlooking" what "we" do or do not want to understand. Just so you (Gary) don't miss my point, I am referring to the oversights many undergo when keeping up with your work.
A technical name for the occurrence in my field and branch is to be involved in a "flight from understanding." Of course, we all miss a lot of meaning, especially in technical reading, which is why several readings are often necessary, even of our own work.
BTW, see my references to a "Language and Self-Presence" paper in my interchange above with Gerald Harris.
Personally, I praise Claude at every meal:
Claude is good, Claude is great. We thank It for our food. Amen
This has to be at least in part satire. A giant if-then conditional is just basic 100-level logical, deterministic programming.
And weirdly enough, it's still good enough for lots of use cases 😂
And the #AGIAMania Lie turned it into a multi billionaire enterprise 🫠
Very interesting. Then again, anthropic plagiarized my uncle’s book, among many others.
But I doubt it explains this turn of events.
I’m happy for your vindication. It’s the vindication of Chomskyans over Skinnerians. Scaling is just another empty vessel theory—pour in data and out comes truth and reason just doesn’t work.
I’m eager to see the next step.
they ripped off my books too and i have been very negative about them in several other essays. on the whole i am mixed.
Yes—I think you mentioned that in Taming Silicon Valley.
I reviewed it and three other authors’ works in my piece:
https://scirepopulumetpotentiam.info/2026/01/07/dystopia-and-the-four-horsemen/
CNP Slagle: I literally cringe when I hear the word: "PLAGIARIZED."
Stole is a more accurate term than plagiarized in the case of AIs trained on entire copyrighted works for the express purpose of profit by the AI companies.
Yeah, it’s an ugly reminder of who we are, even when we think we’re not. His book is devotional, so I doubt it was useful for coding. But the payout will be good for my uncle for the second it lasts.
CNP Slagle: I still don't think this, but I have found myself thinking about these people who do such things (and apparently with an unquestioned sense of privilege) as different species, or at least as historically regressive . . . so thoughtless and self-amazed/serving are they as to never step outside of themselves to understand or even think about, much less care about, what they are doing to other people in the world.
I am presuming, of course, that, like me, your uncle spent a few minutes, at least, writing his book and other works.
I do understand the sentiment and I am in no way a fan of the way LLMs are trained, but objectively scaling has worked and scaling is the first major breakthrough that led to the higher level performances in these models. It is hitting a point of slowing progress and now the real potential is clearly Fine-Tuning. But scaling has been one of the frontier discoveries in DL the last decade. That is an objective truth of the field.
Scaling has always been a technique to marginally improve model performance—DL by no means premiered the idea. It’s very easy though to construct problems (and most “AI” types fall in this category) which lie far outside that frontier.
Intelligence as we understand it has nothing to do with scaling. Children learn language with almost no examples. They create their own if withheld (a horror but such cases exist).
Teaching ChatGPT to ape human language shouldn’t require a trillion dollars. That’s not an advancement.
Are people better off because of the bubble? Not even high tech is safe—hundreds of thousands lose their jobs to machines that can’t replace them.
I will confess my own real fear: it isn’t that I believe the current AI craze really can replace me in the diverse set of problems that I can solve. It is that hiring managers and owners will believe that they can replace me. After all, if artificial intelligence could do everything that I can do, ostensibly, I could simply summon it to solve those problems so that I could work on other ones. I don’t believe that’s the case, even for a second. But you have to understand the complexity of the problems I can solve in order to know the difference. That is something that owners and hiring managers often can’t do.
The economy is in bad shape. We have no reason to believe that tools constructed by these companies will do anything other than cause harm. Resources are limited: a simple solution for transportation is high-speed rail and better public transportation. Instead we are spending all of our money on driverless cars, when we have barely scratched the surface and can only manufacture cameras that spy on human drivers and torture them into peeing in bottles. This is just one example of how this technology has, sarcastically speaking, helped us.
I wrote a piece on this which labels Gary Marcus as one of the Four Horsemen of the Apocalypse. Not a great metaphor, but a metaphor. There are more examples like this. The four books I review are very good.
https://scirepopulumetpotentiam.info/2026/01/07/dystopia-and-the-four-horsemen/
It really hurts to have to write this, but this article reads like one of those Trump victory announcements.
Yes, and I think its impact on coding will be profound. What I can't understand is why so many people are still insisting that we're about to get asteroid mining or unlimited clean energy or radical life extension. Those are not problems of code.
Okay, we get it. Eat a piece of cake, enjoy a toast of champagne but is it all about you?
If his theories are validated by the existence of an overgrown jungle of `if` statements...they must be some theories.
Full disclosure Gary: how much is Anthropic paying you?
nothing. i don’t work for them; i write what i think.
I don't see why you'd lie about how great Claude is without getting paid for it.
Yes, Claude is impressive - it is helping us both in design and coding.
And, yes, the logic scaffolding is crucial to this breakthrough.
But, 'The Best of both Worlds?' -- hardly.
From my article:
"As pure scaling of Generative AI is yielding diminishing returns (ref)(ref)(ref), we’re increasingly seeing efforts to add symbolic and other systems to LLMs (ref)(ref)(ref). These add-ons may take the form of reasoning engines, knowledge graphs, or world models (ref)(ref)(ref). These additions are obvious incremental improvements that go beyond just RAG.
While these efforts can clearly help overcome some of the limitations of LLMs for certain applications, not only do they require specific custom setup and engineering, they also suffer several inherent limitations and thus cannot be the right path towards true fluid, adaptive human-level AI or AGI."
"In a way these hybrid systems offer the worst of both worlds: The brittleness of symbolic AI, the hallucination and massive data and compute requirements of Generative AI, plus the inability of these back-prop systems to learn incrementally in real-time, i.e. to update their core model."
https://petervoss.substack.com/p/why-neuro-symbolic-must-be-integrated
I wouldn't wish to lose either my deductive or my inductive reasoning capabilities. (Maybe that's just me.)