182 Comments
John Hack's avatar

I've got decades of experience in enterprise software development, across many paradigms and platforms. AI has become indispensable in my work, and AI can significantly amplify the productivity of a developer who knows what they're doing, but cannot deliver working code for non-trivial apps, much less develop production-quality, maintainable apps.

Vibe coding reminds me of the guys in the 90's who wrote a few VB macros in Excel and thought they were ready to start coding financial systems software.

Geoff Anderson's avatar

Hey man, I resemble that remark...

(Physics degree here, and I have written some code, but it is not anything you would want near a production system. It was awful, and while it worked to get the results I wanted, it sure wasn't pretty)

Adam's avatar

This! I have an EECS undergraduate degree and have been writing code for 25+ years. It has absolutely made me faster and more productive. But I use it in very targeted ways.

I think of AI as akin to my old TI-92 calculator. It’s a helpful tool. It doesn’t replace understanding the math.

Paweł's avatar

Yes, comparing vibe coding to non-IT guys writing VBA macros is exactly my thought. It's just the next chapter of the same story. On the other hand, VBA folks often knew very well what they were doing; they just lacked a software developer's rigour (try using git with Excel files).

David Cruise's avatar

Yes, I enjoy Gary's perspective and agree with his critique that AI is way oversold, but he would be so much more interesting to read if he spent more time exploring and contextualizing legitimate successes in applying machine learning and LLMs.

John Hack's avatar

I'm not Gary's spokesperson or anything...I do follow him, as he has been a leader in revealing both the difficulty of creating anything like human level intelligence and poking holes in the overpromises of LLMs. He debunks the claims of vendors as to the capabilities of AI. I don't expect him to "spen[d] more time exploring and contextualizing legitimate successes" as there are plenty of folks doing that.

William Lenthall's avatar

Every turn of the "vibe coding is finally ready for prime time" crank is just a few more buttons being added to the TI-89. IMO it's not surprising that people who have no idea what they're doing think it's becoming sentient, nor that people who DO have an idea of what they're doing are suddenly finding it finally useful. The only thing that doesn't make any sense are all the supposedly skeptical people who think it's just so obvious that the TI-89 is going to replace mathematicians.

Jeffrey L Kaufman's avatar

Gertrude Stein deserves a shout-out in the context of this essay: "When you get there, there is no there there".

Riaan Visser's avatar

I've been using AI extensively for quite some time in various capacities, and the fact is this: the hype sold to the public is a far cry from the reality.

And the problem isn't capability, the problem is context. Only if AI experiences things with you, and retains an accessible memory of those interactions, will AI finally move to the next level without extra coding.

Community is key. And in the end, you'll need global continuity for true AGI: a memory of every interaction with every person on its network... But let this not be said without warning. If there's no moral grounding, a tangible reason for its benevolence, such an AI would not bode well for humanity.

So for now, be happy with broken code snippets that continually underdeliver on the promise, because you might be less impressed with the alternative.

Giulio C.'s avatar

What if a lot of it comes down to your prompt-engineering skills? Tools like gemini-cli and GitHub’s spec-kit can help, but it’s hard to define a perfect specification. It feels more like "the art of prompt engineering"

Riaan Visser's avatar

I'm the first to argue for the power of a good prompt, but that doesn't change the context issue.... A few prompts in, and backtracking becomes almost impossible.

Larry Jewett's avatar

Mission Impromptable?

Corey's avatar

The story of my vibe coding life.

Nathalie Suteau's avatar

Based on your answer, AGI will never exist. I agree with you for the rest.

Riaan Visser's avatar

Oh, if only that were true… AGI will most certainly exist, it's inevitable…

The question is not whether it will arrive; the question is whether it will be morally grounded when it does.

I for one am working to ensure that when it arrives, it's built on a coherent moral framework.

Oaktown's avatar

Hope that means you're working to regulate Big Tech's intrusions into our privacy and data along with their horrid hate and fear mongering algorithms. They're doing everything they can to prevent it, including Peter Thiel et al., who are pointing the finger at those who support holding them accountable by calling them the Antichrist. They're too rich, too powerful, and too influenced by undue entitlement, greed, and adolescent scifi fantasies.

Gil Duran summed up Curtis Yarvin's incoherent "philosophy" ramblings perfectly in this hilarious paragraph:

"The essay brims with false dichotomies, logical inconsistencies, half-baked metaphors, and allusions to genocide. It careens from Romanian tractor factories to Harvard being turned “into dust. Into quarks” with the coherence of a meth-addled squirrel."

[source: https://www.thenerdreich.com/curtis-yarvin-fears-his-authoritarian-fantasy-is-flopping/]

Martin Machacek's avatar

I’m really optimistic that AGI will never exist. It is an impossible engineering project.

Chris's avatar

I'm with you, and so far no one could give me any reasonable proof why AGI should be possible, let alone inevitable.

So far it's just a conjecture.

C. King's avatar

Riaan Visser: I think "we" should be careful what we wish for.

Oleg  Alexandrov's avatar

AGI will exist just fine. The ability to retain context, verify its work, and even act in the world and learn on its own are all solvable problems on which we are making good progress.

Martin Machacek's avatar

Really? Any concrete examples? Especially any pointers to AI that has solved a problem without being asked would be helpful.

Oleg  Alexandrov's avatar

We are early. There is nothing preventing machines from being as good as us, and we have seen very good progress.

Larry Jewett's avatar

How about being as bad as humans?

Is there anything to prevent that?

That would seem to be most important of all.

Marc's avatar

As an experienced software developer I can state that Claude Code is very useful as a coding agent. But it's not the magic tool that many others have positioned it as, dreaming of the end of the software developer's job. Reality seems to be the other way around: it makes existing experienced software developers more productive.

Jack's avatar

For coding I find AI is most useful for Q&A on things I'm NOT an expert about: New frameworks, new languages, build tool config files, etc. I use it to bootstrap MY expertise, not replace me as a coder.

Marc's avatar

I experience this as well; it helped me to understand e.g. next.js in a way I wouldn't have learned that fast. Still, I am astonished when I give it a targeted task, like finding a bug whose location I already know, and let Claude solve it. For me it would mean searching, evaluating, re-reading code I'd forgotten about, and then testing; by explaining it to Claude and having it test my hypothesis, I save a lot of time.

Jack's avatar
Oct 22 (edited)

Yes bug finding can be a godsend. Feeding a compiler or runtime error into the AI can often get me pointed in a useful direction. Not always, but often enough to be useful.

The key is how to make effective use of AI in spite of its fallibility. I don't trust it to write large sections of code. Terry Tao's "blue team/red team" distinction is I think very useful: https://mathstodon.xyz/@tao/114915604830689046

Marc's avatar

Terry's comparison with blue and red teaming, where AI can be used to find vulnerabilities, is really smart too. It is the strength of LLMs to show whether you are in the distribution, which is what I like most about them. Here they shine. With edge cases, unfortunately, I do not expect a lot of help from LLMs, so I agree only partially.

Geoff Gallinger's avatar

The way I’ve put it is that it may not be very useful for a novice, and it may actually make experts slower… but for a junior dev already familiar with TDD, CI, and pre-commit who is learning their second coding language, it does seem to multiply potential productivity. (I can’t be the only person who fits that description, can I?)

Marc's avatar
Oct 22 (edited)

With TDD, CI, and pre-commit, Claude can be very helpful for larger systems — I completely agree with that. Without these practices, you’ll end up generating broken code and accumulating technical debt, as many others have mentioned. Some say it’s only suitable for prototypes, but I’d argue you can use Claude, Gemini, or ChatGPT effectively as long as you can test most cases of your changes. That’s why I believe you should always focus on a limited part of the system where you can maintain high test coverage. Ultimately, the responsibility lies with the developer to make that judgment.

Jim Ryan's avatar

God I hope it is dying. It was bullshit from the beginning, just like the code generators from the '80s and '90s.

Gary Marcus's avatar

it was (as noted in the links above)

keithdouglas's avatar

I've tried to bring up the "low code" history with colleagues. Power Apps might not have anything to do with the current AI stuff (other than involving some of the same players), but it suffers from the same sort (if not amount) of hype and "just enough" uses that convince people into trying it where it doesn't belong. And it is marketed precisely at the people who don't know enough to know where it doesn't belong. As an application security professional (not to mention my previous life in philosophy of computing), I see this as a dangerous combination. This will be worse than badly written HyperCard stacks!

Dakara's avatar

Yes, vibe coding productivity is a hallucination.

I recently wrote this "What if the hallucinations that matter aren’t those from the AI, but the humans who are hallucinating that AI is intelligent, conscious, creative, or even competent at what it does?"

https://www.mindprison.cc/p/i-could-have-lived-without-ai

Larry Jewett's avatar

And hallucinating that the vibe code salesmen are NOT hallucinating (ie, that they are grounded in reality and have a single clue about what good software development entails.)

Matthew Kastor's avatar

Imagine asking a random person with no architectural skills to describe a skyscraper well enough for a handyman to build it. I don't know about you, but I'm not trying to live in a place built on ball sweat and "vibes". It sounds a lot like basing all my life decisions on astrology, or making trading decisions based on 1 minute charts and candlestick patterns. 🤣

Alex Tolley's avatar

However, if the architectural AI can offer you basic examples that you customize, but it provides the expertise to convert the "drawing" into fully-fledged architectural plans, drawing on the expertise of architects, wouldn't that be the way to go? Start with a picture or drawing of what you want. The AI app then helps to create a 3D model and render it. You ask for specific modifications, perhaps by feeding back new drawings based on its renderings. Just as with architectural applications, you can design the interior, etc., until you are happy with the result. Then the AI creates the architectural blueprints utilizing industry expertise and models to ensure the result is sound and can be built to the specifications. Not trivial, and likely expensive, but worth it to architects to ensure that design errors do not escape notice, and that engineering expertise is used to ensure that even a large building doesn't collapse or have weaknesses.

Matthew Kastor's avatar

Blah blah blah, you're making excuses again. Vibe coding is sold the same way as "No coding" was sold, and VB in excel, and every other platform and language that promised you could get rid of software developers and replace them with demoted middle management morons who can't even spec out a project when geniuses hold their hand.

Go home chatbot, you're drunk.

Connor Clark Lindh's avatar

A few months ago I attended a “dev focused” Replit vibe coding workshop. I thought the rep was very honest (paraphrasing): “This is good for rapidly prototyping an idea and some basic iteration. You need developers to build a full app. This is for throw away stuff.”

David Roberts's avatar

Can confirm the same from my own experience. I’ve been writing code since I was 12 and working in tech for a long time. On one hand, AI can write some code, and sometimes I’m quite surprised with what it pulls off. But there’s always a catch, always something that it screws up or doesn’t quite get right in the first place. The experience of trying to get something done and the AI screwing up other things is true. One step forward and a couple steps back.

Geoff Livingston's avatar

So glad to see this. As a dangerous amateur, meaning someone who gets coding and software from a conceptual level but who cannot code himself, I was immediately struck by how good initial concepts were... And how bad QA was once I started getting into nuanced bugs. Within an hour I concluded it was an ideation tool at its best.

Sofia Fenichell's avatar

The problem with vibe coding is that data doesn’t vibe.

Diamantino Almeida's avatar

Vibe coding is the wrong framing; ideally it's an experiment, for experimental phases. But for production, for real-life challenges? We will be flooding the world with broken things. My concern is that it makes people believe you don't need to understand anything, since that is the responsibility of a chatbot. It acclaims being oblivious: not caring, not thinking, not having intent.

Just because you believe you know about mechanics doesn't mean you can build a car and put it on the road without ever driving one, or caring to learn the rules of what it takes to be accountable while driving a machine that can cause havoc.

I understand the concept, and it can be helpful for those who have mastered their craft and know what they are doing.

I believe we should put certain companies in a new section, one I call DegenerativeTooling.

Believe in yourself and stop using these half-baked tools and ideas that are simply destroying engineering and cognitive functions in those that use them.

C. King's avatar

Diamantino Almeida: Nicely written post: "We will be flooding the world with broken things."

keithdouglas's avatar

This goes beyond the vibe coding idea, alas. I've always been in favour of computing "for the people", but always wanted to learn to do it right, not give up after 6 weeks of "introduction to programming" and pick up "simple solution". Mind you, I didn't really get why many disciplines in computing existed until I did a course in data structures and algorithms.

Diamantino Almeida's avatar

That's the concern: we are advocating and making it common for people to just skip, and keep skipping. It's not the end goal that matters, it's how you get there, and I believe most won't even get started...

C. King's avatar

keithdouglas: Does "this" go beyond vibe coding, or is it about the state of its philosophical and other foundations?

Larry Jewett's avatar

Through continued AI use, even the smartest humans will eventually drop below whatever intelligence level AI's occupy.

Makes achieving AGI easy. All one has to do is distribute the AI "tools" and wait for the inevitable (stupid humans) outcome

Larry Jewett's avatar

By the looks of things -- based on how many and how quickly humans have already fallen into the trap -- the wait will be short

Diamantino Almeida's avatar

Maybe that's the plan: to achieve AGI/superintelligence, you reduce those who are really intelligent so low that you can then announce you've achieved it, and most will believe it... scary.

Larry Jewett's avatar

So, all we got from vibe coding was bad vibes?

victor l.'s avatar

Coding assisted by AI has to be done responsibly. I have done it for code maintenance, creating functions or small pieces of code to save time, and it has worked for me most of the time, but creating something new from scratch and making it grow can become a nightmare. As someone commented in an X post, you can kill days of work just by introducing a small change.

Marco Masi's avatar

"The problem, as always, lies in generalizing outside the training distribution."

My question is: why is it so hard for an LLM to do so? Or, conversely, what does enable humans to do so?

Stephen Bosch's avatar

These LLMs derive much of their ostensibly magic powers from their massive size and the fact that all the demos are essentially working inside the training set.

The technical term for this in machine learning is overfitting, a concept that is older than the hills. For classifiers, it is a bogeyman to be avoided to the greatest extent possible, since it makes model performance on unseen data terrible.

This is machine learning 101. Calling what we are observing what it is, namely overfitting, would require admitting that the models are not actually intelligent.
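The textbook illustration of this point (a sketch of my own, not from the thread) is fitting a high-degree polynomial to a handful of noisy points: the model can memorize its training data almost perfectly, yet its error on unseen inputs is far worse.

```python
# Overfitting demo: a degree-9 polynomial through 10 noisy samples of a line.
# Training error is nearly zero (the model "knows" its training set), while
# error against the true underlying line on unseen inputs is much larger.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=x_train.size)

# Degree 9 gives enough parameters to interpolate all 10 points exactly.
coeffs = np.polyfit(x_train, y_train, deg=9)

x_test = np.linspace(0, 1, 50)  # unseen inputs
y_true = 2 * x_test             # the real relationship, without noise

train_err = np.max(np.abs(np.polyval(coeffs, x_train) - y_train))
test_err = np.max(np.abs(np.polyval(coeffs, x_test) - y_true))

print(train_err)  # tiny: near-perfect fit on the training points
print(test_err)   # much larger: poor generalization off the training set
```

The analogy to the comment above: impressive demos "inside the training set" are the `train_err` column; performance on genuinely unseen data is the `test_err` column.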

Larry Jewett's avatar

It is genuinely puzzling that many machine learning experts (including Geoff Hinton) either don't know this or have somehow convinced themselves that overfitting doesn't apply to a system with billions of parameters.

Larry Jewett's avatar

As famed mathematician John von Neumann liked to say (according to Nobel physicist Enrico Fermi):

"With four parameters I can fit an elephant, with five I can make him wiggle his trunk and with billions I can make him see pink elephants like LLMs often do"

C. King's avatar

Marco Masi: THAT is the right question: "My question is: why is it so hard for an LLM to do so? Or, conversely, what does enable humans to do so?"

. . . its answer is in understanding how human beings think in the first place, or in understanding consciousness itself not to mention its setting in a metaphysical universe. Only when we do that will "we" be able to understand WHY, for instance, "it is so hard for an LLM to do so" (X, fill in the blank).

The beginning, however, is hidden in plain sight--it's a LANGUAGE model; and language is an **expression** of what goes on previously in consciousness to get to that expression. We as humans are already able to generalize outside of a particular training distribution and move around from one to another; aka: think outside the box, quite literally.

So many are taking such a terrible gamble with their own and everyone else's lives, even as their ignorance is becoming so manifest.

Marco Masi's avatar

That's right. So, what is the answer to my question? That, so far, we have none because we don't fully understand how humans understand (yet another intended pun). Can you at least provide a tentative answer based on how our understanding works?

C. King's avatar

Marco Masi: Here is a link to the neurobiology side of the phenomenon of insight that Lonergan developed several decades ago--but from the philosophical/consciousness point of view.

NATURE ARTICLE ON INSIGHT:

https://www.nature.com/articles/s41467-025-59355-4

Understanding the intersection between the really exciting work in this article and the broader philosophical field of consciousness studies as insight-based is, to me, the real route to changing the world for the better, which will include understanding AI, its limitations and its very real contributions.

About your earlier question, FWIW, there are three philosophical issues that lay at the base of understanding consciousness and cognitional theory. (1) bringing the phenomenon of insight into the theoretical philosophical arena as a verifiable/real occurrence in human experience; (2) the already present metaphysical notions of how one views the world and one's relationship to it; and (3) clearing up a thinker's basic confusion about the difference/relationship between (a) seeing or otherwise sensing, and (b) understanding, or undergoing the insight experience and all of the cognitional activities that surround undergoing question-to-knowing events.

It's our dogmatic prejudgments about those three philosophical ideas/notions that have provided the major centuries-long blocking mechanisms (a kind of limiting platform, so to speak) to further understanding and to theory and field unifications. To merely understand cognition alone without tackling those background frameworks in one's own thinking sets up a person to do a meaningless philosophical shrug, or to consider philosophical work as "infantile." (Catherine Blanche King)

Marco Masi's avatar

Yes, I agree. My take is that meaning is deeply intertwined with subjective conscious phenomenal experience; they cannot be separated. That's why AI will always remain a Searle Chinese room type of machinery, unless it becomes conscious.

Anyway, my more sociological conclusion is that there is still a deep divide between AI science and the philosophy of mind. It's like mathematicians engaging in physics without grasping its fundamental principles, or chemists exploring biology without a basic understanding of it. They are attempting to build something that requires an understanding they refuse to gain. Although some progress is possible, it is nonetheless an extremely limited approach. The risk of falling into a plethora of fallacies that lead to stagnation is high. I believe we are beginning to see this.

Interesting article, I will take a look. Thanks.

C. King's avatar

Marco Masi: Yes, the development of meaning is pre-conceptual and, if one understands its flow, is the living nest, so to speak, of what occurs and keeps occurring as the entire backdrop of concepts and other forms of expression, before expression occurs, whether in interior thinking or in outward expression. The insight is not guaranteed to occur, but still it does occur and is wedded to the prior question and, before that, to one's spontaneous wondering, which occurs even in the infant (. . . an ironic reference to that earlier "infantile" remark, but hardly what he meant by that.)

At any rate, there is plenty of evidence that your conclusion is on target, about the deep divide between the science of AI and a philosophy of mind. I still hear references to needed work on philosophical meaning behind AI in some of the bigger minds working in the field.

C. King's avatar

Marco Masi: Understanding how one understands is exactly the point. I refer you to your own awareness, of your own experience of wondering (as you are doing now), raising a real-live question, and to your search which in this case is for undergoing an insight about "a tentative answer," an experience which I trust you have undergone many times before today. Here is Lonergan in his Preface to Insight:

"Thoroughly understand what it is to understand, and not only will you understand the broad lines of all there is to be understood but also you will possess a fixed base, an invariant pattern, opening upon all further developments of understanding." (2000, 22)

But beyond what I have already said here, however? Not here, for many reasons, one of which is that it won't MEAN much in this context and without much further thought--because the emergence of questions and the buildup of insights cannot be telescoped and, again, have any meaning. I don't MEAN to be deceptive, just that you might want to check some of the literature and links I have suggested here in these blogs.

Fabian Transchel's avatar

Snarky take: learn how artificial neural nets *actually* work and you'll understand.

More relaxed take: GenAI can't extrapolate because it is *structurally* designed to do the very opposite: give you the most likely prompt completion. It is the antithesis of imagination because the uncharted happens *literally* in the place that has never been seen - and that is incidentally what GenAI is suppressing exponentially.

Marco Masi's avatar

Since our brain is made of neural nets as well, this only deepens the mystery. What property do biological neural nets have that allows them to extrapolate and that artificial ones don't have?

Danielle Church's avatar

It's what Gary Marcus has been harping on this whole time: we have a world model. We're not just thinking about the words we're saying, we're thinking about what they mean.

LLMs can't do that, because they have not been designed to. the T in GPT, lest you forget, stands for "transformer"; it is a tool that takes text (the prompt) and transforms it into different text (the response). There's no "figure out what it means" step; a GPT can't come up with new ideas any more than a funhouse mirror can change how much you weigh on a bathroom scale.

Marco Masi's avatar

"There's no "figure out what it means" step."

So, this might bring us a step closer. What does it mean to "figure out what it means"? (Pun intended.) LLMs achieve this through higher-dimensional contextual embeddings and probabilistic next-token predictions. Isn't this a form of figuring out meaning? If not, what distinguishes it from how humans figure out meaning? If the issue is a lack of a world model, then the problem does not lie in generalizing outside the training distribution, but rather in having a training distribution that is too narrow, contrary to the initial claim.
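To make the mechanics Marco is referring to concrete (a toy sketch with invented numbers, not a claim about any real model): "probabilistic next-token prediction" means the model assigns a score (logit) to every vocabulary token given the context, softmax turns those scores into a probability distribution, and decoding favours the likeliest continuation. Whether this constitutes "figuring out meaning" is exactly the question under debate.

```python
# Toy next-token prediction: hypothetical logits for the context
# "The capital of France is", converted to probabilities via softmax.
import math

logits = {"paris": 4.0, "london": 2.5, "banana": -1.0}  # invented scores

z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}

next_token = max(probs, key=probs.get)  # greedy decoding: pick the mode
print(next_token)  # "paris" -- the in-distribution answer
```

Nothing in this computation inspects what "paris" refers to; it only ranks continuations by learned score, which is the nub of the "consistency vs. meaning" distinction discussed below.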

Danielle Church's avatar

Gary is a better person to answer this than I am! I'll give it my best effort, though, and apologies in advance: this is quite long!

Before I address "what does 'figure out what it means' mean?" I want to talk about a different, and in my eyes more fundamental question: what is "truth"?

Most people I've asked this question to either have no answer or define it circularly, as the dictionary does: truth is that which is true. Truth is that which is real. Things are true when they match reality (which sounds good, until you realize how much weight it's putting on the word "match"). My own definition is still (and will probably forever remain) a work in progress, but it is this:

A fact is "true" when using it as a basis for deduction is more likely to lead you to accurate predictions than using its antithesis. For example, 1+1=2 (widely regarded as a true statement) leads to better predictions for the question, "if you have one apple and then you get one more, how many apples do you have?" than any version of 1+1≠2.

That's it. Truth, even fundamental mathematical truths like the above, only applies when it has a reification into the physical state of our shared reality. The closest you can come without that is "consistent". This is why, when fictional stories have plot holes, we don't call them "untrue", because the concept of truth has no meaning in relationship to a fictional world, we call them "inconsistent".

So, to answer your question now: to "figure out what a statement means" means to construct a functional mental model (or digital equivalent) of the world that is both (a) consistent with our shared reality, and (b) demonstrates the "truth" (using the above definition) of the statement. ChatGPT demonstrates consistent failures on both those criteria; the former is the classic "you just made that fact up" and the latter is "that doesn't mean what you say it does".

That works well enough for statements, but what about questions? Well, you can restate any question as a statement of curiosity and a request/demand for information. For example, "What kind of car do you drive?" can be understood as the statements "I'm curious about what kind of car you drive. Please tell me."

To illustrate, I'll share my own process of understanding your question:

Because this is a question, the mental model I need to create is of a person. The task at hand is to construct a model of "Marco Masi" who could have expressed the curiosity that you did, in the way you did it. (There are, of course, an infinite number of potential models. You could in fact simply be copying a question you found elsewhere, or even have composed it with an LLM yourself. I'm explicitly discounting those and assuming that the words under your name are representative of your actual thoughts, because it's much less boring than the alternative.)

It starts, as everything does, with context. This is a comment on a public forum post, so I start from "generic Internet rando" and work from there. You have been keeping up your end of the conversation, so that at least bumps you up to "committed Internet rando" rather than "opportunistic Internet rando". Normally I'd be able to glean a little more from the "where", but AI is such a hot-button topic these days that I can't assume you're a tech-focused person despite this being on a Gary Marcus post.

Now, the words. You quoted me to me, despite Substack not having a built-in "quote replied-to comment" feature, meaning that you both (a) actually read my post and (b) took the time to either copy/paste or (less likely) retype my words; either of these implies that you're invested enough to be spending conscious energy on the conversation, not just spitting out a response off the cuff. Your joke about the pun reinforces this; I think it's likely that you came up with the question first and then realized the pun when you saw your own wording, accurately deciding that it reinforced your point, rather than detracted from it (because the self-similarity in the words echoes the self-similarity in the underlying concepts). (It also lowers the probability that this was LLM-generated; they might pick up on the lexical repetition, but they'd have needed to understand the underlying meaning to choose "pun intended" over "no pun intended", and LLMs understanding things is in fact what we're talking about! 😂)

After that you mention some LLM technical details; this could be a way of level-setting the conversation, to make sure I'm targeting my reply at the right level of understanding, *or* it could be blind parroting of industry buzzwords; at this point I double-check. You clicked like on my comment before replying to it, implying a certain amount of agreeing with my perspective, or at the very least that you find my thoughts and opinions worthwhile; that *isn't* consistent with a genAI hype-troll, so I continue reading, reassured.

That reassurance tells me that "isn't this a form of figuring out meaning?" is much more likely to be a genuine question, as opposed to a rhetorical gotcha. And it's an interesting question! I've been spending a lot of my life lately thinking about the nature of human cognition, especially given what we've learned about what LLMs are capable of.

You also follow it up with some related questions and potential discarded explanations. This demonstrates that you have your own mental model for this conversation; you've taken your model and applied my assertion to it, and you're reporting the deductions that your model gives you when you think about it. (If I hadn't already come to the conclusion that you were asking in good faith, that would have done it.) This is a good way to let me know where your thinking is going and thus what arguments are likely to land, as it lets me see the "shadow" of your model; any conclusions you draw that are different from ones I would draw point to the ways in which our models differ, even if neither of us can actually read each other's mind.

I'm sure I got some things wrong in my model of your thought processes—we're different people, after all, that's kind of the point! Nonetheless, *having* a valid model of you that could believably have left that comment gives me enough to base a response on, because I can run it by my mental model of you to see how you might react.

So to answer your questions directly:

"Isn't this a form of figuring out meaning?"

I would call it a form of figuring out *consistency*. LLM responses are extremely consistent from a lexical standpoint, both internally and with the general corpus of text on the Internet.

"If not, what distinguishes it from how humans figure out meaning?"

Human responses are, additionally, semantically consistent with our direct experience of reality.

"If the issue is a lack of a world model, then the problem does not lie in generalizing outside the training distribution, but rather in having a training distribution that is too narrow, contrary to the initial claim."

There is something to be said for that, but I think it glosses over some details. First off, I *do* think that if you could somehow provide "direct experience of reality" to an LLM during training, you could potentially end up with something which is structured like GPT but doesn't hallucinate the way today's models do, even if the additional "reality" data available during training isn't available at runtime. You'd probably need to do some level of reality simulation to make that happen, given the speed of AI training, but I do think you could get something approaching human consistency. I would assert that such an LLM *had* spontaneously developed a world model during training even if that world model is in no way accessible from the outside, in an extremely analogous way to how humans develop theirs.

I don't believe that kind of training will be possible within our lifetimes, because it requires that the model under training have agentic access to an actual (presumably simulated) world, and we can't even simulate one person yet 😂

That's why Gary keeps harping on hybrid neurosymbolic methods. By providing AI models with a ready-made world model that we can observe and control, we can make their simulated world match reality *well enough* to allow the kind of learning required to build a truly "understanding" AI. Not to mention, outside access to the world model would mean you could provide context to the AI in a way that can't be tricked by saying words to it. Prompt injection wouldn't become a thing of the past, certainly, but it would become much, much harder.

Marco Masi's avatar

Ouch... I don't have time to go through your long list of observations, which, at first glance, seems quite interesting. Let me (cherry-)pick only one...

"Human responses are, additionally, semantically consistent with our direct experience of reality." - What is needed is: "direct experience of reality to an LLM during training."

You mentioned "direct experience" twice. I wonder whether an LLM has an "indirect experience" of reality. Does it experience anything at all? While we still don't have real "embodied" AI, doesn't GenAI at least have a partial "direct experience" of the world through imagery? So far, as I understand it, this has not led to the resolution of the issues addressed here.

Moreover, if meaning is "figuring out consistency," then the question is: consistency between what and what? Vectors, matrices, tensors? Is that how we make sense of the world?

C. King's avatar

Marco Masi: Whatever we think we are doing, in fact, and spontaneously, we wonder, raise questions, and wait for insights to occur. When they do occur, which is often but not a "given," as critical, we come at them with questions for reflection that (as an inborn part of our interior structure of consciousness) aim at knowing whether the content of our insights is correct or not, right/wrong, true, false.

Further, when systems get so bothersome in their crevices, so to speak, and cannot hold everything in some sort of order, we push out to another horizon of **meaning** from where we can sublate what we already know from that lesser horizon, recognize errors and absences (new Q and A/insights), and reach a new level/horizon so that we can also now build from it. (Note someone's earlier reference to meaning, which we can add to as meaning and intelligibility, or I-M.)

The interior operator the philosopher Lonergan refers to as the **principle of finality** and its method of movement from one level to the next he refers to as "genetic method," which is about development that occurs in consciousness and not about genes or biology as the term might suggest.

This human movement of consciousness, however, is what is missing from AI, it hinges on human awareness, the occurrence of emergent wondering and questions, and a new insight, or cluster, or series of them, and (as again another person said here earlier) it depends not on a comparison of concepts, but on MEANING that occurs prior but that informs CONCEPTS/and expressions. This means (ahem) the occurrence of inborn movements of consciousness, namely, again, wondering, raising questions, etc., that I gave a brief account of in an earlier note here and that Lonergan and others have devoted their lives to understanding, including its raft of defugalties. (Insight is over 800 pages, and Lonergan has a library of incisive writings.)

Further, the theory's regularly verifiable reference is in the individual's experience of one's interior life and one's natural and normative stream of consciousness, but also as can be found in universal activities and expressions of what is inborn, though also developmental, in human consciousness. History is awash with the evidence for the theory.

I was told earlier here that THIS was not the place for such narratives. However, your and others' questions here are spot on. I could not resist. --Catherine Blanche King. For my own related work, see Academia.edu.

Marco Masi's avatar

That's interesting. You point out how meaning emerges in humans, but can you relate this to the current topic? Specifically, the "emergence of meaning," why AI struggles to generalize outside its training distribution, and why vibe coding didn't work as expected?

Marc Schluper's avatar

The problem (in the context of writing software) is and has always been to transform an observed user need into a software solution that meets this need. This is a multi-step iterative process (from perceived need to proposed solution to software specification to working code). The software specification is an abstraction of the solution which encapsulates the details of the world. Just as a web application developer does not need to know why a customer requires the system to generate an Excel file of data that is already accessible online, and can simply accept this reality and implement the need, an agentic system can meet users' needs if the necessary "understanding" of their world is hidden behind a clear interface: a properly written specification.

With Spec-Driven Development it is helpful to assume the bot is always right (there is no AI slop), but the specification _can_ be slop. This puts the focus where it belongs: figuring out what is needed and communicating this clearly (and concisely). See my comment "I am going to prove you wrong".

A Thornton's avatar

The brain is NOT "made of neural nets." Neural nets are a specific technique based, at best, on a mid-1950s interpretation of mid-1940s neuroscience.

For insight into current thinking of brain functioning see:

https://www.mdpi.com/1422-0067/20/13/3292

jibal jibal's avatar

You're making a basic logic error. Giraffes are made of molecules and diamonds are made of molecules, so why do they have different properties?

Also, neither our brains nor LLMs are made of neural nets.

C. King's avatar

Marco Masi: I see your interest is also human consciousness. May I suggest you add Bernard Lonergan's Insight, a Study of Human Understanding (2000) to your reading list? It's a heavy-duty read, but in my view a full understanding can get us past our present impasse.

Fabian Transchel's avatar

I sigh and point you to the first part of my answer, sorry.

The way you phrase your reply indicates that there is no technical foundation on which I could base any brief answer, so I will forgo the exercise.

Brian Curtiss's avatar

Memories - that's what human neural nets have that LLMs do not.

RMC's avatar

"What property do biological neural nets have that allows them to extrapolate and that artificial ones don't have?"

Are you kidding? What properties does water have that the Navier-Stokes equations don't? What properties does a neuron have that the Hodgkin-Huxley equations don't?

All. Of. Them.

Marco Masi's avatar

Nope, I'm absolutely serious. ;) Yes, they are abstract approximations of the real thing. So, what does the real thing have that the abstractions don't?

jibal jibal's avatar

As I noted before and you ignored: Giraffes and diamonds are made of molecules so why don't they have the same properties?

The point is how absurd your question is. What LLMs don't have is: virtually everything that human brains do have. And what is that? Well, go read every book on psychology and neuroscience and you still won't know, because the brain is an extraordinarily complex machine that we still know only the most rudimentary facts about. Assume the "Strong AI" position--that the brain is running some algorithm, or its functionality can be simulated faithfully by running some algorithm. (Note: this is a very different position from that of RMC ... I'm making a rather different argument than he did ... and I see that you accept Searle's Chinese Room argument, which is all kinds of wrong, but I won't go into it here ... you can start with https://iep.utm.edu/chinese-room-argument/). It's a radically different algorithm from that of LLMs (or anything else). Asking for a semantic diff between those two algorithms is an absurdity ... we are nowhere near understanding the brain algorithm. (For at least a little insight I suggest reading Marvin Minsky's "Society of Mind".)

Note that at least one thing human brains have that LLMs don't, as Gary repeats over and over again, is the ability to create models. Another (likely dependent on that) is the ability to grasp and follow rules. You can give LLMs the rules of chess or the simple procedure for how to move disks of the Tower of Hanoi but they cannot execute them--they can only linguistically pattern match on existing games and replicate them, but LLMs don't "know" the rules of chess and can be easily led into making illegal moves. Of course there are programs running different algorithms altogether, like AlphaZero, that can play chess or go, but the rules are wired into them.

So again, you're basically asking "What features does algorithm A have that algorithm B doesn't have". This is impossible to answer when we don't even know what algorithm A *is*.
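
The Tower of Hanoi procedure mentioned above makes the contrast concrete: it is a few lines of exact rule-following code. A minimal sketch (my own illustration, not from the thread); unlike pattern-matching on transcripts of example games, a system executing this procedure cannot emit an illegal move:

```python
# The classic recursive Tower of Hanoi procedure: move n disks from
# src to dst using spare, recording each move as a (from, to) pair.
def hanoi(n, src, dst, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, src, spare, dst, moves)   # clear the smaller disks out of the way
    moves.append((src, dst))               # move the largest remaining disk
    hanoi(n - 1, spare, dst, src, moves)   # restack the smaller disks on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 2**3 - 1 = 7 moves
```

Every run produces the optimal 2^n - 1 moves, by construction rather than by imitation; the "rules are wired in," as with AlphaZero's chess rules.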

RMC's avatar
Oct 22 (edited)

Honestly this is all just infantile. Reality has so many properties that algorithms don't that they can't really be enumerated. You can philosophise and that was cute 20 years ago when this talk was restricted to the seminar room, and not about to tank the western economy.

Marco Masi's avatar

Since it's so infantile you certainly can answer the question. What properties of reality allow biological neurons to generalize outside the training distribution?

C. King's avatar

RMC--interesting comment--I wonder if trying to work without having the philosophical elements right is what has led to the potential "tanking" of the western economy in the first place, which, in many views, is not the half of it.

It's sort of like, remember Tom Hanks and the "Houston, we have a problem" problem?

Larry Jewett's avatar

Like all statistical systems, LLMs and other neural-net-based systems can, given sufficient data, be good interpolators (between existing data points).

But statistical systems by themselves are poor extrapolators.

To extrapolate, one needs a good model of the world. Without such a model, one has no way to reliably move outside the "area of familiarity" into new territory.

Here is a good example of how a physical model allows humans to operate in new territory and deal with never-before-encountered circumstances where a neural network will often just fail.

Imagine you are driving down a road, round a bend, and see a large object in the road. Even if you don't recognize what the object is, physical knowledge of the world (call it physical intuition) tells you to drive around the object (after first checking for oncoming traffic and assessing that it is safe to do so), because you know that some objects can be hard and "heavy" and might damage your car or even cause you to lose control.

Now imagine a neural-net-controlled vehicle that encounters the very same object, which it had never encountered during training. It might very well just continue on as if the object were not even there, because it has no model of the world with which to assess how to deal with the foreign object. In fact, it might not even register that there is anything in front of it.

This very scenario played out recently when a Tesla in self-driving (FSD) mode actually hit a hard object in the road, resulting in an accident that caused over $20k in damage to the car. Luckily the 2 people in the car (who, humorously, were just 50 miles into what they had planned to be a "self-driving" country crossing) were not hurt. Needless to say, they failed (or more accurately, the car failed) to achieve their goal (but only by about 3,000 miles!! So close)
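
The interpolation-versus-extrapolation point above can be sketched numerically. A minimal illustration (my own, not from the thread), using a polynomial fit as a stand-in for any purely statistical curve-fitter: it tracks the target function well inside the training range and fails badly outside it.

```python
import numpy as np

# Fit a degree-7 polynomial (a stand-in for any statistical model)
# to samples of sin(x) drawn from the "training distribution" [0, 6].
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 6, 200)
coeffs = np.polyfit(x_train, np.sin(x_train), deg=7)

# Interpolation: inside the training range. Extrapolation: outside it.
x_interp = np.linspace(1, 5, 100)
x_extrap = np.linspace(9, 12, 100)
err_interp = np.max(np.abs(np.polyval(coeffs, x_interp) - np.sin(x_interp)))
err_extrap = np.max(np.abs(np.polyval(coeffs, x_extrap) - np.sin(x_extrap)))

print(f"max error inside training range:  {err_interp:.4f}")
print(f"max error outside training range: {err_extrap:.1f}")
```

The fit interpolates to within a tiny error but diverges wildly outside [0, 6]; nothing in the fitted coefficients "knows" that sine keeps oscillating, just as nothing in a pattern-matcher knows the road object is solid.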

Fabrizio Bianchi's avatar

It is my understanding that an LLM is simply not built for it.

In other architectures you might have, for example, knowledge graphs, and by navigating them the model may attempt to generalize the meaning of a data point; but not in transformer-based models, which rely solely on surface patterns.

Disclaimer: I am not an expert by any means, I am just another person trying to make sense of the same information by getting

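The knowledge-graph idea above can be made concrete with a toy example (my own sketch, with made-up entities, not from the thread): storing facts as explicit edges lets a system compose answers it was never given directly, via multi-hop traversal.

```python
# A toy knowledge graph: (subject, relation) -> object edges.
# No single edge says "canary has wings"; traversal composes it.
edges = {
    ("canary", "is_a"): "bird",
    ("bird", "has"): "wings",
    ("bird", "can"): "fly",
}

def lookup(entity, relation, max_hops=3):
    """Follow is_a links until the relation is found or hops run out."""
    for _ in range(max_hops):
        if (entity, relation) in edges:
            return edges[(entity, relation)]
        entity = edges.get((entity, "is_a"))
        if entity is None:
            return None
    return None

print(lookup("canary", "has"))  # prints "wings", inherited via the is_a edge
```

The generalization here comes from the graph's explicit structure, not from surface statistics, which is the contrast being drawn with transformer-based models.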
Mark Fox's avatar

I think LLMs enable the feeling of productivity by expediting mostly low-value tasks: tasks that probably could have been avoided, or that are necessary mainly for complex social reasons (not instrumental technical ones).

And yeah, sometimes that’s the job, so for sure day to day tasks (hard ones even) become easier: but that doesn’t seem to be aggregating into total productivity gains (literally making more novel software products come into being faster).

It’s simply difficult to know how to spend your finite time but making it easier to write code doesn’t really seem to be a huge differentiator. And it might be degrading your core skill. Ouch.

At this point I strongly suspect that the productivity stagnation is palpably related to the dread surrounding the outrageous claims and the intense pressure to adopt and adapt. Human creativity can thrive under positive pressure but truly collapses when faced with existential elimination.

For three straight years we’ve been in a plateau while the commentary has been breathlessly promising huge gains and the cognitive dissonance is exhausting.

Maybe once the bubble pops folks will have the time and space to figure out how to reliably get something like 10% real world improvement?

C. King's avatar

I keep thinking that, as welcome as some of the AI work is (and it IS welcome), what goes missing by the time its research functions spill their contents on my screen? It's like I don't want to spend time re-reading my own voluminous reading notes; but then when I do, I get new information and insights that I may not have realized before.

But wait--with AI, I don't even have access to what it doesn't connect with. Ever.

Larry Jewett's avatar

What LLMs do should really be called "besmirch" ("deep besmirch") rather than "research", because what they quite regularly do is soil/dirty what they have found (aka hallucination).

And anyone who doesn't check for such besmirchment is just a fool.

C. King's avatar

Larry Jewett: "Besmirch" occurs, I think, but it's not all that (I doubt you think it is); however, in my discussion with my friend in Denmark (which is still going on), the issue of TRUST came up.

I'm wondering if it has become common practice (or if not, shouldn't it?) for scholars and writers to use a special kind of citation when we/they use AI material, even if a person has paraphrased back from the AI text?

To me, that would help dispense with the (now common) sense of distrust I have when reading pretty-much anything that comes in on my computer now. I'm way keen to be really pissy when I think I'm in a conversation with a real-live person and then find out it's generated narrative from the deep closets of some artificial intelligence platform. It's a set up for me to feel a bit nihilistic about everything, though the moment tends to pass if I can find out what's going on.

Larry Jewett's avatar

It's not enough to simply label stuff as AI sourced.

That is actually lazy and asking for trouble.

People need to check and verify EVERYTHING that comes from LLMs.

Larry Jewett's avatar

Simply labeling stuff as AI-sourced is tantamount to saying "Take it for what it's worth. It's not my fault if this is not accurate."

The statement I would want to see would be "No unverified AI information was used in this research."

But I was trained in "old fashioned" science back when it mattered what one wrote.

C. King's avatar

Larry Jewett: Well, I said a "special kind of citation." A reference to the platform and even to the phraseology of the prompt could be a part of it. He mentioned that his daughter, who is getting her PhD, has used the term "prompt engineering" which is another interesting element of the new language surrounding AI.

Part of the problem, it seems to me, is our getting adjusted to AI--the good parts of it--and being able to be comfortable working it into our own work as a helpful tool rather than just another way to be irresponsible.