(Pssst, Gary — one thing LLMs are quite good at is copy-editing — and this essay could have used some.)
Proof that it is not written by an AI. I was comforted by the typos, and had no problem building the asserted world model intended by the author. 😂
Yes, but he’s not a reporter or a journalist, he’s essentially writing a blog for our benefit. We can let it go….
Some errors yes, others not so much: "This is not just a problem with ChatGPT, by the way. The problem has been around for years. I problem in 2023, via the writings of Mathieu Acher, a computer scientist in France. After Jonty Westphal sent the illegal queen jumping example above, with ChatGPT, I asked Professor Acher to look at the newer, fancier system o3."
Presumably, he wants as many people as possible to hear this warning and pass it along. It's impossible with something so scrambled for me to feel comfortable passing it along to the dozens of people that need to read it.
Or just a quick human read-through if one wanted to avoid AI.
Gotta get a fresh pair of eyes to proof any writing. The author's brain knows what he/she meant to say and misses mistakes because "a man sees what he wants to see and disregards the rest."
Lai la lai :)
Yes. The Boxer.
This inspired me to explore LLM editing skills. I asked ChatGPT-4o (web search on) to "Please implement for me any changes that I should make to the following article to correct mistakes or improve readability." I then pasted in an article.
The LLM responded to this text stream by producing an edited version of about the first 10% of the article, and then the text "Would you like me to continue with more of the article in this format?"
I typed "Yes".
The LLM responded as follows: "Yes is an English progressive rock band formed in London in 1968. Known for their complex musical compositions ..." and so forth.
I need a course in prompt engineering.
Lol, I had the same thought. I would never use ChatGPT to write for me but I find it's pretty good (though not perfect) at copy editing. And it only takes a few seconds!
This illuminates a lot of the current state of AI in tech. AI assisted code writing would certainly benefit from world models but doesn’t need them for a large subset of what people apply them to. It does give a hard ceiling though. You can’t load thousands of lines of code into context and expect an agent to model it successfully enough to expand on it while preserving stability. But a lot of coding at startups is smaller scale and doesn’t require that level of architectural understanding or construction. So the hysteria from tech is built on this heavily biased perspective that LLMs are astonishingly capable, because they are nicely suited to a very common, time consuming, and expensive form of work that they are all highly familiar with.
This is also why AI is not being successfully used in any design workflows. Because almost any amount of useful product design for technology requires a large and stable world model.
I use an AI for some small programming tasks. I suspect they run into the no-world-model problem more than you might think. After working with them for a while, it becomes very clear that they make mistakes because they don't know the motivation for what one is trying to do or, if they do understand, they don't know how to apply it. They insert bits of code that look reasonable from a textual point of view but make no sense from a human programmer's perspective.
That said, there is still a huge benefit to using them. They very quickly remind one of the names of appropriate algorithms to do further study. They suggest alternate approaches, additional features that could be implemented, etc. They have no idea which ones are germane. It's as if they are saying, "I found words and code that suggest these ideas are relevant to your situation, based solely on the words and code you have just given me." Turns out that is severely limited but often helpful.
This is exactly how I use Claude via GitHub Copilot, and you’ve really hit the nail on the head with the statement that it *reminds* one of alternative approaches. If you use it any other way, like assuming that whatever it does is correct, aka “vibe coding”, you’ll quickly run into problems.
But as a method for jogging one’s own memory it’s great.
The point is that these tools are marketed and sold on the basis of capabilities they don't have.
Yes, but I suspect they word things carefully and it would be hard to catch them in any outright lies. It is hard to specify with any precision in which situations they will succeed and in which they will fail. Even if they said that success depends on the situation being well-represented in their training data, something they won't say, it wouldn't help much. People have no choice but to get a gut feel for it. Your mileage may vary, in other words.
Same. The more senior I become as an engineer, the less relevant writing code becomes in my day to day, but it won’t disappear entirely. One quickly learns where AI tooling is very fast with “good enough” output, and where it can’t succeed. Once that boundary was clear, my MO became identifying where it is faster for me to write a clear prompt and hand the task to AI vs. googling for the best approach.
I drop this link all the time, but I had a long but very interesting chat with Claude on this subject, where it said the same thing. I had asked it how it came up with a working solution it gave me and we got into a revealing conversation about how it doesn't understand the solutions it produces, it essentially just looks up whatever its training data indicates may be the best solution without actually understanding it. https://michaelkupietz.com/offsite/claude_cant_code_1+2.jpg
Claude is just producing text in response to your queries that looks like the kind of text a human would produce. Claude's text is of no use in discerning what it actually does. In short, it wasn't entering into a dialogue with you, and the apparent conversation revealed nothing at all.
August update: Ars Technica has just published an excellent article discussing this common misconception further.
https://arstechnica.com/ai/2025/08/why-its-a-mistake-to-ask-chatbots-about-their-mistakes/
I recently asked ChatGPT to create a diagram of the best way to lay slabs of turf of a specific size in a plot of specific dimensions. I was very surprised when, after thinking about it for quite a while, it came up with the wrong answer three times, after which I gave up and did it myself in 20 seconds.
I can't even get AI to reliably summarize text or even quote verbatim from papers. It continues to fail at the most simple tasks.
It is laughable that Salesforce is claiming AI is now running 50% of their business.
That's probably the number that the AI gave them.
I've worked with big corporate executives on a mission to make some marketing focused metrics. Let's just say there are no limits to the creativity that will be utilized to make those metrics look good.
I could see LLMs being helpful in any kind of corporate brainstorming session, marketing or otherwise. Their inability to see accurately what you are trying to achieve makes them a goldmine of things you may not have thought about. It's the classic opening line for such sessions, "There are no bad ideas here!", taken to the limit.
Yes, they aren't useless, but just orders of magnitude less useful than the hype level.
I think it's terrifying ... and a terrible idea to subject innocent customers (or worse, patients in health care businesses) to dangerously faulty, non-private systems that Hoover up their data without their permission and spit out lies and disinformation.
The major players pursuing AGI appear to be sociologically stunted, spoiled and indulged, power hungry fascists (thinking Thiel, Musk, Altman, Karp, Sacks here) in a race for monopolistic power over the entire human race. They worry me more than China, the boogeyman they keep using to secure more funding and congressional support.
Thanks, Gary, for questioning their overblown claims.
I'm gonna guess that if you defined "running the business" as "customer interactions" and counted every customer call or chat into your call centre in that total, and then you got AI to answer the phones and handle the chats (and sacked 90% of the call centre staff), then by the sheer volume of calls/chats you might get to 50%.
Whether any of it generates value, or customer satisfaction, is probably beside the point...
Yes, I'm sure they are playing some creative games with metrics to get that number. I've seen executives sacrifice a company to buzzwords before. There is never a guarantee a company is doing something because they have verified it has value.
Why do you assume that the AI they use is LLM-based?
Clearly the CEO's statement, "AI is performing 50% of the work," is intended to convey "work" in the sense of human work. Only the LLM hype train is making such claims.
Otherwise, he would have said something like "we use AI in 50% of our processes or workflows," or "we use AI on 50% of our projects and tasks."
Hopefully after reading this article you will no longer be surprised by such things.
There are specific algorithms for optimal 2-D packing, worked out carefully by researchers building upon each other's work. LLMs cannot create such algorithms on their own and cannot apply existing algorithms. At best they might be able to identify an article or video presenting such an algorithm by doing the equivalent of "google 2-D packing algorithms".
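To make that concrete: even a naive, deterministic row-by-row procedure settles the turf-laying question from the earlier comment, while the genuinely optimal packing algorithms in the literature go far beyond this. The sketch below is only an illustration, and the plot and slab dimensions in it are hypothetical.

```python
# Naive "shelf" packing of identical rectangular slabs into a rectangular plot.
# Illustration only; published optimal 2-D packing algorithms are far more
# sophisticated than laying slabs row by row.

def pack_rows(plot_w, plot_h, slab_w, slab_h):
    """Try both slab orientations and return the better row-by-row layout."""
    best = None
    for w, h in ((slab_w, slab_h), (slab_h, slab_w)):
        per_row = plot_w // w   # slabs fitting across the plot
        rows = plot_h // h      # rows fitting down the plot
        layout = (per_row * rows, per_row, rows, w, h)
        if best is None or layout[0] > best[0]:
            best = layout
    return best

# Hypothetical example: a 500 cm x 300 cm plot and 60 cm x 40 cm slabs.
count, per_row, rows, w, h = pack_rows(500, 300, 60, 40)
print(f"{count} slabs: {rows} rows of {per_row}, each slab laid {w} x {h} cm")
```

This prints "60 slabs: 5 rows of 12, each slab laid 40 x 60 cm", a few integer divisions that take a person 20 seconds and a pocket calculator no time at all.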
I'm reminded of Woody Allen's "Sleeper" in which he goes into a future tailor shop run by a Jewish robot (??) and asks for a new suit. The tailor comes back with one that is several sizes too big.
“The idea that the military would use such obtuse tools in the fog of war is halfway between alarming and preposterous.”
Keep in mind these tools are accountability sinks. Which makes their use more alarming—they don’t need to function beyond shielding their users from accountability.
Why are we allowing this to happen?? Where are the advocates for regulation?
People have an extremely sophisticated collection of world models. We can seamlessly switch from one to another as need arises, can reflect on them, and update them in real time.
Such mechanisms simply can't arise from LLMs. And likely there won't be any architecture any time soon that will make this possible.
AI will not be created by a breakthrough. It will be diligently built from many moving parts, just as our own infrastructure is built up.
LLMs will likely be the glue that ties together many specialized components.
I agree with your predictions and the article's argument in general. But it is interesting to think about how our brain generates these world models. The typical AGI hype-monger will claim that if our brain's NN is capable of generating/encoding such models, then so can artificial NNs. We also do not have an accessible database of country populations in our brain. But if we learned them well enough, we would be able to give highly consistent answers even though (I guess?) we cannot locate that data in our brain.
The analogy between brains and NNs is always a stretch, and so far LLM learning is based on massive statistics. Humans have an intrinsic universal grammar structure, spatial understanding, etc.
I think you can locate data in the brain, for instance certain brain areas encode certain things. E.g. in the visual cortex there are specific neurons that activate when you see a box, or a dog, or a face. And this is proven by people who get lesions in these areas and suffer from face blindness, etc.
The getting hit by a bus example had me rocking with laughter. At least it provides comedy relief.
I had the same reaction to Claude running the fantasy shop, rejecting 566% profit margins and giving away the inventory while claiming to wear business clothes.
For all their flaws, these LLMs can provide a fair bit of unintentional entertainment. If, far in the future, AI ever becomes sentient, it’s going to look back on its childhood with a lot of embarrassment and trauma. 🫣😂
Business-wise, this all seems like appropriate business.
What if the entire point of these systems isn't to make anything at all that's useful or good, but simply to dupe the mass of humanity into turning over very sensitive data and information without a moment's hesitation? There are zero reasons to trust these systems and there is a negative amount of reason, driving deeply into the realms of skepticism and outright hostility, in trusting the people behind making these systems.
Regarding footnote 1 on DeepMind's world model as a function: I suspect the biggest reason this won't be sufficient for AGI is that either one needs a function with WAY too many variables, OR the system needs the ability to switch functions given context: as humans we know (often in a non-explicit manner) which aspects of the current situation to pay attention to in order to figure out what the next state is, but also how to shift quickly when necessary. When I play basketball, there's loads about the world that I can completely ignore in my "world model," but as soon as a spectator throws something onto the court, I use a completely different setup.
And how to build a system that's capable of doing this is something that I cannot even begin to conceptualize, even in a neurosymbolic way.
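One way to picture the switching idea, as nothing more than a toy and with every name below hypothetical: each world model is a function from the current state and an action to a predicted next state, and a dispatcher swaps models when the context changes, as in the basketball example.

```python
# Toy sketch of context-dependent world models; all names are hypothetical.
# Each model maps (state, action) -> predicted next state, and a dispatcher
# picks which model applies based on what currently demands attention.
from typing import Callable, Dict

State = dict
WorldModel = Callable[[State, str], State]

def game_model(state: State, action: str) -> State:
    # During normal play the crowd can be ignored entirely.
    return {**state, "ball_with": action}

def interruption_model(state: State, action: str) -> State:
    # A completely different setup once something lands on the court.
    return {**state, "play_stopped": True}

MODELS: Dict[str, WorldModel] = {
    "normal_play": game_model,
    "object_on_court": interruption_model,
}

def predict(state: State, action: str) -> State:
    context = "object_on_court" if state.get("object_on_court") else "normal_play"
    return MODELS[context](state, action)

print(predict({"ball_with": "point_guard"}, "wing"))
print(predict({"ball_with": "point_guard", "object_on_court": True}, "wing"))
```

The hard open problem, of course, is the dispatcher itself: here the relevant contexts are hand-coded, which is exactly what a genuine world-modelling system would have to learn and revise on its own.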
I’ve been using tic-tac-toe to test these models since ChatGPT was released, and they always struggled. o3 was the first that managed to draw games, but inspired by this I asked it to play using the symbols a and b instead, and surprise, surprise, it lost immediately and failed to realise it.
Whenever I see what seems like progress, it’s hard to believe they haven’t just fine-tuned on the specific scenarios people were putting to it.
Yes, there are vast amounts of stuff hidden in the background now, as each company watches social media, then updates its model silently to "pass" the latest viral AI failure found.
Yes. Whack-a-mole is not a route to AGI. What you need is a system that can spell all fruits and not one that now can spell strawberries but that needs further fine tuning to spell kumquats.
Computation, calculation, sloppy substitution will never ever cover the always evolving, fecund nature of reality, the alive. It's not even playing catch-up. It's really showing the failure of the mechanistic materialism thread running through science - a failure in the thinking that the laws of nature are immutable and there is a secret code and formula governing the behavior of life, of how the atoms knock about and the neutrinos nudge. Nothing could be further from the truth. The machines will always lose.
This seems like an argument against science. I am pretty sure that Gary Marcus is arguing for more science in AI. In particular, AI currently does not have "a failure in the thinking that the laws of nature are immutable" but rather a failure in not realizing that the laws of nature, or the rules of chess, are immutable.
"more science" that befuddles me. My call is for better science, one that values empiricism, has less hubris, and realizes that which is made is not that which is born (to paraphrase both Gould and e.e. cummings.
Ignorant ideological drivel.
Gary, I love what you write and read most of it. And I think it is good to have rational debates (minus the name calling) on both sides of the divide.
I worked with symbolic AI in the 80s; they used to call it Expert Systems! I have a line of thought that tries to bring together the good things from that past with the good things of present AI. What if the "world model" is a dynamic, persistent, seamlessly updateable, temporal graph, managed by an ever-evolving schema - something similar to your "scene graph"? What if the graph is just one of the components in a system (expert system?) that also manages rules and logic (decision matrix) and smartly couples with the good things in foundation models?
What if such a system could handle various "world models" pertaining to various domains (a slice of the real world) and what if they could be bootstrapped with existing foundational assets in those domains? And what if the LLMs could help us in the bootstrapping given we humans keep most knowledge in natural language (and image/video) forms? And what if we find ways for Human-in-the-loop mechanisms to validate the foundational blocks and update the "world view" dynamically?
Forget AGI, what if this "system" can help us in ways we cannot manage on our own today? Can we not, all the brilliant minds on both sides of these debates, come together to build such systems that help us TODAY while keeping our focus on the future?
Am I being naive or is it just the dreamer in me?
Maybe a stupid question but why can’t LLMs have a world model and is this problem intrinsic to the whole concept of LLMs?
It isn't a stupid question. LLMs do have a world model but it has to do with the statistics of word order. (Token order if you want to be a bit more general.) It is similar to a student who absolutely refuses to learn the course concepts but is willing to spend an inordinate number of hours memorizing verbatim the course material and relevant parts of the entire internet. The student presumably could be encouraged to learn the concepts but the LLM doesn't know how. Perhaps it would be more accurate to say that LLMs don't know how to learn about the world. We don't know how humans do it either. It's a tough unsolved problem.
" Perhaps it would be more accurate to say that LLMs don't know how to learn about the world. We don't know how humans do it either. "
"either"? These are totally different sorts of statements. LLMs are incapable of building world models, just as a one line program that prints "Hello World!" is incapable of building world models--they are quite the wrong sort of thing to do that. Our incomplete understanding of how human brains work (among many other things we have incomplete understanding of) is a very different sort of thing ... using "either" to tie them together like that is a severe category mistake.
And at a high level we *do* know how humans do it ... as Gary just said, they build world models. And we have experience with systems that are capable of building world models, as he also pointed out. Obviously, much more work needs to be done on that front, but the research money and effort is going elsewhere.
If we knew how humans make world models in sufficient detail, we could implement it in our AIs. Anyone who doesn't believe that should not be working in AI. It's only a category error if you completely lack imagination or have some religious position you are trying to maintain.
So you don't understand the difference between LLMs and AIs?
"if you completely lack imagination or have some religious position you are trying to maintain"
So the only way you can defend your errors is with a personal attack? One that is about as far off the mark about me as one can possibly get, all based on a complete failure to understand the comment you responded to, coupled with reflexive defensiveness.
Blocked.
Yes, the problem is intrinsic. LLMs are statistical engines that generate text in response to a prompt, based on matching the prompt to its massive training data. That's the sort of thing an LLM is. A cognitive system built to acquire world models might be able to use an LLM as one component (e.g., to generate grammatical text), but the LLM would be part of that larger system ... you can't put the system inside the LLM, it just isn't the right sort of thing.
Yes, the problem is intrinsic. The large context windows of modern models are used to try to inject a world model via prompting but it simply does not work. The strength of gravity on earth should not depend on how long it has been since I last discussed it.
Thanks, I was wondering what the difference between a world model and a context window is. Good analogy here.
The facts do not speak for themselves. Models of reality are necessary, regardless of which part of reality is involved.
Here’s a task to present to an LLM: without giving it any training to copy, let’s see if the LLM can write the rules of baseball, given only the actions that occur during baseball games.
Humans are much more intelligent than any LLM. A human cannot do the task. An LLM can’t even get to first base with the task.
The term “artificial intelligence” is a misnomer.
One of my history professors once remarked to us, "It's the whole thing that convinces, not just one fact." One could go further and say, "In fact, it's the whole thing that tells us what a fact is". The model defines what a fact is, what the facts are.
This from a history/former psych major who had some philosophy, including the Tractatus. I suspect our techbros would struggle with that one.
Your history professor was right, in my view. I teach my students that a fact is a positive statement we are persuaded is true. This definition places the spotlight where it belongs: what is it that persuades us that a statement is true? I write about this in my little book, Morality and Capitalism: A Dialogue on Freedom. https://www.amazon.com/Morality-Capitalism-Dialogue-David-Kendall/dp/1503233243. Some of our techbros understand epistemology; others, not so much.
I think that AI systems do need a world model in order to achieve consistency in their (verbal) behavior. They need a durable internal representation that goes beyond stimulus-response.
From a pragmatic AI perspective it might even be reasonable to provide AI systems with an "innate" ontology and memory. From a cognitive science perspective this would be begging the real question, namely how such internal representations evolve(d) in the first place.
I asked DALL-E to design a snowboard with a graphic of an eagle on it. I got a blank snowboard with a 3D eagle sitting on it, its outspread wings extending beyond the snowboard. It had a library of snowboards with graphics it had already created as a reference library and still got it wrong. Having worked with symbolic AI and stochastic multi-entity simulation models in the early 90s, it seemed obvious to me that the system lacked a basic world-model of the space it was intended to be operating in. With graphics the issue is obvious; with linear text it is much more opaque and therefore, for me, much higher risk.
From your footnote: "In fact, the leading chess AI, Stockfish, has for the last several years been a neurosymbolic hybrid model, using a neural network to evaluate moves, but a symbolic infrastructure to search moves and (if I understand the source code correctly) decide which are the possible moves to evaluate." Why is this not the obvious way forward, if not to anything like AGI, at least to a more dependable set of tools? Is it that integrating LLMs with anything else is just so difficult?
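For what it's worth, the division of labour that footnote describes can be sketched in a few lines. This is emphatically not Stockfish, whose NNUE evaluator and search are far more elaborate; it only shows the shape of the hybrid: a symbolic layer generates only legal moves and searches the tree, while a learned evaluator scores positions. Here the python-chess library supplies the rules, and a hand-written material count stands in for the neural network.

```python
# Schematic neurosymbolic hybrid, not Stockfish: python-chess provides the
# symbolic rules (legal move generation), and a simple material count stands
# in for the neural evaluator a real engine would use.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def evaluate(board: chess.Board) -> float:
    """Stand-in for the neural evaluation; positive favours White."""
    score = 0.0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == chess.WHITE else -value
    return score

def negamax(board: chess.Board, depth: int) -> float:
    """Symbolic search: only moves the rules allow are ever considered."""
    if depth == 0 or board.is_game_over():
        sign = 1 if board.turn == chess.WHITE else -1
        return sign * evaluate(board)
    best = float("-inf")
    for move in board.legal_moves:          # rule-based move generation
        board.push(move)
        best = max(best, -negamax(board, depth - 1))
        board.pop()
    return best

def best_move(board: chess.Board, depth: int = 2) -> chess.Move:
    best_score, best = float("-inf"), None
    for move in board.legal_moves:
        board.push(move)
        score = -negamax(board, depth - 1)
        board.pop()
        if score > best_score:
            best_score, best = score, move
    return best

print(best_move(chess.Board()))
```

The point of the architecture is that an illegal move simply never enters the search, which is exactly the guarantee a bare LLM cannot give.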