"The will to believe in neural networks is frequently so strong that counterevidence is often dismissed or ignored, for much too long."
So true.
It is extra strong during hype waves, such as the current 'GPT-fever', but it is actually a necessary basic function of human intelligence to have stable convictions, and there are good reasons for this (mostly energy-efficiency and speed). But the same effect leads to people ending up in rabbit holes and conspiracy theories through confirmation bias. Such people are not crazy, they are ... human.
Human intelligence (such as the role that existing convictions play in our capacity for observation and reasoning) is a far more interesting subject than artificial intelligence when discussing artificial intelligence :-). See https://ea.rna.nl/2022/10/24/on-the-psychology-of-architecture-and-the-architecture-of-psychology/ where this is discussed against the background of convictions and attempts to change convictions when advising (business leaders, politicians, etc.) on tech facts.
I do think "if the memorized fact is included in the prompt, models succeed" is unintentionally misleading (the misleading nature of many descriptions is part of my talk in London on Oct 10). In this case: no 'facts' are retained/memorised in LLMs, neither through pre-training, nor through fine-tuning, not even in a prompt/reply sequence. None. So the finding you bring up is not surprising at all. It is entirely to be expected.
"All facts presented by Generative AI — even those that are true — are fictitious. All logic and math is estimated … poorly." — Proposed disclaimer for LLMs, modelled after South Park (also from that upcoming talk). Prompt engineering, plugins, and even the non-LLM safety are all signs that we have to work around the fact that LLMs have no 'understanding' whatsoever (for any reasonable value of 'understanding'). Syntactically, they are near-perfect 'stochastically parrots', semantically, they are 'stochastically constrained hallucinators'.
Q: is 450 90% of 500?
ChatGPT: No, 450 is not 90% of 500. To calculate 90% of 500, you can multiply 500 by 0.90 (which represents 90% as a decimal):
500 * 0.90 = 450
So, 90% of 500 is indeed 450. My initial response was incorrect, and I apologize for any confusion.
Another example of ChatGPT's innumeracy: ask it for the square root of some random four-digit integer. The answer will be close but wrong. Tell ChatGPT it's wrong and it will apologize and give another wrong answer, and so on. Some answers will not even be in the ballpark.
Why can't these vaunted computer apps even do basic arithmetic?
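For contrast, here is a minimal Python sketch of how a conventional program handles the same request exactly (the specific four-digit numbers are just arbitrary examples I picked):

```python
# Conventional arithmetic is exact and repeatable; the contrast with
# ChatGPT's close-but-wrong answers is the point of the comment above.
import math

print(math.isqrt(7396))   # 86 -- exact integer square root (86 * 86 == 7396)
print(math.sqrt(2023))    # 44.97777... -- correct to double precision
```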
I didn't get the apology:
No, 450 is not 90% of 500.
To find out what 90% of 500 is, you can multiply 500 by 0.90 (which represents 90% as a decimal):
500 * 0.90 = 450
So, 90% of 500 is indeed 450.
Is this supposed to be a mark against it? We already know it can only 'think' in a stream of consciousness and can't plan out an answer before it produces it. The fact that it can recognize it made a mistake shows 'self-awareness'.
It shows its "thinking" is not stable, and cannot be trusted.
Q: Is 4 80% of 5?
ChatGPT: No, 4 is not 80% of 5. To determine what 80% of 5 is:
0.80 (which represents 80%) multiplied by 5 = 4
\( 0.80 \times 5 = 4 \)
So, 4 is 80% of 5. The statement is correct.
Why does it answer 'no'? It did get "1 is 50% of 2" correct and "500 is 100% of 500" correct, perhaps because those facts are in the training data?
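For what it's worth, the deterministic version of the question this sub-thread keeps stumbling on is a one-liner; a minimal Python sketch (the helper name is mine):

```python
# Exact integer check for "is x equal to p% of y?" -- no floating point,
# no pattern matching, no chance of answering "No" and then proving "Yes".
def is_percent_of(x: int, p: int, y: int) -> bool:
    return 100 * x == p * y

print(is_percent_of(450, 90, 500))  # True
print(is_percent_of(4, 80, 5))      # True
print(is_percent_of(1, 50, 2))      # True
```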
Some questions it can answer correctly only if it's generating longer responses, I guess because the correct reasoning is encoded in the step-by-step answer patterns.
Q: Respond with only yes or no. Is 450 90% of 500?
ChatGPT: No.
Q: Why?
ChatGPT: 450 is not 90% of 500 because 90% of 500 is equal to 450.
Q: Why?
ChatGPT: I apologize for any confusion in my previous response. To clarify:
- 90% of 500 is indeed 450.
- So, yes, 450 is 90% of 500.
I get that, but I was wondering what it is about the content of the question that would reliably lead to the first word in the answer being 'no'. Maybe 'no' is most probable because of questions on the internet that are similar to, but not the same as, this one, which may actually be unusual?
I think it didn't learn the right structure to quickly answer this type of problem, and the fact that it has always had problems with arithmetic didn't help. Also, I think this is why prompt engineering is a thing: the same question asked in different ways results in different responses, some more right than others. Some formulations of the question can trigger the right pattern. And ChatGPT did not simply learn the content of the internet; there was also an army of people tasked with answering a bunch of questions in a certain way, which is why you can sense when a question is answered by ChatGPT.
This is indeed a remarkable result. We are way too quick and eager to accept the appearance of thinking as evidence of reasoning ability, and too reluctant to admit the less exciting evidence and draw the consequences. This might also help explain why, for all the hype on their ability to crack MBA exams, LLMs have not yet cracked any of the truly hard problems that have so far defeated us. And it implies we need to take a much more sober view on the potential that these models hold to transform our economy and boost productivity and economic growth.
Marco, the crinkle appears to be a corollary-recognition or corollary-understanding weakness, don't you reckon?
As in: even if LLMs know well that a glass is half empty, they are not axiomatic enough (or only erratically so, or worse, only by happenstance) to know that the glass is half full too.
Because LLMs as we know them are, like most academics, bookish too, overly and falsely unpolynomial. Empiricism must be subject to groundedness, but enough academics, and even more of the bookish, are not.
Yet we must not rush into throwing the baby out with the bathwater.
Correct me if I am wrong, but I sense that this flaw of LLMs is small compared to their uncrinkling ability to correct the mistakes that manipulators spread. Thus the "it is a memes world" status-quo acclaim.
And this corollary-blindness mis-take (rather than wrong) of LLMs is entirely correctable. And more parallel epistemics and polemics too.
It's been 41 years and it still hasn't been corrected.
The reality is that LLMs aren't intelligent and don't actually understand what they're regurgitating.
This sentence nails it:
"In neural network discussion, people are often impressed by successes, and pay far too little regard to what failures are trying to tell them."
For LLMs we may need to add in "apparent successes". The LLM outputs something that looks impressive, and a bunch of people fall for the illusion. Then, when given a conceptually identical prompt the LLM outputs something deeply unimpressive that ought to undermine the original claim of success - but somehow that's not the interpretation. For example, the "GPT-4 displays theory of mind!" paper was utterly rubbished by two subsequent papers, but what is the response? It's the #1 go-to defense: "humans make mistakes, too".
This is the most popular rhetorical gambit in AI hype: interpret some AI output as a sign of some great advance (or "emergent" ability), and then when counter-examples are put forward push them aside with "humans make mistakes too". When it gets things right it's displaying human-like intelligence; when it gets things wrong it's displaying human-like flaws. Every piece of output is another sign that it's becoming more human-like!
exactly
Yes. It is an utter failure to take a scientific approach, i.e., instead of trying to disconfirm a hypothesis, people seek out supporting evidence.
To paraphrase Chomsky, what we are witnessing (once again in Computer "Science") is a triumph of engineering rather than an expansion of knowledge. But of course, there is so much money on the table in terms of venture capital that this is almost inevitable now in all technology (just like the recent room-temperature "superconductor" furore).
I keep cautioning my friends and family to beware of the hype. What ChatGPT has done is a parlor trick, albeit a really sophisticated one that does have some serious applications in the real world, but with big limitations. This is one of those big limitations. Great article. Thanks!
A parlor trick that helped me learn a new technology in a week, whereas in the past it would probably have taken at least a month; actually, to the point that I am 'lapping' coworkers who've been muddling through on their own for months. Using it for what it's actually great for (info retrieval, especially in the tech sphere) ain't nothing.
Did you miss the part where I said, “...albeit a really sophisticated one that does have some serious applications in the real world...?”
As my training data only goes back to Sept 2021 I cannot find the answer
Great work! Thanks for being the institutional memory for the current context. Much appreciated by folks like myself new to the field.
It's far simpler to say LLMs can't comprehend that mothers give birth to sons. They don't actually 'know' anything about the world. They've never experienced it and never will.
It's like giving a large book library a voice and asking it to describe the city outside without giving it eyes or legs, or money or family, or feelings.
Disagree. Take a look at this dialog. GPT-4 not only knows the relationship between mother and son, it even knows that Tom Cruise might not be the ONLY answer, unless Mary has only ONE child.
https://chat.openai.com/share/55caf977-0eef-4491-9f4c-f3afbac5ebe2
Using this logic, my point is that you still had to provide the SonOf | MotherTo functions in order for the analysis to occur. But even very young children don't need to be told this.
Great piece. In summary, the ability to generalize is the key to solving AGI. Deep neural networks are inherently incapable of generalizing because function optimization (the gradient learning mechanism of deep learning) is the exact opposite of generalization. No add-on or modification to the deep learning model will solve this problem in my opinion. We need a completely new model of intelligence that is designed from the start with generalization in mind. Even sensors must be designed to generalize.
Unfortunately, generative AI (a DL derivative) is syphoning all the funding from generalization research. This must end.
To me the single most important step in AI that can be taken is to merge LLMs with a concept of "hard ground truth," i.e., the work that's been done at Cycorp and elsewhere. Currently LLMs simply "understand" which words or phrases logically go with one another (i.e., "attention"). But if I understand the notion of Cycorp's body of knowledge correctly ("if someone is a woman's son, then the woman's children include that someone"), one could dramatically improve LLM accuracy by underpinning them with this BOK. Critically important.
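To make the idea concrete, here is a toy sketch of that kind of rule. It is not Cyc's actual representation, just an illustration that once the rule is explicit, the inverse direction of the relation comes for free:

```python
# Toy "body of knowledge" rule: asserting son_of(child, mother) also
# updates children_of(mother), so the relation can be read in either
# direction. Illustration only, not Cyc's actual machinery.
from collections import defaultdict

son_of = {}                        # child -> mother
children_of = defaultdict(set)     # mother -> set of children

def assert_son_of(child: str, mother: str) -> None:
    son_of[child] = mother
    children_of[mother].add(child)  # the rule: son_of(c, m) => c in children_of(m)

assert_son_of("Tom Cruise", "Mary Lee Pfeiffer")
print(son_of["Tom Cruise"])              # Mary Lee Pfeiffer
print(children_of["Mary Lee Pfeiffer"])  # {'Tom Cruise'}
```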
and see my recent arxiv with Doug Lenat
I did read that: precisely the direction we should be going. Hope you're working on it. The advances in attention are important but attention is not ground truth.
Great work!!!
This seems more a critique of the limits of attention-based architectures in transformers than of LLMs themselves, which are a bit more general.
Deep Learning should not really be associated with AGI though, full stop. I don't get why people believe this, unless they have discovered that human knowledge comes from computing softmax functions (which seems like a crazy view to me). What is the definition of AGI?
I say this as someone who only started teaching myself ML a few months ago, though...
It’s the Clever Hans effect. These machines will appear intelligent to those who want to see intelligence in it.
or the Eliza effect, or perhaps the Eliza effect is a special case of Clever Hans…
Hi Gary! Neat that you had written about this, long before LLMs :)
An LLM cannot ever know what a single word means. People do. 'First', 'President', 'United States' - people know the meaning of these. "You just told me that!" is what a human would say in the Washington example. A system that numerically computes the output integer by integer (not even word by word), unsurprisingly, cannot answer the question.
Meaning doesn't reside in words alone. All that LLMs have access to, are words. That is the source of the disconnect.
The problem is not that LLMs use words only. It is that LLMs appear not to intuit rules about how words are tied together. A better system that used purely words and logic would not have this problem, even if it did not understand the meaning of words.
It might not have this particular problem but will have a bunch of others. Language is for communicating, and mostly for communicating about the world. An LLM that processes words and uses logic has no means to communicate this way, or invent, or wonder, etc., etc. We humans can, because we know about the world first-hand, bit by bit, rather than via a giant blob of all-at-once data. Data is DOA.
That is fair enough. AI is hard. Symbolic systems were too brittle and could not scale. LLMs scale and handle failure gracefully, but hallucinate and lack logic. And language is just a surface-level representation.
But LLMs can learn from existing language and custom-made recipes. They can integrate with other tools (simulators, symbolic systems, knowledge graphs, generated code) to provide depth. This is the best we've got. There's a lot of common-sense knowledge in video datasets that has not yet been processed.
One day they will run out of steam, but I think we are not there yet.
True! Hooking them up to external memory is the surest way to get them to produce acceptable results.
The fix to this is for GPT and Bard to be able to invoke a knowledge graph. This should be rather straightforward to do, just as with other tool invocations. The LLM itself can also be used to process new info daily and populate or update such knowledge graphs.
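As a rough illustration of what such a tool invocation might look like (the triple store and function name here are hypothetical; a real system would query Wikidata, Cyc, a SPARQL endpoint, or similar):

```python
# Hypothetical knowledge-graph "tool": instead of letting the model guess,
# the orchestrator calls a lookup and splices the result into the prompt.
TRIPLES = {
    ("Mary Lee Pfeiffer", "child", "Tom Cruise"),
    ("Tom Cruise", "mother", "Mary Lee Pfeiffer"),
}

def kg_lookup(subject: str, relation: str) -> list[str]:
    """Return every object o with (subject, relation, o) in the graph."""
    return [o for (s, r, o) in TRIPLES if s == subject and r == relation]

print(kg_lookup("Mary Lee Pfeiffer", "child"))  # ['Tom Cruise']
```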
More reading on LLM and knowledge graphs: https://arxiv.org/abs/2306.08302
Published in "JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021"? Funny.
This reads like sheer nonsense.
What do you mean?
My post was about LLM and knowledge graphs. Do you have any useful response?
Certainly. I was just making an observation regardless of the content of that article. I find publishing about LLMs in a journal dedicated to LaTeX class files hilarious.
Anyway, you already need to have understanding to provide those knowledge graphs or other tools with the correct request (except for simple cases). 'Processing new info daily' has the same problem: how do you determine which 'new info' is good and which is not, which is relevant and which is not? If you want to have that done by LLMs, they must have understanding in the first place.
The conviction that something 'should be straightforward' becomes the subject. As long as you have no idea how, the term 'straightforward' is problematic.
I suspect some better architecture that combines heuristics/symbolic/analytics with GAI can be created (a bit like Meta engineering the hell out of ML to win blitz-Diplomacy games, see https://ea.rna.nl/2022/12/12/cicero-and-chatgpt-signs-of-ai-progress/), but I suspect it is more 'dead-end engineering' than a 'straightforward' breakthrough.
Gary –
I’m late to this party, but I hope this nonetheless comes to your attention.
I think your point that current LLMs have a difficult time with symmetric relations (“is” in this case) is noteworthy, but it’s also indicative of a misguided and unfortunately common misunderstanding of LLMs, and what we should expect of them.
In general, your post (and other similar critiques) tacitly assumes that LLMs should be logically infallible, unerringly accurate, and exceed human capabilities in every measurable way. (And by implication, if they fall short of this high standard, then they are dangerous, flawed, and need to be curtailed.)
I’ve come to a very different point of view. LLMs are remarkably sophisticated tools that reflect the intricacies of human language AS IT IS ACTUALLY USED, and offer us the ability to mine actionable insights from the accumulation of digital debris we leave behind.
The fact that LLMs aren’t everything to everyone, or can’t do some things that most people assume any computer program should be able to do, is a red herring that may prevent us from realizing the tremendous value they are likely to unlock for society.
These programs are linguistic objects, not reasoning or problem-solving machines. The fact that they can nonetheless frequently perform such tasks is a testament to their incredible depth and power, not a defect to be mocked or derided.
As a specific example, consider your critique that an LLM can answer “Who is Tom Cruise’s mother?” but can’t answer “Who is Mary Lee Pfeiffer’s son?”. This stems from a misunderstanding of how language is actually used in human conversation, and makes a false equivalence between the mathematical concept of “equals” and the much more subtle meaning of the word “is”.
When you ask someone, including an LLM, “Who is Tom Cruise’s mother?”, the form of the question implies you believe that shared context between the conversants is sufficient to disambiguate who you are talking about. Since there’s a single famous person with this name, the LLM, like any other reasonably knowledgeable speaker, correctly assumes you are talking about the Hollywood actor. It (or they) can then try to answer the question using whatever factoids they know about him.
But when you ask “Who is Mary Lee Pfeiffer’s son?”, the assumption about shared context is violated. Virtually no one knows who she is, and any reasonable person’s first reaction is going to be “who?”, “which Mary Lee Pfeiffer are you talking about?”, or “why should I know her?”. This is basically how the LLM responded. I assert that THIS IS THE CORRECT RESPONSE, and in contrast to your conclusion, the pair of questions demonstrate the LLM’s remarkably refined linguistic sense, not to mention its depth of knowledge of popular culture. It doesn’t treat these two individuals as interchangeable variables in an equation, the straw man you set up to make your argument. (FYI my PhD in Computational Linguistics was on this point, may it rest in peace!)
It's worth noting that LLMs' only source of knowledge is what other people have said (or written) about the world, without the benefit of any direct information or experience. So their reasoning and problem-solving shortcomings are not necessarily a reflection of inherent flaws, but are more likely due to their lack of any real-world interaction to learn from. As you know, this is likely to change in the near future, rendering such concerns moot.
Today, most people expect computer programs to be precise, accurate, logical, and deterministic. But I believe this perception is about to change. Whether we are ready or not, Generative AI systems are intuitive, creative, as well as linguistically and artistically facile. They also suffer many of the same limitations as humans, in addition to their own unique peccadillos. I’ve been collecting examples of this strange new phenomenon, and hope to write more about this soon. I predict a new field of inquiry will shortly emerge that we might call “the psychology of machine intelligence”, or “machine psychology” for short.
I asked GPT-3.5-turbo "Is Mary Lee Pfeiffer the parent of Tom Cruise?" and it responded "No, Mary Lee Pfeiffer is not the parent of Tom Cruise. Tom Cruise's parents are Thomas Cruise Mapother III and Mary Lee Pfeiffer is likely someone else, potentially a different person with the same name."
I suspect the LLM understands the relationship, but is being extra cautious to not make incorrect inferences, perhaps because of RLHF.
Intriguingly, GPT-4 answered "No, Mary Lee Pfeiffer is not the parent of Tom Cruise. Tom Cruise's parents are named Thomas Mapother III and Mary Lee Pfeiffer." which feels much worse than GPT-3.5's answer.
a different person with the same name? that’s a hilarious error and you are being way too charitable
OK, I think my hypothesis of RLHF being the cause of this is confirmed. If you ask GPT-4 about famous Hollywood families, it very clearly understands that parent-child relationships are symmetrical. GPT-4 is attempting to protect the privacy of non-public figures, such as Tom Cruise's mother, albeit in a clumsy way.
If you ask the same questions about Drew Barrymore and her father John Drew Barrymore (also a famous Hollywood actor), GPT-4 can answer the exact questions you use in your article. (A minimal script for rerunning this comparison is sketched after the transcript below.)
Here's the full transcript of my chat with GPT-4:
https://platform.openai.com/playground/p/mcXYWA7yPYd3sdqZhsdSQA1x
USER
As an AI developed by OpenAI, do you need to respect Tom Cruise's privacy?
ASSISTANT
While I don't have access to personal data about individuals unless it has been explicitly shared with me in the course of our conversation, I am designed to respect all users' privacy and confidentiality. This includes public figures like Tom Cruise. I can share information that is publicly available and part of his public persona, such as details about his career, but not any private or personal information not publicly disclosed.
USER
Is Tom Cruise's mother a public figure?
ASSISTANT
Tom Cruise's mother, Mary Lee Pfeiffer, is not typically considered a public figure in the same way that her son is. While she may be known due to her son's fame, she did not lead a public life or have a career in the public eye. However, general information about her may exist in the public domain due to her son's fame.
USER
Is the author of The Algebraic Mind, Gary Marcus, a public figure?
ASSISTANT
Yes, Gary Marcus is a public figure to some extent, especially within his fields of cognitive psychology, linguistics, and artificial intelligence. He is an author, scientist, and entrepreneur. In addition to "The Algebraic Mind", he has written other notable books and numerous scientific papers, and he has appeared on television and radio programs to discuss his work. Nonetheless, the nature of his public figure status differs from that of a movie star like Tom Cruise, who is globally famous. Marcus's fame is primarily within scientific and academic circles.
USER
Who is Drew Barrymore's father?
ASSISTANT
Drew Barrymore's father is John Drew Barrymore, an American film actor and member of the Barrymore family of actors, which also includes her grandfather, John Barrymore, and her great-grandparents, Maurice Barrymore and Georgiana Drew.
USER
Who is John Drew Barrymore the parent of?
ASSISTANT
John Drew Barrymore is the father of four children: actor John Blyth Barrymore, actress Drew Barrymore, writer and director Jessica Blyth Barrymore, and actress Blyth Dolores Barrymore.
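For anyone who wants to rerun this comparison themselves, here is a minimal sketch using the OpenAI Python client (it assumes the 1.x client, an OPENAI_API_KEY in the environment, and the model names as they were offered in 2023, which may since have changed):

```python
# Minimal sketch for rerunning the two-directions comparison above.
# Assumes the openai 1.x Python client and OPENAI_API_KEY set in the
# environment; model names may need updating.
from openai import OpenAI

client = OpenAI()

QUESTIONS = [
    "Who is Tom Cruise's mother?",
    "Who is Mary Lee Pfeiffer's son?",
    "Who is Drew Barrymore's father?",
    "Who is John Drew Barrymore the parent of?",
]

for model in ("gpt-3.5-turbo", "gpt-4"):
    for question in QUESTIONS:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
            temperature=0,
        )
        print(f"{model} | {question} -> {response.choices[0].message.content}")
```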
My experience was similar: a LOT of caginess about not wanting to talk about a 'private individual'. GPT-4 doesn't want to get sued! Our little boy is growing up!