I was eating popcorn, minding my own business, watching these four swings as they happened in real time, plus a few more (e.g. Christopher Manning's take), and realized that these takes are all, to varying degrees, "self-certain, hostile, smug" -- and offer little substance. It's sad to see the supposed elite ranks of academia at the forefront of our technology abdicate their duty of civil debate about ideas and resort to thinly veiled ad hominem attacks (e.g. Aaronson's piece) to one-up each other in a never-ending status game.
I respect Bender's work a lot, so it's disappointing to see a take from her that offers less substance than an impeachment of the NYT's credibility. Same with Sejnowski's and Aaronson's takes -- their tone changes remarkably, from civil to hostile, when it comes to anything Noam Chomsky.
There is a reason that Envy is one of the seven cardinal vices.
Granted, I don't agree with Chomsky on a number of things, but he is always willing to engage in civil debate about ideas, and he tries to offer substance rather than dog whistles. That NYT op-ed is too writerly, no doubt, but the substance is there, and it's up to the debunkers to come up with worthy counterarguments.
Hats off to good ol' Noam -- still slaying it at 94, and hope there will be many more!
Where was Manning’s reply?
https://twitter.com/chrmanning/status/1633873657939513345?s=20
"self-certain, hostile, smug," -- This is a wonderful characterisation of Chomsky's fire-from-the-hip tone in this particular article. Thank you!
Let's be honest about these attacks. The current crop of chatbots is part of a multi-billion-dollar industry desperately trying to find a use for its investment. People have bought into the hype, including academics, and now even the slightest criticism makes them hysterical.
I posted this on Scott A's blog...
***
A scenario reminiscent of Searle (but not quite): I'm left alone in a mansion full of books (filled with words, no pics) in Ethiopian, which I don't read/write/speak. The books supposedly contain all manner of knowledge, from metaphysics to archeology to CS. I'm left alone for several years, locked in the mansion, and am told to absorb it all.
Would I be able to?
***
Words by themselves mean jack, same with their ordering. Same with two sets of them in two different languages (equivalent to so-called multi-modal AI). LLMs cannot know the way embodied beings can -- which is, wordlessly.
ChatGPT had no trouble with a simple dropping-an-apple question; it addressed every aspect the opinion piece claimed it couldn't, and then some. In case anyone is wondering, the apple would accelerate toward the ground at 9.8 meters per second squared. I didn't ask about gravity, or bounce, or height, or surface, or bruises, or splatter, etc. ChatGPT added the science and a few flourishes without further prompting. Programs process information, and this one does the job extremely well. It's not human, but who actually thinks that? Can't we be happy for 5 seconds about a fun new tool that anyone can use for good not evil (because that's a human choice), and that also happens to represent much-awaited progress in AI? Or was this only ever going to be okay if it arrived in a fully evolved form? (No one would live to see that.)
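For anyone who wants the numbers behind that figure: 9.8 m/s² is the apple's acceleration under gravity, so its speed and the distance fallen grow with time. Here is a minimal sketch of the standard free-fall formulas (air resistance ignored; purely illustrative):

```python
G = 9.8  # gravitational acceleration near Earth's surface, in m/s^2

def free_fall(t_seconds: float) -> tuple[float, float]:
    """Return (speed in m/s, distance fallen in m) after t seconds, ignoring air resistance."""
    speed = G * t_seconds                 # v = g * t
    distance = 0.5 * G * t_seconds ** 2   # d = (1/2) * g * t^2
    return speed, distance

for t in (0.5, 1.0, 2.0):
    v, d = free_fall(t)
    print(f"After {t} s: speed = {v:.1f} m/s, fallen = {d:.1f} m")
```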
The most interesting thing about ChatGPT is that it has mastered language. While it makes factual errors all the time, I haven't seen any grammatical errors. This suggests that it has either memorized all of the patterns of English grammar or that it has discovered the corresponding rules and can apply them. Either way, it provides an alternative answer to what Chomsky defined as one of the central questions in language.
The Chomsky piece was a disjointed bag of arguments coupled with derogatory rhetorical flourishes. I hope I never write an article like that.
No, Tom, it really doesn't. The windows and data are long enough that it is doing a kind of template matching. But real mastery of language is about mapping syntax onto semantics, and it never does that.
Do you have examples of grammatical errors and other linguistic disfluencies committed by ChatGPT? That is the, admittedly restricted, aspect of language that I argue ChatGPT has mastered, but I don't have any hard numbers on this. Do you know of any attempts to systematically assess this aspect of ChatGPT (and other LLMs)?
More generally, can you help me understand whether it is possible to draw a line between purely linguistic performance and the competent use of language to communicate (which surely includes world knowledge, tracking the state of a dialogue including the knowledge and beliefs of the other parties, etc.)?

The key question in my mind is whether we can define an API between a declarative knowledge base that represents all of this world knowledge (plus communicative goals, dialogue state, etc.) and the language system responsible for generating and interpreting language. Existing LLMs mash together these two components (knowledge and language), and the result is that it is extremely difficult to modify the system's world knowledge. One can hope that there is a way to separate them so that we could easily update the world knowledge without having to re-learn the linguistic knowledge. Do you think this is possible?

My understanding was that Chomsky argued very strongly for a separation between grammatical competence and other aspects of cognition. But you are much more knowledgeable about his work than I am.
You refer to "templates", which have historically been very limited. However, we know the transformer architecture is very powerful computationally and could be implementing very sophisticated generative processes, not just template-fillling. Hence, I don't see any inherent reason why these LLMs could not master natural language grammar. If we cannot find any grammatical failures in these models, then this would be evidence that they have succeeded.
The models only have world knowledge in the sense that their statistical weights occasionally-to-often pick words that happen to be factually correct, depending on the domain and density of information in that domain that the model was trained on.
So I'm not sure I'd say they mash these together so much as accidentally encode knowledge in the process of encoding grammar (if they even *really* encode it -- but functionally I agree, they mostly seem to).
As I'm sure you're aware, the models have a 'temperature' variable that adjusts whether they pick the single most likely next token, or have some leeway to pick alternatives. As this variable is increased, the models are more 'creative' in their output, but of course this directly impacts the likelihood of them outputting the 'wrong' knowledge.
Asking a factual question with a high temperature is just increasing the chance that the tokens that represent the 'correct' answer get substituted for a less likely set.
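To make that mechanism concrete, here is a minimal, self-contained sketch of temperature-scaled sampling over a toy next-token distribution (the tokens and logit values below are invented for illustration; real models work over vocabularies of tens of thousands of tokens):

```python
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    """Pick one token from a softmax over logits scaled by the temperature."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}  # numerically stable softmax
    total = sum(exps.values())
    tokens = list(exps)
    weights = [exps[tok] / total for tok in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Invented next-token logits for the prompt "The capital of France is ..."
toy_logits = {"Paris": 5.0, "Lyon": 2.0, "Berlin": 1.0}

# Low temperature: almost always the highest-scoring ("correct") token.
# High temperature: less likely tokens get sampled much more often.
print([sample_with_temperature(toy_logits, 0.2) for _ in range(5)])
print([sample_with_temperature(toy_logits, 2.0) for _ in range(5)])
```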
Given the above, I don't see how we can say that LLMs encode knowledge in any real sense.
This is true for their linguistic knowledge too, of course. It is all stored in the weights. However, my argument is that their purely linguistic skill is near 100%, while their world-knowledge skill is much worse. Wouldn't it be wonderful if we could represent world knowledge separately and declaratively? That would make it easy to time-stamp facts so that the system could correctly answer questions about points in time, and it would make it easy to keep the knowledge up to date without spending millions of dollars retraining. The goal would be to create a system that answers correctly all the time.
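One hypothetical shape such a separate, time-stamped, declarative store could take -- purely a sketch of the idea, not something current LLMs support (the entries and the `Fact`/`lookup` names are illustrative):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    relation: str
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still true"

# Hand-written, time-stamped facts (illustrative entries).
FACTS = [
    Fact("United Kingdom", "prime minister", "Boris Johnson", date(2019, 7, 24), date(2022, 9, 6)),
    Fact("United Kingdom", "prime minister", "Liz Truss", date(2022, 9, 6), date(2022, 10, 25)),
    Fact("United Kingdom", "prime minister", "Rishi Sunak", date(2022, 10, 25), None),
]

def lookup(subject: str, relation: str, when: date) -> Optional[str]:
    """Answer 'what was <relation> of <subject> on date <when>?' directly from the store."""
    for fact in FACTS:
        if (fact.subject == subject and fact.relation == relation
                and fact.valid_from <= when
                and (fact.valid_to is None or when < fact.valid_to)):
            return fact.value
    return None

print(lookup("United Kingdom", "prime minister", date(2022, 10, 1)))  # -> Liz Truss
print(lookup("United Kingdom", "prime minister", date(2023, 3, 1)))   # -> Rishi Sunak
```

Updating such a store is just editing entries; no retraining is involved.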
I'm sure you know about the Retriever + Reader models, where you first retrieve relevant external info (e.g., using a search engine) and feed the user's question plus the retrieved docs into the Reader.
It's not quite declarative knowledge, but the Retriever + Reader architecture does let you update the knowledge base without retraining the LLM.
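A minimal sketch of that Retriever + Reader pattern, with stand-in functions (`search` and `generate` below are placeholders I'm assuming for illustration, not any particular library's API):

```python
def search(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Stand-in retriever: rank documents by naive word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(query_words & set(doc.lower().split())), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stand-in reader: in a real system this would be a call to an LLM."""
    return "[reader output conditioned on]\n" + prompt

def answer(question: str, corpus: list[str]) -> str:
    # 1. Retrieve relevant documents from the external store.
    docs = search(question, corpus)
    # 2. Feed the question plus the retrieved documents to the reader.
    prompt = "Context:\n" + "\n".join(docs) + "\n\nQuestion: " + question + "\nAnswer:"
    return generate(prompt)

# Updating the system's knowledge = editing the corpus; the reader is never retrained.
corpus = [
    "Argentina won the 2022 FIFA World Cup.",
    "Paris is the capital of France.",
]
print(answer("Who won the 2022 World Cup?", corpus))
```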
Yes, and it is interesting to ask what is missing in the Retriever + Reader model. I think the main shortcoming is that the model doesn't allow ChatGPT to update its beliefs. It can learn a fact during a dialogue, but as soon as that fact falls off the end of the context buffer, it is forgotten. This makes it impossible for a conversational agent to build and maintain models of the knowledge and beliefs of other agents.
ChatGPT's ability to encode grammar is an impressive achievement. But we do not attribute "mastery of English" to writers who can only produce grammatically correct nonsense. Saying ChatGPT has "mastered English" on this basis is shifting the goalposts of a theory of language to conveniently match whatever the latest benchmark-setting technology is.
Additionally, it seems people do in fact observe grammatical errors.
https://www.reddit.com/r/OpenAI/comments/108tkc9/is_chatgpt_suddenly_making_grammar_mistakes_for/
https://www.reddit.com/r/ChatGPT/comments/109hr4l/grammatical_errors_since_last_update/
They speculate about why the errors have suddenly arisen, but without knowing more about how OpenAI mediates access to the model, the contention that it's part of watermarking AI output rather than a flaw in the model is just conjecture.
Thanks for these pointers! They don't provide much in the way of prompt-response pairs. The second example (where the system gives an "apology") could easily be a human error in writing the canned response (ironically). I encourage people who find grammatical errors to contribute them to Gary's error collection.
Hi Tom, Inspired by Chomsky's post I've been trying variants of Chomsky's famous "X is too Y to Z with". Chomsky's theory of syntax predicts that X should fill the "with" role. ChatGPT often gets this wrong (e.g., says that X is the subject of Z), or responds with "can't tell" (as in the example below).
To be fair, the examples are very subtle, and if this is the biggest grammatical error that an NLP system makes, it is very good.
Me: In the sentence "Sam is too sick to talk to", who is talking and who are they talking to?
ChatGPT: In the sentence "Sam is too sick to talk to", it is not clear who is talking and who they are talking to.
The sentence implies that someone wants to talk to Sam, but Sam is unable to engage in conversation due to their illness. However, the identity of the person who wants to talk to Sam is not specified.
It is possible that the sentence simply means that Sam is too sick to talk to anyone at all, but without additional context or information, it is impossible to determine who is talking and who they are talking to in this sentence.
Hi Mark. I think that Chomsky's point is more narrow than "X is too Y to Z with". As David Adger explained on Twitter too, the point holds with a specific type of Y, namely with tough predicates (tough/hard/impossible/easy, etc.). In these cases, the subject of the predicate is the object of the embedded infinitive. ChatGPT obviously gives the wrong answer (as the sentence can't mean that Sam is too sick to talk, only too sick to be talked to); that's expected given the struggles it faces with syntax and meaning, which we discuss here: https://arxiv.org/abs/2302.12313
Thank you for this; it looks very interesting. And it makes total sense that statistical models like ChatGPT miss the infrequent cases.
One high-level comment: how do you tell what ChatGPT knows about English?
I'm afraid this message is a bit of a brain dump; feel free to ignore.
You can just ask it directly if a sentence is grammatical. This is testing "meta-linguistic awareness". Most humans have poor meta-linguistic awareness, and often confuse grammatical with stylistically or normatively acceptable.
You can ask it to correct or improve a sentence. This has the same problem with human subjects, who will suggest vocabulary changes, etc.
You can quiz it about the interpretation of the sentence, e.g., pose questions for it to answer. The problem here is that it's very easy to use external knowledge (e.g., common sense) to help answer the question. Psycholinguists have developed ways of constructing such comprehension questions, and we might be able to build on these.
Wayne Cowart's "Experimental Syntax" might be worth looking at here: https://www.google.com.au/books/edition/Experimental_Syntax/hgw5DQAAQBAJ
Then there's also a whole slew of approaches that look at the internal representations of the models. Of course we can't do this with ChatGPT, but I think this is a very good approach when it can be done. These are a bit like the psycholinguists studying reaction times or P600 responses to specific linguistic constructions.
You probably know the work on Syntactic Model Probes, which aims to uncover what information is represented at different levels of a model.
Also there is the work that analyses the predictions/completions generated by LMs. Yoav Goldberg did some interesting work to study just what BERT-size models do or do not understand about agreement and coordination, for example. (I see you are citing this).
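For readers who want to try that completion-analysis approach themselves, here is a small sketch in the spirit of Goldberg's agreement probes, using an off-the-shelf masked language model via the `transformers` and `torch` packages (note this probes a BERT-style model directly, not ChatGPT, whose internals are not accessible):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_scores(sentence: str, candidates: list[str]) -> dict[str, float]:
    """Return the model's logit for each candidate word at the [MASK] position."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    return {w: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(w)].item() for w in candidates}

# Subject-verb agreement across an intervening noun phrase:
# the grammatical continuation is "are" ("The keys ... are"), not "is".
print(mask_scores("The keys to the cabinet [MASK] on the table.", ["is", "are"]))
```

A higher logit for "are" than "is" would be evidence that the model tracks agreement across the intervening singular noun; systematically varying the constructions is where the linguistics comes in.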
Thanks for the pointer to the paper -- I look forward to reading it. You can see a discussion of the argument structure alternations in the "X is Y to Z (prep)" constructions in my message below.
Chomsky's work contains many examples of phenomena where the construction itself is so rare that it can't possibly have been learnt from superficial data (and speakers aren't consciously aware of it, so they can't teach it to children). The phenomena in question follow from Chomsky's principles of grammar, so the argument is that if children "know" these principles (either innately or acquire them somehow), then this would explain the rare phenomena in question.
Glancing at your paper, I see you have studied phenomena from Chomsky's "Standard Theory" (mid-1970s), e.g., "Constraints on Transformations". Have you looked at phenomena such as parasitic gaps, crossing vs nested dependencies, etc? I would be surprised if ChatGPT understands them.
I think it's clear that ChatGPT doesn't understand English in the way that humans do, and no one claims that ChatGPT has acquired English in the way a child does. As Scott A points out, there's a sense in which it's more interesting to have a non-human system we can talk with.
Also, it's not clear if ChatGPT's inability to understand these constructions has much practical impact in terms of application. After all, the point of these constructions is that they are very rare. I remember working on syntactic parsing, adding mechanisms that enforce constraints on e.g., parasitic gaps, etc. This made zero difference to our treebank parsing accuracy scores, which wasn't surprising because there were only 4 examples of parasitic gaps in our training data (so probably zero examples in our test set).
Thanks Mark! Is the following correct? In "who they are talking to", ChatGPT is wrong. In the next sentence, it correctly asserts that someone wants to talk to Sam (thus contradicting itself, as it is prone to do). It then interprets "talk to" as "engage in conversation", which, although it is a reasonable interpretation, does not follow from the sentence. Someone might want to talk to Sam only to inform him of something without requiring a response. What does ChatGPT do if you ask "Sam is too sick to talk with"? In that case, the "engage in conversation" interpretation is correct.
Yes, I think that's right. This is very subtle (which is of course Chomsky's point: no parent ever taught this to their child -- indeed most speakers are not even aware of it -- yet somehow this pattern of interpretations does seem to be reliable across English speakers, once they are made aware of it).
I remember first seeing examples like this in a Linguistics lecture, and being amazed that as an Australian English speaker I had the same intuitions as my American colleagues.
Gary, how about providing a link to the mailing list.
Thanks for your excellent posts -- we need more voices like yours.
At some point I hope you'll give your opinion about Stephen Wolfram's take: that it's true LLMs are "just big statistical models," but that maybe humans are doing something similar whether we realize it or not, and that LLMs may lead the way toward a true understanding of grammar and thinking. I'm wildly paraphrasing, which is why I'd love to hear your response to this (unfortunately very lengthy) argument: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
It would be really great if such a brilliant mind as Gary Marcus -- to whose newsletter I have subscribed, and whose books I am reading and enjoying -- would at some point stop ranting about how ChatGPT is bad (which I got within the first day of trying it), or how large language models are bad, or how Hinton is bad, or LeCun is bad, etc., and would instead write interesting things about symbolic AI, other types of AI, or other scientific topics. Even though I agree with most if not all of Gary's points, it is just tiring to read newsletters containing nothing but ranting.
I can't help what's in the news or the fact that the field is currently obsessed with LLMs. I wrote a much more favorable essay with Davis on Cicero and Diplomacy when it came out.
The current mode of "AI" seems to be based on hope and faith: one hopes or supposes that there is something like a Principle of Cognitive Emergence embedded in the structure of reality, and that if you make a transformer model large enough, this Principle will somehow take over and AGI will just self-assemble, without your having to understand what you're doing or for that matter any of what is happening. It's the cognitive-science equivalent of putting "and then a miracle occurs..." in a mathematical proof, though possibly much more dangerous. This is, in a nutshell, what Chomsky is criticizing.
Taking a charitable perspective, we can say all this is an interesting *experiment*. After all, maybe there *is* a Principle of Cognitive Emergence embedded in the structure of reality, and building bigger LLMs at a superexponential rate is one way of testing for it. This is closely analogous to building bigger and bigger accelerators to test for increasingly obscure types of elementary particles. But so far, there's no sign of anything like sentience or "AGI" -- just superficially dazzling but deeply flawed models, cobbled out of completely inscrutable masses of attentional weights.
So the result of the experiment seems to be negative--and soon, again like the accelerator case, we will run out of resources to build the next bigger experiment.
Continuing the particle analogy, developing ChatGPT is like finding the Higgs boson but not supersymmetry or strings; the narrower framework we already had is somewhat validated, but the radical breakthroughs we hoped for are nowhere at hand. Building another "accelerator" to continue the search would take half the planet. Our triumph, paradoxically, marks a setback.
It really becomes more and more clear that the problem is we don't even know what "intelligence" is. As Chomsky points out, how then are we supposed to build the thing? The emergentist miracle has appeared, sort of, but fallen short.
Meanwhile, almost everything we thought about intelligence has been quietly flipped. We now see that the "easy problems are hard", and vice versa. We abandoned rules for statistics; now it appears time for rules to rule again (maybe). After years and years studying cognitive biases and emphasizing all the ways humans are nasty, hopelessly error-prone and even plain stupid, it turns out even this stupidity contains a sort of screwy genius we have no idea how to replicate.
If all this is the case, it is hard to imagine a better moment for taking stock of the whole Enlightenment project. We say we will rely on reason to order the world, but in truth we cannot figure out even the silliest instinct or piece of common sense.
In a way, it is difficult to overstate how *humorous* the whole situation is. Come! Let us recline, cognac in hand, play some ChatGPT outtakes and bloopers on an infinite loop, and toast to the irrepressible brilliance that, it turns out, peeks out of even the meanest human idiocy.
Chomsky's original contention was, and still remains, that grammaticality is universal. I wrote my own take on ChatGPT as it relates to Indian classical music -- a form that is highly grammatical, yet where stringing together text "stochastically" to generate something intelligible is still hard.
https://www.classicalweekly.org/p/experiments-with-ai-and-indian-classical
I also believe that models like BERT and GPT don't need to tell us why the world is the way it is. They're not meant to be oracles. They're clever engineering -- in reality, an auto-complete system built on human knowledge. That in itself is praiseworthy, but also a problem, because it's too powerful to be used as a general-purpose utility -- as strongly evidenced by people reporting limits in logic, basic counting, mixing reality with fantasy, randomness, etc.
As a computer scientist, I've seen the evolution of generative frameworks (e.g. Ruby on Rails) where boilerplate is abstracted away. This lets programmers stop worrying about low-level primitives and focus on expressing their ideas in a way that builds on the boilerplate. GPT will similarly solve a class of problems, leading to productivity gains, but it is in no way AGI, as it is marketed.
Just subjectively calling cogent criticisms "hit and miss" does not make them so. The simple fact here is that Chomsky wrote this op-ed not in defense of his theories -- which he likely knows to be totally discredited and indefensible -- but to discredit AI, in the hope that doing so would magically make the time he wasted in academic musings seem less useless.
Like the political kind, apolitical conservatism (aversion to change) is clearly a disease of old age. Mix this conservatism with academia's pedantic obsession with tradition and romanticism (employed mainly to inflate the position of human intellect in the scheme of things) and you begin to understand Noam Chomsky's motivations and his inability to grasp the paradigm shift AI tools represent. But there is more here: the very legacy of Noam Chomsky is under threat -- nay, it has been utterly destroyed by AI and specifically ChatGPT -- and this article is his pitiful attempt to shore up a theory that was nothing short of a belief system, with no scientific basis for the assertions it made.
Chomsky's idea that learning language requires an innate grammar -- the universal grammar theory -- was deeply conjectural and flawed to begin with. A more plausible theory, Gerd Gigerenzer's heuristics theory, which does a much better job of defining language and language rules, should have supplanted it a while back. However, AI's ability to learn language in a totally descriptive way, with absolutely no reliance on grammar rules, has put the final nail in Chomsky's universal grammar theory. This is why Chomsky has gone on the offensive and committed the cardinal sin that destroys not only his legacy but his reputation as well. He has put personal interests above the pursuit of knowledge and truth. His attack on AI is pitiful, short-sighted, and regrettable given the goodwill he cultivated as a social justice warrior. As Max Planck once noted, "[s]cience progresses one funeral at a time." Chomsky has clearly outlived his usefulness (if any) and has become a hindrance to the progress of science. His theories, which never really made sense to me anyway, can now finally be laid to rest.
Hopefully, along with these arcane theories, we will bury English grammar as well, which exists solely so that linguists and (non-practical) academics such as Chomsky can continue to pretend that their fields somehow qualify as scientific and relevant in the modern, AI-driven world.
This is a great article. I agree with all of it except the reference at the end to "real AI". I wish you would admit that what you are defining as real is being human – that the fully acceptable subset of rationality to be considered "real" is the experience of our cultures run on our ape hardware. That is the system for which our justice and other moral systems have been adapted.
Correct me if I'm wrong, but the style and content of the op-ed suggest that it was written primarily by Jeffrey Watumull?
Here's a more academically written objection that online people have been pointing to in the last couple of days:
"Modern language models refute Chomsky’s approach to language", Steven Piantadosi. March 2023
https://lingbuzz.net/lingbuzz/007180
As a layperson it's certainly interesting to see the intellectual disagreements unfold.
I long to leap over all these details in the direction of a bottom line.
What are the compelling benefits of AI which justify the creation of what could be yet another substantial risk to the stability of the modern world?
Unless we can arrive at a good answer to that question, what's the point of all the rest of it?
Just as many of the critiques of Chomsky are of a mischaracterization of his views, it seems both sides are essentially talking past each other.
Of course ChatGPT will make errors, and different kinds of errors from those humans make. Humans will never quickly compute the cube root of a large integer to arbitrarily many digits (while even a cheap calculator can) and, although calculation ability in general correlates with "intelligence", we don't count the cheap calculator as intelligent, even though it has a specific capability humans are known to lack. Similarly, we don't criticize a human toddler who has just learned to walk for being unable to run a marathon. Large Language Models are in their absolute infancy: anyone expecting them to leap into the kind of fully analogical, embodied "thinking" humans can do simply through access to endless text corpora and ever more compute cycles is being unreasonable.
Minsky made the key point, long ago, that the brain is a "meat machine," and it's unclear what is to be gained by suggesting that the "software" running on that corporeal platform is poorly mimicked AT PRESENT by LLMs; or, worse, that there is some ghost in the machine. Of course Chomsky is not arguing this, but people who work on LLMs are agnostic about their being a model of the mind -- no more than Deep Blue is a model of how humans play chess.
That said, what a lot of serious people in computational linguistics seem to be reacting to is a kind of biological primitivism or even elitism that smacks of magical thinking. I'm a huge admirer of Chomsky and took a course with him many years ago, but his piece is quite weak by his standards, and I don't see much in it or in Gary's remarks that attempt to pinpoint what about the computational approach is necessarily lacking as another path to "thinking" (whatever that may be). If all Chomsky is saying is that LLMs are intrinsically unable to say much of importance about the mind, he's made that point persuasively for a long while. But that's a bit like saying a jet is bad at rock climbing: obviously true and not really the right question.