32 Comments
Jul 20, 2022 · Liked by Gary Marcus

Fascinating insights!

Association seems to play a large part, in pets as well as in children: "ball" is simply a sound, but if used repeatedly with a round thing you pick up and throw, both kids and dogs 'get' what it "means". With children, the use of similarities and opposites (e.g. in preschool), as in big dog, big tree, big house... seems useful in imparting 'bigness'. Word order, proper grammar, or even the proper word seem to have little to do with picking up language early on. Embodiment plays a vital role, physically grounding everything: up, left, heavy, hot, loud, fast, bright, tasty...

Jul 20, 2022 · Liked by Gary Marcus

Found the idea of him being younger than the paper pretty funny, but Chapman's LinkedIn lists him as being at MIT in 1978. This doesn't rule out him being an extreme prodigy, admittedly.

Jul 22, 2022 · Liked by Gary Marcus

Incredibly put, Professor. As someone who has trodden the path from engineering to compsci to the humanities, your posts have been the voice of reason in a sea of "true AI"-pushing scientists who apparently haven't done a proper reading of the works in the cognitive sciences and related fields.

Jul 22, 2022 · edited Jul 22, 2022 · Liked by Gary Marcus

I wonder why it takes the communities discussing this so long to (re)discover Ludwig Wittgenstein's analysis of language (the later Wittgenstein; most researchers in AI have long been hung up on his early work, which is all about logic in relation to meaning in language and largely came up empty).

According to Wittgenstein, what gives a phrase of language its meaning is how a community of speakers 'defines' correct use. Peter Hacker has explained his position best. So a phrase like "Fat chance!" can today actually mean 'a very slim chance' almost exclusively, because it is almost exclusively used in situations where speaker and listener 'agree' on that interpretation. (Personal observation: because linguistic groups are so large, changes are generally slowed and stability ensues. Smaller groups may be more conservative, something that seems to have been the case for Dutch for a few centuries, because they actively try to protect an identity.) That agreement is not necessarily something formal (though education tries to add that to the mix); it follows from how people react to speech. So, if I say "move to the right" and someone moves forward, there is a conflict about the meaning of 'right'. This mechanism holds from the smallest of phrases to abstract ones like 'justice'.

Language, thus, is closely related to behaviour (shared behaviour, or what Wittgenstein called a 'form of life'). But it is not purely physical behaviour: it also includes the shared acceptance of (emerging) grammatical rules. Speaking (and reacting to speech) is part of behaviour, and so is following (or not following) rules.

Someone saying "that is justice" and someone else disagreeing may lead to a conversation (also behaviour) in which the meaning is built up through the interaction.

Our brains are uniquely capable of performing (a limited amount of) discrete logic. It is to be expected that this capacity is used during language use, and therefore that we have some innate capacity for learning and automatically applying grammar (rules).


Behavior is just the visible part of intelligence. The use of language is a component of behavior. Therefore, attempts to model language (as well as behavior) without modeling intelligence cannot be successful.

author

thanks; good catch. edited that out


The paragraph that Chapman is quoting was a throw-away idea at the end of the paper. The rest of the paper, about reusable phrase-sized chunks of language, either fixed ('polywords') or with natural parameterized substitutions, was quite valuable at the time; it made a big splash at the 1975 TINLAP. My take is that we tie the language we hear/read to mental situations, probably at a very fine grain. Comparable situations prompt the grammatical recombination of the 'language bits' associated with them. There's also a little "jitter" that introduces some variation. (Remember what it's like to read your kids the same story time after time: little bits come out different from what's on the page.) None of us (MIT AI Lab) took Chomsky seriously from a processing point of view, but we went to his lectures and read his papers.
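
A rough sketch of what fixed and parameterized chunks could look like in code (the frames and fillers here are invented for illustration, not taken from the 1975 paper):

```python
import random

# Fixed chunks ('polywords'): multi-word units used as single pieces.
polywords = ["by and large", "fat chance", "out of the blue"]

# Parameterized chunks: a fixed frame with slots that accept substitutions.
# These example frames and fillers are hypothetical.
templates = [
    ("could you hand me the {obj}", {"obj": ["salt", "keys", "report"]}),
    ("that was a {adj} idea", {"adj": ["great", "terrible", "throw-away"]}),
]

def realize(frame, slots):
    """Fill each slot with one of its allowed substitutions."""
    return frame.format(**{name: random.choice(fillers)
                           for name, fillers in slots.items()})

for frame, slots in templates:
    print(realize(frame, slots))
```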


I find it amazing, and somewhat symptomatic, that the current crop of investigators promoting large language models is rediscovering Behaviorism. This is at least the third time. See https://bdtechtalks.com/2021/07/07/ai-reward-is-not-enough-herbert-roitblat/ . The first was famously advocated by B.F. Skinner, the second by Rumelhart and McClelland. Chomsky destroyed Skinner's approach, calling it play-acting at science. Unfortunately, Chomsky's analysis, though correct about Skinner's work, was guilty of many of the same scientific faults. As you point out, there is a long history to this question and we do not need to recapitulate it all here. One of my linguist friends reports that her young son (at the time) asked if he was "being have." A stochastic parrot might be able to produce such an example, and that very example may even exist somewhere in an open-crawl data set, but there is so much more to language than the strings that are produced. Here is more on this topic:

https://bdtechtalks.com/2022/01/27/the-understanding-debate/

But I wonder what the right approach is. It is easy to snipe about how large language models are inadequate, but I fear that the critiques are falling on deaf ears. In the end, Chomsky killed Skinner's pseudo-linguistic program and spawned a large linguistic enterprise. Much good work came out of it, but I don't find Chomsky's core ideas to be much better. I would paraphrase them as "I don't see how it could be learned, so it must be innate." I don't find that any more satisfying than saying that association is sufficient. How can we come up with a research program that convinces folks the behaviorism of large language models is bankrupt as a model of linguistics, while spawning a research program that, so far as we can tell, is not? I have some hints in my book "Algorithms Are Not Enough," but I would not claim it is anything more than a hint of a possible direction. Sorry for the self-promotion.


My daughter majored in linguistics. The amount I learned just from talking with her about her classes made me realize how much there is to know in this space, and how much we take for granted about language and the process of learning it.

I think that language *seems* easy to learn because we have all done it at least once. But pronunciation, grammar, syntax, and colloquial phrasing are all learned over years, starting when the brain has maximum learning potential. Humans spend more time learning a language than on any single topic they study before starting a career. We just don't realize how much time we spend on it because we learn it incrementally.


When you consider that only about 2,000 words are required to cover 80% of adult conversation, it becomes more obvious that there has to be a better way than ever-bigger training data. The language-learning industry leverages these high-frequency words.
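
The coverage figure is easy to sanity-check on any tokenized corpus; here is a minimal sketch (the file name is a placeholder, and the 2,000-word / 80% numbers vary by corpus and tokenizer):

```python
from collections import Counter

# Hypothetical corpus file; naive whitespace tokenization.
tokens = open("corpus.txt", encoding="utf-8").read().lower().split()
counts = Counter(tokens)
total = sum(counts.values())

top_n = 2000
covered = sum(c for _, c in counts.most_common(top_n))
print(f"Top {top_n} word types cover {covered / total:.0%} of running text")
```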


Love the causticity, as always. To be fair, though, the idea of copy/pasting learned meaning-form mappings, with some vague linking rules added, also has a prominent place in modern linguistics: construction grammar.


As a polyglot who learned multiple languages while growing up, I approve this message :P Language acquisition may be simple, but it's never easy, even for human polyglots. It seems that language learning happens in at least two stages: (1) memorizing and assigning meaning to vocabulary, and (2) putting those words together using "linking devices" such as syntax, grammar, sentence-level logic flow, and any number of other things. At stage (1), memorization plays a big part. But as one moves to stage (2), the "linking devices" between these stand-alone words become increasingly important. Otherwise, no human would be capable of understanding anything they had not already memorized. And that is a rather dreadful thought!
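
A toy sketch of why stage (2) matters: with a memorized lexicon plus even one linking device (bare subject-verb-object order, assumed here purely for illustration), novel sentences get meanings without ever having been seen before:

```python
# Stage (1): a memorized lexicon mapping word forms to concepts.
lexicon = {
    "dog": "DOG", "ball": "BALL", "child": "CHILD",   # nouns
    "chases": "CHASE", "throws": "THROW",             # verbs
}

# Stage (2): a single linking device, subject-verb-object word order.
def interpret(sentence):
    """Compose memorized meanings via word order; fails on unknown words."""
    subj, verb, obj = sentence.split()
    return f"{lexicon[verb]}({lexicon[subj]}, {lexicon[obj]})"

# Sentences never memorized as wholes still get interpretations.
print(interpret("child throws ball"))  # THROW(CHILD, BALL)
print(interpret("dog chases child"))   # CHASE(DOG, CHILD)
```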

Dec 25, 2022 · edited Dec 25, 2022

The ice below me is exactly one molecule thick, so don't waste any time on this if it doesn't make sense, but...

Does anyone have any thoughts on how ChatGPT has become so good at expressing itself grammatically? If it doesn't understand anything, how can it write sentences that are (from what I can tell) grammatically perfect? If humans don't fully understand the rules of language, how could we have programmed a computer to apply those rules almost perfectly?

Other systems that use statistical analysis, like Google Translate, are nowhere near this grammatically accurate.
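
One partial answer is that local grammaticality can fall out of next-word statistics alone. Here is a toy bigram sketch of that idea (ChatGPT's transformer conditions on vastly longer context and vastly more data, which is much of why its output is so much more fluent):

```python
import random
from collections import defaultdict, Counter

# A tiny training corpus; real models see trillions of tokens.
corpus = ("the dog chased the ball . the child threw the ball . "
          "the dog ate the snack .").split()

# Count which words follow which (bigram statistics).
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1

def generate(word, n=8):
    """Sample next words in proportion to observed frequency."""
    out = [word]
    for _ in range(n):
        followers = bigrams[out[-1]]
        if not followers:
            break
        out.append(random.choices(list(followers),
                                  weights=list(followers.values()))[0])
    return " ".join(out)

# Locally grammatical sequences emerge from pure statistics.
print(generate("the"))
```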


Convincing


I like to think that the way we say words, and the tones of our voice as they bounce off other words, also have something to do with it, as well as how comedy works.


Language models like GPT-3 are adept at what (sometimes) looks like coherent speech. So much so that many smart people have convinced themselves that these models actually understand what they are saying, or that the concept of language 'understanding' is so nebulous that we can afford to ignore it. As Gary Marcus points out, that is wrong. What is missing from these stochastic language models is any connection with real-world understanding. Semantics, in a word.

Having said that, I don't share Gary Marcus's enthusiasm for the Chomskyan approach, which has been a poor starting point for modelling semantic understanding. As Gary points out, we have made remarkably little progress on that front. Part of the reason has been the successful hit job Chomsky did on the entire behaviourist project back in 1959. So I don't thank Gary Marcus for seeming to revive that particular culture war. Skinner's radical behaviourism was woefully inadequate for the task, but some variant of the associationist program is really the only game in town if you want to understand how anything is learned, including language.


Dear Dr. Marcus, I read this piece with great interest, but I noticed several grammar and punctuation mistakes that made it harder to read. For instance, in the first line you wrote "If only acquiring language was easy some in the AI Twitterverse seemed to think", which I believe was supposed to be "as easy as some in the AI Twitterverse seemed to think". If you have the time to revise such problems, the piece will probably be read more widely.
