Further Trouble in Hinton City

Gary Marcus

Feb 8, 2024

More on storage and the shallow understanding of Generative AI

Read →

22 Comments

Birgitte Rasine

Feb 8, 2024

"LLMs break down on anything that wasn’t in their training data. Because they’re 100% memorization."

If this doesn't elucidate why the bots can't produce new or original creative work, nothing will. Am growing weary of people arguing that "humans create the same way as LLMs." We might need a new vocabulary to better define artistry, creativity, and mastery.

Expand full comment

Reply (2)

David Roberts

Feb 9, 2024

I've taken to describing things as "hand-crafted" vs. "AI-generated." Generation is not the same thing as crafting.

Expand full comment

Reply (1)

Birgitte Rasine

Feb 9, 2024

Not by a long shot.

Expand full comment

Zoltan Foris

Feb 11, 2024Edited

This kind of assessment is mostly based on your intuitions instead of clear definitions. Intuition is powerful, and - in some cases - does not easily allow to embrace scientific truth. Consider the example of the question "what is life". People's intiuitions have, for many decades, resisted the idea that life is nothing more than a set of interrelated chemical reactions. Instead, the vague notion of a life substance, a "vital force" or "vital spark" has been assumed.

The situation is now very similar. In a few decades people will have no difficulty to accept that human creativity is just based on computational processes based on input data, nothing more. And in this aspect it is a mechanical process without a specially human "creativity force" or "creativity spark". This view of mine is called the computational theory of mind.

Expand full comment

Jurgen Gravestein

Feb 8, 2024

Very interesting study, thanks for sharing. That's going to be my morning read.

PS. I love how Chollet writes and thinks about this stuff, very lucid and in way that everybody can understand. (Very similar to your style). His analogy of the LLM as a "program database" is a great mental model to understand what they're doing. I quote: "Prompt engineering is the process of searching through program space to find the program that empirically seems to perform best on your target task. It's no different than trying different keywords when doing a Google search for a piece of software."

Expand full comment

CFB

Feb 8, 2024

Francois Chollet's response is priceless. Thanks for sharing.

Sometimes I wonder if by saying AGI is already here (e.g., Norvig) or that LLMs understand/are intelligent (e.g. Hinton), some who have gained considerable stature in the field are trying to justify having spent their careers developing complex functions with input-output relationships that map to input-output relationships of specific human behaviors, absent making any inroads toward AGI. And so they're moving the goalposts with respect to what counts as AGI or understanding/intelligence.

Expand full comment

Matt Ball

Feb 8, 2024

>loves it when an argument comes together.

Deep cut!

Expand full comment

Devaraj Sandberg

Feb 8, 2024

Great research and insights

Expand full comment

Pedro Gonnet

Feb 8, 2024

I think the most convincing proof that LLMs actually do "store" training data was the recent paper (https://arxiv.org/abs/2311.17035) in which asking the model to repeat a single word forever exposed parts of the training data verbatim.

Funny how folks arguing against memorization kind of missed that?

Expand full comment

Reply (1)

Gary Marcus

Feb 8, 2024

It’s just absurd

Expand full comment

Saty Chary

Feb 8, 2024Edited

Hi Gary! True 'understanding' wouldn't need words, images, sounds, equations, rules or data, it would just need direct experience with the world - which requires a body. LLMs (all AI in fact) in contrast is entirely dependent on them (words, etc) - ie, no symbols, no dice. That disparity is what's evident, over and over. It is nothing but "understanding" the world entirely in terms of human-originated symbols, with zero grounding of any of it. At best it's 'understanding' patterns in input, without regards to underlying meaning.

'One hand' makes sense to entities that themselves have hands, not to disembodied calculations.

More generally, "making sense" has 'sense' in it for a reason. Common sense is sensed, not grokked via a knowledge-base, not inferred solely from abstract reasoning.

Expand full comment

Reply (1)

Birgitte Rasine

Feb 8, 2024

Right, "making sense" for us thinking humans has "sense" in it. But that's a typo as far as the LLMs are concerned... the proper spelling is the way Big Tech thinks of it: "making cents."

😎

Expand full comment

Reply (1)

Saty Chary

Feb 8, 2024

Birgitte, bingo :)

Expand full comment

Zoltan Foris

Feb 9, 2024Edited

Chollet says: "LLMs = 100% memorization. There is no other mechanism at work."

Challenge 1: is there no processing of these memories? If you admit there is some processing, then the statement above is just bombastic, and it should rather read "LLMs rely much more on memorization than humans. Only humans have complex processing." And then discussion would shift to the details of processing.

Challenge 2: Don't humans also rely mostly on memorization? How is the role of memorization in LLMs different from the one in humans? To explore this, we can dig into the role of memorization of humans. This is my thesis: no human ever is capable to think any original thought. Any thought a human thinks is based on memorization or on randomness. Your thoughts originate in the memories you got through your eyes, noses, skin. And your thoughts originate from the thoughts you got from others verbally or in written form. Just like for LLMs. And then both humans and LLMs do process these memories. The question is, what are the differences in this processing. There seems to be no profound difference in the role of memorization.

Expand full comment

Gerben Wierda

Feb 8, 2024

The video game controller was again very funny. Laughing out loud. You're on a roll.

A nice example about small changes (order) having an effect on 'hallucination' is here: https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/

I would say that the understanding thing is a done deal. But apparently, people are easily convinced by how fluent the language is, because that efficient but superficial way is how we humans in a sense have learned to detect quality. Turing was *so* wrong with his Turing test.

Expand full comment

Raj Iyer

Feb 8, 2024

These examples are pretty convincing in making the case that LLM output is what you've called pastiche. Shouldn't this be provable, though? If you take the training dataset of DallE or GPT4, and mask out specific images or text categories, would the resultant model be able to handle prompts related to the masked out training data?

Of course, such tests would require companies to actually expose their input data set, which they conveniently do not do. But I wonder if there are open source models with known training sets which can be used for such studies.

Expand full comment

praxis22

Feb 8, 2024

It's a well known character trait, at least according to Bio's I've read of Geoff Hinton. He likes to annoy people, clearly it's working.

Expand full comment

Johan Brandstedt

Feb 8, 2024

Love the controller example. Two comments:

1) Your assumption of frequency is half way there. It’s frequency X weighting. Altman’s dismissal of the value of quality training data is directly contradicted by GPT3 weighing quality datasets 10x or more. Same with Stable Diffusion and the Laion-aesthetic dataset, and even more so Midjourney and the artist hit list, as well as those screencaps. Which speaks directly to the fair use substantiality factor.

2) the tear drop video game controller is such a good example of what happens where, which seems to baffle so many:

• the UNDERSTANDING that a tear drop is a shape fit for a human hand came from the operator.

• the IDEA to combine those elements came from — and could only have come from — the operator.

• the ability to EXPRESS that idea as an apt prompt came from the operator.

• the ability to VISUALLY COMBINE those ideas in an image came from the AI system.

• the ability to VISUALLY EXPRESS those fundamental ideas came from the training data.

• the

Expand full comment

EuphmanKB

Feb 8, 2024

Nice article. Thank you.

Your analysis sounds very much like the early chaos researchers experienced (like cloud formation leading to learning that faucet drips are chaotic). One conclusion was that predicting future weather events inherently relies on the model having perfect and total knowledge about all of the prior and current events affecting the weather everywhere, which, of course, is impossible in an absolute sense. Today’s weather models use vast amounts of information gleaned worldwide from a multitude of sources, and they’re only good for a relatively short period of time. (There are many other examples showing the same things.)

In LLM AI models, their limitations appear to me to be not enough “knowledge”. It would seem that the learning and training data sets need to be not only much larger and encompassing a broader range of subject matter, but also include a chaotic “emulator” that allows randomness to impact its direction and choices.

One aspect of the early chaos research was that the then computer chips truncated data at 22 or 23 significant digits to the right of the decimal point because they couldn’t physically handle any more. The prediction results were dramatically and randomly impacted by that unintentional truncation. In a perfect world in AI models there would be no truncation and, in theory, the model could use an “infinite” number of significant digits. But, perhaps, none of these things are in play!

These are merely the musings of a guy who is interested in seeing AI work as hoped. Maybe it could solve some of the dysfunction we hear and see every day. Or, make the dreams of Sci Fi travel a reality. Or….???

Expand full comment

Alexander Kurz

Feb 8, 2024

"how sensitive LLMs are to minor perturbations" ... this is not related to LLMs but it is related to how AI can achieve outstanding results in conjunction with evidence of a complete lack of understanding ... in this case the adversarial example are from the game of go: https://goattack.far.ai/game-analysis#contents ... I find it likely that you have seen this, but if not it is quite interesting ... a simple adversarial strategy that is weaker than a beginning human amateur reliably beats the strongest Go AI ... the game at the link above shows how the trick works and makes clear that the AI does not understand (the way humans do) the game of Go.

Expand full comment

Reply (1)

Gary Marcus

Feb 8, 2024

indeed, i wrote an earlier substack (with Go in the title) about it.

Expand full comment

Reply (1)

Alexander Kurz

Feb 8, 2024

Thanks, I am going to check it out.

Expand full comment

Marcus on AI

Further Trouble in Hinton City