Further Trouble in Hinton City
More on storage and the shallow understanding of Generative AI
A couple of days ago I gave a whole bunch of reasons to doubt Hinton’s confident but problematic claims that GenAI systems don’t store things and do understand them.
More evidence against his claims just keeps piling up.
One piece of evidence comes from a new study that I plan to discuss in a future essay, which examines how sensitive LLMs are to minor perturbations, like rearranging the order of answers in multiple choice, the insertion of special characters, or changes to the answer format. As it turns out, minor changes make a noticeable difference, enough to rearrange “leaderboard” rankings. If a system really had a deep understanding, trivial perturbations wouldn’t have such effects.
Suffice to say for now, it’s not a win for LLMs and understanding.
In the meantime, in the picture is worth a 1,000 words department, here’s a very clever example, from Substack author, which speaks strongly yet concisely against everything Hinton was trying to say:
The example highlights three things:
1. GenAI systems do effectively store things. (In this case, canonical video game controller images.)
2. They are attracted1 𝘵𝘰 those effectively-stored2 representations, particularly (I would presume) frequent ones. They (often) don’t come up with a novel independent design that would meet the requirements. (The “italian plumbers” examples I have discussed, in which GenAI systems tend to gravitate towards Nintendo’s trademarked Mario character, illustrate the same thing.)
3. They fail to “understand” simple concepts, such as “design to used with only one hand.” This should remind you of my older essays Horse Rides Astronaut and Race, statistics, and the persistent cognitive limitations of DALL-E. Noncanonical situations continue to remain difficult for generative AI systems, precisely because frequent statistics are a mediocre substitute for deep understanding.
Gary Marcus loves it when an argument comes together.
Attracted to does not of course mean that no other output is possible. Someone on X was able to get Midjourney to produce an image of a truly one-handed controller with a different prompt. But that prompt (itself machine-generated from a 50 word prompt)) was vastly more directive, and far more complex, involving 245 words, the first quarter or so of which I include here “A cutting-edge video game controller designed for one-handed use, shaped like a teardrop for ergonomic comfort and grip stability. The base of the teardrop fits snugly in the palm, while the narrower top allows easy access to controls with the thumb and fingers…” As Devine Morse, a philosophy PhD student, put it, “I could also get an unusually obedient toddler to write Shakespeare if I gave them detailed enough instructions”. There are other ways to break free from the attractors (“let’s think this through step by step” is rumored to be having some success) but what you get by default does often seem to that which has been highly trained.
By effectively stored I mean stored in some way such that something can be easily reconstructed, such as a JPEG file which is not a bitmap but still obviously on any reasonable interpreting a form of storage that can be readily be reconstructed to something nearly identical to the original.