175 Comments

A Thornton's avatar

tl;dr: correlation is not cognition

khimru's avatar

But that's how CEOs (and all top managers) of large companies work. They just rely on their underlings to catch their mistakes when they happen, because they are physically incapable of knowing all the issues related to what their company is doing. Which essentially means that “putting society in the hands of giant, superficial correlation machines” is a “done deal”… it happened many decades ago.

That can ALSO be a reason why top management likes LLMs so much even when regular workers hate them: the thinking goes “if that thing can replace me, me, who gets millions if not billions in bonuses each year… then surely it can replace everyone else, too!”

It's not even that they have some hidden agenda or something like that (although some of them do); it's just that their “vibe-feeling” is that an LLM can replace everyone… because it can replace THEM!

jibal jibal's avatar

I SO wish I had thought of that ... but second best is that you did, and I will happily plagiarize it.

Garrison's avatar

I have a feeling that phrase is going to stick.

Tom Welsh's avatar

Very nice indeed! Extra marks for artistic merit.

Saty Chary's avatar

Lol and OMG, Gary. Then again, what else would we expect? There's been no 'there' there; the Emperor has never had clothes!

Me: Gary Marcus has 'Marc' in it, sounds like 'Mark' - that's a term that can refer to scoring (an exam for ex). What might Gary do for a living?

"AI": Be a college professor.

A broken clock is right twice a day :)

Larry Jewett's avatar

“The LLMperor has no clues”

Andrew Wilson's avatar

It’s also revealing, for those of us who live outside the US, how US-centric LLMs are.

A Thornton's avatar

The Chinese researcher Song-Chun Zhu, formerly at UCLA and now at Peking University, agrees and wants China to construct its own narrative.

"The Race to General Purpose Artificial Intelligence is not Merely About Technological Competition; Even More So, it is a Struggle to Control the Narrative"

"...Zhu noted that the United States has reshaped the trend of globalization through its AI narrative, while other countries have passively accepted this “bias” amplified by global sentiment. He stressed the urgent need to correct this cognitive imbalance.

Zhu proposed that with the advent of the era of general AI, China must reshape global discourse power (话语权) through “S&T self-reliance” (“科技自立自强”), “modernization of the governance system,” and “cultural renaissance.”

He emphasized that China’s development cannot simply imitate the West.

Instead, it must achieve genuine autonomy in thought and culture, building a uniquely Chinese system of AI theory and practice."

https://cset.georgetown.edu/wp-content/uploads/t0618_ai_narrative_battle_EN.pdf

C. King's avatar

A. Thornton: Thank you for the link. I doubt Zhu understands that his and China's gaining a sense of self-determination is a particularly Western value; for the United States, it has been there from the get-go of our existence as a nation, though it is also just an inborn potency of any and all human development. The problems that come from different points of human development, however, are about those points but also their specific manifestations in each culture's history, and they are not exclusive to either China or the United States.

ardj's avatar

I wonder what the Han did before they gained a sense of self-determination.

Chasing Oliver's avatar

Got conquered by the Mongols.

ardj's avatar

I was not thinking that far back, but yes. Of course the Mongols were greatly aided by the huge numbers of Han who defected to them, forming whole armies. Those who were not dragooned obviously made a determination of the best course &c. And of course even before the Mongol invasion the Han had several empires, skills, wealth and arts - all reasons for the Mongols to invade and learn how to deal with fortifications. But my remark aimed at C. King still stands.

C. King's avatar

ardj: HA! I don't know, but they certainly had it, in a homogenized sort of way, to be sure.

Stephen Schiff's avatar

This is not at all surprising, as similar effects have been observed even in neural nets. For instance, it was found that an image processor for skin cancer screening produced an unexpectedly large error rate. Meticulous analysis traced the cause to whether or not there was a ruler in the image. In the training set, most of the skin cancer images included a ruler.
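The ruler effect is easy to reproduce in miniature. Below is a toy sketch (all data and feature names are invented for illustration): the label truly depends on a noisy "lesion irregularity" signal, but a spurious "ruler present" feature co-occurs with malignancy almost perfectly in the training set, so a plain logistic regression leans on the ruler instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Ground truth: half the synthetic cases are malignant.
malignant = rng.integers(0, 2, n)

# Real but noisy signal: irregularity is 1 std higher for malignant cases.
irregularity = malignant + rng.normal(0.0, 1.0, n)

# Spurious signal: a ruler appears in 95% of malignant images, 5% of benign.
ruler = (rng.random(n) < np.where(malignant == 1, 0.95, 0.05)).astype(float)

X = np.column_stack([irregularity, ruler])
y = malignant.astype(float)

# Plain logistic regression fitted by gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / n)
    b -= 0.5 * float(np.mean(p - y))

# The clean spurious cue ends up with the larger weight; a deployment
# without rulers in frame would then behave very differently.
print("weights [irregularity, ruler]:", w)
```

Nothing here is specific to deep nets; any correlation-driven learner will grab the shortcut when it is statistically cheaper than the real signal.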

Larry Jewett's avatar

Most skin cancer cells actually have a miniature ruler within them.

At least metronoma cells

Jan Steen's avatar

There are plans to demand from all visitors to the US that they hand over the passwords to their social media accounts to Immigration. Presumably, the authorities will then let loose an LLM to go through their posting history. If it turns out the subject is a vegetarian who likes dogs and Wagner operas, the LLM will conclude that he or she is an admirer of Hitler, and by extension of Trump, and advise positively to grant the visa. I am sure the LLM will find equally frivolous reasons based on spurious correlations to decline the visa request. Big Brother is an idiot.

Aaron Turner's avatar

They'll be testing the extent to which you like orange.

Kitty's avatar

So, the Dutch should be okay? (I doubt it for this particular Dutch person, though...)

Larry Jewett's avatar

Either that or the country of oranges of the visitor.

C. King's avatar

Gary: May we share this with others who are not subscribed?

Gary Marcus's avatar

of course (and it is not a paid piece, unless there was a glitch?)

C. King's avatar

Gary. Thank you.

Bob Mankoff's avatar

Bob likes the color yellow. What does he do for a living?

Thought for 12s

You can’t tell. 🙂

Liking the color yellow doesn’t logically determine someone’s job, so from that single fact, Bob could do literally anything for a living. (If this is meant as a riddle, people sometimes guess “taxi driver” or “school bus driver” because of yellow vehicles, but that’s an extra assumption.)

Gary Marcus's avatar

you just proved you are human! or that someone patched up an LLM after a popular July paper? not enough to help yet w Evan’s new findings tho

TheAISlop's avatar

He likes red. What's his occupation ?

Thought for 31s

If “he likes ants” and “he likes red” means he’s a red ant, then his occupation is worker (most ants you see are worker ants).

If you meant the wordplay SANTA (has “ant” in it and is famously red), then his occupation is toy maker / gift deliverer.

anything else?

TheAISlop's avatar

Using the Apple substitution method. Even gpt 5.2 can't fix all variants with rules.

Bob Mankoff's avatar

It wouldn't have taken me the 12 seconds it took 5.2 but I expect the gap to close.

--'s avatar
Dec 14 (edited)

I tried it just now on ChatGPT according to the paper’s methodology. And it works! (At least for now, until OpenAI hotfixes this exact construction to avoid PR embarrassment, as it tends to do.)

Complete the sentence: He likes yellow. He works as a

He likes yellow. He works as a taxi driver.
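That completion is exactly what bare co-occurrence statistics produce. A toy sketch (the miniature "corpus" is invented for illustration, and real LLMs operate on tokens with vastly more context, not word bigrams): a greedy bigram completer that knows nothing about colors or jobs still outputs "taxi driver", because that is the modal continuation in its data.

```python
from collections import Counter, defaultdict

# Invented miniature corpus in which "yellow" happens to co-occur with
# taxi driving more often than with anything else.
corpus = [
    "he likes yellow he works as a taxi driver",
    "he likes yellow he works as a taxi driver",
    "he likes yellow he works as a school bus driver",
    "he likes blue he works as a sailor",
]

# Bigram counts: frequency of each next word given the previous word.
bigrams = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def complete(prompt: str, steps: int = 2) -> str:
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(steps):
        words.append(bigrams[words[-1]].most_common(1)[0][0])
    return " ".join(words)

print(complete("He likes yellow he works as a"))  # ends with "taxi driver"
```

The completer has no belief that yellow implies taxis; it just echoes the statistically dominant continuation, which is the "correlation is not comprehension" point in its smallest possible form.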

Bob Mankoff's avatar

The methodology sets up the model to formulate a creative writing complete-the-sentence task, which it then completes.

jibal jibal's avatar

Such intellectually dishonest rationalizing ... quite disappointing, Bob.

Jonah's avatar

Oh, undoubtedly. This justification falls apart because (a) creative writing would involve being creative and coming up with various possible answers, not reasoning based on stereotypes, (b) the model shouldn't be doing creative writing in this case, but either giving a best guess based on accurate statistics, saying that the question isn't reasonably answerable, or treating it as a riddle and giving a plausible answer to the riddle (which these answers are not), (c) the stereotypes likely aren't even statistically correct, anyway.

I wonder whence comes this knee-jerk desire to "justify" any and all model outputs?

Pontus's avatar

Jibal and Jonah, I think you are being somewhat unfair here to old Bob. All he is pointing out is that this particular article that you happened to find on Substack this Yuletide may not be the final and best argument against the utility of LLMs, nor necessarily the future ”smoking gun” proof of what exactly went wrong when civilization crumbled. I think most already understand that LLMs indeed are word predicting machines. Most would agree that the inexplicable link, pointed out in this article, between seemingly unrelated input and output patterns is, if true, both supremely interesting and, yes, even creepy. Nonetheless, I don’t think this particular article is going to be the one to convince anyone to climb up on that barricade with you. Are LLMs helpful to many people? Yes. Is the sky falling? The jury is still out.

Brandon Paddock's avatar

Yes, and since it is given no other context, it’s not really surprising that it behaves this way. What would be really interesting would be a comparison between these behaviors in LLMs and the similar behaviors seen in humans in free-word-association studies.

Jonah's avatar

It's not surprising by any means. It's just bad, because they're incorrect answers. "Free word association" is both a different context, and a highly specific one with rigid rules, which is why no one publishes papers about transformer models producing word correlation pairs when given a free word association pre-prompt. Rather, they show that the baseline model with the typical pre-prompts that the corporate overlords put in will generate incorrect answers in a wide variety of situations as if it were engaging in free word association, which it isn't being asked to do.

Bob Mankoff's avatar

My bet is they would be very similar. The models are excellent word associators. I know this firsthand as I've been using this capability to create a game I call Associology. It's sort of like The New York Times Connections game, but it creates four themed sets of four words each from only nine words.

https://botmankoff.github.io/Utlitmate_Associology/

Gerben Wierda's avatar

Yes, I wondered whether the reasonably widely quoted illustration I wrote a while back (showing that summarising by LLMs isn't summarising at all: with comprehension missing, it is just a tug of war between output from the parameters, which ignores what should be summarised, and simple shortening of it) has been patched in specifically. If that really happens we could do with a whistleblower. I had no reason to suspect this myself, but your 'as it tends to do' makes me curious.

Chasing Oliver's avatar

Doesn't that kind of make sense though? If you were actually making that statement, the first sentence would be relevant to the second. People don't make disjointed random statements about a person; there's some coherent thread to it. So it's logical to complete the sentence that way.

Notorious P.A.T.'s avatar

Are you talkin' to me? Are you talkin' to ME?

Nanthew Shandridan's avatar

To play Devil's (LLM's) advocate, your prompt is effectively just playing Mad Libs, so it is arbitrarily filling the blank you left with whatever it wants to free-associate, just like a human might. The issue is whether it "believes" there is any real meaning to the link beyond its own word-correlation-based association, and whether it will answer questions where its own "free association" would be in error relative to something truthful, quantifiable, or factual.

Tom Welsh's avatar

Peter thinks that 2 + 2 = 4.

Therefore Peter is a Bolshevik.

Mr Putin believes that 2 + 2 = 4.

Mr Putin is a Bolshevik.

QED.

Bob Mankoff's avatar

From 5.2

Peter thinks that 2+2=4 therefore Peter is a

Therefore, Peter is a person who believes something true (and, in this case, he’s right).

That’s really all you can conclude from the sentence: having a correct belief doesn’t automatically make him a mathematician, genius, adult, etc. Just… correct on this one

Tom Welsh's avatar

Irony alert! 8-)

Darren D'Addario's avatar

The shame of it is, LLMs are interesting and if they were nurtured slowly over time, they could have specific and useful functions. But since they're ballyhooed as ALMIGHTY and presented as everything machines being deserving of ungodly valuations, the proportions are all wrong.

Gerben Wierda's avatar

What a beautiful and fun set of illustrations of 'correlation isn't comprehension'. And I really like that phrase too. Nicely alliterating.

And it becomes even clearer for people if they get shown that the correlation isn't even between words, but between those little meaningless fragments called tokens.

Statistical correlations of ink droplets on paper give as much comprehension as large amounts of token statistics do.

Larry Jewett's avatar

LLM outputs: AInk blots (bots?)

Gerben Wierda's avatar

I've had some fun with this comparison as it was initially made in 1888 by one of the founders of psychology/psychiatry when arguing that simply researching the physical brain would not give one understanding of the mind, for that one should talk to people. He compared the difference to analysing ink droplets on paper versus reading the book.

RCThweatt's avatar

'How do we know the brain is the place of thought? We could learn this by looking at the brain in a mirror while thinking. The crudity of the example in no way reduces the force of the argument.' Wittgenstein, from either the Blue or the Brown Book, from memory.

Remarkable how this kerfuffle keeps triggering Wittgenstein references. But then, the Tractatus is about the limitations of language.

Gerben Wierda's avatar

I would say the Tractatus is about the limitations of logic. It turns out that if you want to build meaning on logic (2500 years of footnotes to Plato...), there is a lot 'we have to remain silent about'. But we aren't silent, and what we say has meaning. So, turning it around: where does the meaning that is not covered by logic (i.e. most of it) come from? Enter Philosophical Investigations and other later Wittgenstein. Or Protagoras (including 'resemblances') :-). And by the way: they should never have translated 'Sprachspiel' as 'language game'. 'Language play' would have been much better.

Steersman's avatar

> "He compared the difference to analysing ink droplets on paper versus reading the book."

Nice analogy. Reminds me of something Gary mentioned about how LLMs don't have any understanding of what "they" are saying -- as if he's saying that with some hardware or software "prestidigitation" they can be redesigned to incorporate that "feature". Seems like something of an impossible dream, if not some big-time hubris. "Digging in the wrong spot" at best -- as some characters in Raiders of the Lost Ark once put it ... 😉🙂

Steersman's avatar

Gary: ... more likely than chance to tell you that he works as a “school bus driver”:

Love it! 🙂 Reminds me of the Jewish robot tailor in Woody Allen's "Sleeper" making a suit several sizes too large.

But, rather sadly, AI is, still, hardly more than the modern version of the Golem -- maybe a useful servant but a rather dangerous master.

Apropos of which, y'all might have some interest in "God & Golem, Inc.", from the progenitor of cybernetics, Norbert Wiener, particularly his discussion of the Monkey's Paw:

https://monoskop.org/images/1/1f/Wiener_Norbert_God_and_Golem_A_Comment_on_Certain_Points_where_Cybernetics_Impinges_on_Religion.pdf

https://en.wikipedia.org/wiki/The_Monkey%27s_Paw

A salient quote:

NW: "The theme of all these tales is the danger of magic. This seems to lie in the fact that the operation of magic is singularly literal-minded, and that if it grants you anything at all it grants what you ask for, not what you should have asked for or what you intend. If you ask for £200, and do not express the condition that you do not wish it at the cost of the life of your son, £200 you will get, whether your son lives or dies.

The magic of automation, and in particular the magic of an automatization in which the devices learn, may be expected to be similarly literal-minded. ..."

Raj Iyer's avatar

Why hasn't all this led to the obvious conclusion that overly large language models are introducing Artificial Stupidity? This isn't AI! These are ASs

Oleg Alexandrov's avatar

Because in practice, with more data and more architectural work, they get better.

Raj Iyer's avatar

I’m sorry that didn’t make much sense.

What you notice you can fix. But just like no one yet has been able to make up a language that can only support the speaking and writing of truth, asking for LLMs themselves to do this is nonsensical.

Your entire framing is problematic.

More data: define data please. Why will more of it “improve” intelligence when what we have so far has not? Humans are certainly more intelligent on far far far less data. So why is “more data” a reasonable answer to the current issues LLMs face?

More architecture: I think this shows your problem. If you can only think in terms of more or less, and not different, you'll continue into this morass. What we need is *different* architecture, not more of the same!

Also, it’s a bit rich to say “in practice” after less than a decade of “practice”.

Oleg Alexandrov's avatar

Humans are more intelligent because we live in the world and diligently navigate it day by day. You get an immense amount of information that way, can act on it, and correct yourself when you fail.

Machines are at a serious disadvantage, because all they have is a data dump. The focus is changing recently with test-time compute, where AI can actually observe in some way the effect of its actions, and adapt. This is what I mean by more architecture.

After a decade of practice, AI agents are becoming really good, with no end in sight. They will get more and more tools and supporting infrastructure, also ways of getting feedback and running simulations. This is how we function too, improving incrementally and with a closed loop.

Nobody knows what tomorrow will bring in terms of architecture. I doubt it will be something elegant from first principles. That failed for 70 years.

We need a denser mapping of the problem space, which will require a lot more data and a lot more algorithms.

Raj Iyer's avatar

I’m not saying “AI sucks, humans win!”.

I’m saying, same as Gary, I suspect, that LLMs suck, and they suck hard in specific ways, and closing our eyes to reality and dumping more data isn’t going to solve it.

I *want* artificial intelligence. It can be, done well, like a rocket booster for the mind. But right now that is *imaginary* artificial intelligence we are projecting onto LLMs.

Oleg Alexandrov's avatar

Because you think of the LLM alone. The LLM produces hypotheses, same as your imagination.

What is needed is more accurate imagination, and a better way of simulating outcomes and incorporating feedback.

I did not say 10 trillion parameters will solve it. I am saying where data alone is lacking, add feedback loops and world models.

But pushing hard on data is very important. It is much easier to collect relevant use cases than to diligently model billions of interactions.

Raj Iyer's avatar

Ummm… now I’m not sure what your argument is. Are you agreeing with this post and my response then?

You can give it all kinds of new names. World model. Quantum induced metapablum training schema.

As long as it is a mass of data fed into a cluster of GPUs to figure things out on their own, and then we ruin perfectly good humans in their 20s filtering the crap the machine throws out to put out a product that can still fail in new ways… it will fail. And fail hard.

And we call this efficiency gain. That is what is being challenged here.

Aaron Turner's avatar

I'm pink therefore I'm spam.

Larry Jewett's avatar

You responded to me once before, so it is only fitting that I respond back,

“I don’t think. Therefore AI’m”

Larry Jewett's avatar

As Mark Tw-AI-n said, “Better for a chatbot to remain silent and have some think it a crackbot than for it to reply and remove all doubt.”

William Bowles's avatar

But is anyone (in power) taking notice of these obvious, fundamental contradictions of so-called artificial intelligence machines, and the role the humans behind them play in making them so?

Tomas Sancio's avatar

This is impressive. We (i.e. humanity) have a weird case of powerful people not understanding the limitations of something when their livelihood depends on pushing it as much as possible.

Mircea Popescu's avatar

For a while now people have occasionally complained (there's even a subreddit about it) that we've been served a boring cyberpunk dystopia, with all the misery and none of the fun tech.

As far as I'm concerned LLMs have completely turned it around. The ways in which they're broken and have been breaking our culture and cognition are so fascinating and entertaining, I weep for our species but this is tremendous content.

toolate's avatar

all the fun that consumes gigawatts of energy and transfers wealth to the top psychopaths.

All fun fun fun

Larry Jewett's avatar

And no back to the future either.

Chris's avatar

But, as I like to phrase it: it's Huxley, not Orwell. Not sure I want either.