Three baffling claims about AI and machine learning in four days, statistical errors in top journals, and claims from Yann LeCun that you should not believe.

The trouble with too much benefit of the doubt

Oct 16, 2022

Something is in the air. It never surprises me when The New York Times says that a revolution is coming, and the promised revolution doesn’t materialize. That’s been happening for a long time. Decades, in fact.

Consider, for example, what John Markoff said in 2011 about IBM Watson.

“For I.B.M., the showdown was not merely a well-publicized stunt and a $1 million prize, but proof that the company has taken a big step toward a world in which intelligent machines will understand and respond to humans, and perhaps inevitably, replace some of them.”

That hasn’t come to pass. Eleven years later, comprehension is still lacking (see any of my recent essays here) and very few if any jobs have actually been replaced by AI. Every truck I know is still driven by a person (except in some limited-use test pilots), and no radiologists have yet been replaced. Watson itself was recently sold for parts.

Then again, the Times first said neural networks were on the verge of solving AI in 1958; forecasting AI just doesn’t seem to be the Times’ strong point. Fine.

But these last few days I have seen a whole bunch of similarly overenthused claims from serious researchers who ought to know better.

Example one, the least objectionable of the three, but nonetheless a sign of the overgenerous times, came from Stanford economist Erik Brynjolfsson:

Erik Brynjolfsson @erikbryn

@grsimari @GaryMarcus Or another way is that I have seen many different types of narrow intelligences, some of which are superhuman in their specific domains. Human intelligence is (probably) "broader" than all the others currently, but still only a very narrow slice of the space of intelligences.

Brynjolfsson is totally correct that human intelligence is a very narrow slice of the space of possible intelligences (that’s a point Chomsky has been making about human language since before I was born). Undoubtedly cleverer intelligences than ours are possible and may yet materialize.

But—hold up—what the heck is the hedging word probably doing here, even in parentheses?

Any normal 5-year-old can hold a conversation about just about anything in a native language that they acquired more or less from scratch just a couple of years earlier, climb an unfamiliar jungle gym, follow the plot of a new cartoon, or acquire the rules of a new card game verbally, without tens of thousands of trials, and so on, pretty much endlessly. Human children are constantly learning new things, often from tiny amounts of data. There is literally nothing like that in the AI world.

Hedging with probably makes it sound like we think there is a viable competitor out there for human general intelligence in the AI world. There isn’t. It would be like me saying Serena Williams could probably beat me in tennis.

Yann LeCun meanwhile has been issuing a series of puzzling tweets claiming that convnets, which he invented, (“or whatever”), can solve pretty much everything, which isn’t true and ostensibly contradicts what he himself told ZDNet a couple weeks ago. But wait, it gets worse. LeCun went on to write the following, which really left me scratching my head:

Yann LeCun @ylecun

@erikbryn The problems to solve to make progress in AI are *exactly* the same whether you want to augment human workers or replace them.

Well, no. Augmentation is way easier, because you don’t have to solve the whole job. A calculator augments an accountant; it doesn’t figure out what is deductible or where there might be a loophole in a tax code. We know how to build machines that do math (augmentation); we don’t know how to build machines that can read tax codes (replacement).

Or consider radiology:

Gary Marcus @GaryMarcus

@warren_craddock @ylecun @erikbryn @DrHughHarvey @DrLaurenOR @maxaltl @zakkohane the job of a radiologist includes not just reading images (which convnets are suited to but (in some cases) reasoning about a patient’s history and reading unstructured text, two problems for which convnets are less suited. cc @AMPimaging

People working in medical AI overwhelmingly weighed in on my side of the argument:

Anand Prabhakar, MD, MBA @AMPimaging

@GaryMarcus @warren_craddock @ylecun @erikbryn @DrHughHarvey @DrLaurenOR @maxaltl @zakkohane Correct! We read clinical notes, look at lab values, talk to the referring physician, and then look at the imaging in the context of this additional information

Andreas K. Maier @maier_ak

@GaryMarcus @ylecun @AMPimaging @warren_craddock @erikbryn @DrHughHarvey @DrLaurenOR @maxaltl @zakkohane After some thoughts, I think @ylecun statement is not true. The reason we built assistance systems in medicine is because we cannot solve the diagnostic task (yet). We are only good at very simple high-throughput tasks that are really easy for radiologists.

Lauren Oakden-Rayner 🏳️‍⚧️ @DrLaurenOR

My inbox is full of people debating whether #AI can replace radiologists or "only" look at scans... And it can't even look at scans well 😅 We haven't even got replacement for small simple tasks yet folks. Vigilance and oversight are still the name of the game.

Gary Marcus @GaryMarcus

@ylecun @erikbryn no. having a machine look at a radiology scan is very different from replacing the entire job of a radiologist as @DrHughHarvey or @DrLaurenOR or @maxaltl or @zakkohane could explain. and L2 and L5 are different problems with different requirements see eg @warren_craddock etc

Alexandre Cadrin-Chênevert @alexandrecadrin

It's been more than five years since I started applying deep learning in medical imaging. My own public prediction: One day, AI will fully displace radiologists. But, before, we will sustainably colonize Mars.

Gary Marcus @GaryMarcus

I hereby bet publicly that AI will not fully displace radiologists (as opposed to merely augmenting them) before the year 2033. That will be sixteen years after Hinton estimated five. @MatthewJBar cc @kevin2kelly https://t.co/UzvphBPr4i

Just because AI can solve some aspects of radiology doesn’t mean by any stretch of the imagination that it can solve all aspects; Jeopardy isn’t oncology, and scanning an image is not reading clinical notes. There is no evidence whatsoever that what has gotten us, sort of, into the game of reading scans is going to bring us all the way into the promised land of a radiologist in an app any time soon. As Matthew Fenech, Co-founder and Chief Medical Officer @una_health, put it this morning, “to argue for radiologist replacement in anything less than the medium term is to fundamentally misunderstand their role.”

But these are just off-the-cuff tweets. Perhaps we can forgive their hasty generosity. I was even more astonished by a massive statistical mistake in deep learning’s favor in an article in one of the Nature journals, on the neuroscience of language.

The article is by (inter alia) some MetaAI researchers:

Charlotte Caucheteux @c_caucheteux

🤖🧠 Our latest paper is now out: nature.com/articles/s4159… “Deep language algorithms predict semantic comprehension from brain activity”, by @c_caucheteux, @agramfort & @JeanRemiKing The summary thread below below👇 1/n

Ostensibly the result is great news for deep learning fans, revealing correlations between deep learning and the human brain. The lead author claimed on Twitter in the same thread that there were “direct [emphasis added] links” between the “inner workings” of GPT-2 and the human brain:

Charlotte Caucheteux @c_caucheteux

No doubt that modern algorithms have a long way to go before understanding language like we do. Still, the direct links found between their inner workings and those of the human brain provide an exciting platform to understand (and improve!) these two systems. 8/n

But the fine print matters; what we see is merely a correlation, and the correlation that is observed is decent but hardly decisive, R = 0.50.

That’s enough to get you published, but it also means there’s a lot that you don’t know. When two variables are correlated like that, it doesn’t mean A causes B (or vice versa); it doesn’t even mean they are in lockstep. It is akin to the magnitude of the correlation between height and weight; if I know your height and nothing else about you, I can make a slightly educated guess about your weight. I might be close, but I could also be off; there is certainly no guarantee.
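To put a rough number on "slightly educated", here is a minimal Python sketch. It is not from the paper; the function name is hypothetical, and it assumes a simple linear relationship in which a "blind guess" means guessing the average. Under those assumptions, the spread of your prediction errors shrinks by a factor of the square root of (1 - R²) once you know the correlated variable.

```python
import math


def remaining_uncertainty(r):
    """Residual standard deviation, as a fraction of the original spread,
    after a best-fit linear prediction from a variable correlated at r.

    Hypothetical helper for illustration; assumes a linear relationship.
    """
    return math.sqrt(1.0 - r ** 2)


for r in (0.3, 0.5, 0.9):
    fraction = remaining_uncertainty(r)
    print(f"R = {r}: typical error is still {fraction:.0%} of a blind guess")
```

At R = 0.5 the typical error is still close to nine-tenths of what it would be with no information at all, which is why a guess at that level of correlation might be close but carries no guarantee.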

The paper itself addresses this, but when it does, it makes a gaping mistake, erring, again, on the side of attributing too much to deep learning. Here is what they say (people who know their stats well might spot the error right away):

Uh oh. As Stats 101 teaches us, the amount of variability explained is not R but rather R squared. So if you have a correlation of R = .5, you actually “explain” (really, just “predict”) only 25% of the variance, which means fully three-quarters (not half) of the variability remains unexplained. That’s a huge difference. (In a DM, I pointed out the error to the senior author, King, and he concurred, promising he would contact the journal to make a correction.)
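For anyone who wants to check the R versus R squared point numerically, here is a small Python sketch. It uses synthetic data, not the paper's data; the function names, sample size, and random seed are all assumptions chosen for illustration. It draws pairs of values with a correlation of about 0.5, fits a straight line, and measures how much of the variance that line actually accounts for.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def variance_explained(xs, ys):
    """Fraction of the variance in ys captured by the least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - residual / total


def simulate(target_r=0.5, n=20_000, seed=0):
    """Draw n synthetic (x, y) pairs whose population correlation is target_r."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_scale = math.sqrt(1.0 - target_r ** 2)
    ys = [target_r * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


xs, ys = simulate()
r = pearson_r(xs, ys)
explained = variance_explained(xs, ys)
print(f"R = {r:.2f}")
print(f"variance explained = {explained:.2f}")
print(f"variance unexplained = {1 - explained:.2f}")
```

For a straight-line fit, the fraction of variance explained equals the square of the correlation, so a sample correlation near 0.5 yields a figure near one quarter, not one half.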

Predicting a mere 25% of the variance means license to speculate, but it certainly doesn’t mean you have nailed the answer. In the end, all we really have is evidence that something that matters to GPT also matters to the brain (for example frequency and complexity), but we are a very long way from saying that whatever is weakly correlated is actually functioning in the same way in both. It’s way too much charity to deep learning to claim that there is any kind of direct link.

Now here’s the thing. Stuff happens. Scientists are fallible, and good on the senior author for writing to the journal for correction. But the fact that this slipped through the peer review process at a Nature journal astounds me. What it says to me is that people liked the story, and didn’t read very carefully. (And, hello, reading carefully is the number one job of a peer reviewer. #youhadonejob)

When that happens, when reviewers like the story but don’t read critically, it says that they are voting with their hearts, and not their brains.

As I asked on Twitter, “If Optimus could solve all the motor control problems Tesla aims to solve, and we had to now install a “general intelligence” in it, in order to make it a safe and useful domestic humanoid robot, what would we install?”

The answer is that at present we have no viable option; AlphaStar and LLMs would surely be unsafe and inadequate; and we don’t yet have anything markedly better. We can’t really build humanoid domestic robots without general intelligence, and we don’t yet have any serious candidates. No probably about it.

Tim James

Oct 18, 2022

It's fascinating that the authors conclude that GPT-2 "may be our best model of language representations in the brain" when really what they have is a 25% correlation between one layer of GPT-2 and one aspect of the data. If they mean "our best (simulated) model," then I guess they might have a point, although it's hard to know what being the best AI model of any cognitive process is worth at this point. If they mean "best model (period)," that's quite the claim.

Gerben Wierda

Another good post to point out problems with the AI reporting out there.

You make it your business to point out errors (generally especially with respect to unsupported claims). But such 'facts' do not convince people. It is the other way around (as psychological research has shown): convictions influence what we accept (or even notice) as 'facts' much more than the other way around. AI hype is just as many other human convictions — especially extreme ones — rather fact- and logic-resistant.

What AI-hype is thus illustrating is — ironically enough — not so much the power of digital AI, but the weakness of humans.

Our convictions stem from reinforcement, indeed a bit like ML. For us it is about what we hear/experience often or hear from a close contact. That is not so different from the 'learning' of ML (unsupervised/supervised). That analogy leads ML/AI-believers to assume that it must be possible to get something that has the same 'power' that we do. Symbolic AI's hype was likewise built on an assumption/conviction, namely that intelligence was deep down based on logic and facts (a conviction that resulted from "2500 years of footnotes to Plato"). At some point, the lack of progress will break that assumption. You're just recognising it earlier than most and that is not a nice situation to be in. Ignorance is ...

Marcus on AI
