55 Comments

And yet. I see people increasingly finding that LLMs and other genAI are useful in ways that don't require reasoning. Summarize this article; advise me on how to make its tone more cheerful; give me ideas for a new product line; teach me the basics of Python; combine my plans with the images in these paintings so I can think differently about the building I'm designing. In these situations (all recently encountered by me, i.e., real uses, not hypotheticals), people are getting a lot out of supercharged pattern-matching. They aren't asking for impeccable reasoning ability, and so they aren't being disappointed.

These are "knowledge-work" settings in which the occasional error is not fatal. So, no quarrel with the larger point that we shouldn't ignore the absence of real reasoning in these systems. But it is also important to recognize that they're being found useful "as is." Which complicates the project of explaining that they shouldn't be given the keys to everything society needs done.


Well-stated points. I am frustrated that there is not a richer dialogue about where LLMs are useful and where they are not, and maybe even more importantly, how to evaluate failure modes. Many personal assistant-type use cases, with an expert user, are very low risk. But put a novice user with an LLM generating output that they do not understand... Look out.


If you haven't read him, I recommend Zvi's newsletter (https://open.substack.com/pub/thezvi): a lot of "here's where LLMs bring value and here's where they don't."


"Summarize this article (that I wrote, so I can add a summary)" is very different from "Summarize this article (that I don't feel like reading)" are two tasks with extremely different likelihood of success -- I'd encourage you to disambiguate which one you're referring to when discussing these things. :-)

(The first one is verifiable, the second one is not.)


Exactly. LLMs are great for prototyping and brainstorming, and not great for operational, high-precision tasks. People who haven't figured this out yet are just lazy.


Yes, if only this was what was advertised, as opposed to the world-changing existential threat that requires trillions of dollars and burning more fossil fuels. I advise everyone that it may be useful, particularly for brainstorming and summarization, so long as you don't trust it. That may change if it enshittifies the internet quickly enough.


I find your reply really useful. Thanks, David. There's clearly lots of stuff that is useful. I was chatting with a friend the other day about it - he said, when I search I now just skim the AI overview for an answer. Quicker and easier than skipping through article after article. Of course, is what you're reading really true? In that use case, that's the issue - and if not, does it cause harm? I guess that's what any regulator will need to consider as AI of this type finds more and more use cases and becomes unpacked from the core LLMs.


This is always a huge frustration for me. Even within groups that actually use AI a lot, and even among engineers, I hear people talking about "reasoning".

We know, and have known, how LLMs work, and some of the results are super impressive! But they are fancy auto-completes that simulate the ability to think, and those of us who use and actually build some of them should know: it's a bunch of matrix multiplication learning associations.
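To make that concrete, here's a deliberately toy sketch of the auto-complete loop (random weights, a made-up six-word vocabulary, no attention, nothing like a real model): embed the context, multiply by a weight matrix to score every token in the vocabulary, pick the likeliest one, append it, repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]
d = 8                                    # toy hidden size
E = rng.normal(size=(len(vocab), d))     # token embeddings (random stand-ins for learned weights)
W = rng.normal(size=(d, len(vocab)))     # output projection

def next_token(context_ids):
    # A real transformer mixes the whole context with attention layers;
    # here we just average the context embeddings to keep the sketch tiny.
    h = E[context_ids].mean(axis=0)
    logits = h @ W                       # matrix multiplication -> a score per vocabulary token
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax: scores become a probability distribution
    return int(probs.argmax())           # greedy "auto-complete": take the most likely token

ids = [vocab.index("the"), vocab.index("cat")]
for _ in range(3):
    ids.append(next_token(ids))
print(" ".join(vocab[i] for i in ids))
```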

I respect the idea of emergent properties and this paper does a good job addressing it, but it’s just incredibly frustrating to hear people being loose with language who should know better. Including OpenAI with their new models.

Thanks for sharing the paper. Not that it’s surprising but great to see some formal work on it.

11 hrs ago · Liked by Gary Marcus

People with financial interests will blow this off and insist that the emperor is fully clothed, while the empire drowns in babble.

10 hrs ago · Liked by Gary Marcus

This was absolutely fantastic! Researchers shouldn't need moral fiber to do good work, but this work took some guts. Upton Sinclair's quote feels relevant here:

"It is difficult to get a man to understand something when his salary depends upon his not understanding it."

10 hrs ago · Liked by Gary Marcus

All completely obvious to anyone who has studied formal logic, natural deduction, set theory, etc.


Yes, indeed. The problem is that the real world does not follow formal logic. We never got traction doing AI that way. The real world is messy.

Unprincipled pattern-based imitation is already doing much better than anything people ever did with rigorous methods. It will only get better with more careful modeling.

10 hrs ago · edited 10 hrs ago · Liked by Gary Marcus

Completely wrong. The rules of formal logic were painstakingly worked out over 2,500 years (from Zeno of Elea in the 5th century BC to Gödel's 1929 proof of the Completeness Theorem) so that they would model precisely how the physical universe works logically. Also, first-order logic (for example) may be extended via set theory and e.g. probability theory to be able to reason (with laser-like precision) about uncertainty. This is not to say that the connectionist approach (neural nets etc.) doesn't have its place (e.g. when processing low-level percepts). But leave the higher-level reasoning to the big boys!


Yes, I am well-aware of formal logic. I have a PhD in math.

In practice, formal logic does not do well. Reasoning about uncertainty, i.e. Bayesian methods, also hasn't scaled well.

That's because, for real-world problems formulated via language, it is very hard to find those input uncertainties, which would then be propagated via Bayesian rules.

So what's the point of having laser-like precision in the method if you don't have good inputs for it?

9 hrs ago · edited 9 hrs ago

The key to problem-solving (which includes deduction, abduction, and theorem-proving) is the effective use of information. Early implementations of formal reasoning did not incorporate induction, which hampered their ability to discover useful problem-solving information, and hence their effectiveness. In an AGI, initial priors may be calculated from empirical observations of the real world. I'm not saying it's easy, but all the problems of which you speak are solvable.


The priors would be highly context-dependent.

You'd have to start with a problem in natural language, and somehow convert it to some kind of structured form where a rigorous reasoner could work on it. The magnitude of this task boggles my mind.

An LLM does all this work implicitly. The more data it gets, the better it is. The problem with this is, of course, that the space it operates in is immense, and the number of samples likely needs to be astronomical.

I am honestly surprised LLMs can do as well as they do. My best guess is that statistical predictions produced by an LLM, then verified with some other methods, can do well in reasonably well-constrained areas.


This article, "How SAM Thinks," describes how a semantic AI model (SAM) can use an LLM to add formal logic reasoning: https://aicyc.wordpress.com/2024/10/05/how-sam-thinks/

It need not be one or the other any more than reading, writing and arithmetic compete.


Thank you for saying that, sir.


Why do you think it took so long for so many AI engineers and scientists to see what was clearly written on the wall more than seven years ago?

➡️ https://friedmanphil.substack.com/p/show-me-the-intelligence

10 hrs ago · Liked by Gary Marcus

I recently published a similar finding:

https://www.preprints.org/manuscript/202401.1681/v2


Coming from you, I hope this resonates across the spectrum.

10 hrs ago · edited 10 hrs ago · Liked by Gary Marcus

Typo in the article: "sfaely".

I agree that LLMs are not a principled solution to intelligence.

I agree that Elon Musk’s robotaxis will not be good enough, till Musk starts doing a serious job.

Yet, Waymo shows what happens in AI when a company diligently works on things.

Chatbots can be made reliable in specialized domains with lots of data and lots of custom modeling, which will also include formal verifiers, where that makes sense, and physics-based simulators.

We will see reliable assistants do more work in the next year or two.

Any time a chatbot is given work it is not well-prepared for, it will do badly.


Neurosymbolic AI is a logical and expected direction; I frankly do not understand why many people even fight against this idea. Why do they need 'pure' NNs necessarily? Is it some kind of cult?


Nobody fights against neurosymbolic AI. Symbols cannot be incorporated into neural net training. A neural net has its own internal abstractions, which are like fuzzy symbols. The weights should be allowed to float freely in a smooth way before converging.

The AlphaProof software by Google introduces symbolic stuff afterwards. A guess produced by the neural net is formalized. If problems are found, the neural net searches around till its output is considered acceptable.

The problem is that, outside of math, augmentation with symbolic methods may not work that well.
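To be concrete, the loop is roughly this guess-then-check pattern. Below is a toy sketch only: propose() stands in for the neural model's guess and verify() for the symbolic checker (AlphaProof's real checker is the Lean proof assistant; nothing here is its actual code).

```python
import random

def propose(problem: int) -> int:
    # Stand-in for the neural model's guess; here, just a random candidate.
    return random.randint(0, 20)

def verify(problem: int, candidate: int) -> bool:
    # Stand-in for a symbolic checker (proof assistant, unit tests, a simulator...).
    return candidate * candidate == problem

def solve(problem: int, attempts: int = 1000):
    # Keep sampling guesses until one passes verification, or give up.
    for _ in range(attempts):
        guess = propose(problem)
        if verify(problem, guess):  # only verified outputs are accepted
            return guess
    return None  # the model never produced an acceptable guess

print(solve(16))  # -> 4
```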


Well der. If an LLM doesn't understand the meaning of words, just about everything is impossible, and understanding the meaning of words is hard - we let our Unconscious Minds do all that stuff, to the point where we don't even know it is happening.

Something on dictionaries: https://semanticstructure.blogspot.com/2024/10/dictionary-domains.html

There is a great deal of logic holding English together - neither LLMs nor neurosymbolic systems know any of it. "A very fragile house of cards" - is "fragile" operating on "house" or "house of cards"?


Believers in LLMs maintain it is all a matter of prompt engineering. If only you ask the right question, you will get the right answer. I don't think this is true. I have often enough asked ChatGPT a perfectly unambiguous question and received a wrong answer.

Besides, when does prompt engineering become the equivalent of giving hints to students during an exam? In other words, perhaps you must already know the right answer to be able to provide the right prompt.


It's tragically funny to see the surge of business leaders using the term "Agentic AI" which, as far as I can tell, is nothing but a marketing term to describe wishful thinking. Thanks for shedding light on this important research!


Hi Gary! Another nice expose/writeup.

Same wine, new bottle, same old same old - with NO actual *understanding* of anything beyond word order (and even that was input by humans), intelligence (including reasoning) is unlikely to emerge, regardless of how much the system is scaled up or how much the next-token prediction mechanism is wrapped into "agentic" loops, branches or function calls. There continues to be no 'there' there; the Emperor continues to have no clothes!

Understanding doesn't arise from mere words, i.e. from just DATA. DATA === DOA.


That’s because reasoning, reflection and introspection can only be performed by that which has a heartbeat. 💓 Heartbeats over haptics, always.

founding

Always good to test it with Gary's old examples to see how it's doing. Looks like it's doing fine.

Article: Super Bowl 50

Paragraph: "Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager. Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV."

What is the name of the quarterback who was 38 in Super Bowl XXXIII?

ChatGPT said:

The name of the quarterback who was 38 in Super Bowl XXXIII is John Elway.

author

always good to realize that my old book is likely now in the training set 🙄

founding

Nice all-purpose, fool-proof explanation to account for anything you wrote couldn't be done that is now done.
