And yet. I see people increasingly finding that LLMs and other genAI are useful in ways that don't require reasoning. Summarize this article; advise me on how to make its tone more cheerful; give me ideas for a new product line; teach me the basics of Python; combine my plans with the images in these paintings so I can think differently about the building I'm designing. In these situations (all recently encountered by me, i.e., real uses, not hypotheticals), people are getting a lot out of supercharged pattern-matching. They aren't asking for impeccable reasoning ability, and so they aren't being disappointed.
These are "knowledge-work" settings in which the occasional error is not fatal. So, no quarrel with the larger point that we shouldn't ignore the absence of real reasoning in these systems. But it is also important to recognize that they're being found useful "as is." Which complicates the project of explaining that they shouldn't be given the keys to everything society needs done.
Well-stated points. I am frustrated that there is not a richer dialogue about where LLMs are useful and where they are not and, maybe even more importantly, about how to evaluate failure modes. Many personal-assistant-type use cases, with an expert user, are very low risk. But put a novice user with an LLM generating output they do not understand... look out.
If you haven’t read him, I recommend Zvi’s newsletter: https://open.substack.com/pub/thezvi. It has a lot of “here’s where LLMs bring value and here’s where they don’t.”
"Summarize this article (that I wrote, so I can add a summary)" and "Summarize this article (that I don't feel like reading)" are two tasks with extremely different likelihoods of success -- I'd encourage you to disambiguate which one you're referring to when discussing these things. :-)
(The first one is verifiable, the second one is not.)
Exactly. LLMs are great for prototyping and brainstorming, and not great for operational high-precision tasks. People who haven't figured this out yet are just lazy.
Yes, if only this were what was advertised, as opposed to the world-changing existential threat that requires trillions of dollars and burning more fossil fuels. I advise everyone that it may be useful, particularly for brainstorming and summarization, so long as you don’t trust it. That may change if it enshittifies the internet quickly enough.
I find your reply really useful. Thanks, David. There’s clearly lots of stuff that is useful. I was chatting with a friend the other day about it; he said that when he searches, he now just glosses the AI overview for an answer. Quicker and easier than skipping through article after article. Of course, is what you’re reading really true? In that use case, that’s the issue, and if not, does it cause harm? I guess that’s what any regulator will need to consider as AI of this type finds and adopts more and more use cases and becomes unpacked from the core LLMs.
This is always a huge frustration for me. Even within groups that actually use AI more, and even engineers, I hear them talking about “reasoning”.
But we know, and have known, how LLMs work, and some of the results are super impressive! Still, they are fancy auto-completes that simulate having the ability to think, and those of us who use and actually build some of them should know: it's a bunch of matrix multiplication to learn associations.
I respect the idea of emergent properties, and this paper does a good job addressing it, but it's just incredibly frustrating to hear people who should know better being loose with language. Including OpenAI with their new models.
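The "matrix multiplication" point can be made concrete with a toy sketch. The vocabulary, sizes, and random weights below are invented for illustration and bear no relation to any real model:

```python
import numpy as np

# Toy next-token predictor: associations stored as matrices,
# predictions produced by matrix multiplication plus a softmax.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat"]
d = 8                                    # embedding dimension (arbitrary)

E = rng.normal(size=(len(vocab), d))     # token embeddings
W = rng.normal(size=(d, d))              # one "layer" of learned weights

def next_token_probs(token: str) -> np.ndarray:
    h = E[vocab.index(token)] @ W        # matrix multiply
    logits = h @ E.T                     # score every vocabulary token
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()               # probability distribution

p = next_token_probs("cat")
```

Nothing in there "thinks"; the output is just a distribution over tokens shaped by the weights.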
Thanks for sharing the paper. Not that it’s surprising but great to see some formal work on it.
People with financial interests will blow this off and insist that the emperor is fully clothed, while the empire drowns in babble.
This was absolutely fantastic! Researchers shouldn't need moral fiber to do good work, but this work took some guts. Upton Sinclair's quote feels relevant here:
"It is difficult to get a man to understand something when his salary depends upon his not understanding it."
All completely obvious to anyone who has studied formal logic, natural deduction, set theory, etc.
Yes, indeed. The problem is that the real world does not follow formal logic. We never got traction doing AI that way. The real world is messy.
Unprincipled pattern-based imitation is already doing much better than anything people ever did with rigorous methods. It will only get better with more careful modeling.
Completely wrong. The rules of formal logic were painstakingly worked out over 2,500 years (from Zeno of Elea in the 5th century BC to Gödel's 1929 proof of the Completeness Theorem) such that they would model precisely how the physical universe works logically. Also, first-order logic (for example) may be extended via set theory and e.g. probability theory to be able to reason (with laser-like precision) about uncertainty. This is not to say that the connectionist approach (neural nets etc.) doesn't have its place (e.g. when processing low-level percepts). But leave the higher-level reasoning to the big boys!
Yes, I am well-aware of formal logic. I have a PhD in math.
In practice, formal logic does not do well. Reasoning about uncertainty, i.e., Bayesian methods, also hasn't scaled well.
That's because, for real-world problems formulated via language, it is very hard to find those input uncertainties, which would then be propagated via Bayesian rules.
So what's the point of having laser-like precision in the method if you don't have good inputs for it?
The key to problem-solving (which includes deduction, abduction, and theorem-proving) is the effective use of information. Early implementations of formal reasoning did not incorporate induction, which hampered their ability to discover useful problem-solving information, and hence their effectiveness. In an AGI, initial priors may be calculated from empirical observations of the real world. I'm not saying it's easy, but all the problems of which you speak are solvable.
The priors would be highly context-dependent.
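For what the "initial priors from empirical observations" step would look like mechanically, here is a minimal beta-binomial sketch. The counts are made up, and obtaining them for a problem stated in natural language is exactly the hard part noted above:

```python
from math import isclose

# Beta-binomial updating: empirical counts form a prior, and new
# context-specific evidence shifts it.  All counts here are invented.
prior_successes, prior_failures = 30, 70   # broad "empirical" prior
new_successes, new_failures = 12, 3        # observations in one context

alpha = prior_successes + new_successes    # posterior is Beta(alpha, beta)
beta = prior_failures + new_failures
posterior_mean = alpha / (alpha + beta)    # 42 / 115
```

The update itself is trivial; the unsolved problem is where context-appropriate counts come from in the first place.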
You'd have to start with a problem in natural language, and somehow convert it to some kind of structured form where a rigorous reasoner could work on it. The magnitude of this task boggles my mind.
An LLM does all this work implicitly. The more data it gets, the better it is. The problem with this is, of course, that the space it operates in is immense, and the number of samples likely needs to be astronomical.
I am honestly surprised LLMs can do as well as they do. My best guess is that statistical predictions produced by LLMs, then verified with some other methods, can do well in reasonably well-constrained areas.
HOW SAM THINKS. This article describes how a semantic AI model (SAM) can use an LLM to add formal-logic reasoning: https://aicyc.wordpress.com/2024/10/05/how-sam-thinks/
It need not be one or the other any more than reading, writing and arithmetic compete.
Thank you for saying that, sir.
Why do you think it took so long for so many AI engineers and scientists to see what was clearly written on the wall more than seven years ago?
➡️ https://friedmanphil.substack.com/p/show-me-the-intelligence
I recently published a similar finding:
https://www.preprints.org/manuscript/202401.1681/v2
Coming from you, I hope this resonates across spectrums.
Typo in the article: "sfaely".
I agree that LLMs are not a principled solution to intelligence.
I agree that Elon Musk’s robotaxis will not be good enough, till Musk starts doing a serious job.
Yet, Waymo shows what happens in AI when a company diligently works on things.
Chatbots can be made reliable in specialized domains with lots of data and lots of custom modeling, which will also include formal verifiers, where that makes sense, and physics-based simulators.
We will see reliable assistants do more work in the next year or two.
Any time a chatbot is given work it is not well-prepared for, it will do badly.
Neurosymbolic AI is a logical and expected direction; I frankly do not understand why many people even fight against this idea. Why do they need 'pure' NNs necessarily? Is it some kind of cult?
Nobody fights against neurosymbolic AI. Symbols cannot be incorporated into neural net training. A neural net has its own internal abstractions, which are like fuzzy symbols. The weights should be allowed to float freely in a smooth way before converging.
The AlphaProof software by Google introduces the symbolic stuff afterwards. A guess produced by the neural net is formalized. If problems are found, the neural net searches around until its output is considered acceptable.
The problem is that, outside of math, augmentation with symbolic methods may not work that well.
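The guess-formalize-retry loop described above can be sketched generically. Both components below are toy stand-ins for illustration, not AlphaProof's actual machinery:

```python
# Generic guess-and-verify loop: a fallible proposer paired with a
# strict checker.  Both functions are deterministic toys.

def neural_guess(attempt: int) -> int:
    """Stand-in for a neural proposal that varies with each retry."""
    return (attempt * 3) % 10

def formal_verifier(guess: int) -> bool:
    """Stand-in for a symbolic checker that only accepts a valid answer."""
    return guess == 7

def solve(max_tries: int = 100):
    for attempt in range(1, max_tries + 1):
        guess = neural_guess(attempt)
        if formal_verifier(guess):       # symbolic check filters bad guesses
            return guess, attempt
    return None, max_tries

answer, tries = solve()                  # finds 7 on the 9th proposal
```

The appeal of the architecture is that the verifier never lets a wrong guess through, so the neural half can be sloppy as long as the symbolic half is sound.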
Well, duh. If an LLM doesn't understand the meaning of words, just about everything is impossible, and understanding the meaning of words is hard - we let our Unconscious Minds do all that stuff, to the point where we don't even know it is happening.
Something on dictionaries -https://semanticstructure.blogspot.com/2024/10/dictionary-domains.html
There is a great deal of logic holding English together - neither LLMs nor neurosymbolics knows any of that. "A very fragile house of cards" - is "fragile" operating on "house" or "house of cards"?
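The attachment ambiguity can be written down explicitly. The bracketings below are hand-made for illustration; no parser is involved:

```python
# Two structural readings of "very fragile house of cards", as nested lists.
parse_wide = ["very fragile", ["house", "of cards"]]    # modifies the idiom
parse_narrow = [["very fragile", "house"], "of cards"]  # modifies "house" alone

def words(tree):
    """Flatten a bracketing back to its surface words."""
    if isinstance(tree, str):
        return tree.split()
    return [w for sub in tree for w in words(sub)]

# Both structures flatten to the identical word sequence, so the
# surface string alone cannot distinguish the two readings.
same_surface = words(parse_wide) == words(parse_narrow)
```

A system trained only on word sequences sees the surface form; the structural choice between the two readings is exactly what it is never shown.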
Believers in LLMs maintain it is all a matter of prompt engineering. If only you ask the right question, you will get the right answer. I don't think this is true. I have often enough asked ChatGPT a perfectly unambiguous question and received a wrong answer.
Besides, when does prompt engineering become the equivalent of giving hints to students during an exam? In other words, perhaps you must already know the right answer to be able to provide the right prompt.
It's tragically funny to see the surge of business leaders using the term "Agentic AI" which, as far as I can tell, is nothing but a marketing term to describe wishful thinking. Thanks for shedding light on this important research!
Hi Gary! Another nice expose/writeup.
Same wine, new bottle; same old same old. With NO actual *understanding* of anything beyond word order (that, too, input by humans), intelligence (including reasoning) is unlikely to emerge, regardless of how much the system is scaled up or how the next-token-prediction mechanism is wrapped into "agentic" loops, branches, or function calls. There continues to be no 'there' there; the Emperor continues to have no clothes!
Understanding doesn't arise from mere words, ie from just DATA. DATA === DOA.
That’s because reasoning, reflection and introspection can only be performed by that which has a heartbeat. 💓 Heartbeats over haptics, always.
Always good to test it with Gary's old examples to see how it's doing. Looks like it's doing fine.
Article: Super Bowl 50
Paragraph: "Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager. Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV."
What is the name of the quarterback who was 38 in Super Bowl XXXIII
ChatGPT said:
The name of the quarterback who was 38 in Super Bowl XXXIII is John Elway.
always good to realize that my old book is likely now in the training set 🙄
Nice all-purpose, fool-proof explanation to account for anything you wrote couldn't be done that is now done.