So, in other words, like self-driving cars, LLMs need more data to handle the corner cases? We will need a fundamentally different intelligence model to crack AGI because generative AI, impressive or not, is obviously not up to the task. I sincerely hope that many AGI researchers will abandon deep learning and work on new approaches. Otherwise, no AGI anytime soon.
You can never solve the "corner case" problem with just more data, because the problem space is infinite and therefore the corner cases are infinite in number too. What is needed is generalisation, and for that a proper world model is required.
The fact that the brain is a large and well-integrated collection of many algorithms is part of the solution. You are mistaken that it is put together in a haphazard way. I know many people in the field believe this but the evidence strongly suggests that they are mistaken.
There is a master glue or principle that seamlessly connects all these things together. Every part of an intelligent system must be designed with the master glue in mind. Even something as seemingly unimportant as a visual sensor must be designed accordingly.
The master glue, in my view, is generalization. In a biological visual system, for example, tiny circular regions on the retina are designed to detect minute edges at various angles. This is not a fluke. It gives the system the ability to generalize edge detection, which is one of the most important components of generalized shape perception. An essential part of the glue is event (spike) timing. Visual sensors emit precisely timed spikes. The temporal nature of spikes is used throughout the entire brain. Everything depends on it. Simple but extremely powerful processes in the brain use timing (e.g., concurrence detection) for processing sensory data.
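As a rough, purely illustrative sketch of that timing idea (the spike times and coincidence window below are invented, not a model of any real circuit), concurrence detection can be as simple as counting near-simultaneous spikes:

```python
# Toy illustration of timing-based concurrence (coincidence) detection.
# Spike times are in milliseconds; the 2 ms window is an arbitrary choice.

def coincidences(train_a, train_b, window_ms=2.0):
    """Count pairs of spikes from two trains that fall within window_ms of each other."""
    count = 0
    for t_a in train_a:
        for t_b in train_b:
            if abs(t_a - t_b) <= window_ms:
                count += 1
    return count

# Two sensors responding to the same edge should spike at nearly the same times.
sensor_1 = [10.0, 25.1, 40.3, 61.0]
sensor_2 = [10.4, 25.0, 41.9, 80.2]

print(coincidences(sensor_1, sensor_2))  # 3 near-coincident spike pairs
```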
In conclusion, I submit that AGI researchers should focus primarily on two things: generalization and timing. The sooner the AGI community realizes this, the more likely it is that some maverick genius (some AGI Isaac Newton) will strike the mother lode, so to speak. Gradient-based function optimizers (deep neural nets) are not even close to the solution. The secret of AGI will not be found under this lamppost, I'm afraid.
PS. I believe that once generalized visual perception is solved, the rest will be a walk in the park.
You obviously are not well versed in biological neural systems. Thanks for the exchange and good luck with your work.
I never said that deep learning sucks. I love deep learning technology and I use it all the time. Almost everyone uses it whether they know it or not. But as an AGI researcher, I know that DL is irrelevant to solving the AGI problem other than as an example of what not to do.
Yes, architecture matters but the lifeblood of an intelligent system is the timing of its sensory spikes. This has been known for decades. Thank you for the exchange.
I tried your Lake Michigan problem with Bard and it went as follows:
"Yes, it is possible that Joe is telling the truth.
If Joe and Beth live in different directions from Lake Michigan, they could easily be 100 miles apart, even though they both live 10 miles from the lake.
For example, Joe could live in Chicago, Illinois, which is on the western shore of Lake Michigan, and Beth could live in Green Bay, Wisconsin, which is on the eastern shore of the lake. The distance between Chicago and Green Bay is approximately 100 miles."
This doesn't mean Bard can solve every similar problem; it has failed miserably with other ones. Nevertheless, Bard's answer for this one was surprisingly good.
Anyway, I get your point and of course what you say is true.
Thanks for pointing it out. It's worth noting that, though the general idea is right, the answer is full of errors. Green Bay, Wisconsin is not on the eastern shore of Lake Michigan; it is at the southern tip of Green Bay, which is a long bay of Lake Michigan to the west of the main lake. Green Bay, Wisconsin is 200 miles from Chicago, not 100 miles.
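For anyone who wants to check, a rough great-circle calculation with approximate city-center coordinates gives roughly 180 miles as the crow flies (about 200 by road), nowhere near 100:

```python
# Rough great-circle distance check; coordinates are approximate city centers.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in statute miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

chicago = (41.88, -87.63)
green_bay = (44.51, -88.01)
print(round(haversine_miles(*chicago, *green_bay)))  # about 183 miles straight-line; ~200 by road
```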
Things I mention have a way of getting fixed; I am sure you can find variations on this theme that will stymie Bard without much effort.
At first glance, it might seem impressive that ChatGPT correctly answered some quite complex math problems, but we always need to remember that if it is a standard math problem (such as the 100x100 matrix or the 100-dimensional box), ChatGPT most certainly already had the answer in its training data. On the other hand, the Vega-Sirius-Sun problem is not so common and likely was not in the training data, and therefore ChatGPT failed on it.

This is always the problem with empirical tests for AI: an empirical test, if not carefully designed, can always be brute-forced by a computer system that has access to vast amounts of data, memory and compute. To test an AI's reasoning abilities we need to be certain either that the AI does not have access to the possible answers in its training data (which is practically impossible to verify, given the terabytes of training data) or that not all the answers can possibly be in the training data (e.g. an infinite problem space).

Or we scrap the empirical definition of intelligence (I blame Turing for introducing it) and try to come up with a formal, mathematical definition, which of course would require that we actually understand how intelligence works.
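On the infinite-problem-space point, one practical approach is to generate test items procedurally, so that the specific question-answer pairs cannot already be sitting in the training data. A toy sketch (the problem family, wording, and numbers here are just one invented example):

```python
# Minimal sketch: procedurally generate fresh word problems with known answers,
# so the exact question-answer pair cannot already be in the training data.
import random

def make_instance(rng):
    d1 = rng.randint(2, 500)   # miles Joe lives due north of a lighthouse
    d2 = rng.randint(2, 500)   # miles Beth lives due south of the same lighthouse
    question = (f"Joe lives {d1} miles due north of a lighthouse. Beth lives {d2} miles "
                f"due south of the same lighthouse. How many miles apart do they live?")
    return question, d1 + d2   # straight-line answer along the north-south line

rng = random.Random(42)
for _ in range(3):
    q, a = make_instance(rng)
    print(q, "->", a)
```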
I would consider that second “correct” one to be at least partially wrong. It needlessly introduces an approximation to 1 - e^(-2), which results in the decimal approximation being 0.8647, which is not correct (0.8674 is).
Excellent article, Gary. Congratulations to Ernest Davis and Scott Aaronson for their fine efforts!
Often a person who does not know the answer to a problem will still be able to detect an incorrect answer and provide additional helpful guidance to the chatbot. This may allow the chatbot to converge on the correct answer. The relationship between the person and the chatbot is typically cooperative and not adversarial.
For example, I asked ChatGPT 3.5 about the Lake Michigan problem. It gave an incorrect answer. So I gave the chatbot a hint, and it provided a plausible answer:
Hint from me to Chat GPT 3.5: Suppose the lake was very big?
ChatGPT 3.5: If the lake in question was very large, such that its shoreline extended over a considerable distance, it would be possible for Joe and Beth to both live 10 miles from the lake and still be 100 miles apart from each other. In such a scenario, Joe could be telling the truth. The key here is that the lake's size would need to allow for their distance from the lake to be 10 miles each while also accommodating a 100-mile separation between Joe and Beth.
Perhaps supplementing the training set of an AI system with an extensive collection of detailed transcripts showing people solving problems with tools like Code Interpreter and Wolfram Alpha would enable it to achieve higher performance. The predictions by the AI system would have a higher probability of corresponding to valid and pertinent operations for the tools. It will likely be necessary to distinguish between the input sent to the tools and the output sent from the tools. Admittedly, this is a simple approach, but I think it is worth a try.
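To make the idea concrete, here is a sketch of what one such transcript might look like; the role names and delimiters are invented placeholders, not any particular system's actual format:

```python
# Invented example format for a tool-use transcript; the role names are
# placeholders, not any real system's convention.
transcript = [
    {"role": "user",        "content": "What is the determinant of [[2, 1], [5, 3]]?"},
    {"role": "assistant",   "content": "I will check this with a computer algebra tool."},
    {"role": "tool_input",  "content": "Det[{{2, 1}, {5, 3}}]"},   # what the model sends to the tool
    {"role": "tool_output", "content": "1"},                       # what the tool sends back
    {"role": "assistant",   "content": "The determinant is 2*3 - 1*5 = 1."},
]

# Training on many such transcripts, with tool input and tool output kept
# distinct, is the suggestion made above.
for turn in transcript:
    print(f"{turn['role']}: {turn['content']}")
```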
I find it interesting to preface a problem (like the Lake Michigan one) by "Simulate a conversation between two people trying to solve the following problem. One is sceptical and always trying to find flaws in arguments while the other is a creative careful problem solver. The problem is:"
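In code, that framing amounts to nothing more than prepending a preface to the problem text; a minimal sketch (the Lake Michigan wording below is paraphrased from this discussion):

```python
# One way to wrap a problem in the sceptic-vs-solver framing described above.
# The exact wording is just an example, not a recipe.

PREFACE = (
    "Simulate a conversation between two people trying to solve the following problem. "
    "One is sceptical and always trying to find flaws in arguments while the other is "
    "a creative careful problem solver. The problem is:\n\n"
)

def frame_problem(problem: str) -> str:
    """Return the problem text wrapped in the two-persona framing."""
    return PREFACE + problem.strip()

lake_problem = ("Joe lives 10 miles from Lake Michigan and Beth lives 10 miles from "
                "Lake Michigan. Joe says he lives 100 miles from Beth. Is that possible?")
print(frame_problem(lake_problem))
```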
It feels like you see the solution evolving.
Here's the log: https://chat.openai.com/share/626b4988-e40b-421e-b712-4462aa6d187f
Thanks, Ken, for sharing your intriguing method for formulating an inquiry in a way that encourages ChatGPT-4 to provide a clever and nuanced response. Also, thanks for linking to the transcript which is fascinating.
Good point. Even just rephrasing the Lake Michigan problem as "joe lives 10 miles from the shore of Lake Michigan" got GPT-4 (no plugins) to give me the correct answer directly. Some of the problem is a lack of capability in the models, but it can be mitigated by formulating the problem in a way that is helpful for the model; as you say, many use cases are cooperation rather than adversarial problems.
Funny you should say that; I just published a piece on LLMs and the famous Clever Hans effect, arguing exactly this.
"the the Earth and Vega both orbit the Sun, but at different rates"
I'm dying here.
How many problems in each of the categories did the systems answer correctly?
Timely. In my Oct 10 talk (video just released; I mention this Substack at the end as an example of a source that says a lot of the right things about GPT and friends), I mention that marrying LLMs and symbolic AI (such as Wolfram Alpha) presents us with unsolved problems, but I had no time to get into it. This is a fine illustration.
The plugins, as well as ChatGPT’s ‘be harmless’ filter, as well as prompt engineering can all be seen as ‘trying to work around the fact that LLMs fundamentally have no understanding’. In the case of the ‘dumb be-harmless filter’ this is even funny, as ChatGPT will happily flag its *own* output as potentially inappropriate…
The talk (https://www.youtube.com/watch?v=9Q3R8G_W0Wc) makes non-technical people understand that the errors are not ‘solvable’ but are a fundamental aspect of these systems, by taking them step by step through the functional behaviour of the LLMs without addressing the irrelevant details of transformer architecture, etc. And with respect to sizing: a quick and dirty calculation shows that for one task (I did not do all 40 of the GPT-3 paper) you need models about 10,000 to 100,000 *times* as large to get into the error range of humans (still without reasoning/logic/math/understanding, of course, as that is a fundamental issue).
The AI-completeness theorem - Solution - https://www.linkedin.com/pulse/ai-completeness-theorem-solution-michael-molin/
To me GPT is a user interface; viewed like that, it's a spectacular advance, sort of Alexa on steroids. It transforms language inputs to outputs with unprecedented flexibility, but at the cost of lower accuracy. So typing or saying things like "what is the volume of a sphere with radius 1" would work well, and is easier than a dedicated Wolfram Language query. Or in Excel, saying something like "group users by department and calculate the average" is easier than clicking through menus. So, a better, more user-friendly interface.
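For instance, behind the scenes those two requests would have to be translated into something like the following; the table and column names here are made up for illustration:

```python
# What the two natural-language requests above boil down to once translated.
# The DataFrame and its column names are invented for this example.
from math import pi
import pandas as pd

# "What is the volume of a sphere with radius 1"
radius = 1
volume = (4 / 3) * pi * radius ** 3
print(round(volume, 3))  # 4.189

# "Group users by department and calculate the average"
users = pd.DataFrame({
    "department": ["sales", "sales", "engineering"],
    "salary": [50_000, 60_000, 90_000],
})
print(users.groupby("department")["salary"].mean())
```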