The thing about promises is that in Silicon Valley, accountability rarely shows up. Investors have put over $100 billion into the driverless car industry and so far have little to show for it. Endless promises (and empty predictions) were made at essentially no cost to those who made them. So what if Elon made bad predictions year after year? Nobody cares.
AI GENERAL'S WARNING: This product produces poetry and jokes ... it is not intended for use as a search engine, truth generator, fact retrieval system, or to be depended on in any real-life situation... 😂
And yet real organisations build real solutions with RAG - in real life situations 🙃
I for one cannot believe all of this AI doesn’t work perfectly yet because all other technologies that are older work great.
This is not the first time Ng has been "overly optimistic", to give him the benefit of the doubt. Here is an article by him from some six years ago: https://medium.com/@andrewng/self-driving-cars-are-here-aea1752b1ad0
Self-driving cars were not "here" back when he published this, and they are not "here" today. The fundamental challenges that keep them from becoming ubiquitous were already well known back then, and a scientist as prominent and bright as Ng, of all people, should have known them.
As innovators we all have to be stubborn optimists, but that doesn't mean we can't be realists, or that we should ignore plain, clear, fundamental challenges. I believe it is a serious disservice to the public to say things like "self-driving cars are here" when you should have known they were multiple decades away from being table stakes, or "the fundamental issues of LLMs will be solved in a few months" when you should know they will not be solved in a few months, and, more importantly, that we need several more breakthroughs before we can talk about actual AI, let alone AGI.
Typos:
"doceuments"
"which is generally believed to be include"
It's just one intellectually-lazy band-aid after another...
There is literally nothing “lazy” about building robust RAG pipelines. Anyone who ever built one knows this.
They are also not disclosing the failures, unfortunately. Nobody wants to admit failure 😅
Who says I’m not?
https://www.bigmother.ai/
Hi Gary, so true about RAG being the next-in-line silver bullet :) By definition, it's a way to do external/extra computation to look up, query, search for, calculate, or reason about (using human-originated knowledge bases/graphs) things that the core LLM can't compute by itself. So it's a useful technique for sure, but it isn't a universal solution for intelligent behavior.
"HardFork" podcast (okay, but entirely too credulous of Big Tech claims IMO) had on CEO of Perplexity recently, which to me sounded like RAG. They called it an "answer engine" or some such, but it is pretty similar. LLM working with a function to find and summarize outside source materials. Regardless, it still hallucinates. He partially blamed that on the index (the webcrawler it works with) not updating fast enough. CEO also said there were still "hard problems" to solve, (laughably) tried to tell the podcast hosts, "Don't worry, your job is safe (despite your publisher getting no revenue from answers derived from your work)." https://podbay.fm/p/sway/e/1708077603.
While I absolutely agree that current LLMs are quite a bit away from AGI, and that it is not assured they will eventually lead to AGI, I do differ on the production viability of RAG.
Sure, it takes some effort and won't be able to answer every question in every situation.
But big companies usually have a lot of back-office workers handling lots of questions and reading large documents while looking for the one relevant paragraph.
In my vision you would use the LLM only to encode the original documents into a vector database, and afterwards use it to match the query to the best-fitting vectors. You would not use it to then generate an answer based on those vectors, and you would not use it to search far and wide, but instead let the user sharpen the use case and therefore the space of possibly relevant vectors.
With those relevant passages returned to the employee, they have saved a lot of search time and still get to make an informed decision.
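Roughly, that retrieval-only workflow might look like the sketch below (a minimal illustration, not production code; the sentence-transformers model name and the in-memory index are stand-ins for whatever embedding model and vector database a real deployment would use):

```python
# Minimal sketch of retrieval-only "RAG": embed documents once, then return the
# top-k matching passages to a human instead of generating an answer.
# Assumes `pip install sentence-transformers numpy`; the model name is only an example.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

paragraphs = [
    "Refunds for cancelled orders are processed within 14 days.",
    "Warranty claims require the original proof of purchase.",
    "Invoices can be downloaded from the customer portal.",
]

# Encode once and keep the normalised vectors (a real system would use a vector DB).
doc_vecs = model.encode(paragraphs, normalize_embeddings=True)

def top_k(query: str, k: int = 2):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                      # cosine similarity (vectors are normalised)
    best = np.argsort(-scores)[:k]
    return [(paragraphs[i], float(scores[i])) for i in best]

# The employee reads the candidate paragraphs and makes the decision themselves.
for text, score in top_k("How long does a refund take?"):
    print(f"{score:.2f}  {text}")
```

The point is that the generative step is simply absent: the model only ranks passages, and the human decides.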
People already use LLMs to search for information. A non-trivial question is whether LLMs are a cost-effective solution to such a problem. What if you took a tiny fraction of the hundreds of billions of dollars spent on research, training and inference (plus manually checking for hallucinations) and invested that into solving the specific issue you have (e.g. by hiring someone to produce better documentation or by setting up a better information retrieval system)?
Somehow people imagine that the payoff from investment in LLMs will be endless and ultimately free, while the payoff from solving specific problems LLMs are supposed to solve is just a short-term waste of money. I think in most cases the situation is the exact opposite. It's just that convincing your company to invest into a hyped-up piece of AI is much easier than convincing them to invest into something concrete.
Besides, what you specifically describe does not sound like an LLM, but rather like a completely different AI system that would use some of the same components.
For it to be an interesting business case, we (as a company looking into possible applications of this technology) can ignore the R&D costs and only have to evaluate the cost of running those LLMs and the infrastructure mentioned above, which is far lower and actually decently cheap.
In my mind LLMs are quite good at understanding language and intent, basically an evolution of classic chatbot intent recognition, and definitely better than the ones we have built ourselves. Therefore we can use them to understand questions and match those against our own data. But as Marcus has mentioned, there are several problems with LLMs, so you have to think of them not as a complete and perfect solution but rather as a tool to use at very specific points in a business process.
Like I said, it's not AGI, but that doesn't mean it's not useful.
sure, I can imagine some value in this more scoped use case
I have worked in several huge multinationals and I could always find what I needed to know via graph-based search engines. So it's not clear to me what hit-and-miss RAG LLMs bring to the table…
Aside from the fact that AI people can't write a paper without committing the Anthropomorphic Fallacy*, this puts the kibosh on LLMs for legitimate uses and as a way forward.
"Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that 𝗶𝘁 𝗶𝘀 𝗶𝗺𝗽𝗼𝘀𝘀𝗶𝗯𝗹𝗲 𝘁𝗼 𝗲𝗹𝗶𝗺𝗶𝗻𝗮𝘁𝗲 𝗵𝗮𝗹𝗹𝘂𝗰𝗶𝗻𝗮𝘁𝗶𝗼𝗻 𝗶𝗻 𝗟𝗟𝗠𝘀. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. " [emphasis added]
Xu, Ziwei, Sanjay Jain, and Mohan Kankanhalli. "Hallucination is inevitable: An innate limitation of large language models." arXiv preprint arXiv:2401.11817 (2024).
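For readers who don't want to open the paper, the definition quoted above boils down to something like the following (a loose restatement of the quoted definition only, not of the impossibility argument; h denotes the LLM and f the ground-truth function):

```latex
% Hallucination as defined in the quoted abstract: a (computable) LLM h
% hallucinates with respect to a (computable) ground-truth function f
% whenever its output disagrees with f on some input string s.
\[
  \text{$h$ hallucinates w.r.t.\ $f$} \;\iff\; \exists\, s :\; h(s) \neq f(s)
\]
% The paper's claim is that no computable h avoids such disagreements entirely.
```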
It's almost like in any reasonably useful and complicated mathematical system there will always be both true and false statements that cannot be proved within that system. Maybe someone should formalize that insight and write it up for 𝘔𝘰𝘯𝘢𝘵𝘴𝘩𝘦𝘧𝘵𝘦 𝘧ü𝘳 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘬 𝘶𝘯𝘥 𝘗𝘩𝘺𝘴𝘪𝘬? Bet it would get a lot of attention.
* attributing human emotions and characteristics to inanimate objects and aspects of nature, such as plants, animals, or the weather.
(and an 'ACK' to Prof. Marcus for bringing the paper to my attention)
@garymarcus and @A Thornton: for a less formal but more mechanistic explanation of why training a large transformer-based language model on a large training set will yield hallucinations (but with a nuanced take on what those hallucinations may be), please take a look at the paper by Bernardo A. Huberman and me on SSRN: https://dx.doi.org/10.2139/ssrn.4676180
I have a theory... OpenAI attributed their recent meltdown to an update meant to "optimize" the user experience. Since RAG is a process for optimizing the output of a large language model, one has to wonder whether it is production-worthy. Now, why OpenAI didn't run a suite of automated regression tests before shipping the new release is a mystery. And why Google didn't regression-test Gemini on its racial-bias updates is another mystery... but I will table those questions.
[Regression testing is defined as software testing to ensure that a recent code change has not adversely impacted existing functionality.]
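For what it's worth, a behavioural regression suite for a model release can be as small as a table of prompts and required substrings. The pytest sketch below is a toy illustration; `ask_model` and its canned answers are placeholders, not anything OpenAI or Google actually ship:

```python
# Toy behavioural regression suite in the spirit of the definition above.
# `ask_model` is a placeholder; a real suite would call the release candidate's
# inference endpoint and collect cases from previously working behaviour.
import pytest

def ask_model(prompt: str) -> str:
    # Placeholder: swap in a call to the model being shipped.
    canned = {
        "What is 6 * 7?": "6 * 7 = 42.",
        "Name the capital of France.": "The capital of France is Paris.",
    }
    return canned.get(prompt, "")

GOLDEN_CASES = [
    ("What is 6 * 7?", "42"),
    ("Name the capital of France.", "Paris"),
]

@pytest.mark.parametrize("prompt,required", GOLDEN_CASES)
def test_release_does_not_regress(prompt, required):
    answer = ask_model(prompt)
    assert required.lower() in answer.lower(), f"Regression on: {prompt!r}"
```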
Separately, I haven't had this much fun with a new technology. The stuff that came out of the Valley for decades was robust, reliable, well-engineered... now I look at semiconductor industry updates from my previous life and everything works. It's boring !@!!
Ah, for the days in which a one-in-a-billion mistake was considered a major scandal, and the thought of not being able to multiply a pair of six-digit integers would have been unthinkable.
Purnima, I am pretty sure there was a mountain of regression tests that the engineers prepared, but I think it is impossible to catch everything. The next release will just contain an equally huge pile of band-aid fixes, until another embarrassing puncture is publicised.
Exactly!
Using an LLM as a natural language interface to a traditional search engine (and a summariser of the returned documents) is just not very interesting conceptually.
Current generation LLMs hallucinate (while humans tend to say “no idea”) which makes them less useful for business tasks, but as scientific/engineering objects they are far more exciting. Early days and it will take decades to figure out what’s going on. Looks like we will drop the GPT architecture perhaps next year or the year after.
Nor is it actually very successful. I tried using Gemini for the same exercise as Gary has included here but it mostly would not produce any summary CV.
Even when I asked it to produce a summary of a copy of my main UoD CV (having removed the top few lines which identified me) it produced a very uninteresting mishmash.
Remember that RAG is only a way to add more context for the transformer of the LLM to work with. It is essentially Prompt Engineering on steroids.
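That framing is easy to see if you write down what a RAG call actually sends to the model: the retrieved chunks are pasted into the prompt ahead of the question. The sketch below is a bare-bones illustration; `retrieve` and `call_llm` are placeholders for whatever retriever and model client are in use:

```python
# Bare-bones illustration of RAG as "prompt engineering on steroids":
# retrieved passages are simply concatenated into the context window.
# `retrieve` and `call_llm` are placeholders for a real retriever / model client.

def build_rag_prompt(question: str, retrieve, k: int = 3) -> str:
    chunks = retrieve(question, k=k)          # e.g. top-k passages from a vector store
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def rag_answer(question: str, retrieve, call_llm) -> str:
    return call_llm(build_rag_prompt(question, retrieve))

# Nothing here changes the model's weights or its tendency to hallucinate;
# it only changes what text the transformer conditions on.
```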
To your point (and Gary Marcus's article summarising your experience and those of others), I managed on my first attempt to get ChatGPT using RAG to base its answer on an unreliable source (a far right conspiracy website) AND to hallucinate the content, making garbage up out of the already unreliable representation. (For clarification, I did not point ChatGPT to any source, it found it all by its lonesome).
Obviously performance at any one point in time is pertinent - for a business, what it can do for us today defines its present usefulness - but these arguments are always subject to the "wait 'til next release" or "you did it wrong - here's a better prompt" response. Also, it's easy to make ChatGPT-4 get something very simple wrong every time ("which has more rainfall, Edinburgh or Amsterdam?"), and it will often get that wrong even with RAG switched on. As I said, for me, LLMs are interesting and cool; summarising traditional search engine results - not so much.
(Thinking in particular about the Mamba architecture.)
Hard to take seriously a criticism of RAG by someone running some prompts through Bing Chat - that is a black box. You have to know the details of the RAG setup to draw conclusions.
We can infer from public facts. Copilot is based on a snapshot of GPT-4 running in Azure, "frozen" well before I set up Zingrevenue, my startup, late last year (as per the article). So the links and "facts" that Copilot supplied in my interactions with it were obviously supplemented by Bing, since they were recent. Hence we can attribute the wayward links and broken detail to RAG hallucinations. And Copilot is one of the world's best RAG LLMs, for the simple reason that MSFT's (and Satya Nadella's) reputation is riding on its performance and accuracy; the resources Redmond must be devoting to keeping it impressive must be very, very significant.
Also, the wording of your question suggests that I may not be in a position to understand how RAG systems work. While I cannot share commercial screenshots of my Falcon 40B LLM instance running on multiple GPUs in my GCP GKE cluster, nor the source code of my TF2 containers in my private VPC, nor screenshots of the ETL pipelines I have been building for over a decade (down to the level of checking the hex of binary data payloads) to show that I know a little about the RAG data needed to keep LLMs in check, nor the horribly complex decision trees spanning nearly a couple of decades, I beg to differ.
So, although it is true that I don't have access to Bing's source code (yes, you are right, it is a black box), Bing's misdirected output is sufficient for me to cast my verdict.
What I can show you, though, is an easy-peasy way to prepare an unprivileged OCI container that builds the latest copy of TensorFlow 2, the basis for my RAG work, from scratch, if that's something you fancy 😉
https://www.linkedin.com/pulse/building-tensorflow-ai-from-source-container-simon-au-yong-joo9c?trk=public_post
Last year I attended a “hackathon” during which this became a significant issue. The machine literally made up “current” statistical data about the disparities among the different housing populations of Hudson Valley. After being asked to link to the sources, none of the links provided actually existed. Just imagine what kind of misinformation damage a faulty RAG system can do. This concern is beyond a dollar amount, and I hope to see it resolved in a way that benevolently furthers the technology.
Gary, what do you mean by "neurosymbolic AI"?
subject of a future essay, or see my 2020 arXiv paper The Next Decade in AI
I think we will need a merger of the current paradigm with some kind of flexible probabilistic reasoning system (and by "reasoning" I don't mean poetic speculation about whether that's what a given inscrutable pile of linear algebra is "really" doing, I mean really strong built-in priors for actual symbolic manipulation). Arguably this was the architecture of AlphaGo and in my humble opinion that was a much more successful project than the current LLMs - besting the world's most expert humans in a highly technical domain and inventing whole new strategies that humans hadn't even thought of after playing the game for ~3000 years or so. You kind of *need* the creative capacity to go out on a limb, try something new that *seems* like it could be right (hallucinate, if you will), but if that's all you have then you're in a muddle. You also need symbolic reasoning to verify that the wild ideas are worth pursuing, but on the other hand if *that's* all you have then you're too rigid to succeed in challenging new circumstances. If symbols on their own were enough, then strong AI would have been cracked in the 60s or 70s with all the Lisp people.
Why are humans so successful, a quantum leap above all other animals in our command of our environment? Mammalian instincts certainly help, but symbols are what sets us apart. How can we build bridges, satellites, smartphones, communication networks? Symbols. Imagine trying to do engineering, physics, computer science without symbols. Imagine Schrodinger and Heisenberg and Einstein without symbols.
You're ignoring the fact that AlphaGo has a tree search built in. The policy network's instantaneous best guess would not beat the best human players - the tree search is required to reason about the consequences of that guess. Arguably this part is symbolic reasoning. The policy and value networks (subsequently fused into one in AlphaZero) were indeed learned and were necessary to the success. I don't dispute that. But what I'm saying is that AlphaGo wouldn't have succeeded without the plain old tree search also being built in - the success came because the learned parts provided very good heuristics for pruning the search space.
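Concretely, those "learned heuristics pruning the search" live in the PUCT selection rule that AlphaGo-style MCTS applies at every node: the policy network's prior scales the exploration bonus, so moves the network dislikes are rarely expanded, while the tree search itself still does the look-ahead. The sketch below shows just that rule (a simplified illustration, not DeepMind's code):

```python
# Simplified PUCT selection as used in AlphaGo-style MCTS (sketch, not DeepMind's code).
# The learned policy prior biases exploration, which effectively prunes the tree;
# the search itself still reasons about consequences by expanding the chosen subtree.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                                  # policy network's probability for the move
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    q = child.value_sum / child.visits if child.visits else 0.0            # value estimate so far
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)  # prior-guided exploration
    return q + u

def select_move(node: Node):
    # The move whose subtree gets expanded next; low-prior moves are rarely chosen.
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1]))[0]
```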
I don't think I dispute that, but I think we may be focused on different aspects of intelligence. You may be focused on "fast" thinking while I'm focused on "slow" thinking. Fast thinking is instinct and gut feeling, the ability to rapidly identify objects in a field of view or jump out of the way of an oncoming bus. Most of the animal kingdom has it. I would argue that most existing DL architectures do "fast" thinking. Indeed, in terms of computational complexity, they literally embody fixed-cost computations - there is no variable-depth search or recursion, just a forward pass through a static set of components. But much of what humans do when they do science is "slow" thinking - computations that are variable in length, involve fits and starts with feedback and refinement, and even involve other minds working collectively through peer review. And symbols are always involved in that process - they are a key mechanism for scientific knowledge to propagate itself from person to person and generation to generation. And I don't think you can do a lot of what we do without a deep development of this kind of slow, social, symbol-assisted cognition. Otherwise chimpanzees would be building bridges and airplanes and cellphones by now.
Why do the AI gurus keep thinking that stochastic parroting is a substitute for reasoning?
Great analysis. It is better to integrate the LLM into RAG than the reverse.