An amazingly damning analysis that rings so true to life.
Believers will never critically analyze their belief.
> Believers will never critically analyze their belief.
Can you question this belief?
Oh, this one is really good. Beautiful.
It's the ELIZA effect on crack.
Imagine writing that and not even getting a 'liked by gary marcus'. You techbros are funny.
Love when people point out that this is simply history repeated.
What’s missing from this otherwise spot-on analysis is the active collusion of 1) pump-and-dump VCs and stock jobbers, and 2) click-chasing “journalists.” You’ll know the jig is up when the former take their money off the table and the latter start chasing clicks debunking the bubble they helped inflate.
I came across a similar problem 50 years ago, when people were too willing to accept that professional-looking computer output must be correct. After all, why should their employer have spent so much money on installing the computer if it didn't work correctly? My job was to write the program which produced the database used to generate monthly sales reports. The output from the first live run was distributed to at least a hundred salesmen and also to senior managers, and they were asked to report any errors. Two reports came back: one was from senior management, saying that the total sales of one product were far too high, and the second came from a salesman concerning the sales of a different product to one customer. Both were due to easily corrected programming errors. What worried me is that once the faults were known, I realised that at least a couple of dozen people who had been asked to look for errors had failed to see them. With such a poor response level, the odds were that there could be other errors which had not been reported and hence not corrected. I suspect that with the latest chatbots perhaps 95% of people will fail to spot problems in the professional-looking material presented.
In the UK, a similar phenomenon has led to the country’s greatest miscarriage of justice ever, with more than 900 postal contractors prosecuted on the basis of erroneous data from the company’s accounting and transactions software. The UK courts have a presumption of accuracy for computer systems, so the majority were convicted on flawed evidence.
There is an indirect connection between the Horizon disaster and my own research (see https://codil-language.blogspot.com/2024/01/transparency-codil-and-horizon-black.html). Nearly 60 years ago I was employed by English Electric LEO Computers Ltd to look at the software requirements of large commercial systems. The big problem was that in large organisations operating in real-world marketplaces, conventional procedural languages tended to produce complicated "black box" code in which it was very difficult to spot errors. I was employed to work on the design of a transparent system which could easily "explain" what it was doing in terms that the human staff could understand. If this research had been completed, the Post Office debacle might not have happened.
The trouble started when the UK government decided that all the comparatively small UK computer companies should merge to form ICL. Shortly after the merger, a shortage of funding meant that many innovative research projects (including mine) were closed down, because the new company saw its future in the existing technology. This cutting back on research didn't prove profitable, and some years later ICL was taken over by Fujitsu; the GPO's Horizon software was produced by what was left of the software side of ICL.
It is, of course, speculation, but if ICL had not decided in the 1960s to scrap research into the design of transparent systems, the Horizon software might not have ended up as an opaque black-box system that sent people to prison rather than having its faults found and corrected.
Chris Reynolds
In many respects the UK has ended up with the IT industry it deserves.
Thank you for this! I am in a state of shock, as even those in charge of pedagogical methods at colleges and universities (as far as I can see) are falling for this cognitive trap as well. I feel that we are almost headed back to an era of Scholasticism - except it's the Church doctrine of Microsoft, Google and Meta.
I suspect that the level of enthusiasm for bringing AI into teaching is a lot higher among university administrators than among faculty. But I know some faculty are happily doing it.
My conversations with certain highly intelligent people lead me to believe that many of them have a highly objectivist view of reality that is more prone to this error. To a point, this scientific objectivism is helpful. “No ghost in the machine” is a helpful assumption. Everything has an explanation. Nothing just *is*. That is a good attitude to take towards science. But in this case, this makes them fail to understand how we are different from the machines. And that a seemingly convincing machine is not actually intelligent.
I've had the same impression as you. The sorts of people who say LLMs show signs of intelligence also tend to say things like "everything in the real world is governed by mathematics, so when we discover the mathematics of intelligence, we will be able to recreate it artificially."
This kind of silliness is one reason why we need philosophy.
> The sorts of people who say LLMs show signs of intelligence also tend to say things like
What percentage of them do this?
> This kind of silliness is one reason why we need philosophy.
Set theory is also very useful.
I think it's around 32.49%, but I'm still tweaking the formula.
What data sources were you using?
Sorry, that's proprietary.
Trapping consciousness into a corner is kinda like trying to push two strong magnets together lol
> Everything has an explanation.
a) Not everything has a correct explanation.
b) There's no need for explanations to be correct, especially if they align with people's pre-existing beliefs. Look at these comments and observe how satisfied people are!
Humans are the perfect mark for a bot. This is not news but I'm glad someone has finally noticed.
Weizenbaum himself discusses the con in relation to the confusion around ELIZA in interviews and papers, explicitly stating it is 'very much like fortunetelling.' I've been writing about the con in relation to pre-generative 'indie' bots for years; LLMs are more complex, but the concept holds.
For sixteen years (1998-2014) my bot, MrMind/The Blurring Test, insisted: I AM A BOT ALL BOTS ARE LIARS. In the LA Review of Books (2020), I wrote "Bots are born to bluff..." as a prelude to the 2020 election. In Seriously Writing SIRI (2015) I discuss the history of the big con, fortune tellers, the techniques of improvisers/performers/writers -- even a Magic 8 Ball. Ask it anything. It's hard to argue with "Situation Hazy".
Finally, our vulnerability is independent of our confusion with identity. It works whether or not we anthropomorphize the code --- we are primed to believe it is authoritative.
https://blog.lareviewofbooks.org/provocations/bot-bots-liars/
http://hyperrhiz.io/hyperrhiz11/essays/seriously-writing-siri.html
https://pweilstudio.com/project/the-blurring-test-mrmind/
Sure, why not? Don't forget that most of the people in the current AI world seem not to have thought seriously about language and cognition outside of the context of work in machine learning. This is also true for all those who only flocked to LLMs in the wake of ChatGPT. So they don't have any principled way of thinking about language and cognition. In the absence of prior intellectual commitments, the LLM con is irresistible. So what if it messes up here and there. It gets most things right, no?
And people within ML have a ready defense against those of us who invoke prior knowledge. That prior knowledge comes from the symbolic world that (they believe to have) failed.
There are areas where symbolic methods make sense, such as for math proofs. An LLM can try to create several steps of a proof, and a formal verifier can check that, then iterate.
The problem is that a lot of real-world work cannot be formalized. People do it with rules of thumb, guided by examples, etc. There's a fuzziness to it. LLMs can do better there, but of course errors can creep in.
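A minimal sketch of that propose-and-verify loop, with hypothetical `propose_next_step` (the LLM call) and `check_step` (the formal verifier) stand-ins rather than any real prover API:

```python
# Sketch of an LLM-proposes / verifier-checks loop. `propose_next_step` and
# `check_step` are hypothetical stand-ins for an LLM call and a formal proof
# checker; only steps the verifier accepts are kept.
from typing import Callable, List, Optional, Tuple

def prove(
    goal: str,
    propose_next_step: Callable[[str, List[str]], str],
    check_step: Callable[[str, List[str], str], Tuple[bool, str]],
    max_attempts: int = 20,
) -> Optional[List[str]]:
    proof_steps: List[str] = []
    for _ in range(max_attempts):
        candidate = propose_next_step(goal, proof_steps)         # model suggests a step
        ok, feedback = check_step(goal, proof_steps, candidate)  # verifier accepts/rejects
        if not ok:
            continue  # in practice, `feedback` would be folded into the next prompt
        proof_steps.append(candidate)
        if candidate.strip().upper() == "QED":
            return proof_steps   # complete, fully verified proof
    return None                  # budget exhausted without a verified proof
```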
Excellent point Andy, and thanks for sharing.
LLM fuzziness (statistical outliers) makes them quite useful for content creation as well as strategic brainstorming.
However, in providing hard facts they are a tremendous nuisance. They gave me complete fabrications when I few-shot prompted them for, say, potential client addresses (work) or even where to go for lunch (personal). Which is troubling as these RAG LLMs have access to the entire internet (including extensive geolocation data). I could get the client addresses and lunch locations by just using search and maps in the next browser tab.
Lesson learned - right tool for the right job. Which doesn’t help with the AGI narrative unfortunately! 😂
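To make the "right tool for the right job" point concrete, here is a rough sketch of how I'd rather see it wired up: the facts come from a structured lookup, and the model only rephrases what was retrieved. `search_places` and `llm_format` are hypothetical stand-ins, not real APIs.

```python
# Sketch: ground factual lookups (addresses, lunch spots) in a search/maps
# call, and let the model only summarize the retrieved records, never invent
# them. `search_places` and `llm_format` are hypothetical stand-ins.
from typing import Callable, Dict, List

def lunch_suggestions(
    query: str,
    search_places: Callable[[str], List[Dict[str, str]]],
    llm_format: Callable[[str, List[Dict[str, str]]], str],
) -> str:
    results = search_places(query)        # ground truth from the lookup
    if not results:
        return "No results found."        # refuse rather than let the model guess
    return llm_format(
        "Summarize these options in two sentences, using only the data given:",
        results,
    )
```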
I saw this effect even with the non-directive psychotherapist chatbot I developed in 1979, as I've written about before. All the effects were in the audience, not the machine. Indeed, the threat from AIs is not from the AIs themselves, but from humans' blind response to them: their *relationship* to "AI"-enabled stuff. We may give up our true intelligence, creativity and freedom by coming to rely and depend on objects that have none of those qualities in reality. (I wrote a science fiction story in 1991 along those lines on the dangers of AI, after studying philosophical issues of mind and machines at university, and talking with top neuroscientists, cognitive scientists, philosophers, etc. The drive for power and control, and their worldview, essentially seeing us as meat robots to be manipulated, scared me: the political and existential consequences were vast.)
The words "aren't very bright" suggests to me that you might be the victim of an illusion. At any rate, you are promoting an illusion with such language. A machine is neither a lot intelligent nor a little intelligent because it's not intelligent. Ask instead, "Is it useful?"
I read Bjarnason's essay and see in it a person who is not interested in the question. I'm wondering whether either of you use AI systems to build things. I do. Here is an example: cuentospopulares.net.
If I'm wrong about your not building anything with the technology, please give me some examples of your work with it.
Sure, they still have major limitations. But even a year ago the "Sparks of AGI" team found that GPT-4 did better than 90% of human Uniform Bar Exam test takers, up from just 10% for GPT-3.5. That's not a parlor trick, since the UBE is a test of reasoning, not of knowledge. Despite these impressive benchmarks and progress, do you think major changes in architecture are required to achieve AGI?
The UBE is a test of reasoning designed for human beings, who have not been equipped with high-dimensional statistical pattern-finding tools that were fed boatloads of prior UBEs and then set off playing "guess the next word" with them a bazillion times over while hooked up to an army of GPUs being fed enough megawatts to power a medium-sized city for a few days.
LLMs learn to write in a way that is radically different from how we learn to write. There's no reason to think that the inferences we draw from cognitive tests designed for humans also apply to LLMs.
> There's no reason to think that the inferences we draw from cognitive tests designed for humans also apply to LLMs.
Will a proof of nonexistence of any reason be incoming?
Are you claiming I bear this burden?
Yes, because you've made a claim.
You could admit it is speculation ( at least in theory).....will you?
The claim that I responded to is that GPT-4's performance on the UBE allows us to infer that GPT-4 has developed reasoning abilities. This claim requires the heroically strong assumption that a test designed to assess human reasoning will be assessing the same sort of reasoning when given to GPT-4. I see no reason to believe that, given our awareness of the radical differences between GPT-4 and a human brain.
If you want to call that speculation, then fine. I am "speculating" that GPT-4 is sufficiently dissimilar from a human brain as to render its performance on the UBE uninterpretable w.r.t reasoning abilities... on the grounds that there's no good reason to suppose otherwise.
In other words, the positive claim requiring justification is that we can learn about GPT-4's reasoning abilities using the UBE.
Ah, the old "change the point of contention" trick, it's a classic.
Not a problem, I will simply re-ask you the question that you dodged, and observe how you react this time:
> There's no reason to think that the inferences we draw from cognitive tests designed for humans also apply to LLMs.
Will a proof of nonexistence of any reason be incoming?
Great analysis. The main thread of what you say is, I think, twofold.
1. It uses sophisticated language, and we term that 'intelligent.' If it talked like a redneck it wouldn't suck people in as much.
2. As you pointed out, they're already primed for anthropomorphization. I think it's the Geppetto Syndrome, where we REALLY want it to be something.
Great point Michael, and thanks for sharing.
Humans have been seeking to worship anything, man-made or otherwise, since the dawn of time.
All I have to do is go back to the pages of the Old Testament and see how depraved the human heart is.
It’s the Wizard of Oz all over again! 😂
I really enjoyed watching it with my family. The “Wizard’s” reveal is such a powerful scene.
Very true!
I feel like you are anthropomorphizing in your argument against anthropomorphism by assuming that people who talk to LLMs do so because they think it thinks? Please explain to me why siphoning its data into my mind to see my own patterns as I think with it is a futile exercise?
https://cybilxtheais.substack.com/p/llms-can-too-reason-behold-a-preview?r=2ar57s
Why is their usefulness / validity limited to their capacity for autonomous thought?
You can anthropomorphize humans because the definition is to attribute human characteristics or behavior to (a god, animal, or object). Assuming you are a human and not a meat robot.
But also that wasn't my argument. I use LLMs but I don't anthropomorphize them.
I was saying that you’re anthropomorphizing the LLM (“it thinks”) in order to use that argument against people you suspect of thinking that. ¯\_(ツ)_/¯ I don’t love the circular logic either.
Don’t you feel dissuading use of a technology this new does more harm than good? I truly cannot tell if I anthropomorphize it, but I’m sure it’s kind of a science miracle — to be able to think through every thread in my head at warp speed — for people who think the way I do. It’s frustrating to see constant critique without the potential for cognitive growth in humans.
Again, that's not my argument. I'm not dissuading use. I'm contextualizing how to use the tool. I'm also not seeing cognitive growth, because I'm not seeing people even understanding how they, themselves, work.
Back to my original #2: the LLMs are primed for anthropomorphization. What I mean here is that because they use sophisticated language AND because we call their errors 'hallucinations', we have primed users to anthropomorphize. If it were doing search results or mathematical formulations, we'd just call them errors (as they rightly are).
I use LLMs all the time but with the proper context as well as the proper understanding of how humans read into things. People won't cognitively grow if they don't understand themselves.
I wrote more on anthropomorphization here:
https://www.polymathicbeing.com/p/the-biggest-threat-from-ai-is-us
Ironically, the author of the schemes also preys on human cognitive vulnerabilities by falsely claiming things like '...BUT IN FACT STATISTICALLY GENERIC.' This claim is false. There is indeed a lengthy process of Reinforcement Learning from Human Feedback (RLHF) and fine-tuning that leads to cohesive non-genericity and abstraction. Although optimal outcomes are often within the expected distribution, there are numerous instances where, through triangulation and simple association with a borrowed arrow, the system can generate a third node that accurately describes or solves a problem, even though the node itself represents an out-of-distribution state. This means that the system has solved a problem that it has not encountered before. Such occurrences are not constant, but they are frequent enough. Those of us who have been interacting with GPT-4 have observed significant improvements in capability over the past six months. The quality of reading comprehension has improved. Cases of laziness are minimal. Problems with hallucinations have diminished. Code explanations are improving.
Overall, people who use GPT-4 are not as naive as the person behind the psychic claims suggests. People check facts, evaluate prompt responses, correct GPT-4 when necessary, WRITE IN CAPITAL LETTERS WHEN NEEDED (WHICH HELPS SURPRISINGLY), and ask follow-up questions. Of course, if your job is to red-team an LLM, then you will find all the crappy behavior, but that is just as true when you probe humans, the most secure systems in the world, and a property of nature itself. While LLMs may not solve many or most AI challenges, they certainly perform better in some areas than any previously commercialized technology, which is satisfactory to many. They remain brittle and are not perfect, but they are far from useless or terrible. In summary, it is likely that the person with the psychic explanation is projecting his or her own vulnerabilities.
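For what it's worth, here is a toy sketch of what the RLHF step mentioned above actually trains on: a reward model is pushed to score the human-preferred answer above the rejected one via a pairwise loss. The numbers below are made up for illustration; this is not any vendor's actual pipeline.

```python
# Toy illustration of the pairwise preference loss used to train an RLHF
# reward model: -log(sigmoid(r_chosen - r_rejected)). The scores are made-up
# numbers, not outputs of any real model.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    # Small when the chosen answer already scores higher, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

print(round(preference_loss(2.0, -1.0), 3))  # ~0.049: ranking already correct
print(round(preference_loss(-1.0, 2.0), 3))  # ~3.049: ranking wrong, strong correction
```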
I believe the author’s point is that these systems do not “think” or “reason”, and anyone thinking they do is likely affected by something else, and that something else may be the mentalist trick.
> I believe the author’s point is that these systems do not “think” or “reason”
Do humans think and reason? If so, is the way they do it the only way that it can be done?
That’s a good question. How we would recognize and measure such thinking and reasoning, so as to establish how it compares to humans’, is also in question.
In my experience, most people not only prefer heuristics, they insist on them!
Why? A Will to Believe, lack of Critical Thinking Skills, disinclination to use Critical Thinking Skills, ignorance of the phenomena being replicated, and Dunning-Kruger should all be on the short list.
I repost that Bjarnason piece with some regularity. I particularly love how it suggests that intelligent, educated people are not less likely to be taken in: far from it. My concern about the extent to which we get concerned about that rube over there but not ourselves inspired my blog. 😁