GPT-5 and irrational exuberance
"GPT’s regurgitate ideas; they don’t invent them"... That is all you need to know about current and future versions of these models.
Oh, and that the incentives remain aligned with keeping these technologies overhyped, so billions of dollars keep flowing into the field.
There's some bleak humor in the fact that people who have spent basically two decades trying to persuade everyone to overcome their own biases and reason with more quantitative rigor do not see the huge potential for motivated reasoning and confirmation bias in the post-hoc narrative interpretation they're doing of their interactions with a chatbot.
Hi Gary, another timely article that counters the hype, thank you for it!
(GPT v)5 is not better than 4, and 6 will not be better than 5, when it comes to developing AGI. Adding more and more data, while maintaining the same underlying computations, is not going to cross some threshold beyond which the system flips over into being intelligent! That is magical, delusional, flawed thinking. The ladder getting taller won't take us to the moon. The problem, in two words: wrong architecture. "Emergence" (which some claim occurs in LLMs) is not about quantity at all; rather, it arises from architectures that permit path-dependent, local interactions among components, which lead to changes of state.
IF “Open”AI actually believes it is about to create AGI, something that will transform the entire world, then wouldn’t it be to the advantage of the U.S. government to intervene and gain some control? I highly doubt the U.S. government wants to lose control in a situation like this. And let’s say the government did get involved and AGI did not become reality; the chances of it getting uninvolved are pretty slim. Maybe instead of “be careful what you wish for” it should be “be careful what hype you spout.”
Six month pause or not, it seems inevitable that we will soon be - already are, to a degree that I suspect will seem tiny in retrospect - surrounded by a media fog of images, thoughts, and opinions that may be real and may be just very clever forgeries, with no reliable way to tell the difference.
To say that this will make our lives more complicated feels like a massive understatement. And no complexity beyond what's currently available publicly is required.
I'd be very interested in seeing a public debate (either in a podcast or in article form) between you and Eliezer.
I understand that your role is to be a skeptical voice about AI in general, and about current directions in AI especially. This is important. Much of the time you do it well, constantly raising important questions and putting the brakes on the tech bros. All good.
I do want to make a kind of suggestion. Something to think about. And I could be wrong, btw. And maybe it is best for you just to go full-out skeptic and let the chips fall where they may. But sometimes, to someone like myself, who does not have a dog in this fight and who is interested in hearing all sides, I think you can do damage to the points you are making by being so extremely dismissive of the possibility that some genuinely interesting stuff is going on here.
For instance, you say about Yudkowsky's tweet that, "If GPT-4 actually understood the instructions, in a repeatable and robust way, I would have to reevaluate a lot of my priors, too." Which is a fair point; repeatability and robustness are important. But Yudkowsky says explicitly in his tweet that it is not so much the success or failure of the compression that impresses him. He is saying that the very fact that ChatGPT has any idea of what this task might be, how you would go about doing it, how it would write a sort of secret code for itself, what its 'self' is and how it could potentially talk to that 'self' in the future - that it could 'try' to do such a thing at all - is enough to give one pause. I mean, a fair amount of stuff has to be going on behind the scenes, in the transformer or whatever, for such a thing to happen, even just to make the attempt and to get the basic idea of what such a compression would look like. I would probably also fail at creating a compression that I could then uniquely decode, say, a few years from now (ChatGPT has the 'advantage' in this case that, without memory, you can test it right away).
Personally, I think your objections would be more persuasive if you allowed yourself, like Yudkowsky, to pause for a moment and reflect on what an accomplishment that is, and on how much weird stuff must be going on in the internal architecture for GPT-4 to make a reasonable go at a task like this and, in some cases, even produce a darn good compression. Even the fact that it does better and worse in different attempts and with different users suggests a kind of flexibility and openness that feels, I don't know, uncanny at the very least.
You can acknowledge all of that and still say, wait, let's not go too far here. There is much that is missing in this model. It might still be a ladder to the moon. But denying and, in a sense, ignoring Yudkowsky's actual point makes your point less persuasive not more persuasive. In my opinion. Perhaps I am in the minority and it doesn't matter. But still, something maybe to think about.
I say this with respect to the importance of your skeptical position, not in an attempt to refute it or change your basic stance. I hope that comes through. Thank you for your work and passion.
GPT will never get a man to the moon, but it can lie as well as Buzz Aldrin.
First sentence in the article:
"AI doesn’t have to be all that smart to cause a lot of harm."
I agree. The problem isn't the AI, it's the people who believe what the AIs generate.
Think of an AI as a virus. What's inside the virus might be potentially lethal to the host. But in order to infect, the host has to have a receptor that lets the virus enter its cells. Once that's done, it can be pretty much game over for the host. If there's no such receptor, there's no risk. For hyperskeptical people, it doesn't matter what the AI says since it won't be believed.
Stupid people tend to be gullible. So do lots of smart people, for that matter, but recall that, by definition, half of the population has below-median intellect--that's what "median" means. So you're right: AIs don't have to be very smart to cause a lot of damage, since many if not most people will buy the bill of goods. AOC and Biden worry me a lot less than the fact that people elected them. There are a lot more voters than people in office.
Great informative article, as usual. Thanks.
"But GPT’s on their own don’t do scientific discovery. That’s never been their forte. Their forte has been and always will be making shit up; they can’t for the life of them (speaking metaphorically of course) check facts. They are more like late-night bullshitters than high-functioning scientists who would try to validate what they say with data and discover original things. GPT’s regurgitate ideas; they don’t invent them."
Haha. I love it. GPT-x is a bullshit fabricator. Maybe someone needs to invent a GPT-bullshit detector and make a killing?
"Some form of AI may eventually do everything people are imagining, revolutionizing science and technology and so on, but LLMs will be at most only a tiny part of whatever as-yet-uninvented technology does that."
I am much more pessimistic than you about the relevance of LLMs to cracking AGI. It's less than ZERO.
"This bit the other day from Eliezer Yudkowsky is pretty typical:"
Haha. The much vaunted high intelligence of Yudkowsky (aka Mr. Less Wrong) has been greatly exaggerated. Is Eliezer on OpenAI's payroll or is he really that gullible? This may need to be investigated. :-D
Tried the encoded message on another GPT-4 - my most creative Midjourney prompt generator to date. Lossy semantic compression is not a magic zip algorithm, and these things are non-deterministic anyway. It is known. It's hard to say what the future holds when the near past has surprised us so much. We are not designing them; we grow them and see what emerges, which is far beyond expectation and inexplicable, so saying "it can't do X reliably now, therefore it never will" is really tempting fate. One of your examples up there didn't follow instructions - please try to make fewer mistakes, unless you are still being trained up.
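To make the "not a magic zip algorithm" point concrete, here is a minimal Python sketch (my own toy illustration using the standard-library zlib, not anything the model does internally) of what a genuinely lossless roundtrip guarantees, and what an LLM's lossy semantic "compression" does not:

```python
import zlib

# A toy reminder of what real (lossless) compression guarantees: an exact,
# bit-for-bit roundtrip. Nothing of the sort holds for an LLM paraphrasing
# a prompt into a "secret code" for its future self.
original = ("Reconstruct this message for your future self, word for word, "
            "without losing any of it. ") * 5
compressed = zlib.compress(original.encode("utf-8"))
restored = zlib.decompress(compressed).decode("utf-8")

assert restored == original                            # exactness is guaranteed
print(len(original), "->", len(compressed), "bytes")   # repetitive text shrinks well
```

The contrast is the whole point: zlib's output decodes back to the exact original every time, while a model's self-addressed shorthand decodes, at best, to a plausible paraphrase.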
The intelligent person will see GPT as a tool, not as an independent decision maker in and of itself. It should be used as a resource for assisting with compositions, but no one with an actual working brain would consider taking the output of GPT at face value.
AI with English explanations but without GPT "hallucinations"
--------------------------------------------------------------------
You may like to get to know Executable English.
It's a browser platform for inputting knowledge in the form of English syllogisms, for using the knowledge for analytics, and for explaining the answers in English.
* It works with everyday English and jargon
* The vocabulary is open, and so is most of the syntax
* Needs no external grammar or dictionary priming or maintenance
* Supports non-programmer authors
* Avoids ambiguities via context
* When needed, it automatically generates and runs complex networked SQL queries.
The platform is live online, with many examples. You are invited to write and run your own examples too. All you need is a browser pointed to www.executable-english.net. If you are reading this, you already know most of the language!
Thanks for comments, -- Adrian
Adrian Walker (Formerly at IBM Yorktown)
Executable English
San Jose, CA, USA
USA 860-830-2085 (California time)
www.executable-english.net
GPT-4 is supposed to have vastly superior reading comprehension, problem solving, and reliability compared to GPT-3.5. I haven't found that to be the case in my interactions with it, which is why I find myself in agreement with Dr. Marcus's skepticism about GPT-5's potential capabilities.
GPT-4 still shows a lot of the same brittleness that GPT-3.5 did. Ask it which weighs more, two pounds of feathers or one pound of steel, and it will often (its answers are not deterministic) tell you that they weigh the same.
Even in areas where it seems to have improved, it will fail if you slightly tweak how you prompt it. If I ask, "which is faster, a cheetah or a toyota corolla," it correctly answers the Corolla. But when I asked it just now, "a cheetah gets into a race with a corolla. Which one is faster," it told me the cheetah, even after stating that the cheetah's top speed is 71 mph and the Corolla's top speed is 118 mph.
A lot of people seem to think they can guide GPT-4 to good answers through special prompting. Not only would this be unpredictable for questions where you don't already know the answer (how would you know what the right prompt was?), but it doesn't even always work for questions where you do know the answer. One example: people claim you can fix GPT-4's inability to count words by asking it, "can you count the number of words and then show me x word." On short sentences this seems to work. But I tried it on a really long sentence and it got it wrong. I tried it a second time and it did something tricky. First, I asked it to tell me the 30th word in the sentence. It got it wrong. I then tried the method: "can you count the words in this sentence and then tell me the 30th word." It told me the same incorrect word, but its count showed it as correct. It turns out it had omitted words 24-29 from its count. Tricky bastard.
Counting words in a sentence is a simple task that Python can do. That's not really the issue. GPT-4 is brittle and unreliable, even when hooked up to tools (the currently proposed solution to its woes). For instance, Bing Chat, running on GPT-4 and connected to a search engine, still hallucinates. I asked it to list the books of my major professor. Two of the books it listed weren't his, and it mischaracterized one of them. It cited my professor's university page. All the correct information was there, but none of the incorrect information.
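For contrast, here is a minimal sketch of the deterministic version of that task (my own toy example; a naive whitespace split stands in for "word"):

```python
def nth_word(sentence: str, n: int) -> str:
    """Return the nth word (1-indexed), or raise if the sentence is too short."""
    words = sentence.split()  # naive whitespace tokenization
    if not 1 <= n <= len(words):
        raise ValueError(f"sentence has only {len(words)} words")
    return words[n - 1]

sentence = "The quick brown fox jumps over the lazy dog near the quiet riverbank"
print(len(sentence.split()))   # 13 -- the count never silently skips words 24-29
print(nth_word(sentence, 10))  # "near" -- and the answer is the same on every run
```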
Even as a UI, its reliability is in question, because it frequently misinterprets text prompts. I asked GPT-4, "who would win in a race, a cheetah or a man driving a prius?" It misinterpreted the question and answered that the Prius would come in first, the cheetah second, and the man third. I asked, "a cheetah races a prius in a 400 meter dash. Which wins?" GPT-4 answered that it depends on whether the race is more or less than 300 meters and that, without more information, it couldn't answer the question. I presented the following scenario: "Mike has 12 Apples. Sally has 3 cakes. John has 9 pies. How much must each person give to each other person to equally distribute the apples, cakes, and pies?" It misinterpreted the question as asking how to ensure there are an equal number of apples, cakes, and pies.
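For what it's worth, the arithmetic that prompt was actually asking for is trivial to work out; here is a quick sketch (my own illustration of the intended answer, not GPT-4's output):

```python
# Each of the three people should end up with 4 apples, 1 cake, and 3 pies;
# compute how much the current holder of each item must give to each other person.
holdings = {
    "Mike":  {"apples": 12, "cakes": 0, "pies": 0},
    "Sally": {"apples": 0,  "cakes": 3, "pies": 0},
    "John":  {"apples": 0,  "cakes": 0, "pies": 9},
}
people = list(holdings)

for item in ("apples", "cakes", "pies"):
    share = sum(person[item] for person in holdings.values()) // len(people)
    for giver in people:
        surplus = holdings[giver][item] - share
        if surplus > 0:
            per_receiver = surplus // (len(people) - 1)
            print(f"{giver} gives {per_receiver} {item} to each of the others")
# Mike gives 4 apples to each of the others
# Sally gives 1 cakes to each of the others
# John gives 3 pies to each of the others
```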
For those unable to grasp the dangers of allowing computer systems to spew Word Salad at the speed of electrons...
How's that hunt for the yellowcake uranium Saddam supposedly bought from Niger going?
How many mobile bio-warfare research and manufacturing laboratories were finally found?
legend!