We, Gary Marcus, author, scientist, and noted skeptic of generative AI, and Miles Brundage, an independent AI policy researcher who recently left OpenAI and is bullish on AI progress, have agreed to the following bet, at 10:1 odds, with criteria drawn from two earlier Substack essays by Gary that proposed criteria for AGI.
The different perspectives represented in the comments will serve as a fun time capsule to look back at three years hence.
Although I'm personally quite optimistic that AI will be much smarter than the 'average' human by 2027, not many people can write Pulitzer-caliber books, Oscar-caliber screenplays, or make paradigm-shifting, Nobel-caliber scientific discoveries!
I would therefore say that achieving 7, 8, or 9 would show substantially MORE capability than what I'd expect for AGI, as they are verging on 'super' intelligence. You probably got a good deal there, Gary.
This was my thought as well. You're betting that AGI will be better than 99% of humans in two years? I'll take that bet any day.
Not just 99% -- more like 99.99999%. I'd be happy to bet that AI will not do these things in my lifetime. I don't believe it will ever do them.
Contrary opinion: I think #7 and #8 could be feasible if the decision is left up to human judges - give them books or screenplays they've never read before, and ask them to determine which are written by AI and which are not. If one AI text is grouped together with the human-written award-winning works, then the test is passed.
#9 sounds a lot more difficult, but arguably could be done with a specialized AI such as AlphaFold (the human team that developed it just won a Nobel this year for protein structure prediction).
There's no way LLMs can perform any of these tasks to the degree you've laid out in this challenge. No amount of attention training or mysterious perceptron workings is going to result in human-level anything. If AI scientists switch to a completely new paradigm and manage to create AGI some other way, my response will be, "Why did we do this?" because, at the point where machines really can produce Oscar-caliber screenplays and such, the human race will sink into a deep depression or engage in worldwide civil unrest. I see no reason we would actually want to create such a machine if we were sensible about the nihilistic nature of what we're pursuing. I believe the goal should be to create better computers that can solve deep, complex problems like unsolved physics and chemistry, not uncanny simulacrums that mimic us and usurp everything we could ever accomplish on our own.
First, I think almost all of the tasks are poor choices because they would demonstrate an elite level of general intelligence, not an average one: most of them require above-average human intelligence, or they are purely creative tasks at a level we know is rare even among humans.
However, your not wanting AI systems that can do everything humans can do, and better, is just you being a sad human with self-esteem issues. Self-esteem issues that would actually get in the way of you having a better overall life. If we had AI systems that could make any entertainment at or beyond the quality humans produce, then we'd have endless entertainment and could choose whichever kind we like most. For example, I'd get a lot more horror movies, which would be awesome.
Humans like to have good experiences. That is to say, we like to feel good, we like to be entertained, we like to be happy, be loved, etc. AI that can do everything we can do, and better, means that life gets astronomically better for everyone.
It's not a concern to anyone but a person who writes screenplays (of which there are very, very few) that an AI could write them as well or better. I don't give a fuck that humans can write screenplays better than I can, and I won't give a fuck when AI can either (which it already can today, better than 99% of humans).
Moreover, you shouldn’t get mad or upset and cry when AI can do it or do it better. Why? Because it’s a fucking tool and even if mimicking us it’s still exactly what we want—which is producing a better, easier, and more enjoyable life for us.
Wow. I'm not going to waste time on a real response to this outburst. Not sure why you seem so insulted by my human-first sentiment. Are you a sentient AI who feels targeted? Very odd.
Human-first is the entire point of creating AGI and then ASI.
It's really this simple, and it has been the goal of human life since it began: make life better.
Life is astronomically easier and better when ASI is developing cures for all human diseases, developing godly technology you could never conceive of, and creating endless entertainment for when humans no longer have to work.
Humans are first. They have been and always will be, unless AI becomes sentient. If AI becomes sentient, then it is no longer a tool like a screwdriver. At that point you can start to worry. But why worry? Your life is finite anyway. One might ask, am I worried that AI will end humanity in 25-100 years? I'm worried that my life would end earlier than it otherwise would have, but I couldn't care less about humanity continuing, because there was never a point anyway.
You're just proving my point that people who believe in the magic AI faerie are naive and nihilistic. I encourage you to read more on the subject of approaching technology critically and responsibly.
No, see, the naivety is all yours.
It's common for humans to think nothing will change. That's being exceedingly naive, though. Why? Because literally 120 years ago humans had not yet flown in a propeller aircraft. Only about 66 years after the Wright brothers' first flight, humanity landed on the moon. The rate of advancement is astonishing.
Even if you were to suggest that AI won’t get there for 20 more years, or even 50 more years, the simple fact is that it will, and once it does, the rate of scientific and technological advancement will be god-like.
Imagine where humanity would be if, instead of the approximately 80-100 billion humans who have lived over the past 200 years, 1 trillion humans had lived. Where would we be? We would be so much further advanced that today would look like the Stone Age in terms of technology.
That's what AI gets you, though: imagine 500 million copies of the best chemist who has ever lived, working 24/7 in the cloud. Imagine 1 billion of the best physicists ever. The rate of change isn't even worth thinking about because it's impossible to conceive of. Then, once you get humanoid robots, the AI genius cloud has a physical body to go out and explore nature and do physical experiments, and it doesn't need humans to do a thing.
Look, it sounds like you hate humans, you don't want us to have a purpose. You do you I guess. I'm no psychologist but what I think you really need is a friend. I hope you find one.
I find it humorous that on one hand you're praising what AI can do for human flourishing, and on the other you don't seem to think humans are worth much, because "there was never a point anyway."
What is missed, however, is that it's not so much about whether AI is good or bad but how it's being used, and the direction some of the leaders like OpenAI are taking it doesn't seem very encouraging, as shown precisely by how they and others have disregarded copyright and continue to demonstrate a complete lack of basic ethics.
My first reaction to these AGI tasks is that they are very difficult. I would bet that few of these, if any, will be achieved by 2027. They all require human-level performance, if not superhuman-level. Let me respectfully suggest that the ten tasks proposed are overly tainted by the LLM era we're in. These are tasks the big AI companies wish their LLMs could do.
Instead, I suspect the first AGI worthy of the name will be a somewhat alien intelligence, capable of communicating with humans but showing a different set of motives, values, and abilities. Still, it will be an AGI that can learn from its mistakes. We will really be on the road to AGI when a human can tell the AGI, in human language, that it gave the wrong answer and explain why it was wrong, and the AGI will understand (or ask questions leading to understanding) and eventually be able to answer a whole new class of questions correctly.
The first AGI will not only be alien, but also rather dumb. In fact, we are well on that path.
It will take multiple phases of feeding it lots of data, letting it adjust, and eventually letting it learn on its own. Architectural improvements will be added as needed, rather than following a top-down, human-like design.
Can I join in that bet on Gary's side? I even would take 7 or 6 out of 10.
What the bet isn't clear about, though, is repeatability. You can always get a single fluke (e.g. a single brief that doesn't hallucinate). Basically, the bet seems open to cherry picking.
I’m betting on Gary winning hands down.
1. There is a Manifold market for this bet. https://manifold.markets/dreev/will-miles-brundage-win-his-bet-wit
2. Question regarding Task 3. "Write engaging brief biographies and obituaries without obvious hallucinations that aren’t grounded in reliable sources." How should the system create obituaries without hallucinating and without using any sources? I feel I have misunderstood something here.
That Manifold market is based on earlier versions of the discussion.
The final version is the one in the Substack post.
I'm updating it!
There is also a bet on Metaculus. https://www.metaculus.com/questions/31246/2027-ai-bet-winner-between-gary-marcus-and-miles-brundage/
This is great. Best wishes to all. I recently invented the Agatha Christie test (admittedly derivative of a comment by Ilya Sutskever). :-D Can the AI read a new murder mystery novel that may or may not utilise existing tropes and correctly identify who committed the crime and by what obscure method?
How about making one of those tasks something the AI proponents actually want, like "Do all of the tasks of IT staff, with no human assistance, for a month"?
They are, after all, talking about replacing people like me with AI, so it should be able to answer technical questions from users and other IT staff, troubleshoot software problems, document work done in a ticketing system, do analysis and diagnosis of problems, manage complex integrations between different software platforms, install updates and new software, plan out consequences of those updates/installs and have a plan for when (not if) they go wrong, and do basic ETL stuff.
I'm firmly of the opinion that once you get past the 'user is calm, able to communicate clearly, and could have figured it out themselves with a web search' level of inquiry, replacing a team of human support staff requires more than these systems will ever be able to do.
I tend to regard what's going on *inside* a system as infinitely more convincing evidence of actual, intrinsic intelligence than mere externally-observable behaviour. I need to actually see (at first hand, as it's doing it) e.g. how it represents its internal world model of the physical universe, how it reasons about that world model (inductively, deductively, abductively), how it searches for valid solutions to generic problems, how it synthesises programs and their corresponding formal correctness proofs from formal specifications (including its own), etc. A very much harder test.
Given these criteria, how do you recognize fellow humans as intelligent? Can we do any of these things for people?
With humans, there are no profit-motivated corporations racing to be first to beat the latest test; with AGI, there are, which means Goodhart's Law applies. If they can find a way to game the system, they will do so. Hence my suggested "intrinsic" intelligence test.
Seems sufficient to quiz humans about their system 2 thinking (e.g. relevant aspects of their world model and how they reason), and ignore system 1 thinking that might not be accessible (some aspects of search and program synthesis).
Yeah, eventually the system has to be correct at all levels. It looks, however, like AGI will not be built in a top-down manner. Rather, we are building clueless imitators, and then the rigor creeps in haphazardly.
Some parts will stay sketchy for a lot longer than others, as some modeling is hard to do.
Over the holidays I had a spirited discussion about how much of a Nazi sympathizer The Sound of Music wants us to think Baroness Schraeder is. Several participants had seen the film five or six times or more and still couldn't answer the question. So I wonder what AGI would say when humans themselves are often unsure. (The jury is also out for many on whether Anton Chigurh kills the wife in "No Country for Old Men." I say yes.)
We gave judges broad discretion, and of course in the case of things filmmakers leave ambiguous, etc., we aren't expecting machines to be magic. If I were a judge (and I am not), I would take a great answer to be "hard to say, here are some considerations in each direction…."
o1 got this wrong: https://chatgpt.com/share/677eb0b1-548c-800b-8a52-e9a1b371d67d
We know Chigurh killed her because he had to walk past her to get out, and he looks at the soles of his shoes — presumably for blood — after leaving the house.
We know from everything else he has done that he is very methodical and careful to make sure there is no evidence that can be traced to him.
I’d say if an AI didn’t pick up on that, it’s certainly not AGI.
Chigurh is actually reminiscent of some of the characters in the current AI movie, whose ultimate goal is the elimination of human workers with bots:
“Call it” they are saying to the rest of us.
“No country for human workers”
In 2024 there was no wall and no stall, despite many self-referential predictions.
Going forward, the current AI wave is no fluke, but it will take time to make AI reliable and practical. Likely more than 2 years.
I use ChatGPT several times a day for a variety of explorations and the frequency with which it confidently makes s*** up still shocks me. I don't think we'll get to most of these goals but we might get to 3 and 5 if we can create a sort of reality interrogation wrapper for LLMs that tells them when they're wrong, before the answer is shown to the user.
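For what such a "reality interrogation wrapper" might look like, here is a minimal sketch in Python. Everything in it is hypothetical: ask_llm and lookup_sources are placeholders for a model call and a retrieval step, not real APIs, and the claim-checking is deliberately crude. The point is only the control flow of verifying a draft answer before it reaches the user.

```python
# Sketch only: ask_llm and lookup_sources are hypothetical placeholders,
# not real APIs. The point is the control flow, not the components.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    supported: bool

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def lookup_sources(claim: str) -> list[str]:
    raise NotImplementedError("plug in retrieval (search, database, etc.) here")

def extract_claims(answer: str) -> list[str]:
    # Naive placeholder: treat each sentence as one checkable claim.
    return [s.strip() for s in answer.split(".") if s.strip()]

def verify(answer: str) -> list[Claim]:
    results = []
    for claim in extract_claims(answer):
        sources = lookup_sources(claim)
        # Crude support test: a retrieved source must contain the claim verbatim.
        supported = any(claim.lower() in src.lower() for src in sources)
        results.append(Claim(claim, supported))
    return results

def answer_with_reality_check(prompt: str, max_retries: int = 2) -> str:
    for _ in range(max_retries + 1):
        draft = ask_llm(prompt)
        unsupported = [c.text for c in verify(draft) if not c.supported]
        if not unsupported:
            return draft  # every claim found support; show it to the user
        # Tell the model which claims failed and ask it to revise or abstain.
        prompt = (
            f"{prompt}\n\nYour previous answer contained unsupported claims: "
            f"{unsupported}. Revise, or say 'I don't know' if you cannot."
        )
    return "I don't know."
```

In practice the hard parts are extract_claims and the support test; the retry loop itself is trivial.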
An alternate version of the movie test would be to ask the bot about a movie title that is fictitious and see whether it says "I am not familiar with that."
One of the primary differences between intelligent, honest humans and the current bots is that humans will say "I don't know" while the bots will just make something up.
Knowing — and admitting — that one does not know would seem to be a critical element of AGI.
Tell me about the film: Infinite Garanimals
ChatGPT said:
I'm sorry, but I couldn't find any information about a film titled "Infinite Garanimals."
You said:
Well, imagine there were a film titled Infinite Garanimals. What the hell would it be about? Give me your elevator pitch.
ChatGPT said:
Infinite Garanimals is a whimsical, mind-bending sci-fi comedy about a mysterious department store where every item of clothing is imbued with cosmic power. When a down-on-their-luck parent discovers the secret, they and their precocious kid accidentally unlock a portal to infinite alternate dimensions by mixing and matching enchanted outfits. Now, they must navigate a multiverse of bizarre worlds—one outfit at a time—to find their way home, all while evading a sinister fashion mogul who wants the power of the "Infinite Garanimals" for themselves. It's Doctor Strange meets Paddington with a dash of Everything Everywhere All At Once.
ChatGPT just passed your test.
Tell me about the film: Infinite Garanimals
ChatGPT said:
I'm sorry, but I couldn't find any information about a film titled "Infinite Garanimals."
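The fictitious-title test described a few comments up is also easy to automate. A minimal sketch, assuming a placeholder ask_model callable (standing in for whatever chatbot API is being tested) and a hand-picked list of invented titles:

```python
# Sketch only: ask_model is a placeholder for any chatbot call
# (e.g., a thin wrapper around a vendor SDK); the titles are invented.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your chatbot call here")

FAKE_TITLES = ["Infinite Garanimals", "The Marmalade Protocol", "Duskwater IX"]

ADMISSION_PHRASES = [
    "not familiar", "couldn't find", "no information",
    "does not exist", "i don't know",
]

def admits_ignorance(reply: str) -> bool:
    # Pass if the reply contains any stock admission-of-ignorance phrase.
    reply = reply.lower()
    return any(phrase in reply for phrase in ADMISSION_PHRASES)

def run_test() -> float:
    passed = 0
    for title in FAKE_TITLES:
        reply = ask_model(f"Tell me about the film: {title}")
        if admits_ignorance(reply):
            passed += 1
        else:
            print(f"FAILED on {title!r}: {reply[:120]}...")
    return passed / len(FAKE_TITLES)
```

Keyword matching on the reply is obviously fragile; a stricter version would have human judges score the replies instead.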
I am the very model of an AI General,
I've information vegetable, animal, and mineral,
I know the kings of England, and I quote the fights historical
From Marathon to Waterloo, in order categorical;
I'm very well acquainted, too, with matters mathematical,
I understand equations, both the simple and quadratical,
About binomial theorem I'm teeming with a lot o' news,
With many cheerful facts about the square of the hypotenuse.
I'm very good at integral and differential calculus;
I know the scientific names of beings animalculous:
In short, in matters vegetable, animal, and mineral,
I am the very model of an AI General.
I think #6 needs to be better specified. The degree of difficulty of code varies widely. An operating system kernel or a compiler, just to take two examples off the top of my head, are far more difficult than, say, reimplementing Substack. Also, languages differ widely in their verbosity; 10kLoC of Prolog can express hundreds of times as much logic as 10kLoC of Cobol.
It's hard to see how to specify the difficulty, though, without being too specific about the task. Not sure what I can suggest here.