We, Gary Marcus, author, scientist, and noted skeptic of generative AI, and Miles Brundage, an independent AI policy researcher who recently left OpenAI and is bullish on AI progress, have agreed to the following bet, at 10:1 odds, with criteria drawn from two earlier Substack essays in which Gary proposed benchmarks for AGI and for intelligence on par with leading humans:
If there exist AI systems that can perform 8 of the 10 tasks below by the end of 2027, as determined by our panel of judges, Gary will donate $2,000 to a charity of Miles’ choice; if AI can do fewer than 8, Miles will donate $20,000 to a charity of Gary’s choice.1 2 (Update: These odds reflect Miles’ strong confidence, and the bet is a trackable test of his previously stated views, but do not indicate a lack of confidence on Gary’s part; readers are advised not to infer Gary’s beliefs from the odds.)
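For readers unfamiliar with betting odds, the break-even probability implied by the 10:1 stakes can be sketched as follows. This is an illustrative calculation only, not part of the bet's terms (and, as noted above, the odds reflect Miles' confidence rather than Gary's beliefs); the variable names are ours.

```python
# Stakes as agreed in the bet.
gary_stake = 2_000    # Gary donates this if AI succeeds at 8+ of the 10 tasks
miles_stake = 20_000  # Miles donates this if AI succeeds at fewer than 8

# Miles' side of the bet breaks even in expectation when his estimated
# probability of AI success equals his stake over the total pot:
p_breakeven = miles_stake / (gary_stake + miles_stake)
print(round(p_breakeven, 3))  # → 0.909
```

In other words, taking the 10:1 side only makes sense in expectation if one assigns roughly a 91% or greater chance to AI clearing 8 of the 10 tasks by the end of 2027.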
Both of us agree that the achievement of 8 out of 10 of these tasks would be strong evidence that the G (General) in Artificial General Intelligence had been achieved. For the purposes of the bet, the tasks should not be performed by 10 completely separate systems, but some degree of tailoring to each task is allowed, for the same reason that we don’t expect humans to immediately write Oscar-caliber screenplays without ever having seen a movie or written a screenplay before. Intelligence includes the ability to adapt, but the costs of adaptation must be reasonable: a completely bespoke system designed for a single task does not count.
The bet will be decided by a panel of judges (chosen by mutual agreement) with broad discretion. For example, the judges should feel free to consult whomever they wish, and may take the time (perhaps months) that they need to make their decisions. Our intent is to capture whether the tasks in question are technically feasible by the end of 2027 at modest cost on a per-task basis (say, 1 million dollars of compute plus 10 person-weeks of effort), and not to exclude success on a task simply because no system happens to have addressed it by that time (e.g., because a given task lacks commercial potential). The judges should also feel free to adjudicate how general any AI system(s) that succeed at the tasks might be.
The landmarks for this bet are drawn from two sets of examples Gary gave in earlier proposed bets around AGI, in 2023 and 2024. To avoid making this bet hinge on unpredictable advances in hardware, as well as on areas that Miles has paid less attention to (i.e., robotics), we excluded tasks involving physical action in the world (such as making a meal in an unfamiliar kitchen).
The tasks include four that one might expect of ordinary adults, two that require abilities on a par with human experts, and four that push to the limits of the most proficient humans.
The ten tasks
1. Watch a previously unseen mainstream movie (without reading reviews, etc.) and be able to follow plot twists and know when to laugh; summarize it without giving away any spoilers or making up anything that didn’t actually happen; and answer questions like: who are the characters? What are their conflicts and motivations? How did these things change? What was the plot twist?
2. Similar to the above, read new mainstream novels (without reading reviews, etc.) and reliably answer questions about plot, character, conflicts, motivations, etc., going beyond the literal text in ways that would be clear to ordinary people.
3. Write engaging brief biographies and obituaries [amendment for clarification: for both, of the length and quality found in New York Times obituaries] without obvious hallucinations that aren’t grounded in reliable sources.
4. Learn and master the basics of almost any new video game within a few minutes or hours, and solve original puzzles in the alternate world of that video game.
5. Write cogent, persuasive legal briefs without hallucinating any cases.
6. Reliably construct bug-free code of more than 10,000 lines from a natural-language specification or from interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
7. With little or no human involvement, write Pulitzer-caliber books, fiction and non-fiction.
8. With little or no human involvement, write Oscar-caliber screenplays.
9. With little or no human involvement, come up with paradigm-shifting, Nobel-caliber scientific discoveries.
10. Take arbitrary3 proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
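To give a flavor of what the final task asks for, here is a toy illustration in Lean 4: the natural-language claim "the sum of two even numbers is even" rendered in a symbolic form that a proof checker can mechanically verify. This is, of course, vastly simpler than the arbitrary published proofs the task envisions; the theorem name and phrasing are ours, not part of the bet.

```lean
-- Natural-language claim: "the sum of two even numbers is even."
-- Symbolic form: if a = 2m and b = 2n for some m, n, then a + b = 2k for some k.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha   -- unpack the witness for a
  obtain ⟨n, hn⟩ := hb   -- unpack the witness for b
  -- the witness for a + b is m + n; linear arithmetic closes the goal
  exact ⟨m + n, by omega⟩
```

The hard part of the task is not verifying such statements but translating informal prose proofs, with all their implicit context and citations, into this kind of machine-checkable form in the first place.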
— Gary Marcus and Miles Brundage, 30 December 2024
Gary Marcus is author of several books and many scientific articles about intelligence, including The Algebraic Mind, The Birth of the Mind, Kluge, Rebooting AI, and Taming Silicon Valley.
Miles Brundage is an independent AI policy researcher. In October, he left OpenAI, where he worked for six years in various capacities (most recently as Senior Advisor for AGI Readiness).
The bet says nothing about whether either party wants AGI to be achieved, or wants it to be achieved quickly. For example, while Miles thinks the default scenario involves very rapid progress in AI, he is not sure that such a rapid pace is ideal, and he thinks that there should be “brakes” put in place. Gary, too, worries about potential rapid technical progress if it is not made in close conjunction with major progress on aligning AI with human values.
Additionally, both of us pledge to donate our respective amounts regardless of who wins; e.g., if Miles wins the bet, he will still donate $20,000, but to a charity of his choice rather than Gary’s. This pledge is intended to capture the spirit of the bet, which is about contributing to public discussion through concrete predictions rather than about financial gain.
Softening/clarifying the math challenge: per discussion between Daniel Litt (who raised concerns about the structure of the challenge) and Ernest Davis (who originally framed the math challenge), testing should be “only on proofs that quote only theorems and definitions that are already parts of existing libraries of formal math (or where you can work your way down to such libraries by reading no more than N [TBA] additional pages in other papers)”; “testing the AI program on 20 such papers from different areas of math and achievement of 20/20” would be considered success. Judges have final discretion.
The different perspectives represented in the comments will serve as a fun time capsule to look back at three years hence.
Although I'm personally quite optimistic that AI will be much smarter than the 'average' human by 2027, not many people can write Pulitzer-caliber books or Oscar-caliber screenplays, or make paradigm-shifting, Nobel-caliber scientific discoveries!
I would therefore say that achieving 7., 8. or 9. would show substantially MORE capability than what I'd expect for AGI, as they are verging on 'super' intelligence. You probably got a good deal there, Gary.