For what it's worth, here's a slightly longer overview on my own current preferred approach to estimating "p(doom)", "p(catastrophe)", or other extremely uncertain unprecedented events. I haven't yet quite worked out how to do this all properly though - as Gary mentioned, I'm still working on this as part of my PhD research and as part of the MTAIR project (see https://www.lesswrong.com/posts/sGkRDrpphsu6Jhega/a-model-based-approach-to-ai-existential-risk). The broad strokes are more or less standard probabilistic risk assessment (PRA), but some of the details are my own take or are debated.
Step 1: Determine decision thresholds. To restate the part Gary quoted from our email conversation: We only really care about "p(doom)" or the like as it relates to specific decisions. In particular, I think the reason most people in policy discussions care about something like p(doom) is because for many people higher default p(doom) means they're willing to make larger tradeoffs to reduce that risk. For example, if your p(doom) is very low then you might not want to restrict AI progress in any way just because of some remote possibility of catastrophe (although you might want to regulate AI for other reasons!). But if your p(doom) is higher then you start being willing to make harder and harder sacrifices to avoid really grave outcomes. And if your default p(doom) is extremely high then, yes, maybe you even start considering bombing data centers.
So the first step is to decide where the cutoff points are, at least roughly - what are the thresholds for p(doom) such that our decisions will change if it's above or below those points? For example, if our decisions would be the same (i.e., the tradeoffs we'd be willing to make wouldn't change) for any p(doom) between 0.1 and 0.9, then we don't need any more fine-grained resolution on p(doom) if we've decided it's at least within that range.
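To make the threshold idea concrete, here's a minimal sketch in Python (the cutoffs and policy labels are purely illustrative, not anyone's actual positions):

```python
# Minimal sketch of Step 1: fixed decision thresholds partition p(doom) into
# regions, and only crossing a threshold changes the recommended policy.
# The thresholds and policy labels here are purely illustrative.

THRESHOLDS = [
    (0.1, "monitor: no AI-specific restrictions beyond ordinary regulation"),
    (0.9, "restrict: accept large tradeoffs to slow risky AI development"),
]
DEFAULT_ABOVE = "emergency: treat as an imminent global threat"

def decision_for(p_doom: float) -> str:
    """Return the policy bucket for a given p(doom) estimate."""
    for cutoff, policy in THRESHOLDS:
        if p_doom < cutoff:
            return policy
    return DEFAULT_ABOVE

# Any estimate between 0.1 and 0.9 lands in the same bucket, so finer
# resolution inside that range would not change the decision.
for p in (0.05, 0.2, 0.6, 0.95):
    print(p, "->", decision_for(p))
```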
How exactly to decide where the proper thresholds are, of course, is a much more difficult question. This is where things like risk tolerance estimation, decision making for non-ergodic processes, and decision making under deep uncertainty (DMDU) come in. I'm still trying to work my way through the literature on this.
Step 2: Determine plausible ranges for p(doom), or whatever probability you're trying to forecast. Use available data, models, expert judgment elicitations, etc. to get an initial range for the quantity of interest, in this case p(doom). This can be a very rough estimate at first. There are differing opinions on the best ways to do this, but my own preference is to use a combination of the following:
- Aggregate different expert judgements, quantitative models, etc. using some sort of weighted average approach. Part of my research is on how to do that weighting in a principled way, even if only on a subjective, superficial level (at least at first). Ideally we'd want to have principled ways of weighting different types of experts vs. quantitative models vs. prediction markets, presumably based on things like previous track records, potential biases, etc. (A minimal sketch of this kind of weighted pooling, and of the second-order idea in the next bullet, appears after this list.)
- I currently lean towards trying to specify plausible probability ranges in the form of second-order probabilities when possible (e.g., what's your estimated probability distribution for p(doom), rather than just a point estimate). Other people think it's fine to just use a point estimate or maybe a confidence interval, and still others advocate for using various types of imprecise probabilities. It's still unclear to me what all the pros and cons of different approaches are here.
- I usually advocate for epistemic modesty, at least for most policy decisions that will impact many people (like this one). Others seem to disagree with me on this, for reasons I don't quite understand, and instead they advocate for policy makers to think about the topic themselves and come to their own conclusions, even if they aren't themselves experts on the topic. (For more on this topic, see for example Why It's OK Not to Think for Yourself by Jon Matheson. For the opposite perspective, see Eliezer Yudkowsky's short book Inadequate Equilibria.)
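Here's a minimal sketch of the kind of pooling described in the first two bullets (the weights, point estimates, and Beta parameterizations are all hypothetical, and a linear opinion pool is only one of several defensible aggregation rules):

```python
import numpy as np

# Hypothetical point estimates of p(doom) from three sources, with subjective
# weights reflecting judged track record / relevance. A simple linear opinion
# pool is just the weighted average; other pooling rules are equally defensible.
estimates = np.array([0.02, 0.15, 0.40])   # e.g. a model, expert A, expert B
weights   = np.array([0.5, 0.3, 0.2])
weights = weights / weights.sum()

pooled_point = float(weights @ estimates)
print(f"pooled point estimate: {pooled_point:.3f}")

# Second-order version: each source supplies a distribution over p(doom)
# rather than a point estimate. Here each is represented (hypothetically) as a
# Beta distribution, and the pooled belief is a weighted mixture of them.
rng = np.random.default_rng(0)
betas = [(1, 40), (3, 17), (4, 6)]  # (alpha, beta) chosen to roughly match the points above
samples = np.concatenate([
    rng.beta(a, b, size=int(w * 100_000)) for (a, b), w in zip(betas, weights)
])
lo, hi = np.percentile(samples, [5, 95])
print(f"pooled mixture: mean={samples.mean():.3f}, 90% interval=({lo:.3f}, {hi:.3f})")
```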
Step 3: Decide whether it's worth doing further analysis. As above, if in Step 1 we've decided that our relevant decision thresholds are p(doom)=0.1 and p(doom)=0.9, and if Step 2 tells us that all plausible estimates for p(doom) are between those numbers, then we're done and no further analysis is required because further analysis wouldn't change our decisions in any way. Assuming it's not that simple though, we need to decide whether it's worth our time, effort, and money to do a deeper analysis of the issue. This is where Value of Information (VoI) analysis techniques can be useful.
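As a toy illustration of the VoI idea (all payoffs and probabilities are hypothetical), here is the expected value of perfect information for a stylized two-state, two-action decision; real analyses would deal with partial information and far richer models:

```python
import numpy as np

# Toy Value-of-Information sketch (all numbers hypothetical). Two world states
# ("doom path" vs. "benign path") and two actions. EVPI is the most you should
# pay for analysis that would resolve the uncertainty perfectly; the value of
# realistic, partial information is lower.
p_states = np.array([0.2, 0.8])            # prior over [doom path, benign path]
# payoff[action, state]; units are arbitrary "utility"
payoffs = np.array([
    [-10.0, 100.0],   # action 0: proceed without heavy restrictions
    [ 20.0,  60.0],   # action 1: heavy restrictions / slowdown
])

expected_by_action = payoffs @ p_states           # expected payoff of each action
ev_without = expected_by_action.max()             # best we can do without more info

ev_with = (payoffs.max(axis=0) * p_states).sum()  # pick the best action per state
evpi = ev_with - ev_without
print(f"EV without info: {ev_without:.1f}, with perfect info: {ev_with:.1f}, EVPI: {evpi:.1f}")
```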
Step 4 (assuming further analysis is warranted): Try to factor the problem. Can we identify the key sub-questions that influence the top-level question of p(doom)? Can we get estimates for those sub-questions in a way that allows us to get better resolution on the key top-level question? This is more or less what Joe Carlsmith was trying to do in his report, where he factored the problem into 6 sub-questions and tried to give estimates for those.
Once we have a decent factorization we can go looking for better data for each sub-question, or we can ask subject matter experts for their estimates of those sub-questions, or maybe we can try using prediction markets or the like.
The problem of course is that it's not always clear what's the best way to factor the problem, or how to put the sub-questions together in the right way so you get a useful overall estimate rather than something wildly off the mark, or how to make sure you didn't leave out anything really major, or how to account for "unknown unknowns", etc. Just getting to a good factorization of the problem can take a lot of time and effort and money, which is why we need Step 3.
One potential advantage of factorization is that it allows us to ask the sub-questions to different subject matter experts. For example, if we divide up the overall question of "what's your p(doom)?" into some factors that relate to machine learning and other factors that relate to economics, then we can go ask the ML experts about the ML questions and leave the economics questions for economists. (Or we can ask them both but maybe give more weight to the ML experts on the ML questions and more weight to the economists on the economics questions.) I haven't seen this done so much in practice though.
One idea I've been focusing on a lot for my research is to try to zoom in on "cruxes" between experts as a way of usefully factoring overall questions like p(doom). However, it turns out it's often very hard to figure out where experts actually disagree! One thing I really like is when experts say things like, "well if I agreed with you on A then I'd also agree with you on B," because then A is clearly a crux for that expert relative to question B. I actually really liked Gary's recent Coleman Hughes podcast episode with Scott Aaronson and Eliezer Yudkowsky, because I thought that they all did a great job on exactly this.
Step 5: Iterate. For each sub-question we can now ask whether further analysis on that question would change our overall decisions (we can use sensitivity analysis techniques for this). If we decide further analysis would be helpful and worth the time and effort, then we can factor that sub-question into sub-sub-questions, and keep iterating until it's no longer worth it to do further analysis.
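Here's a minimal one-at-a-time sensitivity sketch for a multiplicative decomposition like the one above (the ranges and the decision threshold are again purely illustrative):

```python
# One-at-a-time sensitivity sketch for a multiplicative decomposition.
# Each sub-question gets a (low, central, high) range; we push one input at a
# time to its extremes and check whether the top-level product crosses a
# (hypothetical) decision threshold. If it never does, further analysis of
# that sub-question is unlikely to change the decision.
ranges = {
    "built":         (0.3, 0.6, 0.9),
    "deployed":      (0.6, 0.8, 0.95),
    "misaligned":    (0.1, 0.4, 0.7),
    "power-seeking": (0.2, 0.5, 0.8),
    "catastrophe":   (0.1, 0.3, 0.6),
}
THRESHOLD = 0.05  # hypothetical decision threshold on the top-level probability

def product(values):
    p = 1.0
    for v in values:
        p *= v
    return p

central = {k: mid for k, (lo, mid, hi) in ranges.items()}
base = product(central.values())
print(f"central estimate: {base:.3f}")

for name, (lo, mid, hi) in ranges.items():
    for extreme in (lo, hi):
        scenario = dict(central, **{name: extreme})
        p = product(scenario.values())
        crosses = (p >= THRESHOLD) != (base >= THRESHOLD)
        print(f"{name}={extreme} -> {p:.3f}" + ("  (crosses threshold)" if crosses else ""))
```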
The first phase of our MTAIR project (the 147-page report Gary linked to) tried to do an exhaustive factorization of p(doom), at least on a qualitative level. It was *very* complicated, and it wasn't even complete by the time we decided to at least publish what we had!
A lot of what Epoch (https://epochai.org/) does can be seen as focusing in on particular sub-questions that they think are worth the extra analysis.
For more on the Probabilistic Risk Assessment approach in general, I'd especially recommend Douglas Hubbard's classic How to Measure Anything, or any of his other books on the topic.
Helpful, measured explanation. I'm amazed it is even possible to contemplate a useful approximation of p(doom) or p(catastrophe) from the multitude of (so far) predictable potential origins.
I agree with you, Gary, that bad human actors using AI pose a much greater risk than the emergence of a superintelligence gone rogue. Yet it's the latter that has captured the imagination of all too many, and it tends to suck the oxygen out of the air. As you note, that is itself a problem.
Why is the idea of a rogue-AI so compelling? I don't know. Sure, it's a common trope in science fiction, and has been for decades. So what? Beyond that, this fear seems to be concentrated along a Silicon Valley to Oxford axis. Why? Sure, that's where the AI researchers are. But not all of them. Japan has plenty of AI researchers, but as far as I have been able to tell, fear of a rogue AI is not common in Japan. I know enough about manga and anime to know that Japanese popular culture is very much interested in anthropomorphic robots, but not crazy computers. Civil rights for robots was a major theme of Osamu Tezuka's very influential Astro Boy stories from the 1950s and 1960s, something I have blogged about (https://new-savanna.blogspot.com/2010/12/robot-as-subaltern-tezukas-mighty-atom.html). Still, that doesn't tell us much about why the idea is so appealing to some.
Yes, there is a certain narcissism about it. I mean, if you are a member of the "p(doom) is near" club, you can take some satisfaction in knowing that the future of humankind depends on your heroic efforts. And, while I have seen such sentiments explicitly expressed here and there, I think that is at best a secondary factor.
My latest thought on the matter starts with a question: What problem does belief in a high p(doom) solve for believers? Well, since it is an idea about the future, and pretty much the whole future at that, let's posit that that's the problem it solves: How do we think about the future? Posit the emergence of a (godlike) superintelligence that is so powerful it can exert almost total control over human life. Further posit that, for whatever reason, it decides that humankind is expendable. However, if (and it's a BIG IF) we can exert control over this superintelligence ("alignment"), then WE ARE IN CONTROL of the future.
Problem solved.
But why not just assume that we will be able to control this thing, as Yann LeCun seems to assume? Because that doesn't give us much guidance about what to do. If we can control this thing, whatever it is, then the future is wide open. Anything is possible. That's no way to focus our efforts. But if it IS out to get us, that gives a pretty strong focus for our activity.
Do I believe this? How would I know? I just made it up.
Gary writes that "poverty-of-the-imagination arguments don't have a great track record." Richness-of-the-imagination arguments don't either. Humans are terrible at predicting the future, and the more speculative the prediction, the worse we are. Predicting doom is a perennial human favorite.
For examples of "poverty-of-the-imagination arguments" we're presented with airplanes, climate change, nuclear weapons, and a gameshow host becoming president. That last one probably shouldn't be included on the grounds that it came out of nowhere; there was never any debate about whether it could happen until right before it happened. The article isn't about P(something will happen that no one predicted), which is somewhere around 0.9999999999999.
The other three differ fundamentally from p(AI doom) or p(AI catastrophe) in their specificity. As is repeatedly acknowledged in this article, "doom" and "catastrophe" could manifest in seemingly infinitely many ways - the imagination's the limit! But this was not the case for P(we create a heavier-than-air machine that can fly), P(we create a bomb that initiates a nuclear chain reaction), and P(burning fossil fuels makes average global temperatures increase by more than 1 degree Celsius). Those were well defined.
This whole business of calculating probabilities for poorly defined outcomes using speculative chain-of-events thought experiments is silly. This isn't to say that AI "doom" or "catastrophe" is thus a fake problem and should be ignored! I fully support efforts at identifying and mitigating definable, real-world AI harms, much the same way the EPA and FTC and FDA and other agencies identify and mitigate definable, real-world harms that fall under their purview.
But, if we're gonna do P(AI doom), may as well go all the way and do P(something or other kills us all, dunno what yet). That's the territory we're operating in.
"AI risk β Superintelligence risk per se" so true. What I worry most about AI is the human reaction to it. I don't like the dommerism and I don't like the utopian dismissal. In my novel about advanced AI and what it means to be human, I didn't need the AI to be superintelligent to create catastrophe. I just needed humans to forget that they were intelligent (which is easier than creating superintelligence)
https://www.amazon.com/gp/aw/d/B0CCCHSHRH/
I think that global collapse due to a super-intelligent AI system is not the issue worth much concern today. It strikes people's imagination, but there are other important short-term and mid-term implications of AI development that should be dealt with quite immediately. Partial or global extinction of the human race is much more likely to come, and much more rapidly, from shortages of global resources (materials, water, energy) that lead to open conflicts and wars all around the globe.
AI-driven chatbots and assistants will improve in the near future. They will still be objectively "mediocre", but to an average user they will seem relatively smart, sufficiently smart to appear very helpful. More and more people will be tempted to use them all the time and to rely on them for their personal and professional tasks. They will easily become addicted to this assistance, dependent on it. So I agree that the societal risks associated with "widespread AI adoption" are the major ones. I would not call this a "catastrophe", because it will not arrive as a single bad event. It will rather induce a process of decay, a progressive qualitative deliquescence of education, information, social relations, and intellectual and creative work. This process may be slow and insidious enough to go unnoticed, not perceived as a danger by the general public, until it is too late.
" I am not all that worried about humans going extinct anytime very soon. Itβs hard for me to construct a realistic scenario of literal, full-on extinction that I take seriously. We are a pretty resourceful species, spread throughout the globe."
Dr Marcus, this seems like unjustified hubris. What we have seen several times over is that simple new algorithms, when amplified by large data and compute, can yield shocking and unexpected jumps in capabilities. Certainly I agree that present AI systems pose little threat, but we have strong empirical evidence that we should NOT treat this as evidence for what the next generation will bring.
And I would agree that human ingenuity IS more powerful than any natural force; we will survive nearly anything, except ingenuity itself! Empirically, humanity has never strongly limited the usage of a new distributed technology. How can we imagine that the space of AI systems we build will be limited?
For the last 10K years the most intelligent species on earth has dominated. Perhaps we will indefinitely have a "servant" class that is far more capable than ourselves. I cannot say this CAN'T happen, but you seem to believe it is the most likely thing to happen?! Why would that be true?!
Certainly there are many cases where a lesser group contained and controlled a much stronger one for a long period. But it is an unstable condition... any instability and the natural order is restored. Why should things be different in our case?
For me this does not imply extinction. It just means that we probably won't be the ones choosing.
I'm about as far afield from AI as any layperson. From my vantage point, while I think it's important and I am grateful that AI experts are addressing issues of potential risk and harm, I have this to add relative to this topic:
We (humans) have some important and actionable data and information about the present and the near-term future; so how can we take thoughtful action in the present to address the more concrete, specific threats that we (and by "we" I mean you AI folks) are identifying? Otherwise, and again this is from outside the field, it seems a little like debating angels (or demons) on the head of a pin. I'm reminded of Joan Baez: "Action is the antidote to despair." What am I missing? Surely there is something.
Whether p(doom) or p(cata…), do not underestimate the collective creativity of 8 billion humans to solve challenges rapidly enough to prevent either from happening. No one thought a COVID vaccine could emerge so soon after the epidemic was identified. No general anywhere (incl. N Korea) will press a nuclear button knowing his own family will be incinerated in the following 15 minutes, etc…
Good reading on the subject of existential risks is the book The Precipice by Toby Ord: https://theprecipice.com/. The author does provide quantitative estimates for p(doom): "My best guess is that humanity faces a one-in-six chance of existential catastrophe over the next century", broken down into different risk factors.
It's always the thing that you CAN'T imagine that kills you. And since there's a 100% probability of not being able to imagine the unimaginable, then we're kinda stuck. In the meantime, mitigation of the dangers you CAN imagine may be a poor second, but it's all we've got. In the first instance the best advice is Douglas Adams' "Don't Panic!"; it will go down in history alongside that of Descartes, for example, in terms of iconic phrases, but it'll be too late: you only get one chance to go against sound advice. It'd be funny if, in our efforts to combat climate change and/or an AI extinction event, we all end up dead as a result of our panic to save ourselves. Shame we won't be around to appreciate the irony.
This doesn't seem true at all; for instance, my dad was pretty sure he was going to die from liver cancer, and he was spot on.
I agree we should definitely focus on the concrete, imminent effects we can already see and reason about, such as a post-truth society, dystopian authoritarian uses of AI, etc.
As for p(doom), I argue it is a useless metric for advancing our understanding or our policy decisions. We should stop accepting mere guesses and instead define criteria that are measurable.
https://www.mindprison.cc/p/pdoom-the-useless-ai-predictor
"Itβs hard for me to construct a realistic scenario of literal, full-on extinction that I take seriously" is the key issue here. There is a clear mechanism of action for nuclear war or global heating. There is no plausible mechanism of action for AGI doom. The doomsters take one of two approaches.
The first is to come up with something concrete, which invariably turns out to be nonsense mired in ignorance of how biology, chemistry, physics, or even computers work. I had somebody try to explain to me on Twitter that the AI would get humans who don't know what they are doing to release an entirely protein based virus that waits in the atmosphere for a signal to wipe us all out in a day. None of that is biologically possible (I am a biologist). These are always scifi storytelling tropes like the AI evading somebody simply pulling its plug by copying itself onto devices that would not have the compute power to run it. It must be possible, because that is how it worked in that one movie from the 1990s. Magical thinking.
The second is to say that the ways of the future super-AI are ineffable. It will be to us as we are to ants, so it will be able to do things that appear impossible to us, merely by thinking for a few seconds. In other words, it will be able to do magic. The key problem here is that we aren't ants. We may not know in detail what gimmicks others may come up with yet, like social media, but we do know a lot of stuff is impossible because we tried and found that it can't work. No AI is going to build a perpetual motion device, or even 'only' find a neat solution for global heating that doesn't involve transitioning to zero carbon ASAP. And while an AI may think faster than an individual human, whatever it wants to do with those thoughts still happens at the speed of physics and can therefore be countered at the same speed, if necessary. That narrowly limits what a malicious super-AI could actually do before somebody presses the OFF button of the cluster it is running on.
Ancillary thoughts:
Third, the few relatively plausible doom scenarios are not specific to AI and thus not really an argument about AI as such. For example, if all the nukes are so easy to launch that a computer virus designed by the AI can launch them, then we have a problem to which AI is orthogonal, because we should really be worried right now about a minor software bug causing Armageddon, or about the US president having a bad day. But to my understanding, they aren't that easy to launch.
Fourth, super-AI appears deeply implausible because of what we already know about physics. The singularitarian idea of self-improving feedback going exponential relies on there not being any diminishing returns, trade-offs, and physical limits. In reality, there are always diminishing returns, trade-offs, and physical limits. (At the extreme, you have the EAs dreaming of a benevolent tech singularity resulting in humans colonising the entire universe. This idea at least is immediately disproved by the Fermi Paradox unless one assumes that humans are the only technological civilisation that has ever evolved across an unimaginably large number of planets since the birth of the universe.)
Ultimately, the AI doomsters have read or watched scifi like the Terminator movies or A Fire Upon The Deep and misunderstood them as describing something that can actually happen. Their imaginations are entirely unencumbered by an understanding of the relevant subject matter, be it virology, software, or astrophysics. As I have written elsewhere, we could just as well waste the same space on discussing what to do about Sauron getting his ring back. In the meantime, we are nicely distracted from discussing IP theft, biases baked into the models, and our failure to get serious about global heating.
Your essay is thought-provoking as always. Hmm...
First, I agree that human extinction is unlikely. That would probably take an astronomical event.
Catastrophe at some point is highly likely. Every human civilization ever created has eventually collapsed, so there's no credible reason to believe ours is eternal. The fact that we have the technical means to very efficiently destroy the modern world is obvious further evidence to support doomerism.
Will AI be the cause of global catastrophe? I have no idea. But if it's not that technology then some other product of the knowledge explosion will be, or rather already is, available to do the job.
We might keep in mind that we may as yet not be able to even imagine what will threaten us the most, just as a century ago the people of that time couldn't imagine much of what was to come. My primary concern with AI is that it's just more fuel being poured on an already overheated knowledge explosion. Who knows what else will pop out of that box?
The one thing we can know for certain is that each of us as individuals is going to perish. The question here is not what to do about this, as there is nothing to do other than maybe put the inevitable off a bit. The useful question would instead seem to be, how do we relate to this huge unknown which forms the foundation of our existence? What story will we tell ourselves about death? Given that there are no facts about death which can be relied upon, and indeed not even any data from "the other side", the story is ours to write. Some people will find the story that works for them in religions, while others of us will have to craft a story about death out of our relationship with the natural world.
Part of the genius of religions may be that it doesn't really matter if one's story is correct, as our profound ignorance on the matter of death makes determining correctness impossible. What does matter is, does our preferred story about death add value to our living, and do we really believe it?
The cause of a coming catastrophe will most likely be our addiction to a simplistic, outdated, and increasingly dangerous "more is better" relationship with knowledge. At the heart of the threat lies a philosophical problem.
There may be a philosophical solution, not to what is coming for the culture as a whole, but for what is coming for us personally. The logic of such a philosophy seems remarkably simple.
When suffering from incurable ignorance regarding our ultimate fate, we make our way to a happy song by whatever means we can.
The mediocre AI is option C, that neither "we" nor "AI" but panicked competition (between companies or governments, doesn't matter) sets the agenda. That's the current state. I think p(intentional doom) is unlikely, but p(thoughtless doom) is not at all unlikely. The nuclear non-proliferation analogy has already failed, because there's no inherent scarcity in AI -- in fact the opposite. One nuclear bomb is a threat, but one superhuman AGI probably is not. The threat from any AI, smart or dumb, comes from widely replicated small harms. And these will most definitely fall on the already disadvantaged.
The only solution I see for this is for us to work to prevent AIs from committing small harms. Maybe by limiting the power of each AI so that principled hybrid reasoning is feasible. Technologists have a terrible record of ignoring small harms, of moving fast and breaking things. We can't rely on trained models to reason, because small harms by definition are corner cases. We always neglect these, defer them to some future version that never comes. We don't care until they've been replicated enough to show up in statistics. But small harms can be large to the individuals they affect, and each AI needs to predict and avoid them in its own domain and usage.
And the only way for this to happen is via regulation. Maybe the EU can save us. The US Congress seems to have lost its own ability to reason.
"Do no evil" is a slogan rapidly loosing its cachet in large technology companies driven by profit. Accelerating the education of all, and promoting kindness as both individuals and groups will help keep the probability of p(doom) in control.