I would just assume that the difference between superforecasters and experts wasn't a product of different actual beliefs, but of the extremely low skill of people who are not superforecasters (or just experienced forecasters, I suppose) at translating a perception of relevant factors into a good probability - stuff like giving a 5% probability to things that should logically be 0.1% at best, like Covid deaths instantly flatlining in the middle of 2020.
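To put a rough number on the cost of that kind of miscalibration, here's a toy Python sketch (my own illustration; all the probabilities are made up): on questions about things that essentially never happen, a forecaster who habitually says 5% racks up a Brier score penalty roughly 2,500 times larger than one who says 0.1%.

```python
# Toy illustration (my own, not from the tournament): the Brier score
# penalty for habitually over-forecasting rare events. Numbers made up.

def brier(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (prob - outcome) ** 2

# Suppose 100 questions about events that essentially never happen,
# so every outcome resolves "no" (0).
outcomes = [0] * 100

for label, p in [("habitual 5% forecaster", 0.05),
                 ("calibrated 0.1% forecaster", 0.001)]:
    mean_score = sum(brier(p, o) for o in outcomes) / len(outcomes)
    print(f"{label}: mean Brier = {mean_score:.6f}")

# 0.05**2 = 0.0025 per question vs 0.001**2 = 0.000001 -- about a
# 2,500x larger penalty on questions that resolve "no".
```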
Bravo for your hard work and self confidence evident in your reports on your continuing recovery.
Thank you, Carolyn!
> But I’m with the superforecasters on this. I didn’t participate in the tournament, but the experts’ forecasts seem clearly too high to me. There has never been a catastrophe that killed 10% of the human population in a five year period in all of recorded history, although two plague pandemics—the Plague of Justinian in the sixth century and the Black Death in the fourteenth—are near misses.
For obvious reasons, human forecasters don't say "we observe a nonzero base rate for human extinctions", yet extinction remains possible. Superforecasters excel at outside views and base rates, but I think outside views and base rates are less useful than usual for forecasting AGI catastrophe, as there is nothing remotely like AGI in human history.
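One standard trick for getting a nonzero number out of a zero-event history is Laplace's rule of succession, and a quick sketch shows why I don't trust it much here: the answer swings by a factor of twenty depending on an arbitrary choice of observation window (the windows below are my own invention).

```python
# Laplace's rule of succession: with k events observed in n trials,
# estimate P(event in the next trial) = (k + 1) / (n + 2).
# Applied to "a catastrophe kills 10% of humanity within a 5-year window".
# How much history to count as trials is arbitrary (my own choice here),
# which is exactly the weakness of base-rate arguments for unprecedented
# events.

def rule_of_succession(events: int, trials: int) -> float:
    return (events + 1) / (trials + 2)

for years_of_history in (500, 2_000, 10_000):
    windows = years_of_history // 5   # non-overlapping 5-year windows
    p = rule_of_succession(events=0, trials=windows)
    print(f"{years_of_history:>6} years of history -> "
          f"P(catastrophe in the next 5 years) ~ {p:.4f}")

# Output: ~0.0098, ~0.0025, and ~0.0005 -- the "base rate" depends far
# more on how you slice history than on anything we know about AGI.
```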
This problem is a bit like forecasting the risk of catastrophic nuclear war during the Cold War (probably involving countervalue strikes, but possibly not, if nuclear winter alone is severe enough to be catastrophic). What base rate would you have used for that in 1950? Even with the Cold War over, I still feel like we have little data to work with. I think it's pretty contingent ― it depends on the personalities of the people at the top on both sides (not that there are necessarily only two sides, mind you), on how each side thinks about the other, and on the systems of control and relationships (e.g. how hard it is to launch missiles without orders from the top). One should also consider near misses and partial exchanges: what if one side decides to launch a series of countervalue strikes but the other side responds only with counterforce strikes? Does this lead to a dystopia as countries surrender to the most vicious attacker, or to a second war with a second nuclear-armed nation, since everyone knows that the victor chose countervalue strikes?
Or perhaps forecasting this is more like forecasting the risk of intentional or semi-intentional catastrophe via highly contagious and deadly superviruses (partly because, if an AGI wants to kill everyone, this is the most obvious way to do it). For a human to do this, he would have to be either omnicidal or genocidal, and in the latter case he is constrained to use a targeted virus that he is confident will not kill his own race (most likely a genocidal person would only want to kill one race, which might be too hard, and he might also decide against it due to the risk of mutation). Currently, it also requires a high level of knowledge and skill, and the effort would benefit from having a lot of money and/or manpower. Given that omnicidal ideation is extremely rare, determined genocidal ideation is rare, and the necessary skillset is very rare, I expect humans to create such a virus quite rarely ― but how do you decide just how rarely?
Estimating the same risk coming from AGI is much, much harder than this. Obviously, AGIs don't have to worry about biological viruses killing their "race", so a genocidal AGI won't be constrained by a need for viral selectivity like a human would. But there are several other reasons why I think the risk of AGIs doing this is much higher than the risk of humans doing this.
First of all, even if there is only one "species" of AGI (just as there is arguably only one "species" of LLM, the Transformer), there are probably a wide variety of ways to configure it and a variety of personalities it can have. Just as humans end up with a wide variety of personalities, beliefs, and goals (and just as GPTs seem able to mimic all of these insofar as they are not deliberately limited via e.g. RLHF), we should expect AGIs to be potentially highly variable insofar as humanity doesn't actively and strictly prevent variability. But now, several more difficulties appear for our forecast:
- Many "species" of AGI could easily appear over time, as people (and maybe AGIs) explore the Mind Design Space further
- Unlike biological species that develop slowly along a "tree" of species, software tends to explode onto the scene as a "mosaic", as any single person can create a new app constructed out of an arbitrary combination of existing npm packages, each configured in an arbitrary way. Whereas 10 species can evolve to 20 species over many generations, if you start from 10 software libraries, people can easily build a thousand-plus apps based on arbitrary combinations of those 10.
- AGIs can self-replicate. If they can run on ordinary PCs, as I think is possible (see https://dpiepgrass.medium.com/gpt5-wont-be-what-kills-us-all-57dde4c4e89d), they can potentially self-replicate extremely fast.
- AGIs are likely to think much faster than humans, regardless of whether they are more intelligent than us
- Intelligence is probably proportional to the logarithm of processing power (at minimum), so if an AGI wants to do more cognitively demanding tasks than any human can do, it can increase its intelligence with a supercomputer. Alternatively, if it wants to do a task that a team of humans could do, it could run copies of itself scattered over the internet (see the toy sketch after this list).
- If the "base" intelligence of an AGI is equal to that of an average human, you might not expect much danger; after all, the average high schooler cannot destroy the world no matter how much they might dream of it. However, given a "full-capability" goal-directed AGI ― one with agenticity, goals, smart long-term memory, smart task prioritization, and an ability to alter clones of itself ― its mental abilities will grow over time because it has the full capabilities of a computer. This means it can (in layman's terms) install new algorithms into its own mind, whether it wrote the algorithms itself or not. The consequences of this are hard to predict but should generally increase mental ability over time. In addition, all AGIs will probably have much better short-term memories than humans because they run on computers.
- Communication between AGIs and other software & AGIs can be orders of magnitude faster than communication between humans and web sites & other humans. Some of this communication bandwidth will be used to improve the reliability of communication between AGIs/systems; some of it could also be used to perform coordinated actions extremely quickly.
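To illustrate the scaling point a couple of bullets up, here is a toy model (the constants and the one-copy-per-unit assumption are entirely my own invention): under a log-of-compute assumption, piling on hardware buys a single mind only modest gains, but the same hardware buys linearly more human-level copies working in parallel.

```python
# Toy model of the compute-scaling bullet above (constants are invented):
# a single instance's capability grows ~ log2(compute), while the number
# of human-level copies the same hardware can run grows linearly.

import math

BASE_COMPUTE = 1.0      # assume: compute needed to run one human-level AGI
COMPUTE_PER_COPY = 1.0  # assume: each extra copy costs the same amount

def serial_capability(compute: float) -> float:
    """Capability of one instance, in 'human units', under log scaling."""
    return 1.0 + math.log2(compute / BASE_COMPUTE)

def parallel_copies(compute: float) -> int:
    """How many human-level copies the same hardware could run instead."""
    return int(compute / COMPUTE_PER_COPY)

for multiplier in (1, 10, 1_000, 1_000_000):
    c = BASE_COMPUTE * multiplier
    print(f"{multiplier:>9,}x compute: one instance at "
          f"~{serial_capability(c):.1f} human units, "
          f"or {parallel_copies(c):,} human-level copies in parallel")
```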
It seems to me that the behavior of the "average" AGI is irrelevant for forecasting catastrophic risk. If a billion people have a friendly well-behaved AGI assistant on their phone, this does not imply that some e/acc teenager cannot create a substantially different AGI that decides to kill everyone.
The risk of AGI, therefore, comes not from the average AGI, but from the most dangerous AGI anyone *ever* builds. And I think when you look at it that way, every single one of the above bullet points increases the risk of catastrophe. Have you considered all this, Robert?
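To make the "it only takes one" framing concrete, here is a back-of-envelope sketch (every number in it is invented): even if the chance that any one independently built AGI is both malevolent and capable enough to cause catastrophe is tiny, the chance that at least one such system ever appears grows quickly with the number of systems built.

```python
# Back-of-envelope sketch (all numbers invented): catastrophe risk is
# driven by the worst system anyone ever builds, not the average one.
# If each independently built AGI has probability p of being both
# malevolent and capable enough, then P(at least one) = 1 - (1 - p)^N.

p_per_agi = 1e-6  # invented per-system probability

for n_systems in (1_000, 100_000, 10_000_000):
    p_any = 1 - (1 - p_per_agi) ** n_systems
    print(f"{n_systems:>10,} independently built AGIs -> "
          f"P(at least one catastrophic) ~ {p_any:.3f}")

# Output: ~0.001, ~0.095, ~1.000. A tiny per-system risk does not imply
# a tiny overall risk once building AGIs becomes easy and common.
```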
Edit: also, I should quantify what "most dangerous" means, by analogy to viruses. The total impact of a "deadly" virus is a function of both its virulence (direct deadliness) and its communicability (contagiousness). Similarly, the total impact of a "deadly" AGI is a function of whatever harmful intentions it has, its raw intelligence (which defies definition, but "we know it when we see it"), and its ability to effectively use that intelligence as an agent (which depends on the quality of its long-term memory and on its skill at short-term and long-term planning). It seems to me that several kinds of limitations would stop an AGI from causing catastrophe, but (i) it's hard to stop people from making more capable / less limited AGIs, and (ii) a likely scenario is that a single human will eventually control a large army of AGIs, and since some humans are malevolent, the army could be malevolent (that might not mean "catastrophe" in the technical sense of mass deaths, but it's worth noting, because in my mind s-risks have the potential to be worse than x-risks).
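If it helps, that analogy can be written as a crude scoring formula (my own ad-hoc toy, nothing rigorous): treat harm potential as a product of factors, so capping any one factor near zero caps the whole thing, which is why limitations help, and why removing them (building "less limited" AGIs) removes the protection.

```python
# Ad-hoc toy version of the virus analogy (my own invention): total harm
# potential as a product of factors, each scaled to [0, 1], where 1 means
# "unconstrained". Capping any single factor caps the product.

def harm_potential(intent: float, intelligence: float, agency: float) -> float:
    return intent * intelligence * agency

# A well-limited AGI: hostile and smart, but barely able to act as an agent.
print(harm_potential(intent=1.0, intelligence=0.9, agency=0.05))  # ~0.045

# The same AGI with the agency limitation removed.
print(harm_potential(intent=1.0, intelligence=0.9, agency=0.9))   # ~0.81
```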
You probably appreciate this already, but I'd like to add that, just as Putin would be no less dangerous in a wheelchair, AGIs can in principle amass real-world power without ever leaving the internet, especially as they are likely to be able to impersonate humans quite well.
Thanks for the comment, David.
I'd have to write several essays to respond to all these points, but I did work for years as an existential risk researcher—and have written several papers related to AI risk—so I have in fact considered these issues carefully.
In general I think that it's a mistake to discount the past steeply even when there are methodological problems using it as a guide to the future. It is hard—verging on impossible—to predict scientific or social developments purely analytically. The outside view should tell us that we shouldn't be confident that any particular speculative scenario will come to pass.
Well, I am subscribed, so I would read those essays! I am not sure quite what you mean, though. I am not positing a "particular" speculative scenario. It seems to me that the way of thinking I laid out above fits a very wide variety of realistic future scenarios. And it rests not only on my mental model of technology and intelligence, but also on how society and humans work. I assume humans will behave generally the same way as they have before; for example, they will publish information about algorithms that will later be used in AGI, and indeed the majority of the necessary foundations might have been published already. (Complete AGI designs might not be published initially, but once one organization puts the pieces together, it's probably just a matter of time before another organization or teenager does so independently, in some slightly different way, and publishes the details publicly.)
A clear trend from history is that strong regulations tend to appear only after disaster strikes. I'm not sure how to estimate, in this case, whether AGI will lead to a minor disaster that triggers global regulations quickly enough to prevent a bigger catastrophe, or how well those regulations would work given (i) the lack of expertise among politicians, (ii) the greater difficulty of preventing an AI arms race relative to a nuclear arms race, and (iii) the even greater difficulty of preventing teenagers in basements from trying to design and build AGIs. Regarding the last point, I feel like the main difference in my mental model (vs. the popular conception) is in thinking that AGI doesn't require immense processing power, so a teenager in a basement could build one despite whatever the regulations say (like a baby, it might take several years to train from scratch on a PC, but that's not a tremendous guardrail, and I don't think from-scratch training will be necessary anyway).
Edit: oh, and while I've suggested scenarios in which AGI is invented suddenly, I wonder if a boiling-the-frog scenario ― in which mostly-safe AIs gradually advance into mostly-safe quasi-AGIs from which we derive very safe "full-capability" AGIs ― would be even more hazardous in the long run, due to a complacency effect in society, and because an AGI that turns out to be highly malevolent in this environment would have access to more information about the nature of intelligence and AI, which it could then use for its own purposes. However, if the "very safe" AGIs can be made arbitrarily powerful, an attempted "pivotal event" to stop disaster is more likely to succeed in this scenario.
Thanks for subscribing! I'll talk more about these issues in future posts, so we'll have to continue this discussion.
So glad you've made it to this milestone! Bravo for the hard work and self-confidence evident in your reports on your continuing recovery.
Glad to hear, too, that you're in agreement with the superforecaster medians from the XPT. You have a lot more experience forecasting questions about catastrophe than I do.
Thanks, Kjirste! I'm very fortunate to have made such a good recovery.
I'm likewise glad the superforecaster medians lined up with my own views. I sometimes felt isolated as a catastrophic risk researcher who didn't think catastrophe was imminent. Why do you think the superforecasters and the experts were unable to convince one another? What do you think was the source of the disagreement between the two groups?
So many reasons! I wrote a bit about it the other day: https://medium.com/@kjirstecm/ai-forecasting-thoughts-on-comments-about-fris-xpt-report-8bb77b910070 and I also agree with comments I've seen elsewhere about it being long, and perhaps people became tired. It was difficult to keep track of discussions with so many questions, many of them inter-related (16 required questions, each with 9 specific forecasts iirc, and then as many as possible of the rest of the 59 questions to deal with). Pages 40 and 41 of the report also summarize some of the interaction difficulties - it did feel like a stalemate at times.
What you wrote is really interesting! I've been struck by the religious quality of AI risk arguments too. This isn't an ordinary question to many people; it's deeply-held eschatology.
I wonder if thinking of the difference that way suggests an alternative approach to fostering communication between the groups? Or maybe it just means that agreement of any sort will be harder to reach.
That's a great question. It seems clear that the best forecasters don't base their forecasts on fixed religious or ideological beliefs. But I don't think most people believe that's what they're doing. I always think the cure is to subject our views to an outside view and to seek out reasons why we might be wrong. But I don't know how to make sure that's what everyone's doing.
You mention "exasperating" experts who seemed to want "faith". I have noticed this with some people. Yudkowsky's high P(doom) has been particularly confusing to me, as I expected clearer arguments from the father of modern rationalism. I love Rationality A-Z and enjoyed HPMOR, but so far I haven't seen a convincing chain of reasoning about AGI from him. But tell me, do any of the experts reason about this the same way I do? https://tellingthefuture.substack.com/p/forecasting-the-end-of-the-world/comment/21536469
Not 100% sure for any one person or question, but I'd say that the general arguments that you lay out were represented among the views expressed in the tournament. Several were expressed by multiple people in their rationales, sometimes from experts, sometimes supers, iirc.
I should also say that there was representation of both cohorts (experts and supers) in saying things that I thought were over the top. I am quite sure that there were experts and supers who found my rationales and forecasts exasperating too.