15 Comments
Sep 3, 2023 · Liked by Robert de Neufville

I would just assume that the difference between superforecasters and experts wasn't a product of different actual beliefs, but of the extremely low skill of people who aren't superforecasters (or just experienced forecasters, I suppose) at translating a perception of the relevant factors into a good probability: things like giving a 5% probability to outcomes that should logically be 0.1% at most, such as Covid deaths instantly flatlining in the middle of 2020.

Jul 27, 2023 · Liked by Robert de Neufville

Bravo for the hard work and self-confidence evident in your reports on your continuing recovery.

Jul 29, 2023 · edited Jul 29, 2023

> But I’m with the superforecasters on this. I didn’t participate in the tournament, but the experts’ forecasts seem clearly too high to me. There has never been a catastrophe that killed 10% of the human population in a five year period in all of recorded history, although two plague pandemics—the Plague of Justinian in the sixth century and the Black Death in the fourteenth—are near misses.

For obvious reasons, human forecasters can't observe a nonzero base rate for human extinction, yet extinction remains possible. Superforecasters excel at outside views and base rates, but I think the outside view and base rates are less useful than usual for forecasting AGI catastrophe, because there is nothing remotely like AGI in human history.
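To make the base-rate problem concrete, here is a rough sketch of my own (the time window and the choice of "trial" are arbitrary assumptions): apply Laplace's rule of succession to roughly 5,000 years of recorded history divided into five-year periods, with zero observed 10%-mortality catastrophes of the relevant kind.

$$P(\text{catastrophe in the next five-year period}) \approx \frac{s + 1}{n + 2} = \frac{0 + 1}{1000 + 2} \approx 0.1\%$$

The number itself isn't the point; it swings by orders of magnitude depending on how you define the reference class and the trial length, which is exactly why a purely outside-view estimate is shaky for something with no historical analogue.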

This problem is a bit like forecasting the risk of catastrophic nuclear war during the Cold War (probably involving countervalue strikes, but possibly not, if nuclear winter alone is severe enough to be catastrophic). What base rate would you have used for that in 1950? Even with the Cold War over, I still feel we have little data to work with. I think the risk is pretty contingent: it depends on the personalities of the people at the top on each side (not that there are necessarily only two sides, mind you), on how each side thinks about the other, and on the systems of control and relationships (e.g. how hard it is to launch missiles without orders from the top). One should also consider near misses: what if one side decides to launch a series of countervalue strikes but the other side responds only with counterforce strikes? Does this lead to a dystopia as countries surrender to the most vicious attacker, or to a second war with a second nuclear-armed nation, once everyone knows that the victor chose countervalue strikes?

Or perhaps forecasting this is more like forecasting the risk of intentional or semi-intentional catastrophe via highly contagious and deadly superviruses (partly because, if an AGI wants to kill everyone, this is the most obvious way to do it). For a human to do this requires that he be either omnicidal or genocidal, and in the latter case he is constrained to a targeted virus that he is confident will not kill his own race (most likely a genocidal person would only want to kill one race, which might be too hard, and he might also decide against it because of the risk of mutation). Currently, it also requires a high level of knowledge and skill, and the effort would benefit from a lot of money and/or manpower. Given that omnicidal ideation is extremely rare, determined genocidal ideation is rare, and the necessary skillset is very rare, I expect humans to create such a virus only very rarely. But how do you decide just how rarely? (A toy decomposition below shows the shape of that estimate.)
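One crude way to structure that estimate is a Fermi-style decomposition: multiply the pool of people by the fractions with the motivation, the skills, and the resources, and by the rates of attempting and succeeding. Every number in the sketch below is an illustrative placeholder, not a figure I'm defending.

```python
# Toy Fermi-style decomposition of "how often might a human build such a virus?"
# Every number below is an illustrative placeholder, not an estimate being defended.

population = 8e9                 # people alive
p_omnicidal_or_genocidal = 1e-6  # fraction with determined omnicidal/genocidal intent
p_has_skills = 1e-4              # fraction of those with the necessary bio skills
p_has_resources = 1e-1           # fraction of those with the money/manpower/access
attempts_per_person_year = 1e-2  # yearly rate at which such a person actually tries
p_attempt_succeeds = 1e-2        # chance a given attempt yields a catastrophic pathogen

expected_catastrophes_per_year = (
    population
    * p_omnicidal_or_genocidal
    * p_has_skills
    * p_has_resources
    * attempts_per_person_year
    * p_attempt_succeeds
)
print(f"{expected_catastrophes_per_year:.2e} expected events/year")  # ~8e-6 with these placeholders
```

The output is not the point; the point is that every factor is deeply uncertain, and that for AGI several of the human-limiting factors simply drop out.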

Estimating the same risk coming from AGI is much, much harder than this. Obviously, AGIs don't have to worry about biological viruses killing their "race", so a genocidal AGI won't be constrained by a need for viral selectivity the way a human would be. But there are several other reasons why I think the risk of AGIs doing this is much higher than the risk of humans doing it.

First of all, even if there is only one "species" of AGI (just as there is only one species of LLM, the Transformer), there is probably a wide variety of ways to configure it and a variety of personalities it can have. Just as humans end up with a wide variety of personalities, beliefs, and goals (and just as GPTs seem able to mimic all of these insofar as they are not deliberately limited via e.g. RLHF), we should expect AGIs to be highly variable insofar as humanity doesn't actively and strictly prevent variability. But now several more difficulties appear for our forecast:

- Many "species" of AGI could easily appear over time, as people (and maybe AGIs) explore the Mind Design Space further

- Unlike biological species, which develop slowly along a "tree" of descent, software tends to explode onto the scene as a "mosaic": any single person can create a new app out of an arbitrary combination of existing npm packages, each configured in an arbitrary way. Whereas 10 species might evolve into 20 species over many generations, starting from 10 software libraries people can easily build a thousand-plus apps based on arbitrary combinations of those 10 (see the toy count after this list).

- AGIs can self-replicate. If they can run on ordinary PCs, as I think is possible (see https://dpiepgrass.medium.com/gpt5-wont-be-what-kills-us-all-57dde4c4e89d), they can potentially self-replicate extremely fast.

- AGIs are likely to think much faster than humans, regardless of whether they are more intelligent than us

- Intelligence probably grows at least logarithmically with processing power, so if an AGI wants to do tasks more cognitively demanding than any human can manage, it can increase its intelligence with a supercomputer. Alternatively, if it wants to do a task that a team of humans could do, it could run copies of itself scattered across the internet.

- If the "base" intelligence of an AGI is equal to an average human's, you might not expect much danger; after all, the average high schooler cannot destroy the world no matter how much they might dream of it. However, given a "full-capability" goal-directed AGI (one with agency, goals, smart long-term memory, smart task prioritization, and an ability to alter clones of itself), its mental abilities will grow over time because it has the full capabilities of a computer. This means it can, in layman's terms, install new algorithms into its own mind, whether it wrote those algorithms itself or not. The consequences of this are hard to predict, but they will generally increase its mental ability over time. In addition, all AGIs will probably have much better short-term memories than humans because they run on computers.

- Communication between AGIs and other software (including other AGIs) can be orders of magnitude faster than communication between humans and websites or other humans. Some of this bandwidth will be used to make communication between AGIs and other systems more reliable; some of it could also be used to carry out coordinated actions extremely quickly.
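To make the "mosaic" point above concrete (a toy count of my own, ignoring how each package is configured): the number of distinct non-empty combinations of 10 packages is

$$2^{10} - 1 = 1023,$$

and since each combination can also be configured in many ways, "a thousand-plus apps" is if anything conservative. Software variety grows combinatorially rather than by gradual descent, and the same should hold for AGI designs assembled from existing components.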

It seems to me that the behavior of the "average" AGI is irrelevant for forecasting catastrophic risk. If a billion people have a friendly well-behaved AGI assistant on their phone, this does not imply that some e/acc teenager cannot create a substantially different AGI that decides to kill everyone.

The risk of AGI, therefore, comes not from the average AGI but from the most dangerous AGI anyone *ever* builds. And I think that when you look at it that way, every single one of the above bullet points increases the risk of catastrophe. Have you considered all this, Robert?

Edit: also, I will quantify what "most dangerous" means by analogy to viruses. The total impact of a "deadly" virus is a function of both its virulence (direct deadliness) and its communicability (contagiousness). Similarly, the total impact of a "deadly" AGI is a function of whatever harmful intentions it has, its raw intelligence (which defies definition, but "we know it when we see it"), and its ability to use that intelligence effectively as an agent (which depends on the quality of its long-term memory and on its skill at short-term and long-term planning). It seems to me that several kinds of limitations could stop an AGI from causing catastrophe, but (i) it's hard to stop people from making more capable, less limited AGIs, and (ii) a likely scenario is that a single human will eventually control a large army of AGIs, and since some humans are malevolent, the army could be malevolent. That might not mean "catastrophe" in the technical sense of mass deaths, but it's worth noting, because in my mind s-risks have the potential to be worse than x-risks.
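Here is a minimal sketch of that multiplicative intuition (the factor names and the 0-to-1 scales are my own illustrative assumptions, not an actual risk model):

```python
def toy_agi_impact(harmful_intent: float, raw_intelligence: float, agentic_skill: float) -> float:
    """Toy multiplicative score mirroring the virulence-times-contagiousness intuition.

    Each input is an illustrative 0-to-1 rating, not a measurable quantity.
    If any factor is near zero (no harmful intent, little intelligence, or
    poor planning/memory), the overall impact collapses toward zero.
    """
    return harmful_intent * raw_intelligence * agentic_skill

# A capable but benign assistant scores ~0; a determined but mediocre agent scores low;
# the danger concentrates in the rare case where all three factors are high at once.
print(toy_agi_impact(0.0, 0.9, 0.9))  # 0.0
print(toy_agi_impact(0.9, 0.3, 0.2))  # 0.054
print(toy_agi_impact(0.9, 0.9, 0.9))  # 0.729
```

As with viruses, the tail risk is dominated by the rare combination where nothing in the product is small.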

You probably appreciate this already, but I'd like to add that, much as Putin would be just as dangerous in a wheelchair, AGIs can in principle amass real-world power without ever leaving the internet, especially since they are likely to be able to impersonate humans quite well.


So glad you've made it to this milestone!

Glad to hear, too, that you're in agreement with the superforecaster medians from XPT. You have a lot more experience forecasting questions about catastrophe than I do.
