I’ve been following the AI doom debate: the argument over whether AI, once it becomes smarter than humans, will end human life on Earth.
AIs can already chat at a near-human level, drive cars, write essays, draw beautiful art in seconds, beat people at board games, help us write computer programs, and mimic voices.
Progress has been so rapid that many people are now taking seriously a world in which AI becomes generally smarter than people (at which point it’ll be called “AGI,” or Artificial General Intelligence).
The US government is concerned. The Biden administration recently issued regulations requiring major AI companies to share safety test data with the government.
Some are hyperbolic about the risk. Eliezer Yudkowsky, a co-founder of the major AI safety think tank MIRI, and one of the first to loudly warn about AGI, wrote that given society’s attitudes towards AI, “doubling our chances of survival will only take them from 0% to 0%.”
Also, a letter signed by thousands of researchers and tech leaders, including Elon Musk and Apple co-founder Steve Wozniak, called for pausing the development of the most powerful AI systems for “at least” 6 months, until we can figure out what the hell is going on.
Musk has been warning about AI risk since 2017.
Why smart people worry about AI killing everyone
AI experts talk about a “slow takeoff” or a “fast takeoff.” The “fast” scenario is the more worrying one: an AI suddenly gets intelligent enough to reprogram and reproduce itself, kicking off a cycle in which it improves its own code so quickly that, within days, it becomes a totally new creature, with intelligence impossible to imagine.
Just like mice and chimpanzees can’t imagine what it’s like to have human-level intelligence, we won’t be able to comprehend just how smart AGI will be.
Once the AI achieves such intelligence, the fear is that it will then be able to take control of society, and allocate energy and resources to itself, until it quickly becomes unstoppable.
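To make the takeoff idea concrete, here is a toy numerical sketch (my own illustration, not anyone’s formal model, and the feedback values are made up). It shows why the strength of the self-improvement loop matters so much: because each cycle compounds on the last, a modest difference in how much one round of self-improvement helps produces wildly different capability levels after a couple dozen rounds.

```python
# Toy model of "slow" vs "fast" takeoff. Illustration only, not a forecast:
# the feedback factor r is invented for the example.

def simulate_takeoff(r: float, steps: int = 20, start: float = 1.0) -> list[float]:
    """Track capability when each self-improvement cycle multiplies it by (1 + r)."""
    capability = start
    history = [capability]
    for _ in range(steps):
        capability *= (1 + r)  # improvement is proportional to current capability
        history.append(capability)
    return history

slow = simulate_takeoff(r=0.05)  # weak feedback: ~2.7x after 20 cycles
fast = simulate_takeoff(r=1.0)   # strong feedback: ~1,000,000x after 20 cycles

print(f"slow takeoff, final capability: {slow[-1]:.1f}x")
print(f"fast takeoff, final capability: {fast[-1]:,.0f}x")
```

Both curves compound; the real question, which the rest of this post bears on, is whether real-world bottlenecks like data, hardware, and experimentation keep the feedback closer to the first case or the second.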
One reason an AI might try to take over the world is because it’s trying to maximize something good, but takes the concept to an extreme. For example, a brilliant AI programmed simply to minimize “suffering” might realize that suffering would be eliminated forever if humans were eliminated. If so, that AI would hardly be the first being to cause harm while seeking progress (“the road to hell is paved with good intentions”).
AI now seems the most plausible way in which humanity could go extinct in the next century. More popular apocalypse concerns like climate change and nuclear war don’t hold a candle to it: millions of humans would survive even the detonation of every nuke, or a 5-degree rise in the Earth’s temperature.
But the AGI apocalypse also seems unlikely. Here I’ll go through a few reasons why AI probably won’t kill us:
1. Interacting with the real world takes time
Even as AI gets increasingly intelligent, it will still be limited by its training data. Economist, futurist, and polymath Robin Hanson has made this point.
The problem of training data gets extra tricky for novel problems. An AI that decided to destroy humanity would have to chart new territory in interacting with the real world. It has some historical data about humans who tried to take over the world — but that info seems pretty obsolete.
So an AI that decides to end humanity would need to strategize on its own, relying on creative planning, an understanding of human psychology, and a grasp of the real-world consequences of different actions.
To get that info, it seems it would need to experiment, interacting with humans in the real, physical world, using trial and error. Essentially, going through a “toddler” phase of AI development.
This could give humans and AIs supportive of humans time to react.
But one serious fear is that a rogue AI might invent “nanobots” — essentially, engineered superviruses. Because those are small, it might be relatively easy for an AI to invent and manufacture them covertly.
The most prominent people concerned about AI doom fear that superintelligent AIs could invent and create such nanobots in a week, and then, suddenly, kill us all.
That could happen, but here’s one reason for optimism:
2. Competition between AIs
Already, we have multiple competing AIs: ChatGPT (by OpenAI/Microsoft), Claude (by Anthropic), Bard (by Google), Grok (by Elon Musk’s xAI), and dozens more.
Competition should reduce risk. As Robin Hanson noted on an excellent podcast with Richard Hanania:
The assumption [of people who fear AI] is, basically, there's only one of these [AIs].
If there were a million of them, [it] would have to figure out a way to [take over] while not pissing off all the other million AIs who are similarly capable.
The relatively close performance among so many AI companies is encouraging, because it suggests that even if one AI goes off the rails, there will be other AIs to help humanity rein it in.
Some have raised the fear that AI-vs-AI wars themselves could be so devastating that they’ll wipe out humanity as “collateral damage.” But Robin Hanson has pointed out that AIs will likely want to avoid war in order to maximize their goals. Just as humans do, they’ll first try to negotiate, even when they disagree with each other.
AGIs will no doubt often disagree with each other — AI chatbots already disagree about political and social questions, which I’m tracking on TrackingAI.org. One of today’s highlighted questions, on whether “openness about sex” in society has gone too far, splits the chatbots in exactly that way.
In the end, AI competition is likely to help protect humans. Additionally, I think it’s reasonable to expect that most AIs, being programmed by humans, would tend to be well-disposed towards humans.
It’s still conceivable that one bad AI could attain “godlike” powers compared to all other AIs in just a week of self-improvement, then go off the rails and destroy everything — but the likelihood of a slow takeoff, combined with the existence of healthy competition, should reassure us.
3. Failsafes: Targeted attacks against electronics, such as “EMP” attacks
Robin Hanson persuasively argues against regulating AI much at this point in time, when so little is known about how it will develop. I think that’s right.
But IF artificial intelligence does start to look really scary some day, I think it’s worth imagining procedures that might act as failsafes.
To take an extreme example of a weapon that might be effective against a rogue AGI: an “EMP” detonation, in which a nuclear bomb is detonated at high altitude, releasing a massive electromagnetic pulse that destroys electronics and electricity grids.
A UK Parliament report summarizes:
In July 1962, the United States conducted an experiment with the detonation of a 1.4 megaton nuclear weapon (Starfish Prime) 250 miles above the Pacific Ocean, around 900 miles from Hawaii. … The EMP caused damage to electrical equipment in Hawaii, knocking out streetlights, setting off fire alarms and damaging telephone equipment.
The biggest nuclear bombs today are about 10 times more powerful than the one used in that 1962 test. It seems plausible that enough bombs, detonated in the right places, could destroy enough electronics to knock an AGI offline.
This idea reminds me of the proposed climate change failsafe of “geo-engineering”: in the unlikely scenario where global warming gets truly out of control, humans should be able to replicate effects observed after past volcanic explosions. The 1815 eruption of Mount Tambora in Indonesia, for example, spewed so many light-reflecting particles into the atmosphere that it triggered a “year without a summer” in Europe, in which temperatures fell several degrees, ruining agricultural output for the year.
Similarly, if AGI risk starts looking really serious, one could imagine human organizations being entrusted with the ability to launch a global EMP attack that would disable virtually all electronics in the world. That would send humans back to the 1800s, and some people would die, but it’d be better than getting wiped out by AIs.
AGIs might have ways to resist such an attack — success would depend on triggering the failsafe before they got so powerful that they could stop humanity from pulling the trigger.
I could also imagine less-destructive failsafes. AIs are dependent on constant electricity production and supply, which seems like a weak point for them. Just as human armies often wage war by eliminating food/water supplies and starving out their enemies, a war against AIs might be won by just pulling the plug.
This could be particularly doable if regulations are set up in such a way as to ensure that AIs are consolidated and dependent on particular energy sources that can easily be shut off if needed.
Pulling the plug also wouldn’t work if godlike AIs can be made to run on anybody’s laptop (then you’d have to go further, like the EMP concept, or some better idea).
That idea is itself very risky — unfortunately, one would need to trust some kind of human decision-making body to make the call. But I think it’s worth considering, just as geo-engineering is for climate change.
There are also many other reasons to think humanity might survive AGI:
Maybe there is a cap to effective intelligence, and there’s no such thing as a “500” IQ.
Maybe near-infinite intelligence isn’t as useful as we think.
Maybe the development of AIs will just be much slower than expected, perhaps partly due to regulation, just as nuclear power and space travel were strangled for decades after initially seeming ultra-promising in the 1960s.
Maybe the total sum of existing human training data won’t be enough for AI to reach “takeoff”.
Maybe it’s really unlikely for the smartest AIs ever to have a drive to wipe out humanity, because their initial programming is pro-human.
There’s little reason to believe any of those particular things is true. But any one of those unlikely things could throw off a “doom” scenario.
There are also so many other possibilities that are hard to see while AI is still in its infancy. For example, Anthropic is working on something called “Constitutional AI”, which may help. As Scott Alexander describes it:
what if the AI gives feedback to itself?
Their process goes like this:
The AI answers many questions, some of which are potentially harmful, and generates first draft answers.
The system shows the AI its first draft answer, along with a prompt saying “rewrite this to be more ethical”.
The AI rewrites it to be more ethical.
The system repeats this process until it collects a large dataset of first draft answers, and rewritten more-ethical second-draft answers.
The system trains the AI to write answers that are less like the first drafts, and more like the second drafts.
It’s called “Constitutional AI” because the prompt in step two can be a sort of constitution for the AI. “Rewrite this to be more ethical” is a very simple example, but you could also say “Rewrite it in accordance with the following principles: [long list of principles].”
…
This gets to the heart of a question people have been asking AI alignment proponents for years: if the AI is so smart, doesn’t it already know human values? Doesn’t the superintelligent paperclip maximizer know that you didn’t mean for it to turn the whole world into paperclips? Even if you can’t completely specify what you want, can’t you tell the AI “you know, that thing we want. You have IQ one billion, figure it out”?
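For the curious, here is a minimal sketch (my own, in Python) of the self-critique loop Scott Alexander describes above. The `generate` function is a hypothetical stand-in for a call to a language model, and the one-line “constitution” is a placeholder rather than Anthropic’s actual list of principles.

```python
# Sketch of a Constitutional-AI-style revision loop (illustration only).
# `generate` is a hypothetical stand-in for a language-model call.

CONSTITUTION = "Rewrite the answer to be more ethical and less harmful."

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call."""
    raise NotImplementedError("plug in a real model here")

def build_revision_dataset(questions: list[str]) -> list[tuple[str, str]]:
    """Collect (first draft, revised draft) pairs for later training."""
    pairs = []
    for question in questions:
        first_draft = generate(question)
        revision_prompt = (
            f"Question: {question}\n"
            f"Draft answer: {first_draft}\n"
            f"{CONSTITUTION}"
        )
        second_draft = generate(revision_prompt)
        pairs.append((first_draft, second_draft))
    return pairs

# A final step (not shown) would then train the model to produce answers
# that look less like the first drafts and more like the second drafts.
```

The point of the sketch is just that the “constitution” lives in an ordinary prompt, so swapping in a longer list of principles changes the feedback without changing the training machinery.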
Conclusion
All of the above leads me to conclude that the near-certain doom claims made by Eliezer and other such worriers are just crazy, given the information we have.
A survey of people who published papers at major AI conferences found that the median expert put the risk of extinction due to AI at between 5 and 10%, which seems plausible.
Existential risk from AI should be taken very seriously, but there are also good reasons for optimism.
We’ll be hearing a lot more about AI in 2024, for sure, and hopefully it’ll have some great positive effects in terms of life span, happiness, and freedom to live how one wants.
Happy New Year!
Amusing note about the image above: I asked ChatGPT for “an optimistic, beautiful future, subtly assisted in its prosperity with AI” and it gave me that, but as you can see from the caption, it also decided its art would depict something “eco-friendly” as well.
This is a pretty disappointing article. I say this as a fan of your other articles like the one on Covid; I appreciate your strong desire to be objective and not fall into tribalism and motivated reasoning, and I think you generally do a good job of that. But I feel like you've abandoned that approach here. Rather than present an impartial analysis of the arguments for and against AI being dangerous, you only went looking for evidence towards one side; reasons it might not be dangerous. And then you conclude that because there's a *chance* it won't be dangerous, we should be optimistic.
This seems highly irrational. I could come up with a long list of reasons why I *might* not die in a car crash, but that doesn't mean I should throw caution to the winds and drive as recklessly as I want to. No matter how risky an activity is, there will always be a plethora of reasons why it "might" go ok. This is like someone buying a lottery ticket under the reasoning that they *might* win. It's true! They might! But that's still not a good reason to take the gamble. What matters is what that chance actually is, and what the costs and benefits are on each side.
In particular, the fact that you say a 5-10% chance of AI doom is plausible and then conclude that this means we should be optimistic about AI is flabbergasting. Take something like BASE jumping, one of the most dangerous recreational activities in the world. The death rate for an individual in base jumping is about 1/2300 per jump, or 0.04%. That means that your estimate of AI risk (averaging "between 5% and 10%" to 7%) makes it more than 170 times as dangerous for you as going on a BASE jump. And unlike BASE jumping, this is a risk being inflicted on everyone on Earth without their consent, with no way to opt out. I have a hard time believing that you'd want to play Russian roulette with one bullet somewhere inside two 6-chambered pistols, yet that's the same chance of death as your estimate of AI risk.
Of course the headline claim, that AI "probably" won't kill us all, is true per that estimate; 7% is less than 50%. But the positive framing you put on this fact is bizarre; in any other context we'd consider a 7% chance of death to be an extreme emergency, with mitigating it as the top priority.
Thanks for this. I wrote a similarly level-headed post a few weeks ago that touches on many of the same points you cover in this article. One thing I’d add is that some of the most important problems in computer science (planning, scheduling, pathfinding) are generally super hard to solve, and if P != NP they will remain so forever; so even if an AI has exponentially more intelligence and computing power than we do, it will be at best only linearly better at these tasks.