Why AI might not be an existential risk to humanity W42

Published 7 months ago

Welcome to the ML Safety Report. Watch this week's episode on YouTube or listen to it on our podcast.

This week, we’re looking at counterarguments to the basic case for why AI is an existential risk to humanity, examining why strong AI might arrive very soon, and sharing interesting papers.

But first a small note: You can now subscribe to our newsletter and listen to these updates in your favorite podcasting app. Check out newsletter.apartresearch.com and podcast.apartresearch.com.

Today is October 20th and this is the ML Safety Progress Update!

AI X-risk counterarguments

Existential risk from AI does not seem overwhelmingly likely, according to Katja Grace from AI Impacts. She has written a long article arguing against the major perspectives on how AI can become very dangerous, while noting that enough uncertainty remains for AI safety to be a relevant concern.

Her counterarguments target the three main premises of the case for why superintelligent AI will become an existential risk: 1) superhuman AI systems will be goal-directed, 2) goal-directed AI systems’ goals will be bad, and 3) superhuman AI will overpower humans.

She argues that AI systems might not be goal-directed because many highly functional systems can be “pseudo-agents”: models that do not pursue utility maximization but instead optimize for a range of sub-goals to be met. Additionally, the degree of goal-directedness a system needs before it poses a risk is extremely high.

Her arguments for why goal-directed AI systems’ goals might not be bad are: 1) Even evil humans broadly share human values, and slight deviations from the optimal policy seem acceptable. 2) AI might simply learn the correct thing from its dataset, since humans also seem to get their behavior from the diverse training data of the world. 3) Deep learning seems very good at learning fuzzy things from data, and values seem learnable in roughly the same way as generating faces (we don’t see generated faces without noses, for example). 4) AIs that learn short-term goals will both be highly functional and have a low chance of optimizing for dangerous long-term goals such as power-seeking.

Superhuman AI might also not overpower humans because: 1) A genius human in the Stone Age would have a much harder time getting to space than a human of average intelligence today, which shows that intelligence is a much more nuanced concept than we make it out to be. 2) AI might not be better than human-AI combinations. 3) AI will need our trust to take over critical infrastructure. 4) Many properties other than intelligence seem highly relevant. 5) Many goals do not end in taking over the universe. 6) Intelligence feedback loops can run at many speeds, and you need a lot of confidence that they are fast to say they lead to doom. 7) Key concepts in the literature are quite vague, meaning that we lack an understanding of how they will lead to existential risk.

Erik Jenner and Johannes Treutlein respond to her counterarguments. Their main point is that there is good evidence that the difference between AI and humans will be large, and that we would need Grace’s slightly aligned AI to help us reach a state where we do not build much more capable and more misaligned systems.

Comprehensive AI Services (CAIS)

A relevant text in relation to these arguments is Eric Drexler’s attempt at reframing superintelligence into something more realistic in an economic world. He uses the term “AI services” to describe individual tasks that will be economically relevant. The “comprehensive” in “comprehensive AI services” corresponds to what we usually call “general”. The main point is that we will see many highly capable but specialized AI systems before we get a monolithic artificial general intelligence. We recommend reading the report if you have the time.

Strong AGI coming soon

At the opposite end of the spectrum from Grace, Porby shares why they think AGI will arrive within the next 20 years, with convincing arguments on 1) how easy the problem of intelligence is, 2) how immature current machine learning is, 3) how quickly we’ll reach the level of hardware needed, and 4) why we cannot look at current AI systems to predict future abilities.

Other news

  • A new survey published in Nature finds that non-expert users of AI systems think interpretability is important, especially in safety-critical scenarios. However, they prefer accuracy in most tasks.
  • Neel Nanda shares an opinionated reading of his favorite Circuits interpretability work.
  • A new method in reinforcement learning shows good results on both task performance and how moral the agent’s actions are. The authors take a text-based game and train a reinforcement learning agent with both a task policy and a moral policy.
  • Wentworth notes how prediction markets might be useful for alignment research.
  • DeepMind has given a language model access to a physics simulation to increase its physics reasoning ability.
  • Nate Soares describes why superintelligent beings would not necessarily leave humans alive on game-theoretic grounds.
  • A new research agenda in AI safety seeks to study the theory of deep learning using a pragmatic approach to understand key concepts.


And now, diving into the many opportunities available for all interested in learning and doing more ML safety research!

This has been the ML Safety Progress Update and we look forward to seeing you next week!


Apart Research

We share newsletters about the progress of ML safety, run fun ML safety hackathons and develop the collaborative research platform AI Safety ideas.
