Governing AI & Evaluating Danger


We might need to shut it all down; AI governance seems more important than ever, and technical research is being challenged. Welcome to this week's update! We've renamed our newsletter the AI Safety Digest (AISD) and will make a few changes over the next few weeks, so prepare for those.

Watch or listen to this week's episode on YouTube or as a podcast.

Stop AGI Development

"We need to shut it all down." This is the wording in a new Time Magazine article where Eliezer Yudkowsky urges us to stop the development towards artificial general intelligence completely before it's too late.

He refers to a recent open letter, signed by over 1,800 AI researchers and experts, urging the world to pause the training of models larger than GPT-4 for at least 6 months. The letter is drawing criticism from several directions: some say it does not take existential risk seriously enough, while others call it needlessly alarmist.

Public perception of the letter has suffered from Elon Musk's controversial inclusion as a signatory, and many people seem not to have read it at all, assuming it calls for banning all AI research when, as described above, it clearly does not.

Meanwhile, the criticism that it does not focus enough on existential risk seems to overlook its positive effect on public discourse. Nearly everyone in the research field has been interviewed about the letter, and it represents a great leap forward for the conversation on AI safety.

Alongside the release of the letter, the Center for AI and Digital Policy (CAIDP) filed a complaint with the FTC about OpenAI's release of GPT-4. If this leads to an FTC investigation, we might end up with stronger government oversight of future releases of large AI systems.

AI Governance Ideathon

In the context of this letter, we held the AI governance ideathon this past weekend. More than 120 people participated from all 7 continents, with local jam sites on 6 of them. The submissions were excellent, and here we quickly summarize a few of them.

  • A proposal to implement data taxation won first prize. It presents a formula for taxing large model training runs such as GPT-4's without costing anything for smaller, narrow AI models. The method is also robust to most tax avoidance schemes.
  • Another submission dove deep into why AI governance is highly relevant in developing countries and why we want to make sure AI develops well there, especially in light of China's influence in, e.g., Africa and Southeast Asia.
  • We also saw a global coordination scheme for slowing down AGI by constructing an international oversight body that collaborates with and regulates countries and companies to steer them towards safer AI.
  • A technical project used GPT-4 to evaluate AI project proposals. Although the results were limited, it represents a first step towards automated auditing of AI projects.
  • The NAIRA proposal gives a detailed plan to establish a US agency, modeled on the Food and Drug Administration (FDA), to control AI development.
  • A market dynamics proposal suggests creating AI-based watchmen to lay the best groundwork for healthy competition between AIs, and it gives a good overview of the economics of AI safety.
  • Another submission proposes ranking companies by how safety-focused their activities are, which might be useful for public procurement contracts and for giving the public a clearer view of organizations involved in AGI development.
  • A Canadian team used GPT-4 to simulate different avatars, which led to great discussions about AI safety among Margrethe Vestager, Jack Sparrow, and various other simulated identities.
  • As ARC's evaluations are being developed, one proposal focuses on legislation to make them a requirement before large models are published.
  • In 1985, Europe introduced environmental impact assessments to ensure that development projects do not harm the environment excessively. The AI Impact Assessment proposal applies the same process to large model training runs.

You can read all the projects on the ideathon page or watch the award ceremony on our YouTube channel.

AI Safety Research?

With releases such as LangChain, the Zapier Natural Language Actions API and ChatGPT Plugins, we see higher risks emerging from connecting large language models to the internet in various ways. You can now even talk to your watch to ask GPT-4 to write code on GitHub for you!

With this level of progress, the most important advances we need in AI safety right now seem to be in evaluating and certifying how dangerous future models are, and in creating techniques specifically applicable to systems like large language models.

A good example of this is the Alignment Research Center's evaluations of language models' capability to break out of their digital confines. In a recent article, they expand on the work presented in the GPT-4 system card.

GPT-4 was given instructions on how to use internet tools and, with a scientist acting as its liaison to the web, ran on a cloud instance. It ended up hiring a TaskRabbit worker to solve CAPTCHAs and even talked the worker out of suspecting it was a robot by claiming it had poor eyesight.

Luckily, it was not capable of the long-term planning needed to escape, but we must remember that this was without additional tooling (e.g., Pinecone), and GPT-5 and GPT-6 are still to come. It is both an exciting and a scary time ahead!


With developments moving this fast, there are, of course, just as many opportunities in the space as usual! Here are some ways to get involved:

  • In a couple of weeks, you can join another interpretability hackathon, where we give you clear guidelines for doing exciting things with neural network interpretability, along with 48 hours and a deadline! Come along, either virtually or by hosting a local site. Join our Discord to stay up to date.
  • Come along for the launch event for the newly founded European Network for AI Safety, a decentralized organization for coordination across Europe.
  • The Stanford AI100 essay writing competition is still in progress and invites you to write about how you think AI will affect our lives in the future.
  • If you act quickly, you can join a course in information security with a former Google information security officer. The deadline is tomorrow!

Thank you for following along and we look forward to seeing you next time!


Dokk21, Filmbyen 23, 2. tv, Aarhus, 8000

Apart Research

We share newsletters about the progress of ML safety, run fun ML safety hackathons, and develop the collaborative research platform AI Safety Ideas.
