Understanding large models
An important task in making future machine learning systems safe is learning how to measure, monitor and understand the safety of these large models.
This past week brought a couple of interesting examples of work that helps us in this direction, in addition to last week’s wonderful inverse scaling examples.
These are but a few good examples of work that investigates how we can scale our alignment understanding to larger systems. You can join us next weekend for the ScaleOversight hackathon to contribute to this growing field and meet amazing people who share the passion for ML safety around the world!
Hardcore AGI doom
We also shift our focus slightly from the technical aspects of AI alignment research to a thought-provoking article by Nuño Sempere. The piece addresses the alarmist views regarding the imminent dangers of artificial general intelligence (AGI).
Sempere critiques the notion of a severe short-term risk from AGI, such as an 80% chance of human extinction by 2070, stating that these claims are based on flawed reasoning and imperfect concepts. He also highlights the lack of proper presentation of the cumulative evidence against such extreme risks.
On this topic, renowned philosopher Luciano Floridi appeared on this week’s ML Street Talk podcast. Floridi recently published an article expressing his distrust of both those who believe in a rapid intelligence explosion and those who dismiss the risks of AI. He stresses the importance of preserving human dignity and argues that the concept of AI having agency (“able to think”) is not actually relevant to the conversation about risk.
Of course, there are still many risks from AI, especially in the longer term. We recommend that you read Eliezer Yudkowsky’s list of ways AGI can go wrong. Here, he argues that we need 100% safe solutions, that we cannot “just train AI on good actions”, and that current efforts are not attacking the right problems.
In other research news…
In the opportunities area, we have…
Thank you for joining us in this week’s ML and AI safety update!