Governing AI & Evaluating Danger
We might need to shut it all down: AI governance seems more important than ever, and technical research is being challenged. Welcome to this week's update! We've renamed our newsletter the AI Safety Digest (AISD) and will be making a few changes over the next few weeks, so...
A Self-Replicating GPT-4!
What a week.
There was already a lot to cover when I came in to work on Monday: I was planning a special feature on the Japan Alignment Conference 2023 and had watched all of its recordings. Then GPT-4 came out yesterday, and all my group chats began buzzing.
So in this...
Interpretability on Reinforcement Learning and Language Models
This week, we take a look at interpretability applied to a Go-playing neural network, glitchy tokens, and the opinions and actions of top AI labs and entrepreneurs.
Watch this week's MLAISU on YouTube or listen to it as a podcast.
Research...
Failures of language models
Welcome to this week’s ML & AI safety update, where we look at Bing going bananas, see that certification mechanisms can be exploited, and find in our latest hackathon results that scaling oversight seems like a solvable problem.
Watch this week's MLAISU on YouTube or...
Will Microsoft and Google start an AI arms race?
We would not be an AI newsletter without covering the past week’s releases from Google and Microsoft, but we will use the chance to introduce the concept of AI race dynamics and explain why researchers are getting more cynical.
Watch this week's MLAISU on...
Extreme AI Risk
In this week's newsletter, we explore the alignment of modern large models and examine criticisms of extreme AI risk arguments. And of course, don't miss the opportunities we've included at the end!
Watch this week's MLAISU on YouTube or listen to it on...
Was ChatGPT a good idea?
In this week’s ML & AI Safety Update, we hear Paul Christiano’s take on one of OpenAI’s main alignment strategies, dive into the second-round winners of the Inverse Scaling Prize, and share the many fascinating projects from our mechanistic interpretability hackathon. And...
Compiling code to neural networks?
Welcome to this week’s ML & AI Safety Report, where we dive into overfitting and look at a compiler for Transformer architectures! This week's issue is a bit short because the mechanistic interpretability hackathon starts today; sign up at ais.pub/mechint and join...
Robustness & Evolution
Welcome to this week’s ML Safety Report, where we talk about robustness in machine learning and the human-AI dichotomy. Stay until the end to check out several exciting competitions you can join today.
Watch this week's MLAISU on YouTube or listen to it on...
AI Improving Itself
Over 200 research ideas for mechanistic interpretability, ML improving ML, and the dangers of aligned artificial intelligence. Welcome to 2023, and a happy New Year from us at the ML & AI Safety Updates!
Watch this week's MLAISU on YouTube or listen to it on Spotify.
Mechanistic...