Apart Research

Featured Post

What a Week! GPT-4 & Japanese Alignment

A Self-Replicating GPT-4! What a week. There was already plenty to cover when I came in to work on Monday, having watched all the recordings from the Japan Alignment Conference 2023 for a planned special feature. Then GPT-4 came out yesterday and all my group chats began buzzing. So in this...
17 days ago • 5 min read

Perspectives on AI Safety

Interpretability on Reinforcement Learning and Language Models: This week, we take a look at interpretability applied to a Go-playing neural network, glitchy tokens, and the opinions and actions of top AI labs and entrepreneurs. Watch this week's MLAISU on YouTube or listen to it as a podcast. Research...
26 days ago • 4 min read

Bing Wants to Kill Humanity W07

Failures of language models: Welcome to this week’s ML & AI safety update, where we look at Bing going bananas, see that certification mechanisms can be exploited, and find that scalable oversight looks like a solvable problem, judging by our latest hackathon results. Watch this week's MLAISU on YouTube or...
about 1 month ago • 4 min read

Will Microsoft and Google start an AI arms race? W06

Will Microsoft and Google start an AI arms race? We would not be an AI newsletter if we did not cover the past week’s releases from Google and Microsoft, but we will use the chance to introduce the concept of AI race dynamics and explain why researchers are growing more cynical. Watch this week's MLAISU on...
about 2 months ago • 3 min read

Extreme AI Risk W05

Extreme AI Risk: In this week's newsletter, we explore the alignment of modern large models and examine criticisms of extreme AI risk arguments. Of course, don't miss the opportunities we've included at the end! Watch this week's MLAISU on YouTube or listen to it on...
about 2 months ago • 2 min read

Was ChatGPT a good idea? W04

Was ChatGPT a good idea? In this week’s ML & AI Safety Update, we hear Paul Christiano’s take on one of OpenAI’s main alignment strategies, dive into the second-round winners of the Inverse Scaling Prize, and share the many fascinating projects from our mechanistic interpretability hackathon. And...
2 months ago • 3 min read

Compiling code to neural networks? W03

Compiling code to neural networks? Welcome to this week’s ML & AI Safety Report, where we dive into overfitting and look at a compiler for Transformer architectures! This week’s issue is a bit short because the mechanistic interpretability hackathon starts today – sign up at ais.pub/mechint and join...
2 months ago • 2 min read

Robustness & Evolution W02

Robustness & Evolution: Welcome to this week’s ML Safety Report, where we talk about robustness in machine learning and the human-AI dichotomy. Stay until the end to check out several amazing competitions you can participate in today. Watch this week's MLAISU on YouTube or listen to it on...
3 months ago • 4 min read

Hundreds of research ideas! W01

AI Improving Itself: Over 200 research ideas for mechanistic interpretability, ML improving ML, and the dangers of aligned artificial intelligence. Welcome to 2023 and a happy New Year from us at the ML & AI Safety Updates! Watch this week's MLAISU on YouTube or listen to it on Spotify. Mechanistic...
3 months ago • 5 min read

Will machines ever rule the world? MLAISU W50

Will machines ever rule the world? Hopes and fears of the current AI safety paradigm, GPU performance predictions, and popular literature on why machines will never rule the world. Welcome to the ML & AI Safety Update! Watch this week's episode on YouTube or listen to the audio version here. Hopes...
4 months ago • 4 min read