
Apart Research

We share newsletters about the progress of ML safety, run fun ML safety hackathons, and develop the collaborative research platform AI Safety Ideas.

Featured Post

Governing AI & Evaluating Danger

Governing AI & Evaluating Danger We might need to shut it all down: AI governance seems more important than ever, and technical research is being challenged. Welcome to this week's update! We've renamed our newsletter the AI Safety Digest (AISD) and will make a few changes over the next few weeks, so prepare for those. Watch or listen to this week's episode on YouTube or podcast. Stop AGI Development "We need to shut it all down." This is the wording in a new Time Magazine article where Eliezer...
8 months ago • 4 min read

What a Week! GPT-4 & Japanese Alignment

A Self-Replicating GPT-4! What a week. There was already a lot to cover when I came in for work on Monday: I was going to do a special feature on the Japan Alignment Conference 2023 and had watched all their recordings. Then GPT-4 came out yesterday and all my group chats began buzzing. So in this week's MLAISU, we're covering the latest technical safety developments around GPT-4, looking at Anthropic's safety strategy, and exploring the fascinating Japan Alignment Conference. Watch this week's...
9 months ago • 5 min read

Perspectives on AI Safety

Interpretability on Reinforcement Learning and Language Models This week, we take a look at interpretability applied to a Go-playing neural network, glitchy tokens, and the opinions and actions of top AI labs and entrepreneurs. Watch this week's MLAISU on YouTube or listen to it as a podcast. Research updates We'll start with the research-focused updates from the past couple of weeks. First off, Haoxing Du and others used interpretability methods to analyze how Leela Zero, a Go-playing neural...
9 months ago • 4 min read

Bing Wants to Kill Humanity W07

Failures of language models Welcome to this week's ML & AI safety update, where we look at Bing going bananas, see that certification mechanisms can be exploited, and learn that scalable oversight seems like a solvable problem, judging from our latest hackathon results. Watch this week's MLAISU on YouTube or listen to it as a podcast. Bing wants to kill humanity Microsoft has released Bing AI, a search engine powered by a ChatGPT-like model. Many test users have found it very useful, but many people have found...
10 months ago • 4 min read

Will Microsoft and Google start an AI arms race? W06

Will Microsoft and Google start an AI arms race? We would not be an AI newsletter without covering the past week's releases from Google and Microsoft, but we will use this chance to introduce the concept of AI race dynamics and explain why researchers are getting more cynical. Watch this week's MLAISU on YouTube or listen to it on Spotify. Understanding Race Dynamics This week, Microsoft debuted its updated version of Bing, heavily reliant on OpenAI's GPT-4, the latest state-of-the-art language...
10 months ago • 3 min read

Extreme AI Risk W05

Extreme AI Risk In this week's newsletter, we explore the alignment of modern large models and examine criticisms of extreme AI risk arguments. Of course, don't miss out on the opportunities we've included at the end! Watch this week's MLAISU on YouTube or listen to it on Spotify. Understanding large models An important task for our work in making future machine learning systems safe is to understand how we can measure, monitor, and evaluate these large models' safety. This past week...
10 months ago • 2 min read

Was ChatGPT a good idea? W04

Was ChatGPT a good idea? In this week's ML & AI Safety Update, we hear Paul Christiano's take on one of OpenAI's main alignment strategies, dive into the second-round winners of the Inverse Scaling Prize, and share the many fascinating projects from our mechanistic interpretability hackathon. And stay tuned until the end for some unique opportunities in AI safety! Watch this week's MLAISU on YouTube or listen to it on Spotify. Reinforcement learning from human feedback Reinforcement learning...
10 months ago • 4 min read

Compiling code to neural networks? W03

Compiling code to neural networks? Welcome to this week's ML & AI Safety Report, where we dive into overfitting and look at a compiler for Transformer architectures! This week is a bit short because the mechanistic interpretability hackathon is starting today – sign up on ais.pub/mechint and join the Discord. Watch this week's MLAISU on YouTube or listen to it on Spotify. Superposition & Transformers In a recent Anthropic paper, the authors find that overfitting corresponds to the neurons...
11 months ago • 2 min read

Robustness & Evolution W02

Robustness & Evolution Welcome to this week’s ML Safety Report where we talk about robustness in machine learning and the human-AI dichotomy. Stay until the end to check out several amazing competitions you can participate in today. Watch this week's MLAISU on YouTube or listen to it on Spotify. Robust Models Robustness is a crucial aspect of ensuring the safety of machine learning systems. A robust model is better able to adapt to new datasets and is less likely to be confused by unusual...
11 months ago • 4 min read

Hundreds of research ideas! W01

AI Improving Itself Over 200 research ideas for mechanistic interpretability, ML improving ML and the dangers of aligned artificial intelligence. Welcome to 2023 and a happy New Year from us at the ML & AI Safety Updates! Watch this week's MLAISU on YouTube or listen to it on Spotify. Mechanistic interpretability The interpretability researcher Neel Nanda has published a massive list of 200 open and concrete problems in mechanistic interpretability. They’re split into the following...
11 months ago • 5 min read