Compiling code to neural networks? W03

published 8 months ago
2 min read


Welcome to this week’s ML & AI Safety Report, where we dive into overfitting and look at a compiler for Transformer architectures! This week’s edition is a bit short because the mechanistic interpretability hackathon starts today – sign up and join the Discord.

Watch this week's MLAISU on YouTube or listen to it on Spotify.

Superpositions & Transformers

In a recent Anthropic paper, the authors find that overfitting corresponds to a model’s neurons storing individual data points instead of features. This mostly happens early in training and when there is little training data.

In their experiment, they use a very simple model (a so-called toy model), which is useful for studying isolated phenomena in detail. In some of the visualizations, they train it on 2D data with T training examples. As seen below, the feature activations (blue) look very messy, while the activations for the data points (red) look very clean.

Going deeper into the paper, they find that this generalizes to larger dimensions (10,000D). The transition from overfitting on a small dataset’s data points to generalizing to the actual data features also seems to explain the famous double descent phenomenon, where a model’s performance first dips and then improves again.
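To build intuition for what “storing data points instead of features” means, here is a minimal pure-Python sketch – an illustrative analogy, not the paper’s actual training setup. If a model has at least as many hidden directions as training points, it can act as a lookup table: each data point gets its own hidden direction, giving perfect reconstruction of the training set without learning any of the underlying features.

```python
# Illustrative sketch (not the Anthropic paper's model): an "autoencoder"
# that memorizes its T training points by giving each one its own
# hidden direction (a one-hot code), rather than learning features.

def memorizing_autoencoder(data):
    T = len(data)

    def encode(x):
        # One-hot hidden code: which stored data point is this?
        return [1.0 if x == d else 0.0 for d in data]

    def decode(h):
        # Weighted sum of the stored data points.
        dim = len(data[0])
        return [sum(h[i] * data[i][j] for i in range(T)) for j in range(dim)]

    return encode, decode

# Perfect recall on the training set, zero generalization machinery.
points = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5]]
enc, dec = memorizing_autoencoder(points)
assert all(dec(enc(p)) == p for p in points)
```

The point of the analogy: such a model achieves zero training loss yet has learned nothing transferable, which is the regime the paper associates with overfitting on small datasets.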

And on the topic of toy models, DeepMind has released Tracr, a compiler that turns human-readable RASP code into a Transformer. This can be useful for studying how algorithms are represented inside Transformers and for studying the phenomena of learned algorithms in depth.
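To give a flavour of what RASP programs look like, here is a pure-Python sketch of its two core primitives – select (build a boolean attention pattern from a predicate) and aggregate (average the selected values) – used to write the classic “reverse” program. This is an illustrative re-implementation, not the actual Tracr API.

```python
# Illustrative sketch of RASP's core primitives in plain Python
# (not the Tracr API itself).

def select(keys, queries, predicate):
    """Boolean attention matrix: sel[q][k] = predicate(keys[k], queries[q])."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(sel, values):
    """For each query position, average the values at selected key positions."""
    out = []
    for row in sel:
        chosen = [v for v, s in zip(values, row) if s]
        out.append(sum(chosen) / len(chosen) if chosen else None)
    return out

def reverse(tokens):
    """RASP 'reverse': position i attends to position len - 1 - i."""
    n = len(tokens)
    indices = list(range(n))
    flipped = select(indices, indices, lambda k, q: k == n - 1 - q)
    return aggregate(flipped, tokens)

print(reverse([1, 2, 3]))  # [3.0, 2.0, 1.0]
```

Tracr takes programs written in this select/aggregate style and compiles them into concrete Transformer weights, so researchers know exactly which algorithm the resulting network implements.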

Other research news

In other news…

  • Demis Hassabis, the CEO of DeepMind, is warning the world about the risks of artificial intelligence in a new Time piece. He argues that the wealth arising from artificial general intelligence (AGI) should be redistributed across the population and that we need to make sure it does not fall into the wrong hands.
  • Another piece reveals that OpenAI contracted Sama to have Kenyan workers annotate toxic content for ChatGPT and undisclosed graphical models at wages of less than $2/hour ($0.50/hour average in Nairobi), with reports of employee trauma from the explicit and graphic annotation work, union busting, and false hiring promises. A serious issue.
  • Jesse Hoogland releases an exciting piece exploring why and how neural networks generalize.
  • Neel Nanda shares more ideas for his 200 ideas in Mechanistic Interpretability.
  • Hatfield-Dodds from Anthropic shares reasons for hope in AI and claims that a high confidence in doom is unjustified.


For this week’s opportunities, this awesome new website will help you find the best events to join across the world:

Thank you for joining this week’s MLAISU and we’ll see you next week!


Dokk21, Filmbyen 23, 2. tv, Aarhus, 8000

Apart Research

We share newsletters about the progress of ML safety, run fun ML safety hackathons and develop the collaborative research platform AI Safety ideas.
