A Self-Replicating GPT-4!
What a week.
There was already plenty to cover when I came in to work on Monday: I had planned a special feature on the Japan Alignment Conference 2023 and had watched all of its recordings. Then GPT-4 came out yesterday, and all my group chats began buzzing.
So in this week's MLAISU, we're covering the latest technical safety developments around GPT-4, looking at Anthropic's safety strategy, and recapping the fascinating Japanese alignment conference.
GPT-4: Capability & Safety
OpenAI also writes (in section 2.9 of the report) that the model shows increasingly independent behavior, seemingly mimicking some of the risks we associate with uncontrollable AI, such as power-seeking and agenticness (the tendency to act as a goal-directed entity, possibly pursuing objectives independent of its users' preferences).
The Alignment Research Center also describes an experiment in the report in which they gave GPT-4 access to its own computer, some money, and abilities such as delegating tasks to copies of itself and running code. This was done to test its ability to self-replicate, a major fear among many machine learning practitioners.
They also collaborated with many outside safety researchers to "red team" the model, i.e. probe GPT-4 for safety failures. The report explicitly states that participation does not imply endorsement of OpenAI's strategy, but the gesture toward safety is very positive.
Additionally, OpenAI does not share its training methods, citing safety concerns, though it seems just as likely that this is due to competitive pressure from other AI development companies (read more on race dynamics).
GPT-4 is seemingly safer while being significantly more capable
Anthropic & Google's response
On the same day, Anthropic published a post announcing expanded availability of Claude, their ChatGPT competitor. It uses the "constitutional AI" approach, which essentially means that the AI evaluates and revises its own outputs against a written ruleset (the constitution) on top of learning from human preferences.
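To make the idea concrete, here is a minimal toy sketch of that critique-and-revision loop. All names here are hypothetical, and `call_model` is a trivial stub standing in for a real LLM API call; this illustrates the control flow, not Anthropic's actual implementation.

```python
# Toy sketch of a constitutional-AI-style critique-and-revision loop.
# `call_model` is a stub standing in for a real LLM API call.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and helpful.",
]

def call_model(prompt: str) -> str:
    # Stub: a real implementation would query an LLM here.
    if prompt.startswith("Rewrite"):
        return "Here is a revised, more careful answer."
    if prompt.startswith("Critique"):
        return "The draft could be more careful about potential harms."
    return "Here is a first draft answer."

def constitutional_revision(user_prompt: str) -> str:
    """Draft an answer, then critique and rewrite it once per principle."""
    draft = call_model(user_prompt)
    for principle in CONSTITUTION:
        critique = call_model(
            f"Critique this response using the principle: {principle}\n"
            f"Response: {draft}"
        )
        draft = call_model(
            f"Rewrite the response to address this critique: {critique}\n"
            f"Response: {draft}"
        )
    return draft
```

In the real method, the revised outputs produced by loops like this are then used as training data, so the final model internalizes the constitution rather than running the loop at inference time.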
They also published their approach to AI safety. Anthropic writes that AI will probably transform society and that we don't yet know how to make these systems behave well consistently. They take a multi-faceted, empirical approach to the problem.
This is based on their goal of developing (I) better safety techniques for AI systems and (II) better ways of identifying how safe or unsafe a system is. They outline three possible AI safety scenarios: (a) that alignment is easy to solve or not a real problem, (b) that it might lead to catastrophic risks and is very hard to solve, and (c) that it is near-impossible to solve. Their hopes and most of their work are aimed at scenarios (a) and (b).
Additionally, Google joins the chatbot API competition by releasing their PaLM language model as an API. Generally, Google seems to be lagging behind despite their research team having kickstarted large language model research, which looks like a major business failure but might be good for AI safety. Meanwhile, the AGI company adept.ai recently raised $350 million to build AI that can interact with anything on your computer.
Japanese Alignment Research
I watched all six hours of talks and discussions so you don't have to! The Japan Alignment Conference 2023 was a two-day conference in Tokyo that Conjecture held in collaboration with Araya, inviting researchers to think about alignment.
It opened with a conversation with Jaan Tallinn, who encouraged the Japanese researchers to join the online alignment discussions, followed by an introduction to the alignment problem. Connor Leahy and Eliezer Yudkowsky held a Q&A discussion, and Siméon Campos gave a great introduction to how AGI governance might slow down AGI development. Jan Kulveit also gave strong presentations on active inference and AI alignment, along with his expectation of "cyborg periods" between now and superintelligence.
Focusing on the talks from the Japanese side, we saw some quite interesting perspectives on alignment.
Hopefully, the Japan Alignment Conference will represent some first steps towards collaborating with the great robotics and neuroscience talent in Japan!
There are many job opportunities available right now, with some great ones at top university AI alignment labs: an alignment postdoctoral researcher at the University of Chicago, an alignment postdoc at NYU, a policy research assistant at the University of Cambridge, and a collaborator with CHAI at UC Berkeley.
And come join our writing hackathon on AI governance, happening virtually and in person across the world next weekend, March 24th to 26th. Emma Bluemke and Michael Aird will be keynote speakers, and we have judges and cases from OpenAI, the Existential Risk Observatory, and others.
You can participate for the whole weekend or just a few hours and get the chance to engage with exciting AI governance thinking, both technical and political; get reviews from top researchers and active organizations; and win large prizes.
And before then, we'll see you next week for the ML & AI Safety Update!