Hopes & Fears of AI Safety
Karnofsky released an article on his Cold Takes blog describing his optimistic take on how current methods might lead to safe AGI.
At the same time, Christiano writes a reminder that AI alignment research is distinct from applied alignment: updating models to be inoffensive will not lead to safe artificial general intelligence, only to safer short-term systems such as ChatGPT. Steiner writes a counter-post on the usefulness of working on applied alignment as well.
Relatedly, Shlegeris publishes a piece exploring whether reinforcement learning from human feedback (RLHF) is a good approach to alignment. He addresses questions such as whether RLHF is better than alternative methods that achieve the same capabilities (yes), whether it has been net positive (yes), and whether it is useful for alignment research (yes).
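For readers new to RLHF, the reward-modeling step at its core can be sketched with the Bradley-Terry-style pairwise loss commonly used to train reward models on human comparisons. This is a minimal illustration, not code from any of the posts above; all numbers are made up:

```python
import numpy as np

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise preference loss for a reward model: the model should
    score the human-preferred response higher than the rejected one.
    Computes mean(-log sigmoid(r_chosen - r_rejected)) in a stable form."""
    return float(np.mean(np.log1p(np.exp(-(r_chosen - r_rejected)))))

# Toy reward-model scores for three comparison pairs
chosen = np.array([2.0, 1.5, 3.0])    # scores for human-preferred responses
rejected = np.array([0.5, 1.0, 0.0])  # scores for rejected responses

print(preference_loss(chosen, rejected))  # low loss: ranking matches humans
print(preference_loss(rejected, chosen))  # high loss: ranking is inverted
```

Minimizing this loss pushes the reward model to reproduce human judgments, which the policy is then optimized against.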
The alternative perspective is well covered in Steiner’s piece this week on why RLHF / IDA / Debate won’t solve outer alignment. In short, these methods do not optimize for truth or safety; they optimize for getting humans to “click the approve button”, which can lead to many failures down the road.
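The Goodhart-style failure Steiner describes can be illustrated with a toy simulation (my own construction, with made-up weights): when approval is only partly correlated with the property we actually care about, the answer that maximizes approval is usually not the answer that maximizes that property.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each candidate answer has two latent traits (both uniform in [0, 1])
truthfulness = rng.uniform(0, 1, size=1000)
persuasiveness = rng.uniform(0, 1, size=1000)

# What we actually want: truthful answers
true_score = truthfulness
# What the "approve button" rewards: a mix weighted toward persuasiveness
# (the 0.3 / 0.7 weights are purely illustrative)
approval = 0.3 * truthfulness + 0.7 * persuasiveness

best_by_approval = int(np.argmax(approval))
best_by_truth = int(np.argmax(true_score))

# The approval-maximizing answer is typically less truthful
print(truthfulness[best_by_approval], truthfulness[best_by_truth])
```

The gap between the two selected answers is the proxy failure: optimizing harder on approval does not close it.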
GPU Performance Predictions
Hobbhahn and Besiroglu of Epoch AI, the leading AI forecasting organization, have released a comprehensive report on how GPU performance will develop over the next 30 years.
They use a model that combines the empirical relationship between GPU performance and its hardware features with projections of how those features improve as transistors shrink. They expect GPU performance to hit a theoretical peak of about 1e15 FLOP/s (floating-point operations per second) before 2033.
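The shape of such a forecast can be sketched as exponential performance growth running into a physical ceiling. The starting performance and doubling time below are rough assumptions of mine, not Epoch's fitted parameters; only the 1e15 FLOP/s limit comes from the report:

```python
import math

start_year = 2022
start_perf = 1e14    # FLOP/s of a top GPU today (rough order of magnitude, assumed)
doubling_time = 2.5  # years per performance doubling (assumed)
limit = 1e15         # theoretical peak cited in the Epoch report

# Years of exponential scaling needed to reach the limit
years = doubling_time * math.log2(limit / start_perf)
print(f"limit reached around {start_year + years:.0f}")  # ≈ 2030, before 2033
```

Under these assumed parameters the ceiling arrives around 2030, consistent with the report's before-2033 claim; different starting points or doubling times shift the date but not the basic picture.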
I also chatted with a few GPU researchers at NeurIPS, and their take was that computing power will hit a peak, making AGI near-impossible. In their view, the newer accelerators from Google and Tesla are not necessarily better; they mainly avoid NVIDIA’s roughly 4x markup on GPU prices.
This brings some hope to those worried about AGI being developed soon. Ajeya Cotra’s estimate of ~1e29 FLOP required for artificial general intelligence, based on the computation done by a human during a lifetime, seems even further out of reach given the hardware ceiling in the Epoch report. Read her estimates in the first part of her wonderful transformative AI forecasting report.
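To make the gap concrete: reading the ~1e29 figure as a total compute requirement (FLOP) and the ~1e15 figure as a per-device rate (FLOP/s), a quick back-of-the-envelope calculation shows the scale involved. This is my own arithmetic on the two cited numbers, nothing more:

```python
total_flop = 1e29        # Cotra-style lifetime-anchor estimate cited above
gpu_rate = 1e15          # theoretical peak FLOP/s per GPU from the Epoch report
seconds_per_year = 3.15e7

one_gpu_years = total_flop / gpu_rate / seconds_per_year
print(one_gpu_years)        # ~3 million years on a single peak-performance GPU
print(one_gpu_years / 1e6)  # ~3 years with a cluster of a million such GPUs
```

So even at the theoretical hardware peak, the estimate implies either geological timescales on one device or very large clusters running for years.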
“Why Machines Will Never Rule the World”
In the spirit of predicting how capable AGI will be, Machine Learning Street Talk, the hugely popular machine learning podcast, has interviewed Walid Saba about his review of the August book “Why Machines Will Never Rule the World” by Landgrebe and Smith.
The book’s central argument is that artificial general intelligence is impossible for mathematical reasons: the human brain is a complex dynamical system, and the authors argue that systems of this sort cannot be modeled by modern neural network architectures, or by computers at all, because training data is inherently limited to samples of past behavior.
These arguments are in line with Searle’s 1980 Chinese room argument and Penrose’s non-computability argument based on Gödel’s incompleteness theorems. Walid Saba’s review of the book is generally positive. I personally disagree with the arguments, since we do not need to model the brain’s complex dynamics analytically; we just need to replicate it in a simulator.
Nevertheless, it is an interesting discussion about whether AGI is possible.
In other news…
There are some exciting winter opportunities this week! Again, thank you to AGISF for sharing opportunities in the space.
This has been the ML & AI safety update. We will take a break for two weeks over Christmas but then be back with more wonderful hackathons and ML safety updates. See you then!