The Term Recursive Self-Improvement Is Often Used Incorrectly
The term Recursive Self-Improvement (RSI) now often seems to be used for any case in which AI automates AI R&D. I believe this is importantly different from its original meaning, and the difference changes some of the key consequences.
OpenAI has stated that their goal is recursive self-improvement, with projections of hundreds of thousands of automated AI R&D researchers by next year and fully automated AI researchers by 2028. This appears to be AI-automated AI research rather than RSI in the narrow sense.
When Eliezer Yudkowsky discussed RSI in 2008, he was referring specifically to an AI instance improving itself by rewriting the cognitive algorithm it is running on, what he described as “rewriting your own source code in RAM.” According to the LessWrong wiki, RSI refers to “making improvements on one’s own ability of making self-improvements.” However, current AI systems have no special insight into their own opaque inner workings. Automated R&D might mostly consist of curating data, tuning parameters, and improving RL environments to hill-climb evaluations, much as human researchers do.
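To make the contrast concrete, here is a toy sketch of what that outer-loop hill-climbing looks like from the outside: the researcher, whether human or automated, only sees a scalar eval score and perturbs training knobs, never touching the cognitive algorithm itself. This is not any lab’s actual pipeline; the names (train_and_evaluate, lr, data_mix) are illustrative stand-ins.

```python
# Toy illustration of black-box hill-climbing on an evaluation score.
# The optimizer sits outside the system being optimized and never inspects
# its internals; it only perturbs training knobs and keeps what scores better.
import random

def train_and_evaluate(config: dict) -> float:
    """Stand-in for an opaque train-then-eval pipeline; returns an eval score."""
    # Hypothetical scoring surface with an optimum at lr=3e-4, data_mix=0.7.
    return -abs(config["lr"] - 3e-4) - 0.1 * abs(config["data_mix"] - 0.7)

def hill_climb(config: dict, steps: int = 50) -> dict:
    best_score = train_and_evaluate(config)
    for _ in range(steps):
        candidate = dict(config)
        # Perturb one knob at random: learning rate or data-mixture weight.
        key = random.choice(["lr", "data_mix"])
        candidate[key] *= random.uniform(0.8, 1.25)
        score = train_and_evaluate(candidate)
        if score > best_score:  # keep only changes that improve the eval
            config, best_score = candidate, score
    return config

print(hill_climb({"lr": 1e-3, "data_mix": 0.5}))
```

The point of the sketch is that nothing in this loop requires, or produces, insight into the model’s internals; the process that does the improving is entirely separate from the cognition being improved.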
Eliezer concluded that RSI in this narrow sense would almost certainly lead to fast takeoff. The situation is more complex for AI-automated R&D, where the AI does not understand the internals of the system it is improving. I still expect AI-automated R&D to substantially speed up AI development.
Why This Distinction Matters
Eliezer described the critical transition as when “the AI’s metacognitive level has now collapsed to identity with the AI’s object level.” I believe he was basically imagining something like the human mind and evolution merging and pursuing the same goal: the process that designs the cognitive algorithm and the cognitive algorithm itself becoming one. As an example, imagine the model realizing that its working memory is too small for it to be effective at R&D and directly editing itself to expand it.
This appears less likely if the AI researcher is staring at a black box, whether of itself or of another model. The AI agent might understand that its working memory or coherence isn’t good enough, but that doesn’t mean it understands how to increase them. Without this self-transparency, I don’t think the merge Eliezer described would happen. It also becomes more likely that the process derails, for example with the next generation of AIs being designed starting to reward-hack the RL environments built by the less capable AIs of the previous generation.
The dynamics differ significantly:
True RSI: Direct self-modification with self-transparency and fast feedback loops → fast takeoff very likely
AI-automated research: Systems don’t understand what they are doing, slower feedback loops, potentially operating on other systems rather than directly on themselves
Alignment Preservation
This difference has significant implications:
True RSI: The AI likely understands how its preferences are encoded, potentially making goal preservation more tractable
AI-automated research: The AIs would also face alignment problems when building successors, with each successive generation potentially drifting further from original goals
Loss of Human Control
The basic idea that each new generation of AI will be better at AI research still stands, so we should still expect rapid progress. In both cases, the default outcome is eventually loss of human control and the end of the world.
Could We Still Get True RSI?
Probably eventually, e.g. through automated researchers discovering more interpretable architectures.
I think Eliezer expected AI that was at least somewhat interpretable by default; history played out differently. But he was still right to focus on AI improving AI as a critical concern, even if it’s taking a different form than he anticipated.
See also: Nate Soares has written about RSI in this narrow sense, and comments between Nate and Paul Christiano touch on this topic.
