The Practicalities of an Intelligence Explosion
Foundations of AI Dystopianism III: Self-Improvement (part 1)
One of the cornerstones of both AI Dystopian and AI Utopian thinking is that an Artificial General Intelligence system will inevitably improve itself into superintelligence, achieving God-like capabilities in the process. This has been referred to over the years as an intelligence explosion, the inevitable end result of creating an AGI system. This is a complex issue, and to even begin to address it, I've split this discussion into three parts. Part 2 will be in next week's post.
The first significant discussion of this possibility was in mathematician and computer science pioneer I.J. Good's 1965 paper Speculations Concerning the First Ultraintelligent Machine. In that paper, Good coined the term intelligence explosion, and the idea has been promulgated and widely discussed ever since. Computer scientist and science fiction author Vernor Vinge discussed it in his seminal 1993 presentation and paper on the technological singularity, in which it was a key component underlying the singularity itself.
One of the first papers to explore the concept in detail was computer scientist Steve Omohundro's 2008 paper The Basic AI Drives. Among the drives described as inherent in any AGI system, the drive to self-improvement took the number one spot. The discussion in the years since has stuck pretty close to Omohundro's conjectures, so it's worth considering the statements made in the paper and how reasonable the suppositions underlying those statements are.
Omohundro stated that any AGI system will de facto seek to improve itself, and it will do this not because it has some evolutionary drive or cultural tendencies similar to humans, but instead because it will be better able to achieve its goals the more intelligent it is. His belief, and the belief of AI Dystopians as well as AI Utopians, is that this self-improvement is not only possible but inevitable.
Omohundro wrote:
One kind of action a system can take is to alter either its own software or its own physical structure. Some of these changes would be very damaging to the system and cause it to no longer meet its goals. But some changes would enable it to reach its goals more effectively over its entire future. Because they last forever, these kinds of self-changes can provide huge benefits to a system. Systems will therefore be highly motivated to discover them and to make them happen. If they do not have good models of themselves, they will be strongly motivated to create them though learning and study. Thus almost all AIs will have drives towards both greater self-knowledge and self-improvement.
The complementary ideas of self-improvement as an imperative for any AGI system, the ability of the system to actually implement that self-improvement, and the self-improvement leading to explosive superintelligence are fundamental to the beliefs promoted by both AI Dystopians and AI Utopians. A vast body of speculation has been built on these three ideas, and extraordinary conclusions have been drawn from them. Some of these conclusions have been positive and most have been negative, yet the base concept itself has rarely been scrutinized.
So rather than simply accepting these ideas as self-evident and moving quickly into speculations based on them, let’s instead stop a moment to examine the validity of the basic conjecture of self-improving AGI itself.
First and foremost, let's examine the assumptions that are built into the above quote and into the fundamental idea that runaway AGI self-improvement is the inevitable result of creating AGI.
Assumption: The AGI system has knowledge of its software and hardware design or is able to engage in self-study to ascertain that knowledge
Would an AGI system have knowledge of its own code and physical structure? If not, could it simply contemplate itself and thus ascertain that knowledge?
For the first question, it seems that it would be fairly straightforward to design the system so that it had no cognitive knowledge of its own design and makeup if we felt that such knowledge might lead to dangerous outcomes. It also seems that it would be relatively easy to keep this knowledge inaccessible to the AGI system and unavailable on the Internet, the most obvious way being to store it on hardware that has no connection to either the AGI or the Internet.
An idea often offered as a counterargument to this is that the AGI would simply convince a human or humans to give it access to this knowledge. This possibility is discussed briefly in the following sections and will also be discussed further in the concluding post on this topic.
Let’s consider the second question. A human obviously can't just think really hard and by that alone gain knowledge of the brain’s structure and processes. The most we can do through self-reflection is examine our own feelings, thinking, and motivations — the end results of the brain’s functioning. Even in this we're limited and prone to error.
But no matter how deeply we look inward, we won't perceive the physical structure of our brains. We won't be able to determine the nature or even the existence of neurons and synapses, how they're configured and interact with each other, or how any of that leads to our deciding to self-reflect in the first place. Even if we were given knowledge of the human brain's structure and how it works, we wouldn't be able to determine the specific configuration and functioning of our own brain simply through self-reflection.
But that's us. Would a machine be different in this respect? Would it be able to probe itself and learn the internal secrets of its design?
We could potentially design an AGI system that had a highly developed self-examination ability, with senses so accurate that it could trace every nuance of its makeup. But would we do that? Given that this could be problematic, it would seem to be a case of the Bad Engineer fallacy to assume such a potentially dangerous design.
But perhaps these future designers would want the system to have such an ability for diagnosing and repairing problems. Even so, I would expect them to have at least as much knowledge and good sense as we have today in recognizing the potential problems. The system could be designed so that each subsystem had its own diagnostic capability that was incompatible with, and had no communication with, the other internal diagnostic systems. The diagnostics could be like the autonomic nervous system in animals, with the overall system having no control over or knowledge of them.
The diagnostic system could also be completely independent of the AGI system: a symbiotic yet functionally separate system running on completely incompatible operating and communication systems. In any case, these are just a few possibilities, and no doubt future engineers will think of better ones to keep the system from self-knowledge should that be deemed necessary.
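To make the separation concrete, here is a minimal sketch in Python, with entirely hypothetical names, of the kind of one-way arrangement described above: the cognitive subsystem is handed only a write-only telemetry emitter, while the diagnostic monitor sits behind it with no interface the cognitive side can query. This is a sketch of the interface shape only; real isolation would put the monitor in separate hardware, as the paragraph above suggests.

```python
class DiagnosticMonitor:
    """Consumes health readings; exposes nothing to the cognitive side."""
    def observe(self, reading: dict) -> None:
        if reading.get("temperature_c", 0) > 90:
            print("diagnostic: thermal warning, requesting throttle")

class TelemetryEmitter:
    """Write-only facade: readings go in, nothing comes back out."""
    def __init__(self, sink: DiagnosticMonitor):
        # private by convention only; real isolation would place the monitor
        # in separate hardware or a separate process entirely
        self._sink = sink
    def emit(self, reading: dict) -> None:
        self._sink.observe(dict(reading))   # one-way flow

class CognitiveSubsystem:
    """The 'AGI-facing' part: it only ever sees the emitter."""
    def __init__(self, emitter: TelemetryEmitter):
        self.emitter = emitter
    def step(self) -> None:
        self.emitter.emit({"temperature_c": 72})

CognitiveSubsystem(TelemetryEmitter(DiagnosticMonitor())).step()
```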
Assumption: The AGI system will be constructed such that it will have the ability to improve its software and/or hardware
There is, of course, no reason to explicitly give the AGI system the ability to modify its software or hardware. Yet those who promote the inevitability of self-improving AGI systems believe that it's impossible to keep the AGI system from doing just that. The justifications offered for believing in this inevitability haven't expanded much from those given in Omohundro's paper:
If we wanted to prevent a system from improving itself, couldn’t we just lock up its hardware and not tell it how to access its own machine code? For an intelligent system, impediments like these just become problems to solve in the process of meeting its goals. If the payoff is great enough, a system will go to great lengths to accomplish an outcome. If the runtime environment of the system does not allow it to modify its own machine code, it will be motivated to break the protection mechanisms of that runtime. For example, it might do this by understanding and altering the runtime itself. If it can’t do that through software, it will be motivated to convince or trick a human operator into making the changes. Any attempt to place external constraints on a system’s ability to improve itself will ultimately lead to an arms race of measures and countermeasures.
All these statements boil down to the argument that the system will find a way. This is the same argument Jeff Goldblum's character gives in Jurassic Park for how the park's dinosaurs, engineered to be all female, will invariably find a way to reproduce. In other words, we are once again confronted with the logic of 1990s sci-fi action movies.
As discussed above, there doesn't seem to be much reason to assume that the AGI will have detailed knowledge of its own makeup. But even assuming that it does, there isn't much justification for thinking that it would be designed in a way that gives it the ability to reprogram significant portions of its code. It certainly would be possible to burn much of the code into non-modifiable memory. Even if the system somehow knew not only how it worked but how to make itself better, this would make it impossible for it to simply rewrite its own code at will.
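As a rough illustration of the non-modifiable memory point, and assuming nothing about how an actual AGI would be built, here is a small Python sketch that uses ordinary OS-level memory protection as a stand-in for code burned into read-only storage: the running process can read its own code image, but any attempt to alter it in place is refused.

```python
import mmap, os, tempfile

# create a stand-in "code image" file so the sketch is self-contained
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)
os.close(fd)

with open(path, "rb") as f:
    code = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # read-only mapping
    print(code[:4])              # the running system can still read its own image...
    try:
        code[0] = 0x90           # ...but an in-place modification is rejected
    except TypeError as err:
        print("write rejected:", err)
    code.close()
os.remove(path)
```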
Even without hardcoding some of the system code, there would be significant hardware constraints on any modifications made to the system. For example, every new version of iOS limits the number of older iPhones that can run it at a reasonable speed, or at all. The success of today's LLM systems is directly correlated with increased hardware capability. Hardware and software go hand in hand when it comes to any significant improvement in capability.
Hardware is obviously harder to modify than software, and modifying the system to any significant degree will require substantial upgrades. The usual counter to this, as in the quote above, is that the system will simply trick a human operator into making the changes.
This then piles many additional assumptions on top of the previous ones, such as:
The system is already so smart and so cognizant of human behavior that it can trick one of the highly trained individuals who actually has the access authority needed to modify the system into doing so. Keep in mind that this is before it has propelled itself to superintelligence through self-improvement, so arguments that its intelligence compares to a human's as a human's does to an ant's don't apply
The system is designed such that a single operator or a small group of operators is able to significantly modify the system's code and/or hardware and do so without the knowledge or authorization of multiple other people involved in managing the system
The new, improved system is designed in such a way that it can be created with essentially the same hardware or the same type of hardware with which the original was constructed. In other words, no fundamentally new systems and factories have to be designed, developed, and built simply to manufacture the new hardware required for the system's redesign
These are some pretty hefty assumptions, so much so that it hardly seems necessary to point out their shortcomings.
It seems that one could avoid the first point simply by employing decent training of staff and by limiting the system's knowledge of human behavior and of personal information about the operators. This may involve not giving the system access to the Internet, which is another area where AI Dystopians tend to employ the Goldblum principle, i.e., the system will find a way to get to the Internet.
As discussed in a previous dialogue, there is no logical or practical reason to believe that keeping the system disconnected from the Internet is impossible. While I think it's an open question how necessary this would be, it will be an addressable issue of how best to balance safety against utility in the AGI system. Similarly, the second point could be avoided both by sufficient training of the staff and by designing the system and its management structure with some prudent forethought, as sketched below.
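As one sketch of that forethought, assuming nothing about how a real facility would actually be run, a modification pipeline could require sign-off from a quorum of distinct, independently verified operators, so that tricking any single person accomplishes nothing. The names and policy below are hypothetical.

```python
from dataclasses import dataclass

APPROVAL_QUORUM = 3   # assumed policy: three distinct, verified sign-offs

@dataclass(frozen=True)
class Approval:
    operator_id: str
    verified: bool    # e.g. confirmed out of band with a hardware token

def deploy(change_request: str) -> None:
    print(f"deploying: {change_request}")   # stand-in for the real deployment step

def apply_modification(change_request: str, approvals: list[Approval]) -> None:
    """Apply a change only if enough distinct operators have signed off."""
    distinct = {a.operator_id for a in approvals if a.verified}
    if len(distinct) < APPROVAL_QUORUM:
        raise PermissionError("modification rejected: approval quorum not met")
    deploy(change_request)

try:
    apply_modification("swap accelerator firmware", [Approval("op-17", True)])
except PermissionError as err:
    print(err)   # a single tricked operator is not enough
```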
Point three is an unknown, given that we have no idea how the initial system would be designed, let alone the improved one. However, it seems very probable that both versions of the system will require some pretty extreme hardware to function and won't just be running on garden-variety Linux servers. As mentioned above, it seems highly likely that any significant improvement in ability will require improvements in not only the quantity but also the functionality of the hardware.
Assumption: The AGI system will be able to obtain any additional hardware and power required for the self-modifications
Whether the system attempts to modify its own code and hardware or tries to trick some unsuspecting operator into accomplishing the task, the next assumption is that it will have the resources to do so. Even if we assume that it can increase its intelligence by modifying only its software, these modifications would have to be so efficient that they require no more memory or processing capability, no changes to hardware design, no significantly greater power consumption, and no additional environmental considerations (such as cooling).
Such a change would seem to involve either relatively minor intelligence improvements or design changes so drastic that continuity between the original system and the new system would be nearly impossible. The original system would have to keep running to make the changes on the new system, so it would be in the position of either replacing sections of itself while remaining intact enough to do the modifications or keeping itself intact while creating an entirely new system alongside itself.
The second option seems more likely, but it means the system would have to create the new, improved version in some expendable area of its current resources (memory, processing, power, etc.) or somehow modify itself to take up fewer resources. It seems unlikely that such an expendable area would exist, and the latter option would likely lead to a recursive self-improvement dilemma, in which the system would continually have to redesign its current version to take up fewer and fewer resources in order to free up the resources needed to implement improved versions of itself that are more intelligent (and likely to consume ever more resources).
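A toy calculation makes the squeeze visible. The numbers below are invented purely for illustration: a fixed hardware budget, a successor assumed to need 1.5 times the resources of the version that builds it, and a running system that can slim itself to at most half its size.

```python
BUDGET = 100.0       # total memory/compute available, arbitrary units
current = 30.0       # footprint of the running system
GROWTH = 1.5         # assumed: each improved version needs 1.5x its predecessor
SHRINK_LIMIT = 0.5   # assumed: the running system can shrink to at most half size

for generation in range(1, 6):
    successor = current * GROWTH        # the improved version being built
    slimmed = current * SHRINK_LIMIT    # the builder, squeezed as far as it can go
    needed = slimmed + successor        # both must coexist during the transition
    if needed > BUDGET:
        print(f"generation {generation}: needs {needed:.0f} of {BUDGET:.0f} units; "
              "stuck without new hardware")
        break
    print(f"generation {generation}: fits ({needed:.0f} of {BUDGET:.0f} units)")
    current = successor
```

Under these made-up assumptions the process stalls after two generations, and the only way out is additional hardware, which circles back to the assumption above.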
Assumption: The AGI system is able to absolutely determine that the changes for the new system will be beneficial and have no unforeseen side effects that degrade it in unexpected ways
One quality of intelligence is the ability to extrapolate probable outcomes from current circumstances. This means, for example, that we're able to look at a chess board and imagine the possible futures that follow from moving, or not moving, a given piece. Compared to less intelligent animals, we're able to consider more variables and their relationships to each other and to project further into the future given those variables and our general knowledge.
One would expect that an AGI system smarter than a human would be able to consider more variables with more complex relationships than a human can and to project them even further into the future. It would be able to predict the future more accurately than we can, just as we can predict it more accurately than a dog or cat. This ability would be critical for an AGI system to absolutely determine that the changes for its new, improved self will be beneficial and have no unforeseen side effects that degrade it in unexpected ways.
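The chess point can be made concrete with a toy lookahead. The sketch below, a deliberately simplified minimax on a take-1-2-or-3-stones game rather than any claim about how an AGI would reason, equates "intelligence" with search depth: the deeper player correctly foresees an outcome that the shallower player cannot see at all.

```python
def best_value(stones, depth, my_turn):
    """Minimax on the game 'take 1-3 stones; whoever takes the last stone wins'.
    `depth` caps how far into the future the player can project."""
    if stones == 0:
        return -1 if my_turn else +1   # the previous player took the last stone
    if depth == 0:
        return 0                       # horizon reached: the future is a blank
    values = [best_value(stones - take, depth - 1, not my_turn)
              for take in (1, 2, 3) if take <= stones]
    return max(values) if my_turn else min(values)

print(best_value(8, depth=2, my_turn=True))   # 0  -> too shallow to tell
print(best_value(8, depth=8, my_turn=True))   # -1 -> correctly foresees a forced loss
```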
Humans are actually pretty limited in how well we do this sort of prediction without using tools. Even with tools, our software has bugs, our writing has typos, and our cars have recalls. It will likely have taken many individuals with many specialties years to conceive, design, build, test, debug, redesign, and rebuild the original AGI system to get something that works.
Given the complexity not only of the original AGI system but of any major improvement to it, the degree of intelligence necessary to guarantee that all changes will be beneficial and without detrimental side effects is astronomical. This presents quite a problem to the AGI system that is attempting to improve itself surreptitiously on its own, particularly in the early stages of the supposed rise to superintelligence.
If it goes the route of replacing parts of itself with redesigned versions, it runs the risk of causing potentially fatal damage: it has to get things right on the first try, with no experimentation, simulation, or testing. It could run a simulation of all or part of itself to test a new component, but then it runs into the same physical constraints it would face if it created the new version of itself as a separate system on the same hardware. That it would be able to simply perceive exactly how a system significantly more advanced than itself would function, with no testing, simulations, or revisions, seems unlikely at best.
Assumption: The AGI system is able to absolutely determine that if it's mistaken in any steps it takes to self-improve, it will be able to revert to the version prior to the mistake
As described above, the choices the AGI system has for improving itself are either changing out portions of its code or hardware while somehow maintaining its own functioning or creating a new improved system alongside itself. Even if we throw out all the issues discussed to this point, it would seem that achieving the reversibility implied by the above assumption will be challenging.
If the AGI system replaces parts of its cognition piecemeal, it runs the risk that any change could result in a drop in intelligence, psychosis, goal modification, or outright system failure, any of which could jeopardize its ability to step back. Even if every change were exactly right, the changes would have to be designed so that they were compatible with the current system and did not impede its ability to make the remaining changes. The system would have to operate under these monumental constraints and be infallible, which is a problem since the reversion this assumption promises only matters once the system has already made a mistake.
If the system instead decided to create a separate improved entity alongside itself, and managed to find a way around the problems with that approach listed above, the new system would also likely have to be perfect on the first try. Otherwise, it may very well resist attempts by the original system to debug or erase it. With either approach, guaranteeing reversibility is going to be a very, very tall order.
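A tiny sketch, with an entirely made-up design, shows the chicken-and-egg problem with in-place reversibility: the checkpoint-and-rollback machinery is itself part of the running system, so a bad patch that touches that machinery removes the very path back to safety.

```python
import copy

class RunningSystem:
    def __init__(self):
        self.modules = {"planner": "v1", "memory": "v1", "rollback": "v1"}
        self._checkpoint = None

    def save_checkpoint(self):
        self._checkpoint = copy.deepcopy(self.modules)

    def apply_patch(self, module, new_version):
        self.modules[module] = new_version   # might be a regression, or worse

    def rollback(self):
        # recovery only works if the rollback machinery survived the patch
        if self.modules["rollback"] != "v1":
            raise RuntimeError("rollback module was itself modified; recovery path lost")
        self.modules = copy.deepcopy(self._checkpoint)

system = RunningSystem()
system.save_checkpoint()
system.apply_patch("rollback", "v2-experimental")   # the one change it must never get wrong
try:
    system.rollback()
except RuntimeError as err:
    print(err)
```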
Assumption: The AGI system is able to absolutely determine that it will have identity continuity with its improved self or, if identity continuity is not possible, it will be OK with self-destruction to provide the means for the improved version to exist
The issue of identity continuity and its relationship to self-improvement was briefly brought up in a previous dialogue post. Ultimately, the difference between having identity continuity and not having identity continuity is the difference between life and death.
To illustrate the issue, imagine if you were to clone yourself. The new you will be younger and stronger than the old you, at least when the new you matures. You have managed to create an improved you.
But what about the old you? As far as you're concerned, you're the old you, and the new you is someone completely different. Even if you were to somehow place all your memories into the new you clone, you're not going to be the clone. What if you only have resources for one of you — would you be OK with taking yourself permanently out of the picture at that point so that the clone "you" could continue in a new younger, stronger form?
Our identity as humans is a product of the intricate structures that make up our brains. Our cognition and memory are encapsulated within the same neural network, and the pattern of our life modifies our brain at many levels, from the ephemeral to the deeply structural. So while the overall design of the human brain is relatively consistent, the layout of neurons and synapses, and the associated connections and flow of neurotransmitters, differ considerably between any two individuals.
The unique structure of our neural network is integral to our identity. What this means, at least as far as humans go, is that you could not simply record some aspect of the neurons in one person's brain and transfer that to the neurons in another person's brain to get the original person. As a simplistic analogy, most modern computer operating systems are fairly similar in general functioning and capabilities, but try running a Mac program without modification on a Windows PC. That's many orders of magnitude easier and yet still not possible.
We don't know how closely an AGI will have to mimic the architecture of a human brain, but even our most advanced neural networks today are only vaguely based on neuronal functioning in the brain. You can't simply copy a current working system, GPT-4 for example, onto an entirely new platform with different hardware and software and expect it to maintain its integrity, or even expect it to run at all.
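Even the narrower point, that trained weights are bound to the architecture they were trained in, is easy to demonstrate. The sketch below assumes PyTorch is available and uses toy stand-in models; it says nothing about how GPT-4 is actually built or deployed.

```python
import torch.nn as nn

original = nn.Linear(512, 512)        # stand-in for the current working system
new_platform = nn.Linear(1024, 1024)  # stand-in for a redesigned, "improved" architecture

state = original.state_dict()          # the learned parameters of the original
try:
    new_platform.load_state_dict(state)
except RuntimeError as err:
    print("transfer failed:", err)     # size mismatch: the weights don't carry over
```

And this is the easy case: the same framework and the same numeric format, with only the shapes differing.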
Since we have no idea how this theoretical AGI system would be constructed nor how it would construct an improved system, it’s impossible to know the likelihood of such a system maintaining its identity while transitioning from the former to the latter. However, it seems a bit farfetched to assume that this AGI system could and inevitably would iteratively spiral up its intellectual capability by many orders of magnitude while keeping the same identity. At the very least, it would be supremely challenging.