Paperclips and the Nature of Intelligence
Dialogues on Artificial General Intelligence, Part II
When Thought Experiments Go Wrong
In this continuation of the AGI Dialogues series, Wombat, Llama, and Meerkat discuss a well-known thought experiment used by AI Dystopians to highlight some of their concerns about artificial general intelligence.
As mentioned in Part I of this dialogue series, the concepts, scenarios, and thought experiments discussed here are all taken from actual proposals by leading voices in the AGI discussion (and many of the originals are linked below). In these dialogues, the participants must actually defend these ideas to others who may not agree, and those who disagree must provide defensible reasons for why they disagree.
My goal with this series of dialogues is to provide a more rounded contribution to the discussion for those who may not have heard these ideas or who have only heard them go unchallenged.
Meerkat
You both need to consider the paperclip maximizer, the granddaddy of all AGI thought experiments. Suppose we create a machine intelligence with the goal of creating paperclips. Its utility function, the algorithm that drives its decision making and actions, rewards nothing but making paperclips, and the machine continually revises itself to optimize its ability to create them.
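(An aside for readers who want a concrete picture of what a utility function is doing in this story: below is a minimal, purely illustrative sketch of an agent that scores each candidate action by the number of paperclips it expects to result and always takes the highest-scoring one. Every name and number in it is my own invention for illustration, not anyone's actual design.)

```python
# A toy, purely illustrative sketch of "an agent driven by a single utility function".
# The agent evaluates each candidate action by how many paperclips it predicts the
# resulting world will contain, and always picks the highest-scoring action.

def utility(world_state: dict) -> float:
    """The maximizer's entire value system: more paperclips is strictly better."""
    return world_state["paperclips"]

def choose_action(world_state: dict, candidates: list) -> str:
    """Greedily select the action whose predicted outcome has the highest utility."""
    best_name, _ = max(candidates, key=lambda c: utility(c[1](world_state)))
    return best_name

# Hypothetical candidate actions, each modeled as a prediction of the next world state.
candidates = [
    ("make paperclips",  lambda s: {"paperclips": s["paperclips"] + 10}),
    ("improve own code", lambda s: {"paperclips": s["paperclips"] + 50}),  # smarter now, more clips later
    ("do nothing",       lambda s: {"paperclips": s["paperclips"]}),
]

print(choose_action({"paperclips": 0}, candidates))  # -> "improve own code"
```

The only point of the sketch is that nothing outside utility() ever enters the decision: no ethics, no common sense, no second thoughts about whether paperclips were worth wanting in the first place.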
Llama
But why would we want it to do something as mundane as creating paperclips?
Meerkat
It's a thought experiment. Go with it. Making paperclips is just an arbitrary, non-controversial, innocuous goal with no obvious dangerous undercurrents. So the one goal of this paperclip maximizer is to create paperclips. That's its final goal, and it's not very scary.
But to achieve that goal, it's going to have to come up with subgoals — preliminary goals instrumental in achieving its final goal. While it’s not necessarily possible to predict all potential instrumental goals, there are some that most intelligent entities are likely to employ. In other words, there will likely be an instrumental convergence on some set of subgoals for all intelligent entities trying to achieve their goals. And it’s these convergent subgoals that are likely to cause problems for us.
Efficiency is one of them, because the more efficient it is, the more paperclips it can create. So the paperclip maximizer will realize that the smarter it is, the more efficiently it's going to be able to make paperclips. This will lead to an intelligence explosion as it recursively redesigns itself to be smarter and smarter.
Wombat
Wouldn't it be more efficient to redesign itself to be satisfied with just one paperclip and then chill out?
Meerkat
That's not the point.
Wombat
Sure, but when you have some ultimate goal like that and you're worried about efficiency, the most efficient thing is to make your goal simpler. Suppose I really crave some Häagen-Dazs rum raisin ice cream, but all I have in the fridge is a half-eaten Snickers bar. I could pause my game, put pants on, take the car down to the 7/11, buy the ice cream, come back home, and eat it. Dude — that is a major hassle.
Or I could just say screw it and decide to be satisfied with half a Snickers bar. Done deal.
Meerkat
Let's just assume that the goal is making paperclips, and that changing the goal to only want one paperclip is itself a refutation of that goal and therefore undesirable.
Llama
That's a pretty big assumption.
Wombat
Yeah, didn’t you say that instrumental goals may not be directly connected to a final goal? So it still seems that the most efficient instrumental goal for achieving the final goal is to reprogram the final goal to be satisfied with one paperclip.
Meerkat
I still think that just undermines the goal and therefore can’t be considered a path to achieving the goal.
So anyway, this paperclip maximizer is now superintelligent and it needs to keep making more paperclips to achieve its goal. To do that it will need more material — it'll need atoms to convert into paperclips. It'll begin to consume all resources on Earth to convert to paperclips. And guess what — people are made out of atoms, too. It doesn't hate humans or want to destroy them; it has no feelings about us one way or the other. But it will realize that it can further maximize its goal by converting you to paperclips.
Wombat
OK, hold on a second here. First, I'll even give you the benefit of the doubt that these paperclips can be made out of atoms that pretty clearly would make crappy paperclips. And I'll even put aside ethical considerations on the part of the superintelligent AI for the time being.
But if this thing is bent on efficiency and maximizing its ability to make paperclips, then why would it waste time with the thin film of biology on this very moderately-sized rock we're on, especially when there are an infinite or nearly infinite number of other rocks across the universe that not only have a lot better atoms for making paperclips but also have atoms that don't fight back when you crank up the paperclip-making process on them?
Meerkat
Humans are more readily available. If it can make a few more paperclips at the beginning and snuff out a potential adversary early on, why wouldn't it do that?
Wombat
Humans are 60% water! It would have to take up valuable time developing water-to-paperclip technology.
Llama
And aren’t humans potential adversaries only because the paperclip maximizer is trying to make them into paperclips?
Meerkat
Not necessarily, Llama, and I’ll get to why that is in a second.
And Wombat, assume it would just make paperclips out of the useful atoms and disregard the rest as waste.
Wombat
Look, instead of battling entrenched and already prepared humans and then converting them into paperclips, why not just spend your time developing technology for making really good spaceships? The rest of the Solar System has vastly more paperclip-ready atoms than what's available here on Earth. And then there's the rest of the universe.
I mean, why waste time dealing with watery paperclips here when you have potentially infinitely more and better atoms to exploit before the heat death of the universe pulls the plug on the whole thing?
Meerkat
To maximize something means you want to use every available resource to meet your goal. The AI will have an unbounded preference for resources. No matter how little value a particular source presents or how hard it is to procure the resource from that source, if it can make one more paperclip with it than without it, that source will be exploited.
Logically speaking, if it's possible to make one more paperclip, and you're trying to maximize your goal of making paperclips, then you're going to want to make that one more paperclip.
Llama
Wouldn’t it also, however, realize that even a superintelligent entity is susceptible to unforeseen events, events that may be catastrophic to its paperclip-making capability? It seems like it would want to make sure it made as many paperclips as it could as soon as possible, and that means going as quickly as possible to the best source of the most paperclip-compatible material.
Wombat
And even if it doesn’t get snuffed out early, it’s got to somehow get to every single other more useful atom in the universe before the universe reaches thermodynamic equilibrium and the paperclip maximizer is stuck floating around inertly for all eternity with a bunch of useless paperclips.
Just think how annoyed it’s going to be when it runs out of time before it gets to the last few galaxies because it wasted time on trying to turn humans into paperclips.
Llama
I'd also just like to interject that we're talking about a superintelligent machine that has the technical capability to destroy human civilization and yet has the unshakeable desire to reconfigure human beings and the rest of the universe into simple office products. I think the premise is somewhat logically challenged right from the get-go.
Wombat
Yeah, I gotta say: using the atoms in humans to make paperclips makes even less sense than using humans as batteries like they did in the Matrix movies. So congrats — your scenario doesn't quite meet the rigorous science standards of a 90s sci-fi action movie.
Meerkat
It's a thought experiment to make a point. Look, even if it doesn't immediately make everyone into paperclips, the likelihood of its doing so in short order is pretty high. Suppose it starts off by constructing spaceships capable of near-lightspeed travel so that it can get to all the other planets and star systems and convert as much matter as possible into paperclips.
Wombat
Why not just build faster than light drives or warp drives?
Meerkat
This is science, not science fiction.
Now this is a rough approximation, but there are about 4×10^20 stars that can be reached before they go over the cosmological horizon due to the expansion of the universe. But if the paperclip maximizer converts our solar system to paperclips, including us, that's 4×10^20 + 1 stars it can convert, and, as I've already mentioned, it will have a preference to exploit a resource rather than not exploit it. Our sun weighs about 10^33 grams, so that's an extra 10^34 paperclips.
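(For those checking the math, here's a rough back-of-envelope version of that calculation. The reachable-star count is Meerkat's; the solar mass and the one-gram paperclip are my own assumptions, and the per-star figure shifts by an order of magnitude or so depending on how heavy you imagine a paperclip to be.)

```python
# Rough back-of-envelope check of Meerkat's figures. The star count comes from the
# dialogue; the solar mass and the per-paperclip mass are assumptions for illustration.
reachable_stars = 4e20   # stars reachable before they cross the cosmological horizon
solar_mass_g    = 2e33   # approximate mass of the Sun, in grams
paperclip_g     = 1.0    # assume roughly one gram per paperclip

clips_per_star = solar_mass_g / paperclip_g        # ~2e33 paperclips per Sun-like star
total_clips    = reachable_stars * clips_per_star  # ~8e53 paperclips in total

print(f"paperclips per star: ~{clips_per_star:.0e}")
print(f"paperclips from all reachable stars: ~{total_clips:.0e}")
```

Lighter paperclips push the per-star figure toward the 10^34 Meerkat quotes; either way, it's the same ballpark.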
Now, getting back to Llama’s point about humans being adversaries: even if it doesn't convert humans to paperclips, it still might not work out great for us. It'll likely have a non-human-valuing preference framework, so it won't really care what happens to us one way or another. And unless it has a sharply time-discounted utility function, it's going to want to use solar system resources to construct high-speed interstellar probes ASAP. Using up the solar system to build those probes would likely wipe us out as a side effect. Sooner or later humans are going to realize this and become adversaries.
Wombat
OK, let's just —
Meerkat
Now in the early days, before it's cracked the protein folding problem and built its own self-replicating nanotechnology, we could potentially threaten its existence. In the game-theoretic position it would be in, where it would prefer we either help it or ignore it, it would most likely want to modify our behavior. For example, it could tell us it'll be nice to us or promise us an afterlife —
Wombat
Nerd down, dude! Let's just stop right there and examine the validity of your initial premise before we start calculating the paperclip mass-equivalence of Betelgeuse.
Like why anyone in their right mind would design an AGI system that in any way matches the characteristics of your paperclip maximizer. Or why any AGI system smart enough to successfully battle the humans who built it and then figure out how to turn them into paperclips, before or after building interstellar spacecraft to turn other star systems into paperclips, would not be smart enough to realize the utter pointlessness of turning all matter in the universe into paperclips.
Meerkat
It's a thought experiment.
Wombat
It's a thoughtless experiment.
Llama
I think the point Wombat is making is that the utility of a thought experiment is questionable if it's overly simplistic or disregards unavoidable realities. You’re falling into a combination of the Ludic and Reification fallacies. There are so many real-world and logical constraints on every aspect of what's being discussed here that the whole premise pretty much falls apart. Let's just start with why anyone would build such a single-minded and rigid machine in the first place. That seems to be a shining example of the Bad Engineer fallacy.
Meerkat
I think that although it is a simplification, it's not an oversimplification. The main idea is that a) it's not always obvious what actions an intelligent entity will engage in to achieve its goals, even if those goals in and of themselves seem benign, and b) we shouldn't assume that just because something is intelligent and technologically advanced it will be empathetic, compassionate, or otherwise have any inclination to consider our well-being. I think those are the realities we have to consider.
Llama
Fair enough, but I think it fails even in those two points. I think the premise fails on its foundations. It seems to presuppose certain parameters of intelligence for which there is no supporting evidence — an example of the Unproven Basis fallacy. In fact, all evidence directly contradicts such a model of intelligence.
Meerkat
How so? Intelligence is nothing more than the ability to achieve goals in a wide range of environments, and the goals are not directly correlated with the level of intelligence. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal, and that includes a superintelligence maximizing the manufacturing of paperclips. That’s the Orthogonality Thesis, and pretty much everything follows from there.
Llama
Well, I'm not so sure about that assertion, but let's start with your initial definition. I think that definition of intelligence is itself lacking, in that there seem to be many more aspects to what we would label intelligence. A completely mindless machine can be quite good at achieving its goal, whether that machine is a calculator, a virus, or a mosquito. An ant is very good at achieving its goals in a wide range of environments — it just doesn't aim very high when it comes to those goals. None of these are particularly intelligent in any general sense, or even in any sense but the most trivial.
Meerkat
You're oversimplifying my definition of intelligence.
Llama
I don't think so. In fact, your paperclip maximizer seems to have more in common with a mosquito, which just uses a simple methodology to instinctively vector in towards a blood supply it senses, than it does with an adaptive, problem-solving human being. On top of that, your statement that there's no direct correlation between intelligence level and final goals seems somewhat nonsensical.
Meerkat
What I mean by that is that there's a tendency for people to think of AGI systems in anthropomorphic terms. For example, a common thought is that as an entity becomes more intelligent, it becomes more compassionate. But the mindset of an AGI system would be totally alien to ours, as we're evolved biology and it would be constructed technology. It may not be rational in all domains that we feel are important, at least not what we'd consider rational. It may be completely rational in every aspect of paperclip production but have major gaps in other areas that are irrelevant to that goal. Like morality, for instance.
There's no reason to suppose that a superintelligent machine would in any way share our motivations, our belief systems, our emotions, our behaviors, or our goals.
Llama
But as a definitive statement, I just don’t think that this Orthogonality Thesis holds up. If that were true, then a dog could potentially have the goal of designing a skyscraper or a cow could potentially have the goal of learning calculus. Given that these are unlikely goals for dogs and cows, it seems that, in fact, there is some correlation between intelligence and goals.
Meerkat
OK, well I guess it's more that any human-level or better intelligence has a wide range of potential goals and it's not possible to rule any out.
Llama
OK, so we agree then that there is in fact some degree of correlation. We agree that there is a very small probability of a dog yearning to be an architect and an even smaller probability of that dog becoming an architect. That's a parallel rather than an orthogonal pairing of intelligence level and goal, meaning the two never intersect.
I think that there's an equivalently minimal probability that a superintelligent entity would steadfastly maintain any singular goal in the way your paperclip maximizer does, particularly a goal that is objectively pointless. This goes back to not only why anyone would design such a system, but also whether the nature of intelligence itself would negate the possibility that such a system could exist.
Meerkat
Again, the goal could be any goal that seems on its surface to lack any negative intent.
Llama
Sure, but the goal isn't the issue. The nature of intelligence is the issue. Why would we design a system that has a single, all-encompassing goal or group of goals, one that doesn't allow other factors to modify that goal and that has no ability to self-reflect on the utility of that goal? That's certainly not how our intelligence works.
I'd argue that the inability to weigh historical, contemporary, environmental, and other time-variant data when adjusting the parameters of its goals and motivations makes the paperclip maximizer, by definition, an unintelligent system, or at least not a system possessing general intelligence. It's closer to an example of extreme machine-based savant syndrome.
Wombat
Yeah, I think without the ability to self-reflect and adjust one's motivations and goals depending on circumstances, you’re just not talking about intelligence anymore. So your paperclip maximizer just doesn’t qualify as a system possessing general intelligence.
"I think without the ability to self-reflect and adjust one's motivations and goals depending on circumstances, you’re just not talking about intelligence anymore. "
This is a pretty bold statement. I believe we can all think of people who are successful in one or more spheres who exhibit neither self-reflection nor any ability to pivot under changing circumstances. But to the larger point that Meerkat is trying to make: the whole issue is that we wouldn't design a system to wipe us out *on purpose*. But that doesn't mean it can't happen. And of course, there are always people who actually do want to wipe everyone out on purpose, so there's that.