What if we programmed it with rules that it can’t violate to close the “loopholes”? Haven’t you read Isaac Asimov?

As anyone who makes a living searching the tax code for loopholes knows, there’s always another loophole. There’s always another way of structuring things to save your client money. Even if Congress were functional, they couldn’t close the loopholes one by one. The only sensible thing to do would be to scrap the whole thing and make something simpler (if one’s goal were just to close loopholes; I don’t support this politically).

If an AI’s goal is to maximize human happiness, it doesn’t matter how many things it’s not allowed to do along the way. The actual optimal solution to this problem is something like the Matrix, and we can’t confidently make rules against all the different ways of getting there or thereabouts. If our constraints are so well placed that this is no longer the best solution, then some solution that is similar in spirit will replace it as the optimal solution. For example, suppose we stipulate that the AI must never act in a way such that there is more than a 1% chance of more than 1% of human skulls (belonging to living people) being penetrated. Now it can’t put electrodes in our brains. (What does the AI do? I don’t know… everyone is given ecstasy and a few other drugs, and when they become addicted, they are killed and replaced by new people who can “better appreciate life”? Ooh, here’s a better idea: the electrodes that stimulate the brain go in through the eye sockets.) This is a good example of what happens when you try to create a rule to prevent an AI from doing something. I was trying to come up with a very concrete rule that would prevent an AI from sticking electrodes in our brains, given the example with the mouse. I failed.

One by one, we can try to close all of these loopholes, but at the end of the day, the situation that we actually wanted the AI to promote is totally misaligned with the goal we gave it, and no amount of loophole closing will change that.
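The loophole-closing dynamic can be sketched as a toy search. This is only an illustration: the plans, their scores, and the function names are all invented here, and a real AI’s options could not be listed in advance like this.

```python
# Toy model: every plan and every number here is made up for illustration.
# "proxy" is the happiness score the AI maximizes; "wanted" marks whether
# the outcome is something we would actually endorse.
PLANS = [
    {"name": "electrodes through the skull", "proxy": 100, "wanted": False},
    {"name": "electrodes through the eye sockets", "proxy": 99, "wanted": False},
    {"name": "mass drugging", "proxy": 95, "wanted": False},
    {"name": "cure diseases, reduce poverty", "proxy": 60, "wanted": True},
]


def best_plan(plans, banned):
    """Return the highest-proxy plan that no rule has banned yet."""
    allowed = [p for p in plans if p["name"] not in banned]
    return max(allowed, key=lambda p: p["proxy"])


def close_loopholes(plans):
    """Ban the current optimum whenever we dislike it; record each ban."""
    banned = []
    choice = best_plan(plans, banned)
    while not choice["wanted"]:
        banned.append(choice["name"])
        choice = best_plan(plans, banned)
    return banned, choice


banned, final = close_loopholes(PLANS)
print(f"rules needed: {len(banned)}, final plan: {final['name']}")
```

The loop only ends here because the toy list is short and we wrote down every bad plan ourselves. Each ban just promotes the next plan that scores well on the proxy, which is the point of the argument above.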

Isaac Asimov’s Three Laws are an example of an attempt to do this. They are:[1]

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

A problem arises because intelligent agents have to reason and act even when they are not certain about everything. No intelligent system will be able to perfectly predict the entire future history of the universe given the actions available to it. So our AI following the Three Laws searches through all possible actions, and realizes that no matter what action it takes, some human being somewhere will probably get injured. Therefore, all actions it can possibly take will probably result in the injury of a human being, and all actions are forbidden (including the action of doing nothing!), rendering the Three Laws incoherent.
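Read as a hard rule, the First Law under uncertainty can be sketched as a filter over actions. The actions, probabilities, and the 0.5 cutoff for “probably” below are assumptions invented for illustration, not anything from Asimov.

```python
# Hypothetical probabilities that at least one human is harmed somewhere
# after each action; the numbers are invented for illustration.
P_HARM = {
    "do nothing": 0.999,
    "build hospitals": 0.98,
    "shut down all traffic": 0.97,
}


def permitted_actions(p_harm, threshold=0.5):
    """Hard-rule reading of the First Law: forbid any action that will
    probably (probability above `threshold`) leave a human harmed."""
    return [action for action, p in p_harm.items() if p <= threshold]


print(permitted_actions(P_HARM))  # [] : every action is forbidden
```

With billions of people in the world, the chance that someone gets hurt is close to 1 whatever the AI does, so the filter leaves nothing, not even inaction.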

Now in fact, in the stories the Three Laws aren’t treated as rules. The robots are trying to minimize the amount of harm that comes to human beings. This is no longer a hard-and-fast rule that eliminates certain actions. It is a goal much like “promote human happiness.” And like the simple goals that we considered before, this one will not end up preserving what we care about. At best, the AI would act the same way it did in the “promote human happiness” scenario. At worst, it would kill us all immediately so that no more humans would be harmed ever again. Kind of like the first 6 seconds of this.




  1. Asimov, Isaac (1950). I, Robot.