The LessWrong wiki has a good answer to this question:[1]
The orthogonality thesis states that an artificial intelligence can have any combination of intelligence level and goal. This is in contrast to the belief that, because of their intelligence, AIs will all converge to a common goal. The thesis was originally defined by Nick Bostrom in the paper “The Superintelligent Will” (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence to be instrumental rationality.
Defense of the thesis
It has been pointed out that the orthogonality thesis is the default position, and that the burden of proof is on claims that limit possible AIs. Stuart Armstrong writes that,
Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:
- There cannot exist any efficient real-world algorithm with goal G.
- If a being with arbitrarily high resources, intelligence, time and goal G, were to try to design an efficient real-world algorithm with the same goal, it must fail.
- If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail.
- If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).
- Same as above, for any hypothetical alien societies.
- There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow the goal G.
- There cannot exist any evolutionary or environmental pressures that would evolve highly efficient real-world intelligences to follow goal G.
One reason many researchers assume that superintelligences will converge on the same goals may be that most humans have similar values. Furthermore, many philosophies hold that there is a rationally correct morality, which implies that a sufficiently rational AI will acquire this morality and begin to act according to it. Armstrong points out that for formalizations of AI such as AIXI and Gödel machines, the thesis is known to be true. Furthermore, if the thesis were false, then Oracle AIs would be impossible to build, and all sufficiently intelligent AIs would be impossible to control.
As mentioned above, the definition of intelligence as instrumental rationality is important here. (Instrumental rationality is the ability to select actions well in pursuit of a goal.) But I’ll also note that the argument goes through even if we think this isn’t a good definition of intelligence. In that case, for the purpose of this question, we should just jettison the term “artificial intelligence” and call it “artificial instrumental rationality” (AIR) instead. As argued above, the orthogonality thesis goes through for an “AIR.”
Instrumental rationality is all that’s necessary for an agent to have superhuman ability to accomplish goals, so the relevant question is actually “Why wouldn’t it act morally by the time it gains enough instrumental rationality to be dangerous?” This phrasing offers a bit of protection against equivocating between definitions of intelligence, and hopefully reinforces the relevance of the quoted arguments above.
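To make the idea of instrumental rationality as a goal-independent capability concrete, here is a minimal toy sketch. It is not taken from Bostrom, Armstrong, or the wiki. The world (an integer line), the names `plan`, `toward`, and `ACTIONS`, and the use of search depth as a stand-in for "capability" are all assumptions made for illustration. The point is only that the planner's code never inspects what the goal is, so the goal and the capability level can be varied independently.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = int   # hypothetical toy world: a position on the integer line
Action = int  # each action shifts the position by -1, 0, or +1

ACTIONS: Sequence[Action] = (-1, 0, 1)


def toward(target: State) -> Callable[[State], float]:
    """Build a toy goal: utility is higher the closer the state is to `target`."""
    return lambda state: -abs(state - target)


def plan(state: State, utility: Callable[[State], float], depth: int) -> Tuple[Action, ...]:
    """Return the action sequence of length `depth` whose final state scores best.

    `depth` stands in for capability (how far ahead the agent can search);
    `utility` stands in for the goal. Neither constrains the other.
    """
    return max(
        product(ACTIONS, repeat=depth),
        key=lambda seq: utility(state + sum(seq)),
    )


if __name__ == "__main__":
    # Same capability, opposite goals.
    print(plan(0, toward(5), depth=3))
    print(plan(0, toward(-5), depth=3))
    # Same goal, lower capability.
    print(plan(0, toward(5), depth=1))
```

Swapping `toward(5)` for `toward(-5)` changes what the agent pursues without changing how well it searches, and changing `depth` does the reverse. A real argument for or against the thesis of course concerns far richer agents than this one.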
[1] https://wiki.lesswrong.com/wiki/Orthogonality_thesis