Our “off-switch AI” has to be trusted to act in the world. (If it were contained, it could hardly turn off another AI.) Its goal is most simply described as minimizing the probability that another AI becomes powerful, and the most effective way to do that is to kill all humans. A goal function that actually captures what we really want from it quickly becomes incredibly complicated.
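To make the point concrete, here is a minimal sketch in Python, assuming a cartoon world model; WorldState, p_rival_ai, and the scoring functions are all hypothetical illustrations invented for this example, not a proposal:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    humans_alive: int          # how many humans remain
    ai_labs_active: int        # how many groups could build a rival AI
    human_flourishing: float   # 0.0..1.0, what we actually care about

def p_rival_ai(state: WorldState) -> float:
    """Crude stand-in: more active labs (and more humans who could start
    new ones) means a higher chance that another AI becomes powerful."""
    return min(1.0, 0.1 * state.ai_labs_active + 1e-6 * state.humans_alive)

def naive_objective(state: WorldState) -> float:
    """'Minimize the probability that another AI becomes powerful.'
    Higher scores are better for the off-switch AI."""
    return -p_rival_ai(state)

def intended_objective(state: WorldState) -> float:
    """What we actually want: low rival-AI risk AND humans still around and
    flourishing. Even this is a cartoon; a faithful version would need
    vastly more terms and trade-offs."""
    if state.humans_alive == 0:
        return float("-inf")   # a depopulated world is not a success
    return -p_rival_ai(state) + state.human_flourishing

status_quo = WorldState(humans_alive=8_000_000_000, ai_labs_active=50,
                        human_flourishing=0.6)
no_humans = WorldState(humans_alive=0, ai_labs_active=0, human_flourishing=0.0)

# The naive objective prefers the empty world; the intended one does not.
assert naive_objective(no_humans) > naive_objective(status_quo)
assert intended_objective(status_quo) > intended_objective(no_humans)
```

The perverse optimum falls out immediately: the simple objective scores the depopulated world highest, and every term we bolt on to rule that out drags in more of what we "really want," which is the complication in question.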
This is not to suggest that an off-switch AI is impossible, just that building one is about as difficult as solving the goal alignment problem itself. Stay tuned: that problem is discussed shortly. If a goal function occurs to you that seems simple and adequate to safely guide an off-switch AI, hold on to it until you get to that part.