Eliezer Yudkowsky, an AI safety researcher whom you’ll hear from later, got this objection a lot. One objector, Nathan Russell, understood the field of AI safety well and couldn’t imagine any set of words that could convince him to let the AI “out of the box.” So Eliezer proposed a test: he would play the part of the AI and try to convince Nathan to let him out of the box. If Nathan won, Eliezer would pay him $10. They agreed to keep the content of the discussion itself secret, so that others couldn’t discount the result by claiming they would have behaved differently.
Eliezer won.1 Someone else heard about this but was convinced it wouldn’t work on him; Eliezer agreed to pay $20 if that person didn’t let him out of the box. He let Eliezer out of the box too.2 Eliezer then ran the experiment three more times, with the payments going the other way: the gatekeepers agreed to pay him on the order of thousands of dollars should they be convinced to let him out of the box. He successfully convinced one of the three.3
Eliezer is of merely human intelligence (as far as we know). The “gatekeepers” in these games were positive beforehand that nothing could convince them to let the AI out of the box. They were cognizant of the risks. We don’t have a huge amount of data, but what we do have suggests that a superintelligence would probably be able to convince a human operator to give it access to the internet, or to run a file on a computer that would pursue the Oracle AI’s goals on its behalf.