DeepMind Mastered “Go” Only After It Was Told the Score

(p. C3) To function well outside controlled settings, robots must be able to approximate such human capacities as social intelligence and hand-eye coordination. But how to distill them into code?
“It turns out those things are really hard,” said Cynthia Breazeal, a roboticist at the Massachusetts Institute of Technology’s Media Lab.
. . .
Even today’s state-of-the-art AI has serious practical limits. In a recent paper, for example, researchers at MIT described how their AI software misidentified a 3-D printed turtle as a rifle after the team subtly altered the reptile’s coloring and lighting. The experiment showed how easily AI can be fooled and raised safety concerns over its use in real-world applications such as self-driving cars and facial-recognition software.
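Attacks of this kind work by nudging the input in whatever direction most increases the classifier's error. As a rough, hypothetical illustration of that core idea (a toy linear classifier and a fast-gradient-sign perturbation, not the MIT team's 3-D rendering pipeline; every name and number below is invented), consider:

```python
# Toy demonstration of an adversarial perturbation on a linear classifier.
# Invented example; not the code or the model from the paper cited below.
import numpy as np

rng = np.random.default_rng(0)
d = 100_000                        # number of input features ("pixels")

# Pretend these are trained weights; the bias is chosen so the clean
# input is confidently classified as class 0 ("turtle," say).
w = rng.normal(size=d) / np.sqrt(d)
x_clean = rng.normal(size=d)       # the clean input
b = -(w @ x_clean) - 4.0

def predict(x):
    """Probability the classifier assigns to class 1 ("rifle," say)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

print("clean input:    ", predict(x_clean))   # ~0.02, classified "turtle"

# Fast-gradient-sign step: shift every feature a small amount in the
# direction that raises the class-1 score. For this linear model that
# direction is simply sign(w). The change per feature is tiny compared
# with the input, but accumulated across 100,000 features it flips the
# prediction.
epsilon = 0.05
x_adv = x_clean + epsilon * np.sign(w)

print("perturbed input:", predict(x_adv))     # ~0.99, classified "rifle"
```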
Current systems also aren’t great at applying what they have learned to new situations. A recent paper by the AI startup Vicarious showed that a proficient Atari-playing AI lost its prowess when researchers moved around familiar features of the game.
. . .
Google’s DeepMind subsidiary used a technique known as reinforcement learning to build software that has repeatedly beaten the best human players at Go. While learning the classic Chinese game, the machine got positive feedback for making moves that increased the area it walled off from its opponent. Its quest for a higher score spurred the AI to develop territory-taking tactics until it mastered the game.
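To make the score-chasing idea concrete, here is a minimal, hypothetical sketch of reinforcement learning (tabular Q-learning) on an invented "claim territory" toy game, where the reward is simply the change in the score. It illustrates the feedback loop, not DeepMind's actual system, which pairs deep neural networks with tree search:

```python
# Minimal tabular Q-learning on a made-up "claim territory" game.
# The reward is the change in score, as in the Go description above.
import random

N_CELLS = 5          # the board: a row of cells the agent can claim
ACTIONS = [0, 1]     # 0 = pass, 1 = claim the next open cell
EPISODES = 2000
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

# State = number of cells claimed so far; Q[state][action] = expected score.
Q = {s: [0.0, 0.0] for s in range(N_CELLS + 1)}

def step(state, action):
    """Return (next_state, reward): +1 point for claiming a cell, 0 for passing."""
    if action == 1 and state < N_CELLS:
        return state + 1, 1.0
    return state, 0.0

for _ in range(EPISODES):
    state = 0
    for _turn in range(N_CELLS):                 # fixed-length episode
        if random.random() < eps:                # occasionally explore
            action = random.choice(ACTIONS)
        else:                                    # otherwise act greedily
            action = max(ACTIONS, key=lambda a: Q[state][a])
        nxt, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + best future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned policy claims territory in every state, because that is
# the only behavior the reward function ever reinforced.
print({s: ("claim" if Q[s][1] > Q[s][0] else "pass") for s in range(N_CELLS)})
```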
The problem is that “the real world doesn’t have a score,” said Brown University roboticist Stefanie Tellex. Engineers need to code so-called “reward functions” into AI programs: mathematical ways of telling a machine it has acted correctly. Beyond the finite scenario of a game, amid the complexity of real-life interactions, it’s difficult to determine what results to reinforce. How, and how often, should engineers reward machines to guide them to perform a certain task? “The reward signal is so important to making these algorithms work,” Dr. Tellex added.
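As a rough illustration of what “coding a reward function” means, and of why it is easy in a game but ambiguous in the real world, compare the following sketches (the fetch-a-cup task, the function names, and the weights are all invented for this example):

```python
# In a board game the environment supplies the score, so the reward is unambiguous.
def reward_game(score_before: int, score_after: int) -> float:
    return float(score_after - score_before)

# Hypothetical robot task: fetch a cup. Option A rewards only the final
# outcome. Clean, but the robot gets no feedback until it happens to succeed.
def reward_sparse(cup_delivered: bool) -> float:
    return 1.0 if cup_delivered else 0.0

# Option B: a dense, hand-shaped reward. Easier to learn from, but every
# constant is a judgment call, and a poor choice can reinforce the wrong
# behavior (e.g., hovering near the cup without ever picking it up).
def reward_shaped(dist_to_cup_m: float, cup_delivered: bool, spilled: bool) -> float:
    reward = -0.1 * dist_to_cup_m     # encourage getting closer
    if cup_delivered:
        reward += 10.0                # large bonus for success
    if spilled:
        reward -= 5.0                 # penalty for making a mess
    return reward
```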
. . .
If a robot needs thousands of examples to learn, “it’s not clear that’s particularly useful,” said Ingmar Posner, the deputy director of the Oxford Robotics Institute in the U.K. “You want that machine to pick up pretty quickly what it’s meant to do.”

For the full commentary, see:
Daniela Hernandez. “Can Robots Learn to Improvise?” The Wall Street Journal (Sat., Dec. 16, 2017): C3.
(Note: ellipses added.)
(Note: the online version of the commentary has the date Dec. 15, 2017.)

The paper by the researchers at Vicarious is:
Kansky, Ken, Tom Silver, David A. Mely, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, and Dileep George. “Schema Networks: Zero-Shot Transfer with a Generative Causal Model of Intuitive Physics.” Manuscript, 2017.

The paper mentioned above by the researchers at MIT is:
Athalye, Anish, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. “Synthesizing Robust Adversarial Examples.” Working paper, Oct. 30, 2017.
