(p. C3) A few weeks ago a Google engineer got a lot of attention for a dramatic claim: He said that the company’s LaMDA system, an example of what’s known in artificial intelligence as a large language model, had become a sentient, intelligent being.
Large language models like LaMDA or San Francisco-based OpenAI’s rival GPT-3 are remarkably good at generating coherent, convincing writing and conversations, convincing enough to fool the engineer. But they use a relatively simple technique to do it: The models see the first part of a text that someone has written and then try to predict which words are likely to come next. If a powerful computer does this billions of times with billions of texts generated by millions of people, the system can eventually produce a grammatical and plausible continuation to a new prompt or a question.
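(Illustration, not part of the commentary: the "predict which words come next" idea can be shown with a toy word-pair counter. This is only a minimal sketch. LaMDA and GPT-3 are large neural networks trained on vastly more text, and the function names and the three-sentence corpus below are made up for the example.)

```python
from collections import Counter, defaultdict


def train_bigram(texts):
    """Count, for each word, which words follow it in the training texts."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def continue_prompt(following, prompt, max_words=5):
    """Extend a prompt one word at a time, always taking the likeliest next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        nxt = predict_next(following, words[-1])
        if nxt is None:
            break  # nothing in the training texts follows this word
        words.append(nxt)
    return " ".join(words)


# Hypothetical training corpus, invented for this illustration.
corpus = [
    "the machine lights up",
    "the machine lights up again",
    "the block sits on the machine",
]
model = train_bigram(corpus)
```

With this corpus, `predict_next(model, "machine")` returns `"lights"`, while a word the model never saw, such as `"blicket"`, returns `None`: the counter can only echo patterns already present in its texts.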
. . .
In what’s known as the classic “Turing test,” Alan Turing in 1950 suggested that if you couldn’t tell the difference in a typed conversation between a person and a computer, the computer might qualify as intelligent. Large language models are getting close. But Turing also proposed a more stringent test: For true intelligence, a computer should not only be able to talk about the world like a human adult—it should be able to learn about the world like a human child.
In my lab we created a new online environment to implement this second Turing test—an equal playing field for children and AI systems. We showed 4-year-olds on-screen machines that would light up when you put some combinations of virtual blocks on them but not others; different machines worked in different ways. The children had to figure out how the machines worked and say what to do to make them light up. The 4-year-olds experimented, and after a few trials they got the right answer. Then we gave state-of-the-art AI systems, including GPT-3 and other large language models, the same problem. The language models got a script that described each event the children saw and then we asked them to answer the same questions we asked the kids.
We thought the AI systems might be able to extract the right answer to this simple problem from all those billions of earlier words. But nobody in those giant text databases had seen our virtual colored-block machines before. In fact, GPT-3 bombed. Some other recent experiments had similar results. GPT-3, for all its articulate speech, can’t seem to solve cause-and-effect problems.
If you want to solve a new problem, googling it or going to the library may be a first step. But ultimately you have to experiment, the way the children did. GPT-3 can tell you what the most likely outcome of a story will be. But innovation, even for 4-year-olds, depends on the surprising and unexpected—on discovering unlikely outcomes, not predictable ones.
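(Illustration, not part of the commentary: one way to picture learning by experiment is to try block combinations on a machine and discard every candidate rule that disagrees with what the machine does. This is a hedged sketch only. The commentary does not describe the actual rules the lab's machines used or how the children reasoned, so the block colors, the candidate rules, and the function names below are all assumptions.)

```python
from itertools import combinations

# Hypothetical blocks; the commentary does not name the ones used in the study.
BLOCKS = ("red", "blue", "green")

# Hypothetical candidate rules for when a machine lights up.
HYPOTHESES = {
    "red alone is enough": lambda placed: "red" in placed,
    "blue alone is enough": lambda placed: "blue" in placed,
    "red and blue together": lambda placed: {"red", "blue"} <= placed,
    "any two blocks": lambda placed: len(placed) >= 2,
}


def run_experiments(machine, hypotheses, blocks=BLOCKS):
    """Try every non-empty block combination and keep only the rules
    whose prediction matches what the machine actually did each time."""
    surviving = dict(hypotheses)
    for size in range(1, len(blocks) + 1):
        for combo in combinations(blocks, size):
            placed = frozenset(combo)
            observed = machine(placed)  # did the machine light up?
            surviving = {
                name: rule
                for name, rule in surviving.items()
                if rule(placed) == observed
            }
    return sorted(surviving)


# The machine's true rule is hidden from the learner; here it is picked
# from the candidates purely for demonstration.
hidden_machine = HYPOTHESES["red and blue together"]
```

Calling `run_experiments(hidden_machine, HYPOTHESES)` leaves `["red and blue together"]` as the only rule consistent with every trial. The answer comes from interacting with the machine, not from anything written about it beforehand.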
For the full commentary see:
(Note: ellipsis added.)
(Note: the online version of the commentary has the date July 15, 2022, and has the same title as the print version.)