Improved AI Models Do Worse at Identifying Prime Numbers

(p. A2) . . . new research released this week reveals a fundamental challenge of developing artificial intelligence: ChatGPT has become worse at performing certain basic math operations.

The researchers at Stanford University and the University of California, Berkeley said the deterioration is an example of a phenomenon known to AI developers as drift, where attempts to improve one part of the enormously complex AI models make other parts of the models perform worse.

“Changing it in one direction can worsen it in other directions,” said James Zou, a Stanford professor who is affiliated with the school’s AI lab and is one of the authors of the new research. “It makes it very challenging to consistently improve.”

The goal of the team of researchers, consisting of Lingjiao Chen, a computer-science Ph.D. student at Stanford, along with Zou and Berkeley’s Matei Zaharia, is to systematically and repeatedly see how the models perform over time at a range of tasks.

Thus far, they have tested two versions of ChatGPT: version 3.5, available free online to anyone, and version 4.0, available via a premium subscription.

The results aren’t entirely promising. They gave the chatbot a basic task: identify whether a particular number is a prime number. This is the sort of math problem that is complicated for people but simple for computers.

Is 17,077 prime? Is 17,947 prime? Unless you are a savant, you can’t work this out in your head, but it is easy for computers to evaluate. A computer can just brute-force the problem: try dividing by two, three, five, etc., and see if anything works.
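The brute-force check described above can be sketched in a few lines of Python. This is only an illustrative sketch of trial division, not the researchers' code; the function name `is_prime` is an assumption, and for simplicity it tries two and then every odd number rather than only primes.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using plain trial division (illustrative sketch)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only divisors up to the square root of n need to be tried.
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False  # found a divisor, so n is composite
        divisor += 2  # skip even candidates
    return True


# The two numbers posed in the article.
for candidate in (17077, 17947):
    print(candidate, is_prime(candidate))
```

Run this way, the check reports that 17,077 is prime and that 17,947 is not (it equals 131 × 137).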

To track performance, the researchers fed ChatGPT 1,000 different numbers. In March, the premium GPT-4 correctly classified 84% of the numbers as prime or not prime. (Pretty mediocre performance for a computer, frankly.) By June its success rate had dropped to 51%.
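A success rate like the ones quoted above is simply the share of yes/no answers that match the true answer. The sketch below shows one way such a score could be computed. It is an assumption about the general approach, not the study's actual evaluation code; the names `ground_truth_is_prime` and `accuracy` are hypothetical, and the four sample answers are made up for illustration rather than taken from the research.

```python
def ground_truth_is_prime(n: int) -> bool:
    """Reference answer computed by trial division (illustrative sketch)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def accuracy(numbers, answers):
    """Fraction of yes/no answers that agree with the reference answer."""
    if not numbers or len(numbers) != len(answers):
        raise ValueError("numbers and answers must be non-empty and equal in length")
    correct = sum(
        1 for n, said_prime in zip(numbers, answers)
        if said_prime == ground_truth_is_prime(n)
    )
    return correct / len(numbers)


# Made-up sample, NOT data from the study: the answer for 9 is wrong.
sample_numbers = [7, 9, 11, 15]
sample_answers = [True, True, True, False]
print(f"{accuracy(sample_numbers, sample_answers):.0%}")  # prints 75%
```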

The phenomenon of unpredictable drift is known to researchers who study machine learning and AI, Zou said. “We had the suspicion it could happen here, but we were very surprised at how fast the drift is happening.”

