Latest “So-Called Reasoning Systems” Hallucinate MORE Than Earlier A.I. Systems

Since the more sophisticated “reasoning” A.I. systems are becoming less accurate on the facts, they are unlikely to threaten any job where performance depends on getting the facts right. Wouldn’t that include most jobs? The article quoted below suggests it would most clearly include jobs working with “court documents, medical information or sensitive business data.”

(p. B1) The newest and most powerful technologies — so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek — are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.

Today’s A.I. bots are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not — and cannot — decide what (p. B6) is true and what is false. Sometimes, they just make stuff up, a phenomenon some A.I. researchers call hallucinations. On one test, the hallucination rates of newer A.I. systems were as high as 79 percent.

. . .

The A.I. bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.

Those hallucinations may not be a big problem for many people, but it is a serious issue for anyone using the technology with court documents, medical information or sensitive business data.

“You spend a lot of time trying to figure out which responses are factual and which aren’t,” said Pratik Verma, co-founder and chief executive of Okahu, a company that helps businesses navigate the hallucination problem. “Not dealing with these errors properly basically eliminates the value of A.I. systems, which are supposed to automate tasks for you.”

. . .

For more than two years, companies like OpenAI and Google steadily improved their A.I. systems and reduced the frequency of these errors. But with the use of new reasoning systems, errors are rising. The latest OpenAI systems hallucinate at a higher rate than the company’s previous system, according to the company’s own tests.

The company found that o3 — its most powerful system — hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI’s previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent.

When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.

. . .

For years, companies like OpenAI relied on a simple concept: The more internet data they fed into their A.I. systems, the better those systems would perform. But they used up just about all the English text on the internet, which meant they needed a new way of improving their chatbots.

So these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behavior through trial and error. It is working well in certain areas, like math and computer programming. But it is falling short in other areas.

For the full story see:

Cade Metz and Karen Weise. “A.I. Hallucinations Are Getting Worse.” The New York Times (Fri., May 9, 2025): B1 & B6.

(Note: ellipses added.)

(Note: the online version of the story was updated May 6, 2025, and has the title “A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse.”)

