The Newest A.I. “Reasoning Models Actually Hallucinate More Than Their Predecessors”

I attended an I.H.S. Symposium last week where one of my minor discoveries was that a wide range of intellectuals, regardless of location on the political spectrum, share a concern for the allegedly damaging labor market effects of A.I. As in much else, I am an outlier–I am not concerned about A.I.

But since so many are concerned, and believe A.I. undermines my case for a better labor market under innovative dynamism, I will continue to occasionally highlight articles that present the evidence and arguments that reassure me.

(p. B1) “Humanity is close to building digital superintelligence,” Altman declared in an essay this week, and this will lead to “whole classes of jobs going away” as well as “a new social contract.” Both will be consequences of AI-powered chatbots taking over all our white-collar jobs, while AI-powered robots assume the physical ones.

Before you get nervous about all the times you were rude to Alexa, know this: A growing cohort of researchers who build, study and use modern AI aren’t buying all that talk.

The title of a fresh paper from Apple says it all: “The Illusion of Thinking.” In it, a half-dozen top researchers probed reasoning models—large language models that “think” about problems longer, across many steps—from the leading AI labs, including OpenAI, DeepSeek and Anthropic. They found little evidence that these are capable of reasoning anywhere close to the level their makers claim.

. . .

(p. B4) Apple’s researchers found “fundamental limitations” in the models. When taking on tasks beyond a certain level of complexity, these AIs suffered “complete accuracy collapse.” Similarly, engineers at Salesforce AI Research concluded that their results “underscore a significant gap between current LLM capabilities and real-world enterprise demands.”

Importantly, the problems these state-of-the-art AIs couldn’t handle are logic puzzles that even a precocious child could solve, with a little instruction. What’s more, when you give these AIs that same kind of instruction, they can’t follow it.

. . .

Gary Marcus, a cognitive scientist who sold an AI startup to Uber in 2016, argued in an essay that Apple’s paper, along with related work, exposes flaws in today’s reasoning models, suggesting they’re not the dawn of human-level ability but rather a dead end. “Part of the reason the Apple study landed so strongly is that Apple did it,” he says. “And I think they did it at a moment in time when people have finally started to understand this for themselves.”

In areas other than coding and mathematics, the latest models aren’t getting better at the rate that they once did. And the newest reasoning models actually hallucinate more than their predecessors.

For the full commentary see:

Christopher Mims. “Keywords: Apple Calls Today’s AI ‘The Illusion of Thinking’.” The Wall Street Journal (Sat., June 14, 2025): B1 & B4.

(Note: ellipses added.)

(Note: the online version of the commentary has the date June 13, 2025, and has the title “Keywords: Why Superintelligent AI Isn’t Taking Over Anytime Soon.” In the original print and online versions, the word “more” appears in italics for emphasis.)

Sam Altman’s blog essay mentioned above is:

Altman, Sam. “The Gentle Singularity.” In Sam Altman blog, June 10, 2025, URL: https://blog.samaltman.com/the-gentle-singularity.

The Apple research article briefly summarized in a passage quoted above is:

Shojaee, Parshin, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar. “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models Via the Lens of Problem Complexity.” Apple Machine Learning Research, June 2025, URL: https://machinelearning.apple.com/research/illusion-of-thinking.

Puzzling Studies Claim Economic Downturns Encourage Innovation

A recent paper co-authored by Talay joins several other studies (e.g., Anthony 2009 and Field 2011) in claiming that economic downturns encourage economic innovation.

I have always found these studies deeply puzzling. When I acquire infinite time, I plan to hunker down and try to figure out what is going on.

My initial hypothesis is that downturns do not actually help innovators, but that those entrepreneurs who persist in bringing new goods to market during a downturn either have higher levels of perseverance or else have better new goods.

In other words the puzzling results are due to a selection issue–other things are not equal, and downturns do not encourage innovation.

The WSJ article that summarizes the recent paper is:

Lisa Ward. “When a Recession Helps Product Launch.” The Wall Street Journal (Tues., June 17, 2025): B6.

(Note: the online version of the WSJ article has the date June 12, 2025, and has the title “Yes or No: It’s Smart to Launch a New Product in a Recession.”)

The recently published academic paper co-authored by Talay and summarized in The Wall Street Journal article cited above is:

Talay, M. Berk, Koen Pauwels, and Steven H. Seggie. “Why and When to Launch New Products During a Recession: An Empirical Investigation of the U.K. FMCG Industry and the U.S. Automobile Industry.” Journal of the Academy of Marketing Science 52, no. 2 (March 2024): 576-98.

The two books cited above that support the claim that downturns encourage innovation are:

Anthony, Scott D. The Silver Lining: An Innovation Playbook for Uncertain Times. Boston, MA: Harvard Business School Press, 2009.

Field, Alexander J. A Great Leap Forward: 1930s Depression and U.S. Economic Growth, Yale Series in Economic and Financial History. New Haven, CT: Yale University Press, 2011.

Citizen Scientist Tim Friede Is Now “Director of Herpetology” at Centivax Startup

An earlier blog entry told the story, quoting The New York Times, of citizen scientist Tim Friede who injected himself with venom from multiple snakes, and allowed many to bite him, in order to develop antibodies that would protect him from all snake bites, and allow scientist Jacob Glanville to create a universal snakebite antivenom.

Now The Wall Street Journal has run a similar story, but adds a minor satisfying twist. The scientist Jacob Glanville has co-founded a startup called Centivax to develop the universal antivenom, and Friede is Centivax’s “director of herpetology.”

The Wall Street Journal article is:

Nidhi Subbaraman. “A Universal Antivenom, From a Man Bitten by Snakes 200 Times.” The Wall Street Journal (Sat., June 14, 2025): C5.

(Note: the online version of the WSJ article has the date June 10, 2025, and has the title “A Man Let Snakes Bite Him 200 Times. His Blood Inspired a Universal Antivenom.”)

Development of IVF Took 10 Years of Trial and Error

If the Joy television movie accurately reflects the history of the development of IVF (in vitro fertilization), then it illustrates a couple of important themes. One is the frequent fruitfulness of trial-and-error experimentation. The other is that some medical entrepreneurs are motivated by having some form of ‘skin-in-the-game,’ in this case nurse Jean Purdy. (Support for the second theme is more speculative than for the first, since the evidence that the real Jean Purdy experienced endometriosis and infertility is circumstantial.)

(p. A10) “Joy,” . . . begins in 1968 and charts the 10-year journey of trial, error and more trial and error by an odd trio of pioneers: Bob Edwards (James Norton), a biologist and true-believer in the possibilities of IVF; Patrick Steptoe (Bill Nighy), a surgical obstetrician who is less than convinced but can be; and Jean Purdy (Thomasin McKenzie), a nurse who signs on as Bob’s assistant and, as we learn, has her own agenda regarding infertile women. (Edwards received the 2010 Nobel Prize in Medicine, his partners having passed away.)

Jack Thorne’s screenplay massages the IVF medical story into a personal one, mostly about Jean, who is portrayed as a critical member of the team and the one whose life reflects the social uproar over the mission—giving childless women a choice about becoming mothers.

For the full television review see:

John Anderson. “The Birth of a Medical Miracle.” The Wall Street Journal (Fri., Nov. 22, 2024): A10.

(Note: ellipsis added.)

(Note: the online version of the television review has the date November 21, 2024, and has the title “‘Joy’ Review: The Birth of a Medical Miracle on Netflix.”)

Impact of Weight Loss Drugs Underlines the Importance of Serendipity and Medical Entrepreneurship

The story of the creation of the weight loss drugs involves a fair amount of serendipity and medical entrepreneurship. A case has been made that we would have had these drugs 25 years sooner if Pfizer had not abandoned the development of its version due to the dauntingly huge costs of drug development. Recall that the largest component of those huge costs is the mandated Phase 3 randomized double-blind clinical trials.

Almost everyone now views these drugs as a major medical advance. The question is: how big? According to the reviewer quoted below, Dr. Eric Topol speculates: very big indeed. He raises the possibility that to improve lifespan and healthspan eventually most of us will be on one of these drugs.

If so then the weight loss drugs will be even more compelling examples of the importance of regulating so as to allow entrepreneurs to take quick advantage of serendipity. How many lives were lost that could have been saved, how much suffering was experienced that could have been avoided, if over-regulation by the F.D.A. had not delayed the availability of these drugs by 25 years?

(p. A13) More than half of American adults suffer from at least one chronic illness—most commonly diabetes, heart disease, cancer or neurodegeneration. By age 65, 80% are afflicted with two or more conditions. Among those fortunate enough to reach 80, it’s rare to find anyone who has arrived unscathed. In 2008 a group of scientists at the Scripps Research Institute in San Diego set out to recruit 1,400 of these healthy souls—known as the Wellderly—to figure out how they managed it.

Led by the cardiologist Eric Topol, the researchers hoped to identify the genetic factors associated with healthy aging. To their surprise, they found little in the DNA that stood out. They did, however, notice several striking traits. Compared with their peers, the disease-free subjects were generally thinner, exercised more frequently and seemed “remarkably upbeat,” often with rich social lives. These observations encouraged the research team to think about longevity (years of life) and healthspan (years of health) more broadly. In “Super Agers” Dr. Topol shares the results of this intellectual exploration.

. . .

Expensive new weight-loss drugs like Ozempic and Zepbound, Dr. Topol writes, have “extraordinary potential to promote health span.” In addition to stanching appetite, these drugs also seem to rapidly reduce harmful inflammation—an effect that “precedes and is independent of weight loss.” In the future, the author believes it’s “conceivable that most people will be taking” such medications, . . .

For the full review see:

David A. Shaywitz. “Bookshelf; Living the Good Life.” The Wall Street Journal (Wednesday, May 7, 2025): A13.

(Note: ellipses added.)

(Note: the online version of the review has the date May 6, 2025, and has the title “Bookshelf; ‘Super Agers’: Living the Good Life.”)

The book under review is:

Topol, Eric. Super Agers: An Evidence-Based Approach to Longevity. New York: Simon & Schuster, 2025.

Correll Managed Georgia-Pacific Well and Then Used Those Skills to Save a Failing Hospital

In my Openness book, I make the case for the many benefits of an economic system of innovative dynamism. One of the lesser, but still important, benefits was first identified by Joseph Schumpeter. He argued for a spillover effect of innovative dynamism. The skills, knowledge, and technologies created by innovative entrepreneurs in the for-profit sector of the economy are also applied and imitated in the nonprofit and government sectors. So where there is innovative dynamism, not only is the market more creative and efficient, but both the nonprofit and the government sectors are more creative and efficient.

A good example may be Pete Correll who acted entrepreneurially as CEO of Georgia-Pacific to bring more stability to the business by acquiring the James River Corporation, maker of Quilted Northern, and guided the Georgia-Pacific firm through years of lawsuits over asbestos. He eventually sold Georgia-Pacific to Koch Industries, Inc. My impression is that Charles Koch then applied his market-based management system to make the Georgia-Pacific part of his business much more efficient and innovative. [Query: does Koch’s achievement undermine my claim that Pete Correll had acted entrepreneurially in his earlier management of Georgia-Pacific? Or can both Correll and Koch have been good manager/entrepreneurs, but in different ways at different times?]

But according to Correll’s obituary in the WSJ, his greatest achievement may have been in taking over a near-bankrupt Atlanta public (aka government) hospital, reorganizing it from government to nonprofit, and modernizing its management and technology.

Correll’s obituary in the WSJ:

James R. Hagerty. “CEO Helped Save A Public Hospital.” The Wall Street Journal (Sat., June 5, 2021 [sic]): A9.

(Note: the online version of the WSJ obituary has the date June 2, 2021 [sic], and has the title “Retired CEO Saved an Atlanta Public Hospital.”)

For Charles Koch’s entrepreneurial market-based management system see:

Koch, Charles G. The Science of Success: How Market-Based Management Built the World’s Largest Private Company. Hoboken, NJ: Wiley & Sons, Inc., 2007.

My book mentioned in my initial comments is:

Diamond, Arthur M., Jr. Openness to Creative Destruction: Sustaining Innovative Dynamism. New York: Oxford University Press, 2019.

It May Take a “Thorny Character” to Be “Willing to Challenge Entire Establishment Belief Systems”

The obituary quoted below misidentifies Richard Bernstein’s main contribution. Yes, it is noteworthy that he was probably the first diabetes sufferer to effectively and continually monitor his own blood glucose level. But his main contribution was that by careful self-monitoring and trial-and-error experimentation he discovered that his health improved when he cut back on both carbs and insulin.

The obituary writer quotes Gary Taubes, but either didn’t read his book or disagrees with it, because Taubes is clear about Bernstein’s main contribution.

I am halfway through Taubes’s book. It is long and sometimes deep in the weeds, but comes highly recommended by Marty Makary and Siddhartha Mukherjee, both of whom I highly respect. The book sadly highlights how slow mainstream medicine can be to adapt clinical practice to new knowledge.

(p. C6) Richard Bernstein was flipping through a medical trade journal in 1969 when he saw an advertisement for a device that could check blood-sugar levels in one minute with one drop of blood. It was marketed to hospitals, not consumers, but Bernstein wanted one for himself. He had been sick his entire life and was worried he was running out of time.

. . .

Since he wasn’t a doctor, the manufacturer wouldn’t even sell him a device. So, he bought one under the name of his wife, Dr. Anne Bernstein, a psychiatrist.

He experimented with different doses of insulin and the frequency of shots. He eased off carbohydrates. He checked his blood sugar constantly to see how it was reacting.

After experimenting for several years, he figured out that if he maintained a low-carb diet, he didn’t need as much insulin and could avoid many of the wild swings in his blood-sugar levels. By checking his blood sugar throughout the day, he learned how to maintain normal levels. It changed his life.

. . .

With his diabetes under control, he tried to spread the word and change the way the disease is treated. In the early years, he was dismissed by much of the medical establishment. His ideas went against accepted wisdom and he was, after all, not a doctor. In 1979, at the age of 45, he enrolled at the Albert Einstein College of Medicine, where he received his M.D.

“I never wanted to be a doctor,” he told the New York Times in 1988. “But I had to become one to gain credibility.”

Bernstein went into private practice in Mamaroneck, N.Y., where he treated diabetics and continued to advocate for his ideas—to his patients, in articles, YouTube videos, letters to the editor, and writing books, including “Dr. Bernstein’s Diabetes Solution.”

. . .

Gary Taubes, the author of “Rethinking Diabetes,” said that it was Bernstein’s work that eventually led to the Diabetes Control and Complications Trial, a landmark study that demonstrated that diabetics could blunt the destructive effects of the disease by keeping their blood-sugar levels nearer normal. Released in 1993, the results led to the kind of self-monitoring and frequent shots of insulin that remains part of the standard treatment plan for Type 1 diabetes today—part of what Bernstein had been pushing for years.

This was only partial vindication for Bernstein. The medical establishment never fully embraced Bernstein or the strict low-carb diet that he prescribed, which some considered unrealistic.

Taubes said that Bernstein was a bit of a “thorny character” who was easy for the establishment to dislike. He also noted that’s something that comes with the territory when you spend your career telling people they’re wrong and you’re right.

“But often it’s the people who are not easy to like,” Taubes said, “who are the ones who are willing to challenge entire establishment belief systems.”

For the full obituary see:

Chris Kornelis. “A Diabetic Who Pioneered Self-Monitoring for Blood Sugar.” The Wall Street Journal (Sat., May 10, 2025): C6.

(Note: the online version of the WSJ obituary has the date May 9, 2025, and has the title “Richard Bernstein, Who Pioneered Diabetics’ Self-Monitoring of Blood Sugar, Dies at 90.”)

Bernstein’s book mentioned above is:

Bernstein, Richard K., MD. Dr. Bernstein’s Diabetes Solution: The Complete Guide to Achieving Normal Blood Sugars. New York: Little, Brown Spark, 2011.

Taubes’s book mentioned above is:

Taubes, Gary. Rethinking Diabetes: What Science Reveals About Diet, Insulin, and Successful Treatments. New York: Knopf, 2024.

The Classical Liberal Economist’s Current Job: Minimize the Harm from Tariffs, Maximize the Benefits from Deregulation and Downsizing Government

I used to run into Richard Burkhauser at economics meetings occasionally and always enjoyed talking with him and hearing about his research. I believe Richard’s activity in the first Trump administration makes sense: if tariffs are going to be imposed, impose them in a way that minimizes the damage to the economy. Although not mentioned in the article quoted below, I am sure Richard also did what he could to further the part of Trump’s agenda that was positive for the economy: reducing regulations so entrepreneurs can innovate and create jobs, and downsizing the government so taxpayers can keep more of their earnings.

(p. 1) Partway through a panel discussion at a recent economics conference in San Francisco, Jason Furman, a former adviser to President Barack Obama, turned to Kimberly Clausing, a former member of the Biden administration and the author of a book extolling the virtues of free trade.

“Everyone in this room agrees with your book,” Mr. Furman said. “No one outside of this room agrees with your book.”

The academics and policy wonks gathered in the hotel conference room laughed, but the comment captured something real: After decades of helping to shape policy on weighty matters like taxes and health insurance, economists find that their influence is at a low ebb.

. . .

(p. 6) Mr. Trump, in his first term, had few economists in top roles, and perhaps the most prominent exception — Peter Navarro, a Harvard-trained economist who was an adviser on trade policy — held skeptical views on trade, particularly with China, that put him far outside the economic mainstream. (In a 2016 survey of academic economists, not a single respondent said putting tariffs on China to encourage domestic production would be a good idea.)

Economists who held more mainstream views had limited influence. Richard Burkhauser, a Cornell University professor who served on Mr. Trump’s Council of Economic Advisers, said he and his colleagues quickly understood that there was little point in trying to talk Mr. Trump out of imposing tariffs.

“The most forlorn economists at the C.E.A. specialized in trade,” he said. If they had tried to fight tariffs, he said, “that would have been the last meeting we were at.”

Instead, Mr. Burkhauser said, economists focused on a different question: If the administration was going to impose tariffs, how could it do them in the least painful way possible?

For the full story see:

Ben Casselman. “Economists See Influence Wane in Policy Circles.” The New York Times, SundayBusiness Section (Sun., January 12, 2025): 1 & 6.

(Note: ellipsis added.)

(Note: the online version of the story has the date Jan. 10, 2025, and has the title “Economists Are in the Wilderness. Can They Find a Way Back to Influence?”)

Plenty in Science Still “Just Doesn’t Make Any Sense”

In my Openness book, I argue against those who see a future of inevitable stagnation. One argument for inevitable stagnation says that entrepreneurs build their innovations on science and we have run out of new knowledge to learn in science.

But whenever we keep our eyes open and observe more closely, or in new areas, we see what we cannot yet explain. The passages quoted below give another example. So we still have a lot to learn in science.

(Of course I also point out in the book that much entrepreneurial innovation is not tied to current advances in science–and is done by entrepreneurs who do not know, or who do not hold in high esteem, the current conclusions of mainstream scientists.)

(p. A14) On Dec. 24 [2024], NASA’s Parker Solar Probe swooped closer than it ever had before to the sun, just a few million miles above its blazing hot surface.

The team behind the mission waited nervously, trusting that the probe would survive the encounter. Then, a few minutes shy of midnight on Thursday [Dec. 2?, 2024], Parker phoned home.

. . .

. . ., there was some fear that the probe might not survive this time. Parker’s heat shield is designed so that the front of the vehicle can withstand facing the blistering heat of the sun’s outer atmosphere, which reaches millions of degrees, while the back, which contains the probe’s sensitive instruments, sits at a comfortable 85 degrees Fahrenheit.

“Literally one side is at a temperature that is unfathomable,” Joseph Westlake, the director of heliophysics at NASA, said. “And the back of it is a hot, sunny day.”

. . .

Parker’s data will . . . help scientists understand how the sun’s outer atmosphere, known as the corona, can be hundreds of times hotter than the solar surface below it.

“It’s like if you were standing next to a bonfire and you took a couple of steps back, and all of a sudden it got hotter,” Dr. Westlake said. “It just doesn’t make any sense.”

For the full story see:

Katrina Miller. “After Silence, Solar Probe Signals Earth of Survival.” The New York Times (Sat., December 28, 2024): A14.

(Note: ellipses, bracketed year, and bracketed date, added.)

(Note: the online version of the story was updated Dec. 30, 2024, and has the title “After Days of Silence, NASA’s Parker Solar Probe Phones Home.”)

My book mentioned in my initial comments is:

Diamond, Arthur M., Jr. Openness to Creative Destruction: Sustaining Innovative Dynamism. New York: Oxford University Press, 2019.

Medical Entrepreneur Fired for Nimbly Pivoting to Get Job Done

Back in late 2020 and early 2021, the Moderna vaccine was not yet widely available. Protocols mandated who could get the scarce shots, prioritizing health care workers, senior citizens, and those with severe diseases. Each vial contained enough for 10 doses, but the doses had to be given within six hours, before the vaccine spoiled. On Dec. 29, 2020, Dr. Hasan Gokal, a Pakistani immigrant, worked at the county’s first vaccination event, set up for health care workers. Near the end of the scheduled event, a health care worker showed up and a nurse punctured a new vial to give the worker the shot.

Now, what to do with the remaining nine doses? Dr. Gokal got on the phone and drove around, seeking and finding several senior citizens who wanted the vaccine. Exhausted, and with a half-hour until the vaccine expired, he gave the final dose to his wife, who had pulmonary sarcoidosis, a condition that the protocols listed as a qualification for the vaccine.

Dr. Gokal’s supervisor and the director of human resources then fired Dr. Gokal:

The officials maintained that he had violated protocol and should have returned the remaining doses to the office or thrown them away, the doctor recalled. He also said that one of the officials startled him by questioning the lack of “equity” among those he had vaccinated.

“Are you suggesting that there were too many Indian names in that group?” Dr. Gokal said he asked.

Exactly, he said he was told. (Barry 2021, p. A5)

A couple of weeks later, the county district attorney charged Dr. Gokal with theft of doses of the vaccine.

Dr. Gokal acted as a medical entrepreneur. His job was to save lives by administering the vaccine. He nimbly pivoted in a difficult situation. For that he was punished–fired and charged with a crime.

The growing promulgation and enforcement of protocols keep physicians from acting as mission-oriented entrepreneurs. They are limited in their use of judgement based on their own experiences, they are limited in innovating, and sometimes they are even limited in using all of a scarce vaccine. These limits may be part of the reason that so many physicians today experience frustration and burn-out.

[As of the time of the writing of the NYT article cited below, Dr. Gokal remained fired from his job, and still was in legal jeopardy.]

My source for the facts of Dr. Gokal’s case is the NYT article:

Dan Barry. “Racing the Clock, a Doctor Gave Out the Vaccine.” The New York Times (Thurs., February 11, 2021 [sic]): A1 & A5.

(Note: the online version of the NYT article was updated June 23, 2023 [sic], and has the title “The Vaccine Had to Be Used. He Used It. He Was Fired.”)

Gig Work Enables Free Agent Entrepreneurship

In my Openness book, I distinguish between free agent entrepreneurs and innovative entrepreneurs. Free agent entrepreneurs are their own boss, doing what has been done before. Innovative entrepreneurs are their own boss, doing what is new. Of course the distinction is not sharp; the two types lie along a continuum.

Recent research, summarized in the WSJ, suggests that gig work can ease entry into free agent entrepreneurship. Gig work is flexible–the gig worker has time when they need it, to work on their entrepreneurial venture. Gig work also can generate capital and give experience in self-management.

A higher percentage of gig workers than of similar employed workers become entrepreneurs, and they do so, on average, at a slightly younger age.

Those who want to regulate gig work, and thereby make it less common, should remember how gig work benefits aspiring entrepreneurs.

The WSJ article mentioned above is:

Lisa Ward. “Gig Workers Show More Enterprise, Study Finds.” The Wall Street Journal (Thurs., May 8, 2025): A11.

(Note: the online version of the WSJ article has the date May 5, 2025, and has the title “Want to Start a Business? Maybe Begin by Being a Gig Worker.”)

The academic working paper summarized in the WSJ article is:

Denes, Matthew R., Spyridon Lagaras, and Margarita Tsoutsoura. “Entrepreneurship and the Gig Economy: Evidence from U.S. Tax Returns.” In National Bureau of Economic Research Working Paper #33347, Jan. 2025.

My book mentioned in my initial comments is:

Diamond, Arthur M., Jr. Openness to Creative Destruction: Sustaining Innovative Dynamism. New York: Oxford University Press, 2019.