AI Cannot Know What People Think “At the Very Edge of Their Experience”

The passages quoted below mention “the advent of generative A.I.” From previous reading, I had the impression that “generative A.I.” meant A.I. that had reached human-level cognition. But when I looked up the meaning of the phrase, I found that it means A.I. that can generate new content. Then I smiled. I was at Wabash College as an undergraduate from 1971-1974 (I graduated in three years). Sometime during those years, Wabash acquired its first minicomputer, and I took a course in BASIC computer programming. I distinctly remember programming a template for a brief poem, with a random-word variable inserted at key locations. Wherever the variable occurred, the program randomly selected one of a number of rhyming words. So each time the program was run, a new rhyming poem would be “generated.” That was new content, and sometimes it was even amusing. But it wasn’t any good, and it did not have deep meaning, and if what it generated was true, it was only by accident. So I guess “the advent of generative A.I.” goes back at least to the early 1970s, when Art Diamond messed around with a DEC.
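For the curious, here is a minimal Python sketch of that program’s logic. The original was BASIC on a DEC minicomputer and is long gone, so the template and rhyme lists below are invented stand-ins, not a reconstruction.

```python
import random

# Invented stand-ins for the rhyme lists in the original BASIC program.
RHYMES_A = ["June", "moon", "tune", "soon"]
RHYMES_B = ["light", "night", "bright", "sight"]

def generate_poem() -> str:
    """Fill a fixed template, choosing rhyming words at random."""
    a1, a2 = random.sample(RHYMES_A, 2)  # two distinct rhyming words
    b1, b2 = random.sample(RHYMES_B, 2)
    return (
        f"I wandered out one evening in {a1},\n"
        f"And hummed a half-remembered {a2}.\n"
        f"The sky gave up its fading {b1},\n"
        f"And hid the world from weary {b2}."
    )

if __name__ == "__main__":
    print(generate_poem())  # each run "generates" a new rhyming poem
```

Each run produces genuinely new content; whether it is any good is, as noted above, another matter.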

This is not the main point of the passages quoted below. The main point is that the frontiers of human thought are not on the internet, and so cannot be part of A.I.’s training data. So whatever else A.I. can do, it cannot think at the human “edge.”

(p. B3) Dan Shipper, the founder of the media start-up Every, says he gets asked a lot whether he thinks robots will replace writers. He swears they won’t, at least not at his company.

. . .

Mr. Shipper argues that the advent of generative A.I. is merely the latest step in a centuries-long technological march that has brought writers closer to their own ideas. Along the way, most typesetters and scriveners have been erased. But the part of writing that most requires humans remains intact: a perspective and taste, and A.I. can help form both even though it doesn’t have either on its own, he said.

“One example of a thing that journalists do that language models cannot is come and have this conversation with me,” Mr. Shipper said. “You’re going out and talking to people every day at the very edge of their experience. That’s always changing. And language models just don’t have access to that, because it’s not on the internet.”

For the full story see:

Benjamin Mullin. “Will Writing Survive A.I.? A Start-Up Is Betting on It.” The New York Times (Mon., May 26, 2025): B3.

(Note: ellipsis added.)

(Note: the online version of the story has the date May 21, 2025, and has the title “Will Writing Survive A.I.? This Media Company Is Betting on It.”)

If AI Takes Some Jobs, New Human Jobs Will Be Created

In the passage quoted below, Atkinson makes a sound general case for optimism about the effects of AI on the labor market. I would add to that case that many currently overestimate the potential cognitive effectiveness of AI. Humans have a vast reservoir of unarticulated common-sense knowledge that is not accessible for AI training. In addition, AI cannot innovate at the frontiers of knowledge, which have not yet been posted to the internet.

(p. A15) AI doomsayers frequently succumb to what economists call the “lump of labor” fallacy: the idea that there is a limited amount of work to be done, and if a job is eliminated, it’s gone for good. This fails to account for second-order effects, whereby the saving from increased productivity is recycled back into the economy in the form of higher wages, higher profits and reduced prices. This creates new demand that in turn creates new jobs. Some of these are entirely new occupations, such as “content creator assistant,” but others are existing jobs that are in higher demand now that people have more money to spend—for example, personal trainers.

Suppose an insurance firm uses AI to handle many of the customer-service functions that humans used to perform. Assume the technology allows the firm to do the same amount of work with 50% less labor. Some workers would lose their jobs, but lower labor costs would decrease insurance premiums. Customers would then be able to spend less money on insurance and more on other things, such as vacations, restaurants or gym memberships.

In other words, the savings don’t get stuffed under a mattress; they get spent, thereby creating more jobs.
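Atkinson’s insurance example can be made concrete with a little arithmetic. The numbers below are illustrative assumptions of mine, not figures from his commentary.

```python
# Illustrative arithmetic for Atkinson's insurance example.
# All numbers are invented for the sketch.

customers = 100_000
old_premium = 1_000.0   # annual premium per customer, in dollars
labor_share = 0.40      # fraction of the premium that covers labor costs
labor_cut = 0.50        # AI does the same work with 50% less labor

# If the labor savings are passed through as lower premiums:
savings_per_customer = old_premium * labor_share * labor_cut
new_premium = old_premium - savings_per_customer
freed_spending = savings_per_customer * customers

print(f"New premium: ${new_premium:,.0f}")                    # $800
print(f"Freed consumer spending: ${freed_spending:,.0f}/yr")  # $20,000,000/yr

# If roughly $60,000 of annual revenue supports one job elsewhere
# (gyms, restaurants, vacations), the freed spending supports about:
revenue_per_job = 60_000
print(f"New jobs supported: {freed_spending / revenue_per_job:,.0f}")  # 333
```

The rebuttal to the “lump of labor” fallacy is visible in the last line: the jobs lost at the insurer are offset, in whole or in part, by jobs created wherever the savings are spent.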

For the full commentary, see:

Robert D. Atkinson. “No, AI Robots Won’t Take All Our Jobs.” The Wall Street Journal (Fri., June 6, 2025): A15.

(Note: the online version of the commentary has the date June 5, 2025, and has the same title as the print version.)

Latest “So-Called Reasoning Systems” Hallucinate MORE Than Earlier A.I. Systems

Since more sophisticated “reasoning” A.I. systems are increasingly inaccurate on the facts, it is unlikely that such systems will threaten any job whose performance depends on getting the facts right. Wouldn’t that include most jobs? The article quoted below suggests it would most clearly include jobs working with “court documents, medical information or sensitive business data.”

(p. B1) The newest and most powerful technologies — so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek — are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.

Today’s A.I. bots are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not — and cannot — decide what (p. B6) is true and what is false. Sometimes, they just make stuff up, a phenomenon some A.I. researchers call hallucinations. On one test, the hallucination rates of newer A.I. systems were as high as 79 percent.

. . .

The A.I. bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.

Those hallucinations may not be a big problem for many people, but it is a serious issue for anyone using the technology with court documents, medical information or sensitive business data.

“You spend a lot of time trying to figure out which responses are factual and which aren’t,” said Pratik Verma, co-founder and chief executive of Okahu, a company that helps businesses navigate the hallucination problem. “Not dealing with these errors properly basically eliminates the value of A.I. systems, which are supposed to automate tasks for you.”

. . .

For more than two years, companies like OpenAI and Google steadily improved their A.I. systems and reduced the frequency of these errors. But with the use of new reasoning systems, errors are rising. The latest OpenAI systems hallucinate at a higher rate than the company’s previous system, according to the company’s own tests.

The company found that o3 — its most powerful system — hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI’s previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent.

When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.

. . .

For years, companies like OpenAI relied on a simple concept: The more internet data they fed into their A.I. systems, the better those systems would perform. But they used up just about all the English text on the internet, which meant they needed a new way of improving their chatbots.

So these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behavior through trial and error. It is working well in certain areas, like math and computer programming. But it is falling short in other areas.
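For readers unfamiliar with the term, here is a minimal Python sketch of what learning “through trial and error” means: an epsilon-greedy bandit, a textbook toy that is only a distant cousin of the large-scale reinforcement learning the labs run on math and code.

```python
import random

# Trial-and-error learning in miniature: an epsilon-greedy agent
# discovering which of three actions pays off most often.

true_payoffs = [0.2, 0.5, 0.8]  # hidden reward probability of each action
estimates = [0.0, 0.0, 0.0]     # the agent's running estimates
counts = [0, 0, 0]
epsilon = 0.1                   # fraction of trials spent exploring

for _ in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore at random
    else:
        action = estimates.index(max(estimates))  # exploit the best guess
    reward = 1.0 if random.random() < true_payoffs[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Learned estimates:", [round(e, 2) for e in estimates])
# The agent settles on the best action without ever being told which it is.
# Reward, not truth, is what gets reinforced -- which hints at why such
# training can lift math scores without lifting factual accuracy.
```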

For the full story see:

Cade Metz and Karen Weise. “A.I. Hallucinations Are Getting Worse.” The New York Times (Fri., May 9, 2025): B1 & B6.

(Note: ellipses added.)

(Note: the online version of the story was updated May 6, 2025, and has the title “A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse.”)

A.I. Only “Knows” What Has Been Published or Posted

A.I. “learns” by scouring language that has been published or posted. If outdated or never-true “facts” are posted on the web, A.I. may regurgitate them. It takes human eyes to check whether there really is a picnic table in a park.

(p. B1) Last week, I asked Google to help me plan my daughter’s birthday party by finding a park in Oakland, Calif., with picnic tables. The site generated a list of parks nearby, so I went to scout two of them out — only to find there were, in fact, no tables.

“I was just there,” I typed to Google. “I didn’t see wooden tables.”

Google acknowledged the mistake and produced another list, which again included one of the parks with no tables.

I repeated this experiment by asking Google to find an affordable carwash nearby. Google listed a service for $25, but when I arrived, a carwash cost $65.

I also asked Google to find a grocery store where I could buy an exotic pepper paste. Its list included a nearby Whole Foods, which didn’t carry the item.

For the full commentary see:

Brian X. Chen. “Underneath a New Way to Search, A Web of Wins and Imperfections.” The New York Times (Tues., June 3, 2025): B1 & B4.

(Note: the online version of the commentary has the date May 29, 2025, and has the title “Google Introduced a New Way to Use Search. Proceed With Caution.”)

Do Graeber’s “Bullshit Jobs” Thrive in Innovative Dynamism?

Last week I participated in a panel on “Freedom and Abundance” with Bri Wolf at an I.H.S. Symposium on “The Future of Liberalism.” As a small part of my presentation (and also in my Openness book), I claim that innovative dynamism creates more jobs than it destroys, and that the new jobs are generally better jobs than the old jobs.

After the panel Bri asked me how I respond to David Graeber’s book Bullshit Jobs. I vaguely remembered hearing of the book, and told her I would look into it. What follows is my brief, quickly edited response.

Graeber claimed that a large number of jobs in the for-profit sector are purposeless, demoralizing “bullshit” jobs. I do think that some bullshit jobs exist, but I think they are much more common in the government and non-profit sectors than in the for-profit sector. In the for-profit sector their number is diminishing, and many of those that remain are due to labor unions and government regulations that protect such jobs from being eliminated.

Where innovative dynamism is allowed to function unbound, the trend is toward more meaningful jobs. Two of the important technological innovations of the last several decades have been computers and the internet. Erik Brynjolfsson and co-authors wrote a few papers showing that an important effect has been to flatten the hierarchy at a great many firms. This eliminates much of the middle management that Graeber identifies as one main location of bullshit jobs.

I also looked the book up on Wikipedia and noticed that a couple of empirical papers raise doubts about some of the book’s claims.

The book seems to have gotten enough attention to justify a longer, more serious critique than I am giving it in this blog entry. But I humor myself that I have bigger fish to fry, namely my mission to see if I can help nudge the healthcare mess more toward being a system of innovative dynamism.

Some of Erik Brynjolfsson’s relevant co-authored articles, alluded to above, are:

Bresnahan, Timothy F., Erik Brynjolfsson, and Lorin M. Hitt. “Information Technology, Workplace Organization and the Demand for Skilled Labor: Firm-Level Evidence.” Quarterly Journal of Economics 117, no. 1 (2002): 339-76.

Brynjolfsson, Erik, and Lorin M. Hitt. “Beyond Computation: Information Technology, Organizational Transformation and Business Performance.” Journal of Economic Perspectives 14, no. 4 (Fall 2000): 23-48.

Brynjolfsson, Erik, and Lorin M. Hitt. “Computing Productivity: Firm-Level Evidence.” Review of Economics and Statistics 85, no. 4 (Nov. 2003): 793-808.

David Graeber’s book is:

Graeber, David. Bullshit Jobs: A Theory. New York: Simon & Schuster, 2018.

My book is:

Diamond, Arthur M., Jr. Openness to Creative Destruction: Sustaining Innovative Dynamism. New York: Oxford University Press, 2019.

The Newest A.I. “Reasoning Models Actually Hallucinate More Than Their Predecessors”

I attended an I.H.S. Symposium last week where one of my minor discoveries was that a wide range of intellectuals, regardless of location on the political spectrum, share a concern about the allegedly damaging labor-market effects of A.I. As in much else, I am an outlier: I am not concerned about A.I.

But since so many are concerned, and believe that A.I. undermines my case for a better labor market under innovative dynamism, I will continue to occasionally highlight articles that present the evidence and arguments that reassure me.

(p. B1) “Humanity is close to building digital superintelligence,” Altman declared in an essay this week, and this will lead to “whole classes of jobs going away” as well as “a new social contract.” Both will be consequences of AI-powered chatbots taking over all our white-collar jobs, while AI-powered robots assume the physical ones.

Before you get nervous about all the times you were rude to Alexa, know this: A growing cohort of researchers who build, study and use modern AI aren’t buying all that talk.

The title of a fresh paper from Apple says it all: “The Illusion of Thinking.” In it, a half-dozen top researchers probed reasoning models—large language models that “think” about problems longer, across many steps—from the leading AI labs, including OpenAI, DeepSeek and Anthropic. They found little evidence that these are capable of reasoning anywhere close to the level their makers claim.

. . .

(p. B4) Apple’s researchers found “fundamental limitations” in the models. When taking on tasks beyond a certain level of complexity, these AIs suffered “complete accuracy collapse.” Similarly, engineers at Salesforce AI Research concluded that their results “underscore a significant gap between current LLM capabilities and real-world enterprise demands.”

Importantly, the problems these state-of-the-art AIs couldn’t handle are logic puzzles that even a precocious child could solve, with a little instruction. What’s more, when you give these AIs that same kind of instruction, they can’t follow it.

. . .

Gary Marcus, a cognitive scientist who sold an AI startup to Uber in 2016, argued in an essay that Apple’s paper, along with related work, exposes flaws in today’s reasoning models, suggesting they’re not the dawn of human-level ability but rather a dead end. “Part of the reason the Apple study landed so strongly is that Apple did it,” he says. “And I think they did it at a moment in time when people have finally started to understand this for themselves.”

In areas other than coding and mathematics, the latest models aren’t getting better at the rate that they once did. And the newest reasoning models actually hallucinate more than their predecessors.

For the full commentary see:

Christopher Mims. “Keywords: Apple Calls Today’s AI ‘The Illusion of Thinking’.” The Wall Street Journal (Sat., June 14, 2025): B1 & B4.

(Note: ellipses added.)

(Note: the online version of the commentary has the date June 13, 2025, and has the title “Keywords: Why Superintelligent AI Isn’t Taking Over Anytime Soon.” In the original print and online versions, the word “more” appears in italics for emphasis.)

Sam Altman’s blog essay mentioned above is:

Altman, Sam. “The Gentle Singularity.” In Sam Altman’s blog, June 10, 2025, URL: https://blog.samaltman.com/the-gentle-singularity.

The Apple research article briefly summarized in a passage quoted above is:

Shojaee, Parshin, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar. “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models Via the Lens of Problem Complexity.” Apple Machine Learning Research, June 2025, URL: https://machinelearning.apple.com/research/illusion-of-thinking.

Gig Work Enables Free Agent Entrepreneurship

In my Openness book, I distinguish between free agent entrepreneurs and innovative entrepreneurs. Free agent entrepreneurs are their own boss, doing what has been done before. Innovative entrepreneurs are their own boss, doing what is new. Of course the distinction is not sharp; it is a continuum.

Recent research, summarized in the WSJ, suggests that gig work can ease entry into free agent entrepreneurship. Gig work is flexible: gig workers can free up time, when they need it, to work on their entrepreneurial ventures. Gig work can also generate capital and provide experience in self-management.

A higher percentage of gig workers than of similar traditionally employed workers become entrepreneurs, and they do so, on average, at a slightly younger age.

Those who want to regulate gig work, and thereby make it less common, should remember how gig work benefits aspiring entrepreneurs.

The WSJ article mentioned above is:

Lisa Ward. “Gig Workers Show More Enterprise, Study Finds.” The Wall Street Journal (Thurs., May 8, 2025): A11.

(Note: the online version of the WSJ article has the date May 5, 2025, and has the title “Want to Start a Business? Maybe Begin by Being a Gig Worker.”)

The academic working paper summarized in the WSJ article is:

Denes, Matthew R., Spyridon Lagaras, and Margarita Tsoutsoura. “Entrepreneurship and the Gig Economy: Evidence from U.S. Tax Returns.” In National Bureau of Economic Research Working Paper #33347, Jan. 2025.

My book mentioned in my initial comments is:

Diamond, Arthur M., Jr. Openness to Creative Destruction: Sustaining Innovative Dynamism. New York: Oxford University Press, 2019.

“A.I.s Are Overly Complicated, Patched-Together Rube Goldberg Machines Full of Ad-Hoc Solutions”

A.I. can be a useful tool for searching and summarizing the current state of consensus knowledge. But I am highly dubious that it will ever be able to make the breakthrough leaps that some humans are sometimes able to make. And I am somewhat dubious that it will ever be able to make the resilient pivots that all of us must sometimes make in the face of new and unexpected challenges.

(p. B2) In a series of recent essays, [Melanie] Mitchell argued that a growing body of work shows that it seems possible models develop gigantic “bags of heuristics,” rather than create more efficient mental models of situations and then reasoning through the tasks at hand. (“Heuristic” is a fancy word for a problem-solving shortcut.)

When Keyon Vafa, an AI researcher at Harvard University, first heard the “bag of heuristics” theory, “I feel like it unlocked something for me,” he says. “This is exactly the thing that we’re trying to describe.”

Vafa’s own research was an effort to see what kind of mental map an AI builds when it’s trained on millions of turn-by-turn directions like what you would see on Google Maps. Vafa and his colleagues used as source material Manhattan’s dense network of streets and avenues.

The result did not look anything like a street map of Manhattan. Close inspection revealed the AI had inferred all kinds of impossible maneuvers—routes that leapt over Central Park, or traveled diagonally for many blocks. Yet the resulting model managed to give usable turn-by-turn directions between any two points in the borough with 99% accuracy.

Even though its topsy-turvy map would drive any motorist mad, the model had essentially learned separate rules for navigating in a multitude of situations, from every possible starting point, Vafa says.

The vast “brains” of AIs, paired with unprecedented processing power, allow them to learn how to solve problems in a messy way which would be impossible for a person.

. . .

. . ., today’s AIs are overly complicated, patched-together Rube Goldberg machines full of ad-hoc solutions for answering our prompts. Understanding that these systems are long lists of cobbled-together rules of thumb could go a long way to explaining why they struggle when they’re asked to do things even a little bit outside their training, says Vafa. When his team blocked just 1% of the virtual Manhattan’s roads, forcing the AI to navigate around detours, its performance plummeted.

This illustrates a big difference between today’s AIs and people, he adds. A person might not be able to recite turn-by-turn directions around New York City with 99% accuracy, but they’d be mentally flexible enough to avoid a bit of roadwork.
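To make the detour result concrete, here is a small Python sketch of my own, not from Vafa’s paper: a navigator that has merely memorized routes on a toy grid gives perfect directions until a single street is blocked, while a navigator with a true map simply re-plans.

```python
from collections import deque
from itertools import product

# A 5x5 grid stands in for Manhattan. "Memorized" routes play the role
# of a bag of heuristics; breadth-first search plays the role of a real map.

N = 5
nodes = list(product(range(N), range(N)))

def neighbors(node, blocked):
    x, y = node
    for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
        if 0 <= nxt[0] < N and 0 <= nxt[1] < N:
            if frozenset([node, nxt]) not in blocked:
                yield nxt

def bfs_path(start, goal, blocked):
    """Plan from the map at query time."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1], blocked):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# "Training": memorize one route for every start-goal pair on the intact grid.
memorized = {(s, g): bfs_path(s, g, set()) for s in nodes for g in nodes}

def still_drivable(path, blocked):
    return all(frozenset([a, b]) not in blocked for a, b in zip(path, path[1:]))

roadwork = {frozenset([(2, 2), (2, 3)])}  # block one street segment
pairs = [(s, g) for s in nodes for g in nodes if s != g]

memorized_ok = sum(still_drivable(memorized[p], roadwork) for p in pairs)
replanned_ok = sum(bfs_path(p[0], p[1], roadwork) is not None for p in pairs)

print(f"Memorized routes still valid: {memorized_ok}/{len(pairs)}")
print(f"Re-planned routes that succeed: {replanned_ok}/{len(pairs)}")
# Every memorized route that happened to use the blocked street now fails;
# the map-based navigator, like a human driver, just goes around.
```

The analogy is loose, but it captures the asymmetry in the quoted passage: heuristics that work brilliantly in familiar territory can fail abruptly the moment the world shifts a little.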

For the full commentary see:

Christopher Mims. “We Now Know How AI ‘Thinks.’ It Isn’t Thinking at All.” The Wall Street Journal (Saturday, April 26, 2025): B2.

(Note: ellipses added.)

(Note: the online version of the commentary has the date April 25, 2025, and has the title “We Now Know How AI ‘Thinks’—and It’s Barely Thinking at All.”)

A conference draft of the paper that Vafa co-authored on A.I.’s mental map of Manhattan is:

Vafa, Keyon, Justin Y. Chen, Ashesh Rambachan, Jon Kleinberg, and Sendhil Mullainathan. “Evaluating the World Model Implicit in a Generative Model.” In 38th Conference on Neural Information Processing Systems (NeurIPS). Vancouver, BC, Canada, Dec. 2024.

Girls Who Are Skilled in Both STEM and Non-STEM Fields Usually Prefer Non-STEM Fields

Gender discrimination is not the only explanation for there being more men than women in STEM jobs, according to the research summarized in the passages quoted below.

(p. C3) Scores of surveys over the last 50 years show that women tend to be more interested in careers that involve working with other people while men prefer jobs that involve manipulating objects, whether it is a hammer or a computer. These leanings can be seen in the lab, too. Studies published in the Personality and Social Psychology Bulletin in 2016, for example, found that women were more responsive to pictures of people, while men were more responsive to pictures of things.

Consistent with what men and women say they want, the STEM fields with more men, such as engineering and computer science, focus on objects while those with more women, such as psychology and biomedicine, focus on people.

Given the push to get more people—and especially more girls—interested in STEM, it is worth noting that talented students of both sexes tend to avoid a career in math or science if they can pursue something else. STEM jobs aren’t for everyone, regardless of how lucrative they may be.

A study of more than 70,000 high-school students in Greece, published in the Journal of Human Resources in 2024, found that girls on average outperformed boys in both STEM and non-STEM subjects but rarely pursued STEM in college if they were just as strong in other things. A study of middle-aged adults who had been precocious in math as teens, published in the journal Psychological Science in 2014, found that only around a quarter of the men were working in STEM and IT.

Large-scale studies around the world show that women are generally more likely than men to have skills in non-STEM areas, while men who are strong in math and science are often less skilled elsewhere. But while everyone seems to be concerned about whether girls are performing well in STEM classes, no one seems all that troubled by the fact that boys are consistently underperforming in reading and writing.

For the full essay see:

William von Hippel. “Why Are Girls Less Likely to Become Scientists?” The Wall Street Journal (Saturday, March 8, 2025): C3.

(Note: the online version of the essay has the date March 6, 2025, and has the same title as the print version.)

Hippel’s essay, quoted above, is adapted from his book:

Hippel, William von. The Social Paradox: Autonomy, Connection, and Why We Need Both to Find Happiness. New York: Harper, 2025.

The academic study published in the Journal of Human Resources and mentioned above is:

Goulas, Sofoklis, Silvia Griselda, and Rigissa Megalokonomou. “Comparative Advantage and Gender Gap in STEM.” Journal of Human Resources 59, no. 6 (Nov. 2024): 1937-80.

“Effort Means That You Care About Something”

In my Openness book, I argue that we should allow each other the freedom to choose intensity over work-life balance. David Brooks is sometimes thought-provoking and eloquent, for instance in the passages quoted below where he defends intensity.

One question that Brooks discusses elsewhere in his essay is: how do you find your “passion,” your “misery,” your “vocation”? He tries, but after reading his answers, I think the mystery mostly remains. The best answer to this question that I have found is in a book by John Chisholm called Unleash Your Inner Company. Chisholm suggests that you should apply yourself to something worth doing, and work to do it better. If you do that, he argues, you will likely find that you increasingly care about what you are doing.

(p. 9) My own chosen form of misery is writing. Of course, this is now how I make a living, so I’m earning extrinsic rewards by writing. But I wrote before money was involved, and I’m sure I’ll write after, and the money itself isn’t sufficient motivation.

Every morning, seven days a week, I wake up and trudge immediately to my office and churn out my 1,200 words — the same daily routine for over 40 years. I don’t enjoy writing. It’s hard and anxiety-filled most of the time. Just figuring out the right structure for a piece is incredibly difficult and gets no easier with experience.

I don’t like to write but I want to write. Getting up and trudging into that office is just what I do. It’s the daily activity that gives structure and meaning to life. I don’t enjoy it, but I care about it.

We sometimes think humans operate by a hedonic or utilitarian logic. We seek out pleasure and avoid pain. We seek activities with low costs and high rewards. Effort is hard, so we try to reduce the amount of effort we have to put into things — including, often enough, the effort of thinking things through.

And I think we do operate by that kind of logic a lot of the time — just not when it comes to the most important things in our lives. When it comes to the things we really care about — vocation, family, identity, whatever gives our lives purpose — we are operating by a different logic, which is the logic of passionate desire and often painful effort.

. . .

. . . I have found that paradoxically life goes more smoothly when you take on difficulties rather than try to avoid them. People are more tranquil when they are heading somewhere, when they have brought their lives to a point, going in one direction toward an important goal. Humans were made to go on quests, and amid quests more stress often leads to more satisfaction, at least until you get to the highest levels. The psychologist Carol Dweck once wrote: “Effort is one of the things that gives meaning to life. Effort means that you care about something.”

All this toil is not really about a marathon or a newspaper article or a well-stocked shelf at the grocery store. It’s about slowly molding yourself into the strong person you want to be. It’s to expand yourself through challenge, steel yourself through discipline and grow in understanding, capacity and grace. The greatest achievement is the person you become via the ardor of the journey.

. . .

So, sure, on a shallow level we lead our lives on the axis of pleasure and pain. But at the deeper level, we live on the axis between intensity and drift. Evolution or God or both have instilled in us a primal urge to explore, build and improve. But life is at its highest when passion takes us far beyond what evolution requires, when we’re committed to something beyond any utilitarian logic.

For the full commentary see:

David Brooks. “A Surprising Route to the Best Life Possible.” The New York Times, SundayOpinion Section (Sun., March 30, 2025): 9.

(Note: ellipses added.)

(Note: the online version of the commentary has the date March 27, 2025, and has the same title as the print version. The first couple of paragraphs quoted above appear in the longer online version, but not in the shorter print version, of the commentary. In the third quoted paragraph, the words “like” and “want” are italicized.)

My book mentioned in my initial comments is:

Diamond, Arthur M., Jr. Openness to Creative Destruction: Sustaining Innovative Dynamism. New York: Oxford University Press, 2019.

The book by Chisholm that I praise in my initial comments is:

Chisholm, John. Unleash Your Inner Company: Use Passion and Perseverance to Build Your Ideal Business. Austin, TX: Greenleaf Book Group Press, 2015.