Confident and Wrong: Why Reasoning Models Still Make Things Up

The newest AI models reason better than ever, which makes them more dangerous, not less. Capability has outpaced reliability. Here is the gap nobody warns you about, and how to check for it.

Fred

4 min read

Jun 7, 2026

If you have used AI in the last year, you have watched it get better. The latest models reason step by step, work through messy tasks, and produce answers that genuinely impress. That is real progress, and it is also a trap that catches smart, careful people. Before you stake any real work on these tools, it helps to understand how they actually generate an answer.

Here is the trap in one line: capability has improved faster than reliability. The models are more impressive than ever, and they still confidently make things up.

The counterintuitive finding

You might assume that smarter, reasoning-focused models would hallucinate less. The data suggests otherwise. OpenAI’s own testing found that its reasoning models hallucinated more, not less, on a factual benchmark. On PersonQA, a question set about people, its o3 model produced false information 33 percent of the time and the smaller o4-mini reached 48 percent, compared with 16 percent for the earlier o1. A system can get better at reasoning through a hard problem and at the same time get worse at simply not inventing facts.

Why? Models built to reason tend to fill gaps with plausible-sounding answers rather than abstain. When they do not know, they do not stop. They generate something that fits, confidently.

Why the problem will not simply disappear

It is tempting to assume the next version will fix this. The companies building these tools are more cautious than that. OpenAI, explaining its research, noted that its newest models hallucinate less, but that hallucinations still occur and remain a fundamental challenge for all large language models. Part of the reason is structural. The way models are trained and graded rewards confident guessing over honestly saying “I am not sure.” A guess that is sometimes right scores better on the tests than consistent humility, so the models learn to guess.
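To see why guessing wins, a little arithmetic helps. The sketch below is an illustration only: the 25 percent hit rate and the one-point penalty are assumed numbers, not figures from OpenAI's benchmarks, and expected_score is a hypothetical helper. It compares the average score for guessing against abstaining under two grading schemes.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Average score per question for a model that always answers.

    A correct answer earns 1 point, a wrong answer loses `wrong_penalty`
    points, and abstaining ("I am not sure") is assumed to score 0.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


ABSTAIN_SCORE = 0.0  # assumption: saying "I am not sure" earns nothing
P_CORRECT = 0.25     # assumption: the guess is right one time in four

# Accuracy-only grading: wrong answers cost nothing, so guessing beats abstaining.
accuracy_only = expected_score(P_CORRECT)

# Grading that penalizes wrong answers: the same guess now scores below abstaining.
with_penalty = expected_score(P_CORRECT, wrong_penalty=1.0)

print(f"accuracy-only: guess {accuracy_only:+.2f} vs abstain {ABSTAIN_SCORE:+.2f}")
print(f"with penalty:  guess {with_penalty:+.2f} vs abstain {ABSTAIN_SCORE:+.2f}")
```

Under accuracy-only grading, any nonzero chance of being right makes guessing the better strategy, which is the incentive described above. Only when wrong answers cost something does abstaining start to pay.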

Translation for your work life: do not wait for a future model to make verification unnecessary. Better models reduce the error rate. They do not remove your responsibility to check.

The real danger is the packaging

The reason this matters so much for a new professional is not the error rate by itself. It is how the errors are dressed. A reasoning model often shows its work, walking you through a confident, logical-looking chain of steps to its conclusion. That visible reasoning makes the output feel more trustworthy, even when the conclusion is wrong. You are now being persuaded by the appearance of thinking, not just a slick final answer.

Imagine asking a reasoning model whether a market is worth entering. It produces a tidy, numbered analysis: market size, growth rate, three competitors, a recommendation. Every step looks considered. But the market-size figure is invented, the growth rate is from the wrong region, and one of the competitors exited last year. Nothing in the output looks uncertain, because the model's tone does not track how well supported each claim is. It simply produces the next likely token. The polish is not evidence. It is the product.

So the better the model gets at sounding like a careful expert, the more deliberately you have to check it. Every signal your brain uses to gauge credibility (fluency, structure, confidence, step-by-step logic) is exactly what these systems now produce, whether or not they are right.

What to do about it

Treat capability and reliability as two separate things. A model can be brilliant at structuring an argument and still be wrong about the facts inside it. Judge those independently.

Check the facts, not the fluency. When you review AI output, your job is not to assess whether the reasoning sounds good. It is to confirm whether the specific claims are true.

Be most careful exactly when you are most impressed. The moment an answer feels authoritative and complete is the moment your guard drops, and that is the moment to slow down.

What this looks like in practice

The fix is not to distrust everything, which is exhausting and unrealistic. It is to separate the parts of an answer that are safe to trust from the parts that are not. The structure, the framing, and the way a problem is broken down are usually reliable and genuinely useful. The specific facts plugged into that structure (numbers, names, dates, and citations) are not: treat every one as unverified until you confirm it. So when a reasoning model hands you a confident analysis, keep the thinking and check the facts. That habit costs a minute and is the difference between looking sharp and being the person who repeated an invented statistic in a meeting.
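If you review a lot of AI output, part of this habit can be mechanized. The following is a minimal sketch, not a verification tool: flag_for_checking is a hypothetical helper that marks any sentence containing a digit as needing a source, and the draft text is made up for illustration. It assumes simple English punctuation and will miss names and citations that contain no numbers, so those still need a human read.

```python
import re


def flag_for_checking(text: str) -> list[str]:
    """Return the sentences that contain at least one digit."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if re.search(r"\d", s)]


# Made-up draft, echoing the market-entry example above.
draft = (
    "The market looks attractive. It is worth 4.2 billion dollars. "
    "Growth is 12 percent a year. Entry looks feasible."
)

for sentence in flag_for_checking(draft):
    print("CHECK:", sentence)
```

The sketch only tells you where to look. Confirming each flagged figure against a real source is still your job.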

Smarter is not safer

The arrival of reasoning AI is genuinely useful, and it does not change the core job. These tools have grown more capable while staying confidently unreliable, the most demanding combination there is. Treat an impressive answer as a draft to verify rather than a verdict to trust, and you get the value of powerful AI without getting burned by it.
