AI Diagnoses Better Than Doctors? Not So Fast - Ideas in Progress: Meaning, Mischief & Mayhem

I recently came across a headline claiming that an AI model developed by Microsoft had outperformed doctors in diagnosing complex medical cases—specifically, those published in the New England Journal of Medicine (NEJM). The model was reportedly tested on around 300 NEJM case reports.

That immediately took me back to my days as a medical student, and later, as a newly qualified doctor. Every week, I would read NEJM religiously, and the case reports were among my favourite sections. While I can’t claim I always nailed the diagnosis, many of them were indeed solvable, even by someone at the very start of their clinical career.

I used to wonder: how is it that an inexperienced doctor like me could sometimes crack these complex cases that even seasoned clinicians had missed?

Here’s what I eventually realised:

These cases are selected precisely because they are complex, rare, or diagnostically elegant. You read them expecting to be challenged or surprised.
They’re beautifully curated. The case history is deconstructed and presented in a structured, almost didactic format. There is often a logical narrative, precise language, and a clear timeline—all courtesy of top-notch editorial work.

In other words, even before I read a word, I knew I was being led toward a diagnostic puzzle—with all the relevant facts laid out and the distractions stripped away. That’s a very different beast from clinical practice.

Patients in real life don’t arrive with edited narratives. In A&E or clinic, their stories are fragmented, often confused, and sometimes misleading. They leave key details unsaid. They volunteer symptoms that may have nothing to do with the problem. And most importantly, they communicate through so much more than words.

Their body language, facial expressions, tone of voice, and even something as subtle as the way they sigh: these non-verbal cues can be diagnostic gold. With long-term patients, even a brief phone call can tell me something’s not quite right, purely from how they speak.

How is AI going to replicate that?

Maybe it will. Maybe it already does. I don’t know. But let’s not get ahead of ourselves. The comparison between AI and doctors needs to be grounded in the messy, human complexity of real clinical work, not in the neat brilliance of case reports.

Because diagnosing a curated case in a journal is not the same as making sense of the living, breathing uncertainty of a patient sitting across from you.