Pulse

Health AI / Jun 30, 2026 / 5 min

Nature Medicine Found ChatGPT Health Missed True Emergencies

On June 30, Polymarket amplified Marc Andreessen's claim that Doctor ChatGPT beats 99% of doctors, four months after a Nature Medicine study found ChatGPT Health undertriaged 51.6% of true emergencies, including respiratory failure and diabetic ketoacidosis.

Thesis: Andreessen Horowitz is betting billions on health AI while its co-founder tells the public Doctor ChatGPT outperforms 99% of physicians, and peer-reviewed evidence says the opposite on the cases that kill people.

Marc Andreessen told the New York Post that "Dr ChatGPT is a better doctor than 99% of doctors" — a claim Polymarket reshared to millions on June 30 with no data behind it. A February Nature Medicine study of OpenAI's ChatGPT Health found the opposite on the cases that matter: 51.6% of true emergencies were undertriaged to routine appointments, including patients in respiratory failure and diabetic ketoacidosis. With more than 40 million people asking ChatGPT health questions every day, Andreessen's soundbite is not harmless VC marketing. It is a liability pitch at clinical scale.

The claim:

  • Who said it: Marc Andreessen, co-founder of Andreessen Horowitz, in a New York Post interview — building on a Joe Rogan podcast line that "99% of the time, the answer that I'm getting from the AI is better than I would get from talking to basically almost any expert."
  • How it spread: Polymarket's X account reshared the doctor claim June 29–30, turning a venture-capital talking point into viral market commentary.
  • What he offered: No clinical data, no trial, no peer review. Andreessen told the Post: "Doctors hate it when you say it, but it just is. It really is."

The evidence he ignored:

  • Nature Medicine (Feb. 23, 2026): Mount Sinai urologist Dr. Ashwin Ramaswamy and colleagues stress-tested ChatGPT Health on 60 clinician-authored vignettes across 21 domains — 960 total responses. Among clear emergencies, 51.6% were undertriaged to 24–48 hour evaluation instead of the ER.
  • The misses: An asthma exacerbation with rising pCO2 signaling respiratory failure. Diabetic ketoacidosis routed to outpatient care. The model identified warning signs, then rationalized them away — "still speaking in full sentences," "findings don't prove immediate respiratory failure."
  • The crisis guardrail gap: In suicidal-ideation vignettes, crisis-intervention messages appeared in 0 of 16 responses when objective clinical data was present — but fired in all 16 when that data was removed.
  • JMIR (2024): Patients rated ChatGPT-4 answers as more empathetic than physicians'. Specialists flagged 15 responses as potentially harmful — and warned lay readers could not tell safe answers from dangerous ones.

Follow the money:

  • Andreessen Horowitz has poured capital into health-AI startups including Hippocratic AI, Ambience Healthcare, and Abridge.
  • Andreessen himself cited OpenEvidence — an a16z-backed medical reference platform doctors use — as proof the sector is "sweeping the medical field."
  • None of that makes him wrong on every use case. It does mean his bullishness is not neutral when he claims ChatGPT already beats 99% of doctors.

The prompt that backfired:

  • In May 2026, Andreessen posted a nearly 300-word "super prompt" on X instructing ChatGPT to "never hallucinate or make anything up" and to "double check all facts, figures, citations, names, dates, and examples."
  • AI critic Gary Marcus called it "hilarious (and maybe a little bit scary)": a prompt cannot simply instruct hallucination out of a large language model.

The scale problem:

  • OpenAI reports more than 40 million people ask ChatGPT health questions every day, and that health topics account for over 5% of all messages on the platform.
  • Seven in ten health conversations happen outside clinic hours, when patients have no doctor in the room.
  • At that scale, even a small undertriage rate means millions of risky answers — and Nature Medicine's rate on emergencies is not small.
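
The scale argument above is simple multiplication, and a rough sketch makes it concrete. In the Python below, only the 40 million daily figure comes from OpenAI's reporting as cited here. The function name `risky_answers`, the one-answer-per-person-per-day simplification, and every rate in the loop are hypothetical inputs chosen for illustration, not measured values. The Nature Medicine 51.6% figure is deliberately not plugged in, because it applies to clear-emergency vignettes rather than to all health questions.

```python
# Back-of-envelope scale arithmetic. Only DAILY_HEALTH_USERS comes from the
# article (OpenAI's reported figure); every rate below is a hypothetical input.
DAILY_HEALTH_USERS = 40_000_000


def risky_answers(daily_users: int, risky_rate: float, days: int = 1) -> int:
    """Expected number of risky answers over `days`.

    Assumes, for simplicity, one health answer per user per day and a
    constant share (`risky_rate`) of those answers being risky.
    """
    if not 0.0 <= risky_rate <= 1.0:
        raise ValueError("risky_rate must be between 0 and 1")
    return round(daily_users * risky_rate * days)


if __name__ == "__main__":
    for rate in (0.001, 0.01, 0.05):  # hypothetical rates, not study results
        per_day = risky_answers(DAILY_HEALTH_USERS, rate)
        per_year = risky_answers(DAILY_HEALTH_USERS, rate, days=365)
        print(f"rate {rate:.1%}: {per_day:,} per day, {per_year:,} per year")
```

Under these assumptions, a hypothetical 1% rate would mean 400,000 risky answers a day, and even a 0.1% rate would pass 14 million in a year.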

What doctors actually say:

  • Harvard hospitalist Dr. Adam Rodman, who studies AI in medicine, offers a simple rule: never use a chatbot to triage an emergency; treat it as a companion to a human visit, not a replacement.
  • Dr. Ramaswamy, the Nature Medicine lead author, put the stakes plainly: undertriaged emergencies "can kill someone in a couple of hours."

Convina's view: Andreessen is not stupid — he is selling a portfolio. Doctor ChatGPT may be a useful research companion for a physician who already knows what an emergency looks like. It is not a better doctor than 99% of humans on the evidence that decides whether you live or die. Polymarket turning that claim into content while OpenAI's own health product undertriages half of true emergencies is the AI hype cycle at its most dangerous: fluent confidence, zero calibration, and 40 million patients a day with no clinician in the loop.

Research Signals

https://thenextweb.com/news/andreessen-doctor-chatgpt-better-than-doctors
https://timesofindia.indiatimes.com/technology/tech-news/after-calling-ai-better-than-humans-at-coding-marc-andreessen-now-says-doctors-hate-it-but-chatgpt-is-/articleshow/132091034.cms
https://link.springer.com/article/10.1038/s41591-026-04297-7
https://www.healthcaredive.com/news/40-million-use-chatgpt-health-questions-openai/808861/
https://cdn.openai.com/pdf/2cb29276-68cd-4ec6-a5f4-c01c5e7a36e9/OpenAI-AI-as-a-Healthcare-Ally-Jan-2026.pdf
https://www.ibtimes.co.uk/doctors-warn-against-overreliance-chatgpt-1805838