Diagnosis is an especially tantalizing application for generative AI: Even when given tough cases that might stump doctors, the large language model GPT-4 has solved them surprisingly well.

But a new study points out that accuracy isn’t everything, and shows exactly why health care leaders already rushing to deploy GPT-4 should slow down and proceed with caution. When the tool was asked to drum up likely diagnoses or come up with a patient case study, it sometimes produced problematic, biased results.

“GPT-4, being trained off of our own textual communication, shows the same — or maybe even more exaggerated — racial and sex biases as humans,” said Adam Rodman, a clinical reasoning researcher who co-directs the iMED Initiative at Beth Israel Deaconess Medical Center and was not involved in the research.
