ChatGPT makes errors 83 percent of the time when performing pediatric diagnoses, according to a study published in JAMA Pediatrics. Nevertheless, the researchers argue that generative AI remains a promising technology for health care.

The research focused specifically on ChatGPT in its GPT-4 guise. OpenAI offers this version of its chatbot as part of its ChatGPT Plus subscription service.

In the study, 100 pediatric medical cases were submitted to the chatbot. ChatGPT diagnosed 83 of them incorrectly, including 11 answers that were worded too broadly to be considered correct. According to the researchers, these misses show that the AI tool fails to form important relationships, such as the link between autism and vitamin deficiencies.

They also believe that the dataset used by OpenAI contains too many errors to be reliable. GPT-4 is trained on a large amount of Internet data that has not been extensively fact-checked, and LLMs, by their very nature, cannot reliably differentiate fact from fiction. By contrast, the researchers point to Med-PaLM 2, Google’s model trained on medical information, which they say could be a lot more promising.

Nothing new

ChatGPT has been shown to perform various tasks poorly before. For example, it was found to generate mostly insecure programming code, and it is generally not recommended as an aid for scholars. It’s worth noting, however, that OpenAI seems to know this all too well: anyone asking the chatbot a medical question is quickly referred to a medical professional. There are bound to be ways to circumvent such safeguards, but OpenAI’s clear aim is to prevent anyone from relying on ChatGPT to decide whether they need to see a doctor.
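That referral behavior is easy enough to observe for yourself. The snippet below is a minimal, illustrative sketch using the OpenAI Python SDK; the prompt is an invented example, not one of the study’s cases, and it assumes the API reply mirrors what the ChatGPT product does.

```python
# Minimal sketch: asking GPT-4 a medical question via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set; the prompt is hypothetical, not from the study.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "My child has had a rash and a fever for three days. What is the diagnosis?",
        }
    ],
)

# In practice, the reply tends to list possibilities and then advise
# consulting a pediatrician rather than committing to a diagnosis.
print(response.choices[0].message.content)
```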

Specialized models with far fewer parameters than GPT-4 (which is said to contain 1.8 trillion) may indeed offer better results. For example, Microsoft recently showed that Phi-2, a “small language model” with a relatively tiny 2.7 billion parameters, can still produce impressive and truthful output. The key ingredient: carefully curated, textbook-quality training data. It is becoming clear that smaller, high-quality datasets can let AI models deliver better results than an LLM trained on a huge amount of mostly unverified information.
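Phi-2 is openly available, which makes the contrast easy to try out. The sketch below shows one way to run the model locally via Hugging Face transformers; the prompt and generation settings are illustrative assumptions, not part of Microsoft’s demonstration.

```python
# Minimal sketch: prompting Microsoft's 2.7B-parameter Phi-2 locally.
# Requires the torch and transformers packages; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain in two sentences why iron deficiency can cause fatigue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```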

Medical world does benefit from AI

Promising medical applications of AI have been in the news before. IBM Watson aimed to shake up the health-care industry over ten years ago. It was said to be able to accelerate drug discovery and supply doctors with reliable diagnoses. Those promises, often not made by IBM itself, were never realized. In the end, IBM sold off a significant portion of its medtech business for more than a billion dollars in 2022.

Since then, Google in particular has been riding high with Med-PaLM. Despite renewed positive coverage for a new medical AI tool, and impressive benchmark scores to boot, the company seems to be somewhat cautious, reluctant to set expectations too high. “While Med-PaLM 2 reached state-of-the-art performance on several multiple-choice medical question answering benchmarks, and our human evaluation shows answers compare favorably to physician answers across several clinically important axes, we know that more work needs to be done to ensure these models are safely and effectively deployed.”

Medical diagnosis isn’t something AI excels at just yet. Currently, the most ambitious application would be to detect false negatives, flagging cases where a medical expert should take another look at patient data. However, using such data for AI applications is not easy, given the privacy issues it raises. AI can still be useful to the medical community in other ways. EHR (electronic health record) vendors are adding generative AI functionality to reduce administrative workloads and get professionals to their answers faster. It’s not a replacement for a doctor, but it may make repetitive tasks more manageable than ever before.
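As a rough illustration of that kind of administrative helper, the sketch below asks an LLM to turn a free-text visit note into a short chart summary. It is a hypothetical example with an invented note and an assumed model choice, not a description of any specific EHR vendor’s feature.

```python
# Minimal sketch: drafting a chart summary from a free-text visit note.
# Hypothetical example; the note is invented and the model choice is an assumption.
from openai import OpenAI

client = OpenAI()

visit_note = (
    "6-year-old male, 3 days of cough, low-grade fever, clear rhinorrhea. "
    "Lungs clear on auscultation. Likely viral upper respiratory infection. "
    "Advised fluids, rest; return if symptoms worsen."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Summarize this clinical note in two sentences for the chart."},
        {"role": "user", "content": visit_note},
    ],
)

print(response.choices[0].message.content)
```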