NEJM: Rushing to LLMs & Measuring AI in Health Care

Two recent NEJM articles evaluating the clinical use of AI models

Nov 03, 2024

Both Perspective articles were published in the NEJM on October 31, 2024 (Vol. 391, No. 17). Both require a subscription for access.

Large Language Models and the Degradation of the Medical Record by Liam G. McCoy, M.D., et al., attempts to make a case against promoting the integration of AI large language models (LLMs) into clinical electronic health records (EHRs). Although physicians now spend much of their working day reading and writing notes on their computers, with the work even extending into the evening (“pajama time”), these authors don’t see LLMs as a viable solution. The use of AI has even been offered as a panacea for professional burnout and decreased primary care effectiveness (see Kohane article below).

They summarize their concerns in a table labeled Risks of LLM-Generated Electronic Health Record Text, Aside from Confabulation [the dreaded “hallucination problem”]:

Increased chart clutter - Because AI makes clinical observations so easy to generate, charts will grow to the point that reading them requires an LLM-generated summary, which may not be accurate.
Decreased information density - LLMs may generate text that seems authoritative but lacks sharp insights. They give as an example a cardiac evaluation that includes the differential diagnosis of chest pain, when all that is really needed is the cardiologist’s assessment.
Persuasion and automation bias - Consultants are usually cautious about their clinical recommendations, but LLMs are more persuasive than their degree of accuracy warrants.
Increased time to verify - LLM-generated text still has to be checked by a clinician, the so-called “human-in-the-loop” exigency.
“Model collapse” - This is said to happen when future LLMs are repeatedly trained on LLM-generated text, so that each round increasingly emphasizes what was already common in the original data sets. A hospital might never encounter rare phenomena, so its data set, used to update the LLM, might ignore such uncommon clinical events.

Finally, the authors warn against the influence of EHR vendors on decisions about the use of AI products. They feel that this domain belongs to clinicians and that such decisions should aim at optimizing patient care.

Compared with What? Measuring AI against the Health Care We Have by Isaac S. Kohane, M.D., Ph.D., begins with a current problem: the severe lack of primary care practices that are accepting new patients. When a colleague (coincidentally, one with expertise in AI) asked him for a referral to a primary care physician, Kohane was unable to provide one and ultimately recommended that the colleague seek help from his insurance provider.

For addressing this problem, Kohane writes: In this context, the possibility of augmenting the work of clinicians — including doctors, nurse practitioners, and physician assistants — with AI is being seriously considered.

[Nota Bene: This same issue of the NEJM has another Perspective article, The Failing U.S. Health System by David Blumenthal, M.D., et al. Can we envision AI entering as the deus ex machina to solve such problems as access to primary care?]

The author is sanguine about the future use of advanced generative AI programs, such as large language models (LLMs), in contrast to the McCoy article above. He even acknowledges that patients already turn to AI for medical advice, or for a second opinion, especially when a diagnosis is uncertain. This augurs a beneficial shift in the traditional patient-clinician relationship.

However, the article stresses the need for rigorous evaluation of these AI tools through clinical trials. This evaluation should compare the health outcomes achieved with AI assistance, including therapeutic efficacy and medical errors, with those of the current system and its shortage of primary care clinicians. The comparison should not be made with an idealized healthcare system, which might not be attainable at this time.

I’m glad Kohane recognizes the role of the patient in using AI for better care management, which in most cases relates to chronic conditions or uncertain diagnoses. This is a major part of my research and writing about healthcare AI: providing patients with greater agency by giving them a more persuasive voice in a system that has a hard time listening.

CogSciAI
