Using Generative AI for medical diagnostics is dangerous and irresponsible. The AI companies should have a visible disclaimer everywhere. But because they don't, even the so-called "AI experts" are being confused.
Case in point. As I mentioned in my review of the November 29, 2023, congressional hearing “Understanding How AI is Changing Health Care,” there was a 'covfefe' that everyone seemed to have missed: https://sergeiai.substack.com/p/what-if-a-physician-doesnt-use-ai.
Rep. Gus Bilirakis:
“Mr. Shen, can you tell us about the role of generative AI, what it is, and what its potential can be within the health care sector?”
Peter Shen, Head of Digital Health – North America, Siemens Healthineers:
“With generative AI here, we see the greatest potential in the ability for the AI to consume information about the patient themselves. So, when a patient goes to get an exam for a diagnosis, leveraging generative AI can help identify precisely what diagnosis should be looked for. Another area where generative AI benefits medical imaging is in interpreting the images themselves. It can translate complicated medical language into layman’s terms for the patient, helping them better understand the test results from their exam.”
Wrong! We don't use hallucinating AI for precision medicine. Shame on you, Mr. Shen.
If AI experts make such egregiously erroneous statements, what can you expect from everyday users of AI?
Yes, as per Sam Altman, AI can be magical, but in healthcare, we need more than magic. We need precision, accuracy, and reliability. The thought of using generative AI in medical diagnostics is as absurd as using a Magic 8-Ball for brain surgery. It’s not just irresponsible. It’s a gamble with human lives.
The statistical nature of these machines is revealed, I've noticed, when trying out ChatGPT for low-level editing of text: it tends to wander away from the task the longer it's allowed to generate answers. It has no real internal coherence. I had to keep telling it over again exactly what its job was.
This lack of true internal coherence was dramatically revealed the other day when it went bonkers for 6 hours. 😆
Big fan of your work! But not the best example (I’m a cardiologist). 1) The instructions are not that crazy, despite what the pt said; trunk rotations are OK as long as the arms stay close to the chest. No arm exercises were recommended. 2) Pts after cardiac surgery are inundated w/ instructions from OT, PT, written materials, etc. So why even ask a bot? 3) I could not come close to reproducing the output. And no matter how you ask, you always get the boilerplate “…but ask your provider for specific instructions.” 4) What was in reference 5?? 5) Activity recommendations change over time: 2 wks vs. 2 mos.
A few responses:
- The patient replicated the response, before I posted, with a different query, so it was not a one-off. That said, these systems are stochastic, and often patched after errors are reported (I posted this on X before writing it up here). Mileage can vary, but I am sure others will note errors of this type.
- Others in the comments noted similar pastiching behavior (e.g., someone just wrote “I've had Perplexity deliver these pastiche/mashup answers fairly consistently. I asked about the autopilot for a Saab JA37 and mingled in was a bunch of stuff about the C-1 autopilot for the B-17. Because they were both made by Honeywell.”)
- The patient reports to me, re your comment: “[I] was two weeks out [post surgery]. Like I said, it was the pec stretches that really scared me. Perplexity cited a generic article, linked to it, and included info from that article in the response. Personally I had received no specific instructions from OT or PT on the topic.”
- I take your point that time post-op matters, but of course the bot didn’t ask, which is a further problem.
- Boilerplate like that tends to get ignored; public awareness campaigns like mine are important for that reason.
That said, I appreciate your comments (and your praise as well).
Points v well taken, and I agree with the overall concept that "pastiche" is problematic. As a doc, TBH, we DON'T like when pts go to the interwebs (used to be Dr. Google, now ChatGPT, Perplexity, etc.). Pts bringing in (or emailing) their search results can massively increase our time. I rarely look at what they bring in. It's more efficient, and better care, to talk with the patient. Out-of-the-box commercial chatbots are a bad idea for practicing medicine. Would love to hear you comment on the shift from chat w/ a single LLM to a system of elements. https://bair.berkeley.edu/blog/?refresh=1
If I were implementing a bot that purports to provide CDS, I would have an agent that asks (in the code) "Is this a medical question?", probably by sending the user query to a small model, and then use prompt engineering to restrict the context to medically verified information (textbooks, PubMed, medical society guidelines), either via RAG or all crammed into a long context window. Lastly, CDS is an area highly regulated by the FDA, and (I hope) OAI et al. will come under much more scrutiny. So far, my 40+ yrs of training and experience have not (yet) been part of the training set. Bots are fine for patient education; not so good for practicing medicine. AI will NOT revolutionize healthcare, for many reasons. I have much more to say about this. Have been following your work for a long time, so keep it up!! Someone has to push back against the self-congratulatory big-model hype.
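The gating idea in that comment can be sketched in a few lines. This is a minimal, hypothetical illustration only: the keyword classifier and the tiny in-memory corpus below stand in for what would really be a small routing LLM and a proper retriever (RAG) over PubMed and guideline text; all names are made up for the sketch.

```python
# Hypothetical sketch of a gated-CDS pipeline: (1) route the query through
# a cheap "is this a medical question?" check, then (2) answer only from a
# whitelist of verified sources, refusing when nothing verified matches.

VERIFIED_SNIPPETS = {
    # topic -> vetted text (placeholder strings, not clinical guidance)
    "sternotomy": "Post-sternotomy activity guidance (verified guideline text).",
    "warfarin": "Warfarin monitoring guidance (verified guideline text).",
}

def is_medical_question(query: str) -> bool:
    """Stand-in for a small routing model."""
    keywords = ("surgery", "sternotomy", "medication", "diagnosis", "exercise")
    return any(k in query.lower() for k in keywords)

def retrieve_verified(query: str, corpus: dict) -> list:
    """Naive retrieval stand-in: match corpus topics against query words."""
    words = set(query.lower().replace("?", "").split())
    return [text for topic, text in corpus.items() if topic in words]

def answer(query: str) -> str:
    if not is_medical_question(query):
        return "OUT_OF_SCOPE"
    context = retrieve_verified(query, VERIFIED_SNIPPETS)
    if not context:
        return "NO_VERIFIED_SOURCE: please ask a clinician"
    # In a real system, the retrieved context would be handed to the LLM
    # with a prompt forbidding any answer outside that context.
    return "ANSWER_FROM_VERIFIED_CONTEXT: " + " | ".join(context)
```

The key design point is the explicit refusal path: when retrieval over the verified corpus comes back empty, the system declines rather than letting the model improvise.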
"Nothing about arm exercises" -- a pec stretch involves lifting the arms, no? Seems to contradict the instruction to keep arms close to the chest.
The number of examples one can produce showing that 'stochastically constrained random next-token generation' is not 'understanding' is practically infinite.
Maybe I should add: the amount of engineering needed around it to turn that variably random output into reliable understanding is not infinite, but really, really, really large (and very, very, very underestimated).
I've had Perplexity deliver these pastiche/mashup answers fairly consistently. I asked about the autopilot for a Saab JA37 and mingled in was a bunch of stuff about the C-1 autopilot for the B-17. Because they were both made by Honeywell.
They are just ungrounded structuralist large language models that are manipulating tokens. They are as aware of the world as my sofa is. It is amazing that they perform as well as they do; they are incredibly interesting systems from my point of view as an NLP researcher, and they are (eventually) going to revolutionise human-machine interfaces. But people should stop trying to use them as some kind of search engine.
I've noticed those sorts of pastiche answers from GenAIs.
Since users may innocently trust the answer (as if the system were aware, or human-like, or had common sense), if I were the maker of an AI product I would also generate (additional) very large and stern warnings around medical advice, or anything similar that could cause harm, especially as these things become more mainstream. For example, imagine an elderly person on medications, feeling feeble, told by a friend or caretaker to look things up on the computer, or given a caretaker app... Why wait for things like this to happen? (My two cents.)
This is not a surprise if you have listened to Lex Fridman’s interview of Stephen Wolfram, who explains very clearly the limitations of AI’s ability to generate accurate answers.
The guy is an idiot.
The comments so far here have been really useful. The need for caution extends beyond AI to all sources of information, including seemingly authoritative sites and medical peeps on YouTube. There's a broader challenge in working critically with what we consume, AI or otherwise. Advice can sometimes be dangerously incorrect, highly polarised, or not relevant to someone's specific needs/history.
The way AI mimics these complexities shows how difficult it is to work out what's going to provide benefit versus what might harm.
Chatbots will have their uses in the medical field, primarily for summarizing doctor visits, doing paperwork, etc. They can aid with diagnosis too, but only by providing educated guesses. Those will be specialized chatbots trained on a lot of medical data and validated over a few years.
Then there should be a totally non-chatbot tool that checks whether a drug order placed by a doctor is likely to be harmful to the current patient.
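A tool like that is often better served by deterministic rules than by generation. A minimal sketch, assuming a curated rule table (the drug pairs and condition rules below are illustrative placeholders, not clinical guidance): every warning traces back to an explicit, auditable rule rather than to sampled text.

```python
# Deliberately non-generative drug-order check: plain lookups against
# curated rule tables. Rules below are illustrative placeholders only.

INTERACTION_RULES = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
    frozenset({"sildenafil", "nitrate"}): "severe hypotension risk",
}

CONTRAINDICATION_RULES = {
    ("metformin", "severe_renal_impairment"): "risk of lactic acidosis",
}

def check_order(new_drug: str, current_drugs: list, conditions: list) -> list:
    """Return human-readable warnings; an empty list means no rule fired."""
    warnings = []
    for drug in current_drugs:
        reason = INTERACTION_RULES.get(frozenset({new_drug, drug}))
        if reason:
            warnings.append(f"{new_drug} + {drug}: {reason}")
    for cond in conditions:
        reason = CONTRAINDICATION_RULES.get((new_drug, cond))
        if reason:
            warnings.append(f"{new_drug} with {cond}: {reason}")
    return warnings
```

The frozenset keys make interaction lookups order-independent (warfarin+aspirin and aspirin+warfarin hit the same rule), and the empty-list convention makes "no rule fired" distinguishable from "rule fired" without any free-text interpretation.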
digitaurus said: "But people should stop trying to use them as some kind of search engine."
The problem, d, is that the search engines Google and MS/Bing are promoting their AI/LLMs as an adjunct to their searches, putting the AI bot text on top of all other search results. Users have to actively skip over the LLM reply to get to the real information.
Well, if an LLM is trained on a medical corpus, then its answers are correct. I tested this on our LLM and it gives very sound answers.
Gary, please keep kicking the Prophets of AI in the nuts! As you well know (since I first quoted you on autonomous killer automobiles in 2017), the past decade has been a virtual Gulag for AI skeptics, myself included. And it is a relief to see the facade finally collapsing. Cheers!
lol. I just call them like I see them, and I happen not to like what I see right now….
Yet our media elites' main focus this week has been a black pope on the image generator.
I am vastly amused by the AI brigade claiming that people using their Information System need to do their own research to validate and qualify the information the Information System spewed forth. In straightforward words, what they are saying is that their vaunted System is unreliable, thus useless, and that the people who use it are idiots.