Recently, generative artificial intelligence platforms have taken off. Think of tools like ChatGPT, Google Gemini, and Microsoft Copilot. The world has been abuzz with artificial intelligence (AI) and its promise of making life easier. Students have been using it to summarize books, teachers to help draft lesson plans, and others to craft something as simple as an email. While AI can certainly be useful, we need to be cautious about how we use it and what we use it for.
First, we ought to understand what kind of artificial intelligence ChatGPT is. ChatGPT is what is known as a Large Language Model (LLM). During its training, it is fed enormous amounts of text, and by analyzing that text it learns grammar, semantics, and conceptual context. It stores the patterns and knowledge it has learned and uses them, when prompted, to respond coherently. In practice, this means LLMs are good at predicting what the next word in a sentence should be based on the words that came before it, which is why they typically excel at keeping up a conversation.
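To make the idea of next-word prediction a little more concrete, here is a deliberately tiny sketch in Python. It is nothing like ChatGPT's actual architecture, which relies on neural networks trained on vast datasets; it simply counts which word tends to follow which in a small sample text and then "predicts" the most common follower.

```python
from collections import Counter, defaultdict

# A toy "language model": count which word follows which in a tiny corpus.
# Real LLMs learn far richer patterns, but the underlying task is the same
# in spirit: predict the next word from the words that came before it.
corpus = (
    "the doctor reviewed the chart and the doctor wrote a prescription "
    "and the nurse reviewed the prescription"
).split()

next_word_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    next_word_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the toy corpus."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))    # -> "doctor" (it follows "the" most often here)
print(predict_next("nurse"))  # -> "reviewed"
```

Notice that nothing in this sketch checks whether the predicted word is true; it only checks whether it is likely. That gap between "plausible" and "correct" is where the trouble starts.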
However, even the official ChatGPT website carries a disclaimer: “ChatGPT can make mistakes. Check important info.” This is due, in part, to something referred to as an “AI hallucination”. An AI hallucination occurs when a chatbot generates a response containing incorrect, misleading, or even nonsensical information. These hallucinations can happen for various reasons: insufficient training data, misapplication of learned patterns, or misinterpretation of a prompt. Regardless of the cause, the outcome is the same: sometimes AI gets factual information wrong. The tricky thing about these hallucinations is that the AI often presents them as if they were fact. If the person asking doesn’t know better, they could end up internalizing falsehoods from the AI.
Here’s an easy example: a few months ago, a bizarre hallucination from ChatGPT went viral. When asked how many r’s there were in the word strawberry, it responded that there were two. More than that, attempts to correct the AI on the number of r’s in strawberry (which is objectively three) received pushback from the model. In one article on Inc.com, the author tried to convince the AI that there were in fact three r’s in the word strawberry, writing, “No, count again. S-T-R, that’s one R. A-W-B-E-R-R, that’s two more R‘s. Y. Three R’s.” ChatGPT then responded, “I see what you did there! You cleverly split up the word to make it seem like there are three R‘s, but when counting the actual letters in the word ‘strawberry,’ there are still just two R‘s. It’s a fun twist, though!” In this particular case, it’s easy enough for the user to recognize that ChatGPT is making an error. The issue becomes more complicated when people turn to ChatGPT for answers to questions they can’t answer themselves.
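For perspective, this is exactly the kind of question a few lines of ordinary code answer trivially, because counting characters is a mechanical operation rather than a language-prediction one. A quick Python illustration:

```python
word = "strawberry"

# Count occurrences of the letter "r" directly.
print(word.count("r"))  # prints 3

# Show where each "r" sits in the word (0-based positions).
print([i for i, letter in enumerate(word) if letter == "r"])  # prints [2, 7, 8]
```

The contrast is the point: the code actually counts, while the chatbot predicts what a plausible-sounding answer looks like.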
Yet, according to a survey by the Kaiser Family Foundation (KFF), a non-profit health research organization, a quarter of adults under the age of thirty reported using AI chatbots at least once a month for medical advice, even though a majority (56%) of adults who interact with AI said they were unsure whether they could tell if the health information AI provides is accurate.
Medical misinformation has always been an issue. People go to doctor’s appointments armed with Google search results every day. Sometimes patient research is helpful, if it is done carefully: considering a variety of sources, weighing the quality of each source, and deferring to medical authorities. When speaking to a chatbot, however, it can be difficult to tell what sources it is drawing its answers from, and therefore difficult to evaluate its accuracy. For example, in December of last year, researchers at Long Island University presented ChatGPT with thirty-nine medication-related queries and compared its responses to those of pharmacists.
The study found that the AI provided correct and complete answers in only ten scenarios. When asked to provide references to support its responses, the chatbot was only able to do so for eight, and in each case it appeared that ChatGPT was fabricating the references. Worse, the references were not easily recognizable as fake: they were formatted correctly, provided URLs, and used the names of real scientific journals. They were only revealed as fictional when the researchers tried to access the cited article titles. Other researchers have found that, when forging medical references, ChatGPT can go so far as to include the names of real researchers who have previously published work online. This takes us back to how ChatGPT works. It can write what looks like a convincing reference because it can recognize and reproduce the pattern of what a scientific reference looks like. However, nothing requires that the information it attributes to that “reference” actually comes from it.
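One practical habit follows from this: if a chatbot hands you a reference, try to pull it up yourself before trusting it. Below is a minimal sketch of that first sanity check in Python, assuming the reference includes a URL or DOI link; the example link is hypothetical, and a page that loads still has to be read to confirm it actually says what the chatbot claims.

```python
import urllib.error
import urllib.request

def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error status.

    A link that loads is not proof the citation is genuine (the page might say
    something entirely different), but a dead link is an immediate red flag.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        return False

# Hypothetical example: a DOI link copied from a chatbot-generated reference.
print(url_resolves("https://doi.org/10.1000/not-a-real-article"))
```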
This is truly disturbing, because inaccurate medical advice from a chatbot, if followed, could be downright dangerous. In the aforementioned study on ChatGPT’s medication advice, when asked whether a common blood pressure medication (verapamil) could be taken at the same time as a Covid-19 medication (Paxlovid), ChatGPT responded that there would be no adverse effects if the two drugs were taken together. This is incorrect: patients taking these drugs at the same time can experience sudden drops in blood pressure, resulting in dizziness and fainting. A doctor prescribing Paxlovid to a patient already taking verapamil would likely caution them about the possibility of dizziness, and might even consider lowering the verapamil dosage.
So, what exactly should we make of all of this? The reality is that generative AI isn’t going anywhere, and with time it will get better. But as users, we must recognize these tools for what they are: tools. We must continue to treat doctors and other licensed healthcare professionals as the authority on medical advice. While it might be fine to ask AI something like “can you summarize this medical article for me”, it is far less appropriate to ask it to diagnose you or suggest treatment plans. Whenever possible, cross-check what AI tells you against a second, reputable reference. And when you’re sick, always see a doctor first.