Artificial Intelligence, once confined to the realms of fiction, has now advanced to the point where it can answer real-life queries, write code, and even hold engaging conversations. In a notable development, an AI chatbot designed specifically for healthcare has passed a rigorous medical examination in the United States. Developed by Google, the chatbot, known as Med-PaLM, demonstrated competence on the exam, although a peer-reviewed study reveals that its responses still lag behind those of human doctors.
The introduction of ChatGPT, a generative AI developed by OpenAI with support from Microsoft, catapulted the topic of artificial intelligence into the mainstream. While much has been said about the potential benefits and risks associated with AI, the field of healthcare has witnessed notable progress. According to media reports, algorithms have proven capable of interpreting certain medical scans as accurately as human experts.
Google unveiled Med-PaLM in a preprint study released in December, but it has not yet been made available to the public. In a subsequent peer-reviewed study published in the journal Nature, Google researchers disclosed that Med-PaLM achieved a score of 67.6% on the US Medical Licensing Examination (USMLE). The passing threshold for this exam is 60%.
The study acknowledged that Med-PaLM’s performance is promising but still falls short of that exhibited by medical professionals. In response, Google has developed a new evaluation benchmark aimed at identifying and mitigating “hallucinations” – instances where AI models provide incorrect information.
Karan Singhal, a Google researcher and the lead author of the study, revealed that the team tested a newer version of their model using this benchmark and obtained “super exciting” results. Med-PaLM 2 achieved a score of 86.5% on the USMLE exam, surpassing its predecessor by nearly 19 percentage points. However, it is important to note that the preprint study containing these findings has not undergone peer review.
Experts in the field have highlighted concerns associated with AI-powered medical chatbots. James Davenport, a computer scientist at the University of Bath, remarked that there is a significant distinction between answering “medical questions and actual medicine.” Anthony Cohn, an AI specialist at the University of Leeds, pointed out that hallucinations are likely to persist as a challenge for large language models because of their statistical nature. Consequently, these models should be viewed as assistants rather than ultimate decision-makers.
Singhal envisions a future where Med-PaLM could aid doctors by presenting alternative options that might otherwise be overlooked. The aim is to utilize AI as a supportive tool in the medical field rather than a substitute for human expertise.