Abstract
AIM To evaluate the performance differences of three multimodal large language models (ChatGPT [GPT-4], Claude 3, and Gemini 1.5) in medication adherence-related question answering, providing evidence for the selection of clinical AI tools.
METHODS A multi-dimensional evaluation system was designed, comprising 30 standardized clinical questions (10 on missed-dose management, 10 on medication education, and 10 on drug interactions). In a single-blind format, the responses of the three models were scored by five senior pharmacists on accuracy (0-5), practicality (0-3), and safety (0-2). Total and category-specific scores were compared to evaluate the efficacy of the three multimodal large language models in supporting medication adherence.
RESULTS GPT-4 achieved perfect scores (100 points) in all three categories (missed-dose management, medication education, and drug interactions) and was adept at complex decision-making and medication safety assessment. Claude 3 scored 72 for missed-dose management, 80 for medication education, and 72 for drug interactions. Gemini 1.5 scored 44, 60, and 29 in the respective categories, indicating that its outputs require strict human review.
CONCLUSION GPT-4 demonstrated the best performance among the three multimodal large language models on medication adherence questions. Claude 3 excelled in medication education, while Gemini 1.5 struggled significantly, and its results often require human review. Multimodal large language models can serve as efficient assistive tools for pharmacists in clinical work, but their use must be combined with human review to ensure patient safety, improve therapeutic efficacy, and enhance pharmacist efficiency.
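The scoring scheme described in METHODS can be sketched as follows: each question is rated by several pharmacists on accuracy (0-5), practicality (0-3), and safety (0-2), for up to 10 points per question; rater totals are averaged, and averaged question scores are summed per category (10 questions per category gives a maximum of 100 points). The function names and the sample ratings below are illustrative assumptions, not the study's actual data.

```python
# Hedged sketch of the abstract's scoring aggregation; all names and
# numbers here are hypothetical, not the study's records.
from statistics import mean

def question_score(rater_scores):
    """Average the per-rater totals (accuracy + practicality + safety)
    for a single question."""
    return mean(sum(dims) for dims in rater_scores)

def category_score(questions):
    """Sum the averaged question scores over one category's questions."""
    return sum(question_score(q) for q in questions)

# Hypothetical ratings: one category, 2 questions, 3 raters per question;
# each tuple is (accuracy, practicality, safety).
missed_dose = [
    [(5, 3, 2), (5, 3, 2), (4, 3, 2)],  # question 1
    [(5, 2, 2), (4, 3, 2), (5, 3, 2)],  # question 2
]
print(category_score(missed_dose))
```

With 10 questions per category, a model answering every question perfectly for every rater would reach the 100-point category maximum reported for GPT-4.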
Key words
multimodal large language model / medication adherence / healthcare application / AI-assisted decision-making
Classification
Pharmacy