
A multicenter evaluation study of the use of large language models in neuro-ophthalmology

眼科新进展 (Recent Advances in Ophthalmology), 2025, Vol. 45, Issue 10: 810-815 (6 pages). DOI: 10.13389/j.cnki.rao.2025.0139

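Wang Zixun (王子荀) 1, Li Zhiqing (李志清) 1, Zhang Xiaoling (张晓玲) 2, Jia Hongqiang (贾洪强) 3, Wei Ruihua (魏瑞华) 1, Wang Yuhang (王宇航) 4, Fan Ke (范珂) 5, Qi Yanhua (祁艳华) 6, Xie Xueshuo (谢学说) 7, Wei Shihui (魏世辉) 8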

Author information

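  • 1. Tianjin Medical University Eye Hospital, School of Optometry, and Eye Institute; Tianjin Branch of the National Clinical Research Center for Eye, Ear, Nose and Throat Diseases; Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin 300384
  • 2. Tianjin Medical University Eye Hospital, School of Optometry, and Eye Institute; Tianjin Branch of the National Clinical Research Center for Eye, Ear, Nose and Throat Diseases; Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin 300384 || Handan Eye Hospital of Hebei Province (Handan Third Hospital), Handan 056001, Hebei Province
  • 3. Cangzhou Eye Hospital, Cangzhou 061001, Hebei Province
  • 4. Tianjin Medical University Eye Hospital, School of Optometry, and Eye Institute; Tianjin Branch of the National Clinical Research Center for Eye, Ear, Nose and Throat Diseases; Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin 300384 || Senior Department of Ophthalmology, Third Medical Center of the PLA General Hospital, Beijing 100185
  • 5. Henan Provincial People's Hospital, Henan Provincial Eye Hospital, Zhengzhou 450003, Henan Province
  • 6. Lixiang Eye Hospital of Soochow University, Suzhou 215031, Jiangsu Province
  • 7. Haihe Laboratory of Advanced Computing and Key Software (Xinchuang), Tianjin 300384
  • 8. Senior Department of Ophthalmology, Third Medical Center of the PLA General Hospital, Beijing 100185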

Abstract

Objective: To evaluate the answers that artificial intelligence (AI) large language models (LLMs) generate to typical clinical questions in neuro-ophthalmology, and to assess LLM performance on these questions along multiple dimensions using both objective and expert assessment.

Methods: This was a multicenter, randomized, cross-sectional pilot study. Thirty typical neuro-ophthalmology questions were selected from four perspectives: definition, etiology, clinical manifestations and signs, and treatment and prognosis. The questions were posed to DeepSeek, Wenxin Yiyan 4.0, Doubao, and Kimi 1.5, four open-source LLMs in China. The resulting 120 answer texts were analyzed quantitatively by objective assessment and scored quantitatively by three ophthalmologists (expert assessment). Three-, five-, and four-point Likert scales were developed for completeness, accuracy and professionalism, and relevance and criticality of the texts, respectively. The best-performing LLM was selected, and its performance was compared across the four types of questions. Additionally, three other experts assessed whether the best-performing LLM could serve as a substitute for real-world doctor-patient communication.

Results: In the objective analysis of Chinese text reading difficulty, the differences in total word count among the four LLMs were statistically significant (all P<0.001). Of the four LLMs, Kimi 1.5 performed the best, with the highest scores awarded at frequencies of 61%, 29%, and 41% for completeness (score of 3), accuracy and professionalism (score of 5), and relevance and usefulness (score of 4), respectively. Kimi 1.5 performed relatively consistently across the four question types for neuro-ophthalmologic disorders (definition, etiology, clinical manifestations and signs, and treatment and prognosis), with no between-group differences (P>0.05).

Conclusion: Chinese-language LLMs have great potential for clinical application in neuro-ophthalmology. Kimi 1.5 outperforms the other LLMs in completeness, accuracy, professionalism, relevance, and usefulness, but it still cannot replace real-world doctor-patient communication. Future work should explore a new "AI + physician" model of diagnosis and treatment.

Key words

neuro-ophthalmology/artificial intelligence/large language model/application evaluation/multicenter

Classification

Medicine and health

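Cite this article

Wang Zixun, Li Zhiqing, Zhang Xiaoling, Jia Hongqiang, Wei Ruihua, Wang Yuhang, Fan Ke, Qi Yanhua, Xie Xueshuo, Wei Shihui. A multicenter evaluation study of the use of large language models in neuro-ophthalmology [J]. Recent Advances in Ophthalmology, 2025, 45(10): 810-815.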

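Funding

Tianjin Key Medical Discipline Construction Project (No. TJYXZDXK-3-004A-2)

Independent and Open Project of the Tianjin Key Laboratory of Retinal Functions and Diseases (No. 2023tjswmm004)

High-level Innovative Talent Training Fund of Tianjin Medical University Eye Hospital (No. YDYYRCXM-B2023-02)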

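Journal: 眼科新进展 (Recent Advances in Ophthalmology)

Indexing: open access (OA); Peking University Core Journals list

ISSN: 1003-5141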
