Global Traditional Chinese Medicine (环球中医药), 2026, Vol. 19, Issue 3: 403-409, 7. DOI: 10.3969/j.issn.1674-1749.2026.03.001
Evaluating the effectiveness of five large language models from the perspective of structured extraction of traditional Chinese medicine dermatology medical records
Abstract
Objective: To evaluate the performance of five large language models (LLMs) in the structured extraction of Traditional Chinese Medicine (TCM) dermatology medical records, and to clarify their capabilities in extracting information from TCM clinical texts.
Methods: A total of 240 electronic medical records from a TCM dermatology specialty outpatient clinic were retrospectively collected. A structured template was designed, containing fields for basic information, four-diagnosis information, diagnoses, and integrated TCM-Western treatment plans. Two TCM medical students and two TCM dermatology specialists performed manual annotation separately. Under a unified prompt, five general-purpose LLMs were tasked with performing structured extraction from the same records. A 0-3 point scale was used to evaluate extraction quality, and the time required per record was recorded. First, within-group consistency was assessed in the medical student group and in the specialist group. If no statistically significant difference was found within a group (P>0.05), one annotator was randomly selected from that group. The selected annotators were then used in between-group comparisons with the LLMs, with P<0.05 considered statistically significant.
Results: In basic information extraction, LLMs overall performed comparably to specialists and significantly outperformed medical students (P<0.05). For TCM diagnostic and therapeutic information and for overall record extraction quality, LLMs generally performed better than students (P<0.05) but were inferior to specialists (P<0.05); only individual models approached specialist-level performance after prompt optimization (P>0.05). The annotation efficiency of LLMs was significantly higher than that of both students and specialists.
Conclusion: In the task of structured extraction of TCM dermatology records, general-purpose LLMs have demonstrated the potential to replace junior human annotators and substantially improve efficiency. However, they still struggle to substitute for experienced TCM dermatologists in understanding and expressing professional TCM diagnostic and therapeutic information. They can serve as auxiliary tools in the stages of TCM clinical data governance and research preparation.
Keywords
large language models / traditional Chinese medicine dermatology / electronic medical records / information extraction / data structuring / foreign models / domestic models / performance evaluation
Classification
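Medicine and Health
Cite this article
宁博彪, 冷学明, 骆长永, 李宝花, 许潇予, 王棣, 宋坪. Evaluating the effectiveness of five large language models from the perspective of structured extraction of traditional Chinese medicine dermatology medical records [J]. Global Traditional Chinese Medicine (环球中医药), 2026, 19(3): 403-409, 7.
Funding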
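Beijing High-Level Innovation and Entrepreneurship Talent Support Program, "Dengfeng" Project (G202514020)
2025 Chronic Disease Management Research Project of the Capacity Building and Continuing Education Center, National Health Commission (GWJJMB202510025172)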