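Evaluating the effectiveness of five large language models from the perspective of structured extraction of traditional Chinese medicine dermatology medical records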

Global Traditional Chinese Medicine (环球中医药), 2026, Vol. 19, Issue 3: 403-409, 7. DOI: 10.3969/j.issn.1674-1749.2026.03.001

Evaluating the effectiveness of five large language models from the perspective of structured extraction of traditional Chinese medicine dermatology medical records

NING Bobiao (宁博彪)¹, LENG Xueming (冷学明)², LUO Changyong (骆长永)³, LI Baohua (李宝花)⁴, XU Xiaoyu (许潇予)⁵, WANG Di (王棣)⁵, SONG Ping (宋坪)⁶

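Author information

1. Department of Dermatology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing 100091; Postdoctoral Research Station, China Academy of Chinese Medical Sciences
2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences
3. Department of Infectious Diseases, Dongfang Hospital, Beijing University of Chinese Medicine
4. Postdoctoral Research Station, China Academy of Chinese Medical Sciences
5. School of Traditional Chinese Medicine, Beijing University of Chinese Medicine
6. Department of Dermatology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing 100091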
Abstract

Objective: To evaluate the performance of five large language models (LLMs) in the structured extraction of Traditional Chinese Medicine (TCM) dermatology medical records, and to clarify their capabilities in extracting information from TCM clinical texts.

Methods: A total of 240 electronic medical records from a TCM dermatology specialty outpatient clinic were retrospectively collected. A structured template was designed, containing fields for basic information, four-diagnosis information, diagnoses, and integrated TCM-Western treatment plans. Two TCM medical students and two TCM dermatology specialists performed manual annotation independently. Under a unified prompt, five general-purpose LLMs were tasked with performing structured extraction from the same records. Extraction quality was rated on a 0-3 point scale, and the time required per record was recorded. First, consistency was compared within the medical student group and within the physician group. If no statistically significant difference was found within a group (P>0.05), one annotator was randomly selected from that group. The selected annotators were then compared against the LLMs in between-group tests, with P<0.05 considered statistically significant.

Results: In basic information extraction, LLMs overall performed comparably to specialists and significantly outperformed medical students (P<0.05). For TCM diagnostic and therapeutic information and for overall record extraction quality, LLMs generally performed better than students (P<0.05) but were inferior to specialists (P<0.05); only individual models approached specialist-level performance after prompt optimization (P>0.05). The annotation efficiency of LLMs was significantly higher than that of both students and specialists.

Conclusion: In the task of structured extraction of TCM dermatology records, general-purpose LLMs have demonstrated the potential to replace junior human annotators and substantially improve efficiency. However, they still struggle to substitute for experienced TCM dermatologists in understanding and expressing professional TCM diagnostic and therapeutic information. They can serve as auxiliary tools in the stages of TCM clinical data governance and research preparation.
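The evaluation workflow described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names only the four template sections and the 0-3 scale, so the class `StructuredRecord`, its attribute names, and the helpers `validate_score` and `paired_permutation_p` are all hypothetical. The abstract also does not say which statistical test was used; a paired sign-flip permutation test on the mean score difference stands in here purely as an assumption.

```python
import random
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class StructuredRecord:
    """Hypothetical container mirroring the four template sections in the abstract."""
    basic_information: dict = field(default_factory=dict)
    four_diagnosis_information: dict = field(default_factory=dict)
    diagnoses: dict = field(default_factory=dict)
    treatment_plan: dict = field(default_factory=dict)


def validate_score(score):
    """Accept only integer quality scores on the 0-3 scale."""
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 3:
        raise ValueError(f"score must be an integer from 0 to 3, got {score!r}")
    return score


def paired_permutation_p(scores_a, scores_b, n_resamples=2000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean score difference.

    Assumption: this test is a stand-in; the abstract does not name its method.
    scores_a[i] and scores_b[i] must be the scores two raters received on record i.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    diffs = [validate_score(a) - validate_score(b) for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Illustrative, made-up scores (not data from the study).
record = StructuredRecord(basic_information={"age": 34})
student_scores = [0] * 20
specialist_scores = [3] * 20
p_value = paired_permutation_p(student_scores, specialist_scores)
```

Under the decision rule stated in the abstract, a `p_value` above 0.05 for two annotators in the same group would justify randomly picking one of them for the comparison with the LLMs, while a value below 0.05 in a between-group test would be read as a significant difference.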

Key words

large language models; traditional Chinese medicine dermatology; electronic medical records; information extraction; data structuring; foreign models; domestic models; performance evaluation

Classification

Medicine and Health

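Cite this article

NING Bobiao, LENG Xueming, LUO Changyong, LI Baohua, XU Xiaoyu, WANG Di, SONG Ping. Evaluating the effectiveness of five large language models from the perspective of structured extraction of traditional Chinese medicine dermatology medical records [J]. Global Traditional Chinese Medicine (环球中医药), 2026, 19(3): 403-409, 7.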

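Funding

Beijing High-Level Innovation and Entrepreneurship Talent Support Program, "Dengfeng" Project (G202514020)

2025 Chronic Disease Management Research Project of the Capacity Building and Continuing Education Center, National Health Commission (GWJJMB202510025172)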

Global Traditional Chinese Medicine (环球中医药)

ISSN: 1674-1749
