Journal of Beijing University of Traditional Chinese Medicine, 2025, Vol. 48, Issue 8: 1176-1184, 9. DOI: 10.3969/j.issn.1006-2157.2025.08.018
Automated syndrome element differentiation in traditional Chinese medicine based on large language models and text embedding computation
Abstract
Objective This study aimed to develop an automated method for syndrome element differentiation in Traditional Chinese Medicine (TCM).
Methods We first constructed and trained an Instruction-tuned Multi-Task TCM text embedding model (Instr-MT-TCM) on four TCM task datasets covering domain knowledge, synonymous terminology, syndrome differentiation and treatment, and TCM case labels. Five TCM diagnostics experts holding a master's degree or higher then screened a real-world TCM case dataset and annotated its symptoms and signs. On the syndrome element differentiation task, we evaluated the F1-score of the proposed method, which combines Instr-MT-TCM with a Large Language Model (LLM), by comparing its performance against the manual annotation results. Finally, to validate its feasibility in real-world clinical settings, we applied the method to 48 prostate cancer cases to calculate syndrome element scores.
Results The Instr-MT-TCM model improved rapidly in the early phase of training and achieved a Recall@1 (R@1) of 0.848. The experts curated a dataset of 1,793 real-world clinical cases covering 34 common diseases and 66 syndrome patterns. On the syndrome element differentiation task, the collaborative framework of LLM and Instr-MT-TCM achieved a mean F1-score of 0.927, outperforming the 0.512 obtained by manual annotation. The syndrome element analysis of the prostate cancer cases showed that the predominant disease-nature elements were fire (heat) and yin deficiency, while the main disease-location elements were bladder and kidney.
Conclusion This study proposes and validates a novel method for automated TCM syndrome element differentiation based on the synergy between an LLM and our custom Instr-MT-TCM model. With a high F1-score (0.927) on real-world data, the method demonstrates high accuracy and generalization ability. Its application to prostate cancer cases highlights its clinical potential, offering effective technical support and a new research direction for intelligent TCM syndrome element differentiation.
Key words
syndrome element differentiation/large language model/text embedding
Classification
Information Technology and Security Science
Cite this article
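Sun Zhaoyang, Wang Yang, Ma Mingze, Chen Yanwen, Lü Zhenxiu, Jiang Tiantian, Wen Huiling, Chen Bo, Guan Jing. Automated syndrome element differentiation in traditional Chinese medicine based on large language models and text embedding computation[J]. Journal of Beijing University of Traditional Chinese Medicine, 2025, 48(8): 1176-1184, 9.
Funding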
National Key R&D Program of China (No. 2017YFC1703302)
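The abstract reports a Recall@1 (R@1) for the embedding model and a mean F1-score for syndrome element differentiation, but it does not state how either metric is computed. The sketch below shows the conventional definitions under two assumptions: R@1 is the share of queries whose correct counterpart ranks first by cosine similarity among candidate embeddings, and the F1-score is computed per case over sets of syndrome element labels and then averaged across cases. The function names (`recall_at_k`, `set_f1`, `mean_f1`), the toy vectors, and the toy label sets are hypothetical illustrations, not the Instr-MT-TCM model or the authors' evaluation code.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(queries: List[Sequence[float]],
                candidates: List[Sequence[float]],
                gold: List[int], k: int = 1) -> float:
    """Share of queries whose gold candidate index ranks in the top k by cosine similarity."""
    if not queries:
        raise ValueError("need at least one query")
    hits = 0
    for q, g in zip(queries, gold):
        # Rank candidate indices from most to least similar to this query.
        ranked = sorted(range(len(candidates)),
                        key=lambda j: cosine(q, candidates[j]),
                        reverse=True)
        if g in ranked[:k]:
            hits += 1
    return hits / len(queries)


def set_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between the predicted and gold syndrome element sets of one case."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as perfect agreement
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: List[Set[str]], golds: List[Set[str]]) -> float:
    """Unweighted mean of the per-case F1 scores (assumed averaging scheme)."""
    if not predictions:
        raise ValueError("need at least one case")
    scores = [set_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for text embeddings (hypothetical values).
    queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
    candidates = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
    print(recall_at_k(queries, candidates, gold=[0, 1], k=1))  # 1.0

    # Toy syndrome element labels per case (hypothetical annotations).
    predicted = [{"fire (heat)", "yin deficiency", "kidney"}, {"bladder"}]
    reference = [{"fire (heat)", "yin deficiency", "bladder"}, {"bladder"}]
    print(round(mean_f1(predicted, reference), 3))  # 0.833
```

Averaging F1 per case (rather than pooling true positives across all cases) matches the abstract's wording of a "mean F1-score", but the paper may use a different scheme; a micro-averaged F1 would weight cases with many syndrome elements more heavily.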