
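Automated syndrome element differentiation in traditional Chinese medicine based on large language models and text embedding computation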

孙肇阳 汪洋 马铭泽 陈妍文 吕镇秀 江甜甜 温慧玲 陈波 关静

Journal of Beijing University of Traditional Chinese Medicine (北京中医药大学学报), 2025, Vol. 48, Issue 8: 1176-1184 (9 pages). DOI: 10.3969/j.issn.1006-2157.2025.08.018

Automated syndrome element differentiation in traditional Chinese medicine based on large language models and text embedding computation

孙肇阳 (1), 汪洋 (2), 马铭泽 (3), 陈妍文 (4), 吕镇秀 (4), 江甜甜 (2), 温慧玲 (4), 陈波 (2), 关静 (1)

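Author information

  • 1. School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029
  • 2. School of Acupuncture-Moxibustion and Tuina, Tianjin University of Traditional Chinese Medicine
  • 3. First Teaching Hospital of Tianjin University of Traditional Chinese Medicine
  • 4. School of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine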

Abstract

Objective: This study aimed to develop an automated method for syndrome element differentiation in Traditional Chinese Medicine (TCM).

Methods: We first constructed and trained an instruction-tuned multi-task TCM text embedding model (Instr-MT-TCM) on four distinct TCM task datasets: domain knowledge, synonymous terminology, syndrome differentiation and treatment, and TCM case labels. Subsequently, five TCM diagnostics experts holding master's degrees or higher screened a real-world TCM case dataset and annotated its symptoms and signs. The proposed method, which combines Instr-MT-TCM with a large language model (LLM), was then evaluated by F1-score on the syndrome element differentiation task, and its performance was compared against the manual annotation result. Finally, to validate its feasibility in real-world clinical settings, the method was applied to 48 prostate cancer cases to calculate syndrome element scores.

Results: The Instr-MT-TCM model showed rapid performance improvement in its early training phase, achieving a Recall@1 (R@1) of 0.848. Experts curated a dataset of 1,793 real-world clinical cases covering 34 common diseases and 66 syndrome patterns. In the syndrome element differentiation task, the collaborative framework of LLM and Instr-MT-TCM achieved a mean F1-score of 0.927, outperforming the 0.512 from manual annotation. In the prostate cancer cases, the syndrome element analysis revealed that the predominant elements of disease nature were fire (heat) and yin deficiency, while the main elements of disease location were bladder and kidney.

Conclusion: This study proposes and validates a novel method for automated TCM syndrome element differentiation based on the synergy between an LLM and our custom Instr-MT-TCM model. Achieving a high F1-score (0.927) on real-world data, the method demonstrates excellent accuracy and generalization ability. Its application in prostate cancer analysis highlights its significant clinical potential, offering effective technical support and a new research direction for intelligent TCM syndrome element differentiation.
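The abstract reports two metrics, Recall@1 for the embedding model and a mean F1-score for syndrome element differentiation, without stating how either was computed. The Python sketch below shows one common way to compute them. It is an illustration under stated assumptions, not the authors' evaluation code: F1 is computed per case over sets of syndrome elements and then averaged, Recall@1 uses cosine similarity over embedding vectors, and all function names and toy inputs are hypothetical.

```python
from math import sqrt


def f1_for_case(predicted, gold):
    """Set-based F1 between predicted and gold syndrome elements for one case."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # assumption: two empty label sets count as full agreement
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, golds):
    """Average of per-case F1 (assumed averaging scheme; the abstract does not specify one)."""
    scores = [f1_for_case(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_1(query_vecs, candidate_vecs, gold_indices):
    """Fraction of queries whose most similar candidate is the gold candidate."""
    hits = 0
    for query, gold in zip(query_vecs, gold_indices):
        best = max(range(len(candidate_vecs)),
                   key=lambda i: cosine(query, candidate_vecs[i]))
        hits += int(best == gold)
    return hits / len(query_vecs)


# Toy data for illustration only; these are not cases or vectors from the study.
toy_predictions = [{"fire (heat)", "yin deficiency", "kidney"}, {"bladder"}]
toy_golds = [{"fire (heat)", "yin deficiency", "bladder"}, {"bladder"}]
toy_mean_f1 = mean_f1(toy_predictions, toy_golds)

toy_queries = [[1.0, 0.0], [0.0, 1.0], [0.2, 1.0]]
toy_candidates = [[0.9, 0.1], [0.1, 0.9]]
toy_recall = recall_at_1(toy_queries, toy_candidates, [0, 1, 0])
```

With the toy inputs, the first case matches two of three elements in each direction and the second case matches exactly, so the mean F1 is 5/6; two of the three queries retrieve their gold candidate first, so Recall@1 is 2/3. Other averaging schemes, such as micro-averaging over all elements, would generally give different values.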

Key words

syndrome element differentiation / large language model / text embedding

Classification

Information technology and security science

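Cite this article

孙肇阳, 汪洋, 马铭泽, 陈妍文, 吕镇秀, 江甜甜, 温慧玲, 陈波, 关静. Automated syndrome element differentiation in traditional Chinese medicine based on large language models and text embedding computation [J]. Journal of Beijing University of Traditional Chinese Medicine, 2025, 48(8): 1176-1184.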

Funding

National Key R&D Program of China (No. 2017YFC1703302)

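Journal of Beijing University of Traditional Chinese Medicine

Open access; Peking University Core Journal (北大核心)

ISSN 1006-2157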
