Digital Chinese Medicine, 2026, Vol. 9, Issue 1: 57-67, 11. DOI: 10.1016/j.dcmed.2026.02.005
Knowledge graph-enhanced long-tail learning approach for traditional Chinese medicine syndrome differentiation
Abstract
Objective To address the dual challenges of long-tail distribution and feature sparsity in traditional Chinese medicine (TCM) syndrome differentiation in real clinical settings, we propose a data-efficient learning framework enhanced by knowledge graphs.
Methods We developed Agent-GNN, a three-stage decoupled learning framework, and validated it on the Traditional Chinese Medicine Syndrome Diagnosis (TCM-SD) dataset, which contains 54 152 clinical records across 148 syndrome categories. First, we constructed a comprehensive medical knowledge graph encoding the complete TCM reasoning system. Second, we proposed a Functional Patient Profiling (FPP) method that combines large language models (LLMs) with Graph Retrieval-Augmented Generation (RAG) to extract structured symptom-etiology-pathogenesis subgraphs from medical records. Third, we employed heterogeneous graph neural networks to learn structured combination patterns explicitly. We compared our method against multiple baselines, including BERT, ZY-BERT, ZY-BERT+Know, GAT, and GPT-4 Few-shot, using the macro-F1 score as the primary evaluation metric. Additionally, ablation experiments were conducted to validate the contribution of each key component to model performance.
Results Agent-GNN achieved an overall macro-F1 score of 72.4%, an improvement of 8.7 percentage points over ZY-BERT+Know (63.7%), the strongest of the traditional baselines. For long-tail syndromes with fewer than 10 samples, Agent-GNN reached a macro-F1 score of 58.6%, compared with 39.3% for ZY-BERT+Know and 41.2% for GPT-4 Few-shot, representing relative improvements of 49.2% and 42.2%, respectively. Ablation experiments confirmed that explicitly modeling etiology-pathogenesis nodes contributed 12.4 percentage points to this improved long-tail performance.
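The Results mix two kinds of comparison: the overall gain is an absolute difference in percentage points (72.4 - 63.7 = 8.7), while the long-tail gains are relative to the baseline score, for example (58.6 - 41.2) / 41.2 ≈ 42.2%. The Python sketch below computes macro-F1 under its standard definition, the unweighted mean of per-class F1, which is why a rare syndrome weighs as much as a common one. The function names, the toy labels, and the assumption that the fewer-than-10-samples cutoff is counted on training labels are illustrative choices, not details taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Macro-F1: unweighted mean of per-class F1, so every class counts equally."""
    if labels is None:
        labels = sorted(set(y_true))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def tail_labels(y_train, threshold=10):
    """Hypothetical helper: classes with fewer than `threshold` training samples."""
    counts = Counter(y_train)
    return sorted(c for c, n in counts.items() if n < threshold)


def point_gain(new, old):
    """Absolute gain in percentage points (scores given in %)."""
    return new - old


def relative_gain(new, old):
    """Relative gain, in % of the baseline score."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Toy labels: a majority class "A" and a rare class "B".
    y_true = ["A", "A", "A", "A", "B"]
    y_pred = ["A", "A", "A", "A", "A"]  # always predict the majority class
    print(macro_f1(y_true, y_pred))  # 4/9, about 0.444, although accuracy is 0.8
    print(macro_f1(y_true, y_pred, labels=tail_labels(["A"] * 40 + ["B"] * 3)))  # 0.0
    print(round(point_gain(72.4, 63.7), 1))  # 8.7
    print(round(relative_gain(58.6, 41.2), 1))  # 42.2
```

Predicting only the majority class scores 80% accuracy on the toy labels but a macro-F1 of about 0.444, and 0.0 on the tail class alone; this is the failure mode that macro-F1 exposes in a long-tail setting.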
Conclusion This study proposes Agent-GNN, a knowledge graph-enhanced framework that effectively addresses the long-tail distribution challenge in TCM syndrome differentiation. By explicitly modeling manifestation-mechanism-essence patterns through structured knowledge graphs, our approach achieves superior performance in data-scarce scenarios while providing interpretable reasoning paths for intelligent TCM diagnosis.
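The Methods feed symptom, etiology and pathogenesis nodes extracted from a record into a heterogeneous graph neural network, and the Conclusion credits the graph structure with interpretable reasoning paths. The Python sketch below illustrates only the relation-typed idea behind that step, under heavy simplification. It uses a hypothetical toy graph with placeholder node names, assumes the patient profile has already been extracted (the LLM and Graph RAG stage is not reproduced), and replaces the learned relation-specific weight matrices of a real heterogeneous GNN (as in R-GCN-style models) with hand-set scalar weights. It also enumerates profile-to-syndrome paths, one simple way such a graph can yield a readable reasoning trace.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (source, relation, target) triples.
# Node type is the prefix before ":"; names are placeholders, not TCM content.
EDGES = [
    ("symptom:S1", "suggests", "pathogenesis:P1"),
    ("symptom:S2", "suggests", "pathogenesis:P1"),
    ("symptom:S3", "suggests", "pathogenesis:P2"),
    ("etiology:E1", "triggers", "pathogenesis:P1"),
    ("pathogenesis:P1", "defines", "syndrome:Z1"),
    ("pathogenesis:P2", "defines", "syndrome:Z2"),
]

# Hand-set scalar weight per relation type (assumption). A trained
# heterogeneous GNN would learn a weight matrix per relation instead.
REL_WEIGHT = {"suggests": 1.0, "triggers": 0.5, "defines": 1.0}


def build_incoming(edges):
    """Map each target node to its list of (source, relation) in-edges."""
    incoming = defaultdict(list)
    for src, rel, dst in edges:
        incoming[dst].append((src, rel))
    return incoming


def propagate(profile, incoming, layers=2):
    """Relation-typed message passing on scalar node states.

    Messages are averaged separately per relation type, scaled by that
    relation's weight, then summed. Nodes present in the profile keep
    their input values.
    """
    h = dict(profile)
    for _ in range(layers):
        new_h = dict(h)
        for node, preds in incoming.items():
            if node in profile:  # clamp extracted nodes to their input values
                continue
            by_rel = defaultdict(list)
            for src, rel in preds:
                by_rel[rel].append(h.get(src, 0.0))
            new_h[node] = sum(REL_WEIGHT[rel] * sum(vals) / len(vals)
                              for rel, vals in by_rel.items())
        h = new_h
    return h


def explain(target, incoming, profile):
    """List reasoning paths from profile nodes to `target` (graph must be acyclic)."""
    if profile.get(target, 0.0) > 0:
        return [[target]]
    paths = []
    for src, rel in incoming.get(target, []):
        for path in explain(src, incoming, profile):
            paths.append(path + [rel, target])
    return paths


def diagnose(profile, edges=EDGES, layers=2):
    """Score syndrome nodes; return the best one, its score and its reasoning paths."""
    incoming = build_incoming(edges)
    h = propagate(profile, incoming, layers)
    syndromes = {n: s for n, s in h.items() if n.startswith("syndrome:")}
    best = max(syndromes, key=syndromes.get)
    return best, syndromes[best], explain(best, incoming, profile)


if __name__ == "__main__":
    # Assumed output of the profiling step for one record.
    profile = {"symptom:S1": 1.0, "symptom:S2": 1.0, "etiology:E1": 1.0}
    best, score, paths = diagnose(profile)
    print(best, score)  # syndrome:Z1 1.5
    for path in paths:
        print(" -> ".join(path))
```

Two layers are needed because evidence must travel two hops (symptom or etiology to pathogenesis, then pathogenesis to syndrome); each printed path, such as `symptom:S1 -> suggests -> pathogenesis:P1 -> defines -> syndrome:Z1`, names the intermediate pathogenesis node that links a finding to the predicted syndrome.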
Key words
Syndrome differentiation/Medical knowledge graph/Graph neural networks/Long-tail learning/Data-efficient learning
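Cite this article
Kong Weikang, Wen Chuanbiao, Luo Yue.. Knowledge graph-enhanced long-tail learning approach for traditional Chinese medicine syndrome differentiation [J]. Digital Chinese Medicine, 2026, 9(1): 57-67, 11.
Funding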
Sichuan TCM Culture Coordinated Development Research Center Project (2023XT131), National Key Science and Technology Project of China (2023ZD0509405), and National Natural Science Foundation of China (82174236).