
Fine-Med-Mental-T&P: a dual-track approach for high-quality instructional datasets of mental disorders in traditional Chinese medicine

Wei Yanbai¹, Jing Xiaoshuo², Yan Junfeng¹

Digital Chinese Medicine, 2026, Vol. 9, Issue 1: 31-42 (12 pages). DOI: 10.1016/j.dcmed.2026.02.004




Author information

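  • 1. School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China; Hunan Artificial Intelligence Laboratory for Traditional Chinese Medicine, Changsha, Hunan 410208, China
  • 2. Hunan Artificial Intelligence Laboratory for Traditional Chinese Medicine, Changsha, Hunan 410208, China; School of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China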


Abstract

Objective To investigate methods for constructing a high-quality instructional dataset for traditional Chinese medicine (TCM) mental disorders and to validate its efficacy.

Methods We proposed Fine-Med-Mental-T&P, a methodology for constructing high-quality instruction datasets for TCM mental disorders. It integrates theoretical knowledge and practical case studies through a dual-track strategy. (i) Theoretical track: textbooks and guidelines on TCM mental disorders were manually segmented. Initial responses were generated with DeepSeek-V3 and then refined with the Qwen3-32B model to align their expression with human preferences. A screening algorithm was then applied to select 16,000 high-quality instruction pairs. (ii) Practical track: starting from more than 600 real clinical case seeds, diagnostic and therapeutic instruction pairs were generated with DeepSeek-V3 and then screened through manual evaluation, yielding 4,000 high-quality practice-oriented instruction pairs. Combining the two tracks produced the Med-Mental-Instruct-T&P dataset, comprising 20,000 instruction pairs in total. To validate the dataset's effectiveness, three experimental evaluations, using both manual and automated assessment, were conducted: (i) comparative studies of the performance of models fine-tuned on different datasets; (ii) benchmarking against mainstream TCM-specific large language models (LLMs); and (iii) a data ablation study of the relationship between training-data volume and model performance.
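The Methods combine 16,000 theory-track pairs and 4,000 practice-track pairs into one 20,000-pair dataset, and the ablation study trains on growing fractions of it. The abstract does not say how the ablation subsets were drawn, so the following is only a minimal sketch under an assumption: the merged data are shuffled once and each fraction is a prefix of that shuffle, so every smaller subset is contained in every larger one. `InstructionPair`, its `track` tag, `merge_tracks` and `nested_subsets` are hypothetical names, and the pair contents are placeholders.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionPair:
    """One instruction/response pair (hypothetical structure)."""
    instruction: str
    response: str
    track: str  # hypothetical provenance tag: "theory" or "practice"


def merge_tracks(theory, practice):
    """Concatenate the theoretical and practical tracks into one dataset."""
    return list(theory) + list(practice)


def nested_subsets(dataset, fractions, seed=0):
    """Map each fraction to a subset; smaller subsets are prefixes of larger ones.

    Assumption: one fixed shuffle followed by prefixes. The paper does not
    specify its sampling scheme.
    """
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)  # reproducible single shuffle
    return {f: shuffled[: round(f * len(shuffled))] for f in sorted(fractions)}


if __name__ == "__main__":
    # Placeholder pairs sized to match the counts reported in the abstract.
    theory = [InstructionPair(f"t-q{i}", f"t-a{i}", "theory") for i in range(16_000)]
    practice = [InstructionPair(f"p-q{i}", f"p-a{i}", "practice") for i in range(4_000)]
    dataset = merge_tracks(theory, practice)
    subsets = nested_subsets(dataset, [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
    for frac, subset in subsets.items():
        n_practice = sum(p.track == "practice" for p in subset)
        print(f"{frac:.0%}: {len(subset)} pairs ({n_practice} practice-track)")
```

Nesting the subsets means that a score change between, say, 10% and 50% reflects added data rather than a different random draw; drawing each fraction independently would be an equally plausible design.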
Results Experimental results demonstrate the superior performance of the T&P-model, fine-tuned on the Med-Mental-Instruct-T&P dataset. In the comparative study, the T&P-model significantly outperformed baseline models trained solely on self-generated or purely human-curated data. This advantage was evident in both automated metrics (ROUGE-L > 0.55) and expert manual evaluations (scores above 7/10 for accuracy). In benchmark comparisons, the T&P-model also outperformed existing mainstream TCM LLMs (e.g., HuatuoGPT and ZuoyiGPT). It was particularly strong at handling diverse clinical presentations, including challenging disorders such as insomnia and coma, demonstrating its robustness and versatility. Data ablation studies showed that T&P-model performance followed an overall upward trend, with minor fluctuations, as the training data increased from 10% to 50%; beyond 50%, performance gains slowed markedly, with metrics plateauing as they approached saturation.

Conclusion This study constructed Med-Mental-Instruct-T&P, a specialized instruction dataset for TCM mental disorders, and proposed Fine-Med-Mental-T&P, the systematic methodology used to develop it. Together they effectively address the critical scarcity of high-quality, domain-specific data in TCM and provide essential data support for developing intelligent TCM diagnostic and therapeutic systems.
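The Results cite ROUGE-L above 0.55. ROUGE-L compares a generated answer with a reference answer through their longest common subsequence (LCS): with precision P = LCS/|candidate| and recall R = LCS/|reference|, the score is F = (1 + β²)PR / (R + β²P). The sketch below assumes character-level tokenization (common for Chinese text) and β = 1; the abstract does not state the tokenization or implementation the authors used, so scores from this sketch need not match the reported figures. The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences.

    Classic dynamic programme kept to two rows: prev[j] holds the LCS length
    of the tokens of `a` seen so far against the first j tokens of `b`.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the match diagonally
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop a token from one side
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score with character-level tokens (assumption)."""
    cand, ref = list(candidate), list(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:  # also covers empty strings, avoiding division by zero
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


def mean_rouge_l(pairs):
    """Average ROUGE-L over (model_output, reference_answer) pairs."""
    scores = [rouge_l(c, r) for c, r in pairs]
    return sum(scores) / len(scores)
```

For example, `mean_rouge_l(pairs) > 0.55` would check a set of (model output, reference) pairs against the threshold quoted in the Results.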


Key words

Mental disorder/Traditional Chinese medicine(TCM)/Instruction dataset construction/Instruction tuning/Large language model

Cite this article

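Wei Yanbai, Jing Xiaoshuo, Yan Junfeng. Fine-Med-Mental-T&P: a dual-track approach for high-quality instructional datasets of mental disorders in traditional Chinese medicine[J]. Digital Chinese Medicine, 2026, 9(1): 31-42.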

Funding

Key Scientific Research Project of the Hunan Provincial Department of Education (23A312).

Digital Chinese Medicine

ISSN 2096-479X
