Chinese Journal of Information on Traditional Chinese Medicine, 2024, Vol. 31, Issue (9): 48-57, 10. DOI: 10.19879/j.cnki.1005-5304.202403074
Study on TCM Influenza Syndrome Differentiation Model Based on Machine Learning
Abstract
Objective To train machine learning models on clinical influenza syndrome data and obtain an influenza syndrome differentiation model. Methods Medical records of influenza patients who visited the fever clinic of China-Japan Friendship Hospital between December 2019 and March 2022 were collected. A data set system was used for data processing, and the data produced by each processing pipeline were stored separately for training. Logistic regression, decision tree, naive Bayes, support vector machine, multi-layer perceptron, LightGBM, and random forest were selected as candidate models, and their hyperparameters were optimized with Optuna. Each model was trained separately on every data set, and predictive performance was evaluated with the macro-F1 score as the primary metric. Results A total of 1,011 training samples were collected: 453 cases of wind-heat syndrome, 152 cases of superficial wind-cold syndrome, and 406 cases of superficial cold and internal heat syndrome. Eight data sets, comprising 80 copies of data in total, were obtained for training. After training, the macro-F1 scores of logistic regression, decision tree, naive Bayes, support vector machine, multi-layer perceptron, LightGBM, and random forest were 0.7830, 0.7742, 0.7315, 0.7824, 0.7167, 0.7938, and 0.8153, respectively. Sample weighting significantly improved average model performance, whereas PCA reduced it. Among single (non-ensemble) models, logistic regression had the best predictive performance; among ensemble models, random forest performed best. Conclusion With a small sample size, logistic regression, decision tree, support vector machine, and LightGBM are more appropriate for a TCM influenza syndrome differentiation model. As the sample size increases, logistic regression, support vector machine, LightGBM, and random forest may be more suitable. Different data processing methods affect model performance, and collecting information on the degree of typicality of each syndrome type helps improve model performance.
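The abstract uses macro-F1 as the core metric and reports that sample weighting helped on imbalanced classes (453 / 152 / 406 cases). The standard-library Python sketch below illustrates what those two ideas compute; it is not the authors' code. The weighting formula n_samples / (n_classes × n_class) is an assumption (it is the scheme scikit-learn calls `class_weight="balanced"`), because the abstract does not state how samples were weighted. The function names `balanced_class_weights` and `macro_f1` are hypothetical.

```python
# Class counts reported in the abstract (1,011 training samples in total).
CLASS_COUNTS = {
    "wind-heat": 453,
    "superficial wind-cold": 152,
    "superficial cold and internal heat": 406,
}


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Assumed scheme: weight each class by n_samples / (n_classes * n_class),
    so rarer classes contribute more to the training loss."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred,
    so a minority class counts as much as a majority class."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # The rarest class (wind-cold) receives the largest weight.
    for label, weight in balanced_class_weights(CLASS_COUNTS).items():
        print(f"{label}: {weight:.3f}")

    # Toy labels: accuracy is 5/6 (about 0.8333), but the missed minority-class
    # case pulls macro-F1 down to 7/9.
    y_true = ["A", "A", "A", "A", "B", "B"]
    y_pred = ["A", "A", "A", "A", "A", "B"]
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

The toy example shows why macro-F1 suits this data: a model that favors the majority syndromes can still score high accuracy, while macro-F1 penalizes poor recall on the 152-case class as heavily as on the larger ones.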
Key words
influenza/machine learning/logistic regression model/syndrome differentiation model/feature engineering
Classification
Medicine and health
Cite this article
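Zhang Yuteng, Zhang Hongchun, Chen Menglin, Jin Xin, Liu Jian. Study on TCM Influenza Syndrome Differentiation Model Based on Machine Learning [J]. Chinese Journal of Information on Traditional Chinese Medicine, 2024, 31(9): 48-57, 10.
Funding
National Administration of Traditional Chinese Medicine Innovation Team and Talent Support Program (ZYYCXTD-D-202208)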