Analysis of influencing factors of premature ovarian insufficiency based on 6 machine learning algorithms
Aim: To rank the factors influencing premature ovarian insufficiency (POI) with machine learning algorithms and identify those with the greatest impact on POI. Methods: Inclusion and exclusion criteria were established, and 500 patients presenting with irregular menstruation were selected; differences in age and occupation were analyzed by traditional Chinese medicine syndrome type. Six machine learning algorithms (logistic regression, support vector machine, decision tree, random forest, extreme gradient boosting, and K-nearest neighbor) were then used to classify patients for POI prediction, and their predictive performance was compared using the Matthews correlation coefficient and AUC. The influencing factors were ranked by the decrease in accuracy and the decrease in Gini impurity in the random forest, and the top 5 factors were obtained in combination with stepwise elimination. Results: The random forest achieved the highest Matthews correlation coefficient, accuracy, and AUC, at 0.399, 0.717, and 0.908, respectively. The influencing factors of POI were history of uterine or pelvic surgery, education level, age, history of weight loss, and smoking history, with Borda count scores of 2.446, 2.924, 4.060, 5.303, and 6.429, respectively. Conclusions: Random forest outperformed the other 5 algorithms in predicting POI. When patient data are insufficient, clinicians can use these 5 factors for preliminary intervention in patients with irregular menstruation.
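The abstract names two quantities without defining them: the Matthews correlation coefficient and a Borda count score. The sketch below shows one way each could be computed. It is an illustration only: the abstract does not state the Borda scoring scheme, so the code assumes the score is the mean 1-based rank of a feature across several rankings, with a lower score meaning a more influential factor. That assumption matches the ascending scores reported above. The function names and the toy rankings are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_rank_scores(rankings):
    """Average 1-based rank of each feature across several rankings.

    Assumption: every ranking lists the same features, best first.
    A lower score means the feature is ranked as more influential.
    """
    totals = defaultdict(float)
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            totals[feature] += position
    return {feature: total / len(rankings) for feature, total in totals.items()}


# Hypothetical toy rankings, e.g. from different importance measures or runs.
rankings = [
    ["surgery_history", "education", "age"],
    ["education", "surgery_history", "age"],
    ["surgery_history", "age", "education"],
]
scores = mean_rank_scores(rankings)
ordered = sorted(scores, key=scores.get)  # most influential first
```

Here `ordered` lists the features from lowest to highest mean rank, which is how the five factors in the abstract appear to be ordered.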
Lu Yuting; Sheng Zhenghe; Huang Fei; Pei Shicheng; Meng Hualin; Wu Shanguang
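Faculty of Medicine, Guangxi University of Science and Technology || Liuzhou Key Laboratory for the Development of Characteristic Medicinal Resources of Central Guangxi, Liuzhou, Guangxi 545005 || School of Pharmacy, Hunan University of Chinese Medicine || Hunan Engineering Technology Research Center for Screening Active Substances of Traditional Chinese Medicine, Changsha 410208; Department of Traditional Chinese Internal Medicine, Liuzhou People's Hospital, Liuzhou, Guangxi 545005; Faculty of Medicine, Guangxi University of Science and Technology || Liuzhou Key Laboratory for the Development of Characteristic Medicinal Resources of Central Guangxi, Liuzhou, Guangxi 545005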
Preventive medicine
premature ovarian insufficiency; machine learning; feature ranking
Journal of Zhengzhou University (Medical Sciences), 2024 (002)
Pages 246-251 (6 pages)
National Natural Science Foundation of China (21766003); Hunan Provincial Postgraduate Research Innovation Project (CX20220776)