
Research on risk factor screening and Nomogram prediction model construction based on unbalanced data

Zhou Xuechao, Hao Baobing, Liu Chengyou

Journal of Biomedical Engineering Research (生物医学工程研究), 2025, Vol. 44, Issue 4: 245-255, 11. DOI: 10.19529/j.cnki.1672-6278.2025.04.07


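Zhou Xuechao 1, Hao Baobing 2, Liu Chengyou 3

Author information

  • 1. Department of Interventional Oncology and Vascular Disease, The Second Hospital of Nanjing / Nanjing Hospital Affiliated to Nanjing University of Chinese Medicine, Nanjing 210003
  • 2. Department of Hepatobiliary and Transplantation Surgery, Nanjing Drum Tower Hospital, Nanjing 210008
  • 3. Department of Clinical Medical Engineering, Nanjing First Hospital / Nanjing Hospital Affiliated to Nanjing Medical University, Nanjing 210006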

Abstract

To address the bias of classification results toward the majority class caused by imbalanced data in pattern recognition, we took the University of California, Irvine (UCI) myocardial infarction dataset as the research object and constructed a Nomogram prediction model. First, three imbalance-handling methods, namely K-fold cross-sampling voting (K-CSV), synthetic minority over-sampling technique_nominal continuous (SMOTE_NC) and random undersampling (RUS), were combined with mutual information, support vector machine weights, Spearman correlation analysis and the variance inflation factor to remove multicollinear features. Second, univariate and multivariate logistic regression were used to screen for independent risk factors and to construct the Nomogram prediction model. On the original imbalanced data, the area under the receiver operating characteristic curve (AUC) and the average precision (AP) of the model were 0.85 and 0.64, respectively. After RUS processing, the AUC and AP were 0.87 and 0.86, respectively, and the type Ⅱ error rate was 11.54%. After SMOTE_NC processing, the AUC and AP were 0.96, but the accuracy dropped to 79.89% and the type Ⅱ error rate increased to 29.73%. With K-CSV, the AUC and AP were both 0.90, and the type Ⅱ error rate was 10.53%. Cox regression analysis showed that the selected features were significantly correlated with patient prognosis (P<0.01), indicating that the established model has high reliability in survival risk prediction.
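Two quantities in the abstract can be made concrete with a small example: random undersampling (RUS) and the type Ⅱ error rate. The following is a minimal sketch, not the authors' implementation. It assumes RUS means randomly discarding majority-class samples until all classes are the same size, and that the type Ⅱ error rate is the false negative rate, FN / (FN + TP). The function names `random_undersample` and `type_ii_error_rate` and the toy data are hypothetical and do not come from the UCI dataset.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    # Sample n_min indices per class without replacement.
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def type_ii_error_rate(y_true, y_pred, positive=1):
    """False negative rate: share of true positives predicted as negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    if fn + tp == 0:
        raise ValueError("y_true contains no positive samples")
    return fn / (fn + tp)


# Toy data, made up for illustration: 7 negatives and 3 positives.
X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = random_undersample(X, y, seed=42)
print(Counter(y_bal))  # each class now has 3 samples

# Hypothetical predictions in which one of the three positives is missed.
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(type_ii_error_rate(y, y_pred))  # 1 missed out of 3 positives
```

In a clinical setting a missed positive (type Ⅱ error) is usually the costlier mistake, which is why the abstract reports this rate alongside AUC and AP rather than relying on accuracy alone.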

Key words

Nomogram/Imbalanced data/Synthetic minority over-sampling technique_nominal continuous/K-fold cross-sampling voting/Variance inflation factor/Logistic regression

Classification

Medicine and health

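Cite this article

Zhou Xuechao, Hao Baobing, Liu Chengyou. Research on risk factor screening and Nomogram prediction model construction based on unbalanced data[J]. Journal of Biomedical Engineering Research, 2025, 44(4): 245-255, 11.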

Funding

Research Project on Comprehensive Value Assessment of Medical Equipment (20250412).

Journal of Biomedical Engineering Research

ISSN 1672-6278
