Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue 6: 1476-1490 (15 pages). DOI: 10.3778/j.issn.1673-9418.2310026

Interpretable Machine Learning Algorithm Based on Rules Ensemble and Its Application

Min Jiyuan (闵继源)¹, Lu Tongyu (鲁统宇)¹, Ren Tingting (任婷婷)², Chen Ruhao (陈汝昊)¹

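Author information

1. College of Economics and Management, China Jiliang University, Hangzhou 310018
2. School of Cyber Science and Engineering, Southeast University, Nanjing 211189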

Abstract

Machine learning algorithms have achieved great success thanks to their strong predictive performance, but their applicability is limited in domains that demand model interpretability. To address this lack of interpretability, a new interpretable machine learning algorithm, ensemble trees penalized logistic rule regression, is proposed based on the idea of rule ensembles. It achieves predictive performance comparable to that of ensemble tree algorithms with lower structural complexity, while retaining the interpretability of logistic regression. First, branches are extracted from ensemble trees such as random forest and XGBoost and converted into logic rules. Then, the rule set is pruned and deduplicated to obtain a streamlined rule set. Finally, the rules are incorporated into a logistic regression as variables, and model complexity is controlled with the Lasso. Taking enterprise risk early warning as an example, the algorithm is compared with multiple machine learning algorithms. The results show that it not only inherits the default discrimination ability of ensemble trees and outperforms most of the compared algorithms on various classification metrics, but also yields thresholds for enterprise risk indicators through its rules, which facilitates enterprise risk management. Furthermore, an enterprise credit score is constructed from the algorithm, which demonstrates its wide applicability; the resulting score is consistent with objective patterns and is discriminative. The robustness of the model's predictive performance is verified on three public datasets.
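The middle steps of the pipeline described in the abstract (pruning and deduplicating the rule set, then using each rule as a variable) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The rule representation, the function names (`simplify`, `deduplicate`, `rule_fires`, `rule_features`), the feature names (`debt_ratio`, `roa`), and all thresholds are hypothetical, and "pruning" is assumed here to mean only keeping the tightest bound per feature and operator. Extracting the branches from a trained random forest or XGBoost model and fitting the Lasso-penalized logistic regression on the resulting 0/1 columns are left to a library and are not shown.

```python
from typing import Dict, List, Tuple

# A condition is (feature, operator, threshold); a rule is a conjunction of conditions.
Condition = Tuple[str, str, float]
Rule = Tuple[Condition, ...]


def simplify(rule: Rule) -> Rule:
    """Keep only the tightest bound per (feature, operator) and sort the conditions."""
    tightest: Dict[Tuple[str, str], float] = {}
    for feature, op, threshold in rule:
        key = (feature, op)
        if key not in tightest:
            tightest[key] = threshold
        elif op == "<=":
            tightest[key] = min(tightest[key], threshold)  # smaller upper bound is tighter
        else:  # op == ">"
            tightest[key] = max(tightest[key], threshold)  # larger lower bound is tighter
    # Sorting gives logically identical rules an identical representation.
    return tuple(sorted((f, op, t) for (f, op), t in tightest.items()))


def deduplicate(rules: List[Rule]) -> List[Rule]:
    """Simplify every rule and drop exact duplicates, preserving first-seen order."""
    seen = set()
    unique: List[Rule] = []
    for rule in rules:
        canonical = simplify(rule)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


def rule_fires(rule: Rule, row: Dict[str, float]) -> int:
    """Return 1 if the row satisfies every condition of the rule, else 0."""
    for feature, op, threshold in rule:
        value = row[feature]
        if op == "<=" and not value <= threshold:
            return 0
        if op == ">" and not value > threshold:
            return 0
    return 1


def rule_features(rules: List[Rule], rows: List[Dict[str, float]]) -> List[List[int]]:
    """Build the 0/1 design matrix: one column per rule, one row per observation."""
    return [[rule_fires(rule, row) for rule in rules] for row in rows]


# Hypothetical branches, written as if read off the paths of several trees.
raw_rules: List[Rule] = [
    (("debt_ratio", ">", 0.6), ("roa", "<=", 0.02)),
    (("roa", "<=", 0.02), ("debt_ratio", ">", 0.6)),       # same rule, different order
    (("debt_ratio", ">", 0.6), ("debt_ratio", ">", 0.8)),  # first condition is redundant
    (("roa", "<=", 0.05),),
]
rules = deduplicate(raw_rules)

rows = [
    {"debt_ratio": 0.9, "roa": 0.01},
    {"debt_ratio": 0.7, "roa": 0.04},
    {"debt_ratio": 0.3, "roa": 0.10},
]
X_rules = rule_features(rules, rows)
```

In this toy example the four raw branches collapse to three rules, and `X_rules` is the binary matrix that would be passed, possibly alongside the original variables, to an L1-penalized logistic regression. Each surviving rule keeps its thresholds in readable form, which is what makes threshold-based statements about risk indicators possible.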

Key words

interpretable machine learning/rule learning/nonlinear regression/ensemble trees/risk early warning

Classification

Information Technology and Security Science

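Cite this article

Min Jiyuan, Lu Tongyu, Ren Tingting, Chen Ruhao. Interpretable Machine Learning Algorithm Based on Rules Ensemble and Its Application[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1476-1490.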

Funding

This work was supported by the General Program of the National Natural Science Foundation of China (72071186) and the Science and Technology Plan Project of the State Administration for Market Regulation (2023MK232).

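Journal of Frontiers of Computer Science and Technology

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN: 1673-9418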
