Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue 6: 1476-1490, 15 pp. DOI: 10.3778/j.issn.1673-9418.2310026
Interpretable Machine Learning Algorithm Based on Rules Ensemble and Its Application
Abstract
Machine learning algorithms have achieved great success owing to their excellent predictive performance, but their applicability is limited in areas that demand a high degree of model interpretability. To address this lack of interpretability, a new interpretable machine learning algorithm, ensemble trees penalized logistic rule regression, is proposed based on the idea of rules ensemble. It achieves predictive performance comparable to that of ensemble trees algorithms with lower structural complexity, while retaining the interpretability of logistic regression. First, branches are extracted from ensemble trees such as random forest and XGBoost and converted into logic rules. Then, the rule set is pruned and deduplicated to obtain a streamlined rule set. Finally, the rules are incorporated into logistic regression as variables, and complexity is controlled with the Lasso algorithm. Taking enterprise risk warning as an example, the algorithm is compared with multiple machine learning algorithms. The results show that it not only inherits the ensemble trees' ability to discriminate defaults and exceeds most of the compared machine learning algorithms on various classification indices, but also gives thresholds for the enterprise risk indices through its rules, which makes it convenient for enterprises to carry out risk management. Further, an enterprise credit score is produced with this algorithm, which verifies its wide applicability. The obtained score conforms to objective laws and is discriminative, and the robustness of the model's predictive performance is verified on three public datasets.
Keywords
interpretable machine learning / rule learning / nonlinear regression / ensemble trees / risk early warning
Classification
Information Technology and Security Science
Cite this article
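Min Jiyuan (闵继源), Lu Tongyu (鲁统宇), Ren Tingting (任婷婷), Chen Ruhao (陈汝昊). Interpretable machine learning algorithm based on rules ensemble and its application[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1476-1490, 15 pp.
Funding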
This work was supported by the General Program of the National Natural Science Foundation of China (72071186) and the Science and Technology Plan Project of the State Administration for Market Regulation (2023MK232).
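
The three steps summarized in the abstract (rule extraction from ensemble trees, pruning and deduplication, Lasso-penalized logistic regression on the rules) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, a synthetic dataset, a random forest only (no XGBoost), and a simple support-based filter as a stand-in for the paper's pruning step, whose details are not given here. The helper names `extract_rules` and `apply_rule` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def extract_rules(tree):
    """Return each root-to-leaf branch as a tuple of (feature, op, threshold)."""
    t = tree.tree_
    rules = []

    def walk(node, conds):
        if t.children_left[node] == -1:  # -1 marks a leaf in scikit-learn trees
            if conds:
                rules.append(tuple(conds))
            return
        f = int(t.feature[node])
        thr = round(float(t.threshold[node]), 3)  # rounding helps deduplication
        walk(t.children_left[node], conds + [(f, "<=", thr)])
        walk(t.children_right[node], conds + [(f, ">", thr)])

    walk(0, [])
    return rules


def apply_rule(rule, X):
    """Evaluate one conjunctive rule on X, giving a 0/1 indicator column."""
    mask = np.ones(X.shape[0], dtype=bool)
    for f, op, thr in rule:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask.astype(float)


# Synthetic stand-in data (assumption; the paper uses enterprise risk data).
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)

# Step 1: extract branches from an ensemble of shallow trees as logic rules.
forest = RandomForestClassifier(n_estimators=20, max_depth=3,
                                random_state=0).fit(X, y)
raw = [r for est in forest.estimators_ for r in extract_rules(est)]

# Step 2: deduplicate, then filter by support (assumed stand-in for pruning).
unique = sorted({tuple(sorted(r)) for r in raw})
R = np.column_stack([apply_rule(r, X) for r in unique])
support = R.mean(axis=0)
keep = (support > 0.05) & (support < 0.95)
rules = [r for r, k in zip(unique, keep) if k]
R = R[:, keep]

# Step 3: logistic regression on the rule indicators with an L1 (Lasso) penalty.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(R, y)
selected = [(r, w) for r, w in zip(rules, lasso_lr.coef_[0]) if w != 0.0]

# Show the rules with the largest absolute coefficients.
for rule, weight in sorted(selected, key=lambda rw: -abs(rw[1]))[:5]:
    text = " AND ".join(f"x{f} {op} {thr}" for f, op, thr in rule)
    print(f"{weight:+.3f}  {text}")
```

Each retained rule reads as a conjunction of threshold conditions on the input variables, which is the sense in which such a model can expose thresholds for individual indices; the sign and size of its coefficient are interpreted as in ordinary logistic regression. A smaller `C` strengthens the L1 penalty and leaves fewer rules in the model.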