Statistics & Decision (统计与决策), 2024, Vol. 40, Issue 15: 59-64 (6 pages). DOI: 10.13546/j.cnki.tjyjc.2024.15.010
基于稳健距离的大数据Logistic回归最优子抽样
Optimal Subsampling for Big Data Logistic Regression Based on Robust Distance
Abstract
The statistical analysis of big data faces serious challenges when computing resources are limited, so analyzing sub-data instead of the full data is a practical alternative. Based on the robust distance derived from the minimum covariance determinant, this paper proposes a more efficient sub-data selection algorithm for logistic regression models with big data. Extensive numerical simulations compare the performance of the proposed algorithm with that of existing algorithms under different criteria. The results show that the proposed algorithm achieves higher estimation efficiency and computational efficiency, and requires significantly less computing time than analysis of the full data. The determinant of the information matrix of the sub-data selected by the proposed algorithm is larger than that obtained by the other algorithms. The proposed method is also robust when the covariates are highly correlated. Finally, an analysis of a real data set shows that the proposed algorithm yields a smaller prediction error.
Key words
minimum covariance determinant / information matrix / optimal subsampling
Classification
Mathematical sciences
Cite this article
韩潇 (Han Xiao), 王明秋 (Wang Mingqiu), 赵胜利 (Zhao Shengli). Optimal Subsampling for Big Data Logistic Regression Based on Robust Distance [J]. Statistics & Decision (统计与决策), 2024, 40(15): 59-64 (6 pages).
Funding
General Program of the National Natural Science Foundation of China (12271294, 12171277)