
Optimal Subsampling for Big Data Logistic Regression Based on Robust Distance

Han Xiao, Wang Mingqiu, Zhao Shengli

统计与决策 (Statistics & Decision), 2024, Vol. 40, Issue 15: 59-64 (6 pages). DOI: 10.13546/j.cnki.tjyjc.2024.15.010

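Han Xiao (1), Wang Mingqiu (1), Zhao Shengli (1)

Author information

  • 1. School of Statistics and Data Science, Qufu Normal University, Qufu, Shandong 273165, China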

Abstract

Statistical analysis of big data is challenging when computing resources are limited, so one option is to analyze sub-data instead of the full data. Based on the robust distance derived from the minimum covariance determinant, this paper proposes a more efficient sub-data selection algorithm for logistic regression models with big data, and uses extensive numerical simulations to compare its performance with that of existing algorithms under different criteria. The results show that the proposed algorithm achieves higher estimation efficiency and computational efficiency, and requires significantly less computation time than analysis of the full data. The determinant of the information matrix of the sub-data selected by the proposed algorithm is larger than that obtained by the other algorithms. The proposed method also remains robust when the covariates are highly correlated. Finally, an analysis of a real data set shows that the proposed algorithm yields a smaller prediction error.
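
For orientation, the two quantities named in the abstract have standard textbook forms, sketched below. The notation is an assumption introduced here and is not taken from the paper: $x_i$ is the covariate vector of observation $i$, $z_i = (1, x_i^\top)^\top$ adds an intercept, $\hat{\mu}_{\mathrm{MCD}}$ and $\hat{\Sigma}_{\mathrm{MCD}}$ are the minimum covariance determinant estimates of location and scatter, and $S$ is the index set of the selected sub-data. The abstract does not state how the paper combines these quantities into a selection rule, so none is shown.

```latex
% Robust distance of observation i, based on the MCD location and scatter estimates
\mathrm{RD}(x_i) = \sqrt{(x_i - \hat{\mu}_{\mathrm{MCD}})^\top
                   \hat{\Sigma}_{\mathrm{MCD}}^{-1}
                   (x_i - \hat{\mu}_{\mathrm{MCD}})}

% Logistic regression success probability
p_i(\beta) = \frac{\exp(z_i^\top \beta)}{1 + \exp(z_i^\top \beta)}

% Information matrix of the sub-data indexed by S
M_S(\beta) = \sum_{i \in S} p_i(\beta)\,\bigl(1 - p_i(\beta)\bigr)\, z_i z_i^\top
```

Under the usual D-optimality reading, a larger $\det M_S(\beta)$ indicates more informative sub-data, which is the sense in which the abstract compares determinants across algorithms.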

Key words

minimum covariance determinant / information matrix / optimal subsampling

Classification

Mathematical sciences

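Cite this article

Han Xiao, Wang Mingqiu, Zhao Shengli. Optimal Subsampling for Big Data Logistic Regression Based on Robust Distance[J]. 统计与决策 (Statistics & Decision), 2024, 40(15): 59-64.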

Funding

National Natural Science Foundation of China, General Program (12271294, 12171277)

Journal: 统计与决策 (Statistics & Decision)

Indexed in: OA, 北大核心 (PKU Core), CHSSCD, CSSCI, CSTPCD

ISSN: 1002-6487
