Statistics and Decision (统计与决策), 2026, Vol. 42, Issue 9: 42-48, 7. DOI: 10.13546/j.cnki.tjyjc.2026.09.007
Integrated Estimation Method for Probability Samples and Non-Probability Samples
Abstract
In traditional probability sampling, rising costs and falling response rates leave too few valid samples, and the target variable is often not observed at all, which biases estimation. Non-probability samples, such as those collected through online surveys, have unknown sampling probabilities, and estimating those probabilities with a Logistic regression model is sensitive to model specification: it can produce extreme probabilities and, in turn, highly variable estimates. To address these problems, this paper proposes an integrated estimation method that combines probability and non-probability samples. First, the method uses XGBoost to estimate the target variable for the probability sample, yielding an initial estimate. Next, it estimates the sampling probabilities of both the probability and the non-probability samples, uses kernel smoothing to measure how similar the sampling probabilities of the two samples are, allocates the weights of the probability sample to the non-probability sample according to that similarity, and uses the resulting weights to estimate from the target variable observed in the non-probability sample. Finally, it pools the two samples into one and adjusts the weights of the two sample types by minimizing the mean squared error (MSE) of the combined estimator, which yields the estimate of the population quantity. Both simulation and empirical studies show that, across the scenarios considered, the proposed method outperforms the competing methods in both bias and MSE.
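The kernel-weighting and combination steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the XGBoost stage and the sampling-probability models are omitted, and the estimated sampling probabilities are assumed to be supplied as inputs. The Gaussian kernel, the bandwidth, the Hájek-type weighted mean and the composite weight λ = MSE_NP / (MSE_P + MSE_NP) are illustrative assumptions. That λ minimizes λ²·MSE_P + (1 − λ)²·MSE_NP only when the cross term E[(θ̂_P − θ)(θ̂_NP − θ)] is taken to be zero, and the paper pools the two samples and tunes their weights, which this composite of two estimates only approximates. The function names `kernel_weights`, `weighted_mean` and `composite` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(u: float) -> float:
    """Standard normal density; the choice of kernel is an assumption."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(
    p_prob: Sequence[float],     # estimated sampling probabilities, probability sample
    d_prob: Sequence[float],     # design weights of the probability sample
    p_nonprob: Sequence[float],  # estimated sampling probabilities, non-probability sample
    bandwidth: float,            # kernel bandwidth (hypothetical tuning parameter)
) -> List[float]:
    """Spread each probability-sample design weight over the non-probability
    units in proportion to the kernel similarity of their sampling probabilities."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [0.0] * len(p_nonprob)
    for p_i, d_i in zip(p_prob, d_prob):
        sims = [gaussian_kernel((p_j - p_i) / bandwidth) for p_j in p_nonprob]
        total = sum(sims)
        if total == 0.0:
            continue  # no non-probability unit is similar enough to receive weight
        for j, s in enumerate(sims):
            weights[j] += d_i * s / total
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Hajek-type weighted mean: sum(w * y) / sum(w)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * y for w, y in zip(weights, values)) / total


def composite(est_p: float, mse_p: float, est_np: float, mse_np: float) -> Tuple[float, float]:
    """Return (lam, lam * est_p + (1 - lam) * est_np), where lam minimises
    lam**2 * mse_p + (1 - lam)**2 * mse_np (cross term assumed to be zero)."""
    lam = mse_np / (mse_p + mse_np)
    return lam, lam * est_p + (1.0 - lam) * est_np


if __name__ == "__main__":
    w = kernel_weights([0.2, 0.5], [10.0, 20.0], [0.2, 0.5, 0.8], bandwidth=0.1)
    print([round(x, 3) for x in w], round(sum(w), 6))
    print(composite(10.0, 1.0, 20.0, 3.0))  # (0.75, 12.5)
```

Because each probability-sample design weight is split among the non-probability units in proportion to kernel similarity, the allocated weights sum to the total design weight of the probability sample (30 in the demo), and a non-probability unit whose sampling probability lies far from every probability-sample unit receives only a small weight. The bandwidth of 0.1 is arbitrary; in practice it would need to be chosen by a data-driven rule.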
Key words
probability sample/non-probability sample/XGBoost/propensity score weighting/kernel smoothing
Classification
Mathematical sciences
Cite this article
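Luo Shihua, Dai Yufang. Integrated Estimation Method for Probability Samples and Non-Probability Samples[J]. Statistics and Decision, 2026, 42(9): 42-48, 7.
Funding
Jiangxi Provincial Graduate Innovation Special Fund Project (YC2023-B179)
The 18th Student Research Project of Jiangxi University of Finance and Economics (20231015151904996)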