Statistics and Decision (统计与决策), 2024, Vol. 40, Issue 12: 24-28, 5. DOI: 10.13546/j.cnki.tjyjc.2024.12.004
Research on Crawler Data Sampling Method for Price Index Compilation
Abstract
Compiling a price index from full crawler data is costly. To address this, this paper proposes a sampling method that follows a "big data to small data" approach. In the base period, web crawler technology is used to capture the complete set of commodity transaction data on an e-commerce platform, which forms the sampling frame; in the continuous surveys of later periods, sampling techniques are applied instead. Following the idea of stratified sampling, a clustering algorithm together with the silhouette coefficient is used to stratify the full data set, and representative samples are drawn from each stratum by random sampling with unequal probabilities. Because selected samples may fail to respond during the continuous survey, the method distinguishes primary (formal) samples from alternative samples: for each primary sample, a nearest-neighbor matching algorithm selects several alternative samples, and when a primary sample does not respond, one of its alternatives is used in its place to complete the price index compilation. Finally, the method is validated experimentally on the grain and oil category of Tmall. The results show that the full base-period crawl contains 18,351 records, the average sampling ratio of the continuous surveys in periods 2 to 8 is 10.18%, and the average relative sampling error is 0.59%, which indicates that the method is feasible.
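The stratification step (clustering plus silhouette coefficient) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each product is described by a numeric feature vector (for example price and sales volume), uses plain k-means with a deterministic farthest-first initialisation as the clustering algorithm, and takes as the number of strata the candidate k with the highest mean silhouette coefficient. The function names `kmeans`, `silhouette` and `choose_strata`, and the candidate range 2 to 6, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Plain k-means (Lloyd's algorithm) with farthest-first initialisation."""
    centers: List[Point] = [points[0]]
    while len(centers) < k:
        # Next initial center: the point farthest from its nearest chosen center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Update step: move the center to its members' mean (keep it if empty).
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return labels


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst score
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from point i to the other members of every cluster.
        mean_d = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == c]
            if ds:
                mean_d[c] = sum(ds) / len(ds)
        if labels[i] not in mean_d:
            continue  # singleton cluster: s(i) = 0 by convention
        a = mean_d.pop(labels[i])  # cohesion: own cluster
        b = min(mean_d.values())   # separation: nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_strata(points: Sequence[Point], k_candidates=range(2, 7)) -> Tuple[int, List[int]]:
    """Return (k, labels) for the candidate k with the highest mean silhouette."""
    best = None
    for k in k_candidates:
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

In practice the features would normally be standardised first so that no single variable dominates the Euclidean distance; the resulting labels then serve as the stratum identifiers for the sampling step.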
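The abstract does not say which unequal-probability design is used within each stratum. As an illustration only, the sketch below assumes systematic probability-proportional-to-size (PPS) sampling, with a positive size measure such as base-period sales, and proportional allocation of the sample across strata (at least one unit per stratum). The names `systematic_pps` and `stratified_pps` and the `fraction` parameter are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def systematic_pps(sizes: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Systematic PPS: unit i is hit about n * sizes[i] / sum(sizes) times.

    Units larger than the sampling step can be hit more than once; they are
    kept once (certainty units), so the result may have fewer than n units.
    """
    total = float(sum(sizes))
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    hits = [start + r * step for r in range(n)]
    chosen: List[int] = []
    cum, j = 0.0, 0
    for idx, s in enumerate(sizes):
        cum += s
        # Select unit idx for every hit point falling inside its cumulative interval.
        while j < n and hits[j] < cum:
            if not chosen or chosen[-1] != idx:
                chosen.append(idx)
            j += 1
    return chosen


def stratified_pps(
    strata: Sequence[int], sizes: Sequence[float], fraction: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Draw a PPS sample independently in each stratum with proportional allocation."""
    rng = random.Random(seed)
    members: Dict[int, List[int]] = {}
    for i, h in enumerate(strata):
        members.setdefault(h, []).append(i)
    sample: Dict[int, List[int]] = {}
    for h, idx in sorted(members.items()):
        n_h = max(1, round(fraction * len(idx)))  # assumed allocation rule
        local = systematic_pps([sizes[i] for i in idx], n_h, rng)
        sample[h] = [idx[k] for k in local]  # map back to global unit indices
    return sample
```

With `fraction` around 0.1 this corresponds roughly to the average sampling ratio reported in the abstract, but the allocation rule here is an assumption rather than the paper's.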
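The primary/alternative sample idea can be sketched as follows. This is a hedged illustration, assuming that candidate alternatives are the non-sampled units of the same stratum, that "nearest neighbor" means Euclidean distance in the same feature space used for clustering, and that each unit backs up at most one primary unit; the function names `pick_substitutes` and `responding_panel` and the parameter `m` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def pick_substitutes(
    features: Sequence[Tuple[float, ...]],
    strata: Sequence[int],
    sample: Sequence[int],
    m: int = 3,
) -> Dict[int, List[int]]:
    """For each primary unit, rank up to m nearest non-sampled units in its stratum."""
    taken: Set[int] = set(sample)
    subs: Dict[int, List[int]] = {}
    for i in sample:
        pool = [j for j in range(len(features)) if j not in taken and strata[j] == strata[i]]
        # Nearest-neighbour matching on Euclidean distance in feature space.
        pool.sort(key=lambda j: math.dist(features[i], features[j]))
        subs[i] = pool[:m]
        # Assumption: a unit backs up at most one primary unit.
        taken.update(subs[i])
    return subs


def responding_panel(
    sample: Sequence[int], subs: Dict[int, List[int]], responded: Set[int]
) -> List[int]:
    """Keep responding primaries; replace a non-respondent by its first responding substitute."""
    panel: List[int] = []
    for i in sample:
        if i in responded:
            panel.append(i)
            continue
        alt = next((j for j in subs[i] if j in responded), None)
        if alt is not None:  # otherwise the unit stays missing for this period
            panel.append(alt)
    return panel
```

Here "responded" would mean that a price for the product could be crawled in the current period; the panel returned for each period is what the index is then computed on.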
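The abstract reports accuracy as an average relative sampling error of 0.59%, but does not give the index formula. Purely as an illustration, the sketch below assumes an unweighted Jevons (geometric-mean) elementary index and measures relative error against the same index computed on the full crawl, i.e. $|I_{\text{sample}} - I_{\text{full}}| / I_{\text{full}}$; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def jevons_index(base: Dict[int, float], current: Dict[int, float], items: Sequence[int]) -> float:
    """Unweighted Jevons index: 100 * geometric mean of price relatives p_t / p_0."""
    logs = [math.log(current[i] / base[i]) for i in items]
    return 100.0 * math.exp(sum(logs) / len(logs))


def relative_error(sample_index: float, full_index: float) -> float:
    """Relative error of the sampled index against the full-crawl index."""
    return abs(sample_index - full_index) / full_index
```

For example, if two of four products rise by 10% and the other two are unchanged, the full-crawl Jevons index is $100\sqrt{1.1} \approx 104.88$; a sample containing only the two risen products gives 110, a relative error of about 4.9%.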
Key words
price index / crawler data / stratified sampling / clustering algorithm / sample matching
Classification
Social Sciences
Cite this article
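Lei Bing (雷兵), Liang Kaikai (梁凯凯), Liu Wei (刘维). Research on Crawler Data Sampling Method for Price Index Compilation [J]. Statistics and Decision (统计与决策), 2024, 40(12): 24-28, 5.
Funding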
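National Social Science Fund of China, General Program (18BGL268)
Innovation Team Program for Philosophy and Social Sciences in Higher Education Institutions of Henan Province (2019-CXTD-04)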