
编制价格指数的爬虫数据抽样方法研究

雷兵 梁凯凯 刘维

统计与决策, 2024, Vol. 40, Issue 12: 24-28, 5. DOI: 10.13546/j.cnki.tjyjc.2024.12.004

Research on Crawler Data Sampling Method for Price Index Compilation

雷兵¹, 梁凯凯¹, 刘维¹

Author information

  • 1. School of Management, Henan University of Technology, Zhengzhou 450000

Abstract

To address the high cost of compiling a price index from full crawler data, this paper proposes a sampling method. The method adopts the idea of "big data-small data": in the base period, web crawler technology captures the full commodity transaction data of an e-commerce platform to form a sampling frame, and sampling techniques are then used in the continuous surveys. Following the idea of stratified sampling, clustering algorithms and silhouette coefficients are used to stratify the population, and representative samples are drawn from each stratum by random sampling with unequal probabilities. To handle non-response of selected samples during the continuous survey, the paper proposes formal and alternative samples: for each formal sample, a nearest-neighbor matching algorithm selects several alternative samples, and when a formal sample does not respond, an alternative sample is substituted to complete the price index compilation. Finally, the grain and oil category of the Tmall mall is used as an example for experimental validation. In the captured data, the full crawler data in the base period totals 18,351, the average sampling ratio of the continuous survey over periods 2 to 8 is 10.18%, and the average relative error of sampling is 0.59%, which indicates that the method is feasible.
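The sampling and substitution steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that stratification has already been done, that sales volume serves as the size measure for the unequal-probability draw, and that base-period price is the only matching feature for nearest-neighbor alternates. The record fields (`id`, `base_price`, `sales`) and all function names are hypothetical, and successive weighted draws without replacement are just one of several possible unequal-probability schemes.

```python
import random

def pps_sample(items, n, rng):
    """Draw up to n distinct items, each draw weighted by 'sales' (assumed positive)."""
    pool = list(items)  # copy so the caller's list is not modified
    chosen = []
    for _ in range(min(n, len(pool))):
        weights = [it["sales"] for it in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))  # remove the pick so it cannot be drawn again
    return chosen

def nearest_alternates(formal, candidates, k):
    """Return the k candidates closest to the formal sample in base price."""
    ranked = sorted(candidates,
                    key=lambda c: abs(c["base_price"] - formal["base_price"]))
    return ranked[:k]

def price_relative(formal, alternates, current_prices):
    """Price relative of the formal sample, or of its first responding alternate."""
    for item in [formal] + list(alternates):
        price = current_prices.get(item["id"])
        if price is not None:
            return price / item["base_price"]
    return None  # the formal sample and every alternate are missing

# Hypothetical stratum of four products captured in the base period.
stratum = [
    {"id": "a", "base_price": 10.0, "sales": 500},
    {"id": "b", "base_price": 8.0, "sales": 300},
    {"id": "c", "base_price": 12.5, "sales": 100},
    {"id": "d", "base_price": 30.0, "sales": 50},
]

formal = stratum[0]
alternates = nearest_alternates(formal, stratum[1:], k=2)   # products "b" and "c"
current = {"b": 10.0, "c": 12.5}                            # "a" is no longer listed
relative = price_relative(formal, alternates, current)      # 10.0 / 8.0 = 1.25
```

In this sketch a missing formal sample is replaced by the nearest alternate that still has a current price, so the stratum keeps contributing a price relative in later periods instead of dropping out.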

Keywords

price index / crawler data / stratified sampling / clustering algorithm / sample matching

Classification

Social sciences

Cite this article

雷兵, 梁凯凯, 刘维. 编制价格指数的爬虫数据抽样方法研究[J]. 统计与决策, 2024, 40(12): 24-28, 5.

Funding

National Social Science Fund of China, General Program (18BGL268)

Henan Province University Philosophy and Social Sciences Innovation Team Support Program (2019-CXTD-04)

统计与决策

OA / 北大核心 (PKU Core) / CHSSCD / CSSCI / CSTPCD

ISSN 1002-6487
