Simulation of differential mining algorithm for directional sampling of massive big data

Ning Tao

Modern Electronics Technique, 2024, Vol. 47, Issue 9: 164-168, 5. DOI: 10.16652/j.issn.1004-373x.2024.09.029

Ning Tao 1

Author information

  • 1. School of Computer Engineering, Guilin University of Electronic Technology, Beihai, Guangxi 536000

Abstract

In big data, the distribution of samples across categories may be imbalanced, with some categories containing far fewer samples than others. In this case, traditional sampling methods fail to accurately reflect the characteristics and differences of all categories. Therefore, a differential mining algorithm for directional sampling of massive big data is studied to broaden the application of big data information. Starting from initialized website uniform resource locators (URLs), web pages are crawled and hypertext markup language (HTML) data is collected from them. Links relevant to the target data are extracted and added to the URL queue, and data search and processing are carried out according to the network search strategy. Once the search on a page is complete, the URL of the next web page is processed automatically, continuing the directional sampling of massive big data. Fuzzy feature matching is combined with detection filtering to provide anti-interference processing during directional sampling. A rough set algorithm is used for mining, and an extended difference matrix is used for value reduction of the big data decision table, achieving pattern classification of massive big data. The experimental results show that the packet loss rate of the algorithm during data collection stays basically below 0.2%, indicating strong robustness.
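
The crawl loop described in the abstract (seed URLs, HTML collection, extraction of relevant links into a URL queue, then moving on to the next page) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: `directed_crawl`, `LinkExtractor`, the `fetch` and `is_relevant` callables, and the in-memory `site` dictionary are all hypothetical names introduced here, and the relevance test is a simple substring check standing in for whatever search strategy the paper uses.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def directed_crawl(seed_urls, fetch, is_relevant, max_pages=100):
    """Breadth-first crawl that only enqueues links judged relevant.

    fetch(url) -> HTML string and is_relevant(url) -> bool are
    hypothetical callables supplied by the caller (assumptions).
    """
    queue = deque(seed_urls)   # URL queue, initialized with the seeds
    seen = set(seed_urls)      # avoid re-queuing the same URL
    pages = {}                 # url -> collected HTML
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and is_relevant(link):
                seen.add(link)
                queue.append(link)
    return pages


# Tiny in-memory "website" so the sketch runs without network access.
site = {
    "http://example.test/": '<a href="/data/a">A</a><a href="/about">About</a>',
    "http://example.test/data/a": '<a href="/data/b">B</a>',
    "http://example.test/data/b": "<p>leaf</p>",
}
pages = directed_crawl(
    ["http://example.test/"],
    fetch=lambda u: site.get(u, ""),
    is_relevant=lambda u: "/data/" in u,
)
```

Passing `fetch` in as a parameter keeps the queue logic separate from network access, so the same loop could be driven by a real HTTP client or, as here, by a dictionary of canned pages.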

Key words

massive big data/web page crawling/directional sampling/filtering processing/redundancy removal/rough set/extended difference matrix/decision rule

Classification

Information technology and security science

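Cite this article

Ning Tao. Simulation of differential mining algorithm for directional sampling of massive big data[J]. Modern Electronics Technique, 2024, 47(9): 164-168, 5.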

Funding

Key Project of Guangxi Vocational Education Teaching Reform (GXGZJG2021A035), 2021-2024

Modern Electronics Technique

OA / Peking University Core Journal / CSTPCD

ISSN 1004-373X
