Design and Implementation of Web Documents Conversion Algorithm in Data Mining

赵小龙, 佘东

巢湖学院学报, Issue (6): 34-38, 5.

DESIGN AND IMPLEMENTATION OF WEB DOCUMENTS CONVERSION ALGORITHM IN DATA MINING

赵小龙¹, 佘东¹

Author information

1. 安徽工业经济学院, Hefei, Anhui 230051

Abstract

Web text mining is an important application of data mining technology to information analysis and processing. Converting web documents into the format that data mining requires, i.e. web document preprocessing, is therefore a significant research task. The method in this paper is as follows: download a large number of web page files from the Internet and convert them into text files; then run word frequency statistics over the text files, delete stop words, remove high-frequency words, reduce content words to their stems, eliminate redundant words, and build a word list. The extracted word list is indexed alphabetically to generate a word frequency index and compared against a dictionary file to obtain word IDs, and the result is finally written out in the data format of the Reuters-21578 database. The web document data is thereby converted into a standard data set, ready for classification and clustering in data mining.
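The pipeline in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the toy suffix stripper `crude_stem`, the document-frequency cutoff `max_doc_ratio` used to drop high-frequency words, and the function names are all assumptions made for illustration. The sketch also stops at word IDs and per-document counts and does not emit the Reuters-21578 format.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Convert one web page (as a string) into plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(pages, max_doc_ratio=0.9):
    """Return (word_ids, vectors) for a list of HTML strings.

    word_ids maps each kept stem to an ID assigned in alphabetical order;
    vectors holds one {word_id: count} dict per page.
    """
    docs = []
    for html in pages:
        tokens = re.findall(r"[a-z]+", html_to_text(html).lower())
        stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
        docs.append(Counter(stems))  # word frequency statistics per page

    # Document frequency: in how many pages each stem occurs.
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())

    # Assumed rule: a stem is "high frequency" if it occurs in more than
    # max_doc_ratio of all pages; such stems are dropped.
    limit = max_doc_ratio * len(docs)
    vocab = sorted(w for w, df in doc_freq.items() if df <= limit)
    word_ids = {w: i for i, w in enumerate(vocab, start=1)}

    vectors = [
        {word_ids[w]: n for w, n in counts.items() if w in word_ids}
        for counts in docs
    ]
    return word_ids, vectors
```

Counting duplicates once per stem is what removes redundant words here; a production version would replace `crude_stem` with an established stemmer and load the dictionary file from disk rather than deriving IDs from the corpus itself.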

Key words

web documents / data mining / preprocessing

Classification

Information Technology and Security Science

Cite this article

赵小龙, 佘东. Design and Implementation of Web Documents Conversion Algorithm in Data Mining [J]. 巢湖学院学报, 2011, (6): 34-38, 5.

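Funding

Anhui Provincial Fund for Outstanding Talent in Higher Education ()

General Project of 巢湖学院 ()

Natural Science Fund project of 安徽工业经济学院, "Research on the Development of a College Scientific Research Management Information System" ()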

巢湖学院学报

ISSN：1672-2868
