
Weblog Mining Based on Hadoop

程苗 (CHENG Miao)¹, 陈华平 (CHEN Hua-ping)²

Computer Engineering, 2011, Vol. 37, Issue 11: 37-39, 3. DOI: 10.3969/j.issn.1000.3842.2011.11.013

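Author information

1. School of Management, University of Science and Technology of China, Hefei 230026
2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026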

Abstract

Massive Web data are distributed, heterogeneous and dynamic, so current data mining systems based on a single node have run into a bottleneck. Exploiting two advantages of cloud computing, distributed processing and virtualization, this paper presents a cloud-based Weblog analysis platform built on the Hadoop cluster framework, together with a hybrid algorithm that can be executed in a distributed manner in the cloud computing environment. To verify the effectiveness and efficiency of the platform, the improved algorithm is run on it to mine users' preferred access paths from Weblogs. Experimental results show that processing a large number of Weblog files with the distributed algorithm on the cluster can significantly improve the efficiency of Web data mining.
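The abstract does not spell out the platform's algorithm. As a rough illustration of how the Map/Reduce model divides Weblog processing, the Python sketch below simulates one job in a single process: the mapper keys each request by user, the shuffle groups the records, and the reducer orders one user's visits by time and emits page-to-page transitions, whose summed counts approximate frequently followed paths. The log format, the sample records and all function names are hypothetical, and plain frequency counting stands in for the paper's hybrid algorithm (the keywords indicate it involves a genetic algorithm), which is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified log records: "user_id timestamp url".
# Real Web server logs carry many more fields and need session splitting.
LOG = [
    "u1 1 /index", "u1 2 /news", "u1 3 /sports",
    "u2 1 /index", "u2 2 /news", "u2 5 /tech",
    "u3 4 /news", "u3 2 /index", "u3 6 /sports",
]

def mapper(line):
    """Map step: key each request by user so one reducer sees that user's history."""
    user, ts, url = line.split()
    yield user, (int(ts), url)

def shuffle(pairs):
    """Group values by key, imitating Hadoop's shuffle/sort between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(user, visits):
    """Reduce step: order one user's visits by timestamp and emit each transition."""
    pages = [url for _, url in sorted(visits)]
    for src, dst in zip(pages, pages[1:]):
        yield (src, dst), 1

def run_job(lines):
    """Run map, shuffle and reduce, then sum the emitted transition counts."""
    mapped = (pair for line in lines for pair in mapper(line))
    counts = Counter()
    for user, visits in shuffle(mapped).items():
        for path, one in reducer(user, visits):
            counts[path] += one
    return counts

counts = run_job(LOG)
print(counts.most_common(2))
# → [(('/index', '/news'), 3), (('/news', '/sports'), 2)]
```

On a real Hadoop cluster the mapper and reducer would run as separate tasks over input splits stored in HDFS, and summing the transition counts would typically be a second job keyed by path, where a combiner can cut shuffle traffic.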

Key words

cloud computing; Hadoop framework; Map/Reduce programming model; Weblog mining; genetic algorithm; preferred access path

Classification

Information Technology and Security Science

Cite this article

CHENG Miao, CHEN Hua-ping. Weblog Mining Based on Hadoop[J]. Computer Engineering, 2011, 37(11): 37-39, 3.

Funding

Specialized Research Fund for the Doctoral Program of Higher Education (200803580024)

Science Fund for Creative Research Groups (70821001)

Computer Engineering

OA, CSCD, CSTPCD

ISSN: 1000-3428
