Computer Engineering (计算机工程), 2011, Vol. 37, Issue 11: 37-39, 3. DOI: 10.3969/j.issn.1000.3842.2011.11.013
Weblog Mining Based on Hadoop
Abstract
Web data is massive, distributed, heterogeneous, and dynamic, so current data mining systems based on a single node have reached a bottleneck. Exploiting two advantages of cloud computing, distributed processing and virtualization, this paper presents a Weblog analysis platform built on the Hadoop cluster framework, together with a hybrid algorithm that can run in a distributed fashion in the cloud computing environment. To verify the effectiveness and efficiency of the platform, the improved algorithm is used on it to mine users' preferred access paths from Weblogs. Experimental results show that processing a large number of Weblog files with the distributed algorithm on the cluster significantly improves the efficiency of Web data mining.
Keywords
cloud computing; Hadoop framework; Map/Reduce programming model; Weblog mining; genetic algorithm; preferred access path
Classification
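Information Technology and Security Science
Cite this article
CHENG Miao (程苗), CHEN Huaping (陈华平). Weblog Mining Based on Hadoop [J]. Computer Engineering, 2011, 37(11): 37-39, 3.
Funding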
Doctoral Program Foundation project (200803580024)
Science Fund for Creative Research Groups project (70821001)