Modern Electronics Technique (现代电子技术), 2017, Vol. 40, Issue 9: 115-120, 6. DOI: 10.16652/j.issn.1004-373x.2017.09.031
Design of a Web log mining scheme based on Hadoop
Abstract
An approach to mining Web log data that grows at an exponential rate is put forward, and a highly reliable Web log data mining scheme is designed. Using an available public Web log dataset, a filtering algorithm based on MapReduce was implemented in the data preprocessing stage in order to mine service information that supports enterprise decision-making. The platform built with this scheme was optimized, and its performance increased by 3.26%. Three aspects were tested: the reliability of the scheme, the effect of log file quantity on the I/O speed of the platform, and the query performance of the platform compared with a single machine. The results show that the designed scheme is reliable. As the log file quantity doubles, the time cost of read operations increases by 52.58% on average and the time cost of write operations increases by 79.69%. As the log quantity grows, the query time cost of the single machine rises rapidly, while the query time cost of the platform remains stable. As machine nodes are added, the computational time cost decreases by 8.87% on average.
Keywords
Web log/data mining/data filtering/Hadoop/MySQL
Classification
Information technology and security science
Cite this article
XU Kangzhen (许抗震), WU Yun (吴云). Design of Web log mining scheme based on Hadoop [J]. Modern Electronics Technique (现代电子技术), 2017, 40(9): 115-120, 6.
Funding
National Natural Science Foundation of China (NSF61370161)
Science and Technology Foundation of Guizhou Province (黔科合J字[2010]2100)
Doctoral Foundation of Guizhou University (贵大人基合字(2009)029)