Improved Crawler Technique for P2P-Specific Information

Computer Engineering and Applications, 2011, Vol. 47, Issue 29: 23-26, 4. DOI: 10.3778/j.issn.1002-8331.2011.29.007

Improved crawler algorithm technique for P2P specific information

Ding Junping¹, Cai Wandong¹

Author information

1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Abstract

Existing topic crawlers fetch a large number of irrelevant pages while collecting "meta-information", so this paper improves the topic crawler technique by adding a URL classification algorithm. Based on supplied URL samples, the algorithm generates multiple keyword sets for irrelevant URLs and keyword sets for "meta-information" URLs. It assigns a weight to each keyword in a set and a threshold to each set, represents each URL as a feature vector, and classifies the URL by computing its distance to the keyword sets. The performance of the algorithm is analyzed in detail. Tests indicate that, compared with the traditional topic crawler technique, the improved technique substantially raises the efficiency of "meta-information" acquisition: the quantity of "meta-information" obtained in the same time increases by 96.21%, which meets the crawler performance requirements of the active monitoring model.
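The classification step summarized above can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual feature vector definition, distance measure, keywords, weights, or thresholds, so everything here is an assumption made for illustration: the tokenizer, the weighted-overlap score used in place of the paper's distance, the names `KeywordSet`, `url_features`, `score`, and `classify`, and the sample values in `KEYWORD_SETS`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeywordSet:
    label: str                 # e.g. "meta" or "irrelevant" (hypothetical labels)
    weights: Dict[str, float]  # keyword -> weight
    threshold: float           # minimum score for a URL to match this set


def url_features(url: str) -> Dict[str, int]:
    """Describe a URL as a token-count feature vector (assumed representation)."""
    features: Dict[str, int] = {}
    for token in re.findall(r"[a-z0-9]+", url.lower()):
        features[token] = features.get(token, 0) + 1
    return features


def score(features: Dict[str, int], kset: KeywordSet) -> float:
    """Weighted overlap between a URL vector and a keyword set.

    This is a stand-in for the paper's distance, which the abstract
    does not specify.
    """
    return sum(w * features.get(k, 0) for k, w in kset.weights.items())


def classify(url: str, keyword_sets: List[KeywordSet]) -> str:
    """Return the label of the set whose threshold is exceeded by the
    largest margin, or "unknown" if no threshold is reached."""
    features = url_features(url)
    best_label, best_margin = "unknown", 0.0
    for kset in keyword_sets:
        margin = score(features, kset) - kset.threshold
        if margin >= 0 and (best_label == "unknown" or margin > best_margin):
            best_label, best_margin = kset.label, margin
    return best_label


# Hypothetical keyword sets; real ones would be derived from URL samples.
KEYWORD_SETS = [
    KeywordSet("meta", {"torrent": 2.0, "download": 1.0, "seed": 1.5}, 2.0),
    KeywordSet("irrelevant", {"login": 1.5, "register": 1.5, "forum": 1.0}, 1.5),
]

if __name__ == "__main__":
    for u in (
        "http://example.com/download/file123.torrent",
        "http://example.com/forum/login.php",
        "http://example.com/about",
    ):
        print(u, "->", classify(u, KEYWORD_SETS))
```

In a crawler built along these lines, only URLs labeled "meta" would be queued for fetching, which is how a classifier of this kind could cut the number of irrelevant pages visited.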

Key words

"meta-information" acquisition / topic crawler technique / URL classification algorithm / feature vector representation / active monitoring model

Classification

Information Technology and Security Science

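Cite this article

Ding Junping, Cai Wandong. Improved crawler algorithm technique for P2P specific information[J]. Computer Engineering and Applications, 2011, 47(29): 23-26, 4.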

Funding

National High-Tech Research and Development Plan of China (863 Program), Grant No. 2009AA01Z424.

Computer Engineering and Applications

OA / CSCD / CSTPCD

ISSN: 1002-8331
