A phishing website identification algorithm based on URL text features and link relations

Zhao Dunyu (赵蹲宇), Zhang Zhaoxin (张兆心)

高技术通讯 (Chinese High Technology Letters), 2017, Vol. 27, Issue 8: 708-717, 10. DOI: 10.3772/j.issn.1002-0470.2017.08.004

A phishing website identification algorithm based on URL text features and link relations

Zhao Dunyu (赵蹲宇) 1, Zhang Zhaoxin (张兆心) 1

Author information

  • 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

Abstract

Based on an analysis of the uniform resource locator (URL) text data of phishing sites and of the characteristics of the network topology formed by phishing websites, a phishing site recognition algorithm based on URL text features and link relations (FAUFL) is proposed to improve the accuracy of phishing site recognition. The algorithm works as follows. First, with URL text features as input, the random forest algorithm is used to build a phishing site discriminator based on URL text features. Second, with link relations as input, a related web page group is constructed, and a related-web-page algorithm based on maximum-flow cutting is used to build a phishing site discriminator based on link relations. Finally, the results of the two discriminators are taken as input and evaluated further with the Bagging algorithm. Test results show that the accuracy of FAUFL is 99.2%, which is 3.9% higher than that of the URL text feature-based algorithm and 5.0% higher than that of the link-based algorithm.
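
As a rough illustration of what "URL text features" might look like as input to a classifier such as a random forest, the sketch below computes a handful of lexical features from a URL string. The feature set, the function name `url_text_features`, and the example URL are all assumptions made here for illustration; the abstract does not list the features that FAUFL actually uses.

```python
import re
from urllib.parse import urlsplit

# Hypothetical feature set for illustration only; the abstract does not
# specify which URL text features the FAUFL algorithm relies on.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_text_features(url: str) -> dict:
    """Return a few lexical features computed from the URL string alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),        # many dots can indicate nested subdomains
        "hyphen_count": host.count("-"),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_sign": "@" in url,           # "@" can hide the real host
        "host_is_ipv4": bool(IPV4_HOST.match(host)),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


# Made-up example URL with a raw IP address as host.
features = url_text_features("http://192.168.10.5/secure-login/update/index.php")
```

A feature dictionary of this kind could be turned into a numeric vector and passed to any standard classifier; which features are informative in practice would have to be determined from labeled data.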

Key words

phishing website / fusion algorithm / uniform resource locator (URL) / text feature / link relation

Cite this article

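Zhao Dunyu, Zhang Zhaoxin. A phishing website identification algorithm based on URL text features and link relations [J]. 高技术通讯 (Chinese High Technology Letters), 2017, 27(8): 708-717, 10.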

Funding

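Supported by the National Key Research and Development Program of China (SQ2017YFGX110125-01), the National Natural Science Foundation of China (61370215, 61370211, 61402137), the National Key Technology R&D Program of China (2012BAH45B01), and the National Information Security Program (2017A065, 2017A111).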

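高技术通讯 (Chinese High Technology Letters)

Open access (OA); indexed in 北大核心 (Peking University Core Journals) and CSTPCD

ISSN: 1002-0470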
