高技术通讯 (Chinese High Technology Letters), 2017, Vol. 27, Issue 8: 708-717 (10 pages). DOI: 10.3772/j.issn.1002-0470.2017.08.004
A phishing website identification algorithm based on URL text features and link relations (基于URL文本特征及链接关系的钓鱼网站识别算法)
Abstract
Based on an analysis of the uniform resource locator (URL) text of phishing sites and of the topology of the network formed by phishing websites, a phishing site recognition algorithm based on URL text features and link relations (FAUFL) is proposed to improve the accuracy of phishing site recognition. The algorithm works as follows. First, with URL text features as input, the random forest algorithm is used to build a phishing site discriminator based on URL text features. Second, with link relations as input, a related web page group is constructed, and a related-web-page algorithm based on the maximum-flow cut is used to build a link-based phishing site discriminator. Finally, the results of these two discriminators are taken as input and evaluated further with the Bagging algorithm. Test results show that FAUFL achieves an accuracy of 99.2%, which is 3.9% higher than that of the algorithm based on URL text features and 5.0% higher than that of the link-based algorithm.
Key words
phishing website / fusion algorithm / uniform resource locator (URL) / text feature / link relation
Cite this article
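Zhao Dunyu (赵蹲宇), Zhang Zhaoxin (张兆心). A phishing website identification algorithm based on URL text features and link relations [J]. 高技术通讯 (Chinese High Technology Letters), 2017, 27(8): 708-717 (10 pages).
Funding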
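Supported by the National Key Research and Development Program of China (SQ2017YFGX110125-01), the National Natural Science Foundation of China (61370215, 61370211, 61402137), the National Key Technology R&D Program (2012BAH45B01), and the National Information Security Program (2017A065, 2017A111).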
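The abstract names URL text features as the input to the first discriminator but does not list them. The sketch below illustrates what such features might look like and how two discriminator scores could be combined. Everything in it is an assumption made for illustration: the function names `url_text_features` and `fuse_scores`, the particular features chosen, and the score averaging, which is only a simple stand-in and not the Bagging-based fusion or the random forest the paper describes.

```python
import re
from urllib.parse import urlparse

# Matches hosts written as a dotted IPv4 address, e.g. "192.168.0.1".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_text_features(url: str) -> dict:
    """Extract a few lexical features from a URL string.

    Hypothetical feature set: the abstract does not say which
    URL text features the paper actually uses.
    """
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "host_dots": host.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        "hyphens_in_host": host.count("-"),
        "has_at_sign": "@" in url,
        "host_is_ipv4": bool(IPV4_HOST.match(host)),
    }


def fuse_scores(url_score: float, link_score: float,
                threshold: float = 0.5) -> bool:
    """Flag a site as phishing when the mean of two scores in [0, 1]
    reaches the threshold. Plain averaging is an assumption here,
    used in place of the paper's Bagging step."""
    return (url_score + link_score) / 2 >= threshold
```

In a fuller pipeline, the feature dictionaries would be turned into vectors for a trained classifier, and the link-based score would come from a separate graph analysis; neither step is shown here.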