Computer Engineering, 2018, Vol. 44, Issue (3): 171-177, 7. DOI: 10.3969/j.issn.1000-3428.2018.03.029
Illegal Website Identification Method Based on URL Feature Detection
Abstract
An identification method based on URL feature detection is proposed to identify illegal websites effectively. A website similarity model based on path similarity is designed, exploiting the hierarchical structure of user access paths in the request-line information of messages, and the model is computed in a distributed fashion using the Python programming language. Websites are clustered with the Fast Unfolding algorithm, and the URL features of illegal websites are extracted. Features with high accuracy and a specific meaning are selected as effective illegal-website features. An unknown website is then identified as illegal by detecting whether it exhibits these URL features. Experimental results show that the method effectively measures the degree of association between similar websites and, combined with the Fast Unfolding algorithm, effectively distinguishes different types of websites. Compared with other identification methods based on URL morphological features, HTML, or semantic features, the proposed method achieves the best F-Measure value.
Keywords
URL feature/illegal website identification/website similarity/clustering/access path
Classification
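Information Technology and Security Science
Cite this article
Fan Yourong (凡友荣), Yang Tao (杨涛), Wang Yongjian (王永剑), Jiang Guoqing (姜国庆). Illegal Website Identification Method Based on URL Feature Detection [J]. Computer Engineering, 2018, 44(3): 171-177, 7.
Funding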
National Key Research and Development Program of China (2016YFC0800909)
Fundamental Research Funds for the Central Universities (C16356)
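
Illustrative sketch
The abstract describes a website similarity model built on the hierarchical structure of access paths, but it does not give the formula. The following is a minimal sketch of one way such a measure could look, assuming a Jaccard overlap over hierarchical path prefixes. The function names `path_prefixes` and `site_similarity`, the Jaccard choice, and the example hosts are all hypothetical and are not taken from the paper.

```python
from urllib.parse import urlsplit


def path_prefixes(url):
    """Return every hierarchical prefix of a URL path, e.g. /a/b -> {'/a', '/a/b'}."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return {"/" + "/".join(segments[:i]) for i in range(1, len(segments) + 1)}


def site_similarity(urls_a, urls_b):
    """Jaccard overlap of the path-prefix sets of two sites (an assumed measure)."""
    prefixes_a = set().union(*(path_prefixes(u) for u in urls_a))
    prefixes_b = set().union(*(path_prefixes(u) for u in urls_b))
    union = prefixes_a | prefixes_b
    if not union:
        return 0.0  # no path information on either side
    return len(prefixes_a & prefixes_b) / len(union)


# Hypothetical access logs for two sites that share part of their path layout.
site_a = ["http://a.example/game/login.php", "http://a.example/game/pay/order.php"]
site_b = ["http://b.example/game/login.php", "http://b.example/news/index.html"]
similarity = site_similarity(site_a, site_b)
```

Pairwise scores of this kind could serve as edge weights in a website graph, which a community detection method such as the Fast Unfolding algorithm named in the abstract would then partition into clusters.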