Computer Engineering and Applications, 2019, Vol. 55, Issue 24: 84-90, 7. DOI: 10.3778/j.issn.1002-8331.1809-0259
Phishing Detection Algorithm Based on Language Features of URL
Abstract
To counter the detection-avoidance strategies of phishing sites, a phishing detection algorithm based on language features of URLs is proposed. By analyzing how phishing sites and legitimate sites differ across the detection domains, the concepts of motif and sensitivity are defined to describe language features. First, the similarity of the main-level domain is detected based on motifs. When the similarity is lower than the preset threshold, valid subdomain features are selected. The language features of the subdomains are then learned and detected using random forests. The results show that the accuracy of the proposed algorithm is 95.6%. The running time is relatively short, with an average recognition time of less than 1 s.
Key words
phishing site / Uniform Resource Locator (URL) / language feature / motif / sensitivity
Classification
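Information Technology and Security Science
Cite this article
王雨琪, 刘博文, 林果园. Phishing Detection Algorithm Based on Language Features of URL[J]. Computer Engineering and Applications, 2019, 55(24): 84-90, 7.
Funding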
Jiangsu Province Industry-University-Research Prospective Joint Research Project (No. BY2016026-04)
Open Fund of the State Key Laboratory for Novel Software Technology (No. KFKT2018B27)