
Research on detection and classification of fraudulent websites based on multi-dimensional features (基于多维特征的涉诈网站检测与分类技术研究)

You Chang (游畅) 1, Huang Cheng (黄诚) 1, Tian Xuan (田璇) 1, Yan Wei (燕玮) 2, Leng Tao (冷涛) 3

Journal of Sichuan University (Natural Science Edition), 2024, Vol. 61, Issue 4: 27-36, 10. DOI: 10.19907/j.0490-6756.2024.040003

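Author information

  • 1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065
  • 2. The Sixth Research Institute of China Electronics Corporation, Beijing 102209
  • 3. Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou 646099 || Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100864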

Abstract

With the development and widespread use of the Internet, the tactics of fraudulent groups and their anti-detection technologies have advanced significantly. Consequently, the detection and classification of fraudulent websites have become increasingly important for maintaining cybersecurity. Traditional detection methods, however, are proving insufficient against emerging forms of fraudulent websites, and there is a notable dearth of research on classifying these sites. To address this issue, this paper analyzes the typical features of current new fraudulent websites and proposes a multi-dimensional feature-based system for detecting and classifying them, which uses a total of 11 types of fraudulent website features and 3600 web keywords to represent fraudulent websites. The system first uses a crawler to obtain the web page screenshot, WHOIS information, and source code of the domain to be detected, and then passes them to the feature extraction module to construct a multi-dimensional feature set. The detection module takes the website's domain name, code structure, and WHOIS information as features and builds a random forest model to perform the detection task. Subsequently, based on the detection results, the webpage classification module uses a bi-directional GRU to obtain the textual features of the webpage. In cases where the confidence level is below 0.7, the module employs a BERT model to ensure accuracy and efficiency. Additionally, a residual neural network extracts features from the webpage screenshot while the similarity between the images inside the webpage and the website logo is calculated, and a random forest model is used for classification. Comparison experiments were conducted to evaluate the accuracy of the method. The experimental results demonstrate that the method achieves the highest accuracy, with an average F1-score of 97.28%. Moreover, the results show that the multi-dimensional feature model effectively distinguishes between fraudulent and legitimate websites, overcomes the limitations of traditional methods in detecting new fraudulent websites, and is suitable for the rapid detection and classification of fraudulent websites with new domain names on a global scale.
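
The confidence-based fallback described in the abstract (bi-directional GRU first, BERT when confidence is below 0.7) can be pictured as a simple two-stage cascade. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `cascade_classify`, `toy_fast_model`, `toy_fallback_model`, and the labels `type_a`, `type_b`, and `other` are hypothetical names, and the two toy functions merely stand in for trained models. Only the 0.7 threshold comes from the abstract.

```python
from typing import Callable, Tuple

# A prediction is (label, confidence in [0, 1]).
Prediction = Tuple[str, float]
TextClassifier = Callable[[str], Prediction]

CONFIDENCE_THRESHOLD = 0.7  # the value stated in the abstract


def cascade_classify(
    text: str,
    fast_model: TextClassifier,
    fallback_model: TextClassifier,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, float, str]:
    """Return (label, confidence, stage), where stage names the model used."""
    label, confidence = fast_model(text)
    if confidence >= threshold:
        # The cheaper model is confident enough; skip the heavier one.
        return label, confidence, "fast"
    # Confidence is below the threshold: defer to the heavier model.
    label, confidence = fallback_model(text)
    return label, confidence, "fallback"


# Hypothetical stand-ins for trained models (assumption: a real system
# would call a bi-directional GRU and a BERT classifier here).
def toy_fast_model(text: str) -> Prediction:
    if "jackpot" in text.lower():
        return "type_a", 0.93
    return "other", 0.55


def toy_fallback_model(text: str) -> Prediction:
    if "invest" in text.lower():
        return "type_b", 0.88
    return "other", 0.81


if __name__ == "__main__":
    for page_text in ("Win the JACKPOT now", "Invest today, double tomorrow"):
        print(page_text, "->", cascade_classify(page_text, toy_fast_model, toy_fallback_model))
```

The design intent of such a cascade is that most pages are resolved by the cheaper model, so the more expensive model is only paid for on uncertain inputs; how the authors actually compute the confidence level is not specified in the abstract.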

Key words

Fraudulent website detection / Website classification / Random forest / Deep learning

Classification

Computer science and automation

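Cite this article

You Chang, Huang Cheng, Tian Xuan, Yan Wei, Leng Tao. Research on detection and classification of fraudulent websites based on multi-dimensional features [J]. Journal of Sichuan University (Natural Science Edition), 2024, 61(4): 27-36, 10.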

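Funding

Open Project of the Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2024KFZD003)

Applied Basic Research Project of the Sichuan Provincial Department of Science and Technology (2022NSFSC0752)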

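Journal of Sichuan University (Natural Science Edition)

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN 0490-6756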
