Hebei Journal of Industrial Science and Technology (河北工业科技), 2025, Vol. 42, Issue (3): 221-230, 10 pages. DOI: 10.7535/hbgykj.2025yx03003
Research on malicious URL detection algorithms incorporating attention mechanisms
Abstract
To address the difficulty traditional models have in capturing both global and local features when processing long URLs (uniform resource locators), a BERT-CNN model based on a hierarchical attention mechanism was proposed. The model captured the global semantic information of URLs through a BERT (bidirectional encoder representations from transformers) module and extracted local features of URLs through a convolutional neural network (CNN). A hierarchical attention mechanism was introduced between BERT and the CNN, enabling the model to dynamically allocate attention weights at different levels and better capture key information in URLs. A sparse attention mechanism was also introduced to reduce the model's computational complexity and memory overhead while retaining BERT's global semantic understanding. Comparative, ablation, and visualization experiments were conducted on a public malicious URL detection dataset to verify the performance of the proposed model. The results show that the BERT-CNN model based on the hierarchical attention mechanism achieves an accuracy of 96.8% in detecting malicious URLs, 2.5 percentage points higher than the BERT-CNN baseline model, and an F1 score of 95.3%, 2.1 percentage points higher than the baseline. The malicious URL detection model with the attention mechanism shows a significant advantage in capturing the global and local features of URLs and can provide new technical approaches and solutions for abnormal traffic detection.
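The abstract states that a hierarchical attention mechanism sits between BERT and the CNN but does not say what the "levels" are. The sketch below assumes, purely for illustration, two levels in the style of hierarchical attention networks: token-level attention inside each URL segment (for example host and path) and segment-level attention across segments. The token vectors stand in for BERT outputs; `hierarchical_weights`, `conv1d_maxpool`, the context vectors `u_token`/`u_segment`, the segment split, and the rescaling step are all hypothetical, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hierarchical_weights(segments, u_token, u_segment):
    """Two-level (token -> segment) attention weights over grouped token vectors.

    segments: list of URL segments, each a list of token vectors (stand-ins for
    BERT outputs). u_token / u_segment: hypothetical context vectors that would be
    learned in training; fixed here. Returns one weight per token, equal to
    segment weight * within-segment token weight, so all weights sum to 1.
    """
    token_weights, summaries = [], []
    for seg in segments:
        a = softmax([dot(h, u_token) for h in seg])  # token-level attention
        token_weights.append(a)
        summaries.append([sum(ai * h[c] for ai, h in zip(a, seg))
                          for c in range(len(seg[0]))])  # segment summary vector
    b = softmax([dot(s, u_segment) for s in summaries])  # segment-level attention
    return [bi * aij for bi, a in zip(b, token_weights) for aij in a]


def conv1d_maxpool(seq, kernels):
    """Valid 1D convolution over token vectors, ReLU, then global max pooling.

    Each kernel is a list of `width` weight vectors; each output is the strongest
    response of that n-gram detector anywhere in the sequence (a local feature).
    """
    feats = []
    for ker in kernels:
        width = len(ker)
        if len(seq) < width:
            raise ValueError("sequence shorter than kernel width")
        acts = [max(0.0, sum(dot(seq[t + j], ker[j]) for j in range(width)))
                for t in range(len(seq) - width + 1)]
        feats.append(max(acts))
    return feats


if __name__ == "__main__":
    # Hypothetical URL split into two segments (e.g. host, path) of 2-d token vectors.
    segments = [[[1.0, 0.0], [0.5, 0.5]],
                [[0.0, 1.0], [0.2, 0.8], [0.9, 0.1]]]
    w = hierarchical_weights(segments, u_token=[1.0, -1.0], u_segment=[0.5, 0.5])
    tokens = [h for seg in segments for h in seg]
    n = len(tokens)
    # Rescale so uniform weights (1/n each) would leave the token vectors unchanged.
    reweighted = [[wi * n * x for x in h] for wi, h in zip(w, tokens)]
    kernels = [[[1.0, 0.0]] * 3, [[0.0, 1.0]] * 3]  # two width-3 filters
    print(conv1d_maxpool(reweighted, kernels))
```

The design choice here is that attention re-weights the global (BERT-side) token representations before the CNN scans them, so the convolution's n-gram detectors respond more strongly to tokens in segments the attention deems important.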
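The complexity claim for sparse attention can be illustrated by counting scored query-key pairs. The abstract does not specify the sparse pattern, so this sketch assumes a Longformer-style pattern: a sliding window of radius `window` plus a global token at position 0 (standing in for BERT's [CLS]). The names `sparse_pattern` and `sparse_attention` are hypothetical. Full self-attention scores all n² pairs, while this pattern scores roughly n·(2·window + 2) pairs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_pattern(n, window, global_tokens=(0,)):
    """Key positions each query may attend to (hypothetical sparse pattern).

    Non-global queries see a sliding window of radius `window` plus the global
    tokens; global tokens (position 0 standing in for [CLS]) see every position.
    """
    g = set(global_tokens)
    allowed = []
    for i in range(n):
        if i in g:
            allowed.append(list(range(n)))
        else:
            local = range(max(0, i - window), min(n, i + window + 1))
            allowed.append(sorted(set(local) | g))
    return allowed


def sparse_attention(q, k, v, allowed):
    """Scaled dot-product attention computed only over the allowed key positions."""
    d = len(q[0])
    out = []
    for i, keys in enumerate(allowed):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for w, j in zip(weights, keys))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    n, d, window = 128, 8, 4
    rng = random.Random(0)
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    pattern = sparse_pattern(n, window)
    y = sparse_attention(x, x, x, pattern)  # self-attention with Q = K = V = x
    pairs = sum(len(keys) for keys in pattern)
    print(f"scored pairs: sparse={pairs}, full={n * n}")  # sparse=1378, full=16384
```

For a 128-token URL with `window=4`, this pattern scores 1,378 pairs instead of 16,384, which is where the memory and compute savings come from; the global token keeps a path for sequence-wide information, which is one way such patterns try to retain global context.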
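The reported gains are in percentage points, so the baseline figures follow by subtraction, and the relative improvements are somewhat larger than the point differences:

```latex
\begin{aligned}
\text{Acc}_{\text{baseline}} &= 96.8\% - 2.5\ \text{pp} = 94.3\%, &\quad \tfrac{2.5}{94.3} &\approx 2.65\%\ \text{relative gain},\\
F_{1,\text{baseline}} &= 95.3\% - 2.1\ \text{pp} = 93.2\%, &\quad \tfrac{2.1}{93.2} &\approx 2.25\%\ \text{relative gain}.
\end{aligned}
```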
Key words
natural language processing/convolutional neural network (CNN)/malicious URL/BERT model/hierarchical attention mechanism
Classification
Computers and Automation
Cite this article
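LIU Yongmin, ZHAI Jiahui, XU Zhuonong, DENG Weihao, MA Haizhi. Research on malicious URL detection algorithms incorporating attention mechanisms[J]. Hebei Journal of Industrial Science and Technology, 2025, 42(3): 221-230.
Funding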
National Natural Science Foundation of China (31870532)
Changsha Science and Technology Plan Project (kq2402265)