Research on risk URL detection based on Boosting ensemble learning

网络安全与数据治理 (Cyber Security and Data Governance), 2024, Vol. 43, Issue 7: 32-40, 9. DOI: 10.19358/j.issn.2097-1788.2024.07.006

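Feng Meiqi (冯美琪)¹, Li Yun (李赟)¹, Jiang Bing (蒋冰)¹, Wang Lisong (王立松)¹, Liu Chunbo (刘春波)², Chen Wei (陈伟)¹

Author information

1. Operation Center, TravelSky Technology Limited, Beijing 101318; Engineering Technology Research Center for Domestic Adaptation of IT Infrastructure, TravelSky Technology Limited, Beijing 101318
2. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300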

Abstract

With the continuous development of the Internet and the growing number of websites, the URL, as the only entry point to a website, has become a focus of web attacks. Traditional URL detection methods mainly target malicious URLs and rely on feature values and blacklists and whitelists, but they are prone to false positives and lack detection capability for complex URLs. To resolve these issues, a hybrid model for risk URL detection in business access is proposed, based on the Boosting concept in ensemble learning. In the early stage of the model, the URL is treated as a string, and natural language processing techniques are used to segment and vectorize it. A two-step approach is then adopted. First, the GBDT algorithm is used to construct a binary classification model that determines whether a URL is risky. Then, the original string of each risk URL is input into a multi-class classification model, in which the XGBoost algorithm identifies the specific risk type of the URL, providing a reference for security analysts. During model construction, parameters were tuned continuously, and the AUC and the F1 score were used to evaluate the binary classification model and the multi-class classification model, respectively. The evaluation showed that the AUC of the binary classification model was 98.91% and the F1 score of the multi-class classification model was 0.993, indicating good performance. When the model was applied in a practical environment and compared with existing detection methods, its detection rate was higher than that of the existing WAF and APT detection devices, and its results covered cases that the existing methods had missed.
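The pipeline described in the abstract (segment the URL string, vectorize it, then run a binary stage followed by a multi-class stage) can be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the delimiter set, the hashed bag-of-words vectorizer, the vector length `dim`, and the function names `tokenize_url`, `vectorize`, and `detect` are all assumptions, since the abstract does not specify them. The two stage models are passed in as plain callables, so a trained GBDT classifier and a trained XGBoost classifier could be plugged in.

```python
import re
import zlib
from typing import Callable, List, Optional, Sequence

Vector = List[float]

# Hypothetical delimiter set; the abstract does not give the actual
# segmentation rules used in the paper.
_DELIMS = re.compile(r"[/?&=.:;,_\-+%#@]+")


def tokenize_url(url: str) -> List[str]:
    """Treat the URL as a string and split it into lowercase tokens."""
    return [tok for tok in _DELIMS.split(url.lower()) if tok]


def vectorize(tokens: Sequence[str], dim: int = 64) -> Vector:
    """Hashed bag-of-words: count each token into a fixed-length vector.

    This stands in for whatever vectorization the paper uses (assumption).
    """
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def detect(
    url: str,
    is_risky: Callable[[Vector], bool],
    risk_type: Callable[[Vector], str],
) -> Optional[str]:
    """Two-step cascade over one URL.

    Step 1: the binary model decides whether the URL is risky.
    Step 2: only risky URLs reach the multi-class model, which names
    the risk type. Returns None for URLs judged not risky.
    """
    features = vectorize(tokenize_url(url))
    if not is_risky(features):
        return None
    return risk_type(features)
```

Here both stages reuse one feature vector computed from the original string, which is a simplification. A practical benefit of the cascade is that the multi-class model only has to be trained and run on URLs already flagged as risky.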

Key words

web attacks/ensemble learning/regularization/stepwise modeling method

Classification

Information Technology and Security Science

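Cite this article

Feng Meiqi, Li Yun, Jiang Bing, Wang Lisong, Liu Chunbo, Chen Wei. Research on risk URL detection based on Boosting ensemble learning[J]. 网络安全与数据治理 (Cyber Security and Data Governance), 2024, 43(7): 32-40, 9.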

网络安全与数据治理 (Cyber Security and Data Governance)

ISSN: 2097-1788
