Prediction model for malicious encrypted traffic with random forests and SHAP

Wu Yan

Journal of Harbin University of Commerce (Natural Sciences Edition), 2024, Vol. 40, Issue 2: 167-178, 12.

Author information

School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012

Abstract

Encrypted traffic protects users' private information but also conceals malicious behavior. Early detection of malicious encrypted traffic is a key means of defending against network attacks (e.g., distributed denial-of-service attacks, eavesdropping, and injection attacks) and of protecting networks from intrusion. Traditional detection methods, such as port-based identification and deep packet inspection, struggle against complex attacks that rely on techniques such as code obfuscation and repackaging, while machine learning-based methods suffer from high false alarm rates and decision processes that are difficult to understand. This paper therefore proposes EPMRS, a highly interpretable model for malicious encrypted traffic detection, to address the limitations of existing research in both performance and interpretability. After data preprocessing (de-duplication, re-encoding, and feature screening), a malicious encrypted traffic detection model is constructed based on random forest and compared with 10 mainstream machine learning models, including logistic regression, KNN, and LGBM, in 5-fold cross-validation experiments. The SHAP framework is then used to improve the interpretability of the detection model at three levels: the overall model, the interaction effects of the core risk features, and the decision-making process for individual samples. Empirical results on the MCCCU dataset show that EPMRS reaches a detection accuracy of 99.996% on unknown malicious encrypted traffic with a misidentification rate of 0.0003%, improving the performance metrics by an average of 0.287175% to 7.513175% compared with existing work. In addition, the interpretability analysis identifies session, flow_duration, and goodput as the core risk factors affecting the detection of malicious encrypted traffic.
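The attribution idea behind SHAP, as referred to in the abstract, can be illustrated with a minimal sketch. SHAP explains a prediction by assigning each feature its Shapley value, that is, its average marginal contribution over all feature subsets. The code below computes these values exactly by enumeration for a toy scoring function. Only the three feature names come from the abstract; `toy_score`, the sample, and the baseline are hypothetical stand-ins, not the paper's random forest or the MCCCU data, and practical SHAP implementations use faster tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Feature names taken from the abstract; everything else is illustrative.
FEATURES = ["session", "flow_duration", "goodput"]


def toy_score(x):
    """Hypothetical risk score with one interaction term (not the paper's model)."""
    return (0.5 * x["session"]
            + 0.25 * x["flow_duration"]
            + 0.25 * x["session"] * x["goodput"])


def shapley_values(model, sample, baseline):
    """Exact Shapley values of `sample` relative to `baseline` by subset enumeration."""
    n = len(FEATURES)

    def value(coalition):
        # Features in the coalition take the sample's value; the rest stay at baseline.
        mixed = {f: (sample[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


sample = {"session": 1.0, "flow_duration": 1.0, "goodput": 1.0}
baseline = {"session": 0.0, "flow_duration": 0.0, "goodput": 0.0}
phi = shapley_values(toy_score, sample, baseline)
```

In this toy setting the attributions sum to the difference between the score of the sample and the score of the baseline, and the interaction term between session and goodput is split evenly between those two features. This is the kind of per-sample and interaction-level reading that the abstract describes, shown here only under the stated assumptions.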

Key words

malicious encrypted traffic/network security/random forest/SHAP model/interpretability

Classification

Information technology and security science

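Cite this article

Wu Yan. Prediction model for malicious encrypted traffic with random forests and SHAP[J]. Journal of Harbin University of Commerce (Natural Sciences Edition), 2024, 40(2): 167-178, 12.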

Funding

National Natural Science Foundation of China (61562078)

Xinjiang Tianshan Youth Program (2018Q073)

Journal of Harbin University of Commerce (Natural Sciences Edition)

ISSN: 1672-0946
