Malware variant detection method based on heterogeneous graph attribute enhancement

Open access; indexed in the Peking University Core Journals list and CSTPCD

Abstract

Attackers increasingly evade malware detection by modifying malware source code, and the complex relationships among malware variants in code reuse, coding style, attack behavior, and other aspects make malware analysis challenging. In recent years, graph neural networks have been widely applied to malware classification and detection because they are effective at modeling graph-structured data and learning complex relationships between entities; they can model the relationships between malware and its variants and so avoid the limitations of analyzing each sample in isolation. Existing methods, however, have two shortcomings. First, they lack a comprehensive characterization of the multi-dimensional relationships among malware and its variants, so these relationships are not fully mined or exploited. Second, they focus only on the topological structure among malware samples and ignore the semantic information of entities, which lets attackers forge features through adversarial techniques and evade detection. In addition, entities associated with malware, such as Windows APIs and communication IPs, carry little semantic information of their own, which further hinders the extraction and representation of semantics. Fusing the comprehensive relationships among malware with the semantic information of features is therefore important for improving the robustness and accuracy of malware variant detection. To this end, the authors propose a malware variant detection method based on heterogeneous graph attribute enhancement. First, they construct a malware heterogeneous information network to model the complex relationships between malware and its features. Using this network, they then transform malware variant detection into a node classification problem on a heterogeneous graph and construct semantic attributes for entity nodes to enhance the representation of node information. Next, for entity nodes whose semantic information is sparse, they learn the entities' semantics from external open-source data to compensate for the deficiency. Finally, guided by topological relationships, they use an attention mechanism to aggregate information from attributed nodes and complete the attributes of nodes that have none. The completion process and the heterogeneous graph node embedding process are optimized alternately in an iterative manner, yielding a unified malware variant detection method based on heterogeneous graph attribute completion. Experimental results show that the proposed method effectively improves malware variant detection performance and outperforms other state-of-the-art models on multiple datasets.
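The attribute completion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the attention formula, so the dot-product scoring on topology embeddings, the function `complete_attributes`, and the toy node names and numbers below are all assumptions made for illustration. The alternating optimization with the node embedding process is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def complete_attributes(neighbors, topo_emb, attrs):
    """Fill in attributes for nodes that have none (hypothetical helper).

    neighbors: node -> list of adjacent nodes (edge types ignored here)
    topo_emb:  node -> topology embedding, assumed to be precomputed
    attrs:     node -> attribute vector, present only for attributed nodes
    """
    completed = dict(attrs)
    for node, adjacent in neighbors.items():
        if node in attrs:
            continue  # already attributed, nothing to complete
        sources = [n for n in adjacent if n in attrs]
        if not sources:
            continue  # no attributed neighbor to aggregate from
        # Assumed scoring: similarity of topology embeddings guides attention.
        weights = softmax([dot(topo_emb[node], topo_emb[s]) for s in sources])
        dim = len(attrs[sources[0]])
        completed[node] = [
            sum(w * attrs[s][i] for w, s in zip(weights, sources))
            for i in range(dim)
        ]
    return completed


# Toy graph; every name and number here is made up for illustration.
neighbors = {
    "malware_0": ["api_0", "ip_0"],
    "api_0": ["malware_0"],
    "ip_0": ["malware_0"],
}
topo_emb = {
    "malware_0": [1.0, 0.0],
    "api_0": [2.0, 0.0],
    "ip_0": [0.0, 2.0],
}
attrs = {
    "api_0": [1.0, 0.0],
    "ip_0": [0.0, 1.0],
}

completed = complete_attributes(neighbors, topo_emb, attrs)
print(completed["malware_0"])
```

In this toy setup the completed vector for `malware_0` is a convex combination of its neighbors' attributes, weighted toward `api_0` because its topology embedding is more similar. Which node types actually lack attributes in the paper's network is not stated in the abstract, so the assignment above is arbitrary.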

Sun Jintao (孙锦涛); Li Qi (李祺); Li Xiaolong (李晓龙)

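School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876; Electric Power Research Institute, State Grid Ningxia Electric Power Co., Ltd., Yinchuan 750011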

Computer science and automation

Malware variant detection; Heterogeneous graph neural networks; Feature enhancement; Attribute completion

Journal of Sichuan University (Natural Science Edition), 2024 (003)

Pages 15-29 (15 pages)

National Natural Science Foundation of China (62172055); Natural Science Foundation of Ningxia (2021AAC03511)

10.19907/j.0490-6756.2024.030002
