Computer Technology and Development, 2024, Vol. 34, Issue 5: 126-132 (7 pages). DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0050
Software Vulnerability Detection Based on Mixed Representation and Cooperative Training
Abstract
To address the poor generalization of deep learning models, which stems from the scarcity of benchmark datasets in the vulnerability domain, and the low performance of traditional rule-engine-based vulnerability detection tools, a method for detecting vulnerabilities in software source code based on mixed representation and cooperative training is proposed. First, text features and semantic information of the source code are extracted with a pre-trained model. Tools are then used to generate the abstract syntax tree (AST), AST features are extracted with custom traversal rules, and the two kinds of features are mixed to enrich the code representation. Second, multiple deep models are built, and the generalization ability of each model is improved with a large amount of unlabeled data through a cooperative training algorithm. Because a single model may produce a high false positive rate and a high false negative rate, and because one model may dominate the prediction results, a multi-model ensemble method based on a weighted voting mechanism is adopted. The experimental results show that the proposed method alleviates, to some extent, the poor generalization caused by scarce datasets. Compared with some mainstream methods in the field of vulnerability detection, the proposed method shows advantages on various metrics, and its detection performance is higher than that of the rule engine Fortify.
Keywords
deep learning/mixed feature/vulnerability detection/cooperative training/ensemble learning
Classification
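Information technology and security science
Cite this article
Chen Haodong (陈浩东), Li Lin (李琳), Qiao Mengqing (乔梦晴), Ye Biao (叶彪). Software Vulnerability Detection Based on Mixed Representation and Cooperative Training [J]. Computer Technology and Development, 2024, 34(5): 126-132, 7.
Funding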
Wuhan Key Research and Development Program (2022012202015070)
Hubei Provincial Department of Education Funded Project (2020354)
Hubei Provincial College Students' Innovation and Entrepreneurship Training Program (S202110488047)
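
The weighted voting step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen or how the votes are combined, so everything below is an assumption: the function name `weighted_vote`, the use of per-model probabilities (soft voting), the 0.5 threshold, and the example weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_vote(
    probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine per-model vulnerability probabilities by weighted soft voting.

    probs[m][i] is the probability, according to model m, that sample i
    is vulnerable. weights[m] is a non-negative weight for model m
    (hypothetical here; for example, it could come from validation scores).
    Returns 1 (vulnerable) or 0 (not vulnerable) for each sample.
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize so that no model's raw weight scale changes the threshold.
    norm = [w / total for w in weights]

    n_samples = len(probs[0])
    labels = []
    for i in range(n_samples):
        # Weighted average of the models' probabilities for sample i.
        score = sum(w * p[i] for w, p in zip(norm, probs))
        labels.append(1 if score >= threshold else 0)
    return labels


if __name__ == "__main__":
    # Three hypothetical models scoring three code samples.
    model_probs = [
        [0.9, 0.2, 0.6],
        [0.4, 0.1, 0.7],
        [0.8, 0.6, 0.3],
    ]
    model_weights = [0.5, 0.3, 0.2]
    print(weighted_vote(model_probs, model_weights))
```

Normalizing the weights keeps any single model from dominating merely because its weight is on a larger scale, which is one plausible reading of the concern raised in the abstract.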