Abstract
To address the low detection accuracy of traditional vulnerability detection schemes, this paper proposes a function-level source code vulnerability detection scheme that combines two intermediate representations of source code: a structure graph and a token sequence. First, an extended code property graph (CPG) is extracted and its nodes and edges are embedded; a relational graph convolutional network (RGCN) then processes each edge type separately to generate a graph representation. Second, token sequences are extracted, and the pre-trained model CodeBERT is applied to generate a sequence representation. Finally, the two representations are integrated, and a three-layer fully connected network is applied to perform vulnerability detection. The scheme is evaluated on two types of datasets: synthetic code and real-world software. The experimental results show that, compared with existing vulnerability detection schemes based on sequences, on graphs, and on a combination of both, the proposed scheme significantly improves accuracy and F1 score, reaching up to 98.99% and 98.11%, respectively. In addition, ablation experiments that vary a single factor at a time further verify the effectiveness of the improvements made at each stage.
Keywords
vulnerability detection/source code/deep learning/graph neural network/pre-trained model
Classification
Information technology and security science
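The final stage described in the abstract (integrating the graph and sequence representations and passing the result through a three-layer fully connected network) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: integration is modelled as plain concatenation, the vector sizes and hidden widths are arbitrary, the weights are random rather than trained, and the names `fuse`, `build_classifier`, and `predict` are hypothetical. In the actual scheme the two input vectors would come from the RGCN and CodeBERT stages.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one fully connected layer as (weights, biases); weights are random, not trained."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply an affine map: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(graph_vec, seq_vec):
    """Assumption: the two representations are integrated by concatenation."""
    return list(graph_vec) + list(seq_vec)


def build_classifier(n_in, hidden1, hidden2, seed=0):
    """Three fully connected layers: n_in -> hidden1 -> hidden2 -> 1 (sizes are hypothetical)."""
    rng = random.Random(seed)
    return [
        make_layer(n_in, hidden1, rng),
        make_layer(hidden1, hidden2, rng),
        make_layer(hidden2, 1, rng),
    ]


def predict(layers, graph_vec, seq_vec):
    """Return a score in (0, 1) that the function is vulnerable."""
    h = fuse(graph_vec, seq_vec)
    h = relu(dense(h, layers[0]))
    h = relu(dense(h, layers[1]))
    logit = dense(h, layers[2])[0]
    return sigmoid(logit)


# Placeholder vectors standing in for the RGCN and CodeBERT outputs.
graph_vec = [0.2, -0.1, 0.4, 0.0]
seq_vec = [0.5, 0.3, -0.2, 0.1, 0.0, 0.7]
layers = build_classifier(len(graph_vec) + len(seq_vec), hidden1=8, hidden2=4)
score = predict(layers, graph_vec, seq_vec)
print("vulnerable" if score >= 0.5 else "not vulnerable", round(score, 4))
```

Because the weights are untrained, the printed score carries no meaning; the sketch only shows how the two representations flow into a single classifier head.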