Application Research of Computers, 2024, Vol. 41, Issue 7: 2147-2152, 6. DOI: 10.19734/j.issn.1001-3695.2023.11.0534
Contrastive learning based cross-language code clone detection
Abstract
Code clone detection is an important technique for improving software development efficiency, quality, and reliability. Single-language clone detection based on abstract syntax trees (ASTs) has achieved strong performance. However, synonyms and near-synonyms among the AST nodes of code written in different languages, together with the high cost of manual labeling, limit the effectiveness and usefulness of existing clone detection methods. To address these issues, this paper proposes a cross-language code clone detection method based on a contrastive tree convolutional neural network (CTCNN). First, it parses code in different programming languages into ASTs and applies synonym conversion to the node types and values, which reduces the differences between ASTs of different programming languages. At the same time, it uses contrastive learning to augment negative samples and train the model, so that distances between clone pairs are minimized and distances between non-clone pairs are maximized on small-sample datasets. Finally, evaluated on a public dataset, the method achieves precision, recall, and F1-score of 95.6%, 99.98%, and 97.56%, respectively. The results show that, compared with the best existing methods CLCDSA and C4, the proposed model improves detection accuracy by 43.92% and 3.73%, and increases the F1-score by 29.84% and 6.29%, respectively, which confirms that it is an effective cross-language code clone detection method.
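The two ideas named in the abstract, synonym conversion of AST node types and a contrastive objective over clone and non-clone pairs, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the synonym table or the loss formula, so `NODE_SYNONYMS`, `normalize_node_types`, and `contrastive_loss` are hypothetical names, the table entries are invented examples, and the loss is the classic margin-based pairwise form rather than the CTCNN objective itself.

```python
import math

# Hypothetical synonym table: maps language-specific AST node types to one
# shared name. The entries are illustrative assumptions, not the paper's
# actual mapping.
NODE_SYNONYMS = {
    "FunctionDef": "function",        # node type in Python's `ast` module
    "MethodDeclaration": "function",  # node type typical of Java parsers
    "For": "for_loop",
    "ForStatement": "for_loop",
}


def normalize_node_types(node_types):
    """Replace each node type with its shared synonym when one is listed."""
    return [NODE_SYNONYMS.get(t, t) for t in node_types]


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, is_clone, margin=1.0):
    """Margin-based pairwise contrastive loss (assumed form).

    Clone pairs are penalized by their squared distance, which pulls them
    together. Non-clone pairs are penalized only while they are closer than
    `margin`, which pushes them apart.
    """
    d = euclidean(u, v)
    if is_clone:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # A Python function and a Java method map to the same normalized type.
    print(normalize_node_types(["FunctionDef", "For"]))
    print(normalize_node_types(["MethodDeclaration", "ForStatement"]))
    # Loss for a clone pair and for a non-clone pair with toy embeddings.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_clone=True))
    print(contrastive_loss([0.0, 0.0], [0.0, 0.0], is_clone=False, margin=2.0))
```

In a real pipeline the vectors passed to `contrastive_loss` would be tree-level embeddings produced by the network from the normalized ASTs; the toy lists above only stand in for them.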
Keywords
cross-language / code clone / contrastive learning / abstract syntax tree
Classification
Information Technology and Security Science
Cite this article
吕泉润, 谢春丽, 万泽轩, 魏家劲. Contrastive learning based cross-language code clone detection [J]. Application Research of Computers, 2024, 41(7): 2147-2152, 6.
Funding
General Program of the National Natural Science Foundation of China (62276119)
Postgraduate Research and Practice Innovation Program of Jiangsu Normal University (2022XKT1538)