Contrastive learning based cross-language code clone detection

吕泉润 1, 谢春丽 1, 万泽轩 1, 魏家劲 1

Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 7: 2147-2152, 6. DOI: 10.19734/j.issn.1001-3695.2023.11.0534

Author information

  • 1. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China

Abstract

Code clone detection is an important technique for improving software development efficiency, quality, and reliability. Single-language clone detection based on abstract syntax trees (ASTs) has achieved strong performance. However, synonyms and near-synonyms among the AST nodes of code written in different languages, together with the high cost of manual labeling, limit the effectiveness and usefulness of existing clone detection methods. To address these issues, this paper proposes a cross-language code clone detection method based on a contrastive tree convolutional neural network (CTCNN). First, it parses code in different programming languages into ASTs and applies synonym conversion to the node types and values, reducing the differences between ASTs across languages. At the same time, it employs contrastive learning to augment negative samples and train the model, so that distances between clone pairs are minimized and distances between non-clone pairs are maximized even on small-sample datasets. Finally, the method is evaluated on a public dataset, where it achieves precision, recall, and F1-score of 95.6%, 99.98%, and 97.56%, respectively. Compared with the best existing methods, CLCDSA and C4, the proposed model improves detection accuracy by 43.92% and 3.73% and F1-score by 29.84% and 6.29%, respectively, which confirms that it is an effective cross-language code clone detection method.
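The contrastive objective described in the abstract (small distances for clone pairs, large distances for non-clone pairs) can be illustrated with a generic margin-based pairwise loss. This is a minimal sketch under stated assumptions, not the loss used by CTCNN: the function names, the `margin` parameter, and the two-dimensional toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_contrastive_loss(u, v, is_clone, margin=1.0):
    """Generic margin-based contrastive loss for one pair of code embeddings.

    Assumption: this is the textbook pairwise form, not the paper's exact loss.
    Clone pairs are penalized by their squared distance (pulled together);
    non-clone pairs are penalized only while closer than `margin` (pushed apart).
    """
    d = euclidean(u, v)
    if is_clone:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings, made up for illustration (not taken from the paper).
java_sum = [0.9, 0.1]     # hypothetical embedding of a Java "sum" function
python_sum = [0.8, 0.2]   # hypothetical embedding of a Python "sum" function
python_sort = [0.1, 0.9]  # hypothetical embedding of a Python "sort" function

clone_loss = pairwise_contrastive_loss(java_sum, python_sum, is_clone=True)
non_clone_loss = pairwise_contrastive_loss(java_sum, python_sort, is_clone=False)
```

In this sketch the clone pair incurs a small loss because its embeddings are already close, and the non-clone pair incurs no loss because its distance already exceeds the margin. A non-clone pair that sat closer than the margin would be penalized until training pushed it apart.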

Key words

cross-language/code clone/contrastive learning/abstract syntax tree

Classification

Information Technology and Security Science

Cite this article

吕泉润, 谢春丽, 万泽轩, 魏家劲. Contrastive learning based cross-language code clone detection[J]. Application Research of Computers, 2024, 41(7): 2147-2152, 6.

Funding

National Natural Science Foundation of China, General Program (62276119)

Jiangsu Normal University Postgraduate Research and Practice Innovation Program (2022XKT1538)

Application Research of Computers (计算机应用研究)

OA, Peking University Core Journals (北大核心), CSTPCD

ISSN 1001-3695
