Command Control & Simulation, 2025, Vol. 47, Issue 4: 27-33, 7. DOI: 10.3969/j.issn.1673-3819.2025.04.005
Code summarization based on large model knowledge distillation
(Original Chinese title: 基于大模型知识蒸馏的代码摘要自动生成)
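You Gang (尤刚) 1, Liu Wenjie (刘文杰) 1, Li Meipeng (李美鹏) 1, Sun Liqun (孙立群) 1, Wang Lian (王炼) 1, Tian Tieku (田铁库) 1
Author information
- 1. Unit 96941 of the Chinese People's Liberation Army, Beijing 100085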
Abstract
A code summary is a short natural language description of source code. Summaries are usually only one sentence long, but they are the primary way for developers to understand code. Recently, products based on large language models (such as ChatGPT) have demonstrated a strong ability to generate these descriptions. However, to use these tools, programmers must send their code to an untrusted third party for processing (for example, through API calls), which is unacceptable to many organizations. This paper presents an alternative: we use example output generated by GPT-3.5 to train an open source model through a process related to knowledge distillation. This enables small models (with 350 million parameters) to be comparable to GPT-3.5 on code summarization tasks.
Key words
code summarization/large model/knowledge distillation
Classification
Information technology and security science
Cite this article
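You Gang, Liu Wenjie, Li Meipeng, Sun Liqun, Wang Lian, Tian Tieku. Code summarization based on large model knowledge distillation [J]. Command Control & Simulation, 2025, 47(4): 27-33, 7.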