Acta Electronica Sinica, 2025, Vol. 53, Issue 1: 163-181 (19 pages). DOI: 10.12263/DZXB.20240408
基于统计推理的二进制程序语义比较模型
Semantic Comparison Model for Binary Programs Based on Statistical Reasoning
Abstract
Discovering program defects and malicious code requires analyzing the behavioral similarity of binary programs. Current syntax-based similarity analysis methods often ignore the execution semantics of the program, resulting in low analysis accuracy, while semantics-based analysis methods frequently call constraint solvers to compare symbolic logic formulas as they are generated, resulting in significant time overhead. This article proposes a fuzzy-matching code similarity analysis method for binary programs based on statistical reasoning. Starting from instruction-level similarity, it infers the semantic similarity of basic blocks and then of functions step by step. First, the binary code is divided, according to certain rules, into a set of fragments with a standardized form, and dynamic programming is applied at basic-block granularity to build a longest-common-subsequence table over instructions with the same execution semantics, yielding an initial semantic mapping of instructions between basic blocks. Then, the mapping is extended to the target analysis code through neighborhood search, and the execution semantics of the fragments are learned during this process. Finally, statistical analysis of the matched similar fragments gives the similarity of the binary codes. In the experiments, an unsupervised pre-training method was used, and tuning the parameters of the pre-trained model improved the accuracy of code similarity analysis. Experiments were conducted on 13 mainstream open-source projects across platforms and optimization options. The results show that, compared with the tools used for comparison, the analysis accuracy of the proposed method improved by an average of 7.26%. Ablation experiments further show that the pre-trained model proposed in this paper effectively improves the semantic matching performance of binary programs.
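The first step in the abstract, a longest-common-subsequence table built by dynamic programming over the instructions of two basic blocks, can be illustrated with a minimal sketch. Everything concrete here is an assumption and not taken from the paper: the `normalize` rule (replacing x86 register names and immediates with placeholders) only stands in for the unspecified "certain rules", equality of normalized text stands in for "same execution semantics", and the ratio in `block_similarity` is a hypothetical statistic, not the authors' measure.

```python
import re

# Hypothetical normalization patterns (assumption): a few x86 register
# names and numeric immediates. The paper's actual rules are not given.
_REG = re.compile(r"\b(?:[er]?[abcd]x|[er]?[sd]i|[er]?[sb]p|r\d+)\b")
_IMM = re.compile(r"(?<!\w)(?:0x[0-9a-f]+|\d+)\b")


def normalize(instr):
    """Map one assembly line to a standardized form, e.g. 'mov REG, IMM'."""
    text = " ".join(instr.lower().split())
    text = _REG.sub("REG", text)
    return _IMM.sub("IMM", text)


def lcs_mapping(block_a, block_b):
    """Return index pairs (i, j) of instructions matched by an LCS."""
    a = [normalize(x) for x in block_a]
    b = [normalize(x) for x in block_b]
    n, m = len(a), len(b)
    # table[i][j] = LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one optimal mapping.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def block_similarity(block_a, block_b):
    """Hypothetical score: matched instructions over the longer block."""
    longest = max(len(block_a), len(block_b))
    return len(lcs_mapping(block_a, block_b)) / longest if longest else 1.0


block_a = ["mov eax, 1", "add eax, ebx", "push ecx", "ret"]
block_b = ["mov ecx, 7", "xor edx, edx", "add ecx, edx", "ret"]
mapping = lcs_mapping(block_a, block_b)
score = block_similarity(block_a, block_b)
```

In this toy input the two blocks differ in register allocation and in one inserted instruction, yet three of the four instructions still map onto each other. A mapping of this kind would only be the starting point that the neighborhood search described above extends.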
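Key words
program analysis/semantic comparison/reverse engineering/statistical reasoning/transfer learning
Classification
Information Technology and Security Science
Cite this article
GUO Xi, WANG Pan. Semantic Comparison Model for Binary Programs Based on Statistical Reasoning[J]. Acta Electronica Sinica, 2025, 53(1): 163-181.
Funding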
National Natural Science Foundation of China (No.61502194)
National Key Research and Development Program of China (No.2023YFF1000100)
Technology Research Plan of Hubei Provincial Department of Education (No.Q20211405)
Doctoral Research Startup Foundation of Hubei University of Technology (No.XJ2021003601)