Acta Electronica Sinica, 2024, Vol. 52, Issue 4: 1288-1295 (8 pages). DOI: 10.12263/DZXB.20231039
Ransomware Early Detection Method Based on API Latent Semantics
Abstract
Cryptographic ransomware extorts a ransom by encrypting user files. Existing early detection methods that key on the first encryption-related application programming interface (API) call cannot detect ransomware before it begins encrypting. Because the point at which encryption begins varies across ransomware families, existing early detection methods based on fixed time thresholds can accurately detect only a small fraction of ransomware before it begins encrypting. To further improve the timeliness of ransomware detection, this paper proposes the initial phase of operation (IPO), a concept that characterizes the period from the start of a program's execution to its first call of encryption-related dynamic-link libraries (DLLs). Based on an analysis of DLL and API call behavior in the early operational phase of several ransomware samples, this paper presents the ransomware early detection method based on API latent semantics (REDMALS), which takes the API sequences generated by the software within the IPO as the detection object. REDMALS captures the API sequences within the IPO, applies the term frequency-inverse document frequency (TF-IDF) algorithm to generate feature vectors from the captured sequences, applies the latent semantic analysis (LSA) algorithm to extract latent semantic structures, and then uses a machine learning algorithm to build a detection model. The experimental results show that REDMALS with the random forest algorithm achieves 97.7% and 96.0% accuracy on the constructed variant test set and unknown test set, respectively, and that 83% and 76% of the ransomware samples in the two test sets, respectively, can be detected before they perform any encryption behavior.
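The pipeline described in the abstract (truncate the trace at the end of the IPO, then TF-IDF, then LSA, then a random forest) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several things are assumptions: the `CRYPTO_DLLS` set, the `(kind, name)` event format, the helper name `ipo_api_sequence`, the choice of treating each API name as one TF-IDF term, the use of `TruncatedSVD` as the LSA step, and all hyperparameters. The traces are invented toy data, not samples from the paper.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical set of encryption-related DLLs; the abstract does not list them.
CRYPTO_DLLS = {"bcrypt.dll", "cryptsp.dll", "rsaenh.dll"}


def ipo_api_sequence(events):
    """Return the API names called before the first crypto-DLL load.

    `events` is a list of (kind, name) tuples, where kind is "api" or "dll".
    This event format is an assumption made for the sketch.
    """
    apis = []
    for kind, name in events:
        if kind == "dll" and name.lower() in CRYPTO_DLLS:
            break  # the initial phase of operation (IPO) ends here
        if kind == "api":
            apis.append(name)
    return apis


# Toy traces, invented for illustration (1 = ransomware, 0 = benign).
traces = [
    [("api", "GetLogicalDrives"), ("api", "FindFirstFileW"),
     ("api", "FindNextFileW"), ("dll", "bcrypt.dll"), ("api", "BCryptEncrypt")],
    [("api", "FindFirstFileW"), ("api", "FindNextFileW"),
     ("api", "DeleteFileW"), ("dll", "rsaenh.dll"), ("api", "CryptEncrypt")],
    [("api", "CreateWindowExW"), ("api", "GetMessageW"),
     ("api", "DispatchMessageW"), ("dll", "user32.dll")],
    [("api", "RegOpenKeyExW"), ("api", "CreateWindowExW"),
     ("api", "GetMessageW"), ("api", "ReadFile")],
]
labels = [1, 1, 0, 0]

# Each IPO sequence becomes one whitespace-separated "document" of API names.
docs = [" ".join(ipo_api_sequence(trace)) for trace in traces]

model = make_pipeline(
    # One term per API name; keep the original casing of the identifiers.
    TfidfVectorizer(token_pattern=r"\S+", lowercase=False),
    # Truncated SVD on a TF-IDF matrix is the usual way to compute LSA.
    TruncatedSVD(n_components=2, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(docs, labels)
predictions = model.predict(docs)
```

Because the trace is cut at the first load of a DLL in `CRYPTO_DLLS`, calls such as `BCryptEncrypt` never reach the feature extractor, which mirrors the idea of classifying a sample before any encryption takes place. In practice the number of LSA components and the forest size would need tuning on real data.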
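Keywords
ransomware / early detection / API / TF-IDF / latent semantic analysis / random forest
Classification
Information Technology and Security Science
Cite this article
LUO Bin, GUO Chun, SHEN Guowei, CUI Yunhe, CHEN Yi, PING Yuan. Ransomware Early Detection Method Based on API Latent Semantics[J]. Acta Electronica Sinica, 2024, 52(4): 1288-1295, 8.
Funding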
National Natural Science Foundation of China (No. 62162009)
Science and Technology Support Program of Guizhou Province (No. [2022]071)
Big Data Security and Network Security Innovation Team of Guizhou Provincial High Education Institution (No. [2023]052)
Key Technologies R&D Program of Henan Province (No. 222102210048)