Journal of Sichuan University (Natural Science Edition), 2024, Vol. 61, Issue 4: 14-26, 13. DOI: 10.19907/j.0490-6756.2024.040002
An automatical IOC extraction method for reducing dependency on threat intelligence labeling
Abstract
To address increasingly challenging cyber threats, there is an urgent need to analyze them in order to gain an advantage in cyberspace operations. Indicators of Compromise (IOCs), an essential part of Cyber Threat Intelligence (CTI), span the entire cyber attack lifecycle and accurately describe key information (attack behaviors, entities, etc.) at each attack stage. Extracting IOCs from CTI can assist cyber defense, attack tracing, and countermeasures. Existing IOC extraction methods have made great progress with machine learning or deep learning, but they require massive investment to label enough CTI for training and are less effective in scenarios with limited labeled CTI. To tackle this challenge, Automatical IOC Extraction based on Less labeled data (L-AIE), a novel IOC extraction method, is proposed to reduce the labeling cost while maintaining extraction accuracy. L-AIE enhances CTI text processing with fine-grained word tokenization to obtain enough information from less CTI. Context and Combination Layers are used to extract sufficient context for IOC entities that are split into subwords. Furthermore, in the training stage, L-AIE has an additional Relation Layer to widen the differences between IOC categories. Extensive experiments demonstrate that L-AIE not only depends less on the amount of labeled data but also outperforms other leading methods. With only approximately 10% of the training data used in previous experiments, L-AIE achieves a macro F1 score of 87.54%, more than 20% higher than other methods. When the amount of training data is further reduced, the L-AIE extraction results are affected less than half as much as those of the other models.
Key words
Cyber threat / Cyber threat intelligence / Indicator of compromise / Few-shot learning
Classification
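Computer Science and Automation
Cite this article
余坚, 王俊峰, 陈熳熳, 方智阳. An automatical IOC extraction method for reducing dependency on threat intelligence labeling [J]. Journal of Sichuan University (Natural Science Edition), 2024, 61(4): 14-26, 13.
Funding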
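National Natural Science Foundation of China (U2133208)
National Key Research and Development Program of China (2022YFB3305200)
Sichuan University-Luzhou Municipal People's Government Strategic Cooperation Project (2022CDLZ-5)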