软件导刊 (Software Guide), 2025, Vol. 24, Issue (5): 70-78, 9. DOI: 10.11907/rjdk.241251
Sparse Subspace Based Out-of-Distribution Text Detection
Abstract
The goal of out-of-distribution (OOD) detection is to identify samples that do not conform to the training data distribution, so that the model avoids making predictions on unusual cases. OOD detection methods based on pre-trained language models often rely heavily on the data labels of text classification tasks, which limits their performance in practical applications, and unsupervised OOD detection remains under-studied. To overcome this limitation, a new OOD text detection framework, Sparse Subspace-based Out-of-Distribution Text Detection (SSOD), is proposed. The framework requires no labeled data and uses sparse subspaces to jointly model the feature distribution of known data. It constructs the probability density function of an observed sample in its nearest subspace as the scoring metric for OOD detection. Experimental results show that, across different distribution shifts, SSOD improves average AUROC and average FAR95 over the baseline by 2.2% and 4.1%, respectively, surpassing existing supervised methods in overall performance.
Keywords
out-of-distribution detection / pre-trained language model / deep learning / sparse subspace clustering / text classification
Classification
Information Technology and Security Science
Cite this article
王祉苑, 彭涛, 杨捷. Sparse Subspace Based Out-of-Distribution Text Detection [J]. 软件导刊 (Software Guide), 2025, 24(5): 70-78, 9.
Funding
China University Industry-University-Research Innovation Fund (2021ITA05012)