
基于稀疏子空间的分布外文本检测

王祉苑 彭涛 杨捷

软件导刊 (Software Guide), 2025, Vol. 24, Issue 5: 70-78, 9. DOI: 10.11907/rjdk.241251

Sparse Subspace Based Out-of-Distribution Text Detection

王祉苑 (WANG Zhiyuan) 1, 彭涛 (PENG Tao) 1, 杨捷 (YANG Jie) 2

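Author information

  • 1. School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China
  • 2. School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia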

Abstract

Out-of-distribution (OOD) detection aims to identify samples that do not conform to the training data distribution, so that the model avoids making predictions on anomalous inputs. OOD detection methods based on pre-trained language models often rely too heavily on the data labels of text classification tasks, which limits their performance in practical applications, and unsupervised OOD detection remains under-studied. To overcome this limitation, a new OOD text detection framework, Sparse Subspace-based Out-of-Distribution Text Detection (SSOD), is proposed. The framework requires no labeled data: it uses sparse subspaces to jointly model the feature distribution of the known data, and it takes the probability density function of an observed sample in its nearest subspace as the OOD detection score. Experimental results show that, across different distribution shifts, SSOD improves on the baseline by 2.2% in average AUROC and 4.1% in average FAR95, surpassing existing supervised methods in overall performance.
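
The scoring idea in the abstract (model known data with several subspaces, then score a sample by its density in the nearest one) can be illustrated with a minimal sketch. This is not the SSOD implementation: the abstract gives no algorithmic detail, so everything here is an assumption made for illustration. The cluster labels are taken as given (the sparse subspace clustering step is not implemented), each subspace is fitted with plain PCA, the density is a Gaussian with isotropic noise off the subspace, and the names `fit_subspaces` and `subspace_log_density` are hypothetical.

```python
import numpy as np


def fit_subspaces(features, labels, dim):
    """Fit one affine subspace of dimension `dim` per cluster using PCA.

    Assumption: `labels` come from some earlier clustering step (for example
    sparse subspace clustering), which this sketch does not implement.
    """
    models = []
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # Rows of vt are principal directions sorted by singular value.
        _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
        eigvals = s ** 2 / (len(x) - 1)
        basis = vt[:dim]                          # shape (dim, d)
        var = np.maximum(eigvals[:dim], 1e-12)    # variance inside the subspace
        tail = eigvals[dim:]
        # Isotropic noise variance off the subspace (assumed model, floored).
        noise = max(float(tail.mean()), 1e-6) if tail.size else 1e-6
        models.append((mean, basis, var, noise))
    return models


def subspace_log_density(x, models):
    """Gaussian log-density of `x` under its nearest subspace.

    "Nearest" means smallest reconstruction residual. Higher values suggest
    an in-distribution sample, lower values an OOD sample.
    """
    best_residual, best_score = None, None
    for mean, basis, var, noise in models:
        centered = x - mean
        coords = basis @ centered                 # coordinates in the subspace
        residual = np.linalg.norm(centered - basis.T @ coords)
        if best_residual is None or residual < best_residual:
            off_dims = x.shape[0] - basis.shape[0]
            log_pdf = -0.5 * (
                np.sum(coords ** 2 / var)
                + residual ** 2 / noise
                + np.sum(np.log(2 * np.pi * var))
                + off_dims * np.log(2 * np.pi * noise)
            )
            best_residual, best_score = residual, float(log_pdf)
    return best_score
```

In use, one would extract features from a pre-trained language model, fit the subspaces on known data, and threshold the score. The feature extractor, the subspace dimension, and the threshold are all left open here because the abstract does not specify them.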

Key words

out-of-distribution detection/pre-trained language model/deep learning/sparse subspace clustering/text classification

Classification

Information Technology and Security Science

Cite this article

王祉苑, 彭涛, 杨捷. 基于稀疏子空间的分布外文本检测[J]. 软件导刊, 2025, 24(5): 70-78, 9.

Funding

China University Industry-University-Research Innovation Fund (2021ITA05012)

Journal: 软件导刊 (Software Guide), ISSN 1672-7800
