Intelligent Text Classification Method for Open-Source Technology Intelligence Analysis

Peng Peng (彭鹏)¹, Xu Hongjiao (徐红姣)¹

Digital Library Forum (数字图书馆论坛), 2025, Vol. 21, Issue 2: 65-72, 8. DOI: 10.3772/j.issn.1673-2286.2025.02.007

Author information

1. Institute of Scientific and Technical Information of China (中国科学技术信息研究所), Beijing 100038

Abstract

With the explosive growth of network information, identifying valuable technology intelligence in massive volumes of network text and classifying it intelligently has become key to open-source technology intelligence analysis. Based on the characteristics of open-source technology intelligence texts, this paper constructs an integrated text denoising and classification model for open-source technology intelligence analysis. It combines a large language model with a prompt-engineering-based automatic annotation method to annotate both the noise data and the text classification data. A pre-trained language model is built for noise recognition, filtering out texts that are not technology intelligence. Multilingual pre-trained models and distillation techniques are used to improve the loss function design, address the problems of uneven class distribution and insufficient data, and improve, to a certain extent, the accuracy and stability of multi-label technology intelligence text classification. The experimental results show that, compared with the TextCNN and BERT methods, the proposed method achieves better classification performance, robustness, and adaptability.

Keywords

Open-Source Technology Intelligence / Text Classification / Information Filtering / Pre-Trained Language Model

Classification

Information Technology and Security Science

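Cite this article

Peng Peng, Xu Hongjiao. Intelligent Text Classification Method for Open-Source Technology Intelligence Analysis [J]. Digital Library Forum, 2025, 21(2): 65-72, 8.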

Digital Library Forum

ISSN: 1673-2286
