Research on the Automatic Classification Method of Domain Think Tank Texts Based on DistilBERT

WANG Xiaoya¹, MA Jianling¹

Digital Library Forum, 2026, Vol. 22, Issue 2, pp. 1-10 (10 pages). DOI: 10.3772/j.issn.1673-2286.2026.02.001

Author information

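  • 1. Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000; Gansu Provincial Key Laboratory of Knowledge Computing and Decision Intelligence, Lanzhou 730000; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190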

Abstract

With the surge in the number of foreign think tank outputs and their reference value in application scenarios such as policy research and intelligence analysis, the efficient processing and in-depth analysis of massive volumes of think tank text have become core issues in related research and information analysis. Automatic classification of think tank texts is an important foundation for improving information retrieval efficiency and supporting subsequent in-depth analysis. This study aims to construct a domain-oriented automatic classification method for foreign think tank texts to provide technical support for intelligence research and analysis. To address the scarcity of annotated text data and the large differences in classification standards among domain think tanks, it proposes a hybrid automatic classification framework that combines unsupervised topic modeling with fine-tuning of a lightweight pre-trained language model. First, BERTopic performs unsupervised topic discovery to build a data-driven classification standard; then DistilBERT, a knowledge-distilled model, is fine-tuned for domain adaptation to classify domain think tank texts efficiently and accurately. Experiments on think tank texts in the biomedical field show that the method outperforms several baseline models in both F1 score (0.7465) and recall (0.7500), verifying its effectiveness for text classification in domain think tanks.
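The abstract outlines a two-stage pipeline without implementation details. The plain-Python sketch below illustrates only the glue around the two models, under stated assumptions: BERTopic has already assigned one topic ID per document (as returned by `BERTopic.fit_transform`, where -1 marks outlier documents), and a reviewer maps topics to category names through a hypothetical `topic_to_category` dictionary. It then scores predictions with per-class recall and F1 averaged over classes. Macro averaging is an assumption, since the abstract does not say which averaging produced 0.7465 and 0.7500, and the DistilBERT fine-tuning step itself is not shown.

```python
from collections import Counter

OUTLIER_TOPIC = -1  # BERTopic labels documents HDBSCAN treats as outliers with -1


def topics_to_labels(docs, topic_ids, topic_to_category):
    """Turn per-document topic IDs into (text, category) training pairs.

    topic_to_category is a hypothetical, manually reviewed mapping from
    topic IDs to category names; outliers and unmapped topics are dropped.
    """
    pairs = []
    for doc, topic in zip(docs, topic_ids):
        if topic == OUTLIER_TOPIC or topic not in topic_to_category:
            continue
        pairs.append((doc, topic_to_category[topic]))
    return pairs


def macro_recall_f1(y_true, y_pred):
    """Return (macro recall, macro F1): unweighted means of per-class scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    recalls, f1s = [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        recalls.append(rec)
        f1s.append(f1)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


if __name__ == "__main__":
    # Toy biomedical-flavored data; topics and categories are illustrative only.
    docs = ["vaccine policy brief", "drug pricing report",
            "unclustered note", "gene therapy review"]
    mapping = {0: "public health", 1: "drug policy", 2: "biotechnology"}
    print(topics_to_labels(docs, [0, 1, -1, 2], mapping))
    y_true = ["public health", "drug policy", "biotechnology", "biotechnology"]
    y_pred = ["public health", "biotechnology", "biotechnology", "biotechnology"]
    # macro recall = 2/3, macro F1 = 0.6
    print(macro_recall_f1(y_true, y_pred))
```

Macro averaging weights every category equally, which matters when topic-derived classes are imbalanced. scikit-learn's `recall_score` and `f1_score` with `average="macro"` use the same per-class mean and could replace the hand-written function.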

Key words

Think Tank/Think Tank Text/Automatic Text Classification/Topic Modeling/DistilBERT/Biomedicine

Classification

Social Sciences

Cite this article

WANG Xiaoya, MA Jianling. Research on the automatic classification method of domain think tank texts based on DistilBERT[J]. Digital Library Forum, 2026, 22(2): 1-10.

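Funding

This study was funded by the National Science Library, Chinese Academy of Sciences, project "Construction of a Think Tank Resource Assurance Platform and Resource Collection Services" (No. E340090701).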

Digital Library Forum

ISSN 1673-2286
