基于增强语言表示模型的网络新闻长文本分类的研究

江汉大学学报（自然科学版）(Journal of Jianghan University, Natural Science Edition), 2024, Vol. 52, Issue 4: 37-44, 8. DOI: 10.16389/j.cnki.cn42-1737/n.2024.04.004

Long Text Classification for Web News Based on Enhanced Language Representation Model

许楠桸¹, 柯圆圆¹, 胡晓莉¹

Author information

1. School of Artificial Intelligence, Jianghan University, Wuhan 430056, Hubei, China

Abstract

Using real-time news content from the Internet, the authors classify news topics on a time-sensitive Chinese long-text data set. A word segmentation scheme enhanced with annual keywords is used to improve segmentation accuracy. In addition, a long-text compression method is adopted to handle the particular characteristics of Chinese long texts: key sentences are selected, keywords are extracted from the long text with the TF-IDF algorithm, and word vectors are then trained on the combined new text. Finally, an enhanced language representation model is used to classify news topics, and it is compared with six machine learning and deep learning models on recall, accuracy, precision, and F1 score. The experimental results show that the model can effectively classify long real-time news texts by extracting 16 important words.
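
The compression step described in the abstract (key sentences plus TF-IDF keywords, merged into a shorter text) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tfidf_scores`, `top_keywords`, and `compress` are hypothetical, the weighting is the plain textbook form tf × ln(N/df), the documents are assumed to be word-segmented already, and the key sentence is supplied by the caller because the abstract does not say how key sentences are chosen. Only the default of 16 keywords comes from the abstract.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document.

    docs: list of token lists (assumed to be word-segmented already).
    tf = count / document length, idf = ln(N / df). This textbook variant
    is an assumption; the abstract does not state the exact weighting.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=16):
    """Pick the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def compress(key_sentence_tokens, doc_scores, k=16):
    """Hypothetical compression: key-sentence tokens followed by top-k keywords."""
    return list(key_sentence_tokens) + top_keywords(doc_scores, k)


# Toy corpus of three pre-segmented "documents" (illustrative data only).
docs = [
    ["stock", "market", "rises", "stock"],
    ["team", "wins", "match"],
    ["market", "team", "news"],
]
scores = tfidf_scores(docs)
print(top_keywords(scores[0], k=2))  # ['stock', 'rises']
```

In this toy corpus "market" occurs in two of the three documents, so its idf is lower and it ranks below "rises" even though both occur once in the first document. The compressed token list returned by `compress` would then be the input to word-vector training and the classifier.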

Key words

ERNIE model/pretraining model/news classification/long text processing/Chinese text

Classification

Information Technology and Security Science

Cite this article

许楠桸, 柯圆圆, 胡晓莉. 基于增强语言表示模型的网络新闻长文本分类的研究[J]. 江汉大学学报（自然科学版）, 2024, 52(4): 37-44, 8.

Funding

Jianghan University Graduate Research Innovation Fund Project (KYCXJJ202350)

江汉大学学报（自然科学版）

ISSN：1673-0143
