Automatic Summarization of Small Samples Based on Enhanced Regularization

Li Qing (李清), Wan Weibing (万卫兵)

Electronic Science and Technology (电子科技), 2024, Vol. 37, Issue 7: 16-24 (9 pages). DOI: 10.16180/j.cnki.issn1007-7820.2024.07.003

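Author information

  • 1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620 (Li Qing, Wan Weibing)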

Abstract

Automatic text summarization aims to extract the main statements from a text in order to compress its information. Existing generative summarization methods do not take full advantage of the pre-trained model to learn the semantics of the original text, so important information is lost in the generated content, and they are prone to overfitting when the data set contains only a small number of samples. To address these problems and obtain better fine-tuning performance, the pre-trained model mT5 (multilingual T5) is used as the baseline. The model's learning ability is improved by fine-tuning with R-drop (Regularized dropout) as a reinforced regularization, and Sparse softmax is used to reduce the ambiguity of the generated predictions and ensure the accuracy of the output. Hyperparameters are tested by computing BLEU (Bilingual Evaluation Understudy) on the Chinese data sets LCSTS and CSL, and ROUGE is used as the evaluation metric on data sets of different orders of magnitude. The experimental results show that the optimized pre-trained model learns the semantic representation of the original text better, maintains a good fit on small samples, and generates more practical results.
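The abstract names two components without defining them, so the following is only a minimal sketch of how they are commonly formulated, not the authors' implementation. It assumes that R-drop adds a symmetric KL term between two dropout-perturbed forward passes to the summed negative log-likelihood, and that Sparse softmax is a top-k truncated softmax. The function names, the weight `alpha`, and the cutoff `k` are hypothetical, and the two logit vectors stand in for two mT5 forward passes on the same input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax(logits, k):
    """Top-k truncated softmax (assumed formulation): only the k largest
    logits receive probability mass; every other entry is exactly 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(logits_a, logits_b, target, alpha=1.0):
    """R-drop style loss for one token: summed NLL of two dropout passes
    plus alpha times the symmetric KL between their distributions."""
    p, q = softmax(logits_a), softmax(logits_b)
    nll = -math.log(p[target]) - math.log(q[target])
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * sym_kl
```

In this sketch the KL term vanishes when both passes produce identical logits, so the regularizer only penalizes disagreement introduced by dropout. The truncated softmax zeroes out low-scoring tokens, which is one way to read the abstract's "reduce the ambiguity of prediction generation".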

Key words

automatic text summarization/text generation/pre-trained model/small sample data/reinforced regularity/sparse output/semantic representation learning/mT5

Classification

Information Technology and Security Science

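Cite this article

Li Qing, Wan Weibing. Automatic Summarization of Small Samples Based on Enhanced Regularization [J]. Electronic Science and Technology, 2024, 37(7): 16-24, 9.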

Funding

Scientific and Technological Innovation 2030-Major Project of New Generation Artificial Intelligence (2020AAA0109300)

Electronic Science and Technology (电子科技), ISSN 1007-7820
