
计算机应用与软件, 2024, Vol. 41, Issue 7: 145-149, 5. DOI: 10.3969/j.issn.1000-386x.2024.07.022

基于全局自适应宽度注意力改进的Transformer

IMPROVED TRANSFORMER BASED ON GLOBAL ADAPTIVE WIDTH ATTENTION

曾庆威 1, 张建 1, 张鸿昌 1, 谭雨阳 1, 沈文枫 1

Author Information

  • 1. Shanghai University, Shanghai 210000

Abstract

The Transformer is widely used in natural language processing, but long texts force the input to be truncated and cause excessive GPU memory consumption. An existing solution lets the model dynamically determine the attention width of each layer, so that it can attend over the optimal sequence length while keeping the amount of computation and the memory footprint under control. However, this approach has a drawback: the locally optimal attention width of each layer does not yield the optimal attention width for the model as a whole. For this reason, we propose global adaptive width attention (GAA): the attention range of each layer is tied to a global objective so that the model reaches the globally optimal attention range, and the model's feed-forward layer is replaced with a gated-unit feed-forward layer (FFNGLU). Validation on the enwik8 and text8 datasets shows that this method outperforms the baseline at only 25% of the training computation cost.
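The abstract does not spell out how each layer's span is coupled to the global optimum. A minimal PyTorch sketch of one plausible mechanism, assuming GAA builds on the soft span mask of adaptive attention span (Sukhbaatar et al., 2019) and, hypothetically, shares a single span parameter across all layers (the names `GlobalAdaptiveSpanMask`, `global_span` and `layer_scale` are illustrative, not from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAdaptiveSpanMask(nn.Module):
    """Soft attention-span mask in the style of adaptive attention span.

    Hypothetical reading of GAA: one `global_span` parameter is shared by
    all layers, and each layer only learns a small scale on top of it, so
    training optimizes the attention range globally rather than per layer.
    """

    def __init__(self, global_span: nn.Parameter, max_span: int, ramp: int = 32):
        super().__init__()
        self.global_span = global_span  # shared across all layers
        self.layer_scale = nn.Parameter(torch.tensor(1.0))  # per-layer adjustment
        self.max_span = max_span
        self.ramp = ramp  # softness R of the mask edge

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, heads, key_len) raw attention scores of the current
        # query over the previous key_len positions (most recent key last).
        key_len = scores.size(-1)
        # Effective span z in [0, max_span], driven by the shared parameter.
        z = self.max_span * torch.sigmoid(self.global_span * self.layer_scale)
        # Distance of each key from the query: key_len-1, ..., 1, 0.
        dist = torch.arange(key_len - 1, -1, -1,
                            device=scores.device, dtype=scores.dtype)
        # m(x) = clamp((R + z - x) / R, 0, 1): 1 inside the span,
        # then a linear ramp down to 0 beyond it.
        mask = torch.clamp((self.ramp + z - dist) / self.ramp, 0.0, 1.0)
        probs = F.softmax(scores, dim=-1) * mask
        return probs / probs.sum(dim=-1, keepdim=True).clamp(min=1e-8)

# Usage sketch: every layer's mask holds the same shared span parameter.
shared_span = nn.Parameter(torch.tensor(0.0))
masks = [GlobalAdaptiveSpanMask(shared_span, max_span=1024) for _ in range(12)]
attn = masks[0](torch.randn(2, 8, 1024))  # renormalized attention weights
```

Because `shared_span` receives gradients from every layer, shortening one layer's span and lengthening another's trade off against each other in a single objective, which matches the abstract's claim of a globally (rather than per-layer) optimal attention range.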


Key words

Transformer/Global adaptive width attention/FFNGLU
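FFNGLU in the keywords denotes a feed-forward layer gated by a gated linear unit. The paper's exact variant is not given on this page; a minimal sketch following the FFN_GLU form from Shazeer, "GLU Variants Improve Transformer" (2020), i.e. FFN_GLU(x) = (σ(xW) ⊙ xV)W2, which is one standard way to gate the feed-forward block:

```python
import torch
import torch.nn as nn

class FFNGLU(nn.Module):
    """Feed-forward layer with a gated linear unit:
    FFN_GLU(x) = (sigmoid(x W) * x V) W2, as in Shazeer (2020)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)   # W: produces the gate
        self.value = nn.Linear(d_model, d_ff, bias=False)  # V: produces the values
        self.out = nn.Linear(d_ff, d_model, bias=False)    # W2: projects back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.sigmoid(self.gate(x)) * self.value(x))

# Drop-in replacement for the standard two-layer FFN block of a Transformer.
ffn = FFNGLU(d_model=512, d_ff=2048)
y = ffn(torch.randn(2, 128, 512))  # (batch, seq_len, d_model)
```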

Classification

Information Technology and Security Science

Citation

曾庆威, 张建, 张鸿昌, 谭雨阳, 沈文枫. 基于全局自适应宽度注意力改进的Transformer[J]. 计算机应用与软件, 2024, 41(7): 145-149, 5.

Funding

Shanghai Engineering Research Center of Intelligent Computing System Project (19DZ2252600)

National Key Research and Development Program of China (2017YFB0701600)

Science and Technology Commission of Shanghai Municipality Project (19511121002)

计算机应用与软件 (Computer Applications and Software)

OA · 北大核心 · CSTPCD

ISSN 1000-386X
