
Gated Transformer Based on Prob-Sparse Attention

Abstract

In reinforcement learning, the agent encodes the state sequence and uses historical information to guide action selection. This is typically modeled with a recurrent neural network, which suffers from vanishing and exploding gradients and therefore handles long sequences poorly. The Transformer, built around self-attention, can effectively integrate information over long time horizons, but applying a standard Transformer directly to reinforcement learning leads to unstable training and high computational complexity. The Gated Transformer-XL (GTrXL) resolves the training instability of Transformers in reinforcement learning, yet its computational complexity remains high. To address this problem, this study proposes a gated Transformer with prob-sparse attention (PS-GTr), which introduces a prob-sparse attention mechanism on top of the identity map reordering and gating mechanism of GTrXL, reducing time and space complexity and further improving training efficiency. Experiments show that PS-GTr matches GTrXL in performance on reinforcement learning tasks while requiring less training time and less memory.
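Below is a minimal sketch of the prob-sparse attention idea summarized in the abstract, following the ProbSparse attention of Informer that the paper builds on: only a small set of "dominant" queries receives full attention, while the remaining queries fall back to a cheap default output. The function name, tensor shapes, the sampling factor c, and the use of the mean of V as the fallback are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of prob-sparse self-attention (ProbSparse-style), for illustration only.
import math
import torch


def prob_sparse_attention(q, k, v, c=5):
    """q, k, v: (batch, length, d_model). Returns (batch, length, d_model)."""
    B, L, D = q.shape
    scale = 1.0 / math.sqrt(D)

    # 1. Sample a subset of keys to estimate each query's sparsity score cheaply.
    u_keys = min(L, int(c * math.ceil(math.log(L + 1))))
    idx = torch.randint(0, L, (u_keys,), device=q.device)
    scores_sample = torch.einsum("bld,bsd->bls", q, k[:, idx, :]) * scale  # (B, L, u_keys)

    # 2. Sparsity measure: max score minus mean score; "dominant" queries score high.
    m = scores_sample.max(dim=-1).values - scores_sample.mean(dim=-1)      # (B, L)

    # 3. Keep only the top-u queries; all other positions use the mean of V.
    u_queries = min(L, int(c * math.ceil(math.log(L + 1))))
    top = m.topk(u_queries, dim=-1).indices                                # (B, u_queries)
    out = v.mean(dim=1, keepdim=True).expand(B, L, D).clone()              # fallback output
    q_top = torch.gather(q, 1, top.unsqueeze(-1).expand(-1, -1, D))        # (B, u_queries, D)

    # 4. Full attention only for the selected queries: O(u*L) instead of O(L^2).
    attn = torch.softmax(torch.einsum("bud,bsd->bus", q_top, k) * scale, dim=-1)
    out.scatter_(1, top.unsqueeze(-1).expand(-1, -1, D),
                 torch.einsum("bus,bsd->bud", attn, v))
    return out
```

As the abstract describes, such a module would replace the dense multi-head attention inside each GTrXL block, while the identity map reordering and the gating layer of GTrXL are kept unchanged.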

赵婷婷;丁翘楚;马冲;陈亚瑞;王嫄

College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China

Computer and Automation

deep reinforcement learning; self-attention; prob-sparse attention

《天津科技大学学报》 2024 (3)

56-63 (8 pages)

Supported by the National Natural Science Foundation of China (61976156) and the Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560)

10.13364/j.issn.1672-6510.20230145
