高技术通讯, 2025, Vol. 35, Issue 2: 167-174. DOI: 10.3772/j.issn.1002-0470.2025.02.006
Lite-IJformer: a lightweight method for long-sequence Transformers
Abstract
Aiming at the high computational complexity of long-sequence Transformers, this paper proposes a lightweight method called Lite-IJformer. The core idea consists of two steps: (1) linearize the self-attention to reduce its computational complexity from quadratic to linear; (2) based on low-rank matrix decomposition theory, reduce the dimension of the KV matrix multiplication to further shrink the computation. Experiments on the Long Range Arena (LRA) benchmark show that when the input sequence length is 1000-2000, linearization reduces the computation of self-attention by a factor of 13-26 and improves inference speed by 4.75-5.72 times without precision loss. After dimension reduction, the computation of self-attention is further reduced by 17.0%, and the model's inference speed increases by another 1.17 times, with a precision loss within 0.5%.
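To make the two steps concrete, the sketch below illustrates the general idea in NumPy. It is not the paper's implementation: the kernel feature map (elu + 1), the placement of the low-rank projection on V, and the projection matrix P are all assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive kernel feature map commonly used to linearize
    # attention (an assumption; the paper's exact kernel is not given here).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    # Softmax attention costs O(n^2 * d) because it forms the n x n score
    # matrix. Reordering to phi(Q) @ (phi(K)^T @ V) costs O(n * d^2):
    # the sequence length n no longer appears in a quadratic term.
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d)
    kv = Kf.T @ V                             # (d, d_v), independent of n
    norm = Qf @ Kf.sum(axis=0)[:, None]       # (n, 1) normalizer
    return (Qf @ kv) / (norm + 1e-6)

def low_rank_linear_attention(Q, K, V, P):
    # Step 2 sketch: project V onto a rank-r subspace before the KV product,
    # shrinking the (d, d_v) intermediate to (d, r). P is a hypothetical
    # (d_v, r) projection, e.g. from a truncated SVD of learned weights;
    # the paper's exact factorization may differ.
    out_r = linear_attention(Q, K, V @ P)     # attention in r dimensions
    return out_r @ P.T                        # lift back to d_v

if __name__ == "__main__":
    n, d, r = 1000, 64, 32
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((n, d))
    K = rng.standard_normal((n, d))
    V = rng.standard_normal((n, d))
    P = np.linalg.qr(rng.standard_normal((d, r)))[0]    # orthonormal (d, r)
    print(low_rank_linear_attention(Q, K, V, P).shape)  # (1000, 64)
```

For scale: assuming a per-head dimension of d = 64, the cost ratio between quadratic and linear attention is roughly n^2 d / (n d^2) = n / d, i.e. about 16-31 for n = 1000-2000, which is consistent in order of magnitude with the 13-26 times reduction reported in the abstract.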
Keywords
Transformer / self-attention / linearization method / dimension reduction
Citation
连家诚, 郝一帆, 张曦珊, 支天, 孙广中. Lite-IJformer: a lightweight method for long-sequence Transformers [J]. 高技术通讯, 2025, 35(2): 167-174.
Funding
Supported by the National Key Research and Development Program of China (2022YFB4501601), the National Natural Science Foundation of China (U22A2028, U20A20227), and the CAS Project for Young Scientists in Basic Research (YSBR-029).