A high-throughput memory management unit for irregular memory access in deep learning [OA; PKU Core (北大核心); CSTPCD]
HTMMU: a memory management unit for irregular memory access in deep learning
The diversification and growing complexity of artificial intelligence applications lead to irregular memory access patterns in their models, that is, bursty access requests to sparse addresses. This poses a challenge for deploying intelligent applications on mobile devices with strictly limited memory resources. Under such irregular access, address translation in the memory management unit (MMU) of existing architectures suffers from low throughput and long latency, which makes the MMU a bottleneck on the system's memory access path. To address this problem, this paper proposes a novel high-throughput MMU architecture (HTMMU). HTMMU uses multi-stream parallelism, strengthens the filtering of redundant requests, and allocates the limited on-chip storage more reasonably, so that address translation for irregular accesses is handled with high throughput and low latency and system memory access efficiency improves. Experimental results show that, when handling the bursty, sparse memory accesses in artificial intelligence algorithms, HTMMU achieves an average speedup of 2.43 times over current mainstream MMU designs and reduces average access latency by 65.9% (to 34.1% of the baseline), while keeping the additional area overhead within 3.0%.
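As a rough illustration of what filtering redundant translation requests can mean in general, the sketch below coalesces a burst of virtual addresses by virtual page number so that each distinct page is looked up only once. It is not the HTMMU design, for which the abstract gives no implementation detail. The function names `coalesce_requests` and `translate_burst`, the dictionary used as a page table, and the 4 KiB page size are all assumptions made for this example.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def coalesce_requests(addresses, page_shift=PAGE_SHIFT):
    """Group virtual addresses by virtual page number (VPN), in first-seen order."""
    pending = OrderedDict()
    for addr in addresses:
        vpn = addr >> page_shift
        pending.setdefault(vpn, []).append(addr)
    return pending


def translate_burst(addresses, page_table, page_shift=PAGE_SHIFT):
    """Translate a burst of addresses with one page-table lookup per unique page.

    page_table is a hypothetical dict mapping VPN -> physical page number (PPN).
    Returns (mapping of virtual -> physical address, number of lookups performed).
    """
    offset_mask = (1 << page_shift) - 1
    translated = {}
    lookups = 0
    for vpn, addrs in coalesce_requests(addresses, page_shift).items():
        ppn = page_table[vpn]  # stands in for a single page walk
        lookups += 1
        for addr in addrs:
            # Requests to the same page reuse the result instead of walking again.
            translated[addr] = (ppn << page_shift) | (addr & offset_mask)
    return translated, lookups


if __name__ == "__main__":
    burst = [0x1000, 0x1008, 0x5004, 0x1010, 0x5008]
    table = {0x1: 0xA, 0x5: 0xB}
    mapping, lookups = translate_burst(burst, table)
    print(lookups, {hex(k): hex(v) for k, v in mapping.items()})
```

In this toy burst, five requests touch only two pages, so two lookups suffice. A hardware MMU would do the equivalent with comparators and queues, not dictionaries.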
Ding Feng (丁峰); Li Xi (李曦)
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026
Keywords: memory management unit (MMU); address translation; irregular memory access; deep learning; high throughput
《高技术通讯》 (Chinese High Technology Letters), 2024, Issue 7
Pages 714-725 (12 pages)
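Supported by the National Natural Science Foundation of China (U20A20227, U22A2028), the CAS Project for Young Scientists in Basic Research (YSBR-029), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.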