Memory controller-side prefetching based on intra-row locality
This paper proposes a memory controller-side prefetching method based on intra-row locality. A bitmap records the state of each data block in a row. Each row is divided into regions, and the access locality of each region is quantified; the locality within a region determines how aggressively to prefetch. For regions with low locality, the blocks in the region that have not yet been accessed are prefetched; for regions with high locality, cross-region prefetching is applied as well. The region size is adjusted dynamically to track changes in locality. The method is implemented and evaluated on the Loongson 3A6000 processor using memory-intensive applications from SPEC CPU2006. The results show that it improves instructions per cycle (IPC) by 6.51% on average, with a maximum single-thread IPC gain of 46.80% (bwaves) and a maximum dual-core, four-thread IPC gain of 26.22% (lbm).
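The policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' hardware design: the row geometry (`BLOCKS_PER_ROW`), the fixed region size, the locality metric (fraction of demand-accessed blocks in a region), and the `HIGH_LOCALITY` threshold are all assumptions chosen for illustration, and the dynamic region resizing is not modeled.

```python
from dataclasses import dataclass

BLOCKS_PER_ROW = 128   # assumption: 8 KiB DRAM row with 64 B blocks
HIGH_LOCALITY = 0.5    # assumption: threshold separating low from high locality


@dataclass
class RowTracker:
    """Per-row state: one bit per block, kept in two bitmaps (hypothetical layout)."""
    region_blocks: int = 8   # assumption: fixed region size, in blocks
    accessed: int = 0        # bit i set => block i was demand-accessed
    prefetched: int = 0      # bit i set => block i was already prefetched

    def region_range(self, region: int) -> range:
        start = region * self.region_blocks
        return range(start, min(start + self.region_blocks, BLOCKS_PER_ROW))

    def locality(self, region: int) -> float:
        # Assumed metric: fraction of the region's blocks that were demand-accessed.
        blocks = self.region_range(region)
        return sum((self.accessed >> b) & 1 for b in blocks) / len(blocks)

    def pending(self, region: int) -> list[int]:
        # Blocks in the region that were neither accessed nor prefetched yet.
        seen = self.accessed | self.prefetched
        return [b for b in self.region_range(region) if not (seen >> b) & 1]

    def access(self, block: int) -> list[int]:
        """Record a demand access and return the blocks to prefetch."""
        self.accessed |= 1 << block
        region = block // self.region_blocks
        candidates = self.pending(region)          # low locality: stay in region
        last_region = (BLOCKS_PER_ROW - 1) // self.region_blocks
        if self.locality(region) >= HIGH_LOCALITY and region < last_region:
            candidates += self.pending(region + 1)  # high locality: cross region
        for b in candidates:
            self.prefetched |= 1 << b
        return candidates
```

Under these assumptions, the first access to a region triggers a prefetch of the rest of that region only; once half of the region has been demand-accessed, the next region is prefetched as well.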
Zhou Shuxin (周叔欣); Zhang Jianqi (张见齐); Wang Huandong (王焕东); Zhang Longbing (章隆兵)
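State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190 || Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 || University of Chinese Academy of Sciences, Beijing 100049 || Loongson Technology Corporation Limited, Beijing 100190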
memory controller; prefetch; locality
Chinese High Technology Letters (《高技术通讯》), 2024 (003)
pp. 248-255 (8 pages)
Supported by the National Key Research and Development Program of China (2022YFB3105100).