Computer Engineering & Science (计算机工程与科学), 2011, Vol. 33, Issue 3: 41-45 (5 pages). DOI: 10.3969/j.issn.1007-130X.2011.03.008
模板操作在GPU上的实现与优化
Implementation and Optimization of Stencil Applications on GPUs
Abstract
With the rapid development of GPUs, using them to accelerate scientific computing applications is becoming an inevitable trend. In this paper, we port two typical subroutines, Rprj3 and Interp, from Mgrid, a SPEC2000 benchmark rich in stencil operations, to an AMD GPU using Brook+. Using the thread granularity tuning mechanism provided by Brook+, we implement several versions of the ported programs and analyze their performance. We also summarize how thread granularity tuning can be used to optimize the porting of stencil programs. Our experimental results show that, at the largest problem size, Rprj3 achieves a speedup of 5.37 over its CPU version, while Interp achieves a speedup of 12.8 over its CPU version.
Key words
GPU / optimization / stencil
Classification
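Information Technology and Security Science
Cite this article
Fang Xudong (方旭东), Tang Yuhua (唐玉华), Wang Guibin (王桂彬), Tang Tao (唐滔). Implementation and Optimization of Stencil Applications on GPUs [J]. Computer Engineering & Science, 2011, 33(3): 41-45.
Funding
Supported by the National Natural Science Foundation of China (60621003)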