Implementation of a High-Performance Goto BLAS Library Based on the Loongson 3A2000 Processor

Zhang Hualiang (张华亮)¹, Huang Qiyin (黄启印)², Wu Shaoxiao (吴少校)³

Chinese High Technology Letters (高技术通讯), 2016, Vol. 26, Issue 10, pp. 825-832 (8 pages). DOI: 10.3772/j.issn.1002-0470.2016.10-11.001

Author affiliations

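  • 1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  • 2. University of Chinese Academy of Sciences, Beijing 100049
  • 3. Loongson Technology Corporation Limited, Beijing 100190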

Abstract

Linpack is used to evaluate the performance of computer systems, with the Goto BLAS library serving as its underlying math library, so the performance of that library has a large impact on Linpack results. To achieve high performance, this study examined how the Goto BLAS library performs on the Loongson 3A2000 processor and analyzed the execution flow and data-processing methods of the test software. Then, based on the architectural features of the processor, it chose a suitable matrix-blocking scheme and optimized the implementation of the core loop of the function. In addition, software and hardware data prefetching and an optimized TLB configuration were adopted. With these optimizations combined, the efficiency of the floating-point unit on the simulation platform exceeded 90%, showing that the optimization schemes were effective in this experiment.
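The abstract names three techniques (blocking the matrices to fit the memory hierarchy, optimizing the core loop, and data prefetching) without showing how they fit together. The following is a minimal, portable C sketch of the general Goto-style blocked DGEMM structure, not the authors' Loongson implementation. The block sizes `MC`, `KC`, `NC`, `MR`, `NR` and the prefetch distance `PF_DIST` are hypothetical placeholders rather than values tuned for the 3A2000; the function names `dgemm_blocked`, `pack_a`, `pack_b` and `micro_kernel` are illustrative; and `__builtin_prefetch` is a GCC/Clang builtin standing in for the software prefetch instructions a hand-written assembly kernel would use. Hardware prefetching and TLB configuration are not represented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes for illustration only; NOT the values used in the paper. */
enum { MR = 4, NR = 4, MC = 64, KC = 128, NC = 256, PF_DIST = 8 };

static double Ac[MC * KC]; /* packed MC x KC block of A */
static double Bc[KC * NC]; /* packed KC x NC panel of B */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Pack an mc x kc block of column-major A (leading dim lda) into MR-row slivers,
   zero-padding the last sliver so the micro-kernel needs no bounds checks. */
static void pack_a(int mc, int kc, const double *A, int lda) {
    double *dst = Ac;
    for (int i0 = 0; i0 < mc; i0 += MR)
        for (int p = 0; p < kc; ++p)
            for (int i = 0; i < MR; ++i)
                *dst++ = (i0 + i < mc) ? A[(i0 + i) + (size_t)p * lda] : 0.0;
}

/* Pack a kc x nc panel of column-major B (leading dim ldb) into NR-column slivers. */
static void pack_b(int kc, int nc, const double *B, int ldb) {
    double *dst = Bc;
    for (int j0 = 0; j0 < nc; j0 += NR)
        for (int p = 0; p < kc; ++p)
            for (int j = 0; j < NR; ++j)
                *dst++ = (j0 + j < nc) ? B[p + (size_t)(j0 + j) * ldb] : 0.0;
}

/* Micro-kernel (the "core loop"): accumulate an MR x NR tile, software-prefetch
   upcoming packed A data, then add the valid mr x nr part of the tile into C. */
static void micro_kernel(int kc, const double *a, const double *b,
                         double *C, int ldc, int mr, int nr) {
    double acc[MR][NR] = {{0.0}};
    for (int p = 0; p < kc; ++p) {
        if (p + PF_DIST < kc) /* stay inside the packed sliver */
            __builtin_prefetch(&a[(p + PF_DIST) * MR], 0, 3);
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
    }
    for (int j = 0; j < nr; ++j)
        for (int i = 0; i < mr; ++i)
            C[i + (size_t)j * ldc] += acc[i][j];
}

/* C += A * B, all column-major; A is m x k, B is k x n, C is m x n. */
void dgemm_blocked(int m, int n, int k, const double *A, int lda,
                   const double *B, int ldb, double *C, int ldc) {
    assert(A != NULL && B != NULL && C != NULL);
    for (int jc = 0; jc < n; jc += NC) {          /* NC-wide panels of B and C */
        int nc = min_int(NC, n - jc);
        for (int pc = 0; pc < k; pc += KC) {      /* KC-deep slices of the k dimension */
            int kc = min_int(KC, k - pc);
            pack_b(kc, nc, &B[pc + (size_t)jc * ldb], ldb);
            for (int ic = 0; ic < m; ic += MC) {  /* MC-tall blocks of A and C */
                int mc = min_int(MC, m - ic);
                pack_a(mc, kc, &A[ic + (size_t)pc * lda], lda);
                for (int jr = 0; jr < nc; jr += NR)
                    for (int ir = 0; ir < mc; ir += MR)
                        micro_kernel(kc, &Ac[ir * kc], &Bc[jr * kc],
                                     &C[(ic + ir) + (size_t)(jc + jr) * ldc], ldc,
                                     min_int(MR, mc - ir), min_int(NR, nc - jr));
            }
        }
    }
}
```

In a production Goto BLAS kernel the micro-kernel is typically hand-written in assembly so that the MR × NR accumulator stays in registers. Packing A and B into contiguous slivers lets the inner loop stream through memory with unit stride, which is commonly cited as a way to reduce cache conflicts and TLB misses; the block sizes are then chosen so that the packed blocks fit the target processor's caches.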

Key words

Goto BLAS/performance optimization/Linpack/matrix operations/data prefetching

Cite this article

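Zhang Hualiang, Huang Qiyin, Wu Shaoxiao. Implementation of a high-performance Goto BLAS library based on the Loongson 3A2000 processor[J]. Chinese High Technology Letters, 2016, 26(10): 825-832.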

Funding

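Supported by the National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software (核高基, 2014ZX01020201) and the National High Technology Research and Development Program of China (863 Program) (2012AA012202, 2013AA014301).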

Chinese High Technology Letters (高技术通讯)

Open access · Peking University Core Journal (北大核心) · CSTPCD

ISSN 1002-0470
