Research on FPGA Deployment and Acceleration of Graph Convolutional Neural Networks (OA; PKU Core; CSTPCD)
The graph convolutional network (GCN) algorithm has achieved breakthrough success on tasks involving graph-structured data. However, training a GCN requires large amounts of memory and many random memory accesses, which limits further deployment of the algorithm. Most existing GCN deployment and acceleration schemes are based on the Vitis HLS tool, which is programmed in C/C++; schemes written directly in a hardware description language (HDL) are rare, so hardware-software acceleration remains incomplete. To address this problem, an FPGA deployment and acceleration architecture for GCNs is designed. The architecture consists of a computing module and a storage module, both implemented in an HDL. The computing module uses the HDL to implement the key GCN algorithm, mapping it onto the field-programmable gate array (FPGA) for hardware acceleration. The cache module instantiates read-only memory (ROM) IP cores and defines two-dimensional register files to store the input node features, the normalized adjacency matrix, the quantized parameters of each layer, and intermediate variables, thereby increasing the parallelism of the GCN algorithm. The model is first trained on the PyCharm platform and its parameters are extracted and quantized; the GCN is then designed and simulated on the Vivado platform, and its computational performance is compared with that of a CPU and a GPU. The experimental results show that the proposed GCN acceleration architecture improves the inference speed of the model.
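The abstract refers to the key GCN algorithm, a normalized adjacency matrix, and per-layer quantized parameters. The following NumPy sketch is not taken from the paper; the function names and the symmetric per-tensor int8 scheme are illustrative assumptions. It shows the standard GCN propagation rule H' = ReLU(Â H W) with symmetric adjacency normalization, which is the kind of computation the architecture maps onto the FPGA.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization A_hat = D^{-1/2} (A + I) D^{-1/2},
    the normalized adjacency matrix used by the standard GCN rule."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def quantize_int8(w):
    """Illustrative symmetric per-tensor int8 quantization: returns
    integer weights plus the scale needed to dequantize them."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def gcn_layer(a_hat, h, w_q, scale):
    """One GCN layer H' = ReLU(A_hat @ H @ W), computed here with
    dequantized int8 weights; on the FPGA the integer weights would
    be held in ROM and the intermediates in register files."""
    w = w_q.astype(np.float32) * scale
    return np.maximum(a_hat @ h @ w, 0.0)
```

With symmetric quantization the round-trip error per weight is bounded by half the scale, which is why extracting and quantizing the trained parameters (as described in the abstract) can preserve inference accuracy while letting the FPGA store them compactly.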
高强;邵春霖;李京润;沈宗凯
Kunming Cigarette Factory, Hongyun Honghe Tobacco (Group) Co., Ltd., Kunming 650000, Yunnan, China; Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000, Yunnan, China
Electronic Information Engineering
Keywords: graph convolutional network (GCN); FPGA accelerator; hardware description language; computing module; storage module; parameter quantization
《现代电子技术》 (Modern Electronics Technique), 2024(10)
Pages 39-46 (8 pages)