
OpenLM: A multi-platform and high-performance large language model inference framework


计算机工程与科学 (Computer Engineering & Science), 2025, Vol. 47, Issue 12: 2129-2138. DOI: 10.3969/j.issn.1007-130X.2025.12.005


LIU Gao¹, XU Jianliang², ZHANG Xianyi³, LIU Xiandong³

Author Information

  • 1. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China; Peng Feng (Beijing) Technology Co., Ltd., Beijing 100080, China
  • 2. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
  • 3. Peng Feng (Beijing) Technology Co., Ltd., Beijing 100080, China


Abstract

As computational devices continue to diversify and computational power grows rapidly, the increasing number of large language models (LLMs) has made efficient multi-model inference across heterogeneous platforms a complex and formidable challenge. To address this, we propose OpenLM, a high-performance inference framework that supports efficient deployment of multiple LLMs on diverse hardware platforms. OpenLM offers extensive model compatibility, providing efficient performance support for a wide range of models. It incorporates high-performance computing operators optimized for multiple platforms and architectures to maximize hardware performance. Meanwhile, OpenLM features a flexible framework architecture that facilitates rapid integration and support for the latest models. To further optimize memory consumption (both GPU and CPU memory), task scheduling, and system stability during inference, the framework introduces features such as the PagedAttention mechanism, dynamic batching, weight quantization, and KV cache quantization. Experimental results show that these optimization strategies effectively enhance inference efficiency, reduce resource overhead, and bolster overall framework performance.
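The PagedAttention idea mentioned in the abstract can be illustrated with a minimal sketch: the KV cache is divided into fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks, so cache memory is allocated on demand instead of being reserved contiguously per sequence. The class and method names below are illustrative assumptions for exposition only, not OpenLM's actual API.

```python
class PagedKVCache:
    """Toy block manager in the spirit of PagedAttention:
    fixed-size blocks, per-sequence block tables, on-demand allocation."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of free physical block ids
        self.tables = {}                     # seq_id -> list of physical block ids
        self.lens = {}                       # seq_id -> tokens cached so far

    def append(self, seq_id: int) -> tuple[int, int]:
        """Reserve one KV slot for the sequence's next token.
        A new physical block is taken only when the last block is full.
        Returns (physical_block, offset) for the new entry."""
        n = self.lens.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:         # last block full (or first token)
            if not self.free:
                raise MemoryError("KV cache exhausted; scheduler must preempt")
            table.append(self.free.pop())
        self.lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lens.pop(seq_id, None)
```

Because blocks are recycled the moment a sequence finishes, many requests can share one cache pool with little fragmentation, which is what makes dynamic batching of variable-length requests practical.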

Key words

deep learning / large language model (LLM) / high-performance computing (HPC) / LLM inference framework

Category

Information Technology and Security Science

Cite this article

LIU Gao, XU Jianliang, ZHANG Xianyi, LIU Xiandong. OpenLM: A multi-platform and high-performance large language model inference framework [J]. 计算机工程与科学 (Computer Engineering & Science), 2025, 47(12): 2129-2138.

计算机工程与科学 (Computer Engineering & Science) · Open Access · PKU Core Journal · ISSN 1007-130X
