
OpenLM: A multi-platform and high-performance large language model inference framework


计算机工程与科学 (Computer Engineering & Science), 2025, Vol. 47, Issue 12: 2129-2138. DOI: 10.3969/j.issn.1007-130X.2025.12.005


LIU Gao¹, XU Jianliang², ZHANG Xianyi³, LIU Xiandong³

Author Information

  • 1. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China; Peng Feng (Beijing) Technology Co., Ltd., Beijing 100080, China
  • 2. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
  • 3. Peng Feng (Beijing) Technology Co., Ltd., Beijing 100080, China


Abstract

As computational devices continue to diversify and computational power grows rapidly, the increasing number of large language models (LLMs) has made efficient multi-model inference across heterogeneous platforms a complex and formidable challenge. To address this, we propose OpenLM, a high-performance inference framework that supports efficient deployment of multiple LLMs on diverse hardware platforms. OpenLM offers extensive model compatibility, providing efficient performance support for a wide range of models. It incorporates high-performance computing operators optimized for multiple platforms and architectures to maximize hardware performance. Meanwhile, OpenLM features a flexible framework architecture that facilitates rapid integration and support for the latest models. To further optimize memory consumption (both GPU and CPU memory), task scheduling, and system stability during inference, the framework introduces features such as the PagedAttention mechanism, dynamic batching, weight quantization, and KV cache quantization. Experimental results show that these optimization strategies effectively enhance inference efficiency, reduce resource overhead, and bolster overall framework performance.
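The PagedAttention idea mentioned in the abstract can be illustrated with a minimal sketch: the KV cache is divided into fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks, so cache memory is allocated on demand instead of being reserved contiguously per sequence. The class and method names below are illustrative assumptions for exposition only, not OpenLM's actual API.

```python
class PagedKVCache:
    """Toy block manager in the spirit of PagedAttention:
    fixed-size blocks, per-sequence block tables, on-demand allocation."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of free physical block ids
        self.tables = {}                     # seq_id -> list of physical block ids
        self.lens = {}                       # seq_id -> tokens cached so far

    def append(self, seq_id: int) -> tuple[int, int]:
        """Reserve one KV slot for the sequence's next token.
        A new physical block is taken only when the last block is full.
        Returns (physical_block, offset) for the new entry."""
        n = self.lens.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:         # last block full (or first token)
            if not self.free:
                raise MemoryError("KV cache exhausted; scheduler must preempt")
            table.append(self.free.pop())
        self.lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lens.pop(seq_id, None)
```

Because blocks are recycled the moment a sequence finishes, many requests can share one cache pool with little fragmentation, which is what makes dynamic batching of variable-length requests practical.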

Key words

deep learning / large language model (LLM) / high-performance computing (HPC) / LLM inference framework

Category

Information Technology and Security Science

Cite this article

LIU Gao, XU Jianliang, ZHANG Xianyi, LIU Xiandong. OpenLM: A multi-platform and high-performance large language model inference framework [J]. 计算机工程与科学 (Computer Engineering & Science), 2025, 47(12): 2129-2138.

计算机工程与科学 (Computer Engineering & Science) · Open Access · PKU Core Journal · ISSN 1007-130X
