Output-activation-guided channel-level adaptive-sparsity pruning for large models

李沛鸿 贺傍 周彤昕 李丽 傅玉祥

Journal of Nanjing University (Natural Science), 2026, Vol. 62, Issue 3: 422-433, 12. DOI: 10.13232/j.cnki.jnju.2026.03.008

OGAS: Output-activation guided pruning with adaptive sparsity for large language models

李沛鸿¹, 贺傍¹, 周彤昕¹, 李丽², 傅玉祥¹

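Author information

  • 1. School of Integrated Circuits, Nanjing University, Suzhou 215163
  • 2. VLSI Laboratory, School of Electronic Science and Engineering, Nanjing University, Nanjing 210023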

Abstract

Post-training pruning (PTP) has emerged as an efficient compression technique for addressing the limited computational resources and excessive memory footprint encountered when deploying large language models (LLMs) at the edge. However, existing mainstream methods (e.g., Wanda and SparseGPT) typically employ uniform layer-wise sparsity strategies, overlooking the significant heterogeneity in information contribution across different layers and channels. Moreover, their evaluation criteria predominantly focus on input-side intensity, making it difficult to identify high-energy static redundant channels, which leads to severe performance degradation under high compression ratios. To address these limitations, this paper proposes OGAS, an Output-activation Guided Adaptive Sparsity pruning method at the channel level. First, a dual evaluation metric is constructed by integrating the output activation energy norm with the Peak-to-Average Power Ratio (PAPR), so that sparse key features are identified and protected along two dimensions: response intensity and feature specificity. Second, a continuous mapping mechanism based on non-linear curvature is designed to allocate channel-level sparsity dynamically and adaptively within a continuous space. Furthermore, a closed-loop optimization workflow is established by introducing the Golden Section Search algorithm, which automates the layer-wise tuning of critical hyperparameters. Experimental results on mainstream open-source models, including LLaMA-3 and Mistral, show that at a 50% sparsity ratio, OGAS reduces the perplexity (PPL) of LLaMA-3.1-8B on the WikiText-2 dataset to 7.99, a significant improvement over the state-of-the-art first-order method Wanda (8.85). On common-sense reasoning tasks, the average zero-shot accuracy reaches 63.46%, a 1.6% improvement over Wanda. The results verify that OGAS effectively maintains the semantic understanding and logical reasoning capabilities of models after large-scale compression, exhibiting superior robustness and versatility across different model architectures.
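
The abstract names its building blocks but gives no formulas. As a minimal sketch, the block below uses only the textbook definitions of two of them: the per-channel energy norm and PAPR (peak power divided by mean power) of output activations, and a golden-section search that minimises a unimodal function on an interval. The function names, the list-of-rows data layout, and the zero-PAPR convention for an all-zero channel are assumptions made here for illustration. How OGAS combines the two scores, the curvature-based sparsity mapping, and the objective the search actually tunes are not specified in this text and are not modelled.

```python
import math


def channel_energy_and_papr(activations):
    """activations: list of samples, each a list of per-channel output values.

    Returns (energy_norms, paprs) with one entry per channel.
    Hypothetical helper; not the authors' implementation.
    """
    n_samples = len(activations)
    n_channels = len(activations[0])
    energies, paprs = [], []
    for c in range(n_channels):
        powers = [row[c] ** 2 for row in activations]
        total = sum(powers)
        mean_power = total / n_samples
        energies.append(math.sqrt(total))  # L2 norm of the channel's outputs
        # PAPR = peak power / mean power; 0.0 for an all-zero channel
        # (a convention assumed here, not taken from the paper).
        paprs.append(max(powers) / mean_power if mean_power > 0 else 0.0)
    return energies, paprs


def golden_section_search(f, lo, hi, tol=1e-6):
    """Return an approximate minimiser of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

A channel that fires rarely but strongly gets a high PAPR even when its mean power is modest, which is the sense in which PAPR can flag "feature specificity" alongside raw energy. The search needs only one new function evaluation per iteration, which matters when each evaluation means scoring a pruned layer.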

Key words

large language model / post-training pruning / adaptive sparsity / Peak-to-Average Power Ratio (PAPR)

Classification

Information technology and security science

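Cite this article

李沛鸿, 贺傍, 周彤昕, 李丽, 傅玉祥. Output-activation-guided channel-level adaptive-sparsity pruning for large models [J]. Journal of Nanjing University (Natural Science), 2026, 62(3): 422-433, 12.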

Funding

National Key R&D Program of China (2023YFB2806800), National Natural Science Foundation of China (U21B2032), Suzhou "Jiebang Guashuai" (open-competition) Key Project (SYG2024134)

Journal of Nanjing University (Natural Science)

ISSN 0469-5097
