
Device type identification of HTTP/HTTPS network assets based on large language models

CHEN Qianyi¹, SU Majing², CHEN Zixuan², ZHANG Yongqi², MA Yan²

Cyber Security and Data Governance, 2026, Vol. 45, Issue 5: 1-10 (10 pages). DOI: 10.19358/j.issn.2097-1788.2026.05.001



Affiliations

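  • 1. National Computer System Engineering Research Institute of China, Beijing 100083, China
  • 2. National Computer System Engineering Research Institute of China, Beijing 100083, China; China Information Security Research Institute Co., Ltd., Beijing 102200, China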


Abstract

To address the limited generalization ability, in complex and open environments, of traditional network asset identification methods based on static fingerprint rules and discriminative models, this paper proposes a method for identifying the device type of HTTP/HTTPS network assets by instruction fine-tuning a large language model. A multi-source data collection scheme with multi-platform label aggregation is designed to construct the original network asset dataset, and a data preprocessing strategy that prioritizes retaining key features is applied to reduce redundant noise in the model inputs. Multiple heterogeneous features, including HTTP/HTTPS response bodies, response headers, SSL certificates, ports, and protocols, are then integrated into a unified serialized representation. Based on this representation, LoRA (low-rank adaptation) is used to perform parameter-efficient fine-tuning of the LLaMA-3-8B-Instruct model, enabling the model to learn the semantic associations between network asset characteristics and device types. Experiments on a test dataset of 380,000 real-world network assets show that the proposed method maintains stable performance under highly imbalanced samples and long-tail device types, achieving a Weighted F1-score of 0.9591, which significantly outperforms the base model without fine-tuning. In addition, model inference throughput increases by 62.81%. These results verify the effectiveness and practicality of the proposed method for large-scale automated identification of network asset device types.
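The abstract describes the input pipeline only at a high level: heterogeneous fields are merged into one serialized text, key features are retained in preference to redundant noise, and results are reported as a Weighted F1-score. The Python sketch below shows one possible shape for such a pipeline under stated assumptions; it is not the authors' implementation. The field names, the priority order (protocol, port, certificate, headers, then body), the 2,000-character budget (a stand-in for a token limit), and the prompt wording are all hypothetical. The `weighted_f1` helper computes the standard support-weighted mean of per-class F1 scores, the usual reading of "Weighted F1" for imbalanced, long-tail label sets.

```python
from collections import Counter
from typing import Optional

# Hypothetical priority order: earlier fields are kept first when the
# serialized text would exceed the budget. Illustrative assumption only.
FIELD_PRIORITY = ["protocol", "port", "ssl_cert", "headers", "body"]


def serialize_asset(asset: dict, max_chars: int = 2000) -> str:
    """Flatten heterogeneous asset features into one tagged text string.

    Fields are appended in FIELD_PRIORITY order. Once the character budget
    is spent, lower-priority fields are truncated or dropped, so the long,
    noisy response body is cut before short, high-signal fields.
    """
    parts = []
    budget = max_chars
    for field in FIELD_PRIORITY:
        value = asset.get(field)
        if value is None or budget <= 0:
            continue
        if isinstance(value, dict):  # e.g. response headers
            value = "; ".join(f"{k}: {v}" for k, v in value.items())
        chunk = f"[{field}] {value}"[:budget]
        parts.append(chunk)
        budget -= len(chunk) + 1  # +1 for the newline separator
    return "\n".join(parts)


def build_instruction_sample(asset: dict, label: Optional[str] = None) -> dict:
    """Wrap a serialized asset as an instruction/input/output record.

    The instruction text is a hypothetical prompt, not the paper's template.
    """
    return {
        "instruction": "Identify the device type of the network asset described below.",
        "input": serialize_asset(asset),
        "output": label or "",
    }


def weighted_f1(y_true: list, y_pred: list) -> float:
    """Per-class F1 averaged with weights proportional to true-label support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + fn for this class
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    asset = {
        "protocol": "https",
        "port": 443,
        "headers": {"Server": "lighttpd"},
        "ssl_cert": "CN=router.example",
        "body": "<title>Router Login</title>" * 200,  # long body gets truncated
    }
    sample = build_instruction_sample(asset, label="router")
    print(sample["input"][:80])
    print(weighted_f1(["router", "camera", "camera"], ["router", "camera", "router"]))
```

Spending the budget in priority order means a long HTML body is truncated before shorter fields such as certificate subjects or `Server` headers are lost. Records in this instruction/input/output shape could then be passed to a LoRA fine-tuning run on LLaMA-3-8B-Instruct, which the sketch does not cover. For real evaluations, scikit-learn's `f1_score(y_true, y_pred, average="weighted")` should give the same value as `weighted_f1`.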

Keywords

network asset identification; large language models; instruction tuning; multi-source heterogeneous features

Classification

Information Technology and Security Science

Cite this article

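CHEN Qianyi, SU Majing, CHEN Zixuan, ZHANG Yongqi, MA Yan. Device type identification of HTTP/HTTPS network assets based on large language models[J]. Cyber Security and Data Governance, 2026, 45(5): 1-10.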

Cyber Security and Data Governance

ISSN 2097-1788
