Telecommunication Engineering (电讯技术), 2025, Vol. 65, Issue 10: 1545-1550, 6. DOI: 10.20079/j.issn.1001-893x.250204001
An Open Source Intelligence Analysis Method Based on Generative LLM (基于生成式LLM的开源情报分析方法)
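CHENG Leifeng (成磊峰) 1, LUO Ji (罗吉) 2, WANG Lei (王磊) 3, ZHU Min (朱敏) 4, TAO Sitong (陶思彤) 2
Author information
- 1. College of Computer Science, Sichuan University, Chengdu 610065; Southwest China Institute of Electronic Technology, Chengdu 610036
- 2. China Telecom Digital Intelligence Technology Co., Ltd., Beijing 100001
- 3. Southwest China Institute of Electronic Technology, Chengdu 610036
- 4. College of Computer Science, Sichuan University, Chengdu 610065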
Abstract
The authors propose a method that integrates generative large language models (LLMs), XPath, and retrieval-augmented generation (RAG) for web page information extraction in open-source intelligence analysis. Its key innovations are a dynamic templated prompting strategy and multi-granularity semantic retrieval. The dynamic templates generate domain-constrained prompts according to the intelligence type (event, person, or organization), which improves entity extraction accuracy. The multi-granularity retrieval builds a document-paragraph-entity hierarchy, optimized with the BERT-Topk algorithm, to handle fragmented information in long texts. Aligning entities with OpenKG yields a three-dimensional attribute-relation-event network that strengthens complex event analysis. Experiments on the ClueWeb22 and TAC-KBP2022 datasets show an extraction rate of 0.85 and a response accuracy of 0.78, outperforming traditional RAG by 18% to 31%. In practical applications, the method achieves 92% key-fact accuracy in event briefings at a total cost of only 12% of that of GPT-4.
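The pipeline summarized in the abstract (XPath extraction, top-k paragraph retrieval, type-specific prompt templates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The template wording, the function names, and the sample page are all hypothetical. A bag-of-words cosine score stands in for the BERT-Topk ranking, which the abstract does not specify. The standard library's `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so real web pages would likely require a dedicated HTML parser.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical templates keyed by intelligence type (wording is illustrative only).
TEMPLATES = {
    "event": "Extract the event name, date, location and participants from the context.",
    "person": "Extract the person's name, role and affiliations from the context.",
    "organization": "Extract the organization's name, type and key members from the context.",
}


def extract_paragraphs(xhtml):
    """Collect the text of every <p> element using ElementTree's XPath subset."""
    root = ET.fromstring(xhtml)
    return ["".join(p.itertext()).strip() for p in root.findall(".//p")]


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a BERT embedding (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, paragraphs, k=2):
    """Return the k paragraphs most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(paragraphs, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def build_prompt(intel_type, query, paragraphs, k=2):
    """Fill the template for the given intelligence type with retrieved context."""
    instruction = TEMPLATES[intel_type]
    context = "\n".join(f"- {p}" for p in top_k(query, paragraphs, k))
    return f"{instruction}\nQuestion: {query}\nContext:\n{context}"


# Hypothetical, well-formed sample page.
PAGE = (
    "<html><body><div>"
    "<p>The summit opened in Geneva on 3 May.</p>"
    "<p>Ticket prices rose last year.</p>"
    "<p>Delegates from 40 states attended the Geneva summit.</p>"
    "</div></body></html>"
)

paragraphs = extract_paragraphs(PAGE)
prompt = build_prompt("event", "Where was the summit held?", paragraphs, k=2)
print(prompt)
```

Selecting the template by intelligence type keeps the instruction narrow, and retrieving only the top-k paragraphs keeps unrelated text (here, the ticket-price sentence) out of the prompt sent to the LLM.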
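Keywords
open source intelligence analysis / web information extraction / generative large language model / retrieval-augmented generation
Classification
Information Technology and Security Science
Cite this article
CHENG Leifeng, LUO Ji, WANG Lei, ZHU Min, TAO Sitong. An Open Source Intelligence Analysis Method Based on Generative LLM[J]. Telecommunication Engineering, 2025, 65(10): 1545-1550, 6.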