Journal of Modern Information (现代情报), 2026, Vol. 46, Issue 4: 36-56, 21. DOI: 10.3969/j.issn.1008-0821.2026.04.004
Knowledge Entity Extraction Capability Evaluation of Large Language Models in China
Abstract
[Purpose/Significance] The evaluation of Large Language Models (LLMs) for extracting knowledge entities from scientific texts is a growing area of international research. However, systematic, multi-dimensional assessments of the capabilities of domestic LLMs within specialized academic fields remain relatively underexplored. This study provides a comprehensive empirical evaluation of mainstream Chinese LLMs, using the Information Systems (IS) domain as a case study. The objective is to deliver a substantive, detailed benchmark to inform model selection and practical application in domain-specific text mining tasks.

[Method/Process] To conduct this evaluation, the research first constructed a bilingual benchmark dataset tailored to the IS domain, comprising 250 Chinese research articles from the Journal of Information System (2007-2025) and 865 English articles from MIS Quarterly (2008-2025). The research defined a taxonomy of six categories of fine-grained knowledge entities critical to IS research. A structured prompt engineering framework was then designed to guide eight prominent domestic LLMs — DeepSeek, GLM-4.6, Qwen3, Spark X1.5, Doubao-1.6, Hunyuan-T1, Kimi-K2, and ERNIE-X1 — through the entity extraction task. Model performance was assessed using a multi-dimensional analytical framework that quantitatively analyzed key performance metrics (macro precision, recall, and F1-score), conducted a fine-grained qualitative error analysis (incorporating span comparison scores and entity type confusion matrices), and evaluated practical cost-effectiveness based on API pricing and average processing time per document.

[Result/Conclusion] The evaluation reveals significant performance variation among the tested Chinese LLMs. DeepSeek consistently achieved the highest overall scores in both Chinese and English contexts, demonstrating superior extraction capability. Qwen3 presented a more balanced profile, offering competitive performance with favorable cost efficiency. All models exhibited strong competency in identifying well-structured, categorically clear entities such as research perspectives. Conversely, they faced consistent, pronounced difficulties in determining precise textual boundaries for fine-grained entities and showed a marked performance decline in cross-lingual settings. Error analysis further pinpointed systematic challenges in disambiguating semantically similar entity types. This study contributes a detailed, multi-dimensional benchmark that clarifies the current landscape of Chinese LLMs for specialized knowledge extraction. It affirms their utility as effective tools for preliminary knowledge entity recognition in scenarios with scarce annotated data. Simultaneously, it delineates persistent challenges — particularly in semantic precision, boundary detection, and cross-lingual generalization — that must inform future model development and application design. The proposed assessment framework offers an adaptable foundation for future comparative research across other academic domains and evolving model ecosystems.
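The macro-averaged metrics used in the evaluation can be sketched in a few lines. This is an illustrative reconstruction only, not the authors' actual code: the tuple-based entity representation and the exact span-plus-type match criterion are assumptions.

```python
from collections import defaultdict

def macro_prf(gold, pred, types):
    """Macro-averaged precision, recall, and F1 over entity types.

    gold / pred: iterables of (doc_id, start, end, entity_type) tuples.
    types: the fixed list of entity categories to average over.
    """
    gold_by_type, pred_by_type = defaultdict(set), defaultdict(set)
    for e in gold:
        gold_by_type[e[3]].add(e)
    for e in pred:
        pred_by_type[e[3]].add(e)

    precisions, recalls, f1s = [], [], []
    for t in types:
        g, p = gold_by_type[t], pred_by_type[t]
        tp = len(g & p)  # a hit requires exact span AND matching type
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Macro averaging weights every entity type equally, so rare
    # categories influence the score as much as frequent ones.
    n = len(types)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Under this exact-match criterion, a predicted entity with the correct span but a confused type (one of the systematic errors the study reports) counts as both a false positive for the predicted type and a false negative for the gold type, which is why type confusion depresses both precision and recall.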
Keywords
knowledge entity / large language models / information system / evaluation of entity extraction capability

Classification
Social Sciences

Citation
Wei Ruibin, Xu Yan. Knowledge Entity Extraction Capability Evaluation of Large Language Models in China [J]. Journal of Modern Information, 2026, 46(4): 36-56, 21.

Funding
National Social Science Fund of China project "Research on Knowledge Graph Construction of Information Science Research Methods and Its Application Scenario Recommendation" (Grant No. 20BTQ044).