Journal of Modern Information (现代情报), 2026, Vol. 46, Issue 4: 36-56, 21. DOI: 10.3969/j.issn.1008-0821.2026.04.004
Knowledge Entity Extraction Capability Evaluation of Large Language Models in China
Abstract
[Purpose/Significance] The evaluation of Large Language Models (LLMs) for extracting knowledge entities from scientific texts is a growing area of international research. However, systematic, multi-dimensional assessments of the capabilities of domestic LLMs within specialized academic fields remain relatively underexplored. This study provides a comprehensive empirical evaluation of mainstream Chinese LLMs, using the Information Systems (IS) domain as a case study. The objective is to deliver a substantive, detailed benchmark to inform model selection and practical application in domain-specific text mining tasks.

[Method/Process] To conduct this evaluation, the research first constructed a bilingual benchmark dataset tailored to the IS domain, comprising 250 Chinese research articles from the Journal of Information System (2007-2025) and 865 English articles from MIS Quarterly (2008-2025). The research defined a taxonomy of six categories of fine-grained knowledge entities critical to IS research. A structured prompt engineering framework was then designed to guide eight prominent domestic LLMs — DeepSeek, GLM-4.6, Qwen3, Spark X1.5, Doubao-1.6, Hunyuan-T1, Kimi-K2, and ERNIE-X1 — through the entity extraction task. Model performance was assessed using a multi-dimensional analytical framework that quantitatively analyzed key performance metrics (macro precision, recall, and F1-score), conducted a fine-grained qualitative error analysis (incorporating span comparison scores and entity type confusion matrices), and evaluated practical cost-effectiveness based on API pricing and average processing time per document.

[Result/Conclusion] The evaluation reveals significant performance variation among the tested Chinese LLMs. DeepSeek consistently achieved the highest overall scores in both Chinese and English contexts, demonstrating superior extraction capability. Qwen3 presented a more balanced profile, offering competitive performance with favorable cost efficiency. All models exhibited strong competency in identifying well-structured, categorically clear entities such as research perspectives. Conversely, they faced consistent, pronounced difficulties in determining precise textual boundaries for fine-grained entities and showed a marked performance decline in cross-lingual settings. Error analysis further pinpointed systematic challenges in disambiguating semantically similar entity types. This study contributes a detailed, multi-dimensional benchmark that clarifies the current landscape of Chinese LLMs for specialized knowledge extraction. It affirms their utility as effective tools for preliminary knowledge entity recognition in scenarios with scarce annotated data. Simultaneously, it delineates persistent challenges — particularly in semantic precision, boundary detection, and cross-lingual generalization — that must inform future model development and application design. The proposed assessment framework offers an adaptable foundation for future comparative research across other academic domains and evolving model ecosystems.
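The macro-averaged metrics used in the evaluation can be sketched in a few lines. This is an illustrative reconstruction only, not the authors' actual code: the tuple-based entity representation and the exact span-plus-type match criterion are assumptions.

```python
from collections import defaultdict

def macro_prf(gold, pred, types):
    """Macro-averaged precision, recall, and F1 over entity types.

    gold / pred: iterables of (doc_id, start, end, entity_type) tuples.
    types: the fixed list of entity categories to average over.
    """
    gold_by_type, pred_by_type = defaultdict(set), defaultdict(set)
    for e in gold:
        gold_by_type[e[3]].add(e)
    for e in pred:
        pred_by_type[e[3]].add(e)

    precisions, recalls, f1s = [], [], []
    for t in types:
        g, p = gold_by_type[t], pred_by_type[t]
        tp = len(g & p)  # a hit requires exact span AND matching type
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Macro averaging weights every entity type equally, so rare
    # categories influence the score as much as frequent ones.
    n = len(types)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Under this exact-match criterion, a predicted entity with the correct span but a confused type (one of the systematic errors the study reports) counts as both a false positive for the predicted type and a false negative for the gold type, which is why type confusion depresses both precision and recall.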
Keywords
knowledge entity / large language models / information system / evaluation of entity extraction capability

Classification
Social Sciences

Citation
Wei Ruibin, Xu Yan. Knowledge Entity Extraction Capability Evaluation of Large Language Models in China [J]. Journal of Modern Information, 2026, 46(4): 36-56, 21.

Funding
National Social Science Fund of China project "Research on Knowledge Graph Construction of Information Science Research Methods and Its Application Scenario Recommendation" (Grant No. 20BTQ044).