CMNER: A Chinese Multimodal NER Dataset Based on Weibo

Ji Yuanze (季源泽), Li Fei (李霏)

Computer Technology and Development (计算机技术与发展), 2024, Vol. 34, Issue 10: 110-117 (8 pages). DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0203

CMNER:A Chinese Multimodal NER Dataset Based on Weibo

Ji Yuanze (季源泽)¹, Li Fei (李霏)¹

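Author information

  • 1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China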

Abstract

Multimodal Named Entity Recognition (MNER) is a pivotal task that extracts and classifies named entities from text with the help of associated images. However, the scarcity of manually annotated data for Chinese MNER has considerably impeded progress on the task. We compile a Chinese Multimodal NER dataset (CMNER) from data sourced from a social media platform, comprising 5,000 Weibo posts paired with 18,326 corresponding images. Entities are classified into four categories: person, location, organization, and miscellaneous. We run the ACN and UMT models as baselines on CMNER. The two models reach F1 scores of 74.22% and 89.50%, respectively, validating the effectiveness of the dataset. We also conduct cross-lingual experiments, whose results show that Chinese and English multimodal NER data can mutually enhance the performance of NER models. To promote research on Chinese MNER, CMNER and the related code are released.
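The abstract reports F1 scores but this page does not state the scoring protocol. As a rough illustration only, the sketch below computes entity-level F1 under two assumptions that are not taken from the paper: tags follow the BIO scheme, and a predicted entity counts as correct only when both its span and its type match exactly. The label names PER, LOC, ORG, and MISC are hypothetical stand-ins for the four categories.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and current == tag[2:]
        if not inside:
            if current is not None:
                spans.add((start, i, current))
            if tag.startswith("B-"):
                start, current = i, tag[2:]
            else:
                # "O" or a stray "I-" tag: no open entity
                start, current = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (span, type) matches."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one sentence, one of two entities mislabeled.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))
```

In this toy case one of the two predicted entities matches a gold entity, so precision and recall are both one half. The published scores may have been computed with a different tagging scheme or matching rule.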

Key words

multimodal named entity recognition/image/named entity/Chinese/cross-lingual

Classification

Information Technology and Security Science

Cite this article

Ji Yuanze, Li Fei. CMNER: A Chinese Multimodal NER Dataset Based on Weibo [J]. Computer Technology and Development, 2024, 34(10): 110-117.

Funding

National Key Research and Development Program of China (2022YFB3103602)

Natural Science Foundation of Hubei Province (2021CFB385)

Computer Technology and Development (OA, CSTPCD), ISSN 1673-629X
