Analysis of Dataset Citing Behaviors in the Field of Natural Language Processing in China

Xu Linhong (徐琳宏), Wang Kaida (王凯达), Zhang Lijie (张立杰)

Digital Library Forum (数字图书馆论坛), 2023, Vol. 19, Issue 11: 29-37, 9.

DOI: 10.3772/j.issn.1673-2286.2023.11.004

Analysis of Dataset Citing Behaviors in the Field of Natural Language Processing in China

Xu Linhong¹, Wang Kaida¹, Zhang Lijie¹

Author information

1. School of Software, Dalian University of Foreign Languages, Dalian 116000
Abstract

As scientific research grows more dependent on data, investigating how datasets are cited in the field of natural language processing (NLP) in China helps promote the standardized construction and citation of datasets and the rapid development of the field. This paper selects 1,628 papers published in the Journal of Chinese Information Processing from 2013 to 2022 as samples and manually annotates the citation information of 1,970 datasets through full-text analysis to study dataset citation behavior in the literature. In NLP research in China, the number of papers citing others' datasets is gradually increasing, while the number of papers using self-built datasets is decreasing. Furthermore, papers that cite datasets have a higher average citation frequency than papers that use self-built datasets. There is a tendency to cite multiple datasets, and the number of papers citing a single dataset is decreasing. Moreover, papers citing 2 to 3 datasets have a higher average citation frequency than papers citing a single dataset. Dataset reusability is relatively low, and highly cited datasets come primarily from evaluations.

Keywords

Dataset Citation / Data Citation / Natural Language Processing / Highly Cited Dataset / Dataset Reuse

Classification

Social sciences

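Cite this article

Xu Linhong, Wang Kaida, Zhang Lijie. Analysis of Dataset Citing Behaviors in the Field of Natural Language Processing in China[J]. Digital Library Forum, 2023, 19(11): 29-37, 9.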

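Funding

This research was supported by the National Natural Science Foundation of China project "Research on Multilingual Text Sentiment Analysis Methods for Social Media" (No. 61806038).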

Digital Library Forum (数字图书馆论坛)

OA / CSSCI / CSTPCD

ISSN: 1673-2286
