Digital Library Forum, 2023, Vol. 19, Issue 11: 29-37, 9. DOI: 10.3772/j.issn.1673-2286.2023.11.004
Analysis of Dataset Citing Behaviors in the Field of Natural Language Processing in China (国内自然语言处理领域数据集引用行为分析)
Abstract
With the increasing dependence of scientific research on data, investigating dataset citation behavior in the field of natural language processing (NLP) in China helps promote the standardized construction and citation of datasets and the rapid development of the field. This paper takes 1,628 papers published in the Journal of Chinese Information Processing from 2013 to 2022 as its sample and manually annotates the citation information of 1,970 datasets through full-text analysis to study how datasets are cited in the literature. In NLP research in China, the number of papers citing others' datasets is gradually increasing, while the number of papers using self-built datasets is decreasing. Furthermore, the average citation frequency of papers citing datasets is higher than that of papers using self-built datasets. There is a tendency to cite multiple datasets, and the number of papers citing a single dataset is decreasing. Moreover, the average citation frequency of papers citing 2 to 3 datasets is higher than that of papers citing a single dataset. Dataset reusability is relatively low, and highly cited datasets come primarily from evaluations.
Keywords
Dataset Citation/Data Citation/Natural Language Processing/Highly Cited Dataset/Dataset Reuse
Classification
Social Sciences
Cite this article
Xu Linhong (徐琳宏), Wang Kaida (王凯达), Zhang Lijie (张立杰). Analysis of Dataset Citing Behaviors in the Field of Natural Language Processing in China [J]. Digital Library Forum, 2023, 19(11): 29-37, 9.
Funding
This research was supported by the National Natural Science Foundation of China project "Research on Multilingual Text Sentiment Analysis Methods for Social Media" (Grant No. 61806038).