MedicalQA: A dataset of medical domain for machine reading comprehension

Ma Ning (马宁)¹, Lü Wenrong (吕文蓉)¹, Guo Zechen (郭泽晨)¹

China Scientific Data (Chinese and English Online Edition), 2024, Vol. 9, Issue (1): 356-365, 10. DOI: 10.11922/11-6035.csd.2022.0030.zh

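Author information

1. Northwest Minzu University, Key Laboratory of China's Ethnic Languages and Information Technology, Lanzhou 730030; Northwest Minzu University, Key Laboratory of Ethnic Language Intelligent Processing of Gansu Province, Lanzhou 730030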

Abstract

Machine reading comprehension aims to enable a computer to understand the semantics of a paragraph and to answer users' questions algorithmically. The quality of the dataset used in this task can directly affect the experimental results of a model. To enrich the medical-domain data available for machine reading comprehension, this paper constructs MedicalQA, a medical-domain dataset for machine reading comprehension, using a combination of web crawling and manual annotation. The dataset takes two medical platforms (Xunyiwenyao Network and 39 Health Network) as its main data sources and includes 19,502 paragraphs and Q&A pairs covering 9 medical departments, such as internal medicine, surgery, and obstetrics and gynecology. The dataset is formatted as an Excel file organized into five columns: the first column gives the paragraph ID; the second, the department to which the paragraph belongs; the third, the paragraph content; the fourth, the question; and the fifth, the corresponding answer. The dataset is intended to support the development of machine reading comprehension models in the medical domain and to promote the sharing of medical datasets in the field of machine reading comprehension.
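The five-column layout described in the abstract can be sketched in code as follows. This is only an illustrative sketch, not the authors' tooling: the field names (`paragraph_id`, `department`, `paragraph`, `question`, `answer`), the helper functions, and the sample rows are all assumptions made for illustration, and the sample text is placeholder content rather than data from MedicalQA.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class MedicalQARecord:
    """One row of the five-column layout described in the abstract.

    The field names are hypothetical; the abstract only states what each
    column contains, not what it is called.
    """
    paragraph_id: str   # column 1: paragraph ID
    department: str     # column 2: department the paragraph belongs to
    paragraph: str      # column 3: paragraph content
    question: str       # column 4: question
    answer: str         # column 5: answer to the question


def rows_to_records(rows: Iterable[Sequence[object]]) -> List[MedicalQARecord]:
    """Convert raw five-cell rows into records, rejecting malformed rows."""
    records = []
    for index, row in enumerate(rows):
        if len(row) != 5:
            raise ValueError(f"row {index} has {len(row)} columns, expected 5")
        # Normalise every cell to a stripped string; empty cells become "".
        cells = ["" if cell is None else str(cell).strip() for cell in row]
        records.append(MedicalQARecord(*cells))
    return records


def count_by_department(records: Iterable[MedicalQARecord]) -> Counter:
    """Count how many records fall under each department."""
    return Counter(record.department for record in records)


# Placeholder rows (NOT taken from the dataset) in the assumed column order.
sample_rows = [
    (1, "internal medicine", "Placeholder paragraph A.", "Placeholder question A?", "Placeholder answer A."),
    (2, "surgery", "Placeholder paragraph B.", "Placeholder question B?", "Placeholder answer B."),
    (3, "internal medicine", "Placeholder paragraph C.", "Placeholder question C?", "Placeholder answer C."),
]

sample_records = rows_to_records(sample_rows)
department_counts = count_by_department(sample_records)
```

In practice the rows would come from the Excel file itself, for example via openpyxl's `worksheet.iter_rows(values_only=True)`. Whether the file has a header row to skip is not stated in the abstract and would need to be checked against the actual file.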

Key words

machine reading comprehension; medical domain; dataset

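Cite this article

Ma Ning, Lü Wenrong, Guo Zechen. MedicalQA: A dataset of medical domain for machine reading comprehension [J]. China Scientific Data (Chinese and English Online Edition), 2024, 9(1): 356-365, 10.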

China Scientific Data (Chinese and English Online Edition)

OA, CSTPCD

ISSN: 2096-2223
