ICD coding classification based on data augmentation and dilated convolution

Yan Jing (闫婧), Zhao Di (赵迪), Meng Jiana (孟佳娜), Lin Hongfei (林鸿飞)

Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 11: 3329-3336, 8. DOI: 10.19734/j.issn.1001-3695.2024.03.0088

ICD coding classification based on data augmentation and dilated convolution

Yan Jing (闫婧) 1, Zhao Di (赵迪) 2, Meng Jiana (孟佳娜) 1, Lin Hongfei (林鸿飞) 3

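Author information

  • 1. School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600
  • 2. School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600; School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; Dalian Yongjia Electronic Technology Co., Ltd., Dalian, Liaoning 116024
  • 3. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024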

Abstract

To address the problems of unbalanced label distribution, excessively long medical record text, and a large label space in the International Classification of Diseases (ICD) coding classification task, this paper proposes an ICD coding classification method based on data augmentation and dilated convolution. First, the method introduces the pre-trained model BioLinkBERT, trained in the biomedical domain using unsupervised learning, to alleviate the domain mismatch problem. Second, it applies the Mixup data augmentation technique to the hidden representations, which increases data diversity and improves the robustness of the classifier, addressing the problem of imbalanced label distribution. Finally, the model captures long-range dependencies in the text using multi-granularity dilated convolution, which limits the impact of long input text on performance. Experimental results on two subsets of the MIMIC-III dataset show that the proposed model achieves notable improvements over the baseline model and compares favorably with various other methods. Specifically, the F1 scores improve by 0.4% to 1.5% and the precision@k values improve by 1.2% to 1.6%. The study therefore provides an effective approach to the challenges of ICD coding classification.
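The two operations named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract only states that Mixup is applied to hidden representations and that dilated convolution is used at several granularities. The function names, the toy vectors, the all-ones kernel, the dilation rates (1, 2, 3), and the Beta parameter 0.4 are all assumptions chosen for illustration.

```python
import random


def mixup(h_a, y_a, h_b, y_b, lam):
    """Linearly interpolate two hidden vectors and their label vectors."""
    h = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return h, y


def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D cross-correlation whose kernel taps are `dilation` steps apart."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(x) - span)
    ]


def multi_granularity(x, kernel, dilations):
    """Apply one kernel at several dilation rates; returns one output per rate."""
    return {d: dilated_conv1d(x, kernel, d) for d in dilations}


# Mixup: the coefficient is commonly drawn from Beta(alpha, alpha).
# alpha = 0.4 is an assumed value, not taken from the paper.
lam = random.betavariate(0.4, 0.4)
h_mix, y_mix = mixup([1.0, 0.0], [1, 0, 0], [0.0, 1.0], [0, 1, 0], lam)

# Dilated convolution over a toy sequence of eight scalar token features.
seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = multi_granularity(seq, kernel=[1.0, 1.0, 1.0], dilations=[1, 2, 3])
```

With a kernel of width 3, a dilation of 3 lets each output position cover 7 input positions, whereas a dilation of 1 covers only 3. This is why larger dilation rates can reach distant tokens in long clinical notes without adding parameters.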

Key words

ICD code classification; BioLinkBERT pre-trained model; Mixup data augmentation; dilated convolution

Classification

Information technology and security science

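Cite this article

Yan Jing, Zhao Di, Meng Jiana, Lin Hongfei. ICD coding classification based on data augmentation and dilated convolution [J]. Application Research of Computers, 2024, 41(11): 3329-3336, 8.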

Funding

Natural Science Foundation of Liaoning Province (2022-BS-104)

Application Research of Computers (计算机应用研究)

OA; PKU Core (北大核心); CSTPCD

ISSN 1001-3695
