Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 11: 3329-3336, 8. DOI: 10.19734/j.issn.1001-3695.2024.03.0088
ICD coding classification based on data augmentation and dilated convolution
Abstract
To address the problems of imbalanced label distribution, excessively long medical record text, and a large label space in the International Classification of Diseases (ICD) coding classification task, this paper proposes an ICD coding classification method based on data augmentation and dilated convolution. First, the method introduces the pre-trained model BioLinkBERT, trained in the biomedical domain using unsupervised learning, to alleviate the domain mismatch problem. Second, it applies the Mixup data augmentation technique to expand the hidden representations, thereby increasing data diversity and improving the model's classification robustness, addressing the problem of imbalanced label distribution. Finally, the model captures long-range dependencies in the text using multi-granularity dilated convolution, avoiding the impact of long input text on the model's performance. The experimental results show that, compared with various methods, the proposed model achieves notable improvements over the baseline model on two subsets of the MIMIC-III dataset. Specifically, the F1 scores and precision@k values improve by 0.4% to 1.5% and 1.2% to 1.6%, respectively. This study therefore provides an effective solution to the challenges of ICD coding classification.
Keywords
ICD code classification/BioLinkBERT pre-trained model/Mixup data augmentation/dilated convolution
Classification
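Information technology and security science
Cite this article
Yan Jing (闫婧), Zhao Di (赵迪), Meng Jiana (孟佳娜), Lin Hongfei (林鸿飞). ICD coding classification based on data augmentation and dilated convolution[J]. Application Research of Computers, 2024, 41(11): 3329-3336, 8.
Funding
Natural Science Foundation of Liaoning Province (2022-BS-104)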