Clinical diagnosis normalization based on contrastive learning and pre-trained model
To address three problems in the clinical diagnosis normalization task (a large-scale standard diagnostic thesaurus, weak textual relevance, and an uncertain number of standard terms per diagnosis), a clinical diagnosis normalization method based on contrastive learning and a pre-trained model was proposed. First, a simple contrastive learning of sentence embeddings (SimCSE) model was trained with a combination of unsupervised and supervised methods, and the resulting model was used to recall candidate standard terms from the standard thesaurus. Then, bidirectional encoder representations from transformers (BERT) was used to rerank the candidate terms and to classify the number of standard terms, yielding the final normalization result. Experimental results show that the SimCSE method combining unsupervised and supervised training achieves a recall of 86.76%, significantly outperforming the other methods; in the reranking and term-count classification tasks, BERT improves notably on several metrics compared with the other models. The proposed method achieves an F1 score of 72.54% for standard term prediction on the test set, performing well in clinical diagnosis normalization.
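The first stage (contrastive training of the encoder, then recall from the thesaurus) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard SimCSE InfoNCE objective with in-batch negatives and temperature 0.05 (the value used in the SimCSE paper), cosine similarity for recall, and toy 3-dimensional vectors in place of real encoder outputs. The abstract does not state the loss, the similarity function or the number of recalled candidates, and the function names (`info_nce_loss`, `recall_top_k`, `cosine_matrix`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a_n = [normalize(v) for v in a]
    b_n = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def info_nce_loss(anchors: Sequence[Vector], positives: Sequence[Vector],
                  temperature: float = 0.05) -> float:
    """SimCSE-style contrastive (InfoNCE) loss with in-batch negatives.

    positives[i] is the positive for anchors[i]; every other row in the batch
    acts as a negative. In unsupervised SimCSE the positive is the same sentence
    encoded under a different dropout mask; in the supervised variant assumed
    here it is the embedding of the annotated standard term.
    """
    sims = cosine_matrix(anchors, positives)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target column i
    return total / len(sims)


def recall_top_k(queries: Sequence[Vector], thesaurus: Sequence[Vector],
                 k: int) -> List[List[int]]:
    """Indices of the k thesaurus entries most cosine-similar to each query."""
    sims = cosine_matrix(queries, thesaurus)
    return [sorted(range(len(row)), key=lambda j: -row[j])[:k] for row in sims]


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for encoder outputs.
    thesaurus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.7, 0.7, 0.0]]
    queries = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
    print(recall_top_k(queries, thesaurus, k=2))  # prints [[0, 3], [2, 3]]
    # Near zero: each query is already closest to its own positive.
    print(info_nce_loss(queries, [thesaurus[0], thesaurus[2]]))
```

Minimizing this loss pulls each diagnosis mention toward its standard term and away from the other terms in the batch, which is what makes a nearest-neighbour search over a large thesaurus a usable recall step.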
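The second stage combines the two BERT outputs: reranking scores order the recalled candidates, and the predicted term count decides how many of them to keep. The sketch below assumes a simple top-n rule for that combination and a micro-averaged F1 over predicted and gold term sets for evaluation; the abstract specifies neither, and the function names (`select_terms`, `micro_f1`), scores and example terms are hypothetical.

```python
from typing import List, Sequence, Set


def select_terms(candidates: Sequence[str], rerank_scores: Sequence[float],
                 predicted_count: int) -> List[str]:
    """Keep the predicted_count highest-scoring candidates.

    Assumption: the count classifier outputs how many standard terms the raw
    diagnosis maps to, and the reranker outputs one relevance score per candidate.
    """
    ranked = sorted(zip(candidates, rerank_scores), key=lambda p: p[1], reverse=True)
    return [term for term, _ in ranked[:predicted_count]]


def micro_f1(predictions: Sequence[Set[str]], golds: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over predicted vs. gold standard-term sets."""
    tp = sum(len(p & g) for p, g in zip(predictions, golds))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in predictions)
    recall = tp / sum(len(g) for g in golds)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical reranker scores for candidates recalled for two raw diagnoses.
    pred_1 = select_terms(
        ["type 2 diabetes mellitus", "essential hypertension", "type 1 diabetes mellitus"],
        [0.92, 0.88, 0.40],
        predicted_count=2,
    )
    pred_2 = select_terms(["acute bronchitis", "chronic bronchitis"], [0.30, 0.80],
                          predicted_count=1)
    golds = [{"type 2 diabetes mellitus", "essential hypertension"}, {"acute bronchitis"}]
    print(pred_1, pred_2)
    print(round(micro_f1([set(pred_1), set(pred_2)], golds), 4))  # prints 0.6667
```

Predicting the count separately lets one raw diagnosis map to several standard terms (the first example) without a fixed score threshold; an error in either the ranking or the count (the second example) costs both precision and recall.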
Liu Ying (刘莹); Cui Bingjian (崔丙剑); Cao Liu (曹琉); Cheng Longlong (程龙龙)
Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Zhongdian Yunnao (Tianjin) Technology Co., Ltd. (中电云脑(天津)科技有限公司), Tianjin 300300, China
Computer and Automation
Keywords: clinical diagnosis normalization; contrastive learning; pre-trained model; simple contrastive learning of sentence embeddings (SimCSE); bidirectional encoder representations from transformers (BERT)
Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, No. 5
pp. 23-28 (6 pages)
Supported by the National Key Research and Development Program of China (2021YFF1200600).