Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, Vol. 52, Issue (5): 23-28, 6. DOI: 10.13245/j.hust.240133
基于对比学习和预训练模型的临床诊断标准化
Clinical diagnosis normalization based on contrastive learning and pre-trained model
Abstract
Clinical diagnosis normalization faces three problems: the standard diagnostic thesaurus is large, textual relevance is limited, and the number of standard words is uncertain. To address them, a clinical diagnosis normalization method based on contrastive learning and a pre-trained model is proposed. First, a simple contrastive learning of sentence embeddings (SimCSE) model is trained with a combination of unsupervised and supervised methods, and the resulting model is used to recall candidate standard words from the standard thesaurus. Then, candidate word reordering and classification of term counts are carried out with bidirectional encoder representations from transformers (BERT) to obtain the final results. Experimental results show that the combined unsupervised and supervised SimCSE method achieves a recall rate of 86.76%, higher than the other methods compared, and that the BERT model improves significantly on the other models in several metrics for both reordering and classification of term counts. The proposed method achieves an F1 value of 72.54% on the test dataset, indicating good performance in clinical diagnosis normalization.
Keywords
clinical diagnosis normalization / contrastive learning / pre-trained model / simple contrastive learning of sentence embeddings (SimCSE) / bidirectional encoder representations from transformers (BERT)
Classification
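Information technology and security science
Cite this article
刘莹,崔丙剑,曹琉,程龙龙. Clinical diagnosis normalization based on contrastive learning and pre-trained model[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, 52(5): 23-28, 6.
Funding
Supported by the National Key Research and Development Program of China (2021YFF1200600).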
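Illustrative sketch
The recall-then-rerank pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for SimCSE sentence embeddings, the `rerank_scores` dictionary stands in for BERT reranking scores, and `predicted_count` stands in for the output of the term-count classifier. The function names, the example terms, and the rule of keeping the top `predicted_count` reranked candidates are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_candidates(query_vec, thesaurus_vecs, k):
    """Return the k thesaurus terms whose embeddings are closest to the query."""
    scored = sorted(
        thesaurus_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [term for term, _ in scored[:k]]


def select_terms(candidates, rerank_scores, predicted_count):
    """Reorder candidates by reranker score and keep the predicted number of terms."""
    ranked = sorted(candidates, key=lambda term: rerank_scores[term], reverse=True)
    return ranked[:predicted_count]


# Hypothetical embeddings; a real system would obtain them from a trained encoder.
thesaurus = {
    "type 2 diabetes mellitus": [0.9, 0.1, 0.0],
    "essential hypertension": [0.1, 0.9, 0.0],
    "acute bronchitis": [0.0, 0.1, 0.9],
}
query = [0.8, 0.5, 0.1]  # hypothetical embedding of one raw diagnosis string

candidates = recall_candidates(query, thesaurus, k=2)

# Hypothetical reranker scores and term-count prediction.
rerank_scores = {"type 2 diabetes mellitus": 0.7, "essential hypertension": 0.9}
final_terms = select_terms(candidates, rerank_scores, predicted_count=2)
print(candidates)
print(final_terms)
```

Splitting the work this way means the similarity search runs over the whole thesaurus while the more expensive reranking model only scores the short candidate list.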