集成技术 (Journal of Integration Technology), 2025, Vol. 14, Issue (1): 78-90, 13. DOI: 10.12146/j.issn.2095-3135.20240422001
Multi-disease Recognition Method for Fundus Images Based on Text Enhancement
Abstract
This work introduces a vision-language model into disease recognition for ophthalmic images and proposes a multi-disease recognition algorithm based on a pre-trained contrastive language-image model. First, a multi-label fundus image dataset, MDFCD8, containing 8 categories is constructed from several publicly available fundus image datasets. Then, the generative artificial intelligence model GPT-4 (Generative Pre-trained Transformer 4) is used to generate expert knowledge describing the fine-grained pathological features of fundus images, which addresses the lack of text labels in fundus image datasets. The average precision (AP), F1 score, and area under the receiver operating characteristic curve (AUC) are calculated, and the mean of the three is taken as the final performance evaluation index. Experimental results show that the proposed method outperforms the traditional convolutional neural network and the Transformer network by 4.8% and 3.2%, respectively. Ablation experiments on each module validate the effectiveness of the method and demonstrate the potential of vision-language models for the auxiliary diagnosis of ophthalmic diseases.
Keywords
fundus images/multi-disease/contrastive language-image pre-training/expert knowledge
Classification
Information Technology and Security Science
Cite this article
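熊绍奎 (Xiong Shaokui), 陈世峰 (Chen Shifeng). Multi-disease Recognition Method for Fundus Images Based on Text Enhancement [J]. 集成技术 (Journal of Integration Technology), 2025, 14(1): 78-90, 13.
Funding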
This work is supported by the Shenzhen Science and Technology Innovation Commission through a Shenzhen technology research project (JSGG20220831105002004).
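The evaluation index described in the abstract (the mean of AP, F1 score, and AUC) can be illustrated with a minimal sketch. The abstract does not say how the three metrics are aggregated across the 8 categories or which decision threshold is used for F1, so the macro-averaging over labels and the 0.5 threshold below are assumptions. The function names are hypothetical and the toy data is invented purely for illustration.

```python
from typing import Dict, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP for one label: mean of precision@k taken at the rank k of each positive."""
    # Rank samples by descending score (ties keep their original order).
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def f1_at_threshold(y_true: Sequence[int], y_score: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 for one label after binarising scores (threshold is an assumption)."""
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC for one label: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return 0.0  # AUC is undefined here; returning 0.0 is a convention of this sketch
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite_index(Y_true: Sequence[Sequence[int]],
                    Y_score: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Macro-average each metric over labels (assumed), then average the three."""
    n_labels = len(Y_true[0])
    ap = f1 = auc = 0.0
    for j in range(n_labels):
        col_true = [row[j] for row in Y_true]
        col_score = [row[j] for row in Y_score]
        ap += average_precision(col_true, col_score)
        f1 += f1_at_threshold(col_true, col_score, threshold)
        auc += roc_auc(col_true, col_score)
    ap, f1, auc = ap / n_labels, f1 / n_labels, auc / n_labels
    return {"AP": ap, "F1": f1, "AUC": auc, "index": (ap + f1 + auc) / 3}


# Toy multi-label example: 4 images, 2 disease labels (invented data).
Y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
Y_score = [[0.9, 0.2], [0.8, 0.6], [0.7, 0.4], [0.1, 0.9]]
result = composite_index(Y_true, Y_score)
print(result)
```

Averaging the three metrics gives a single number that balances ranking quality (AP, AUC) against thresholded decisions (F1). A different aggregation, such as micro-averaging, would give different values.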