Journal of Guangxi University of Science and Technology (广西科技大学学报), 2025, Vol. 36, Issue (5): 89-98, 10. DOI: 10.16375/j.cnki.cn45-1395/t.2025.05.012
Application of Vision Transformer models in tongue diagnosis image classification in traditional Chinese medicine
Abstract
Tongue diagnosis, a crucial and routine component of inspection in Traditional Chinese Medicine (TCM), plays an indispensable role in TCM clinical diagnosis. To overcome both the reliance of traditional tongue diagnosis on subjective experience and the insufficient classification performance of Convolutional Neural Network (CNN) models, this study proposed a deep learning model based on Vision Transformer (ViT). Using a high-quality tongue image classification dataset, pre-training and fine-tuning strategies were employed to optimize feature extraction, and data augmentation techniques were incorporated to address class imbalance. Experimental results demonstrate that the proposed model significantly outperforms existing CNN methods (such as ResNet50) in the classification tasks for six key tongue image features. Specifically, it achieves superior accuracy on five metrics: coating color (85.6% vs 78.0%), ecchymosis (98.0% vs 91.0%), texture (99.6% vs 92.0%), tongue color (96.6% vs 68.0%), and crack (87.8% vs 80.1%). These results validate the model's effectiveness and its potential to break through traditional performance bottlenecks and enhance the reliability of intelligent diagnosis in TCM clinical practice.
Keywords
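tongue diagnosis / Vision Transformer (ViT) / deep learning / medical image classification
Classification
Traditional Chinese Medicine
Cite this article
周坚和, 王彩雄, 李炜, 周晓玲, 张丹璇, 吴玉峰. Application of Vision Transformer models in tongue diagnosis image classification in traditional Chinese medicine [J]. Journal of Guangxi University of Science and Technology, 2025, 36(5): 89-98, 10.
Funding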
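Guangxi Science and Technology Major Project (桂科AA22096033)
Guangxi Traditional Chinese Medicine Multidisciplinary Innovation Team Project (GZKJ2312)
Liuzhou Science and Technology Bureau Project (2022SB031)
Guangxi University of Chinese Medicine Joint Fund Project (2023L2066)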