Journal of Ningxia University (Natural Science Edition), 2025, Vol. 46, Issue 2: 134-142, 9.
Methods for Improving Recognition Models of Small-Sample Tangut Script Datasets
Abstract
Tangut script is an extinct writing system characterized by complex strokes, and convolutional neural networks have recently become the mainstream approach to recognizing it. The existing dataset contains only 667 annotated characters, so data augmentation and transfer learning were used to improve model performance by addressing overfitting caused by the limited sample size and the long-tail problem caused by data imbalance. The study compared the performance of baseline models with that of models incorporating these improvement strategies. The results show that the model combining both strategies achieved an average accuracy improvement of 5.65%. In addition, an improved model named YOLOv8-VNeXt was proposed, which can serve as a reference for future research transitioning from single-character recognition based on image classification to multi-character recognition based on object detection.
Key words
Tangut script / character recognition / transfer learning / data augmentation / pre-training
Classification
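Architecture and Water Conservancy
Cite this article
ZHAO Xinyi, SHI Wei, LI Guomin. Methods for Improving Recognition Models of Small-Sample Tangut Script Datasets[J]. Journal of Ningxia University (Natural Science Edition), 2025, 46(2): 134-142, 9.
Funding
Supported by the National Natural Science Foundation of China (62166030, 12061055)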