
Methods for Improving Recognition Models of Small-Sample Tangut Script Datasets

Zhao Xinyi¹, Shi Wei¹, Li Guomin¹

Journal of Ningxia University (Natural Science Edition), 2025, Vol. 46, Issue 2: 134-142 (9 pages).


Author information

1. School of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China

Abstract

Tangut is an extinct script characterized by complex strokes, and convolutional neural networks (CNNs) have recently become the mainstream approach to recognizing it. Because the existing dataset contains only 667 annotated characters, this study uses data augmentation and transfer learning to improve model performance, targeting overfitting caused by the small sample size and the long-tail problem caused by class imbalance. Baseline models were compared with models incorporating these improvement strategies. The results show that the model combining both strategies improved average accuracy by 5.65%. In addition, an improved model named YOLOv8-VNeXt is proposed, which can serve as a reference for future research moving from single-character recognition based on image classification to multi-character recognition based on object detection.
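The abstract does not describe the augmentation pipeline itself. As a rough, non-authoritative illustration of how augmentation can counter a long-tailed, small-sample dataset, the sketch below oversamples rare classes with label-preserving transforms until every class reaches a target count. It assumes glyphs are small binary grids (lists of 0/1 rows) with a blank margin, so a one-pixel shift does not clip strokes. The helper names `shift` and `augment_to_balance`, the shift-only transform, and the target count are all hypothetical choices made for illustration, not the authors' method.

```python
import random
from collections import Counter

# Hypothetical representation: a glyph is a binary grid, 1 = ink, 0 = background.
Grid = list[list[int]]


def shift(img: Grid, dx: int, dy: int) -> Grid:
    """Translate a glyph by (dx, dy) pixels; exposed pixels become background (0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment_to_balance(samples, target, seed=0):
    """Oversample each class with random one-pixel shifts until it has `target` images.

    samples: list of (image, label) pairs. Original samples are kept first and
    unchanged; classes already at or above `target` receive no synthetic images.
    """
    rng = random.Random(seed)  # fixed seed keeps the augmentation reproducible
    by_label: dict[str, list[Grid]] = {}
    for img, label in samples:
        by_label.setdefault(label, []).append(img)

    out = list(samples)
    for label, imgs in by_label.items():
        for _ in range(max(0, target - len(imgs))):
            base = rng.choice(imgs)  # pick a real sample of this rare class
            dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
            out.append((shift(base, dx, dy), label))
    return out


if __name__ == "__main__":
    glyph_a = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    glyph_b = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
    data = [(glyph_a, "A")] * 5 + [(glyph_b, "B")]  # "B" is the long-tail class
    balanced = augment_to_balance(data, target=5)
    print(dict(Counter(label for _, label in balanced)))  # {'A': 5, 'B': 5}
```

Shifts are used only because they are easy to verify without image libraries; a real pipeline for stroke-heavy scripts would more likely combine small rotations, scaling, and elastic distortions, and would pair the balanced data with a pretrained backbone for the transfer-learning step.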

Keywords

Tangut script/character recognition/transfer learning/data augmentation/pre-training

Cite this article

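Zhao Xinyi, Shi Wei, Li Guomin. Methods for improving recognition models of small-sample Tangut script datasets[J]. Journal of Ningxia University (Natural Science Edition), 2025, 46(2): 134-142.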

Funding

National Natural Science Foundation of China (grant nos. 62166030 and 12061055)

Journal of Ningxia University (Natural Science Edition)

ISSN: 0253-2328
