Software Guide (软件导刊), 2024, Vol. 23, Issue 6: 59-66, 8. DOI: 10.11907/rjdk.241144

Pre-trained Visual Translation Technology Based on Cross-modal Translation Rendering Model

Qu Mengnan (屈梦楠)¹, Jin Yuhao (靳宇浩)¹, Hu Boning (胡勃宁)¹

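Author information

1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China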

Abstract

Replacing foreign-language text in an image with Chinese while preserving the original style is an interesting and challenging problem. To this end, a pre-trained visual translation technique is proposed that converts the text in an image across languages while maintaining the original text style and layout. A cross-modal adaptive translation rendering model is built by combining text detection, font recognition, OCR, image restoration, machine translation, and image rendering. First, the EAST algorithm locates and extracts the text regions. Then, ResNet recognizes the font style, while CTC-OCR extracts the text content, which is passed to GPT for translation. Finally, after the text regions are repaired with the LaMa algorithm, a region coordinate rendering algorithm integrates the translated text into the repaired image, achieving high-quality visual translation. In a quantitative evaluation by evaluators, the method received a subjective score of 7.90, indicating high accuracy.
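The order of the stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TextRegion` type, the `visual_translate` function, and the signatures of the six stage callables are all hypothetical, chosen only to show how detection, font recognition, OCR, translation, repair, and rendering might be chained. In a real system each callable would wrap the corresponding model (EAST, ResNet, CTC-OCR, GPT, LaMa, and the renderer); here they are replaced by trivial stand-ins so the control flow can be run.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    """One detected text region and what the pipeline learns about it."""
    box: Box
    font: str = ""
    source_text: str = ""
    translated_text: str = ""


def visual_translate(
    image: Any,
    detect: Callable[[Any], List[Box]],            # stands in for EAST
    recognize_font: Callable[[Any, Box], str],     # stands in for ResNet
    read_text: Callable[[Any, Box], str],          # stands in for CTC-OCR
    translate: Callable[[str], str],               # stands in for GPT
    inpaint: Callable[[Any, List[Box]], Any],      # stands in for LaMa
    render: Callable[[Any, TextRegion], Any],      # region coordinate rendering
) -> Tuple[Any, List[TextRegion]]:
    """Chain the stages in the order given in the abstract (assumed signatures)."""
    # 1. Locate the text regions.
    regions = [TextRegion(box=box) for box in detect(image)]

    # 2. For each region: recognize the font, read the text, translate it.
    for region in regions:
        region.font = recognize_font(image, region.box)
        region.source_text = read_text(image, region.box)
        region.translated_text = translate(region.source_text)

    # 3. Repair the image where the original text was.
    output = inpaint(image, [region.box for region in regions])

    # 4. Draw each translation back at the coordinates of its region.
    for region in regions:
        output = render(output, region)
    return output, regions


# Stand-in stages: strings replace real images and models.
result, regions = visual_translate(
    image="poster.png",
    detect=lambda img: [(10, 20, 110, 50)],
    recognize_font=lambda img, box: "serif",
    read_text=lambda img, box: "Hello",
    translate=lambda text: "你好",
    inpaint=lambda img, boxes: f"inpainted({img})",
    render=lambda img, region: f"{img}+{region.translated_text}@{region.box[:2]}",
)
```

Passing the stages in as callables keeps the sketch independent of any particular library, so each model can be swapped or mocked without changing the pipeline order.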

Key words

visual translation/multi-modal/GPT/Chinese translation/neural network

Classification

Information Technology and Security Science

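Cite this article

Qu Mengnan, Jin Yuhao, Hu Boning. Pre-trained Visual Translation Technology Based on Cross-modal Translation Rendering Model[J]. Software Guide, 2024, 23(6): 59-66, 8.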

Software Guide (软件导刊)

ISSN: 1672-7800
