Pre-trained Visual Translation Technology Based on a Cross-modal Translation Rendering Model (Open Access)

Replacing foreign-language text in an image with Chinese while preserving its visual style is an interesting and challenging problem. To address it, this paper proposes a pre-trained visual translation technique for cross-language conversion of text in images. The technique combines text detection, font recognition, OCR, image inpainting, machine translation, and image rendering into a cross-modal adaptive translation rendering model that preserves the style and layout of the original text. First, the EAST algorithm locates and extracts the text regions. Next, a ResNet recognizes the font style, while CTC-OCR extracts the text content, which a GPT model then translates. Finally, the LaMa algorithm repairs the text regions, and a region-coordinate rendering algorithm blends the translated text into the repaired image, yielding high-quality visual translation. In a quantitative assessment of translation quality by evaluators, the method achieved a subjective evaluation score of 7.90, indicating high accuracy.
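
The stage ordering described in the abstract can be sketched as a small orchestration function. This is only an illustrative outline, not the authors' implementation: the names `TextRegion` and `translate_image` and the six stage callables are hypothetical placeholders. In a real system they would wrap an EAST detector, a ResNet font classifier, a CTC-based OCR model, a GPT translation call, LaMa inpainting, and a renderer.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Axis-aligned box as (left, top, right, bottom) in pixels (an assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    """Everything the pipeline learns about one detected text region."""
    box: Box
    font: str = ""
    source_text: str = ""
    translated_text: str = ""


def translate_image(
    image: Any,
    detect: Callable[[Any], List[Box]],               # stands in for EAST
    recognize_font: Callable[[Any, Box], str],        # stands in for ResNet
    read_text: Callable[[Any, Box], str],             # stands in for CTC-OCR
    translate: Callable[[str], str],                  # stands in for GPT
    inpaint: Callable[[Any, List[Box]], Any],         # stands in for LaMa
    render: Callable[[Any, Box, str, str], Any],      # region-coordinate rendering
) -> Tuple[Any, List[TextRegion]]:
    # Step 1: locate the text regions.
    regions = [TextRegion(box=b) for b in detect(image)]

    # Step 2: for each region, recognize the font, read the text, translate it.
    for region in regions:
        region.font = recognize_font(image, region.box)
        region.source_text = read_text(image, region.box)
        region.translated_text = translate(region.source_text)

    # Step 3: erase the original text, then draw each translation at its box.
    result = inpaint(image, [r.box for r in regions])
    for region in regions:
        result = render(result, region.box, region.translated_text, region.font)
    return result, regions
```

Passing the stages in as callables keeps the sketch independent of any particular model library, so each stage can be replaced or tested in isolation.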

Qu Mengnan (屈梦楠); Jin Yuhao (靳宇浩); Hu Boning (胡勃宁)

School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China

Computer Science and Automation

Keywords: visual translation; multi-modal; GPT; Chinese translation; neural network

Software Guide (《软件导刊》), 2024, No. 6

pp. 59-66 (8 pages)

10.11907/rjdk.241144
