Computer Technology and Development, 2017, Vol. 27, Issue (3): 39-43, 5. DOI: 10.3969/j.issn.1673-629X.2017.03.008
Research on OCR Technology for Mixed Chinese-English Layouts Based on Feedback Merging
Investigation on Layout Analysis Technology of Chinese and English Mixed OCR Based on Feedback Merging
Abstract
Optical Character Recognition (OCR) technology is now widely applied in many areas of social life, and OCR for a single character set has achieved major technical breakthroughs. However, because layout analysis differs markedly between Chinese and English text, existing OCR technology performs unsatisfactorily on mixed Chinese-English documents. To address the shortcomings of traditional OCR methods, and building on an analysis of the difficulties of segmenting mixed Chinese-English layouts, this paper proposes an improved segmentation method for mixed Chinese-English layout analysis based on feedback merging. After preprocessing that combines Canny-operator-based image binarization with median filtering, the method segments the character regions twice using the projection method, and the specific segmentation techniques are studied in detail. Experimental results show that the proposed method successfully separates Chinese, English, and numeric characters in mixed documents. Its accuracy reaches 97%, about 8 percentage points higher than that of the traditional method, which effectively addresses the inability of traditional methods to handle touching (adhered) characters.
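The abstract describes the pipeline only at a high level: projection-based segmentation applied twice, followed by a feedback-merging step. The following is a minimal sketch in Python of that general idea, not the authors' implementation. It assumes the page has already been denoised (median filter) and binarized (the Canny-based step), so those stages are not shown. The merge rule is a hypothetical width-based criterion chosen for illustration: adjacent fragments narrower than the line height are joined when the combined box stays roughly square, since Chinese glyphs are approximately square. All function names, the `max_ratio` parameter, and the `accept` callback (a stand-in for recognizer feedback) are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # binary page: 1 = ink (foreground), 0 = background
Span = Tuple[int, int]   # half-open index interval [start, end)


def runs(profile: List[int]) -> List[Span]:
    """Maximal runs of indices whose projection value is non-zero."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Span]:
    """First pass: horizontal projection (ink per row) splits the page into text lines."""
    return runs([sum(row) for row in img])


def segment_columns(img: Image, line: Span) -> List[Span]:
    """Second pass: vertical projection (ink per column) inside one text line."""
    top, bottom = line
    return runs([sum(img[r][c] for r in range(top, bottom)) for c in range(len(img[0]))])


def feedback_merge(cols: List[Span], line_height: int, max_ratio: float = 1.2,
                   accept: Optional[Callable[[Span], bool]] = None) -> List[Span]:
    """Greedily merge adjacent narrow fragments into one roughly square glyph box.

    Hypothetical rule (not from the paper): propose a merge when both pieces are
    narrower than the line height and the merged box is at most
    max_ratio * line_height wide. `accept` stands in for recognizer feedback;
    None means every geometric proposal is accepted.
    """
    limit = int(line_height * max_ratio)
    merged: List[Span] = []
    for span in cols:
        if merged:
            prev = merged[-1]
            candidate = (prev[0], span[1])
            narrow = prev[1] - prev[0] < line_height and span[1] - span[0] < line_height
            if narrow and candidate[1] - candidate[0] <= limit and (accept is None or accept(candidate)):
                merged[-1] = candidate  # replace the previous box with the merged one
                continue
        merged.append(span)
    return merged


def segment_page(img: Image, max_ratio: float = 1.2,
                 accept: Optional[Callable[[Span], bool]] = None) -> List[Tuple[Span, List[Span]]]:
    """Projection twice (lines, then columns), then merge fragments within each line."""
    result = []
    for line in segment_lines(img):
        cols = segment_columns(img, line)
        result.append((line, feedback_merge(cols, line[1] - line[0], max_ratio, accept)))
    return result


def make_demo() -> Image:
    """Toy 10x23 page: a glyph split into two parts, one square glyph, one narrow glyph."""
    img = [[0] * 23 for _ in range(10)]
    for c0, c1 in [(1, 4), (5, 9), (11, 19), (20, 22)]:
        for r in range(1, 9):
            for c in range(c0, c1):
                img[r][c] = 1
    return img


if __name__ == "__main__":
    print(segment_page(make_demo()))
    # [((1, 9), [(1, 9), (11, 19), (20, 22)])]
```

In the demo, the two fragments at columns 1-3 and 5-8 are joined into one box, while the square glyph and the narrow trailing glyph remain separate. Geometry alone would also glue narrow Latin letters such as "i" and "n" together, which is why the sketch leaves the final decision to the `accept` hook. A real system might, for example, run a recognizer on the candidate box and keep the merge only if confidence improves.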
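Key words
character recognition/English and Chinese mixed/layout analysis/separation
Classification
Information Technology and Security Science
Cite this article
Ren Rongzi (任荣梓), Gao Hang (高航). Investigation on Layout Analysis Technology of Chinese and English Mixed OCR Based on Feedback Merging[J]. Computer Technology and Development, 2017, 27(3): 39-43, 5.
Funding
Jiangsu Provincial Special Fund for the Transformation of Scientific and Technological Achievements (BA2012023)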