Computer & Digital Engineering, 2017, Vol. 45, Issue 5: 882-885, 910, 5. DOI: 10.3969/j.issn.1672-9722.2017.05.020
基于特征相似度的可比语料挖掘汉柬命名实体等价对
Chinese-Khmer Named Entity Equivalents Excavation Based on Feature Similarity in Comparable Corpus
Abstract
Named entity translation equivalents play a significant role in cross-language information processing. However, limited by corpus resources, few in-depth studies have addressed the extraction of bilingual Chinese-Khmer named entity equivalents. Starting from comparable corpus text, and according to the characteristics of the entity types and of the comparable corpus, this paper selects four features of bilingual Chinese-Khmer named entity equivalents: the transliteration feature, the translation feature, the context feature, and the length feature. A method based on multi-feature fusion is then proposed to calculate similarity and mine bilingual Chinese-Khmer named entity equivalents. Experiments show that the method performs well when the equivalents are acquired through feature-similarity computation, and that it outperforms methods using only a single feature.
Key words
named entity equivalents/Chinese-Khmer bilingual/multi-feature fusion/comparable corpus/transliteration model
Classification
Information Technology and Security Science
Cite this article
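XU Lu, YAN Xin, XIA Qing, ZHOU Feng, MO Yuanyuan. Chinese-Khmer Named Entity Equivalents Excavation Based on Feature Similarity in Comparable Corpus[J]. Computer & Digital Engineering, 2017, 45(5): 882-885, 910, 5.
Funding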
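Supported by the National Natural Science Foundation of China project "Research on Khmer Named Entity Recognition and Methods for Constructing a Chinese-Khmer Bilingual Corpus" (No. 61462055)
and the National Natural Science Foundation of China project "Research on Key Technologies for Extracting Vietnamese News Event Elements Based on Discourse Features" (No. 61562049).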