Computer & Digital Engineering (计算机与数字工程), 2017, Vol. 45, Issue 10: 1986-1989, 2017, 5. DOI: 10.3969/j.issn.1672-9722.2017.10.020
Learning to Rank Bilingual Documents Based on Document Similarity (基于文档相似度的双语文档排序学习)
Abstract
This paper addresses the problem of learning to rank bilingual documents. Ranking is an essential part of information retrieval. Ranking documents in a monolingual context with machine learning has been studied extensively, but learning to rank bilingual documents has received little attention. Because bilingual documents are written in different languages, they cannot be processed directly with existing monolingual methods. This paper proposes a bilingual learning-to-rank model that uses a monolingual model as its base component to assign ranking scores to documents in a monolingual context. A word embedding approach is introduced to measure document similarity in a bilingual context, through which a relationship between documents in the two languages can be established. The query is simply translated into the foreign language at the phrase level to filter foreign-language documents. Experiments show that the model is effective in ranking bilingual documents in both English-Chinese and English-Vietnamese contexts.
Keywords
learning to rank / information retrieval / document similarity / query translation / bilingual context
Classification
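Information Technology and Security Science
Cite this article
Huang Jian (黄健).. Learning to Rank Bilingual Documents Based on Document Similarity [J]. Computer & Digital Engineering, 2017, 45(10): 1986-1989, 2017, 5.
Funding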
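Supported by the National Natural Science Foundation of China (Grant Nos. 61175068, 61472168),
the Yunnan Province Key Project Science Foundation (Grant No. 2013FA130),
and the Science and Technology Innovation Talents Program of the Ministry of Science and Technology (Grant No. 2014HE001).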
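Illustrative sketch

The abstract names three ingredients: ranking scores from a monolingual model, embedding-based document similarity across languages, and a phrase-level translated query used as a filter. The abstract does not say how these are combined, so the Python sketch below is only one plausible reading under stated assumptions: both languages share a single embedding space, a document is represented by the average of its word vectors, and a foreign-language document inherits the best similarity-weighted score among the already-ranked native documents. The names `doc_vector`, `cosine`, and `rank_foreign` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def doc_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the embeddings of the tokens that have one; empty list if none do."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is empty or all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_foreign(
    scored_native: Sequence[Tuple[Sequence[str], float]],
    foreign_docs: Dict[str, Sequence[str]],
    translated_phrases: Sequence[str],
    emb: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    """Rank foreign-language documents using scores of native-language documents.

    scored_native: (tokens, score) pairs, scores assumed to come from a
                   monolingual ranking model.
    translated_phrases: phrase-level translations of the query, used as a filter.
    emb: word vectors for both languages in one shared space (an assumption).
    """
    native = [(doc_vector(toks, emb), score) for toks, score in scored_native]
    ranked = []
    for doc_id, toks in foreign_docs.items():
        # Filter step: keep only documents containing a translated phrase
        # as a whole-token sequence.
        padded = " " + " ".join(toks) + " "
        if not any(f" {p} " in padded for p in translated_phrases):
            continue
        fv = doc_vector(toks, emb)
        # Assumed combination rule: best similarity-weighted native score.
        score = max((cosine(fv, nv) * s for nv, s in native), default=0.0)
        ranked.append((doc_id, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_foreign` with a list of scored native documents, a dictionary of tokenized foreign documents, the translated query phrases, and a shared embedding table returns `(doc_id, score)` pairs in descending score order. Taking the maximum rather than a sum keeps a foreign document's score bounded by the best native score, which is a design choice of this sketch and not a claim about the paper's model.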