Journal of Sichuan University (Engineering Science Edition), 2012, Vol. 44, Issue 6: 127-132, 6.
一种句词五特征融合模型的复述研究
Research on Word-level Contextual Paraphrase Retrieving with Five-features
Abstract
To address a weakness of the Chinese synonym dictionary Tongyici-Cilin, which cannot be used as a context-dependent paraphrase corpus, a word-level paraphrase method was presented to improve the accuracy of Chinese paraphrase extraction. Based on its context sentence, paraphrase candidates for the target word were identified and extracted from large corpora. The target word was then paired with each candidate, and a five-feature probability model was established to capture information about the target word, the context sentence, and the paraphrase candidates. The values of these five features were used to train a binary classifier, which subsequently filtered the paraphrase candidates. The experiments showed that retrieving candidate paraphrases from large corpora through data mining had practical value: on average, 3.1 correct paraphrases were obtained per word. The binary classifier filtered the paraphrases effectively, with an accuracy of 0.65. In addition, 32% of the retrieved paraphrases could not be found in the Expanded Chinese Synonym Dictionary.
Keywords
Chinese paraphrase / five-feature fusion / intelligent identification / binary classification
Classification
Information Technology and Security Science
Cite this article
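He Xianjiang, He Weiwei, Zuo Hang. Research on Word-level Contextual Paraphrase Retrieving with Five-features [J]. Journal of Sichuan University (Engineering Science Edition), 2012, 44(6): 127-132, 6.
Funding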
Supported by the Sichuan Province Science and Technology Platform Support Program ()