Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2017, Vol. 11, Issue 4: 608-618 (11 pages). DOI: 10.3778/j.issn.1673-9418.1604029
Research on Multi-Feature Sentence Similarity Computing Method with Word Embedding
Abstract
After summarizing existing sentence similarity computing methods, this paper trains a word vector space model on 34,000 People's Daily texts for semantic similarity computing. Based on the trained word vector model, it then designs a multi-feature sentence similarity computing method that considers both word features and sentence structure features. At the word level, the method accounts for the possible effects of the number of overlapping words and of word continuity, and applies the word vector model to calculate the semantic similarity of non-overlapping words. At the sentence structure level, it considers both the order of overlapping words and the conformity of sentence lengths. Finally, the paper designs and implements four different sentence similarity calculating methods and develops an experimental system. The experimental results show that the proposed method achieves satisfactory results, and that combining and optimizing word features and sentence structure features can improve the accuracy of sentence similarity calculation.
Keywords
word embedding / sentence similarity / Word2vec / algorithm design
Classification
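Information Technology and Security Science
Citation
Li Feng (李峰), Hou Jiaying (侯加英), Zeng Rongren (曾荣仁), Ling Chen (凌晨). Research on Multi-Feature Sentence Similarity Computing Method with Word Embedding [J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(4): 608-618.
Funding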
The National Natural Science Foundation of China under Grant No. 61370126;
the National High Technology Research and Development Program of China (863 Program) under Grant No. 2015AA016004;
the National Social Science Foundation of China under Grant No. 15GJ003-154;
the Fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2015ZX-16.