Computer & Digital Engineering (计算机与数字工程), 2019, Vol. 47, Issue 7: 1748-1752, 5. DOI: 10.3969/j.issn.1672-9722.2019.07.037
用户短文本无关语自动识别方法研究
Research on Automatic Recognition Method About the Irrelevant Words in User-oriented Short Text
Abstract
In user-oriented short text, sentences with the same meaning can be expressed in many different ways, and these sentences contain a large amount of information unrelated to that meaning, referred to here as irrelevant words. To address the low accuracy of common recognition methods, an automatic recognition method is proposed that uses a second-order hidden Markov model to mark irrelevant words in the corpus to be labeled. A standard hidden Markov model can consider only the previous word as a feature when labeling a corpus, which leads to poor results. To overcome this, the proposed method also uses each word itself, its part of speech, and its relative position as features during labeling. The results show that this method avoids the limitations of hand-written rules for training texts and improves precision and recall to a certain extent.
Key words
short text/irrelevant words/HMM/machine learning
Classification
Information technology and security science
Cite this article
CHEN Guo, LIU Liangliang, ZHANG Zaiyue. Research on Automatic Recognition Method About the Irrelevant Words in User-oriented Short Text [J]. Computer & Digital Engineering, 2019, 47(7): 1748-1752, 5.
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61371114, 611170165)
Supported by a project of the Jiangsu Universities Collaborative Innovation Center for High-Tech Ships / Marine Equipment Research Institute, Jiangsu University of Science and Technology (Grant No. 1174871701-9)