
Research on an Automatic Recognition Method for Irrelevant Words in User-Oriented Short Text

Chen Guo, Liu Liangliang, Zhang Zaiyue

Computer & Digital Engineering, 2019, Vol. 47, Issue 7: 1748-1752 (5 pages). DOI: 10.3969/j.issn.1672-9722.2019.07.037

Research on Automatic Recognition Method About the Irrelevant Words in User-oriented Short Text

Chen Guo (1), Liu Liangliang (2), Zhang Zaiyue (1)

Author information

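  • 1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003
  • 2. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620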

Abstract

In user-oriented short text, sentences with the same meaning can be expressed in many ways, and they often carry a large amount of information that does not contribute to that meaning; such information is called irrelevant words. To address the low accuracy of common recognition methods, an automatic recognition method is proposed that labels irrelevant words in the corpus using a second-order hidden Markov model. A hidden Markov model ordinarily considers only the previous word as a feature when labeling a corpus, which leads to poor results; to address this, the proposed method also uses each word itself, its part of speech, and its relative position as features during labeling. The results show that this method avoids the limitations of hand-written rules for training texts and improves accuracy and recall to a certain extent.
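
To make the labeling idea concrete, the following is a minimal sketch of a second-order (trigram) hidden Markov model tagger that marks each word as irrelevant or not. It is an illustration under stated assumptions, not the authors' implementation: the tag names `I` and `O`, the toy English corpus, the add-one smoothing, and all function names are hypothetical, and the sketch uses only the word itself as the emission feature. The part-of-speech and relative-position features described in the abstract are not modeled here.

```python
import math
from collections import defaultdict

TAGS = ("O", "I")   # assumed labels: O = relevant word, I = irrelevant word
START = "<s>"       # padding tag for the two positions before the sentence

# Hypothetical toy corpus of (word, tag) pairs, for illustration only.
CORPUS = [
    [("please", "I"), ("help", "I"), ("me", "I"),
     ("check", "O"), ("the", "O"), ("weather", "O")],
    [("um", "I"), ("book", "O"), ("a", "O"), ("ticket", "O")],
    [("please", "I"), ("book", "O"), ("a", "O"), ("hotel", "O")],
]

def train(corpus):
    """Count tag trigrams and (tag, word) emissions from a labeled corpus."""
    trans = defaultdict(int)      # (tag[i-2], tag[i-1], tag[i]) -> count
    ctx = defaultdict(int)        # (tag[i-2], tag[i-1]) -> count
    emit = defaultdict(int)       # (tag, word) -> count
    tag_total = defaultdict(int)  # tag -> count
    vocab = set()
    for sentence in corpus:
        prev2, prev1 = START, START
        for word, tag in sentence:
            trans[(prev2, prev1, tag)] += 1
            ctx[(prev2, prev1)] += 1
            emit[(tag, word)] += 1
            tag_total[tag] += 1
            vocab.add(word)
            prev2, prev1 = prev1, tag
    return trans, ctx, emit, tag_total, vocab

def log_trans(model, prev2, prev1, tag):
    """Add-one smoothed log P(tag | prev2, prev1)."""
    trans, ctx = model[0], model[1]
    return math.log((trans[(prev2, prev1, tag)] + 1) / (ctx[(prev2, prev1)] + len(TAGS)))

def log_emit(model, tag, word):
    """Add-one smoothed log P(word | tag); one extra slot covers unseen words."""
    emit, tag_total, vocab = model[2], model[3], model[4]
    return math.log((emit[(tag, word)] + 1) / (tag_total[tag] + len(vocab) + 1))

def viterbi(model, words):
    """Return the most probable tag sequence under the second-order model."""
    if not words:
        return []
    # best[(previous tag, current tag)] = (log probability, tag path so far)
    best = {
        (START, t): (log_trans(model, START, START, t) + log_emit(model, t, words[0]), [t])
        for t in TAGS
    }
    for word in words[1:]:
        nxt = {}
        for (prev2, prev1), (score, path) in best.items():
            for t in TAGS:
                s = score + log_trans(model, prev2, prev1, t) + log_emit(model, t, word)
                if (prev1, t) not in nxt or s > nxt[(prev1, t)][0]:
                    nxt[(prev1, t)] = (s, path + [t])
        best = nxt
    return max(best.values(), key=lambda entry: entry[0])[1]

model = train(CORPUS)
print(viterbi(model, ["please", "book", "a", "ticket"]))  # ['I', 'O', 'O', 'O']
```

The decoder keeps one best path per pair of adjacent tags, which is what lets each transition depend on the two preceding tags rather than one. Adding part of speech or relative position would mean conditioning the emission or transition probabilities on those features as well.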

Key words

short text / irrelevant words / hidden Markov model (HMM) / machine learning

Classification

Information Technology and Security Science

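Cite this article

Chen Guo, Liu Liangliang, Zhang Zaiyue. Research on Automatic Recognition Method About the Irrelevant Words in User-oriented Short Text [J]. Computer & Digital Engineering, 2019, 47(7): 1748-1752.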

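Funding

National Natural Science Foundation of China (Grant Nos. 61371114, 611170165)

Supported by the Jiangsu Collaborative Innovation Center for High-Tech Ships / Marine Equipment Research Institute of Jiangsu University of Science and Technology (Project No. 1174871701-9).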

ISSN: 1672-9722
