Telecommunication Engineering (电讯技术), 2017, Vol. 57, Issue 1, pp. 78-82 (5 pages). DOI: 10.3969/j.issn.1001-893x.2017.01.013
A Grammatical Category-combined Short-text Similarity Algorithm and Its Application in Text Categorization
Abstract
HowNet-based short-text similarity calculation yields low accuracy when used for short-text categorization. To address this, a grammatical category-combined short-text similarity algorithm (GCSSA) is proposed. Building on the HowNet semantic similarity method for short texts and incorporating categorized feature words, the method adds grammatical-category (part-of-speech) analysis of keywords. Using each keyword's feature-word status and grammatical category, it assigns different weights to different keywords, thereby distinguishing how much each term contributes to the similarity between two short texts. Experiments show that, compared with the HowNet-based short-text categorization algorithm, the proposed method improves both the macro-average and micro-average accuracy of short-text categorization by 4%.
Keywords
short-text categorization / short-text similarity / grammatical category / HowNet semantics / categorization accuracy
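The abstract describes weighting each keyword by its grammatical category and by whether it is a categorized feature word, then using those weights in a short-text similarity. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' formula. The POS tags, the weight values (`POS_WEIGHTS`, `DEFAULT_WEIGHT`, `FEATURE_BOOST`), the best-match averaging scheme, and the word-similarity function `toy_word_sim` (a hypothetical stand-in for a real HowNet word similarity) are all illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# A token is (word, part-of-speech tag); the tags "n", "v", "a" are hypothetical.
Token = Tuple[str, str]

# Hypothetical weights: the abstract does not state the values the paper uses.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_WEIGHT = 0.3   # weight for any other grammatical category
FEATURE_BOOST = 1.5    # multiplier applied to categorized feature words

# Hypothetical stand-in for HowNet word similarity: a tiny lookup table.
TOY_SIM: Dict[FrozenSet[str], float] = {frozenset(("phone", "mobile")): 0.8}


def toy_word_sim(w1: str, w2: str) -> float:
    """Return 1.0 for identical words, else a table lookup (0.0 if absent)."""
    if w1 == w2:
        return 1.0
    return TOY_SIM.get(frozenset((w1, w2)), 0.0)


def token_weight(token: Token, feature_words: Set[str]) -> float:
    """Weight from the POS tag, boosted if the word is a category feature word."""
    word, pos = token
    weight = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
    if word in feature_words:
        weight *= FEATURE_BOOST
    return weight


def directed_similarity(src: List[Token], dst: List[Token],
                        word_sim: Callable[[str, str], float],
                        feature_words: Set[str]) -> float:
    """Weighted mean, over words in src, of each word's best match in dst."""
    total, total_weight = 0.0, 0.0
    for tok in src:
        weight = token_weight(tok, feature_words)
        best = max((word_sim(tok[0], other[0]) for other in dst), default=0.0)
        total += weight * best
        total_weight += weight
    return total / total_weight if total_weight > 0 else 0.0


def text_similarity(a: List[Token], b: List[Token],
                    word_sim: Callable[[str, str], float],
                    feature_words: Set[str]) -> float:
    """Symmetric short-text similarity: mean of the two directed scores."""
    return 0.5 * (directed_similarity(a, b, word_sim, feature_words)
                  + directed_similarity(b, a, word_sim, feature_words))


if __name__ == "__main__":
    a = [("phone", "n"), ("cheap", "a")]
    b = [("mobile", "n"), ("buy", "v")]
    print(text_similarity(a, b, toy_word_sim, {"phone"}))  # 32/63, about 0.508
    print(text_similarity(a, b, toy_word_sim, set()))      # 17/36, about 0.472
```

In this toy example, marking "phone" as a feature word raises its weight, so its good match to "mobile" counts for more and the score rises from 17/36 to 32/63. That mirrors, in a simplified form, the abstract's point that terms should contribute unequally to short-text similarity.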
Classification: Information Technology and Security Science
Cite this article
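Huang Xianying, Li Qindong, Liu Yingtao. A grammatical category-combined short-text similarity algorithm and its application in text categorization[J]. Telecommunication Engineering, 2017, 57(1): 78-82.
Funding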
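National Natural Science Foundation of China (11547148)
Science and Technology Program of Chongqing Municipal Education Commission (16SKGH133)
Chongqing Social Science Planning Doctoral Project (2015BS059)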