数据采集与处理 (Journal of Data Acquisition and Processing), 2017, Vol. 32, Issue 3: 550-558 (9 pages). DOI: 10.16337/j.1004-9037.2017.03.014
Supervised Explicit Semantic Representation for Text Categorization
Abstract
As a fundamental problem in text categorization, text representation has received wide attention. Currently, there are three main approaches to text representation: the bag-of-words model, latent semantic representation, and knowledge-based explicit semantic representation. This paper analyzes and compares the effects of these methods when applied to text categorization. Experiments show that knowledge-based explicit semantic representation does not improve text categorization performance as expected. To address the problem that knowledge-based explicit semantic representation easily introduces noise when extending text, a supervised explicit semantic representation method is proposed. The label information of the dataset is used to identify the most relevant concepts in a document, and the document is then given an explicit semantic representation by expanding those key concepts. Results on three datasets confirm the effectiveness of the proposed method.
Key words
text categorization / text representation / supervised explicit semantic representation
Classification
Information Technology and Security Science
Cite this article
SUN Fei, GUO Jiafeng, LAN Yanyan, CHENG Xueqi. Supervised explicit semantic representation for text categorization[J]. Journal of Data Acquisition and Processing, 2017, 32(3): 550-558, 9.
Funding
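National Basic Research Program of China (973 Program) (2012CB316303, 2014CB340401)
National High Technology Research and Development Program of China (863 Program) (2012AA011003)
Key Program of the National Natural Science Foundation of China (61232010)
Sub-project of the National Key Technology R&D Program of China (2012BAH46B04)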