Computer Engineering and Applications, 2017, Vol. 53, Issue 9: 97-102 (6 pages). DOI: 10.3778/j.issn.1002-8331.1601-0405
结合邻居辅助策略的两阶段层次文本分类模型
Two-stage hierarchical text classification model based on neighbor-assistant strategy
Abstract
The traditional two-stage hierarchical text classification (THTC) model is an effective approach to large-scale hierarchical text classification, but its classification accuracy remains low. To alleviate this problem, a new two-stage hierarchical text classification model based on a neighbor-assistant strategy (THTC-NA) is proposed. The THTC-NA model consists of two stages: search and classification. In the search stage, a flat strategy selects the categories related to a given document from all leaf categories; the categories are ranked, and the most related ones are taken as category candidates. A large-scale hierarchy is thus pruned into a much smaller, more focused one. In the classification stage, the classification result for each candidate is computed by combining the results of the candidate's ancestor categories and sibling categories. Finally, the results of the search stage and the classification stage are fused to determine the target category for the document. Experiments on the Newsgroups-18828 data set show that, compared with the THTC model, the THTC-NA model substantially improves classification accuracy.
Keywords
two-stage / hierarchical text classification / neighbor-assistant strategy / class hierarchy
Classification
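Information Technology and Security Science
Cite this article
GU Ping, WANG Chunyuan. Two-stage hierarchical text classification model based on neighbor-assistant strategy[J]. Computer Engineering and Applications, 2017, 53(9): 97-102 (6 pages).
Funding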
Natural Science Foundation Project of Chongqing (No. cstc2012jjA40002)
Fundamental Research Funds for the Central Universities (No. 106112013CD-JZR180014)
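Illustrative sketch
The two-stage procedure summarized in the abstract (flat search over leaf categories, neighbor-assisted classification of the candidates, then fusion) might be sketched roughly as below. The abstract does not give the scoring or fusion formulas, so everything specific here is an assumption made for illustration: the function names, the precomputed toy scores, the rule "own score plus mean ancestor score minus mean sibling score", and the weighted-sum fusion with weight `alpha`. It is not the authors' implementation.

```python
from typing import Dict, List, Optional

def search_stage(leaf_scores: Dict[str, float], k: int) -> List[str]:
    """Flat search: rank all leaf categories and keep the k most related."""
    ranked = sorted(leaf_scores, key=leaf_scores.get, reverse=True)
    return ranked[:k]

def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0

def classification_stage(candidate: str,
                         parent: Dict[str, Optional[str]],
                         node_scores: Dict[str, float]) -> float:
    """Score a candidate with help from its neighbors (ancestors, siblings).

    ASSUMPTION: the combination rule is hypothetical; the abstract only says
    ancestor and sibling results are combined.
    """
    # Collect ancestors by walking up the hierarchy.
    ancestors = []
    node = parent.get(candidate)
    while node is not None:
        ancestors.append(node)
        node = parent.get(node)
    # Siblings share the candidate's parent.
    siblings = [c for c, p in parent.items()
                if p == parent.get(candidate) and c != candidate]
    ancestor_part = _mean([node_scores[a] for a in ancestors])
    sibling_part = _mean([node_scores[s] for s in siblings])
    return node_scores[candidate] + ancestor_part - sibling_part

def thtc_na_predict(leaf_scores: Dict[str, float],
                    parent: Dict[str, Optional[str]],
                    node_scores: Dict[str, float],
                    k: int = 3, alpha: float = 0.5) -> str:
    """Fuse both stages (ASSUMPTION: weighted sum) and return the best leaf."""
    candidates = search_stage(leaf_scores, k)
    fused = {c: alpha * leaf_scores[c]
                + (1 - alpha) * classification_stage(c, parent, node_scores)
             for c in candidates}
    return max(fused, key=fused.get)

# Toy hierarchy and made-up scores for a single document.
parent = {"comp": None, "rec": None,
          "comp.graphics": "comp", "comp.windows": "comp",
          "rec.autos": "rec", "rec.sport": "rec"}
leaf_scores = {"comp.graphics": 0.60, "comp.windows": 0.50,
               "rec.autos": 0.62, "rec.sport": 0.10}
node_scores = {"comp": 0.80, "rec": 0.30,
               "comp.graphics": 0.60, "comp.windows": 0.50,
               "rec.autos": 0.55, "rec.sport": 0.10}

predicted = thtc_na_predict(leaf_scores, parent, node_scores)
print(predicted)
```

In this toy run the flat search ranks `rec.autos` first, but the strong score of the ancestor `comp` lifts `comp.graphics` above it after fusion, which is the kind of correction the neighbor-assistant idea aims for.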