Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2013, Vol. 33, Issue (1): 79-83, 5.
Research on Web Text Classification Based on Improved BoS
Abstract
An improved text similarity calculation method is proposed. By assigning different weights to sentences in different text blocks, discarding short sentences outright, and merging highly similar sentences, the method reduces the total number of sentences in the BoS (Bag of Sentences) during similarity calculation and increases processing speed. The method first computes sentence similarity using a sentence similarity measure, then computes the similarity of each text block, and finally computes the similarity of the whole text according to the text-block weights. Experiments show that the improved method significantly improves recall, precision, and F1 value.
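The three stages named in the abstract (sentence similarity, text-block similarity, weighted whole-text similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sentence similarity measure, the length cutoff, the merge threshold, or the block weights, so the Jaccard measure, `MIN_WORDS`, `MERGE_THRESHOLD`, the best-match averaging in `block_similarity`, and all function names below are assumptions chosen for illustration.

```python
import re

MIN_WORDS = 3          # assumed cutoff: sentences with fewer distinct words are dropped
MERGE_THRESHOLD = 0.8  # assumed cutoff: sentences at least this similar are merged


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def sentence_similarity(a, b):
    """Jaccard overlap of two token sets (a stand-in, not the paper's measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def bag_of_sentences(block_text):
    """Build a reduced bag: drop short sentences, merge highly similar ones."""
    bag = []
    for raw in re.split(r"[.!?]+", block_text):
        tokens = tokenize(raw)
        if len(tokens) < MIN_WORDS:
            continue  # short sentence: removed directly
        for i, kept in enumerate(bag):
            if sentence_similarity(tokens, kept) >= MERGE_THRESHOLD:
                bag[i] = kept | tokens  # merge into the sentence already kept
                break
        else:
            bag.append(tokens)
    return bag


def _directed(bag_a, bag_b):
    """Average, over sentences in bag_a, of the best match found in bag_b."""
    best = [max(sentence_similarity(s, t) for t in bag_b) for s in bag_a]
    return sum(best) / len(best)


def block_similarity(bag_a, bag_b):
    """Symmetric block similarity: mean of the two directed best-match averages."""
    if not bag_a or not bag_b:
        return 0.0
    return (_directed(bag_a, bag_b) + _directed(bag_b, bag_a)) / 2


def text_similarity(doc_a, doc_b, weights):
    """Weighted average of block similarities; docs map block name -> text."""
    total = sum(weights.values())
    if not total:
        return 0.0
    score = 0.0
    for name, weight in weights.items():
        sim = block_similarity(bag_of_sentences(doc_a.get(name, "")),
                               bag_of_sentences(doc_b.get(name, "")))
        score += weight * sim
    return score / total
```

A document here is a dictionary such as `{"title": "...", "body": "..."}`, and `weights` might give the title block more influence than the body, for example `{"title": 2.0, "body": 1.0}`. Dropping and merging sentences before comparison shrinks each bag, which is where the speed-up described in the abstract would come from in a sketch like this.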
Key words
web text classification / bag of sentences / vector space model / text mining
Classification
Information Technology and Safety Science
Cite this article
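Peng Junjie (彭俊杰), Chen Danmin (陈丹敏). Research on Web Text Classification Based on Improved BoS [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2013, 33(1): 79-83, 5.
Funding
Supported by the Henan Province Science and Technology Research Project (102102210489)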