Research on Web Text Classification Based on Improved BoS

Peng Junjie, Chen Danmin

Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2013, Vol. 33, Issue 1: 79-83, 5.

Peng Junjie¹, Chen Danmin¹

Author information

  • 1. School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China

Abstract

An improved text similarity calculation method is proposed. The method assigns different weights to sentences in different text blocks, discards short sentences outright, and merges highly similar sentences. This reduces the total number of sentences in the BoS (Bag of Sentences) during similarity calculation and increases processing speed. The method first computes the similarity between sentences using a sentence similarity measure. It then computes the similarity of the text blocks, and finally computes the similarity of the whole text according to the weights of the text blocks. Experiments show that the improved method significantly improves recall, precision, and F1 value.
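The three-stage calculation described in the abstract (sentence similarity, then text-block similarity, then a weighted whole-text similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the sentence similarity measure, so cosine similarity over term-frequency vectors is swapped in here, and the block names, block weights, short-sentence threshold, merge threshold, and best-match averaging rule are all hypothetical choices.

```python
import math
import re
from collections import Counter

# All constants are illustrative assumptions, not values from the paper.
BLOCK_WEIGHTS = {"title": 0.5, "body": 0.5}  # hypothetical text blocks and weights
MIN_TOKENS = 3          # sentences with fewer tokens are dropped
MERGE_THRESHOLD = 0.9   # sentences at least this similar are merged


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"\w+", sentence.lower())


def sentence_similarity(a, b):
    """Cosine similarity of term-frequency vectors (a stand-in measure)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduce_bag(sentences):
    """Drop short sentences, then keep one sentence per near-duplicate group."""
    kept = []
    for s in sentences:
        if len(tokenize(s)) < MIN_TOKENS:
            continue  # short sentence: removed directly
        if any(sentence_similarity(s, k) >= MERGE_THRESHOLD for k in kept):
            continue  # highly similar to a kept sentence: merged into it
        kept.append(s)
    return kept


def directed_match(bag_a, bag_b):
    """Average, over sentences in bag_a, of the best match found in bag_b."""
    return sum(max(sentence_similarity(s, t) for t in bag_b) for s in bag_a) / len(bag_a)


def block_similarity(sentences_a, sentences_b):
    """Symmetric similarity of two text blocks after reducing each bag."""
    a, b = reduce_bag(sentences_a), reduce_bag(sentences_b)
    if not a or not b:
        return 0.0
    return 0.5 * (directed_match(a, b) + directed_match(b, a))


def text_similarity(doc_a, doc_b):
    """Weighted sum of block similarities; each doc maps block name to sentences."""
    return sum(
        weight * block_similarity(doc_a.get(name, []), doc_b.get(name, []))
        for name, weight in BLOCK_WEIGHTS.items()
    )
```

Reducing each bag before matching is what cuts the number of pairwise sentence comparisons. The thresholds control that trade-off and would need tuning on real data.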

Key words

Web text classification / bag of sentences / vector space model / text mining

Classification

Information Technology and Security Science

Cite this article

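Peng Junjie, Chen Danmin. Research on Web Text Classification Based on Improved BoS [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2013, 33(1): 79-83, 5.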

Funding

Supported by the Henan Province Science and Technology Research Project (102102210489)

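Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)

OA / Peking University Core Journal / CSTPCD

ISSN 1673-5439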
