Distributed Parallel Implementation of a Naive Bayesian Text Classification Algorithm

Guo Xukun, Fan Bingbing

Computer Applications and Software, 2016, Vol. 33, Issue 11: 240-243, 296 (5 pages). DOI: 10.3969/j.issn.1000-386x.2016.11.056

DISTRIBUTED PARALLEL IMPLEMENTATION OF A NAIVE BAYESIAN TEXT CLASSIFICATION ALGORITHM

Guo Xukun (1), Fan Bingbing (2)

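Author information

  • 1. Guangzhou Sport University, Guangzhou 510500, Guangdong, China
  • 2. School of Computer Science, South China Normal University, Guangzhou 510631, Guangdong, China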

Abstract

To address the data sparsity, inaccurate classification, and low efficiency of the naive Bayes algorithm in text classification, this paper proposes a Dirichlet naive Bayes text classification algorithm based on MapReduce. First, TF-IDF is corrected by adjusting term weights according to the semantic factors of feature words and their distribution across classes. Then, the Dirichlet data smoothing method from statistical language modeling is introduced to reduce the impact of sparse data on classification performance, and the algorithm is parallelized with the MapReduce programming model on the Hadoop cloud computing platform. Experimental comparison shows that the algorithm significantly improves on the accuracy and recall of the traditional naive Bayes text classification algorithm, and that it offers excellent scalability and data processing capability.
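
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the standard Dirichlet prior smoothing from language modeling, P(w|c) = (count(w,c) + mu * P(w|collection)) / (|c| + mu), and imitates the map and reduce steps in plain Python rather than on Hadoop. The TF-IDF correction is left out because its form is not stated here. All function names, the value of `mu`, and the add-one collection model are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def map_doc(label, text):
    """Map step: emit ((label, word), 1) for each token of one document."""
    return [((label, word), 1) for word in text.lower().split()]

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per (label, word) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def train(docs):
    """Build per-class word counts, collection counts, and class priors."""
    pairs = [p for label, text in docs for p in map_doc(label, text)]
    class_word = defaultdict(Counter)
    for (label, word), n in reduce_counts(pairs).items():
        class_word[label][word] = n
    collection = Counter()
    for counts in class_word.values():
        collection.update(counts)
    doc_counts = Counter(label for label, _ in docs)
    priors = {c: n / len(docs) for c, n in doc_counts.items()}
    return class_word, collection, priors

def log_prob(word, counts, collection, mu):
    """Dirichlet-smoothed log P(word | class)."""
    vocab = len(collection)
    total = sum(collection.values())
    # Assumed collection model: add-one smoothing with one extra bucket
    # so that words never seen in training keep nonzero probability.
    p_coll = (collection[word] + 1) / (total + vocab + 1)
    return math.log((counts[word] + mu * p_coll) / (sum(counts.values()) + mu))

def classify(text, model, mu=10.0):
    """Return the class with the highest smoothed log-posterior score."""
    class_word, collection, priors = model
    scores = {}
    for label, counts in class_word.items():
        score = math.log(priors[label])
        for word in text.lower().split():
            score += log_prob(word, counts, collection, mu)
        scores[label] = score
    return max(scores, key=scores.get)
```

Because each `map_doc` call only looks at one document and `reduce_counts` only sums values per key, the counting stage is the part that would distribute naturally across MapReduce workers; the smoothing itself is applied afterwards to the aggregated counts.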

Keywords

Naive Bayes / Text classification / TF-IDF correction / Data smoothing / MapReduce parallelization

Classification

Information Technology and Security Science

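Cite this article

Guo Xukun, Fan Bingbing. Distributed parallel implementation of a naive Bayesian text classification algorithm [J]. Computer Applications and Software, 2016, 33(11): 240-243, 296.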

Funding

Youth Project of the 2015 Major Scientific Research Program, Department of Education of Guangdong Province.

Computer Applications and Software (OA, CSTPCD), ISSN 1000-386X
