Text categorization model based on WCBVSM and SACA

Zhang Yanping, Liu Chao, Qu Yonghua

Computer Engineering and Applications, 2012, Vol. 48, Issue 11: 137-142, 6. DOI: 10.3778/j.issn.1002-8331.2012.11.030

Text categorization model based on WCBVSM and SACA

Zhang Yanping 1, Liu Chao 2, Qu Yonghua 1

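Author information

  • 1. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230039
  • 2. School of Computer Science and Technology, Anhui University, Hefei 230039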

Abstract

This paper proposes a text categorization model that combines WCBVSM with SACA. Traditional vector space model (VSM) methods use keywords as the semantic carrier of a document and ignore the semantic information between the words of a text. Building on the word co-occurrence model, a Word Co-occurrence Model Based VSM (WCBVSM) is presented: it counts the word co-occurrence information in the texts and adds this information to the VSM, so that semantic information between words is captured. In addition, to address the conflict between validity and extensibility in the cross covering algorithm, a Cross Cover Algorithm based on Simulated Annealing (SACA) is presented. SACA mitigates the randomness in selecting cover centers and reduces the number of covers by increasing the number of samples in each cover, which enhances the extensibility of the cover classification. Test results show that the proposed model speeds up recognition and improves classification accuracy.
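
The idea of adding word co-occurrence information to a VSM can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cooccurrence_vsm`, the fixed sliding `window`, and the use of raw counts as weights are all assumptions made for illustration, since the abstract does not specify how co-occurrence is counted or weighted.

```python
from collections import Counter


def cooccurrence_vsm(tokens, window=3):
    """Build a sparse feature vector: term counts plus co-occurring pair counts.

    Hypothetical helper: the window size and raw-count weighting are
    assumptions, not details taken from the paper.
    """
    # Ordinary VSM part: one feature per term, weighted by its frequency.
    features = Counter(tokens)
    for i in range(len(tokens)):
        # Pair each word with the following words that fall inside the window.
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                # Sort so that (a, b) and (b, a) map to the same pair feature.
                pair = tuple(sorted((tokens[i], tokens[j])))
                features[pair] += 1
    return features


doc = "text classification uses vector space model".split()
vec = cooccurrence_vsm(doc, window=2)
```

With `window=2` only adjacent words are paired, so `vec` holds the six single-term features plus five pair features such as `("classification", "text")`. A classifier would then consume these sparse vectors in place of plain term-frequency vectors.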

Key words

text categorization / vector space model / term co-occurrence model / simulated annealing algorithm / cross cover algorithm

Classification

Information Technology and Security Science

Cite this article

Zhang Yanping, Liu Chao, Qu Yonghua. Text categorization model based on WCBVSM and SACA [J]. Computer Engineering and Applications, 2012, 48(11): 137-142, 6.

Funding

National Natural Science Foundation of China (No. 60675031, No. 61073117)

National Key Basic Research Program of China (973 Program) (No. 2004CB318108, No. 2007CB311003)

Ministry of Education Social Science Research Fund, Youth Project (No. 07JC870006)

Computer Engineering and Applications

OA / CSCD / CSTPCD

ISSN 1002-8331
