Computer Engineering and Applications, 2012, Vol. 48, Issue 11: 137-142 (6 pages). DOI: 10.3778/j.issn.1002-8331.2012.11.030
Text categorization model based on WCBVSM and SACA
Abstract
This paper proposes a text categorization model that combines WCBVSM with SACA. Traditional vector space model (VSM) methods use keywords as the carriers of document semantics and ignore the semantic information between the words of a text. Building on the word co-occurrence model, the Word Co-occurrence Model Based VSM (WCBVSM) is presented: it counts the word co-occurrence information of the texts and adds this information to the VSM, which makes the semantic information easier to capture. In addition, to address the conflict between validity and extensibility in the cross covering algorithm, the paper presents a Cross Cover Algorithm based on Simulated Annealing (SACA). This algorithm addresses the random selection of cross-cover centers and reduces the number of covers by increasing the number of samples in each cover, which enhances the extensibility of cover classification. Test results show that the proposed model speeds up recognition and improves classification accuracy.
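The idea of adding word co-occurrence information to a VSM can be illustrated with a minimal sketch. The abstract does not specify how co-occurrence is counted or weighted, so everything below is an assumption made for illustration: the sliding `window`, the unordered-pair counting, the `a+b` feature naming, and the function names `cooccurrence_features` and `wcb_vector` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cooccurrence_features(tokens, window=3):
    """Count unordered word pairs whose positions differ by less than `window`.

    Assumption: a fixed sliding window defines co-occurrence; the paper may
    use sentences, paragraphs, or a different statistic.
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens only.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def wcb_vector(tokens, window=3):
    """Merge plain term counts with co-occurrence pair counts.

    The result is a sparse vector (feature name -> count). Pair features are
    named "a+b"; this naming is illustrative and assumes tokens contain no "+".
    """
    features = Counter(tokens)  # ordinary VSM term frequencies
    for (a, b), count in cooccurrence_features(tokens, window).items():
        features[f"{a}+{b}"] = count
    return features


doc = "simulated annealing improves cover center selection".split()
vec = wcb_vector(doc, window=2)  # window=2 pairs adjacent words only
```

Here `vec` holds the six single-term counts plus one feature per adjacent word pair, so a classifier receives both the keywords and a rough signal of which words appear together.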
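Key words
text categorization / vector space model / term co-occurrence model / simulated annealing algorithm / cross cover algorithm
Classification
Information technology and security science
Cite this article
Zhang Yanping, Liu Chao, Qu Yonghua. Text categorization model based on WCBVSM and SACA [J]. Computer Engineering and Applications, 2012, 48(11): 137-142, 6.
Funding
National Natural Science Foundation of China (No.60675031, No.61073117)
National Key Basic Research Program of China (973 Program) (No.2004CB318108, No.2007CB311003)
Youth Project of the Humanities and Social Sciences Research Fund of the Ministry of Education (No.07JC870006)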