Journal of Southeast University (English Edition), 2008, Vol. 24, Issue (3): 339-342, 4.
Text categorization based on fuzzy classification rules tree
Abstract
Conventional fuzzy class-association methods classify a new text by repeatedly scanning the classifier, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules in which many words are repeated. Unlike ordinary classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which those words appear in texts. The construction and structure of an FCR-tree therefore differ from those of a CR-tree (classification rules tree). To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When a new text is classified, the paths of sub-trees led by words that do not appear in the text need not be searched, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
Keywords
text categorization/fuzzy classification association rule/classification rules tree/fuzzy classification rules tree
Classification
Information technology and safety science
Cite this article
Guo Yuqin (郭玉琴), Yuan Fang (袁方), Liu Haibo (刘海博). Text categorization based on fuzzy classification rules tree [J]. Journal of Southeast University (English Edition), 2008, 24(3): 339-342, 4.
Funding
Supported by the National Natural Science Foundation of China (No.60473045), the Technology Research Project of Hebei Province (No.05213573), and the Research Plan of Education Office of Hebei Province (No.2004406).
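Illustrative sketch
The retrieval idea summarized in the abstract (rules stored in a prefix tree, with sub-trees skipped when their leading word is absent from the text) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The membership functions, the min-based matching degree, the scoring by confidence times degree, and all names (`Node`, `insert_rule`, `matching_rules`, `classify`) are assumptions made for illustration. The sketch uses a single tree and does not model the k-FCR-trees mentioned in the abstract.

```python
from dataclasses import dataclass, field


def membership(freq, fuzzy_set):
    """Hypothetical membership functions over raw word counts (assumption)."""
    if fuzzy_set == "low":
        return max(0.0, min(1.0, (4 - freq) / 3))  # 1 at freq <= 1, 0 at freq >= 4
    if fuzzy_set == "high":
        return max(0.0, min(1.0, (freq - 1) / 3))  # 0 at freq <= 1, 1 at freq >= 4
    raise ValueError(f"unknown fuzzy set: {fuzzy_set}")


@dataclass
class Node:
    # Maps an item (word, fuzzy_set) to the child node reached through it.
    children: dict = field(default_factory=dict)
    # (class_label, confidence) pairs for rules that end at this node.
    labels: list = field(default_factory=list)


def insert_rule(root, items, label, confidence):
    """Insert one rule; items are sorted so rules with shared prefixes share nodes."""
    node = root
    for item in sorted(items):
        node = node.children.setdefault(item, Node())
    node.labels.append((label, confidence))


def matching_rules(node, freqs, degree=1.0):
    """Yield (label, confidence, degree) for every rule matched by the text."""
    for label, confidence in node.labels:
        yield label, confidence, degree
    for (word, fuzzy_set), child in node.children.items():
        if word not in freqs:
            continue  # skip the whole sub-tree led by a word absent from the text
        d = min(degree, membership(freqs[word], fuzzy_set))
        if d > 0:
            yield from matching_rules(child, freqs, d)


def classify(root, freqs):
    """Return the class of the best-scoring matched rule, or None if none match."""
    scores = {}
    for label, confidence, degree in matching_rules(root, freqs):
        scores[label] = max(scores.get(label, 0.0), confidence * degree)
    return max(scores, key=scores.get) if scores else None


# Toy rule set (invented for illustration only).
root = Node()
insert_rule(root, [("goal", "high"), ("match", "low")], "sports", 0.9)
insert_rule(root, [("goal", "high"), ("team", "high")], "sports", 0.8)
insert_rule(root, [("market", "high")], "finance", 0.85)

print(classify(root, {"goal": 4, "team": 2}))
```

In this toy example the two "sports" rules share the node for `("goal", "high")`, which is the kind of space saving the abstract attributes to repeated words. The sub-tree under `("market", "high")` is never visited for a text that does not contain "market".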