Journal of Northwest Normal University (Natural Science), 2012, Vol. 48, Issue 5: 43-47, 5.
一种基于相似性度量的离散化方法
A similarity measuring-based discretization method
Abstract
This paper describes a discretization method based on similarity measuring theory that aims to address the inadequacies of the information entropy method. After numeric attributes are discretized, the amount of information in each interval is measured with a similarity measuring formula called the algebra-geometry mean distance formula, so that the distribution of class values is fairly consistent within each interval. The number of intervals is determined by the size of the dataset. First, several datasets are discretized with both our method and information entropy-based discretization; the Naive Bayes Simple classifier is then used to compare classification accuracy on the discretized datasets. The results show that our discretization method achieves a higher correct classification rate than information entropy-based discretization.
Keywords
data mining / discretization / similarity measuring / information entropy
Classification
Information technology and security science
Cite this article
丁剑, 白凤伟. A similarity measuring-based discretization method [J]. Journal of Northwest Normal University (Natural Science), 2012, 48(5): 43-47, 5.
Funding
National Natural Science Foundation of China (71061001)
Ningxia Autonomous Region Natural Science Foundation (NZ12214)
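
Illustrative note: the abstract compares the proposed method against information entropy-based discretization but does not specify that baseline on this page. As a rough sketch only, the code below shows one common form of entropy-based discretization: choosing a single cut point that minimizes the size-weighted class entropy of the two resulting intervals. The function names `class_entropy` and `best_cut_point` are hypothetical, and the sketch does not implement the paper's algebra-geometry mean distance formula, which is not defined here.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the binary split of a numeric
    attribute that minimizes the size-weighted class entropy.

    Assumption: this is a generic entropy-based split, not necessarily
    the exact baseline used in the paper.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        # Only consider boundaries between distinct attribute values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * class_entropy(left)
                    + len(right) * class_entropy(right)) / n
        if weighted < best[1]:
            best = (cut, weighted)
    return best
```

Applying `best_cut_point` recursively to each resulting interval, with some stopping rule, yields a multi-interval discretization; the abstract states that the proposed method instead fixes the number of intervals from the dataset size.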