Journal of Shanxi University (Natural Science Edition), 2018, Vol. 41, Issue (2): 256-266, 11. DOI: 10.13451/j.cnki.shanxi.univ(nat.sci.).2018.02.002
Text Clustering Evaluation Method Based on Parameter Estimation of Distances within Clusters
Abstract
The text clustering evaluation method based on parameter estimation of within-cluster distances assumes that the within-cluster distances approximately follow a normal distribution, and uses maximum likelihood estimation to estimate the parameters of that distribution. From the estimates, a reasonable range for the within-class distance is determined, and text vectors that fall outside this range are adjusted according to the magnitude of their within-class distances. The final result must still be validated with clustering evaluation indexes. This paper shows that the method is feasible when the number of clusters is too small or equal to the true number of classes. It also shows that using the method to adjust the clustering results produced by the K-means algorithm weakens the influence of initial class center selection on K-means and improves the accuracy of the clustering results.
Keywords
within-class distance / maximum likelihood estimation / clustering evaluation / K-means algorithm / clustering adjustment
Classification
Social Sciences
Cite this article
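Niu Fenggao (牛奉高), Zhang Rongjie (张荣杰). Text Clustering Evaluation Method Based on Parameter Estimation of Distances within Clusters [J]. Journal of Shanxi University (Natural Science Edition), 2018, 41(2): 256-266, 11.
Funding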
National Natural Science Foundation of China (71503151)
Shanxi Province Program for Supporting Innovative Talents in Higher Education Institutions (2016052006)
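Illustrative sketch

The procedure summarized in the abstract (estimate the normal parameters of the within-cluster distances by maximum likelihood, derive a reasonable distance range, then adjust the vectors outside it) can be sketched roughly as below. This is only a minimal illustration, not the authors' implementation. The abstract does not state how the range is defined or how out-of-range vectors are adjusted, so the upper bound `mu + k * sigma`, the default `k = 2.0`, the Euclidean distance, the nearest-center reassignment, and the function names `mle_normal`, `euclidean`, and `adjust_clusters` are all assumptions made for the example.

```python
import math


def mle_normal(distances):
    """Maximum likelihood estimates (mean, std) for a normal sample."""
    n = len(distances)
    mu = sum(distances) / n
    # The MLE of the variance divides by n, not n - 1.
    var = sum((d - mu) ** 2 for d in distances) / n
    return mu, math.sqrt(var)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adjust_clusters(vectors, labels, centers, k=2.0):
    """Reassign vectors whose distance to their own center is out of range.

    Assumed rule (not from the paper): a vector is out of range when its
    distance exceeds mu + k * sigma for its cluster, and it is then moved
    to the cluster with the nearest center.
    """
    own = [euclidean(v, centers[l]) for v, l in zip(vectors, labels)]
    new_labels = list(labels)
    for c in range(len(centers)):
        members = [i for i, l in enumerate(labels) if l == c]
        if len(members) < 2:
            continue  # too few distances to estimate a spread
        mu, sigma = mle_normal([own[i] for i in members])
        upper = mu + k * sigma
        for i in members:
            if own[i] > upper:
                new_labels[i] = min(
                    range(len(centers)),
                    key=lambda j: euclidean(vectors[i], centers[j]),
                )
    return new_labels
```

In this sketch, `labels` and `centers` would come from a prior K-means run. As the abstract notes, any adjusted result should still be checked with clustering evaluation indexes.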