A semi-supervised clustering algorithm based on seeds and pair-wise constraints (一种基于Seeds集和成对约束的半监督聚类算法)

Chang Yu (常瑜), Liang Jiye (梁吉业), Gao Jiawei (高嘉伟), Yang Jing (杨静)

Journal of Nanjing University (Natural Sciences), 2012, Vol. 48, Issue 4: 405-411, 7.

A semi-supervised clustering algorithm based on seeds and pair-wise constraints

Chang Yu¹, Liang Jiye¹, Gao Jiawei¹, Yang Jing¹

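Author information

  • 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006 / Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006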

Abstract

Semi-supervised learning, an application-driven family of machine learning methods, has become a hot topic in artificial intelligence and pattern recognition. Semi-supervised clustering, one of its main branches, introduces a small amount of supervision information into the search for an optimal clustering. Several kinds of semi-supervised clustering algorithms have been proposed recently: search-based methods, similarity-based methods, and methods that combine search and similarity. However, most existing algorithms do not use seeds and pair-wise constraints at the same time, even though both are valuable. To make full use of the given supervision information, this paper introduces a semi-supervised clustering algorithm based on seeds and pair-wise constraints. The algorithm also draws on Tri-training, a representative method built on the Co-training mechanism: because Tri-training uses three classifiers to label unlabeled samples, the proposed algorithm uses it to obtain more labeled samples. First, some unlabeled samples are selected and annotated with the Tri-training method, which enlarges the initial labeled set. Second, pair-wise constraints are used to optimize the enlarged labeled set and improve its quality. Third, the initial cluster centers are computed from the optimized labeled samples. Finally, the K-Means algorithm is run, and in each iteration of the search the pair-wise constraints are used to modify the partitioning results. The proposed algorithm is compared with the K-Means, Seeded-K-Means, and COP-K-Means algorithms. Experimental results on three UCI data sets under the same setting demonstrate that the method takes full advantage of the given supervision information and obtains better clustering results. In addition, an experiment on the Haberman data set analyzes how the number of pair-wise constraints and the number of labeled samples affect the algorithm's performance. The results show that the more pair-wise constraints or labeled samples are available, the better the algorithm performs.
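
The last two steps of the abstract (cluster centers taken from labeled seeds, then K-Means whose assignments are checked against pair-wise constraints) can be illustrated with a minimal sketch. This is not the authors' implementation: the Tri-training enlargement and the constraint-based cleaning of the labeled set are omitted, and the function names, the seed dictionary format, and the fallback used when no cluster satisfies the constraints are all assumptions made for illustration. It simply combines a Seeded-K-Means style initialization with a COP-K-Means style assignment check, and it assumes every cluster label 0..k-1 has at least one seed.

```python
from math import dist


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def violates(i, cluster, labels, must_link, cannot_link):
    """True if putting point i into `cluster` breaks a constraint with a
    point that has already been assigned in the current pass."""
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            if labels[other] is not None and labels[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            if labels[other] == cluster:
                return True
    return False


def seeded_constrained_kmeans(points, seeds, must_link, cannot_link, max_iter=100):
    """seeds maps a point index to its known cluster label (hypothetical format)."""
    k = max(seeds.values()) + 1
    # Seeded-K-Means style initialization: one center per seed class.
    centers = [mean([points[i] for i, c in seeds.items() if c == j])
               for j in range(k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            ranked = sorted(range(k), key=lambda j: dist(p, centers[j]))
            # COP-K-Means style check: nearest center that keeps all constraints.
            # Fallback (an assumption): use the nearest center if none qualifies.
            new_labels[i] = next(
                (j for j in ranked
                 if not violates(i, j, new_labels, must_link, cannot_link)),
                ranked[0])
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [points[i] for i, c in enumerate(labels) if c == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers


# Toy data: point 6 lies closer to cluster 0, but a must-link to point 3
# and a cannot-link to point 0 push it into cluster 1.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (4, 4)]
seeds = {0: 0, 3: 1}
labels, centers = seeded_constrained_kmeans(
    points, seeds, must_link=[(6, 3)], cannot_link=[(6, 0)])
```

In this toy run the constraints override plain distance for point 6; with empty constraint lists the same call reduces to K-Means started from the seed centers. The outcome depends on the order in which points are visited, which is a known property of COP-K-Means style assignment.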

Key words

semi-supervised clustering/seeds/pair-wise constraints

Classification

Computer science and automation

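Cite this article

Chang Yu, Liang Jiye, Gao Jiawei, Yang Jing. A semi-supervised clustering algorithm based on seeds and pair-wise constraints [J]. Journal of Nanjing University (Natural Sciences), 2012, 48(4): 405-411, 7.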

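Funding

National Natural Science Foundation of China (71031006, 70971080); Preliminary Research Special Project of the National "973" Program of China (2011CB311805); Specialized Research Fund for the Doctoral Program of Higher Education ()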

Journal of Nanjing University (Natural Sciences)

OA, CSCD, CSTPCD

ISSN 0469-5097
