Journal of Shanxi University (Natural Science Edition), 2013, Vol. 36, Issue 2: 180-186, 7.
Mixed Data Clustering Label Algorithm Based on User's Interest Domain
Abstract
Data clustering labeling is a technique that clusters a small-scale sample set and then labels the remaining samples using the clustering results. It is an effective way to improve the efficiency of large-scale data clustering. Mixed data are the most widely used data type in real-world applications. This paper treats the user's interest data as a small-scale data set and clusters it with the K-prototypes clustering algorithm. The clustering result is used to construct the user's interest domains. The membership degree of a sample to a user's interest domain is defined by the relationship between the attribute values of the unlabeled sample and the components of that interest domain. Based on the concepts of the user's interest domain and the “data-user's interest domain” membership degree, a mixed data clustering label algorithm is proposed. This algorithm overcomes a limitation of existing data labeling algorithms, which assign only one class label to each unlabeled sample. It can be applied to recommendation services and user behavior analysis in electronic commerce. Experiments show that the algorithm performs well on clustering label processing of mixed data.
Keywords
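mixed data / clustering / user's interest domain / UIMCL algorithm
Classification
Information Technology and Security Science
Cite this article
Li Deyu, Weng Xiaokui, Li Yanhong. Mixed Data Clustering Label Algorithm Based on User's Interest Domain [J]. Journal of Shanxi University (Natural Science Edition), 2013, 36(2): 180-186, 7.
Funding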
National Natural Science Foundation of China (61175067, 61272095)
Natural Science Foundation of Shanxi Province (2010011021-1)
Shanxi Province Science and Technology Research Project (20110321027-02)