
Journal of Shanxi University (Natural Science Edition), 2013, Vol. 36, Issue 2: 180-186, 7.

Mixed Data Clustering Label Algorithm Based on User's Interest Domain

Li Deyu¹, Weng Xiaokui², Li Yanhong¹

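Author information

1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006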

Abstract

Data clustering labeling is a technique that clusters a small sample set and then labels the remaining samples using the clustering results. It is an effective way to improve the efficiency of large-scale data clustering. Mixed data, which contain both numerical and categorical attributes, are the most widely used data type in real-world applications. This paper treats the user's interest data as a small-scale data set and clusters it with the K-prototypes algorithm. The clustering result is used to construct the user's interest domains. The membership degree of a sample to a user's interest domain is defined by the relationship between the attribute values of the unlabeled sample and the components of that interest domain. Based on the concepts of the user's interest domain and the "data-user's interest domain" membership degree, a mixed data clustering label algorithm (UIMCL) is proposed. The algorithm overcomes a limitation of existing data labeling algorithms, which assign only a single class label to each unlabeled sample. It can be applied to recommendation services and user behavior analysis in electronic commerce. Experiments show that the algorithm achieves good results on mixed data clustering labeling.
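The labeling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formal definitions, so everything below is an assumption made for illustration and not the paper's UIMCL algorithm: the clusters are taken as already computed (for example by K-prototypes), an interest domain is modeled as a value interval per numerical attribute plus a set of observed values per categorical attribute, the membership degree is the fraction of a sample's attributes that fall inside the domain, and the names `InterestDomain`, `build_domain`, `membership`, `label`, and the `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterestDomain:
    # Assumed representation: one (low, high) interval per numerical attribute
    # and one set of observed values per categorical attribute.
    numeric_ranges: list
    categorical_values: list


def build_domain(cluster):
    """Build a domain from one cluster of (numeric_tuple, categorical_tuple) samples."""
    numeric_columns = list(zip(*(sample[0] for sample in cluster)))
    categorical_columns = list(zip(*(sample[1] for sample in cluster)))
    return InterestDomain(
        numeric_ranges=[(min(col), max(col)) for col in numeric_columns],
        categorical_values=[set(col) for col in categorical_columns],
    )


def membership(sample, domain):
    """Assumed membership degree: share of attributes that fall inside the domain."""
    numeric, categorical = sample
    hits = sum(low <= x <= high
               for x, (low, high) in zip(numeric, domain.numeric_ranges))
    hits += sum(value in allowed
                for value, allowed in zip(categorical, domain.categorical_values))
    return hits / (len(numeric) + len(categorical))


def label(sample, domains, threshold=0.5):
    """Return every domain whose membership degree reaches the (assumed) threshold."""
    return [name for name, domain in domains.items()
            if membership(sample, domain) >= threshold]


# Toy clusters: one numerical attribute (price) and two categorical ones.
clusters = {
    "books": [((10.0,), ("book", "new")), ((15.0,), ("book", "used"))],
    "electronics": [((300.0,), ("phone", "new")), ((500.0,), ("laptop", "new"))],
}
domains = {name: build_domain(members) for name, members in clusters.items()}

print(label(((12.0,), ("book", "new")), domains))   # ['books']
print(label(((400.0,), ("book", "new")), domains))  # ['books', 'electronics']
```

The second sample matches two of three attributes in each domain, so it receives both labels. This is the kind of multi-domain assignment that a single-label scheme cannot express.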

Keywords

mixed data / clustering / user's interest domain / UIMCL algorithm

Classification

Information Technology and Security Science

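Cite this article

Li Deyu, Weng Xiaokui, Li Yanhong. Mixed Data Clustering Label Algorithm Based on User's Interest Domain [J]. Journal of Shanxi University (Natural Science Edition), 2013, 36(2): 180-186, 7.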

Funding

National Natural Science Foundation of China (61175067, 61272095)

Natural Science Foundation of Shanxi Province (2010011021-1)

Shanxi Province Science and Technology Key Project (20110321027-02)

Journal of Shanxi University (Natural Science Edition)

Open access (OA); indexed in PKU Core Journals, CSCD, CSTPCD

ISSN: 0253-2395
