High utility itemsets mining algorithm over data stream based on transaction-sensitive sliding window

Song Wei, Liu Mingyuan, Li Jinhong

Journal of Nanjing University (Natural Sciences), 2014, Issue (4): 494-504, 11. DOI: 10.13232/j.cnki.jnju.2014.04.014

High utility itemsets mining algorithm over data stream based on transaction-sensitive sliding window

Song Wei 1, Liu Mingyuan 1, Li Jinhong 1

Author information

  • 1. College of Information Engineering, North China University of Technology, Beijing, 100144

Abstract

Traditional frequent itemset mining methods assume an equal profit or weight for all items and consider only binary (0/1) occurrences of items in transactions. By taking into account the non-binary frequency values of items in transactions and a different profit value for each item, high utility itemset mining has become an important research issue in data mining. Mining high utility itemsets from a transactional database refers to discovering itemsets with high utility, such as profit. Although a number of relevant algorithms have been proposed in recent years, they suffer from generating a large number of candidate itemsets. Such a large number of candidates degrades mining performance in terms of both execution time and space requirements, and the situation may become worse when the database contains many long transactions or long high utility itemsets.

Meanwhile, data stream mining is an emerging research topic in the data mining and knowledge discovery community. Finding frequent itemsets is one of the most important tasks in data stream mining, with wide applications such as online e-business and web click-stream analysis. However, few works address how to discover high utility itemsets in data streams. To suit the data stream scenario, an algorithm based on a transaction-sensitive sliding window, called MHUIDS (Mine High Utility Itemsets over Data Stream), is proposed for mining high utility itemsets over data streams. First, the problem of high utility itemset mining in a transaction-sensitive sliding window is formally defined. Second, a tree structure based on binary vectors, called the High Transaction-Weighted Utilization Itemset Tree (HTWUI-Tree), is introduced. By recording itemsets together with their utilities and bit vectors, the HTWUI-Tree describes the search space effectively, which lays the foundation for speeding up high utility itemset mining. Then, the initialization and sliding algorithms for the transaction-sensitive sliding window are described, which are used to construct and update the HTWUI-Tree. Finally, pruning strategies and the mining algorithm are proposed, which reduce the number of candidate itemsets and lower the cost of database scanning. Experimental results show that the MHUIDS algorithm is efficient and has a low storage cost.
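The quantities the abstract refers to can be illustrated with a small sketch. This is not the MHUIDS algorithm or the HTWUI-Tree itself, whose details are not given here. It only shows the standard definitions of itemset utility and transaction-weighted utilization (TWU), evaluated over a window that keeps the most recent transactions, plus a per-item bit vector over that window. The profit table, the class name `SlidingWindow`, and all function names are hypothetical and chosen for illustration.

```python
from collections import deque

# Hypothetical per-unit profits (external utility); illustration only.
PROFIT = {"a": 5, "b": 2, "c": 1}


class SlidingWindow:
    """Transaction-sensitive window: keeps the `size` most recent transactions."""

    def __init__(self, size):
        # deque with maxlen drops the oldest transaction when a new one arrives
        self.transactions = deque(maxlen=size)

    def add(self, transaction):
        # transaction maps item -> purchased quantity (internal utility)
        self.transactions.append(transaction)


def transaction_utility(t):
    """Total utility of one transaction: sum of quantity * profit."""
    return sum(qty * PROFIT[item] for item, qty in t.items())


def itemset_utility(itemset, window):
    """Utility of `itemset` summed over window transactions containing it."""
    return sum(
        sum(t[item] * PROFIT[item] for item in itemset)
        for t in window.transactions
        if all(item in t for item in itemset)
    )


def twu(itemset, window):
    """Transaction-weighted utilization: an upper bound on itemset utility."""
    return sum(
        transaction_utility(t)
        for t in window.transactions
        if all(item in t for item in itemset)
    )


def bit_vector(item, window):
    """One bit per window transaction: 1 if the item occurs in it, else 0."""
    return [1 if item in t else 0 for t in window.transactions]
```

Because an itemset's utility in a transaction never exceeds that transaction's total utility, `twu(X, w)` is always at least `itemset_utility(X, w)`. This is why TWU is commonly used to prune candidates: an itemset whose TWU falls below the threshold cannot be a high utility itemset.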

Key words

data mining/data stream/transaction-sensitive sliding window/high utility itemset/HTWUI-tree

Cite this article

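Song Wei, Liu Mingyuan, Li Jinhong. High utility itemsets mining algorithm over data stream based on transaction-sensitive sliding window [J]. Journal of Nanjing University (Natural Sciences), 2014, (4): 494-504, 11.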

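Funding

National Natural Science Foundation of China (61105045); Scientific Research Talent Promotion Program of North China University of Technology (CCXZ201303)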

Journal of Nanjing University (Natural Sciences)

OA / CSCD / CSTPCD

ISSN 0469-5097
