Application Research of Computers, 2024, Vol. 41, Issue 10: 2970-2977 (8 pages). DOI: 10.19734/j.issn.1001-3695.2024.01.0027
Mining high utility itemsets based on statistical significance testing
Abstract
To address the problem that traditional high utility itemset mining algorithms report false positive high utility itemsets in transactions with class labels, this paper proposes two high utility itemset mining algorithms, FHUI and PHUI. Both algorithms first find all candidate itemsets and group them by length. FHUI then builds null distributions from frequency distributions, whereas PHUI builds them with a permutation strategy applied within or between transactions. Finally, both algorithms compute p-values from the null distributions and use the false discovery rate to eliminate false positive high utility itemsets. Experiments on benchmark data sets show that FHUI and PHUI eliminate a large number of false positive itemsets, which lets them achieve higher accuracy in classification tasks. Experiments on synthetic data sets show that the proportions of false positive itemsets reported by FHUI and PHUI are below 4.8% and that the average utility values exceed 39,000. The experimental results show that the statistically significant high utility itemsets reported by FHUI and PHUI are more reliable and practical in transactions with class labels.
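The last two steps of the abstract (p-values from a null distribution, then false discovery rate filtering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the test statistic (class-1 utility minus class-0 utility) is an assumption, shuffling class labels stands in for the "between transactions" permutation, and the Benjamini-Hochberg procedure is assumed as the false discovery rate method because the abstract does not name one.

```python
import random


def permutation_p_value(utilities, labels, n_perm=2000, seed=0):
    """Estimate a one-sided p-value for one candidate itemset.

    utilities[i] is the itemset's utility in transaction i (0 if absent);
    labels[i] is that transaction's class label (0 or 1).
    Assumed statistic: total utility in class 1 minus total utility in class 0.
    """
    rng = random.Random(seed)

    def stat(lbls):
        return sum(u if l == 1 else -u for u, l in zip(utilities, lbls))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute class labels across transactions
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

Under these assumptions, one would compute a p-value for every candidate in a length group and pass the list to `benjamini_hochberg`, keeping only the candidates whose indices are returned.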
Key words
data mining / high utility itemset mining / statistical significance testing / Fisher testing / permutation testing
Classification
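Computer science and automation
Cite this article
Wu Jun, Wei Dandan, Ouyang Aijia, Wang Ya (吴军, 魏丹丹, 欧阳艾嘉, 王亚). Mining high utility itemsets based on statistical significance testing[J]. Application Research of Computers, 2024, 41(10): 2970-2977.
Funding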
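National Natural Science Foundation of China (62066049)
Youth Funding Project for Higher Education Institutions of the Guizhou Provincial Department of Education (黔教技[2022]313, 黔教合KY[2022]015)
Science and Technology Support Program of the Guizhou Provincial Department of Science and Technology (黔科合支撑[2023]257)
Zunyi Science and Technology Cooperation Project (遵市科合HZ字(2022)123)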