
Multi-label Online Stream Feature Selection Based on Adaptive Density Neighborhood Relation

Tags: OA, CSTPCD

Abstract

Stream feature selection chooses an optimal feature subset from features that arrive as a stream. Most existing methods require domain information to be learned beforehand and parameter values to be preset before model training. In real-world applications, datasets differ in structure and source, so researchers cannot obtain the relevant domain knowledge in advance, and specifying a single parameter value that suits all types of datasets is a major challenge. Motivated by this, we propose multi-label online stream feature selection based on adaptive density neighborhood relation (ML-OFS-ADNR). Built on neighborhood rough set theory, the method requires no prior domain information when computing feature dependency. Moreover, we propose a new adaptive density neighborhood relation that uses the density information of surrounding instances to automatically select an appropriate number of neighbors during streaming feature selection, without any parameter specified in advance. Through a fuzzy equivalence constraint, ML-OFS-ADNR selects features with high dependency and low redundancy. Experiments on ten datasets of different types show that, with the same number of selected features, the proposed method outperforms both traditional feature selection methods and state-of-the-art online streaming feature selection methods.
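The abstract describes the pipeline only at a high level. As a rough illustration, the Python sketch below shows the generic shape of neighborhood-rough-set online streaming feature selection: a parameter-free, density-driven neighborhood for each instance, a multi-label dependency degree computed from those neighborhoods, and a significance and redundancy check as each feature arrives. Everything here is an assumption for illustration: the functions `adaptive_neighbors`, `dependency`, and `online_select` are hypothetical, the gap-based neighbor rule is a stand-in for the paper's adaptive density neighborhood relation, the dependency is a simple per-label consistency ratio, and a plain dependency comparison replaces the fuzzy equivalence constraint, which is not reproduced.

```python
import numpy as np


def adaptive_neighbors(X):
    """Boolean n x n matrix; row i marks the neighbors of instance i.

    HYPOTHETICAL density rule (a stand-in, not the paper's ADNR definition):
    walk outward through x_i's sorted distances and stop at the first jump
    larger than the average spacing of that sorted list, so dense regions
    yield more neighbors than sparse ones. No parameter is needed.
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    N = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(D[i])              # closest first (distance 0 first)
        d = D[i, order]
        gap = (d[-1] - d[0]) / max(n - 1, 1)  # average spacing of sorted distances
        N[i, i] = True
        N[i, order[0]] = True
        for k in range(1, n):
            if d[k] - d[k - 1] > gap:         # first large jump ends the neighborhood
                break
            N[i, order[k]] = True
    return N


def dependency(X, Y, cols):
    """HYPOTHETICAL multi-label dependency of label matrix Y on feature columns cols.

    Instance i is consistent for label lab when every neighbor of i has the same
    value of that label as i; gamma is the fraction of consistent (i, lab) pairs.
    """
    if len(cols) == 0:
        return 0.0
    Xs = X[:, cols].astype(float)
    lo = Xs.min(axis=0)
    span = Xs.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (Xs - lo) / span                     # min-max scale each feature to [0, 1]
    N = adaptive_neighbors(Xs)
    n, q = Y.shape
    consistent = 0
    for lab in range(q):
        same = Y[:, lab][:, None] == Y[:, lab][None, :]  # same[i, j]: i, j agree
        consistent += np.all(same | ~N, axis=1).sum()    # all neighbors agree
    return consistent / (n * q)


def online_select(X, Y):
    """Visit feature columns one at a time, as if they arrived in a stream."""
    selected = []
    for f in range(X.shape[1]):
        cand = selected + [f]
        g_new = dependency(X, Y, cand)
        if g_new <= dependency(X, Y, selected):
            continue                          # f adds no dependency: discard it
        for g in cand[:-1]:                   # redundancy check on older features
            reduced = [c for c in cand if c != g]
            g_red = dependency(X, Y, reduced)
            if g_red >= g_new:                # g is redundant given the rest
                cand, g_new = reduced, g_red
        selected = cand
    return selected


if __name__ == "__main__":
    # col 0: partly informative, col 1: matches the label, col 2: noise
    X = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
                  [0, 1, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1]])
    Y = np.array([[0], [0], [0], [0], [1], [1], [1], [1]])
    print(online_select(X, Y))  # -> [1]
```

In the toy run, column 0 is accepted first (dependency 0.25), then dropped as redundant once column 1 (dependency 1.0) arrives, and the noise column adds nothing. Because each instance's neighbor count comes from the spacing of its own sorted distances, no radius or k has to be chosen; the price in this sketch is O(n²) distance computations per dependency evaluation.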

Authors: Zhang Haixiang (张海翔), Li Peipei (李培培), Hu Xuegang (胡学钢)

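Affiliations: Information Office, Hefei Second People's Hospital Affiliated to Bengbu Medical College, Hefei, Anhui 230012, China; Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education), Hefei University of Technology, Hefei, Anhui 230601, China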

Subject: Computer and Automation

Keywords: multi-label classification; streaming feature; neighborhood rough set; adaptive density neighborhood; online streaming feature selection

Journal: Computer Technology and Development (计算机技术与发展), 2024, No. 1

Pages: 23-29 (7 pages)

Funding: National Natural Science Foundation of China (61976077, 62076085, 62120106008); Science and Technology Program of Bengbu Medical College (2022byzd225sk)

DOI: 10.3969/j.issn.1673-629X.2024.01.004
