Deep active semi-supervised clustering model (基于主动学习的深度半监督聚类模型)
Indexing: Open Access (OA); Peking University Core Journals (北大核心); CSTPCD
Deep semi-supervised clustering aims to improve clustering results by using a small amount of supervised information. Because labelling is expensive, the amount of supervised information is usually limited, so selecting the supervised information that is most valuable for clustering becomes crucial. To address this problem, this paper proposes a deep active semi-supervised clustering model (DASCM). The model includes an active learning method that selects information-rich marginal texts and then generates high-value supervised information covering those marginal texts. The model uses this supervised information to guide clustering and thereby improve clustering performance. Experiments on five real text datasets show that DASCM significantly improves clustering performance. This result confirms that supervised information covering marginal texts, generated by the active learning method, is effective in improving clustering.
Fu Yanyan (付艳艳); Huang Ruizhang (黄瑞章); Xue Jingjing (薛菁菁); Ren Lina (任丽娜); Chen Yanping (陈艳平); Lin Chuan (林川)
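Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang 550025; State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025; College of Computer Science and Technology, Guizhou University, Guiyang 550025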
Computers and automation
deep semi-supervised clustering; active learning; marginal text
Application Research of Computers (《计算机应用研究》), 2024, No. 10
Pages 2955-2961 (7 pages)
National Natural Science Foundation of China (62066007); Guizhou Provincial Science and Technology Support Program (黔科合支撑[2022]一般277)