Text Clustering Method for Fragmented Reply Based on Dissimilarity Matrix
To address the problem of effectively extracting the required information from fragmented reply texts in Q&A communities, this paper proposes a clustering method for fragmented reply texts based on a dissimilarity matrix. First, cluster centers are designed according to the dissimilarity between texts, and the fragmented reply texts in the community are grouped by clustering. Then, an RNN+CNN-based feature extraction method is used to extract the text features of user questions. Finally, combining the extracted question features, an extractive automatic text generation algorithm based on TF-IDF is used to extract reply text information quickly and automatically. Experimental results show that the proposed method automatically extracts the required text information with high and stable accuracy, and can be applied to the extraction of fragmented reply texts in Q&A communities.
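The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract gives no formulas: dissimilarity is taken as one minus TF-IDF cosine similarity, cluster centers are chosen by a simple medoid-style rule (`pick_centers`), and the RNN+CNN question features are replaced by a plain TF-IDF vector of the question. The function names are hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; a real system would use a proper tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors (dicts) with a smoothed IDF.
    tokens = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokens for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokens:
        total = len(toks) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in Counter(toks).items()})
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def dissimilarity_matrix(vectors):
    # Assumed dissimilarity: 1 - cosine similarity.
    n = len(vectors)
    return [[0.0 if i == j else 1.0 - cosine(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]


def pick_centers(d, k):
    # Assumed rule: the first center is the text with the smallest total
    # dissimilarity; each further center is the text farthest from all
    # centers chosen so far.
    n = len(d)
    centers = [min(range(n), key=lambda i: sum(d[i]))]
    while len(centers) < k:
        rest = [i for i in range(n) if i not in centers]
        centers.append(max(rest, key=lambda i: min(d[i][c] for c in centers)))
    return centers


def cluster(d, centers):
    # Label each text with the index of its nearest center.
    return [min(centers, key=lambda c: d[i][c]) for i in range(len(d))]


def extract(question, replies, k):
    # Pick the cluster whose center best matches the question, then return
    # the reply in that cluster most similar to the question.
    vectors, idf = tfidf_vectors(replies)
    d = dissimilarity_matrix(vectors)
    centers = pick_centers(d, k)
    labels = cluster(d, centers)
    q_tokens = tokenize(question)
    total = len(q_tokens) or 1
    q = {t: (c / total) * idf[t] for t, c in Counter(q_tokens).items() if t in idf}
    best = max(centers, key=lambda c: cosine(q, vectors[c]))
    members = [i for i, lab in enumerate(labels) if lab == best]
    return replies[max(members, key=lambda i: cosine(q, vectors[i]))]
```

Calling `extract("how do I reset my router", replies, k=2)` on a small list of replies returns the single reply judged most relevant. The number of clusters `k` is supplied by the caller here, which is a further simplification.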
刘文亮;吴飞;何德明;赵维伟;潘建宏
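State Grid Fujian Electric Power Company, Fuzhou, Fujian 350000; Fujian Yirong Information Technology Co., Ltd., Fuzhou, Fujian 350003; State Grid Corporation of China, Beijing 100000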
Computers and Automation
question-answer community; fragmented reply text; automatic extraction; clustering; dissimilarity
《计算机与现代化》 (Computer and Modernization), 2024 (009)
pp. 56-60 (5 pages)
Fujian Province Science and Technology Project (SGFJ0000KXJS1700225)