Computer Engineering and Applications, Issue (16): 142-145, 154, 5. DOI: 10.3778/j.issn.1002-8331.1202-0458
Multi-document summary using LDA and spectral clustering
Abstract
Automatic summarization aims to compress lengthy documents into a few short paragraphs. This gives users comprehensive and concise information and makes obtaining information more efficient and accurate. This paper proposes a summarization method based on Latent Dirichlet Allocation (LDA). Gibbs sampling is used to estimate the word probabilities of each topic and the topic probabilities of each sentence. The resulting LDA parameters are then combined with a spectral clustering algorithm to extract a multi-document summary. Sentence weights are integrated with a linear formula to produce a 400-word multi-document summary. Experimental results on DUC2002, evaluated with the automatic summarization evaluation toolkit ROUGE, show that the proposed method effectively improves summary quality.
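The abstract names the stages of the pipeline but not their details. The Python sketch below illustrates them under stated assumptions, so it is not the authors' implementation:

- Each sentence is treated as an LDA "document", and collapsed Gibbs sampling estimates `theta` (sentence-topic) and `phi` (topic-word).
- Spectral clustering is reduced to a two-way split on cosine similarity between sentence-topic vectors.
- The linear sentence weight (`lam` times the strongest topic probability plus `1 - lam` times the cluster's share of sentences) is hypothetical, because the abstract does not give the paper's formula.
- All function and parameter names are illustrative.

```python
import math
import random


def gibbs_lda(sentences, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA, treating each sentence (a token list) as a document.

    Returns theta[s][t] ~ P(topic t | sentence s) and phi[t][w] ~ P(word w | topic t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    docs = [[index[w] for w in s] for s in sentences]
    v = len(vocab)
    n_dt = [[0] * k for _ in docs]        # topic counts per sentence
    n_tw = [[0] * v for _ in range(k)]    # word counts per topic
    n_t = [0] * k                         # token count per topic
    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                n_dt[d][t] -= 1           # remove the token's current assignment
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Full conditional P(z_i = u | all other assignments), unnormalized.
                weights = [(n_dt[d][u] + alpha) * (n_tw[u][w] + beta) / (n_t[u] + v * beta)
                           for u in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1
    theta = [[(n_dt[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(n_tw[t][w] + beta) / (n_t[t] + v * beta) for w in range(v)] for t in range(k)]
    return theta, phi


def spectral_bipartition(vectors, iters=300):
    """Two-way spectral clustering of strictly positive vectors (n >= 2) by cosine similarity.

    Finds the second eigenvector of M = D^-1/2 W D^-1/2 by power iteration on M + I,
    deflating the known leading eigenvector sqrt(degree), and splits by its sign.
    """
    n = len(vectors)
    unit = [[x / math.sqrt(sum(y * y for y in vec)) for x in vec] for vec in vectors]
    w = [[0.0 if i == j else sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(n)]
         for i in range(n)]
    root = [math.sqrt(sum(row)) for row in w]              # square root of each degree
    m = [[w[i][j] / (root[i] * root[j]) for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(r * r for r in root))
    u1 = [r / norm for r in root]                          # leading eigenvector (eigenvalue 1)
    rng = random.Random(1)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(x, u1))
        x = [a - dot * b for a, b in zip(x, u1)]           # deflate the leading eigenvector
        x = [sum(m[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]  # (M + I) x
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    return [0 if a >= 0 else 1 for a in x]


def summarize(sentences, theta, labels, limit=400, lam=0.7):
    """Greedy extraction under a word limit with a HYPOTHETICAL linear weight:
    lam * max topic probability + (1 - lam) * share of sentences in the sentence's cluster.
    """
    n = len(sentences)
    share = {c: labels.count(c) / n for c in set(labels)}
    score = [lam * max(theta[s]) + (1 - lam) * share[labels[s]] for s in range(n)]
    chosen, used = [], 0
    for s in sorted(range(n), key=lambda s: -score[s]):
        if used + len(sentences[s]) <= limit:
            chosen.append(s)
            used += len(sentences[s])
    return sorted(chosen)                                  # restore original sentence order
```

For a real corpus, one would run `gibbs_lda` on tokenized sentences and cluster `theta`. A k-way spectral method (several eigenvectors followed by k-means) would likely replace this two-way split. Calling `summarize` with `limit=400` matches the abstract's 400-word budget. The split uses `M + I` so that every eigenvalue is non-negative, which lets plain power iteration reach the second eigenvector once the first is deflated.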
Keywords
Latent Dirichlet Allocation (LDA) / Gibbs sampling / spectral clustering / multi-document summary
Classification
Information Technology and Security Science
Cite this article
FU Ling, ZHANG Hui. Multi-document summary using LDA and spectral clustering [J]. Computer Engineering and Applications, 2013, (16): 142-145, 154, 5.
Funding
National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z151).