Computer Engineering and Applications, Issue (24): 113-117, 150, 6. DOI: 10.3778/j.issn.1002-8331.1305-0146
基于潜在语义索引的科技文献主题挖掘
Research of topic mining for scientific papers based on LSI
Abstract
A topic mining method for scientific papers is proposed, based on an improved Latent Semantic Indexing (LSI) algorithm. The process first performs conversion and removes non-alphabetic characters and stop words before further processing. It then constructs a term-document matrix from the weights of all words. A modified LSI algorithm reduces the dimension of this matrix and yields a new topic-document matrix. The five highest-weighted themes are taken as each paper's topics. The method uses the Frobenius norm to regulate the matrix while reducing its dimension, so that the themes of scientific papers can be mined quickly and accurately.
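The pipeline outlined in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The stop-word list, the reading of "conversion" as lower-casing, the TF-IDF weighting, and the rule that picks the rank from the relative Frobenius-norm error are all assumptions. Plain truncated SVD stands in for the modified LSI algorithm, whose details the abstract does not give.

```python
import re
import numpy as np

# Hypothetical stop-word list; a real run would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "to", "for", "on", "with"}


def preprocess(text):
    """Lower-case (assumed 'conversion'), keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocabulary, weighted term-document matrix). TF-IDF is an assumption."""
    tokenised = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenised):
        for t in doc:
            counts[index[t], j] += 1
    df = np.count_nonzero(counts, axis=1)          # document frequency per term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1   # smoothed IDF
    return vocab, counts * idf[:, None]


def choose_rank(singular_values, max_rel_error=0.5):
    """Smallest k whose rank-k truncation keeps the relative Frobenius error
    ||A - A_k||_F / ||A||_F at or below max_rel_error (assumed selection rule)."""
    energy = np.cumsum(singular_values ** 2)
    total = energy[-1]
    rel_error = np.sqrt(np.maximum(total - energy, 0.0) / total)
    return int(np.argmax(rel_error <= max_rel_error)) + 1


def lsi_topics(docs, max_rel_error=0.5, top_n=5):
    """Return vocabulary, term-topic matrix, topic-document matrix, and the
    indices of the top_n highest-weighted topics for each document."""
    vocab, A = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = choose_rank(s, max_rel_error)
    topic_doc = s[:k, None] * Vt[:k, :]            # k x n_docs
    order = np.argsort(-np.abs(topic_doc), axis=0)[:top_n, :]
    return vocab, U[:, :k], topic_doc, order
```

Calling `lsi_topics(docs)` on a list of abstract strings returns, for each document column, the indices of up to five topics ranked by absolute weight. The rows of the term-topic matrix show which vocabulary terms load on each topic.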
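Key words
latent semantic indexing/topic modeling/scientific documents
Classification
Information technology and security science
Cite this article
Liu Kan (刘勘), Zhu Fangfang (朱芳芳). Research of topic mining for scientific papers based on LSI [J]. Computer Engineering and Applications, 2014, (24): 113-117, 150, 6.
Funding
National Natural Science Foundation of China (No. 71203164).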