A Patent Document Classification Method Based on Word Weighted LDA Model

Sun Wei (孙伟), Liu Wenjing (刘文静), Ge Lige (葛丽阁), Yu Xuan (余璇)

Computer Technology and Development, 2019, Vol. 29, Issue 3, pp. 23-29 (7 pages). DOI: 10.3969/j.issn.1673-629X.2019.03.005

Author information

  • 1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306

Abstract

When a traditional topic model is used for text classification, its feature words are chosen from high-frequency words according to statistical criteria. In patent document classification, however, most professional (domain-specific) words are overwhelmed by high-frequency words, which lowers the accuracy of topic models on patent documents. We therefore propose a supervised, word-weighted LDA topic model for classifying patent documents. Based on the co-occurrence relationship between professional words and high-frequency words, the KeyGraph algorithm is used to select keywords with stronger characterizing power, and a mutual information function is used to compute the weight of each keyword, yielding a dictionary of professional words. On this basis, a supervised LDA model is built, word weighting is incorporated into it, and Gibbs sampling is used to estimate the parameters. On patent documents, the model improves classification accuracy by 4.62%, 3.74% and 3.26% compared with the LDA model and two of its variants, respectively. This indicates that the highly specialized words selected by the model are more closely related to the topics, and that classification efficiency and accuracy are significantly improved.
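The abstract names two computational steps (mutual-information word weights, then a word-weighted LDA fitted by Gibbs sampling) without giving their formulas. The Python sketch below illustrates one common way to realize them; it is not the authors' implementation. Assumptions: a word's weight is the mutual information I(W; C) between "the word occurs in a document" and the document's class label, estimated from document frequencies; weighting enters LDA by letting each token add its word's weight, rather than 1, to the counts of a standard collapsed Gibbs sampler; words without a weight default to 1. The KeyGraph keyword selection and the supervised (label-aware) part of the model are not reproduced, and the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Toy corpus (hypothetical): tokenized "patents" with coarse class labels.
DOCS = [
    ["rotor", "blade", "turbine", "device"],
    ["rotor", "blade", "gearbox", "device"],
    ["antenna", "signal", "channel", "device"],
    ["antenna", "signal", "modem", "device"],
]
LABELS = ["mechanical", "mechanical", "telecom", "telecom"]


def mi_weights(docs, labels):
    """Assumed weighting: mutual information I(W; C), in nats, between the binary
    event 'word w occurs in a document' and the class label C, estimated from
    document frequencies."""
    n = len(docs)
    class_n = Counter(labels)
    df = Counter()                      # number of documents containing w
    df_by_class = defaultdict(Counter)  # number of documents of class c containing w
    for doc, c in zip(docs, labels):
        for w in set(doc):
            df[w] += 1
            df_by_class[w][c] += 1
    weights = {}
    for w, n_w in df.items():
        mi = 0.0
        for c, n_c in class_n.items():
            present = df_by_class[w][c]
            absent = n_c - present
            # Add p(x, c) * log(p(x, c) / (p(x) p(c))) for x in {present, absent}.
            for joint, marginal in ((present, n_w), (absent, n - n_w)):
                if joint > 0:
                    mi += joint / n * math.log(joint * n / (marginal * n_c))
        weights[w] = mi
    return weights


def weighted_lda_gibbs(docs, weights, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA in which every token adds its word's
    weight (default 1.0) to the counts instead of 1. Returns the topic
    assignments z and the per-document topic proportions theta."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    n_dk = [[0.0] * n_topics for _ in docs]               # weighted doc-topic counts
    n_kw = [defaultdict(float) for _ in range(n_topics)]  # weighted topic-word counts
    n_k = [0.0] * n_topics                                # weighted topic totals
    z = []

    def update(d, k, w, sign):
        lam = sign * weights.get(w, 1.0)
        n_dk[d][k] += lam
        n_kw[k][w] += lam
        n_k[k] += lam

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, k, w, +1)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, z[d][i], w, -1)  # remove this token's weighted counts
                probs = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                         for t in topics]
                z[d][i] = rng.choices(topics, weights=probs)[0]
                update(d, z[d][i], w, +1)  # add them back under the sampled topic

    theta = [[(row[t] + alpha) / (sum(row) + n_topics * alpha) for t in topics] for row in n_dk]
    return z, theta


if __name__ == "__main__":
    raw = mi_weights(DOCS, LABELS)
    mean = sum(raw.values()) / len(raw)
    weights = {w: v / mean for w, v in raw.items()}  # rescale to an average weight of 1
    z, theta = weighted_lda_gibbs(DOCS, weights, n_topics=2)
    for d, row in enumerate(theta):
        print(d, LABELS[d], [round(p, 2) for p in row])
```

Because every count is a sum of weights, a word with weight 0 (here `device`, which occurs in every document and so carries no class information) is still assigned a topic but no longer shifts any topic distribution, while class-specific words such as `rotor` pull harder. Rescaling the weights to a mean of 1 keeps the total count mass comparable to unweighted LDA, so the priors `alpha` and `beta` keep roughly their usual meaning.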

Key words

weighted model/latent Dirichlet allocation/KeyGraph algorithm/patent literature classification

Classification

Information Technology and Security Science

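Cite this article

SUN Wei, LIU Wenjing, GE Lige, YU Xuan. A patent document classification method based on word weighted LDA model[J]. Computer Technology and Development, 2019, 29(3): 23-29.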

Funding

National Natural Science Foundation of China, Young Scientists Fund (61203240)

Journal: Computer Technology and Development

Labels: OA, CSTPCD

ISSN: 1673-629X
