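Application and Improvement of the Mutual Information Feature Selection Method for Content-Similar Categories in the Chinese Library Classification (CLC): The Case of E271 and E712.51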

Li Xiangdong, Ruan Tao

Digital Library Forum, Issue (1): 46-52, 7. DOI: 10.3772/j.issn.1673-2286.2018.01.008

The Application and Improvement of Mutual Information Feature Selection Method in the Similar Categories of Classification in CLC: Take E271 and E712.51 as an Example

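Li Xiangdong 1, Ruan Tao 2

Author information

  • 1. School of Information Management, Wuhan University, Wuhan 430072
  • 2. E-commerce Research and Development Center, Wuhan University, Wuhan 430072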

Abstract

An improved mutual information feature selection method is proposed to improve automatic classification of two classes of text that share a large number of common features and are therefore difficult to distinguish automatically. Bibliographic records from CLC classes E271 (Chinese army) and E712.51 (American army) serve as the two-class classification data. First, the traditional mutual information feature selection method does not consider negatively correlated features; the DNCF_MI feature selection method overcomes this weakness. Second, DNCF_MI does not consider the difference between the two kinds of features across the two categories: features that appear in both categories contribute to classification to a different degree than features that appear in only one of the classes. This paper therefore introduces domain-independent features and domain-related features and proposes an improved feature selection method, DNCF_DI_MI. Finally, a KNN classifier is used for classification, and the Macro-F1 and Micro-F1 values are used to evaluate the results. The experiments show that the Macro-F1 and Micro-F1 values of the proposed method are 24.1% and 28.5% higher, respectively, than those of traditional mutual information, and 4.5% higher than those of DNCF_MI, which supports the validity of the method.
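For readers unfamiliar with the baseline, the following is a minimal sketch of traditional (pointwise) mutual information between a term and a class, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. It is not the authors' DNCF_MI or DNCF_DI_MI method, whose formulas are not given in this abstract. The function name, the toy documents, and the choice to return negative infinity when a term never co-occurs with a class are all assumptions made for illustration.

```python
import math

def mutual_information(docs, labels, term, cls):
    """Pointwise MI of a term and a class from document counts.

    docs   : list of token sets, one per document (hypothetical toy format)
    labels : list of class labels aligned with docs
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)        # documents containing the term
    n_c = sum(1 for y in labels if y == cls)       # documents in the class
    n_tc = sum(1 for d, y in zip(docs, labels)     # documents with both
               if term in d and y == cls)
    if n_tc == 0 or n_t == 0 or n_c == 0:
        # Assumption: treat "never co-occurs" as negative infinity.
        return float("-inf")
    return math.log((n_tc * n) / (n_t * n_c))

# Hypothetical toy records for the two classes named in the abstract.
docs = [
    {"army", "china", "reform"},
    {"army", "china", "history"},
    {"army", "china", "usa"},
    {"army", "usa", "strategy"},
    {"army", "usa", "navy"},
    {"army", "usa", "china"},
]
labels = ["E271", "E271", "E271", "E712.51", "E712.51", "E712.51"]

mi_common = mutual_information(docs, labels, "army", "E271")       # shared term
mi_pos = mutual_information(docs, labels, "china", "E271")         # positive correlation
mi_neg = mutual_information(docs, labels, "china", "E712.51")      # negative correlation
```

In this toy data the term shared by every record scores zero, while "china" scores positive for E271 and negative for E712.51. A selection rule that keeps only the largest score per term discards the negative value, which is the kind of information the abstract says traditional mutual information leaves unused.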

Key words

Content-Similar Categories/Chinese Library Classification/Two-Class Classification/Mutual Information/Feature Selection

Classification

Social sciences

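Cite this article

Li Xiangdong, Ruan Tao. Application and Improvement of the Mutual Information Feature Selection Method for Content-Similar Categories in the Chinese Library Classification (CLC): The Case of E271 and E712.51 [J]. Digital Library Forum, 2018, (1): 46-52, 7.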

Digital Library Forum

OA / CSSCI / CSTPCD

ISSN 1673-2286
