
Using Ontology Semantics to Improve Text Clustering

Luo Na, Zuo Wanli, Yuan Fuyu, Zhang Jingbo, Zhang Huijie

Journal of Southeast University (English Edition), 2006, Vol. 22, Issue 3: 370-374, 5.

Using ontology semantics to improve text documents clustering

Luo Na (1), Zuo Wanli (2), Yuan Fuyu (1), Zhang Jingbo (1), Zhang Huijie (2)

Author information

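  • 1. College of Computer Science and Technology, Jilin University, Changchun 130012
  • 2. School of Computer Science, Northeast Normal University, Changchun 130024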

Abstract

To improve clustering results and selection among those results, ontology semantics are combined with document clustering. A new WordNet-based document clustering algorithm, applied in the document-processing phase, is proposed. First, after the documents are represented by tf-idf, every word vector is extended with new entities. Then a feature-extraction algorithm is applied to the documents. Finally, an ontology aggregation clustering (OAC) algorithm is proposed to improve the document clustering result. Experiments are run on the Reuters 20 News Group data set, and the results are compared with those obtained by mutual information (MI). The conclusion is that the proposed ontology-based document clustering algorithm performs better than other existing clustering algorithms such as MNB, CLUTO, and co-clustering.
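The pipeline outlined in the abstract (concept extension, tf-idf weighting, then agglomerative clustering) can be illustrated with a minimal sketch. It is not the paper's OAC algorithm, whose details the abstract does not give. The `TOY_CONCEPTS` dictionary is a hypothetical stand-in for WordNet, the rule of appending one broader concept per word is an assumption, the concepts are appended before weighting for brevity, and average-link merging on cosine similarity is a generic substitute for the aggregation step. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: each word maps to one broader concept.
TOY_CONCEPTS = {"cat": "animal", "dog": "animal",
                "car": "vehicle", "truck": "vehicle"}


def extend_with_concepts(tokens):
    """Append the concept of every token that has one (assumed extension rule)."""
    return tokens + [TOY_CONCEPTS[t] for t in tokens if t in TOY_CONCEPTS]


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed idf keeps weights positive for terms found in every document.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors, k):
    """Average-link agglomerative clustering down to k clusters of document indices."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return [sorted(c) for c in clusters]


docs = [["cat", "sleeps"], ["car", "stops"], ["dog", "barks"], ["truck", "turns"]]
extended = tfidf_vectors([extend_with_concepts(d) for d in docs])
print(agglomerate(extended, 2))
```

In this toy corpus no two documents share a surface word, so their plain tf-idf vectors are orthogonal. Only the appended concepts give related documents a nonzero similarity, which is the effect the abstract attributes to the ontology extension.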

Key words

ontology/text clustering/lexicon/WordNet

Classification

Information technology and security science

Cite this article

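Luo Na, Zuo Wanli, Yuan Fuyu, Zhang Jingbo, Zhang Huijie. Using ontology semantics to improve text documents clustering [J]. Journal of Southeast University (English Edition), 2006, 22(3): 370-374, 5.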

Funding

The National Natural Science Foundation of China (No. 60373099); the Natural Science Foundation for Young Scholars of Northeast Normal University (No. 20061005).

Journal of Southeast University (English Edition)

1003-7985
