An Improved Vector Space Model Algorithm for Text Representation

张小川 于旭庭 张宜浩

Journal of Chongqing University of Technology (Natural Science), 2017, Vol. 31, Issue 1: 87-92, 6. DOI: 10.3969/j.issn.1674-8425(z).2017.01.014

Text Representation Based on Improved Vector Space Model

张小川 1, 于旭庭 1, 张宜浩 1

Author information

  • 1. School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054

Abstract

Text representation converts human-readable text into a data structure that a computer can process, and it is a fundamental problem in text information processing. As a text representation approach within the Vector Space Model (VSM), the tf-idf algorithm considers only the relevance between a term feature and a document, not the relevance between a term and a class. To address this problem, the paper introduces the chi-square concept from mathematical statistics and proposes a text representation algorithm, tf-idf-cθ. The algorithm takes a term's c value, which measures the difference in the term's distribution across classes, as one factor of the text representation. It also takes the term's characteristic as a θ value, and produces the corresponding text representation based on the improved VSM. Finally, short texts are classified using the two algorithms above; the experimental results show that the modified method is more effective and partly captures the relevance between term features and classes.
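The abstract does not give the tf-idf-cθ formula, so the following is only a minimal sketch of the general idea: scaling a tf-idf weight by a chi-square statistic that reflects how unevenly a term is distributed across classes. The function names (`chi_square`, `tfidf_chi_vector`), the toy corpus, the choice of the maximum chi-square over classes as the c value, and the plain product `tf * idf * c` are all assumptions made for illustration, not the authors' method. The θ factor is left out because the abstract does not define it.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): tokenized short texts with class labels.
docs = [
    ["cheap", "phone", "deal"],
    ["phone", "battery", "review"],
    ["match", "goal", "team"],
    ["team", "coach", "deal"],
]
labels = ["tech", "tech", "sport", "sport"]


def chi_square(term, docs, labels, cls):
    """Chi-square statistic of `term` against class `cls` from a 2x2 table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cls = label == cls
        if has_term and in_cls:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_cls:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def tfidf_chi_vector(doc_index, docs, labels):
    """Term weights for one document: tf * idf * c (assumed combination)."""
    tokens = docs[doc_index]
    counts = Counter(tokens)
    n_docs = len(docs)
    classes = set(labels)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for other in docs if term in other)
        idf = math.log(n_docs / df)
        # Assumed c value: the strongest class association of the term.
        c_value = max(chi_square(term, docs, labels, k) for k in classes)
        weights[term] = (count / len(tokens)) * idf * c_value
    return weights


weights = tfidf_chi_vector(0, docs, labels)
```

In this toy corpus, "deal" appears equally often in both classes, so its chi-square value is zero and its weight vanishes, whereas "phone" appears only in one class and keeps a positive weight. Plain tf-idf would assign both terms the same weight in the first document, which is the gap the abstract describes.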

Key words

text representation / vector space model (VSM) / chi-square distribution (CHI) / tf-idf

Classification

Information Technology and Security Science

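Cite this article

张小川, 于旭庭, 张宜浩. Text Representation Based on Improved Vector Space Model [J]. Journal of Chongqing University of Technology (Natural Science), 2017, 31(1): 87-92, 6.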

Funding

National Natural Science Foundation of China (61502064)

Chongqing "121" Science and Technology Support Demonstration Project (cstc2014fazktjcsf40009)

Journal of Chongqing University of Technology (Natural Science)

OA / Peking University Core Journal (北大核心) / CSTPCD

ISSN 1674-8425
