Research on TF-IDF algorithm based on entropy optimization

Wang Yibei (王逸蓓)¹, Wang Fang (王芳)¹

Journal of Yanshan University (燕山大学学报), 2025, Vol. 49, Issue 5: 422-428, 7. DOI: 10.3969/j.issn.1007-791X.2025.05.005

Author information

  • 1. School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China

Abstract

The traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm represents text features based on the frequency of feature terms. However, this approach does not account for category distribution information: it overlooks how feature terms are distributed both within and across classes. To address this, this paper first proposes a TF-IDF algorithm optimized using information entropy. The algorithm incorporates decentralized term frequency factors and information entropy to capture the distribution characteristics of feature terms both within and between classes. Building on this foundation, the paper then introduces a TF-IDF algorithm based on expected cross-entropy optimization, which integrates the theory of expected information entropy. Comparative experiments show that, while the TF-IDF algorithm optimized with information entropy improves model performance to a certain extent, the TF-IDF algorithm based on expected cross-entropy optimization performs better in terms of precision, recall, and F1 score.
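
To make the limitation described in the abstract concrete, the following minimal Python sketch computes classic TF-IDF weights and, separately, the Shannon entropy of a term's distribution across classes. It is not the paper's algorithm: the abstract gives no formulas, so the function names `tf_idf` and `class_entropy`, the toy corpus, and the choice of entropy over raw occurrence counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: tf = count / document length, idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def class_entropy(term, docs, labels):
    """Shannon entropy (in bits) of a term's occurrences across classes.

    Illustrative assumption: low entropy means the term is concentrated
    in few classes; high entropy means it is spread evenly.
    """
    per_class = Counter()
    for doc, label in zip(docs, labels):
        per_class[label] += doc.count(term)
    total = sum(per_class.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in per_class.values() if c > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical toy corpus: two classes, two tokenized documents each.
docs = [
    ["ball", "team", "win"],
    ["team", "score", "ball"],
    ["stock", "market", "win"],
    ["market", "price", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tf_idf(docs)
print(weights[0]["ball"], weights[0]["win"])
print(class_entropy("ball", docs, labels), class_entropy("win", docs, labels))
```

In this toy corpus, "ball" and "win" each appear in two of the four documents, so classic TF-IDF assigns them the same weight in the first document. Their class distributions differ, however: "ball" occurs only in the sports class (zero entropy), while "win" is split evenly between the two classes (one bit). An entropy-based correction of the kind the abstract describes would aim to separate such terms.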

Key words

TF-IDF/feature word/word frequency/expected cross entropy

Classification

Information technology and security science

Cite this article

Wang Yibei, Wang Fang. Research on TF-IDF algorithm based on entropy optimization[J]. Journal of Yanshan University, 2025, 49(5): 422-428, 7.

Funding

Natural Science Foundation of Hebei Province (F2022203085)

Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022012)

Journal of Yanshan University

Open access; Peking University Core Journal

ISSN 1007-791X
