Journal of Yanshan University (燕山大学学报), 2025, Vol. 49, Issue (5): 422-428, 7. DOI: 10.3969/j.issn.1007-791X.2025.05.005
Research on TF-IDF algorithm based on entropy optimization
Abstract
The traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm represents text features based on the frequency of feature terms. However, this approach does not account for category distribution information: it ignores how feature terms are distributed both within and across classes. To address this, this paper first proposes a TF-IDF algorithm optimized with information entropy, which incorporates a decentralized term frequency factor and information entropy to capture the distribution characteristics of feature terms within and between classes. Building on this, the paper further introduces a TF-IDF algorithm based on expected cross-entropy optimization, which integrates the theory of expected information entropy. Comparative experiments show that, while the information-entropy-optimized TF-IDF algorithm improves model performance to a certain extent, the expected-cross-entropy-optimized TF-IDF algorithm performs better in precision, recall, and F1 score.
Keywords
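The abstract names the weighting ingredients but gives no formulas. As a rough illustration only, the Python sketch below computes classic TF-IDF, the Shannon entropy of a term's counts across classes, and the expected cross-entropy commonly used for feature selection, ECE(t) = P(t) · Σ_c P(c|t) · ln(P(c|t) / P(c)). Combining each factor with TF-IDF by simple multiplication, and normalizing the entropy as 1 - H(t)/ln C, are assumptions made for this example and are not the authors' formulas; the paper's decentralized term frequency factor is not reproduced. The helper names (`tf_idf`, `class_entropy`, `expected_cross_entropy`, `reweighted_tf_idf`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: tf(t, d) = count / len(d), idf(t) = ln(N / df(t))."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    result = []
    for doc in docs:
        counts = Counter(doc)
        result.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return result


def class_entropy(term, docs, labels):
    """Shannon entropy (natural log) of the term's total counts per class.

    0 means the term occurs in a single class; ln(#classes) means an even spread.
    """
    per_class = Counter()
    for doc, y in zip(docs, labels):
        per_class[y] += doc.count(term)
    total = sum(per_class.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in per_class.values() if c > 0)


def expected_cross_entropy(term, docs, labels):
    """ECE(t) = P(t) * sum_c P(c|t) * ln(P(c|t) / P(c)), using document-level presence."""
    n = len(docs)
    prior = Counter(labels)                              # documents per class -> P(c)
    hits = [y for doc, y in zip(docs, labels) if term in doc]
    if not hits:
        return 0.0
    cond = Counter(hits)                                 # documents per class containing t
    p_t = len(hits) / n
    return p_t * sum(
        (k / len(hits)) * math.log((k / len(hits)) / (prior[c] / n))
        for c, k in cond.items()
    )


def reweighted_tf_idf(docs, factor):
    """Scale every TF-IDF score by factor(term) (hypothetical combination rule)."""
    return [{t: w * factor(t) for t, w in doc_w.items()} for doc_w in tf_idf(docs)]


if __name__ == "__main__":
    docs = [["ball", "goal", "team"], ["goal", "match", "team"],
            ["stock", "market", "team"], ["market", "price", "stock"]]
    labels = ["sport", "sport", "finance", "finance"]
    max_h = math.log(len(set(labels)))                   # entropy of a perfectly even spread (needs >= 2 classes)

    # Entropy variant: terms spread evenly over classes are pushed toward 0.
    by_entropy = reweighted_tf_idf(docs, lambda t: 1 - class_entropy(t, docs, labels) / max_h)
    # ECE variant: terms whose class distribution departs from the prior are boosted.
    by_ece = reweighted_tf_idf(docs, lambda t: expected_cross_entropy(t, docs, labels))

    print(round(by_entropy[0]["goal"], 4), round(by_entropy[0]["team"], 4))
    print(round(by_ece[0]["goal"], 4), round(by_ece[0]["team"], 4))
```

On this toy corpus, "team" occurs in both classes while "goal" occurs only in sports documents. Plain TF-IDF separates them only through document frequency; both class-aware factors additionally shrink "team" because its occurrences are spread across classes, which is the kind of intra- and inter-class distribution information the abstract says plain TF-IDF overlooks.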
TF-IDF/feature term/term frequency/expected cross-entropy
Classification
Information Technology and Security Science
Cite this article
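WANG Yibei (王逸蓓), WANG Fang (王芳). Research on TF-IDF algorithm based on entropy optimization [J]. Journal of Yanshan University, 2025, 49(5): 422-428, 7.
Funding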
Natural Science Foundation of Hebei Province (F2022203085)
Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022012)