The Research on the Threshold of High-Frequency Words Based on the Normal Distribution in Word Frequency Analysis

An Xingru (安兴茹)

Journal of Intelligence (情报杂志), 2014, Issue 10: 129-136, 8. DOI: 10.3969/j.issn.1002-1965.2014.10.022

Author information

  • 1. Library, Inner Mongolia University of Science and Technology, Baotou 014010

Abstract

With the explosion of information and the development of information analysis, word frequency analysis has become increasingly popular, and defining high-frequency words is its cornerstone. By reviewing prior research, this paper first identified the four methods currently used to define high-frequency words: TOPN, WF>=M, %WF=P, and the T formula. After briefly discussing their main shortcomings, such as over-reliance on experience, subjectivity, lack of theoretical grounding, and limited applicability or practicality, the paper empirically verified that high-frequency words in document repositories follow a normal distribution and, on that basis, proposed the F formula for determining the high-frequency word threshold. Finally, the paper compared the T formula and the F formula on multiple datasets, thereby establishing the theoretical soundness and practical applicability of the F formula for determining high-frequency word thresholds based on the normal distribution.
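The four pre-existing threshold methods named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: %WF=P is read here as "take words in descending frequency until they cover a share P of all word occurrences", and the T formula is assumed to be Donohue's high/low-frequency boundary T = (-1 + √(1 + 8·I₁)) / 2, where I₁ is the number of words that occur exactly once. The F formula is not sketched because the abstract does not define it. All function names and the toy frequency table are hypothetical.

```python
import math

# Hypothetical helpers sketching four high-frequency-word threshold methods.
# `freqs` maps each word to its frequency in the document collection.


def _ranked(freqs: dict[str, int]) -> list[tuple[str, int]]:
    """Sort by descending frequency; ties broken alphabetically for reproducibility."""
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))


def top_n(freqs: dict[str, int], n: int) -> list[str]:
    """TOPN: keep the n most frequent words."""
    return [w for w, _ in _ranked(freqs)[:n]]


def min_frequency(freqs: dict[str, int], m: float) -> list[str]:
    """WF>=M: keep every word whose frequency is at least m."""
    return [w for w, f in _ranked(freqs) if f >= m]


def cumulative_share(freqs: dict[str, int], p: float) -> list[str]:
    """%WF=P (assumed reading): add words in descending frequency until
    they account for at least a share p (0 < p <= 1) of all occurrences."""
    total = sum(freqs.values())
    selected, covered = [], 0
    for w, f in _ranked(freqs):
        if covered >= p * total:
            break
        selected.append(w)
        covered += f
    return selected


def donohue_threshold(freqs: dict[str, int]) -> float:
    """T formula (assumed to be Donohue's): T = (-1 + sqrt(1 + 8*I1)) / 2,
    where I1 is the number of words occurring exactly once."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


# Hypothetical toy frequency table (not data from the paper).
freqs = {
    "citation": 12, "bibliometrics": 9, "zipf": 7, "cluster": 5,
    "network": 3, "topic": 2, "library": 1, "ontology": 1,
    "patent": 1, "survey": 1, "thesaurus": 1, "metadata": 1,
}

print(top_n(freqs, 3))               # ['citation', 'bibliometrics', 'zipf']
print(min_frequency(freqs, 5))       # ['citation', 'bibliometrics', 'zipf', 'cluster']
print(cumulative_share(freqs, 0.6))  # ['citation', 'bibliometrics', 'zipf']
t = donohue_threshold(freqs)
print(t)                             # 3.0
print(min_frequency(freqs, t))       # ['citation', 'bibliometrics', 'zipf', 'cluster', 'network']
```

In the toy table six words occur once, so I₁ = 6 and T = (-1 + 7) / 2 = 3; if words with frequency at least T are treated as high-frequency (an assumed convention), the last call keeps five words. The example also shows why the abstract calls these methods experience-dependent: N, M and P must all be chosen by hand, and only T is derived from the data.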

Keywords

word frequency analysis / normal distribution / high-frequency words / Zipf's law

Classification

Social Sciences

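Cite this article

An Xingru. The Research on the Threshold of High-Frequency Words Based on the Normal Distribution in Word Frequency Analysis[J]. Journal of Intelligence (情报杂志), 2014, (10): 129-136, 8.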

Journal of Intelligence (情报杂志)

OA, Peking University Core Journal (北大核心), CHSSCD, CSSCI

ISSN 1002-1965
