
Domain-specific terms extraction algorithm based on combination of statistics and rules (统计与规则相融合的领域术语抽取算法)

Fan Mengjia (樊梦佳), Duan Dongsheng (段东圣), Du Cuilan (杜翠兰), Zhang Yangsen (张仰森), Tong Lingling (佟玲玲)

Application Research of Computers (计算机应用研究), 2016, Vol. 33, Issue 8: 2282-2285, 2306, 5. DOI: 10.3969/j.issn.1001-3695.2016.08.009


Fan Mengjia¹, Duan Dongsheng², Du Cuilan², Zhang Yangsen¹, Tong Lingling²

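Author information

  • 1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192
  • 2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100190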

Abstract

Combining rules with several statistical strategies, and working from the perspectives of unithood and termhood, this paper proposes a domain-specific term extraction algorithm and builds an extraction system around it. The system's processing pipeline consists of obtaining candidate terms through information entropy expansion, a unithood screening strategy based on part-of-speech matching rules and boundary detection, and a termhood screening strategy based on TF-IDF. The algorithm can not only extract commonly used domain-specific terms but also discover new words in the domain. The experimental results show that the accuracy of the term extraction system is 84.33%, and that the proposed method can effectively support automatic term extraction for a specific domain.
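The abstract names two statistical ingredients without giving their formulas: neighbor information entropy for growing and bounding candidates, and TF-IDF for termhood. The following is a minimal Python sketch of how such scores are commonly computed, not the authors' implementation. It assumes the text is already segmented into a list of tokens; the function names (`neighbor_entropy`, `tf_idf`), the smoothed IDF variant, and any thresholds a caller would apply are illustrative assumptions.

```python
import math
from collections import Counter


def count_occurrences(tokens, candidate):
    """Count how often the token tuple `candidate` occurs in `tokens`."""
    n = len(candidate)
    return sum(
        1 for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n]) == candidate
    )


def neighbor_entropy(tokens, candidate, side):
    """Shannon entropy (bits) of the tokens adjacent to `candidate`.

    side is "left" or "right". Low entropy means the neighbor is highly
    predictable, which suggests the candidate should be expanded in that
    direction; high entropy suggests a plausible term boundary.
    """
    n = len(candidate)
    neighbors = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) != candidate:
            continue
        j = i - 1 if side == "left" else i + n
        if 0 <= j < len(tokens):
            neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in neighbors.values())


def tf_idf(candidate, doc, corpus):
    """TF of `candidate` in `doc` times a smoothed IDF over `corpus`.

    The smoothing (add one to both counts, plus one overall) is an
    assumption chosen to avoid division by zero; the paper may differ.
    """
    tf = count_occurrences(doc, candidate) / max(len(doc), 1)
    df = sum(1 for d in corpus if count_occurrences(d, candidate) > 0)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf
```

In a pipeline of this shape, a candidate whose right-neighbor entropy is near zero would be extended by its dominant right neighbor and re-scored, and only candidates that survive the unithood checks would be ranked by the TF-IDF score against a background corpus.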

Key words

domain-specific term extraction/unithood/termhood/left-right information entropy expansion/boundary detection/TF-IDF (term frequency-inverse document frequency)

Classification

Information technology and security science

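Cite this article

Fan Mengjia, Duan Dongsheng, Du Cuilan, Zhang Yangsen, Tong Lingling. Domain-specific terms extraction algorithm based on combination of statistics and rules [J]. Application Research of Computers, 2016, 33(8): 2282-2285, 2306, 5.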

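Funding

National Natural Science Foundation of China (61370139); Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program (IDHT20130519); Special Funding Project of the Beijing Municipal Education Commission ()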

Journal: Application Research of Computers (计算机应用研究)

Indexing: OA, Peking University Core (北大核心), CSCD, CSTPCD

ISSN: 1001-3695
