Application Research of Computers (计算机应用研究), 2016, Vol. 33, Issue 8: 2282-2285, 2306, 5. DOI: 10.3969/j.issn.1001-3695.2016.08.009
Domain-specific terms extraction algorithm based on combination of statistics and rules (统计与规则相融合的领域术语抽取算法)
Abstract
Combining rules with several statistical strategies, and working from the perspectives of unithood and termhood, this paper proposes a domain-specific term extraction algorithm and builds an extraction system on it. The system has three stages: obtaining candidate terms by information entropy expansion, screening for unithood with part-of-speech matching rules and boundary detection, and screening for termhood with TF-IDF. The algorithm extracts commonly used domain-specific terms and also discovers new domain words. Experimental results show that the term extraction system reaches an accuracy of 84.33%, and that the proposed method can effectively support automatic term extraction for a specific domain.
Keywords
domain-specific term extraction / unithood / termhood / left-right information entropy expansion / boundary detection / TF-IDF (term frequency-inverse document frequency)
Classification
Information Technology and Security Science
Cite this article
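Fan Mengjia (樊梦佳), Duan Dongsheng (段东圣), Du Cuilan (杜翠兰), Zhang Yangsen (张仰森), Tong Lingling (佟玲玲). Domain-specific terms extraction algorithm based on combination of statistics and rules [J]. Application Research of Computers, 2016, 33(8): 2282-2285, 2306, 5.
Funding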
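National Natural Science Foundation of China (61370139); Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program (IDHT20130519); Special Funding Project of the Beijing Municipal Education Commission ()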
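
Illustrative sketch

The abstract names two statistical measures, left/right information entropy and TF-IDF, without giving formulas. The Python sketch below shows one common way such measures are computed, on the assumption that text is already segmented into tokens. It is not the authors' implementation: the function names, the token-list representation, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def neighbor_entropy(tokens, candidate, side):
    """Entropy (in bits) of the tokens adjacent to `candidate`.

    `tokens` and `candidate` are lists of strings; `side` is "left" or "right".
    A high value means the candidate appears in varied contexts, which
    suggests it is a complete unit rather than a fragment.
    """
    n = len(candidate)
    neighbors = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == candidate:
            j = i - 1 if side == "left" else i + n
            if 0 <= j < len(tokens):
                neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def count_occurrences(term, doc):
    """Number of times the token sequence `term` occurs in `doc`."""
    n = len(term)
    return sum(1 for i in range(len(doc) - n + 1) if doc[i:i + n] == term)


def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc` relative to `corpus` (a list of token lists).

    The IDF is smoothed as ln((1 + N) / (1 + df)) + 1; this choice is an
    assumption, not taken from the paper.
    """
    tf = count_occurrences(term, doc) / max(len(doc), 1)
    df = sum(1 for d in corpus if count_occurrences(term, d) > 0)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf
```

In a pipeline of the kind the abstract describes, a candidate would presumably be extended while its neighbor entropy stays below some threshold, and the surviving candidates ranked by a domain-relevance score such as TF-IDF. The thresholds and the exact scoring used in the paper are not stated here.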