Computer Applications and Software, 2016, Vol. 33, Issue 3: 48-51, 4. DOI: 10.3969/j.issn.1000-386x.2016.03.010
一种面向专利摘要的领域术语抽取方法
A FIELD TERMINOLOGY EXTRACTION METHOD FOR PATENT ABSTRACTS
Abstract
The quality of an ontology in the patent field depends on the results of terminology extraction. This paper proposes a terminology extraction method that automatically generates filtering dictionaries and combines several term factors, such as word intensity. First, after word segmentation and part-of-speech tagging, it matches templates generated by a part-of-speech rule algorithm against the documents to obtain a set of candidate long terms and a set of word-type short terms. Then it uses filtering dictionaries generated from document coincidence to filter part of the set of candidate long terms. Finally, based on how long terms are composed, it takes the weighted average of three term factors (word intensity, document discrepancy ratio, and document consistency) as the weight of each whole long term, and sorts the long terms by weight from high to low. Experiments were conducted on a benchmark corpus of 8,000 patent abstracts; over five randomly selected sets of experimental data, the average accuracy reached 86%. The results show that the method is effective for field terminology extraction.
Keywords
Field terminology/Ontology creation/Filtering dictionary/Word intensity
Classification
Information Technology and Security Science
Cite this article
Zeng Zhen, Lü Xueqiang, Li Zhuo. A field terminology extraction method for patent abstracts [J]. Computer Applications and Software, 2016, 33(3): 48-51, 4.
Funding
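National Natural Science Foundation of China (61271304); Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation (KZ201311232037); Innovation Team Building and Teacher Career Development Program for Beijing Municipal Universities (IDHT20130519).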
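Illustrative sketch (not from the paper): the abstract describes filtering candidate long terms with a filtering dictionary and then ranking them by a weighted average of word intensity, document discrepancy ratio, and document consistency. It does not give the factor formulas, the weights, or the exact filtering rule. The minimal Python sketch below therefore assumes that the three factor values are already computed and scaled to a common range, that filtering drops candidates that exactly match a dictionary entry, and that the weights default to equal values. The names `CandidateTerm`, `term_weight`, and `rank_terms` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set


@dataclass(frozen=True)
class CandidateTerm:
    """A candidate long term with three precomputed factor values.

    How each factor is computed is not stated in the abstract; the values
    are assumed to be supplied by earlier steps and scaled comparably.
    """
    text: str
    word_intensity: float
    doc_discrepancy_ratio: float
    doc_consistency: float


def term_weight(term: CandidateTerm,
                weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted average of the three factors (equal weights are an assumption)."""
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    factors = (term.word_intensity,
               term.doc_discrepancy_ratio,
               term.doc_consistency)
    return sum(w * f for w, f in zip(weights, factors)) / total


def rank_terms(candidates: Iterable[CandidateTerm],
               filter_dictionary: Set[str],
               weights: Sequence[float] = (1.0, 1.0, 1.0)) -> List[CandidateTerm]:
    """Drop candidates found in the filtering dictionary, then sort by weight.

    Exact-match filtering is an assumed rule; the paper may filter differently.
    """
    kept = [t for t in candidates if t.text not in filter_dictionary]
    return sorted(kept, key=lambda t: term_weight(t, weights), reverse=True)


# Made-up example data, for illustration only.
candidates = [
    CandidateTerm("heat dissipation structure", 0.5, 0.6, 0.7),
    CandidateTerm("the present invention", 0.6, 0.1, 0.9),
    CandidateTerm("liquid crystal display panel", 0.9, 0.8, 0.7),
]
ranked = rank_terms(candidates, filter_dictionary={"the present invention"})
for term in ranked:
    print(f"{term.text}\t{term_weight(term):.3f}")
```

Dividing by the sum of the weights keeps the score a true weighted average even when the weights are not normalized, so the relative importance of the three factors can be tuned without rescaling.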