Application Research of Computers (计算机应用研究), 2017, Vol. 34, Issue 4: 972-976, 5. DOI: 10.3969/j.issn.1001-3695.2017.04.003
基于网页信息和分词的中文机构名全称和简称提取方法
Extraction method of organization full names and abbreviations based on Web page and word segmentation
Abstract
Search engines have traditionally relied on manually added mappings between organization full names and their abbreviations, which leaves abbreviations missing and lowers the recall of search results. To address these problems, this paper proposes a method for extracting organizations' full names and abbreviations based on Web pages and word segmentation. The method first obtains the source code of the organization's website homepage. It then extracts the organization's full name from the source code and extracts candidate abbreviations based on a collection of contextual features of organization names. Finally, it calculates the similarity between each candidate abbreviation and the full name to determine which candidates are genuine abbreviations. In experiments on 1,287 organization websites, the method achieves an accuracy of 93.9% for full names, and a recall and accuracy of 85.3% and 90.8%, respectively, for abbreviations. The experimental results show that the method is effective.
Key words
extraction of organization abbreviations/extraction of organization full name/Web page analysis/abbreviation similarity calculation
Classification
Information Technology and Security Science
Cite this article
Zhang Junling (张俊玲), Geng Guanggang (耿光刚), Yan Zhiwei (延志伟), Li Xiaodong (李晓东). Extraction method of organization full names and abbreviations based on Web page and word segmentation [J]. Application Research of Computers, 2017, 34(4): 972-976, 5.
Funding
National Natural Science Foundation of China (61375039, 61272433)
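The three steps named in the abstract (full name from the homepage source, candidate abbreviations from context, similarity filtering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the full name is read from the first segment of the `<title>` element, the only contextual cue is the word "简称", word segmentation is omitted, and the similarity is a simple in-order character match ratio with a hypothetical threshold of 0.8. The function names are likewise hypothetical.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_full_name(html):
    """Assumption: the full name is the first segment of the page title."""
    parser = TitleParser()
    parser.feed(html)
    return re.split(r"[-_|]", parser.title)[0].strip()


# Assumption: a candidate is the run of 2 to 8 Chinese characters after "简称".
CUE = re.compile(r'简称[“"：:]?([\u4e00-\u9fff]{2,8})')


def extract_candidates(html):
    """Strip tags crudely, then collect the text following the cue word."""
    text = re.sub(r"<[^>]+>", " ", html)
    return CUE.findall(text)


def similarity(candidate, full_name):
    """Share of candidate characters found in the full name, in order (greedy)."""
    if not candidate:
        return 0.0
    pos = 0
    matched = 0
    for ch in candidate:
        idx = full_name.find(ch, pos)
        if idx != -1:
            matched += 1
            pos = idx + 1
    return matched / len(candidate)


def pick_abbreviations(html, threshold=0.8):
    """Return the full name and the candidates that pass the similarity filter."""
    full_name = extract_full_name(html)
    abbreviations = [
        c for c in extract_candidates(html)
        if len(c) < len(full_name) and similarity(c, full_name) >= threshold
    ]
    return full_name, abbreviations
```

On a page titled "北京大学 - 首页" whose body contains 简称“北大”, this sketch keeps "北大" because both characters occur in order in the full name, and it rejects a candidate that shares no characters with the full name. A real system would likely need richer contextual features and a segmentation-aware similarity, as the abstract indicates.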