Journal of Sichuan University (Natural Science Edition), 2011, Vol. 48, Issue 2: 308-314 (7 pages). DOI: 10.3969/j.issn.0490-6756.2011.02.012
Automatic recognition of company name abbreviations in Chinese financial texts
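Chen Chao 1, Zhu Hongbo 1, Wang Yaqiang 1, Han Guohui 1, Tan Bin 2, Yu Zhonghua 1
Author information
- 1. College of Computer Science, Sichuan University, Chengdu, 610065
- 2. Jinjiang College, Sichuan University, Pengshan, 620860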
Abstract
Named entity (NE) recognition is an active problem in natural language processing (NLP) and plays a significant role in information retrieval and information extraction. However, most studies have concentrated on recognizing the full names of NEs. Taking finance as the example domain, this paper studies the problem of recognizing abbreviations of financial NEs in text and mapping them to their corresponding full names, and proposes a heuristic algorithm to solve it. The algorithm first extracts every n-gram from a text as a candidate company name abbreviation, then establishes the optimal alignment between the candidate and every company full name in a full-name list, and finally recognizes the candidate as an abbreviation and maps it to its full name by heuristically evaluating and filtering the alignments. Experiments performed on a text set randomly collected from the Web show that the precision, recall, and F-score of the algorithm reach 83.62%, 87.28%, and 85.41%, respectively.
Key words
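named entity recognition / company name / abbreviation / heuristic
Classification
Information Technology and Security Science
Cite this article
Chen Chao, Zhu Hongbo, Wang Yaqiang, Han Guohui, Tan Bin, Yu Zhonghua. Automatic recognition of company name abbreviations in Chinese financial texts [J]. Journal of Sichuan University (Natural Science Edition), 2011, 48(2): 308-314.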
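The three steps named in the abstract (n-gram candidate extraction, alignment against a full-name list, heuristic filtering) can be illustrated with a minimal sketch. The abstract does not give the alignment scoring or the filtering rules, so everything below is an assumption made for illustration: a leftmost in-order character alignment stands in for the paper's optimal alignment, and the names `GENERIC_SUFFIXES`, `core_of`, `align`, `accept`, and `recognize`, as well as the compactness threshold, are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical list of generic company suffixes, longest first (an assumption).
GENERIC_SUFFIXES = ("股份有限公司", "有限责任公司", "有限公司", "公司")


def core_of(full_name: str) -> str:
    """Strip one generic suffix so candidates align only to the distinctive part."""
    for suffix in GENERIC_SUFFIXES:
        if full_name.endswith(suffix) and len(full_name) > len(suffix):
            return full_name[:-len(suffix)]
    return full_name


def char_ngrams(text: str, min_n: int, max_n: int) -> List[Tuple[int, str]]:
    """Step 1: every character n-gram, with its start offset, is a candidate."""
    return [(i, text[i:i + n])
            for n in range(min_n, max_n + 1)
            for i in range(len(text) - n + 1)]


def align(candidate: str, name: str) -> Optional[List[int]]:
    """Step 2 (simplified): leftmost in-order alignment of candidate to name.

    Returns the matched positions in name, or None when candidate is not
    a subsequence of name. This is a stand-in, not the paper's alignment.
    """
    positions, start = [], 0
    for ch in candidate:
        idx = name.find(ch, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def accept(candidate: str, name: str, positions: List[int]) -> bool:
    """Step 3 (hypothetical rule): keep short, compact alignments only."""
    span = positions[-1] - positions[0] + 1
    return len(candidate) < len(name) and len(candidate) / span >= 0.5


def recognize(text: str, full_names: List[str],
              min_n: int = 2, max_n: int = 4) -> List[Tuple[int, str, str]]:
    """Return (offset, abbreviation, full name) for each accepted candidate."""
    hits = []
    for offset, gram in char_ngrams(text, min_n, max_n):
        for full_name in full_names:
            core = core_of(full_name)
            positions = align(gram, core)
            if positions is not None and accept(gram, core, positions):
                hits.append((offset, gram, full_name))
    return hits


if __name__ == "__main__":
    # Illustrative input only; not taken from the paper's data set.
    print(recognize("今日长虹股价上涨", ["四川长虹电器股份有限公司"]))
```

A rule this simple would also accept a location prefix such as 四川 whenever it appears in the text, which is the kind of false positive that the heuristic evaluation and filtering step described in the abstract presumably has to handle.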