词结合型未登录词识别方法研究

周蕾 朱巧明

Journal of Changshu Institute of Technology, Issue (4): 110-114, 5.

Research on the Recognition Method of Unknown Chinese Words Based on Compound Words Recognition

周蕾 (Zhou Lei)¹, 朱巧明 (Zhu Qiaoming)²

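Author information

  • 1. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500
  • 2. Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Suzhou, Jiangsu 215006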

Abstract

  This paper introduces a method for extracting unknown Chinese words based on compound-word recognition. The method builds a bi-gram model over text that has first been segmented into fragments, then uses mutual information and rules to combine adjacent words into unknown words. On the open test sets, precision is 84.71% and recall is 72.13%.
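  The abstract names the ingredients (a bi-gram model over segmented text, with mutual information used to join adjacent words) but not the details. The following is a minimal sketch of that general idea, not the authors' implementation. The toy corpus, the function names, the `min_pmi` and `min_count` thresholds, and the greedy left-to-right merge are all assumptions made for illustration; the paper's own rules are not given in the abstract.

```python
import math
from collections import Counter


def build_pmi(sentences):
    """Count unigrams and adjacent pairs, then score each pair with
    pointwise mutual information: log2(P(a,b) / (P(a) * P(b)))."""
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi, bigrams


def merge_adjacent(words, pmi, bigrams, min_pmi=1.0, min_count=2):
    """Greedily join adjacent words whose pair is frequent enough and
    strongly associated. Both thresholds are hypothetical values."""
    merged = []
    i = 0
    while i < len(words):
        if i + 1 < len(words):
            pair = (words[i], words[i + 1])
            frequent = bigrams.get(pair, 0) >= min_count
            associated = pmi.get(pair, float("-inf")) >= min_pmi
            if frequent and associated:
                merged.append(words[i] + words[i + 1])
                i += 2  # skip both words of the merged pair
                continue
        merged.append(words[i])
        i += 1
    return merged


# Toy, already-segmented corpus (invented for this example only).
corpus = [
    ["我", "在", "常熟", "理工", "学院"],
    ["常熟", "理工", "学院", "很", "好"],
    ["我", "很", "好"],
]
pmi, bigrams = build_pmi(corpus)
candidates = merge_adjacent(corpus[0], pmi, bigrams)
print(candidates)
```

  In this sketch a pair is merged only when it both recurs and scores above the PMI threshold, so a single chance co-occurrence is not promoted to a candidate word. A real system would add linguistic filtering rules on top, as the abstract indicates.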

Key words

unknown Chinese words/bi-gram model/mutual information

Classification

Information Technology and Security Science

Cite this article

周蕾, 朱巧明. 词结合型未登录词识别方法研究[J]. Journal of Changshu Institute of Technology, 2012, (4): 110-114, 5.

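Funding

Natural Science Foundation of Jiangsu Province project "Personal Office Mobile Desktop Based on a Hypermedia Engine" (BK2003030)

Natural Science Foundation of the Jiangsu Provincial Department of Education project "Research on Automatic Extraction of New Chinese Words and an Information Publishing Grid" (04KKB320134)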

Journal of Changshu Institute of Technology

1008-2794
