Journal of Changshu Institute of Technology, Issue (4): 110-114, 5.
词结合型未登录词识别方法研究
Research on the Recognition Method of Unknown Chinese Words Based On Compound Words Recognition
Abstract
This paper presents a method for extracting unknown Chinese words based on compound word recognition. The method builds a bi-gram model over text that has first been segmented into fragments, then uses mutual information and rules to combine adjacent words into unknown words. On the open test sets, precision is 84.71% and recall is 72.13%.
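The abstract does not give the scoring formula or the rules, so the following is only a minimal sketch of the general idea, assuming standard pointwise mutual information over adjacent-token counts. The function names `bigram_pmi` and `merge_candidates`, the `threshold` and `min_count` parameters, and the frequency filter standing in for the paper's rules are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return pointwise mutual information for each adjacent token pair.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from raw counts in `tokens` (an assumption of this sketch).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def merge_candidates(tokens, threshold=1.0, min_count=2):
    """List adjacent pairs that look like a single unknown word.

    A pair qualifies when it occurs at least `min_count` times and its
    PMI reaches `threshold`. Both values are illustrative placeholders;
    the frequency filter is a stand-in for the paper's (unspecified) rules.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = bigram_pmi(tokens)
    return sorted(
        pair
        for pair, score in scores.items()
        if bigrams[pair] >= min_count and score >= threshold
    )


# Placeholder tokens standing in for the output of a segmenter.
tokens = ["x", "y", "z", "x", "y", "w"]
scores = bigram_pmi(tokens)
candidates = merge_candidates(tokens)
```

In practice the tokens would come from a Chinese word segmenter run on the fragment-segmented text, and the threshold would be tuned on held-out data rather than fixed in advance.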
Key words
unknown Chinese words/bi-gram model/mutual information
Classification
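Information Technology and Security Science
Cite this article
Zhou Lei, Zhu Qiaoming. Research on the Recognition Method of Unknown Chinese Words Based On Compound Words Recognition [J]. Journal of Changshu Institute of Technology, 2012, (4): 110-114, 5.
Funding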
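Natural Science Foundation of Jiangsu Province project "Personal Office Mobile Desktop Based on a Hypermedia Engine" (BK2003030)
Natural Science Foundation of the Jiangsu Provincial Department of Education project "Research on Automatic Extraction of New Chinese Words and an Information Publishing Grid" (04KKB320134)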