Journal of Nanjing University of Science and Technology (Natural Science Edition), Issue (4): 526-530, 5.
基于对偶编码的中文分词算法
Chinese word segmentation algorithm based on pair coding
Abstract
To improve the segmentation speed and storage efficiency of Chinese word segmentation, this paper proposes a characteristic matching algorithm based on pair coding. The characteristic value is extracted from the positions of Chinese characters. This method supports fuzzy matching and does not need to match multi-character Chinese words, so the characteristic value is extracted from the positions of adjacent Chinese characters. In addition, data compression helps reduce storage space and improve the performance of Chinese word segmentation.
Key words
pair coding/Chinese word segmentation/characteristic matching/data compression/hash/characteristic value/fuzzy matching
Classification
Information technology and security science
Cite this article
Zhang Bingyi, Wei Bo, Chen Jiancheng, Wei Jie, Rao Guozheng. Chinese word segmentation algorithm based on pair coding [J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2014, (4): 526-530, 5.
Funding
National "973" Program of China (2013CB329301)
National Natural Science Foundation of China (61373165)
Open Fund of the Civil Aviation Administration of China Information Technology Research Base (CAAC-ITRB-201209)