Computer Engineering and Applications, 2012, Vol. 48, Issue 14: 139-142, 167 (5 pages). DOI: 10.3778/j.issn.1002-8331.2012.14.029
Method of Chinese word rough segmentation by maximum match and ambiguity detection algorithms
Abstract
Word segmentation is an essential preprocessing step in Chinese information processing. To address the low accuracy of current Chinese word segmentation and the large candidate sets produced by rough segmentation, this paper proposes CWRS, a rough segmentation method that combines the maximum match algorithm with the omni-segmentation algorithm. By combining ambiguity detection with cross ambiguity detection, CWRS improves accuracy and reduces the size of the rough segmentation set, which lays the foundation for precise segmentation of words in Chinese text. In experiments on the same data set of common Chinese texts, CWRS compares favorably with other algorithms.
Keywords
Chinese word segmentation / rough segmentation / maximum match algorithm / omni-segmentation algorithm / ambiguity detection
Classification
Information Technology and Security Science
Cite this article
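Li Guohe (李国和), Liu Guangsheng (刘光胜), Qin Bobo (秦波波), Wu Weijiang (吴卫江), Li Hongqi (李洪奇). Method of Chinese word rough segmentation by maximum match and ambiguity detection algorithms [J]. Computer Engineering and Applications, 2012, 48(14): 139-142, 167 (5 pages).
Funding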
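National Natural Science Foundation of China (No. 60473125)
National High Technology Research and Development Program of China (No. 2009AA062802)
China National Petroleum Corporation (CNPC) Petroleum Science and Technology Innovation Fund for Young and Middle-aged Researchers (No. 05E7013)
Sub-project of the National Major Project (No. G5800-08-ZS-WX)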
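
Illustrative sketch
The abstract names maximum matching and ambiguity detection but gives no algorithmic detail, so the following is only a minimal sketch of the general idea and not the authors' CWRS method. It assumes a common textbook approach: run forward and backward maximum matching and treat any disagreement between them as a detected ambiguity, keeping both results as the rough segmentation set. The function names, the `max_len` parameter, and the toy `LEXICON` are all hypothetical.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def rough_segmentations(text, lexicon, max_len=4):
    """Return (candidate segmentations, ambiguity flag).

    Assumption: ambiguity is flagged when the two scan directions disagree.
    """
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    ambiguous = fwd != bwd
    candidates = [fwd, bwd] if ambiguous else [fwd]
    return candidates, ambiguous


# Hypothetical toy lexicon, chosen so the two directions disagree.
LEXICON = {"研究", "研究生", "生命", "起源"}

if __name__ == "__main__":
    print(rough_segmentations("研究生命起源", LEXICON))
```

With this toy lexicon, the forward pass yields 研究生 / 命 / 起源 and the backward pass yields 研究 / 生命 / 起源, so the input is flagged as ambiguous and both candidates are kept. A full omni-segmentation would enumerate every lexicon-consistent split instead, which is what makes its candidate set large.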