Journal of University of Electronic Science and Technology of China, 2017, Vol. 46, Issue (2): 426-433, 8. DOI: 10.3969/j.issn.1001-0548.2017.02.018
CPACA: A Probability Model Chinese Word Segmentation Algorithm Based on the Aho-Corasick Automaton Algorithm
Xu Yibin (徐懿彬) 1
Author information
- 1. Faculty of Engineering and Applied Science, Queen's University, Kingston, Ontario, Canada K7L 3N6
Abstract
The Aho-Corasick automaton is a well-known multi-pattern string matching algorithm. When a match fails, it follows the fail pointer back to a valid successor state, and one or more such states may exist. Building on this property, this paper proposes an automaton algorithm suited to Chinese word segmentation. The algorithm uses dynamic programming to compute the contextual matching probability of the current pattern and falls back to the successor state with the maximum probability, thereby combining mechanical (dictionary-based) Chinese segmentation with a statistical probability model. Experimental results show that the algorithm achieves high accuracy in Chinese word segmentation.
Keywords
Aho-Corasick automaton/Chinese word segmentation/dynamic programming/trie
Classification
Information Technology and Security Science
Cite this article
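Xu Yibin. CPACA: a probability model Chinese word segmentation algorithm based on the Aho-Corasick automaton algorithm[J]. Journal of University of Electronic Science and Technology of China, 2017, 46(2): 426-433, 8.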
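Illustrative sketch

The abstract describes combining an Aho-Corasick automaton with dynamic programming over word probabilities. The Python sketch below illustrates that general idea only. It is not the paper's CPACA algorithm, whose details are not given on this page. It assumes a simple unigram model (each word scored independently), and the names `build_automaton`, `segment`, `TOY_PROBS`, and `unknown_logp`, as well as the probability values, are hypothetical choices made for illustration.

```python
import math
from collections import deque

# Toy unigram probabilities; the values are made up for illustration.
TOY_PROBS = {"研究": 0.05, "研究生": 0.01, "生命": 0.04, "命": 0.001, "起源": 0.02}


def build_automaton(words):
    """Build a trie with fail pointers (Aho-Corasick) over non-empty words."""
    goto = [{}]   # goto[s][ch] -> child state
    fail = [0]    # fail[s] -> state of the longest proper suffix in the trie
    out = [[]]    # out[s] -> lengths of dictionary words ending at state s
    for w in words:
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(len(w))
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:                     # breadth-first, so shallower states are done first
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit words ending at the suffix state
            queue.append(t)
    return goto, fail, out


def segment(text, word_prob, unknown_logp=-20.0):
    """Return the segmentation of `text` with maximum total log-probability."""
    goto, fail, out = build_automaton(word_prob)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    s = 0
    for i, ch in enumerate(text, start=1):
        while s and ch not in goto[s]:
            s = fail[s]           # follow fail pointers on a mismatch
        s = goto[s].get(ch, 0)
        # Candidates: every dictionary word ending at position i, plus a
        # single-character fallback so out-of-dictionary text is still covered.
        candidates = [(1, unknown_logp)]
        candidates += [(k, math.log(word_prob[text[i - k:i]])) for k in out[s]]
        for k, logp in candidates:
            score = best[i - k] + logp
            if score > best[i]:
                best[i] = score
                back[i] = i - k
    words = []
    i = n
    while i > 0:                  # walk the back pointers to recover the words
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


print(segment("研究生命起源", TOY_PROBS))  # ['研究', '生命', '起源']
```

With these toy probabilities, greedy forward maximum matching would take the longest prefix 研究生 first and leave 命 stranded, while the probability-based choice prefers 研究/生命/起源.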