CPACA: A Probability-Model Chinese Word Segmentation Algorithm Based on the Aho-Corasick Automaton Algorithm

徐懿彬

Journal of University of Electronic Science and Technology of China, 2017, Vol. 46, Issue 2: 426-433 (8 pages). DOI: 10.3969/j.issn.1001-0548.2017.02.018

A Probability Model Chinese Word Segmentation Algorithm Based on Aho-Corasick Automata Algorithm

Author information

  • 1. Faculty of Engineering and Applied Science, Queen's University, Kingston, Ontario, Canada K7L 3N6

Abstract

The Aho-Corasick automaton algorithm is a well-known multi-pattern string matching algorithm. When a pattern match fails, it follows the fail pointer back to a valid successor state, and one or more such states may exist. Exploiting this property, this paper proposes an automaton algorithm suited to Chinese word segmentation. The algorithm uses dynamic programming to compute the context matching probability of the current pattern and backtracks to the successor state with the maximum probability, thereby combining mechanical (dictionary-based) Chinese segmentation with a statistical probability model. The experimental results show that the algorithm achieves high accuracy in Chinese word segmentation.
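
The general idea of pairing Aho-Corasick matching with a dynamic-programming choice of the most probable segmentation can be sketched as follows. This is a minimal illustration, not the CPACA algorithm from the paper: it collects dictionary matches with a standard Aho-Corasick automaton and then picks the maximum-probability split with a simple unigram model, whereas the paper folds the probability into the choice of fail-pointer successor state. The function names, the word probabilities, and the `unknown_logp` penalty for out-of-dictionary characters are all assumptions made for the example.

```python
import math
from collections import deque

def build_automaton(words):
    """Build a standard Aho-Corasick automaton over non-empty dictionary words."""
    # goto[s]: child transitions; fail[s]: fail pointer;
    # out[s]: lengths of all dictionary words that end at state s.
    goto, fail, out = [{}], [0], [[]]
    for w in words:
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(len(w))
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches reachable via fail
            queue.append(t)
    return goto, fail, out

def segment(text, probs, unknown_logp=-20.0):
    """Return the segmentation of text with the highest total log-probability.

    probs maps dictionary words to (hypothetical) unigram probabilities;
    unknown_logp is an assumed penalty for a single out-of-dictionary character.
    """
    goto, fail, out = build_automaton(probs)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    s = 0
    for i, ch in enumerate(text, start=1):
        while s and ch not in goto[s]:
            s = fail[s]           # fall back along fail pointers
        s = goto[s].get(ch, 0)
        candidates = [(1, unknown_logp)]
        candidates += [(k, math.log(probs[text[i - k:i]])) for k in out[s]]
        for k, logp in candidates:
            if best[i - k] + logp > best[i]:
                best[i] = best[i - k] + logp
                back[i] = i - k
    words, i = [], n
    while i > 0:                  # walk the back pointers to recover the words
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

# Hypothetical probabilities, chosen only to illustrate an ambiguous input.
PROBS = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001, "起源": 0.005}
print(segment("研究生命起源", PROBS))
```

With these made-up probabilities, the split 研究 / 生命 / 起源 scores higher than 研究生 / 命 / 起源, so a purely mechanical longest-match strategy and the probabilistic choice disagree on this input.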

Key words

Aho-Corasick automaton/Chinese segmentation/dynamic programming/trie tree

Classification

Information Technology and Security Science

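Cite this article

徐懿彬. A Probability Model Chinese Word Segmentation Algorithm Based on Aho-Corasick Automata Algorithm [J]. Journal of University of Electronic Science and Technology of China, 2017, 46(2): 426-433 (8 pages).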

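Journal of University of Electronic Science and Technology of China

Open access (OA); indexed in the Peking University Core Journals list, CSCD, and CSTPCD

ISSN 1001-0548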
