Technology Innovation and Application (科技创新与应用), 2025, Vol. 15, Issue 8: pp. 1-5 (5 pages). DOI: 10.19981/j.CN23-1581/G3.2025.08.001
A WM Multi-Pattern Matching Algorithm Based on Tibetan Syllable Features
Abstract
In recent years, as the Internet, and the mobile Internet in particular, has spread and developed in Xizang, governing Tibetan-related online public opinion has become increasingly important. The most basic technique for this is sensitive-word detection, and multi-pattern (string) matching is its core technical means. The WM (Wu-Manber) algorithm is an efficient multi-pattern matching algorithm that is widely used because of its good practical performance; it speeds up matching by skipping ahead in the text based on blocks of characters. Tibetan, however, is a syllabic script whose text characteristics differ significantly from those of Chinese and English, so applying the WM algorithm directly to Tibetan multi-pattern matching gives unsatisfactory results. To address this problem, this paper exploits the syllable structure of Tibetan to improve and optimize the WM algorithm, and proposes a multi-pattern matching algorithm for Tibetan, TWM (Tibetan Wu-Manber). Experimental results show that, on Tibetan multi-pattern matching tasks, TWM is significantly more efficient and more accurate than the original WM algorithm.
Keywords
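multi-pattern matching / WM algorithm / Tibetan processing / Tibetan syllable / syllable structure characteristics
Classification
Information Technology and Security Science
Cite this article
Yang Yuanting (杨媛婷), Peng Zhan (彭展). 基于藏文音节特征的WM多模式匹配算法 [A WM multi-pattern matching algorithm based on Tibetan syllable features] [J]. 科技创新与应用, 2025, 15(8): 1-5.
Funding
Natural Science Foundation of the Xizang Autonomous Region (XZ202101ZR0089G)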
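
Illustrative sketch (not from the paper). The abstract does not describe TWM's internals, so the following is only a minimal illustration of the general idea it names: running Wu-Manber-style block shifting over Tibetan syllables rather than over individual characters. It assumes that syllables are delimited by the tsheg (U+0F0B), the non-breaking tsheg (U+0F0C), the shad (U+0F0D) or whitespace, and that a match may span a shad. The names `syllables`, `build_tables` and `wm_search` are hypothetical, and the prefix-hash filter of the original Wu-Manber algorithm is omitted for brevity.

```python
import re

# Assumption: these delimiters separate Tibetan syllables
# (tsheg U+0F0B, non-breaking tsheg U+0F0C, shad U+0F0D, whitespace).
_DELIMS = re.compile("[\u0f0b\u0f0c\u0f0d\\s]+")


def syllables(text):
    """Split a string into a tuple of syllables (hypothetical tokenizer)."""
    return tuple(s for s in _DELIMS.split(text) if s)


def _as_tokens(item):
    return syllables(item) if isinstance(item, str) else tuple(item)


def build_tables(patterns, block=2):
    """Build Wu-Manber SHIFT and bucket tables over syllable blocks."""
    pats = [p for p in (_as_tokens(p) for p in patterns) if p]
    if not pats:
        raise ValueError("at least one non-empty pattern is required")
    m = min(len(p) for p in pats)       # only the first m syllables are indexed
    b = max(1, min(block, m))           # block size, counted in syllables
    default = m - b + 1                 # safe shift for blocks in no pattern
    shift, bucket = {}, {}
    for p in pats:
        for end in range(b, m + 1):
            blk = p[end - b:end]
            shift[blk] = min(shift.get(blk, default), m - end)
        bucket.setdefault(p[m - b:m], []).append(p)  # blocks with shift 0
    return m, b, default, shift, bucket


def wm_search(text, patterns, block=2):
    """Return (syllable_index, pattern_syllables) for every match."""
    tokens = _as_tokens(text)
    m, b, default, shift, bucket = build_tables(patterns, block)
    hits = []
    pos = m - 1                         # index of the window's last syllable
    while pos < len(tokens):
        blk = tokens[pos - b + 1:pos + 1]
        step = shift.get(blk, default)
        if step == 0:                   # candidate: verify each bucket pattern
            start = pos - m + 1
            for p in bucket[blk]:
                if tokens[start:start + len(p)] == p:
                    hits.append((start, p))
            step = 1
        pos += step
    return hits


if __name__ == "__main__":
    # Syllables written as escapes: bod, yig, dang, skad.
    bod, yig, dang, skad = "\u0f56\u0f7c\u0f51", "\u0f61\u0f72\u0f42", "\u0f51\u0f44", "\u0f66\u0f90\u0f51"
    text = "\u0f0b".join([bod, yig, dang, bod, skad]) + "\u0f0d"
    for index, pattern in wm_search(text, [bod + "\u0f0b" + skad]):
        print(index, "\u0f0b".join(pattern))
```

Because every comparison and every shift works on whole syllables, a pattern cannot match starting in the middle of a syllable, and each jump moves by syllables rather than by code points. This is one plausible reading of why syllable structure helps. Whether TWM builds its tables this way is not stated in the abstract.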