Computer Applications and Software (计算机应用与软件), Issue (2): 86-88, 3. DOI: 10.3969/j.issn.1000-386x.2014.02.023
Processing High-Frequency Unknown Words in HMM POS Tagging (HMM词性标注中高频生词的处理)
Abstract
This paper introduces a method for handling high-frequency unknown words in part-of-speech (POS) tagging. Starting from an analysis of how a hidden Markov model (HMM) is implemented for POS tagging and of the key issues in the unknown-word problem, the method combines existing unknown-word processing techniques, sets corresponding threshold values, incorporates the POS characteristics of unknown words, and targets the POS distribution characteristics of English-Chinese unknown words. It selects the more valuable unknown words and adds them to the training corpus in order to improve both the corpus and the tagging accuracy. The tagging accuracy of the plain HMM is then compared with that of the model extended with the unknown-word processing method. Experiments show that the method effectively selects representative high-frequency words from a given domain, and that adding those words to the training corpus significantly improves POS tagging accuracy. The approach therefore meets the basic requirements of applying POS tagging in practice.
Keywords
HMM (hidden Markov model) / POS tagging / unknown word processing
Classification
Information Technology and Security Science
Cite this article
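NIU Xiuping (牛秀萍), MA Jianfen (马建芬). Processing high-frequency unknown words in HMM POS tagging [J]. Computer Applications and Software, 2014, (2): 86-88, 3.
Funding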
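Shanxi Province Research Project for Returned Overseas Scholars (2011-027); Shanxi Province Merit-Based Science and Technology Activities Program for Overseas Scholars ()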
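Illustrative sketch

The abstract describes selecting high-frequency unknown words from a domain and adding them to the training corpus, but it does not give the thresholds or the selection rules. The Python sketch below shows only one plausible reading of the frequency-threshold step. The function name `select_high_frequency_unknown_words`, the `min_count` parameter, the lower-casing, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def select_high_frequency_unknown_words(
    known_vocab: Set[str],
    domain_tokens: Iterable[str],
    min_count: int = 3,
) -> List[Tuple[str, int]]:
    """Return (word, count) pairs for unknown words seen at least min_count times.

    Hypothetical helper: the name, the lower-casing, and the default
    threshold are assumptions, not values from the paper.
    known_vocab is assumed to contain lower-cased words.
    """
    # Count only tokens that are absent from the training vocabulary.
    counts = Counter(
        tok.lower() for tok in domain_tokens if tok.lower() not in known_vocab
    )
    # Keep words that reach the threshold.
    selected = [(w, c) for w, c in counts.items() if c >= min_count]
    # Sort by descending count, then alphabetically, for a stable order.
    return sorted(selected, key=lambda wc: (-wc[1], wc[0]))


# Toy data (invented for this example).
known = {"the", "model", "is", "trained", "on", "a", "corpus"}
tokens = (
    "the tagger is trained on a treebank "
    "the tagger uses a lexicon the tagger"
).split()

candidates = select_high_frequency_unknown_words(known, tokens, min_count=2)
print(candidates)
```

In this sketch the selected words would still need POS tags before they could be added to the training corpus; the abstract does not say how those tags are assigned, so that step is left out.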