Computer Applications and Software, 2017, Vol. 34, Issue 2: 42-47, 6. DOI: 10.3969/j.issn.1000-386x.2017.02.007
RESEARCH AND APPLICATION FOR WEB INFORMATION EXTRACTION BASED ON IMPROVED HIDDEN MARKOV MODEL
Abstract
The task of information extraction is to obtain target information precisely and quickly from large-scale data and to improve the utilization of information. Based on the characteristics of web data, an improved hidden Markov model (HMM) for web information extraction is proposed, which incorporates the strength of the maximum entropy (ME) model in representing feature knowledge. A backward dependency assumption is added to the HMM, and the model parameters are adjusted using the characteristics of the emission units. The state transition probability and the output probability of the improved HMM depend not only on the current state of the model but are also corrected by the forward and backward state values of the model's historical states. The experimental results show that applying the improved HMM to web information extraction can effectively improve the quality of web information extraction.
Keywords
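Hidden Markov model / Maximum entropy model / Web information extraction
Classification
Information technology and security science
Cite this article
Shuang Zhe, Sun Lei. Research and application for web information extraction based on improved hidden Markov model [J]. Computer Applications and Software, 2017, 34(2): 42-47, 6.
Funding
National Natural Science Foundation of China (61502170).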