Study on Web page content classification technology based on semi-supervised learning

Zhao Fuqun (赵夫群)

Modern Electronics Technique (现代电子技术), 2016, Vol. 39, Issue 1, pp. 108-112, 117 (6 pages). DOI: 10.16652/j.issn.1004-373x.2016.01.029


Author information

  • 1. Visualization Institute, Northwest University, Xi'an, Shaanxi 710069; Xianyang Normal University, Xianyang, Shaanxi 712000

Abstract

To address the key issue of how to use both labeled and unlabeled data for Web page classification, a classifier that combines a generative model with a discriminative model is explored. Maximum likelihood estimation is applied to the unlabeled training set to construct a semi-supervised classifier with high classification performance. The text is modeled with a Dirichlet-multinomial mixture distribution, and a hybrid model suited to semi-supervised learning is then proposed. Because the EM algorithm used for semi-supervised learning converges quickly but easily falls into a local optimum, two intelligent optimization methods, simulated annealing and the genetic algorithm, are introduced and analyzed. Combining the two algorithms yields a new intelligent semi-supervised classification algorithm, whose feasibility is verified.
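The abstract gives no algorithmic detail, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes a multinomial naive Bayes generative model whose word distributions are smoothed with a symmetric Dirichlet prior (a simplification, not a full Dirichlet-multinomial mixture), trained by semi-supervised EM: labeled pages keep fixed one-hot labels and unlabeled pages receive soft labels. To illustrate how simulated annealing can counter EM's tendency to stop at a local optimum, EM is restarted from randomly perturbed soft labels and each restart is accepted or rejected with the Metropolis rule on the data log-likelihood. The genetic-algorithm variant and the discriminative component are not shown. All function names (`train_nb`, `posterior`, `semi_supervised_em`, `perturb`, `annealed_em`) and parameters (`alpha`, `iters`, `t0`, `cooling`) are hypothetical.

```python
import math
import random
from collections import Counter  # hypothetical input format: each page is a Counter(word -> count)


def train_nb(docs, resp, n_classes, vocab, alpha=1.0):
    """M-step: class priors and per-class word distributions from soft labels.

    Adding `alpha` to every count is the posterior-mean estimate under a
    symmetric Dirichlet(alpha) prior on each multinomial (Laplace smoothing).
    """
    priors = [alpha] * n_classes
    counts = [dict.fromkeys(vocab, alpha) for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        for c in range(n_classes):
            priors[c] += r[c]
            for w, n in doc.items():
                counts[c][w] += r[c] * n
    total = sum(priors)
    log_prior = [math.log(p / total) for p in priors]
    log_cond = []
    for c in range(n_classes):
        z = sum(counts[c].values())
        log_cond.append({w: math.log(v / z) for w, v in counts[c].items()})
    return log_prior, log_cond


def posterior(doc, log_prior, log_cond):
    """E-step for one document: returns (class posterior, log p(doc))."""
    scores = [lp + sum(n * lc[w] for w, n in doc.items())
              for lp, lc in zip(log_prior, log_cond)]
    m = max(scores)  # log-sum-exp trick for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps], m + math.log(z)


def semi_supervised_em(labeled, labels, unlabeled, n_classes, init_resp=None, iters=15):
    """EM in which labeled docs keep one-hot labels and unlabeled docs get soft labels."""
    vocab = set()
    for d in labeled + unlabeled:
        vocab.update(d)
    fixed = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    if init_resp is None:  # default start: a model trained on labeled data only
        model = train_nb(labeled, fixed, n_classes, vocab)
    else:  # start from the supplied soft labels for the unlabeled docs
        model = train_nb(labeled + unlabeled, fixed + init_resp, n_classes, vocab)
    for _ in range(iters):
        resp_u = [posterior(d, *model)[0] for d in unlabeled]  # E-step
        model = train_nb(labeled + unlabeled, fixed + resp_u, n_classes, vocab)  # M-step
    post = [posterior(d, *model) for d in unlabeled]
    log_prior, log_cond = model
    # Data log-likelihood: log p(d) for unlabeled docs, log p(y, d) for labeled docs.
    ll = sum(lp for _, lp in post)
    ll += sum(log_prior[y] + sum(n * log_cond[y][w] for w, n in d.items())
              for d, y in zip(labeled, labels))
    return model, [p for p, _ in post], ll


def perturb(resp, rng, strength=1.0):
    """Add uniform noise to each soft label and renormalize it to sum to 1."""
    out = []
    for r in resp:
        v = [p + strength * rng.random() for p in r]
        s = sum(v)
        out.append([x / s for x in v])
    return out


def annealed_em(labeled, labels, unlabeled, n_classes, steps=10, t0=1.0, cooling=0.8, seed=0):
    """Simulated annealing over EM restarts; returns the highest-likelihood run."""
    rng = random.Random(seed)
    cur = best = semi_supervised_em(labeled, labels, unlabeled, n_classes)
    t = t0
    for _ in range(steps):
        cand = semi_supervised_em(labeled, labels, unlabeled, n_classes,
                                  init_resp=perturb(cur[1], rng))
        delta = cand[2] - cur[2]
        # Metropolis rule: always accept improvements, sometimes accept worse runs.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cur = cand
        if cur[2] > best[2]:
            best = cur
        t *= cooling  # geometric cooling schedule
    return best
```

A call such as `annealed_em(labeled, labels, unlabeled, n_classes=2)` returns the model, the soft labels of the unlabeled pages, and the log-likelihood of the best run found. Because the first EM run is always kept as the initial best, the annealing loop can only match or improve on plain EM's likelihood; a genetic algorithm could replace the loop by keeping a population of soft-label vectors and applying crossover and mutation to them.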

Key words

Web page content classification; semi-supervised learning; semi-supervised classification; intelligent optimization; Dirichlet distribution

Subject classification

Information technology and security science

Cite this article

Zhao Fuqun. Study on Web page content classification technology based on semi-supervised learning[J]. Modern Electronics Technique, 2016, 39(1): 108-112, 117.

Funding

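Special Scientific Research Program of Xianyang Normal University: Research on 3D oil reservoir data processing based on artificial intelligence (07XSYK224)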

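Special Scientific Research Program of the Shaanxi Provincial Department of Education: Protection and inheritance of the Guanzhong dialect in an informatized environment (12JK0212)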

Modern Electronics Technique (现代电子技术)

Open access; Peking University Core Journal (北大核心); CSTPCD

ISSN 1004-373X
