Application Research of Computers, 2017, Vol. 34, Issue (2): 334-338, 5. DOI: 10.3969/j.issn.1001-3695.2017.02.003
Research and implementation of feature extraction methods on Internet CTR prediction model
Abstract
Internet advertising is a market worth hundreds of billions of dollars, and click-through rate (CTR) is an important indicator of advertising effectiveness. In a CTR prediction model, as in many machine learning projects, features are a key factor in success or failure, and their quality directly affects the final model. To make CTR prediction for Internet advertising more accurate, this paper proposes a multidimensional feature extraction method based on GBDT (gradient boosting decision tree) that runs on the Hadoop big data platform. The method builds a multidimensional feature library from raw data and feeds all basic features except ID features into a GBDT model for feature selection, producing high-level features for subsequent classification. It not only reduces the labor and time costs of the feature extraction stage but also substantially improves the accuracy of the CTR prediction model.
Keywords
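CTR prediction / feature extraction / Internet advertising / Hadoop big data platform / GBDT
Classification
Information Technology and Security Science
Cite this article
田嫦丽, 张珣, 潘博, 杨超, 许彦茹. Research and implementation of feature extraction methods on Internet CTR prediction model [J]. Application Research of Computers, 2017, 34(2): 334-338, 5.
Funding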
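Beijing Natural Science Foundation Key Project, Category B (KZ201410011014)
2015 Graduate Research Capability Improvement Program
Beijing Natural Science Foundation Youth Project (9164025)
Ministry of Education of China Humanities and Social Sciences Research Youth Fund Project (15YJCZH224)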
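Illustrative sketch

The GBDT-based feature extraction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the high-level features are read out of the GBDT model, so this sketch assumes the common scheme of one-hot encoding the leaf that each sample reaches in each tree. The toy records, the field names (`ad_id`, `hour`, `position`, `clicked`), the use of depth-1 trees, and all function names are hypothetical, and the sketch runs on a single machine rather than on Hadoop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # not a real split
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lmean, rmean)
    if best is None:
        raise ValueError("no feature has two distinct values")
    return best[1:]  # (feature index, threshold, left value, right value)

def leaf_of(tree, row):
    """Return (leaf index, leaf value) for one sample."""
    j, threshold, left_value, right_value = tree
    return (0, left_value) if row[j] <= threshold else (1, right_value)

def fit_gbdt(X, y, n_trees=3, learning_rate=0.5):
    """Gradient boosting on logistic loss: each tree fits y - sigmoid(score)."""
    scores = [0.0] * len(X)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        scores = [s + learning_rate * leaf_of(tree, row)[1]
                  for s, row in zip(scores, X)]
    return trees

def leaf_features(trees, row):
    """Assumed readout: one-hot encode the leaf reached in each tree."""
    feats = []
    for tree in trees:
        leaf, _ = leaf_of(tree, row)
        feats.extend([1, 0] if leaf == 0 else [0, 1])
    return feats

# Hypothetical raw log records; ad_id is an ID feature.
records = [
    {"ad_id": "a1", "hour": 9,  "position": 1, "clicked": 1},
    {"ad_id": "a2", "hour": 14, "position": 2, "clicked": 0},
    {"ad_id": "a1", "hour": 20, "position": 1, "clicked": 1},
    {"ad_id": "a3", "hour": 3,  "position": 3, "clicked": 0},
    {"ad_id": "a2", "hour": 11, "position": 1, "clicked": 1},
    {"ad_id": "a3", "hour": 22, "position": 3, "clicked": 0},
]

# Only the basic (non-ID) features are passed to the GBDT model.
BASIC_FEATURES = ["hour", "position"]
X = [[r[name] for name in BASIC_FEATURES] for r in records]
y = [r["clicked"] for r in records]

trees = fit_gbdt(X, y)
high_level = [leaf_features(trees, row) for row in X]
```

Each row of `high_level` holds one one-hot pair per tree. Under the same assumption, these vectors would be combined with separately encoded ID features and passed to a downstream classifier; the abstract does not name that classifier.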