Acta Pedologica Sinica 2017, Vol. 54, Issue (1): 36-47, 12. DOI: 10.11766/trxb201604210130
Training Sample Selection Method Based on Grading of Soil Types by Area for Updating Conventional Soil Maps
Abstract
[Objective] Traditional soil surveys have produced a large number of conventional soil maps that vary in scale and nature. Although these maps are not very high in spatial detail or accuracy, they contain a large amount of valuable expert knowledge about soil-environment relationships in the regions they cover. Data mining models can be used to extract this information from the maps and use it to update them. When data mining models are used to extract information on the spatial distribution of soils, selecting training samples is an essential step. The quality of the training samples largely determines how fully the soil-environment relationships are expressed and how accurate the updated soil maps are. The area-weighted proportion method is a common way of selecting training samples. However, it usually assigns too much weight to soil types that cover large areas, so that too many training samples are selected for them. In addition, randomly selecting training samples from the polygons of a soil type may introduce “noise” samples located in transition areas between soil types, which lowers the accuracy of the updated soil maps.
[Method] In this paper, a new method is developed for selecting training samples from conventional soil maps based on grading of soil types by area. The method consists of two steps. The first step is to identify typical (representative) samples of each soil type on the conventional soil map, so as to avoid “noise pixels” caused by misplaced boundaries between soil polygons. It is assumed that most of the polygon boundaries of a given soil type are correctly delineated, so that the peak of the histogram of an environmental factor within the polygons of that soil type represents the typical environmental condition under which the soil develops or exists.
Pixels close to this typical environmental condition, or within the peak zone of the histogram, are considered representative samples. All the representative samples selected through the histograms of the various environmental factors of a soil type are combined into the typical sample set of that soil type. The second step is to select training samples based on grading of soil types by area, with a view to keeping the numbers of samples of the soil types in balance. Soil types in the same grade are given the same number of training samples, drawn from the typical sample set of each soil type. A random forest model is then used to update the conventional soil map based on the selected training samples. To evaluate the proposed method, it was compared with two other training sample selection methods. The first randomly selects training samples from the polygons of each soil type, with the number of training samples for each soil type depending on the proportion of the grade that the soil type is in. The second is the common area-weighted proportion method, which randomly selects training samples from the polygons of each soil type, with the number of training samples depending on the area-weighted proportion of the soil type. The study area was the Raffelson watershed, a small watershed in Wisconsin, USA. Each of the three selection methods was run 500 times, and the mean accuracy of the inferential mapping and the proportion of the conventional soil map that was updated were validated against 92 independent field verification samples.
[Result] Over the 500 trials, about 79.5%, 71.8% and 63.6% of the conventional soil map could be updated with the proposed method and the two other methods, respectively.
In addition, the soil maps updated with the proposed training sample selection method are more consistent with the actual soil distribution in the Raffelson watershed.
[Conclusion] The proposed method is therefore an effective way of selecting training samples for data mining models used to update conventional soil maps.
Key words
Training sample / Data mining model / Update conventional soil map / Soil-environmental relationships
Classification
Astronomy and Earth Sciences
Cite this article
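Liu Xueqi, Zhu Axing, Yang Lin, Miao Yamin, Zeng Canying. Training Sample Selection Method Based on Grading of Soil Types by Area for Updating Conventional Soil Maps [J]. Acta Pedologica Sinica, 2017, 54(1): 36-47, 12.
Funding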
Supported by the National Natural Science Foundation of China (Nos. 41431177 and 41471178), the Natural Science Research Program of Jiangsu (No. 14KJA170001), the Graduate Research Innovation Program of Jiangsu (No. KYLX15_0715), the National Basic Research Program of China (973 Program, No. 2015CB954102), and the “One-Thousand Talents” Program of China
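Illustrative sketch. The two-step selection described in the abstract (typical samples from histogram peaks, then equal sample numbers for soil types in the same area grade) could look roughly like the following in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the number of histogram bins, the width of the "peak zone", the area break points, and the per-grade sample counts are all hypothetical, and "combined" is read here as the union of peak-zone pixels over the environmental factors. The soil map is assumed to be a flattened array of soil-type labels, with one row of environmental values per pixel.

```python
import numpy as np

def typical_mask(values, bins=20, peak_width=1):
    """Flag values that fall in the peak zone of their histogram.
    bins and peak_width are assumed settings, not taken from the paper."""
    counts, edges = np.histogram(values, bins=bins)
    peak = int(np.argmax(counts))                 # index of the modal bin
    lo = edges[max(peak - peak_width, 0)]         # lower edge of the peak zone
    hi = edges[min(peak + peak_width + 1, bins)]  # upper edge of the peak zone
    return (values >= lo) & (values <= hi)

def typical_samples(soil, env, bins=20, peak_width=1):
    """Step 1: for each soil type, take the union over all environmental
    factors of the pixels lying in that factor's histogram peak zone."""
    result = {}
    for t in np.unique(soil):
        idx = np.flatnonzero(soil == t)           # pixels mapped as type t
        keep = np.zeros(idx.size, dtype=bool)
        for f in range(env.shape[1]):
            keep |= typical_mask(env[idx, f], bins, peak_width)
        result[int(t)] = idx[keep]
    return result

def grade_by_area(soil, area_breaks):
    """Grade each soil type by area (pixel count); area_breaks are
    hypothetical, increasing thresholds separating the grades."""
    types, counts = np.unique(soil, return_counts=True)
    grades = np.digitize(counts, area_breaks)
    return {int(t): int(g) for t, g in zip(types, grades)}

def select_training(soil, env, area_breaks, n_per_grade, seed=0):
    """Step 2: soil types in the same grade get the same number of
    training samples, drawn at random from their typical sample set."""
    rng = np.random.default_rng(seed)
    typical = typical_samples(soil, env)
    grades = grade_by_area(soil, area_breaks)
    chosen = {}
    for t, idx in typical.items():
        n = min(n_per_grade[grades[t]], idx.size)  # cap at what is available
        chosen[t] = rng.choice(idx, size=n, replace=False)
    return chosen
```

A call such as `select_training(soil, env, area_breaks=[1000], n_per_grade=[30, 100])` would place soil types with fewer than 1000 pixels in one grade (30 samples each) and the rest in a second grade (100 samples each). The returned pixel indices could then be used to train a classifier such as a random forest, which is the model the abstract names.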