生物信息学 (Chinese Journal of Bioinformatics), 2011, Vol. 9, Issue 3: 229-234, 6. DOI: 10.3969/j.issn.1672-5565.2011.03.013
Classification based on CART algorithm for microarray data of lung cancer
Abstract
Gene chip technology is an important tool in genomics research. However, gene chip data (microarray data) are often high-dimensional, which makes dimensionality reduction a necessary step. In this paper, we analyze the lung cancer microarray data provided by Gavin J. Gordon et al. of Harvard Medical School. First, the t-test and the Wilcoxon rank-sum test are used for feature selection to reduce the dimensionality of the microarray data. Then, following the CART (Classification and Regression Tree) algorithm with the Gini index as the error function, a large classification tree is grown from the selected feature attributes and the optimal tree size is found by pruning, which improves the generalization performance of the tree so that it adapts well to new samples. Experimental results show that our method achieves a recognition rate of over 96% on lung cancer microarray data classification and is very stable; it also yields significant, easily understood rules and information on the key genes for classification.
Key words
Microarray data / Classification / Decision tree / CART algorithm
Classification
Biological sciences
Cite this article
Chen Lei (陈磊), Liu Yihui (刘毅慧). Classification based on CART algorithm for microarray data of lung cancer [J]. 生物信息学, 2011, 9(3): 229-234, 6.
Funding
Natural Science Foundation of Shandong Province (Grant No. Y2008G30).
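Illustrative sketch
The two core steps named in the abstract, feature selection by t-test and node splitting by the Gini index, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of Welch's form of the t statistic, and the tiny expression matrix are all hypothetical, and the pruning step and the Wilcoxon rank-sum test are not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (assumed variant; the abstract only says 't-test')."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(X, y, k):
    """Return the indices of the k genes with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y, features):
    """Return (score, gene, threshold) minimizing the weighted Gini impurity of the two children."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


# Hypothetical toy data: 6 samples x 3 genes, two classes (not the Gordon et al. data set).
X = [
    [1.0, 5.1, 2.0],
    [1.2, 4.9, 2.5],
    [0.9, 5.3, 1.8],
    [3.1, 5.0, 2.1],
    [2.9, 5.2, 2.4],
    [3.3, 4.8, 1.9],
]
y = [0, 0, 0, 1, 1, 1]

selected = rank_genes(X, y, k=2)         # step 1: reduce dimensionality
root_split = best_split(X, y, selected)  # step 2: choose the root split by Gini index
print(selected, root_split)
```

A full CART tree would apply best_split recursively to each child node and then prune the grown tree back to the size that performs best on held-out samples.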