Journal of Nanjing University (Natural Sciences), 2009, Vol. 45, Issue 5: 613-619, 7.
基于基因表达谱的肿瘤样本分类规则提取
Rule extraction for tumor/normal tissue classification based on microarray data
Abstract
Classification rule extraction is an important technique for acquiring knowledge from data in machine learning and data mining. DNA microarray technology can monitor the expression patterns of thousands of genes simultaneously in a single experiment, and thus offers an effective route to a comprehensive understanding of the genetic alterations present in tumors. Extracting rules from microarray data that distinguish tumor tissue samples from normal ones can help reveal the underlying nature of carcinogenesis, and it also benefits gene-based tumor diagnosis. This work addresses the problem of extracting tumor/normal classification rules from broad patterns of gene expression profiles with a two-step strategy. The first step uses feature selection to remove genes that are irrelevant to the tissue categories. To obtain accurate classification weights for genes, a feature selection algorithm, RFE-Relief, is proposed that combines the Relief algorithm with the strategy of Recursive Feature Elimination. This step generates multiple candidate gene subsets. A support vector machine is used as the classifier to evaluate the classification ability of each gene subset by cross-validation on the training set, and the subset with the best classification performance is selected as the feature subset for distinguishing tumor from normal tissue samples. The second step applies the CART algorithm to build a decision tree from the expression levels of the genes in the feature subset, and then applies a pruning algorithm to obtain a reduced tree with improved generalization performance. We applied the method to a dataset containing multiple tumor tissues and their normal counterparts in order to extract rules for accurate tissue classification. A set of rules, represented as a decision tree, was obtained for distinguishing tumor tissues from normal ones.
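The abstract does not spell out how RFE-Relief combines Relief with recursive elimination, so the following is only a minimal sketch under one plausible reading: Relief weights are re-estimated on the surviving genes in each round, and the lowest-weighted fraction is dropped, yielding nested candidate subsets. The function names `relief_weights` and `rfe_relief`, the `drop_fraction` parameter, and the toy data are all hypothetical; the SVM cross-validation and CART steps are not shown.

```python
def relief_weights(X, y, features):
    """Basic two-class Relief weights, computed only over `features`.

    Assumes every class has at least two samples, so that each sample
    has both a nearest hit and a nearest miss.
    """
    n = len(X)
    # Per-feature value range, used to scale differences into [0, 1].
    span = {}
    for f in features:
        col = [row[f] for row in X]
        span[f] = (max(col) - min(col)) or 1.0

    def dist(a, b):
        # Scaled Manhattan distance restricted to the surviving features.
        return sum(abs(X[a][f] - X[b][f]) / span[f] for f in features)

    w = {f: 0.0 for f in features}
    for i in range(n):  # use every sample once instead of random draws
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(i, j))
        miss = min(misses, key=lambda j: dist(i, j))
        for f in features:
            # Reward separation from the nearest miss,
            # penalize separation from the nearest hit.
            w[f] += (abs(X[i][f] - X[miss][f])
                     - abs(X[i][f] - X[hit][f])) / span[f]
    return {f: w[f] / n for f in features}


def rfe_relief(X, y, drop_fraction=0.5, min_features=1):
    """Return nested candidate feature subsets, largest first.

    Assumed scheme: recompute Relief weights each round and keep the
    top (1 - drop_fraction) share of the current features.
    """
    features = list(range(len(X[0])))
    subsets = [list(features)]
    while len(features) > min_features:
        w = relief_weights(X, y, features)
        n_keep = max(min_features, int(len(features) * (1 - drop_fraction)))
        ranked = sorted(features, key=lambda f: w[f], reverse=True)
        features = sorted(ranked[:n_keep])
        subsets.append(list(features))
    return subsets


# Toy data (made up): column 0 separates the classes, columns 1-3 do not.
X = [
    [0.10, 0.5, 0.9, 0.3],
    [0.20, 0.9, 0.1, 0.7],
    [0.15, 0.2, 0.5, 0.5],
    [0.90, 0.5, 0.9, 0.3],
    [0.80, 0.9, 0.1, 0.7],
    [0.85, 0.2, 0.5, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

subsets = rfe_relief(X, y)
for s in subsets:
    print(len(s), s)
```

In the workflow the abstract describes, each candidate subset would then be scored by cross-validating a support vector machine on the training set, and the best-scoring subset passed to CART for tree building and pruning.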
We evaluated these rules on an independent test set, and the results showed good classification performance. Finally, the rules were analyzed in detail to explore the classification information they carry.
Key words
rule extraction / feature selection / decision tree / gene expression profiles / tumor
Classification
Information technology and security science
Cite this article
LI Yingxin (李颖新), JIANG Yuan (姜远), ZHOU Zhihua (周志华). Rule extraction for tumor/normal tissue classification based on microarray data [J]. Journal of Nanjing University (Natural Sciences), 2009, 45(5): 613-619, 7.
Funding
Natural Science Foundation of Jiangsu Province (BK2008018); Jiangsu Postdoctoral Foundation (0802001C)