Extraction of Tumor Sample Classification Rules Based on Gene Expression Profiles

Li Yingxin (李颖新), Jiang Yuan (姜远), Zhou Zhihua (周志华)

Journal of Nanjing University (Natural Sciences), 2009, Vol. 45, Issue 5: 613-619, 7.

Rule extraction for tumor/normal tissue classification based on microarray data

Li Yingxin¹, Jiang Yuan¹, Zhou Zhihua¹

Author information

  • 1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093

Abstract

Classification rule extraction is an important technique for acquiring knowledge from data in machine learning and data mining. DNA microarray technology can monitor the expression patterns of thousands of genes simultaneously in a single experiment, and thus offers a route to a comprehensive understanding of the genetic alterations present in tumors. Extracting rules from microarray data that distinguish tumor tissue samples from normal ones can provide useful information about the underlying nature of carcinogenesis, and it also benefits gene-based tumor diagnosis. This work addresses the problem of extracting tumor/normal classification rules from broad patterns of gene expression profiles using a two-step strategy. The first step applied feature selection to remove genes irrelevant to the tissue categories. To obtain accurate gene weights for classification, a feature selection algorithm, RFE-Relief, was proposed that combines the Relief algorithm with the strategy of 'Recursive Feature Elimination', generating multiple candidate gene subsets. A support vector machine was used as the classifier to evaluate the classification ability of each candidate subset by cross-validation on the training set, and the subset with the best classification performance was selected as the feature subset for distinguishing tumor from normal tissue samples. The second step applied the CART algorithm to build a decision tree from the expression levels of the genes in the feature subset, followed by a pruning algorithm to obtain a reduced tree with improved generalization performance. The method was applied to a dataset containing multiple tumor tissues and their normal counterparts, yielding a set of rules, represented by a decision tree, for distinguishing tumor tissues from normal ones. These rules were evaluated on an independent test set, where they showed good classification performance. At the end of the paper, the rules were also analyzed in detail to explore the classification information they carry.
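The abstract names RFE-Relief but does not spell out its details, so the following is only a minimal sketch of how Relief weighting and recursive feature elimination might be combined to produce nested candidate gene subsets. Everything in it is an assumption made for illustration: the function names `relief_weights` and `rfe_relief`, the `drop_fraction` parameter, the choice to visit every sample once instead of sampling at random, and the toy data. The paper's actual algorithm may differ.

```python
def relief_weights(X, y, features):
    """Two-class Relief weights, computed only over the given feature indices.

    Assumption: every sample is visited once (a deterministic variant);
    classic Relief draws samples at random.
    """
    n = len(X)
    # Value range of each feature, used to scale differences into [0, 1].
    span = {}
    for f in features:
        col = [row[f] for row in X]
        span[f] = (max(col) - min(col)) or 1.0

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in features)

    w = {f: 0.0 for f in features}
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        near_hit = min(hits, key=lambda j: dist(X[i], X[j]))
        near_miss = min(misses, key=lambda j: dist(X[i], X[j]))
        # Reward features that differ on the nearest miss,
        # penalise features that differ on the nearest hit.
        for f in features:
            w[f] += (diff(f, X[i], X[near_miss]) - diff(f, X[i], X[near_hit])) / n
    return w


def rfe_relief(X, y, drop_fraction=0.5, min_features=1):
    """Re-weight the surviving features each round and drop the lowest-weighted.

    Returns the nested candidate subsets, largest first.
    """
    features = list(range(len(X[0])))
    subsets = [features[:]]
    while len(features) > min_features:
        w = relief_weights(X, y, features)
        ranked = sorted(features, key=lambda f: w[f], reverse=True)
        keep = max(min_features, int(len(ranked) * (1 - drop_fraction)))
        features = sorted(ranked[:keep])
        subsets.append(features[:])
    return subsets


# Toy expression matrix (hypothetical values): column 0 separates the two
# classes, columns 1-3 carry no class information.
X = [
    [0.10, 0.9, 0.3, 0.5],
    [0.20, 0.1, 0.8, 0.4],
    [0.15, 0.5, 0.1, 0.9],
    [0.90, 0.9, 0.3, 0.5],
    [0.80, 0.1, 0.8, 0.4],
    [0.85, 0.5, 0.1, 0.9],
]
y = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = tumor

subsets = rfe_relief(X, y)
print(subsets)
```

In the workflow the abstract describes, each candidate subset would then be scored by cross-validating a support vector machine on the training set, and the best-scoring subset passed to CART with pruning. Those steps are left out of this sketch.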

Key words

rule extraction/ feature selection/ decision tree/ gene expression profiles/ tumor

Classification

Information technology and safety science

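Cite this article

Li Yingxin, Jiang Yuan, Zhou Zhihua. Rule extraction for tumor/normal tissue classification based on microarray data [J]. Journal of Nanjing University (Natural Sciences), 2009, 45(5): 613-619, 7.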

Funding

Natural Science Foundation of Jiangsu Province (BK2008018); Jiangsu Postdoctoral Foundation (0802001C)

Journal of Nanjing University (Natural Sciences)

Indexed in: OA, CSCD, CSTPCD

ISSN 0469-5097
