Clinical Cancer Research CTRC-AACR San Antonio Breast Cancer Symposium

Clinical Cancer Research 14, 108-114, January 1, 2008. doi: 10.1158/1078-0432.CCR-07-0443
© 2008 American Association for Cancer Research


Imaging, Diagnosis, Prognosis

How Large a Training Set is Needed to Develop a Classifier for Microarray Data?

Kevin K. Dobbin, Yingdong Zhao and Richard M. Simon

Authors' Affiliation: Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, Rockville, Maryland

Requests for reprints: Kevin K. Dobbin, National Cancer Institute, 6130 Executive Boulevard, EPN Room 8124, Rockville, MD 20852. Phone: 301-451-6244; E-mail: dobbinke@mail.nih.gov.

Purpose: A common goal of gene expression microarray studies is to develop a classifier that divides patients into groups with different prognoses or different expected responses to a therapy. Such classifiers are developed on a training set, the set of samples used to fit the classifier. Determining how many samples the training set needs to produce a good classifier from high-dimensional microarray data is challenging.

Experimental Design: We present a model-based approach to determining the sample size required to adequately train a classifier.

Results: It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided.

Conclusion: We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.
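
Illustration: The Results above name three inputs to the sample size calculation: standardized fold change, class prevalence, and number of genes. The sketch below is a toy Monte Carlo simulation of how those inputs interact with training set size. It is not the model-based method of the article, whose formulas are not reproduced in this abstract. The function name `simulate_accuracy`, the unit-variance Gaussian expression values, the fixed count of five informative genes, and the nearest-centroid classifier are all assumptions chosen for illustration.

```python
import random


def simulate_accuracy(n_train, effect_size, prevalence, n_genes,
                      n_informative=5, n_test=100, n_reps=10, seed=0):
    """Estimate the mean test accuracy of a toy classifier by simulation.

    effect_size is the class-mean difference, in standard-deviation units,
    for each informative gene (an assumed stand-in for standardized fold
    change). All other genes are pure noise. Requires n_train >= 2.
    """
    rng = random.Random(seed)

    def draw_sample(label):
        # Unit-variance Gaussian values; informative genes are shifted by
        # effect_size in class 1 and unshifted in class 0.
        return [rng.gauss((effect_size * label) if g < n_informative else 0.0,
                          1.0)
                for g in range(n_genes)]

    # Stratified training set: class sizes follow the prevalence, with at
    # least one sample in each class.
    n1 = min(max(1, round(prevalence * n_train)), n_train - 1)
    n0 = n_train - n1

    correct = 0
    for _ in range(n_reps):
        class0 = [draw_sample(0) for _ in range(n0)]
        class1 = [draw_sample(1) for _ in range(n1)]
        mean0 = [sum(col) / n0 for col in zip(*class0)]
        mean1 = [sum(col) / n1 for col in zip(*class1)]
        diff = [m1 - m0 for m0, m1 in zip(mean0, mean1)]

        # Feature selection: keep the genes with the largest absolute
        # difference between the two training-class means.
        selected = sorted(range(n_genes), key=lambda g: abs(diff[g]),
                          reverse=True)[:n_informative]

        for _ in range(n_test):
            label = 1 if rng.random() < prevalence else 0
            x = draw_sample(label)
            # Nearest-centroid rule on the selected genes (the decision
            # threshold ignores prevalence, a simplification).
            score = sum(diff[g] * (x[g] - (mean0[g] + mean1[g]) / 2)
                        for g in selected)
            correct += int((score > 0) == (label == 1))
    return correct / (n_reps * n_test)


if __name__ == "__main__":
    for n in (4, 10, 40):
        acc = simulate_accuracy(n, effect_size=1.5, prevalence=0.5,
                                n_genes=100)
        print(f"n_train={n:3d}  estimated accuracy={acc:.3f}")
```

Under these assumptions, small training sets tend to select noise genes and classify poorly, while larger ones recover the informative genes. Lowering `effect_size` or raising `n_genes` should increase the training set size needed to reach a given accuracy.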







