Abstract
The edgeR package, an R-based tool within the Bioconductor project, offers a flexible statistical framework for detecting changes in abundance from count data. In this chapter, we illustrate the use of edgeR on RNA-seq and ChIP-seq data from a human embryonic stem cell dataset. We focus on a step-by-step statistical analysis of differential expression, going from raw data to a list of putative differentially expressed genes, and give examples of integrative analysis using the ChIP-seq data. We emphasize data quality spot checks and the use of positive controls throughout the process, and give practical recommendations for reproducible research.
Copyright information
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Nikolayeva, O., Robinson, M.D. (2014). edgeR for Differential RNA-seq and ChIP-seq Analysis: An Application to Stem Cell Biology. In: Kidder, B. (ed.) Stem Cell Transcriptional Networks. Methods in Molecular Biology, vol 1150. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0512-6_3
DOI: https://doi.org/10.1007/978-1-4939-0512-6_3
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-0511-9
Online ISBN: 978-1-4939-0512-6
eBook Packages: Springer Protocols