Abstract
The burgeoning field of high-throughput sequencing has significantly improved our ability to understand the complexity of transcriptomes. Alternative splicing, one of the most important driving forces of transcriptome diversity, can now be studied at unprecedented resolution. Efficient and powerful computational and statistical methods are urgently needed to facilitate the characterization and quantification of alternative splicing events. Here we discuss methods for splice junction read mapping and methods for exon-centric or isoform-centric quantification of alternative splicing. In addition, we discuss HITS-CLIP and splicing QTL analyses, which are novel high-throughput sequencing-based approaches for dissecting splicing regulation.
Acknowledgement
L. Chen is supported in part by the NIH Grant R01GM097230.
Cite this article
Chen, L. Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing. Stat Biosci 5, 138–155 (2013). https://doi.org/10.1007/s12561-012-9064-7
DOI: https://doi.org/10.1007/s12561-012-9064-7