Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing

Statistics in Biosciences

Abstract

The burgeoning field of high-throughput sequencing has significantly improved our ability to understand the complexity of transcriptomes. Alternative splicing, one of the most important driving forces of transcriptome diversity, can now be studied at unprecedented resolution. Efficient and powerful computational and statistical methods are urgently needed to facilitate the characterization and quantification of alternative splicing events. Here we discuss methods for splice junction read mapping and methods for exon-centric or isoform-centric quantification of alternative splicing. In addition, we discuss HITS-CLIP and splicing QTL analyses, which are novel high-throughput sequencing-based approaches to dissecting splicing regulation.

Acknowledgement

L. Chen is supported in part by the NIH Grant R01GM097230.

Author information

Corresponding author

Correspondence to Liang Chen.

About this article

Cite this article

Chen, L. Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing. Stat Biosci 5, 138–155 (2013). https://doi.org/10.1007/s12561-012-9064-7

  • DOI: https://doi.org/10.1007/s12561-012-9064-7