Part of the book series: Methods in Molecular Biology (MIMB, volume 1097)

Abstract

Stochastic context free grammars are a formalism that plays a prominent role in RNA secondary structure analysis. This chapter provides the theoretical background on stochastic context free grammars. We recall the general definitions and study their basic properties, virtues, and shortcomings. We then introduce two ways in which they are used in RNA secondary structure analysis: secondary structure prediction and RNA family modeling. This prepares for the discussion of applications of stochastic context free grammars in the chapters on Rfam (6), Pfold (8), and Infernal (9).

Notes

  1. Our definition is a bit more relaxed than the one found in the formal language literature.

  2. Amusingly, the natural language processing community consistently refers to it as the CKY algorithm. The reason for such confusion is that there is no joint paper by these three authors. Reference [6] tells the story behind CYK. This classical textbook presents CYK because of its “intuitive simplicity,” but remains “doubtful, however, that it will find practical use.” Those were the days when a state-of-the-art computer had 65K bytes of memory.

  3. When other types of scoring schemes are associated with G, such an efficiency improving transformation may not be possible. This has been called the yield parsing paradox in dynamic programming [7].

  4. The present URL of this tool is http://www.brics.dk/grammar/.
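
As an illustration of the CYK algorithm mentioned in Note 2, the following is a minimal sketch of a CYK-style maximum-probability parse for a small stochastic context free grammar. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the toy grammar (S -> a S | a S b S | epsilon), the rule and emission probabilities, and the function name `cyk_fold`. The recurrence is adapted directly to this grammar rather than being textbook CYK over Chomsky normal form, and input is assumed to be an uppercase RNA string over A, C, G, U.

```python
import math

# Hypothetical toy grammar (an assumption for illustration, not the chapter's):
#   S -> a S       base a is unpaired
#   S -> a S b S   base a pairs with base b
#   S -> epsilon
# Made-up rule probabilities; they sum to 1 for the single nonterminal S.
RULE_LOGP = {"unpaired": math.log(0.3), "pair": math.log(0.6), "end": math.log(0.1)}
BASE_LOGP = math.log(0.25)        # uniform emission for an unpaired base
PAIR_LOGP = math.log(1.0 / 6.0)   # uniform over the six allowed base pairs
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def cyk_fold(seq):
    """Return (dot-bracket structure, log probability) of the best parse of seq."""
    n = len(seq)
    # best[i][j]: log probability of the best derivation of seq[i:j] from S
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    # back[i][j]: pairing partner k of position i, or None if i is unpaired
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = RULE_LOGP["end"]          # S -> epsilon on the empty span
    for length in range(1, n + 1):             # fill spans by increasing length
        for i in range(n - length + 1):
            j = i + length
            # S -> a S: leave position i unpaired
            best[i][j] = RULE_LOGP["unpaired"] + BASE_LOGP + best[i + 1][j]
            # S -> a S b S: pair position i with some position k inside the span
            for k in range(i + 1, j):
                if (seq[i], seq[k]) not in CANONICAL:
                    continue
                score = (RULE_LOGP["pair"] + PAIR_LOGP
                         + best[i + 1][k] + best[k + 1][j])
                if score > best[i][j]:
                    best[i][j] = score
                    back[i][j] = k
    # Trace back the chosen rules to recover the structure
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        k = back[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return "".join(structure), best[0][n]
```

The table has O(n^2) entries and each entry scans O(n) split points, so the sketch runs in O(n^3) time. Log probabilities are used to avoid numerical underflow on longer sequences. The toy grammar imposes no minimum hairpin loop length, so it will happily pair adjacent bases; a realistic model would need additional rules or constraints.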

References

  1. Eddy SR, Durbin R (1994) RNA sequence analysis using covariance models. Nucleic Acids Res 22(11):2079–2088

  2. Sakakibara Y, Brown M, Hughey R, Mian IS, Sjölander K, Underwood RC, Haussler D (1994) Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res 22(23):5112–5120. URL http://www.hubmed.org/display.cgi?uids=7800507

  3. Booth TL, Thompson RA (1973) Applying probability measures to abstract languages. IEEE Trans Comput 22(5):442–450

  4. Baker JK (1979) Trainable grammars for speech recognition. J Acoust Soc Am 54–550

  5. Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading, MA

  6. Aho AV, Ullman JD (1973) The theory of parsing, translation and compiling. Prentice-Hall, Englewood Cliffs, NJ. I and II.

  7. Giegerich R, Meyer C, Steffen P (2004) A discipline of dynamic programming over sequence data. Sci Comput Program 51(3):215–263

  8. Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis, 2006 edn. Cambridge University Press, Cambridge

  9. Giegerich R (2000) Explaining and controlling ambiguity in dynamic programming. In: Proceedings of combinatorial pattern matching, vol 1848 of Lecture notes in computer science, pp 46–59. Springer, New York

  10. Dowell RD, Eddy SR (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71–71. doi: 10.1186/1471-2105-5-71. URL http://www.hubmed.org/display.cgi?uids=15180907

  11. Brejová B, Brown DG, Vinař T (2007) The most probable annotation problem in HMMs and its application to bioinformatics. J Comput Syst Sci 73(7):1060–1077

  12. Reeder J, Steffen P, Giegerich R (2005) Effective ambiguity checking in biosequence analysis. BMC Bioinformatics 6(153). URL http://www.biomedcentral.com/1471-2105/6/153

  13. Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428. URL http://www.hubmed.org/display.cgi?uids=12824339

  14. Nawrocki EP, Kolbe DL, Eddy SR (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25(10):1335–1337. doi: 10.1093/bioinformatics/btp157. URL http://www.hubmed.org/display.cgi?uids=19307242

  15. Nebel M, Scheid A (2011) Analysis of the free energy in a stochastic RNA secondary structure model. IEEE/ACM Trans Comput Biol Bioinformatics 8(6):1468–1482

  16. Giegerich R, Höner zu Siederdissen C (2011) Semantics and ambiguity of stochastic RNA family models. IEEE/ACM Trans Comput Biol Bioinformatics 8(2):499–516. ISSN 1545-5963. doi: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.12

  17. Brabrand C, Giegerich R, Møller A (2010) Analyzing ambiguity of context-free grammars. Sci Comput Program 75(3):176–191. Earlier version in Proc. 12th International Conference on Implementation and Application of Automata, CIAA ’07, Springer LNCS vol. 4783

  18. Rivas E, Lang R, Eddy S (2012) A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more. RNA 18:193–212

  19. Sauthoff G, Janssen S, Giegerich R (2011) Bellman’s GAP - a declarative language for dynamic programming. In: Schneider-Kamp (ed) Principles and practice of declarative programming. ACM Press, New York, NY, pp 29–40

Acknowledgements

Thanks go to Jan Reinkensmeier for a careful reading of this manuscript.

Copyright information

© 2014 Springer Science+Business Media New York

Cite this protocol

Giegerich, R. (2014). Introduction to Stochastic Context Free Grammars. In: Gorodkin, J., Ruzzo, W. (eds) RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Methods in Molecular Biology, vol 1097. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-709-9_5

  • DOI: https://doi.org/10.1007/978-1-62703-709-9_5

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-708-2

  • Online ISBN: 978-1-62703-709-9

  • eBook Packages: Springer Protocols
