Introduction to Stochastic Context Free Grammars

Giegerich, Robert

doi:10.1007/978-1-62703-709-9_5

Part of the book series: Methods in Molecular Biology (MIMB, volume 1097)

Abstract

Stochastic context free grammars are a formalism that plays a prominent role in RNA secondary structure analysis. This chapter provides the theoretical background on stochastic context free grammars. We recall the general definitions and study the basic properties, virtues, and shortcomings of stochastic context free grammars. We then introduce two ways in which they are used in RNA secondary structure analysis: secondary structure prediction and RNA family modeling. This prepares for the discussion of applications of stochastic context free grammars in the chapters on Rfam (6), Pfold (8), and Infernal (9).
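
To make the structure prediction use case more concrete, the following is a minimal sketch and is not taken from the chapter. It assumes a toy three-rule grammar with made-up rule and emission probabilities, and uses a CYK-style recursion to find the most probable derivation of a short RNA sequence, which it reports in dot-bracket notation. All names (`RULE_P`, `emit_single`, `emit_pair`, `most_probable_structure`) are hypothetical, and the recursive formulation is only suitable for short inputs.

```python
import math
from functools import lru_cache

# Toy grammar (an assumption for illustration, not the chapter's grammar):
#   S -> x S      emit one unpaired base x
#   S -> x S y S  emit a base pair (x, y) around a substructure
#   S -> epsilon  end
# Rule probabilities are made up; they sum to 1.
RULE_P = {"unpaired": 0.4, "pair": 0.3, "end": 0.3}

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def emit_single(x):
    """Made-up uniform emission probability for an unpaired base."""
    return 0.25


def emit_pair(x, y):
    """Made-up pair emission: 6 * 0.15 + 10 * 0.01 = 1 over all 16 pairs."""
    return 0.15 if x + y in CANONICAL else 0.01


def most_probable_structure(seq):
    """Return (log probability, dot-bracket string) of the best derivation."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best derivation of seq[i:j] from S.
        if i == j:
            return math.log(RULE_P["end"]), ""
        # Case S -> x S: seq[i] is unpaired.
        rest_score, rest_struct = best(i + 1, j)
        top = (math.log(RULE_P["unpaired"] * emit_single(seq[i])) + rest_score,
               "." + rest_struct)
        # Case S -> x S y S: seq[i] pairs with seq[k].
        for k in range(i + 1, j):
            in_score, in_struct = best(i + 1, k)
            out_score, out_struct = best(k + 1, j)
            score = (math.log(RULE_P["pair"] * emit_pair(seq[i], seq[k]))
                     + in_score + out_score)
            if score > top[0]:
                top = (score, "(" + in_struct + ")" + out_struct)
        return top

    return best(0, len(seq))


log_p, structure = most_probable_structure("GAAAC")
print(structure)  # (...)
```

With these assumed numbers, a canonical pair contributes a factor of 0.3 × 0.15 × 0.3 = 0.0135 (pair rule, pair emission, and the extra ε that closes the additional S), while two unpaired bases contribute 0.1 × 0.1 = 0.01, so the sketch prefers canonical pairs wherever the grammar allows them. Trained parameters in a real tool would behave differently.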

Notes

1. Our definition is a bit more relaxed than the one found in the formal language literature.
2. Amusingly, the natural language processing community consistently refers to it as the CKY algorithm. The reason for such confusion is that there is no joint paper by these three authors. Reference [6] tells the story behind CYK. This classical textbook presents CYK because of its “intuitive simplicity,” but remains “doubtful, however, that it will find practical use.” Those were the days when a state-of-the-art computer had 65K bytes of memory.
3. When other types of scoring schemes are associated with G, such an efficiency improving transformation may not be possible. This has been called the yield parsing paradox in dynamic programming [7].
4. The present URL of this tool is http://www.brics.dk/grammar/.

References

1. Eddy SR, Durbin R (1994) RNA sequence analysis using covariance models. Nucleic Acids Res 22(11):2079–2088
2. Sakakibara Y, Brown M, Hughey R, Mian IS, Sjölander K, Underwood RC, Haussler D (1994) Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res 22(23):5112–5120. URL http://www.hubmed.org/display.cgi?uids=7800507
3. Booth TL, Thompson RA (1973) Applying probability measures to abstract languages. IEEE Trans Comput 22(5):442–450
4. Baker JK (1979) Trainable grammars for speech recognition. J Acoust Soc Am 54–550
5. Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading, MA
6. Aho AV, Ullman JD (1973) The theory of parsing, translation and compiling. Prentice-Hall, Englewood Cliffs, NJ. I and II.
7. Giegerich R, Meyer C, Steffen P (2004) A discipline of dynamic programming over sequence data. Sci Comput Program 51(3):215–263
8. Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis, 2006 edn. Cambridge University Press, Cambridge
9. Giegerich R (2000) Explaining and controlling ambiguity in dynamic programming. In: Proceedings of combinatorial pattern matching, vol 1848 of Lecture notes in computer science, pp 46–59. Springer, New York
10. Dowell RD, Eddy SR (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71–71. doi: 10.1186/1471-2105-5-71. URL http://www.hubmed.org/display.cgi?uids=15180907
11. Brejová B, Brown DG, Vinař T (2007) The most probable annotation problem in HMMs and its application to bioinformatics. J Comput Syst Sci 73(7):1060–1077
12. Reeder J, Steffen P, Giegerich R (2005) Effective ambiguity checking in biosequence analysis. BMC Bioinformatics 6(153). URL http://www.biomedcentral.com/1471-2105/6/153
13. Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428. URL http://www.hubmed.org/display.cgi?uids=12824339
14. Nawrocki EP, Kolbe DL, Eddy SR (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25(10):1335–1337. doi: 10.1093/bioinformatics/btp157. URL http://www.hubmed.org/display.cgi?uids=19307242
15. Nebel M, Scheid A (2011) Analysis of the free energy in a stochastic RNA secondary structure model. IEEE/ACM Trans Comput Biol Bioinformatics 8(6):1468–1482
16. Giegerich R, Höner zu Siederdissen C (2011) Semantics and ambiguity of stochastic RNA family models. IEEE/ACM Trans Comput Biol Bioinformatics 8(2):499–516. ISSN 1545-5963. doi: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.12
17. Brabrand C, Giegerich R, Møller A (2010) Analyzing ambiguity of context-free grammars. Sci Comput Program 75(3):176–191. Earlier version in Proc. 12th International Conference on Implementation and Application of Automata, CIAA ’07, Springer LNCS vol. 4783
18. Rivas E, Lang R, Eddy S (2012) A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more. RNA 18:193–212
19. Sauthoff G, Janssen S, Giegerich R (2011) Bellman’s GAP - a declarative language for dynamic programming. In: Schneider-Kamp (ed) Principles and practice of declarative programming. ACM Press, New York, NY, pp 29–40

Acknowledgements

Thanks go to Jan Reinkensmeier for a careful reading of this manuscript.

Author information

Authors and Affiliations

Faculty of Technology and Center of Biotechnology, Bielefeld University, Bielefeld, Germany
Robert Giegerich

Editor information

Editors and Affiliations

Center for non-coding RNA in Technology and Health, IKVH University of Copenhagen, Frederiksberg, Denmark
Jan Gorodkin
University of Washington Dept. Computer Science & Engineering, Seattle, Washington, USA
Walter L. Ruzzo

Cite this protocol

Giegerich, R. (2014). Introduction to Stochastic Context Free Grammars. In: Gorodkin, J., Ruzzo, W. (eds) RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Methods in Molecular Biology, vol 1097. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-709-9_5

DOI: https://doi.org/10.1007/978-1-62703-709-9_5
Published: 02 December 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-708-2
Online ISBN: 978-1-62703-709-9
eBook Packages: Springer Protocols
