Abstract
Stochastic context free grammars are a formalism that plays a prominent role in RNA secondary structure analysis. This chapter provides the theoretical background on stochastic context free grammars. We recall the general definitions and study their basic properties, virtues, and shortcomings. We then introduce two ways in which they are used in RNA secondary structure analysis: secondary structure prediction and RNA family modeling. This prepares for the discussion of applications of stochastic context free grammars in the chapters on Rfam (6), Pfold (8), and Infernal (9).
Notes
1. Our definition is a bit more relaxed than the one found in the formal language literature.
2. Amusingly, the natural language processing community consistently refers to it as the CKY algorithm. The reason for such confusion is that there is no joint paper by these three authors. Reference [6] tells the story behind CYK. This classical textbook presents CYK because of its “intuitive simplicity,” but remains “doubtful, however, that it will find practical use.” Those were the days when a state-of-the-art computer had 65K bytes of memory.
3. When other types of scoring schemes are associated with G, such an efficiency improving transformation may not be possible. This has been called the yield parsing paradox in dynamic programming [7].
4. The present URL of this tool is http://www.brics.dk/grammar/.
References
Eddy SR, Durbin R (1994) RNA sequence analysis using covariance models. Nucleic Acids Res 22(11):2079–2088
Sakakibara Y, Brown M, Hughey R, Mian IS, Sjölander K, Underwood RC, Haussler D (1994) Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res 22(23):5112–5120. URL http://www.hubmed.org/display.cgi?uids=7800507
Booth TL, Thompson RA (1973) Applying probability measures to abstract languages. IEEE Trans Comput 22(5):442–450
Baker JK (1979) Trainable grammars for speech recognition. J Acoust Soc Am 54–550
Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading, MA
Aho AV, Ullman JD (1973) The theory of parsing, translation and compiling, vols I and II. Prentice-Hall, Englewood Cliffs, NJ
Giegerich R, Meyer C, Steffen P (2004) A discipline of dynamic programming over sequence data. Sci Comput Program 51(3):215–263
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis, 2006 edn. Cambridge University Press, Cambridge
Giegerich R (2000) Explaining and controlling ambiguity in dynamic programming. In: Proceedings of combinatorial pattern matching, vol 1848 of Lecture notes in computer science, pp 46–59. Springer, New York
Dowell RD, Eddy SR (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71. doi: 10.1186/1471-2105-5-71. URL http://www.hubmed.org/display.cgi?uids=15180907
Brejová B, Brown DG, Vinař T (2007) The most probable annotation problem in HMMs and its application to bioinformatics. J Comput Syst Sci 73(7):1060–1077
Reeder J, Steffen P, Giegerich R (2005) Effective ambiguity checking in biosequence analysis. BMC Bioinformatics 6(153). URL http://www.biomedcentral.com/1471-2105/6/153
Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428. URL http://www.hubmed.org/display.cgi?uids=12824339
Nawrocki EP, Kolbe DL, Eddy SR (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25(10):1335–1337. doi: 10.1093/bioinformatics/btp157. URL http://www.hubmed.org/display.cgi?uids=19307242
Nebel M, Scheid A (2011) Analysis of the free energy in a stochastic RNA secondary structure model. IEEE/ACM Trans Comput Biol Bioinformatics 8(6):1468–1482
Giegerich R, Höner zu Siederdissen C (2011) Semantics and ambiguity of stochastic RNA family models. IEEE/ACM Trans Comput Biol Bioinformatics 8(2):499–516. ISSN 1545-5963. doi: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.12
Brabrand C, Giegerich R, Møller A (2010) Analyzing ambiguity of context-free grammars. Sci Comput Program 75(3):176–191. Earlier version in Proc. 12th International Conference on Implementation and Application of Automata, CIAA ’07, Springer LNCS vol. 4783
Rivas E, Lang R, Eddy S (2012) A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more. RNA 18:193–212
Sauthoff G, Janssen S, Giegerich R (2011) Bellman’s GAP - a declarative language for dynamic programming. In: Schneider-Kamp (ed) Principles and practice of declarative programming. ACM Press, New York, NY, pp 29–40
Acknowledgements
Thanks go to Jan Reinkensmeier for a careful reading of this manuscript.
Copyright information
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Giegerich, R. (2014). Introduction to Stochastic Context Free Grammars. In: Gorodkin, J., Ruzzo, W. (eds) RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Methods in Molecular Biology, vol 1097. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-709-9_5
DOI: https://doi.org/10.1007/978-1-62703-709-9_5
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-708-2
Online ISBN: 978-1-62703-709-9
eBook Packages: Springer Protocols