Journal of Molecular Biology
Regular article
Dynalign: an algorithm for finding the secondary structure common to two RNA sequences1
Introduction
The rapidly expanding databases of genome sequences provide a foundation for rapidly generating new databases of RNA secondary structures. These secondary structures are important for understanding structure-function relationships and choosing drug targets. Comparative sequence analysis is the gold standard for determination of RNA secondary structure in the absence of a structure solved by X-ray crystallography.1 Structures of large RNAs solved by X-ray crystallography have largely verified the base-pairs predicted by comparative sequence analysis.2, 3, 4, 5, 6, 7, 8 While only a small number of RNA structures have been determined by crystallography, many classes of RNAs have secondary structures determined by comparative sequence analysis. These include the small subunit rRNA,9 large subunit rRNA,10 5 S rRNA,11 group I intron,12 group II intron,13 RNAase P RNA,14 SRP RNA,15 tRNA,16 telomerase RNA,17, 18 and tmRNA.19
Comparative sequence analysis requires an alignment of a large number of sequences with identical function. When only one sequence is available, the secondary structure can be predicted on the basis of free energy minimization with an accuracy of roughly 73 % on average for sequences of less than 700 nucleotides.20, 21 Several algorithms are available for free energy minimization of RNA secondary structure.20, 21, 22, 23, 24, 25
Algorithms have also been developed to combine free energy minimization with comparative sequence analysis.26, 27, 28, 29, 30, 31, 32 The advantages of these programs are the improved accuracy of secondary structure prediction and automation of the laborious process of comparative sequence analysis.
Many of the algorithms that employ free energy minimization as a tool for comparative sequence analysis require a fixed sequence alignment as input.26, 27, 28, 31, 32 Alignments determined by sequence matching, however, are complicated by compensating base changes and the fact that most RNAs are composed of only four different nucleotides. The fixed alignment can be flawed and so may restrict the algorithms’ ability to find a conserved structure.
Algorithms that use free energy minimization to find a conserved structure without assuming a fixed alignment are more robust,30, 33 although they are generally more time consuming. Notredame et al.29 wrote a program that uses a genetic algorithm to find the structure of a sequence given a second, related sequence with known structure. Chen et al.30 developed a genetic algorithm that finds a conserved structure for a set of sequences without requiring a known structure.
Eddy and Durbin34 developed an approach to automate comparative sequence analysis that is not based on free energy minimization. They developed a covariance model that takes a set of unaligned RNA sequences and determines a sequence alignment and consensus structure with multiple rounds of refinement.34
Sankoff35 proposed that a dynamic programming algorithm could simultaneously solve the sequence alignment and folding problems for multiple sequences. Gorodkin et al.33 wrote the first practical algorithm of this type, FOLDALIGN, by utilizing three simplifications to speed the calculation. Firstly, the dynamic programming calculation is limited to predicting the structures for two sequences at a time. Secondly, the algorithm optimizes the number of base-pairs in the structures, rather than the free energies. Thirdly, multibranch loops are not allowed.
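The second simplification, scoring a structure by its number of base-pairs rather than by free energy, can be illustrated with the classic single-sequence base-pair maximization recursion. This is only a minimal sketch and is not FOLDALIGN itself: it folds one sequence instead of aligning two, its recursion permits branched structures, and the function name `max_base_pairs`, the `min_loop` parameter, and the inclusion of G-U pairs in the canonical set are assumptions made for illustration.

```python
# Canonical pairs, assumed here to include the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested canonical pairs in seq.

    min_loop is the smallest number of unpaired nucleotides allowed
    between two paired positions (a hypothetical default of 3).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the fragment seq[i..j], 0-based.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            # case 2: position j pairs with some t, leaving a loop >= min_loop
            for t in range(i, j - min_loop):
                if (seq[t], seq[j]) in CANONICAL:
                    left = best[i][t - 1] if t > i else 0
                    score = max(score, left + 1 + best[t + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

Replacing the count of pairs with nearest-neighbor free energies changes the scoring function but keeps the same fragment-by-fragment table structure.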
Here, a dynamic programming algorithm, called Dynalign, is presented that aligns two sequences and finds a common structure, including multibranch loops. Dynalign is based on the dynamic programming solution proposed by Sankoff35 and uses nearest-neighbor rules for predicting the free energies of secondary structures.20, 36, 37 When tested with tRNA, 5 S rRNA, and R2 3′ UTR RNAs, Dynalign improves the accuracy of secondary structure prediction relative to prediction for a single sequence by free energy minimization.
Section snippets
Algorithm
Dynalign is a dynamic programming algorithm that takes two sequences as input and outputs a sequence alignment and a common structure for the two sequences. The sequence alignment indicates the nucleotides aligned in paired regions, but does not exactly align the nucleotides in unpaired regions. For the common structure, base-pairs are allowed only if both sequences can accommodate a canonical pair at the same position in the alignment. Dynalign minimizes the total free energy of the
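The pairing constraint described in this snippet can be sketched as a small check. This is an illustrative sketch under stated assumptions, not Dynalign's code: the function names are hypothetical, indices are 0-based, and the canonical set is assumed to contain A-U, G-C, and G-U pairs.

```python
# Assumed canonical pairs: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(seq, i, j):
    """True if nucleotides i and j of seq (0-based) form a canonical pair."""
    return (seq[i].upper(), seq[j].upper()) in CANONICAL


def common_pair_allowed(seq1, i, j, seq2, k, l):
    """True if i-j in seq1 and k-l in seq2 can both pair.

    Here i is taken to be aligned to k and j to l, so the pair sits at
    the same position of the alignment in both sequences.
    """
    return can_pair(seq1, i, j) and can_pair(seq2, k, l)
```

For example, `common_pair_allowed("GAAAC", 0, 4, "AAAAU", 0, 4)` accepts a G-C pair in one sequence aligned with an A-U pair in the other, which is the kind of compensating base change that complicates alignment by sequence matching alone.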
Discussion
Determining RNA secondary structure is important for revealing structure-function relationships and designing oligonucleotides for antisense applications and gene chip arrays by identifying targetable regions and suggesting possible confounding structures.45, 46, 47, 48 The Dynalign algorithm takes advantage of both free energy minimization and comparative sequence analysis to predict RNA secondary structure. It can improve the accuracy of secondary structure prediction compared to standard
Dynalign algorithm
Dynalign is a four-dimensional dynamic programming algorithm and as such the calculation is divided into two steps. The fill step calculates three arrays of free energies, W(i,j,k,l), V(i,j,k,l), and W5(i,k). W(i,j,k,l) is the sum of the minimum free energies for nucleotide fragments i to j from the first sequence and k to l from the second sequence with i aligned to k and j aligned to l plus any gap penalties for interior nucleotides in the sequence alignment. V(i,j,k,l) is defined the same as
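To give a feel for the size of the four-dimensional arrays, the sketch below counts the index combinations (i,j,k,l) and sets up sparse tables named after W, V, and W5. It is a hedged illustration only: the helper names, the 1-based indexing, the use of dictionaries, and the infinite default energy are assumptions, and the snippet does not describe how Dynalign actually stores these arrays.

```python
import math
from collections import defaultdict


def count_4d_entries(n1, n2):
    """Count index tuples (i, j, k, l) with 1 <= i <= j <= n1 and 1 <= k <= l <= n2."""
    fragments_1 = n1 * (n1 + 1) // 2  # fragments i..j of the first sequence
    fragments_2 = n2 * (n2 + 1) // 2  # fragments k..l of the second sequence
    return fragments_1 * fragments_2


def make_tables():
    """Create sparse tables for W(i,j,k,l), V(i,j,k,l), and W5(i,k).

    Entries that have not been filled default to +infinity, meaning
    "no structure found yet" (an assumed convention).
    """
    W = defaultdict(lambda: math.inf)
    V = defaultdict(lambda: math.inf)
    W5 = defaultdict(lambda: math.inf)
    return W, V, W5


W, V, W5 = make_tables()
W[(1, 5, 1, 5)] = -1.2  # hypothetical free energy for fragments 1-5 and 1-5
```

Because the count grows with the square of each sequence length, doubling both lengths multiplies the number of entries by roughly sixteen, which is consistent with the observation that methods without a fixed alignment are generally more time consuming.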
Acknowledgements
This work was supported by NIH grant GM22939. D.H.M. is a trainee in the medical scientist training program, NIH grant 5T32 GM07356.
References (59)
- et al. Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell (2000)
- et al. Comprehensive comparison of structural characteristics in Eukaryotic cytoplasmic large subunit (23 S-like) ribosomal RNA. J. Mol. Biol. (1996)
- et al. Comparative and functional anatomy of group II catalytic introns - a review. Gene (1989)
- et al. Secondary structure of vertebrate telomerase RNA. Cell (2000)
- et al. A conserved secondary structure for telomerase RNA. Cell (1991)
- et al. Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J. Mol. Biol. (1999)
- et al. The computer simulation of RNA folding pathways using a genetic algorithm. J. Mol. Biol. (1995)
- et al. A Bayesian statistical algorithm for RNA secondary structure prediction. Comput. Chem. (1999)
- et al. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. (1999)
- et al. RNA Secondary structure prediction based on free energy and phylogenetic analysis. J. Mol. Biol. (1999)
- Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV and the mRNA of prion protein. J. Mol. Biol.
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol.
- Enzymatic approaches to probing RNA secondary and tertiary structure. Methods Enzymol.
- Suboptimal sequence alignment in molecular biology. Alignment with error analysis. J. Mol. Biol.
- Comparison of bio-sequences. Advan. Appl. Math.
- Probing RNA structure, function, and history by comparative analysis
- Crystal structure of a group I ribozyme domain: principles of RNA packing. Science
- The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science
- Structure of the 30S ribosomal subunit. Nature
- Three dimensional tertiary structure of yeast phenylalanine transfer RNA. Science
- Structure of yeast phenylalanine tRNA at 3 Å resolution. Nature
- Crystal structure of the ribosome at 5.5 Å resolution. Science
- Collection of small subunit (16 S- and 16 S-like) ribosomal RNA structures. Nucl. Acids Res.
- 5 S rRNA data bank. Nucl. Acids Res.
- A comparative database of group I intron structures. Nucl. Acids Res.
- The ribonuclease P database. Nucl. Acids Res.
- The signal recognition particle database (SRPDB). Nucl. Acids Res.
- Compilation of tRNA sequences and sequences of tRNA genes. Nucl. Acids Res.
- tmRDB (tmRNA database). Nucl. Acids Res.
1 Edited by I. Tinoco