A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs

Rivas, Elena; Clements, Jody; Eddy, Sean R

doi:10.1038/nmeth.4066

Brief Communication
Published: 07 November 2016

Nature Methods volume 14, pages 45–48 (2017)

Abstract

Many functional RNAs have an evolutionarily conserved secondary structure. Conservation of RNA base pairing induces pairwise covariations in sequence alignments. We developed a computational method, R-scape (RNA Structural Covariation Above Phylogenetic Expectation), that quantitatively tests whether covariation analysis supports the presence of a conserved RNA secondary structure. R-scape analysis finds no statistically significant support for proposed secondary structures of the long noncoding RNAs HOTAIR, SRA, and Xist.

**Figure 1: Independent substitutions on a tree can create confounding covariations.**

**Figure 2: Covariation analysis of known or proposed RNA secondary structures.**

References

1. Holley, R.W. et al. Science 147, 1462–1465 (1965).
2. Noller, H.F. et al. Nucleic Acids Res. 9, 6167–6189 (1981).
3. Pace, N.R., Smith, D.K., Olsen, G.J. & James, B.D. Gene 82, 65–75 (1989).
4. Williams, K.P. & Bartel, D.P. RNA 2, 1306–1310 (1996).
5. Michel, F., Costa, M., Massire, C. & Westhof, E. Methods Enzymol. 317, 491–510 (2000).
6. Gutell, R.R., Power, A., Hertz, G.Z., Putz, E.J. & Stormo, G.D. Nucleic Acids Res. 20, 5785–5795 (1992).
7. Davidovich, C. & Cech, T.R. RNA 21, 2007–2022 (2015).
8. Ji, Z., Song, R., Regev, A. & Struhl, K. eLife 4, e08890 (2015).
9. Akmaev, V.R., Kelley, S.T. & Stormo, G.D. Bioinformatics 16, 501–512 (2000).
10. Lindgreen, S., Gardner, P.P. & Krogh, A. Bioinformatics 22, 2988–2995 (2006).
11. Yeang, C.-H., Darot, J.F.J., Noller, H.F. & Haussler, D. Mol. Biol. Evol. 24, 2119–2131 (2007).
12. Dutheil, J.Y. Brief. Bioinform. 13, 228–243 (2012).
13. Somarowthu, S. et al. Mol. Cell 58, 353–361 (2015).
14. Weinberg, Z. & Breaker, R.R. BMC Bioinformatics 12, 3 (2011).
15. Nawrocki, E.P. et al. Nucleic Acids Res. 43, D130–D137 (2015).
16. Woolf, B. Ann. Hum. Genet. 21, 397–409 (1957).
17. Dunn, S.D., Wahl, L.M. & Gloor, G.B. Bioinformatics 24, 333–340 (2008).
18. Szymanski, M., Barciszewska, M.Z., Erdmann, V.A. & Barciszewski, J. Nucleic Acids Res. 30, 176–178 (2002).
19. Fu, Y., Deiorio-Haggar, K., Anthony, J. & Meyer, M.M. Nucleic Acids Res. 41, 3491–3503 (2013).
20. del Val, C., Rivas, E., Torres-Quesada, O., Toro, N. & Jiménez-Zurdo, J.I. Mol. Microbiol. 66, 1080–1091 (2007).
21. Novikova, I.V., Hennelly, S.P. & Sanbonmatsu, K.Y. Nucleic Acids Res. 40, 5034–5051 (2012).
22. Maenner, S. et al. PLoS Biol. 8, e1000276 (2010).
23. Fang, R., Moss, W.N., Rutenberg-Schoenberg, M. & Simon, M.D. PLoS Genet. 11, e1005668 (2015).
24. Rinn, J.L. & Chang, H.Y. Annu. Rev. Biochem. 81, 145–166 (2012).
25. Rivas, E., Lang, R. & Eddy, S.R. RNA 18, 193–212 (2012).
26. Price, M.N., Dehal, P.S. & Arkin, A.P. PLoS One 5, e9490 (2010).
27. Shannon, C.E. Bell Syst. Tech. J. 27, 379–423 (1948).
28. Gutell, R.R., Larsen, N. & Woese, C.R. Microbiol. Rev. 58, 10–26 (1994).
29. Martin, L.C., Gloor, G.B., Dunn, S.D. & Wahl, L.M. Bioinformatics 21, 4116–4124 (2005).
30. Fodor, A.A. & Aldrich, R.W. Proteins 56, 211–221 (2004).
31. Hofacker, I.L., Fekete, M. & Stadler, P.F. J. Mol. Biol. 319, 1059–1066 (2002).
32. Gerstein, M., Sonnhammer, E.L.L. & Chothia, C. J. Mol. Biol. 236, 1067–1078 (1994).
33. Gorodkin, J., Staerfeldt, H.H., Lund, O. & Brunak, S. Bioinformatics 15, 769–770 (1999).
34. Weigt, M., White, R.A., Szurmant, H., Hoch, J.A. & Hwa, T. Proc. Natl. Acad. Sci. USA 106, 67–72 (2009).
35. De Leonardis, E. et al. Nucleic Acids Res. 43, 10444–10455 (2015).
36. Weinreb, C. et al. Cell 165, 963–975 (2016).
37. Fitch, W.M. Syst. Zool. 20, 406–416 (1971).
38. Goebel, B., Dawy, Z., Hagenauer, J. & Mueller, J.C. in IEEE International Conference on Communications Vol. 2, 1102–1106 (IEEE, 2005).
39. Rivas, E. & Eddy, S.R. BMC Bioinformatics 16, 406 (2015).
40. Guindon, S. et al. Syst. Biol. 59, 307–321 (2010).
41. Jung, S. et al. Nucleic Acids Res. 39, 7529–7547 (2011).
42. del Val, C. et al. RNA Biol. 9, 119–129 (2012).
43. Wheeler, T.J. et al. Nucleic Acids Res. 41, D70–D82 (2013).

Acknowledgements

We thank S.E.R. Egnor for suggesting the name R-scape and the Centro de Ciencias de Benasque Pedro Pascual in Spain, where part of this manuscript was drafted.

Author information

Authors and Affiliations

Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
Elena Rivas & Sean R Eddy
Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, USA
Jody Clements
Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, USA
Sean R Eddy
FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, USA
Sean R Eddy
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA
Sean R Eddy

Contributions

E.R. and S.R.E. designed the method and wrote the manuscript. E.R. wrote the code, and designed and carried out the experiments. J.C. wrote the R-scape web application.

Corresponding author

Correspondence to Elena Rivas.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Characterization of different covariation statistics on a positive testset of 104 RNAs.

(a) Plots of the F measure, the harmonic mean of sensitivity (SEN) and positive predictive value (PPV), F = 2 × SEN × PPV / (SEN + PPV), for four different covariation statistics as a function of the score's E-value, over all alignments, using R-scape with default parameters. (b) Effect of alignment gaps on the different covariation statistics, seen by including all alignment columns (right) as compared to the R-scape default (left). (c) Effect of measuring covariation using a binary classification (whether a pair is canonical Watson-Crick/G:U or not) versus using the full sixteen-way classification. (d) Covariation detection as a function of the number of sequences in the alignments. (e) The F measure for each of the 104 RNA Rfam alignments in the positive testset as a function of average percentage identity, at an E-value threshold of 0.05.
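
The F measure in panel (a) can be illustrated with a minimal Python sketch. It assumes that the annotated base pairs and the pairs called significant are each given as sets of (i, j) column-index tuples; the function name `f_measure` and the toy pairs are hypothetical and are not taken from the R-scape code or data.

```python
def f_measure(significant_pairs, annotated_pairs):
    """Return (SEN, PPV, F) for a set of pairs called significant.

    SEN = fraction of annotated base pairs that are called significant.
    PPV = fraction of significant pairs that are annotated base pairs.
    F   = 2 * SEN * PPV / (SEN + PPV), the harmonic mean of SEN and PPV.
    """
    significant = set(significant_pairs)
    annotated = set(annotated_pairs)
    true_positives = len(significant & annotated)

    sen = true_positives / len(annotated) if annotated else 0.0
    ppv = true_positives / len(significant) if significant else 0.0
    f = 2 * sen * ppv / (sen + ppv) if (sen + ppv) > 0 else 0.0
    return sen, ppv, f


# Toy example (made-up column indices): four annotated pairs,
# three pairs called significant, two of which are annotated.
annotated = {(1, 20), (2, 19), (3, 18), (4, 17)}
significant = {(1, 20), (2, 19), (7, 30)}
sen, ppv, f = f_measure(significant, annotated)
print(f"SEN={sen:.3f} PPV={ppv:.3f} F={f:.3f}")
```

Because F is a harmonic mean, it stays low unless both sensitivity and positive predictive value are high, which makes it a convenient single summary when sweeping the E-value threshold.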

Supplementary Figure 2 Comparison of R-scape to related methods CoMap and MICA [12] on the testset of 104 RNAs.

(a) Sensitivity (percentage of base pairs that are significant) and positive predictive value (percentage of significant pairs that are base pairs) as a function of the score's E-value. (b) Running times for the three methods (R-scape in black, CoMap cyan, MICA red) on a log-log plot as a function of the number of sequences in the alignment (left) and as a function of the alignment length (right). Running times are for a single 3 GHz Intel Core i7 with 8 GB 1600 MHz DDR3 RAM. Running times for R-scape and CoMap include the cost of generating a phylogenetic tree using FastTree [26].

Supplementary Figure 3 Examples of RNAs with significant covariation support for their proposed structures.

(a) R-scape analysis of a multiple sequence alignment of αr14, a putative regulatory small RNA in α-proteobacteria [20,42]. (b) R-scape analysis of a multiple sequence alignment of Arisong RNA, a noncoding RNA identified in the ciliate Oxytricha [41]. (c) Example of detecting an underannotated structure, an S15 mRNA leader in γ-proteobacteria that autoregulates ribosomal protein synthesis [19]. Three out of the seven significantly covarying pairs are not in the proposed structure. These covarying pairs support the existence of a conserved pseudoknot, which was already known but was not annotated in the provided alignment [19]. (d) Example of using R-scape to improve a structural annotation for the Rfam seed alignment for SAM-I riboswitch. The R-scape modified structure has seven significant pairs not included in the Rfam-annotated SAM-I structure. The R-scape structure is in agreement with the secondary structure derived from the SAM-I riboswitch crystal structure (RK Montange & RT Batey, Nature, 441, 1172–1175, 2006). Notation is as in Figure 2.

Supplementary Figure 4 Covariation analysis of HOTAIR putative helices H7 and H10.

The structural alignments have been extracted from the HOTAIR Domain1 alignment (with 37 sequences) provided in [13]. The H7 and H10 alignments have 28 and 27 sequences respectively, after removing species for which the region does not include any residues. For any two base paired positions, changes are annotated in color relative to the most frequent Watson-Crick or G:U pair. Green arrows indicate the base pairs (one for H7 and three for H10) proposed as covarying in [13]. For putative helix H7, the proposed covarying pair (columns 8:36 marked in green) has covariation score -0.16 (E-value 7.74). Gray arrows indicate the best scoring putative Watson-Crick pair (columns 10:30, with a consensus C:G) which was not part of the proposed structure. This best scoring alternative pair would have one U:A compensatory change and one U:G half-compensatory change, and covariation score 3.66 (E-value 5.52). For both alignments, we also provide the R-scape analysis for all pairs. For putative helix H10, the one covariation above the null hypothesis corresponds to a G:G/U:C non-Watson-Crick covariation in a pair of adjacent columns that are not in the proposed structure and are too close to be a base pair.
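
The distinction between compensatory and half-compensatory changes used in this caption can be sketched in Python. The sketch assumes, based only on the C:G example above, that a change is compensatory when both positions differ from the most frequent pair and the result is still Watson-Crick or G:U, and half-compensatory when only one position differs and the result still pairs. The function `classify_change` and its labels are hypothetical, not part of R-scape.

```python
# Watson-Crick and G:U wobble pairs, written as two-letter strings.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_change(consensus_pair, observed_pair):
    """Classify an observed pair relative to the most frequent pair.

    Both arguments are two-character strings such as "CG".
    """
    if observed_pair == consensus_pair:
        return "unchanged"
    if observed_pair not in CANONICAL:
        # Includes gaps and mismatches such as "CA".
        return "inconsistent"
    # Count how many of the two positions differ from the consensus.
    n_changed = sum(a != b for a, b in zip(consensus_pair, observed_pair))
    return "compensatory" if n_changed == 2 else "half-compensatory"


# The example from the caption: consensus C:G at columns 10:30.
for observed in ("CG", "UA", "UG", "CA"):
    print(observed, "->", classify_change("CG", observed))
```

Counting such changes only describes an alignment; as the E-values in the caption show, a pair with one compensatory and one half-compensatory change can still be far from statistically significant.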

Supplementary Figure 5 Covariation analysis of putative helices H3 and H4 of ncSRA.

Color annotation as in Supplementary Figure 4. Green arrows indicate the seven base pairs identified in [21] as significantly covarying. We also provide the R-scape analysis for all pairs in this partial ncSRA alignment.

Supplementary Figure 6 Covariation analysis of putative helices H19, H20, and H21 of ncSRA.

Color annotation as in Supplementary Figure 4. Green arrows indicate eight base pairs identified in [21] as significantly covarying. We also provide the R-scape analysis for all pairs in this partial ncSRA alignment.

Supplementary Figure 7 Apparent covariations in 13 aligned Xist RepA region sequences [23].

(a) An alignment column pair was counted as covarying in [23] if it is entirely consistent with Watson-Crick or G:U base pairing, and at least one substitution and no more than two gaps are observed in each column. The dot plot shows 541 column pairs that satisfy these criteria in the RepA alignment used in [23], including (in blue) three of the four cited as support for the secondary structure in [30] (the other has an A:A noncanonical pair, and thus does not strictly satisfy the rule), 454 pairs that consist of a U+C column and a G+A column (red), and 84 other pairs (black). (b) Example of how single substitutions in conserved U+C and G+A columns can create apparent covariation.
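
The counting rule described in panel (a), and the effect illustrated in panel (b), can be sketched in Python. This is an illustration under stated assumptions, not the code used in [23] or in R-scape: "at least one substitution" is interpreted as the column containing more than one distinct residue, rows with a gap in either column are skipped when checking pairing, and the function name `counts_as_covarying` and the toy columns are hypothetical.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus G:U
GAPS = set("-.")


def counts_as_covarying(col_i, col_j, max_gaps=2):
    """Apply the rule: every pair is canonical, and each column has at
    least one substitution and no more than `max_gaps` gaps."""
    for col in (col_i, col_j):
        if sum(c in GAPS for c in col) > max_gaps:
            return False
        # Assumption: "at least one substitution" means >1 distinct residue.
        if len({c for c in col if c not in GAPS}) < 2:
            return False
    for a, b in zip(col_i, col_j):
        if a in GAPS or b in GAPS:
            continue  # assumption: gapped rows are not checked for pairing
        if a + b not in CANONICAL:
            return False
    return True


# Toy columns for 13 sequences (made up, not from the RepA alignment).
# A conserved U column with one C, and a conserved G column with one A,
# where the two substitutions occur in different sequences.
u_plus_c = "U" * 12 + "C"
g_plus_a = "A" + "G" * 12
print(counts_as_covarying(u_plus_c, g_plus_a))
```

In the toy example every row is U:A, U:G, or C:G, so the rule is satisfied even though no sequence carries a change at both positions. Two independent single substitutions are enough, which is why pairs of U+C and G+A columns can dominate the count.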

Supplementary Figure 8 Properties of the structural alignments used in this study.

The alignments we analyzed are derived from the original alignments by removing columns in which fewer than 50% of positions are occupied. Information for the original alignments is given in parentheses if different from the analyzed alignment. Alignments are available as Stockholm files in the online Supplementary Information.
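
The column filter described here can be sketched in Python. The sketch assumes the alignment is already loaded as a list of equal-length strings with `-` or `.` as gap characters (it does not parse Stockholm files); the function name `filter_columns` and the toy alignment are hypothetical, and this is not the R-scape implementation.

```python
def filter_columns(alignment, min_occupancy=0.5, gap_chars="-."):
    """Drop columns in which fewer than `min_occupancy` of the sequences
    have a residue. Returns the filtered alignment and the kept indices."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    kept = []
    for j in range(n_cols):
        occupied = sum(seq[j] not in gap_chars for seq in alignment)
        if occupied / n_seqs >= min_occupancy:
            kept.append(j)
    filtered = ["".join(seq[j] for j in kept) for seq in alignment]
    return filtered, kept


# Toy alignment of four sequences and five columns (made up).
toy = ["GA-CU",
       "G--CU",
       "GAACU",
       "G--C-"]
filtered, kept = filter_columns(toy)
print(kept)
print(filtered)
```

Returning the kept column indices alongside the filtered sequences makes it possible to map any reported pair back to coordinates in the original alignment.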

Source data

Source data to Fig. 1

Source data to Fig. 2

About this article

Cite this article

Rivas, E., Clements, J. & Eddy, S. A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs. Nat Methods 14, 45–48 (2017). https://doi.org/10.1038/nmeth.4066

Received: 01 April 2016
Accepted: 14 September 2016
Published: 07 November 2016
Issue Date: January 2017
DOI: https://doi.org/10.1038/nmeth.4066
