Linear indices of the ‘macromolecular graph’s nucleotides adjacency matrix’ as a promising approach for bioinformatics studies. Part 1: Prediction of paromomycin’s affinity constant with HIV-1 Ψ-RNA packaging region

https://doi.org/10.1016/j.bmc.2005.03.010

Abstract

The design of novel anti-HIV compounds has become a crucial area of research for scientists around the world. This paper introduces a new set of macromolecular descriptors relevant to nucleic acid QSAR/QSPR studies: the nucleic acids' linear indices, which are calculated from the macromolecular graph's nucleotide adjacency matrix. As an example of this approach, the interaction of the antibiotic paromomycin with the packaging region of the HIV-1 Ψ-RNA was studied. A multiple linear regression model predicted the local binding affinity constants [Log K (10⁻⁴ M⁻¹)] between a specific nucleotide and this antibiotic. The linear model explains more than 87% of the variance of the experimental Log K (R = 0.93 and s = 0.102 × 10⁻⁴ M⁻¹), and leave-one-out (LOO) PRESS statistics confirmed its predictive ability (q² = 0.82 and scv = 0.108 × 10⁻⁴ M⁻¹). Comparison with other approaches (macromolecular quadratic indices, Markovian negentropies and 'stochastic' spectral moments) shows that our method performs well.
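To make the reported statistics concrete, the following is a minimal NumPy sketch of an ordinary least-squares MLR fit with leave-one-out cross-validation. It is not the authors' software: the function names are hypothetical, and the formulas q² = 1 − PRESS/SS_tot and scv = √(PRESS/n) are common conventions assumed here, since the abstract does not spell them out. R² is the fraction of variance explained, and s is the standard error of the estimate with n − p − 1 degrees of freedom.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def mlr_with_loo(X, y):
    """Fit an MLR model and return R, R^2, s, LOO q^2, s_cv and PRESS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # n samples x p descriptors
    n, p = X.shape
    coef = fit_mlr(X, y)
    resid = y - (coef[0] + X @ coef[1:])
    rss = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / ss_tot
    s = np.sqrt(rss / (n - p - 1))  # standard error of the estimate

    # Leave-one-out: refit without sample i, then predict sample i.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        c = fit_mlr(X[keep], y[keep])
        press += float(y[i] - (c[0] + X[i] @ c[1:])) ** 2
    q2 = 1.0 - press / ss_tot
    s_cv = np.sqrt(press / n)  # assumed convention: root-mean-square LOO error
    return {"coef": coef, "R": np.sqrt(r2), "R2": r2, "s": s,
            "q2": q2, "s_cv": s_cv, "PRESS": press}
```

With real data, X would hold the selected linear indices for each footprinted nucleotide and y the experimental Log K values; neither is reproduced here. Because each LOO residual is at least as large as the corresponding fitted residual, q² can never exceed R², which is why q² = 0.82 sitting just below R² ≈ 0.87 indicates a stable model.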

Introduction

The number of newly discovered genomes has increased dramatically in recent years, and this has once again highlighted the problem of determining protein and nucleic acid functions.1, 2 The complete sequencing of the genomes of various species will undoubtedly contribute to a better understanding of their evolution. Public databases such as GenBank are growing at an exponential rate.1 A significant proportion of these data corresponds to genomic sequences that contain not only many genes but also RNA structures.3, 4

The study of drug–biomolecule interactions is currently a central topic in modern bioinformatics and constitutes a significant step toward rational drug design. In this context, footprinting techniques have proven to be important experimental methods for uncovering significant processes in molecular biology, and in genomics in particular.5, 6, 7, 8, 9 The interactions between aminoglycosides and the packaging region of human immunodeficiency virus type 1 (HIV-1) appear to represent a promising route to antiviral discovery.10 Aminoglycoside drugs are cationic natural products that interact with RNA.11 Their bactericidal effects stem from their ability to block protein synthesis by binding to the A site of ribosomal RNA.12

Recently, our group introduced a novel scheme for rational in silico molecular design (or the selection/identification of chemicals) and for QSAR/QSPR studies: the so-called TOpological MOlecular COMputer Design (TOMOCOMD).13 This method generates molecular fingerprints based on discrete mathematics and linear algebra. Atom, atom-type and total quadratic and linear molecular fingerprints have been defined by analogy with quadratic and linear mathematical maps.14, 15 This approach has been successfully employed in QSPR and QSAR studies,14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 including studies of nucleic acid–drug interactions.25

The TOMOCOMD–CARDD (Computer-Aided 'Rational' Drug Design) strategy is very useful for selecting novel subsets of compounds with a desired property or activity,22, 23, 24 which can then be optimized further with any of the many molecular modeling methods available to medicinal chemists. The method has also proven flexible enough to address many different problems. For instance, the TOMOCOMD–CARDD approach has been applied to the fast-track experimental discovery of novel anthelmintic compounds.22, 24 The prediction of physical, physicochemical and chemical properties of organic compounds can also be addressed with this approach.14, 19, 21 The ability to codify chirality and other 3D structural features is another advantage of this method.20 The interpretation of these descriptors and their comparison with other molecular descriptors have also been described.15, 19 Additionally, promising results have been obtained in bioinformatics using the TOMOCOMD-CANAR (Computer-Aided Nucleic Acid Research) approach to model the interaction between drugs and the HIV packaging-region RNA.25 Finally, an alternative formulation of our approach was recently developed for the structural characterization of proteins.26 This extended methodology, TOMOCOMD-CAMPS (Computer-Aided Modelling in Protein Science), was applied to protein stability studies, specifically to how an alanine scan of the wild-type Arc repressor affects its stability, by combining protein quadratic indices (macromolecular fingerprints) with statistical (linear and nonlinear) models.26

The main aim of this paper is therefore to describe an extended TOMOCOMD-CANAR approach that accounts for RNA structure. In the present study, we propose total and local definitions of the nucleic acid linear indices of the 'macromolecular graph's nucleotides adjacency matrix'. In addition, the present work develops quantitative structure–property relationships to predict the affinity with which paromomycin binds to the HIV-1 Ψ-RNA packaging region, and compares the results with those of other cheminformatic methods reported previously.
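For orientation, the total and local indices can be written compactly as below. This is a sketch that assumes the definitions mirror the small-molecule linear indices of the TOMOCOMD approach, with nucleotides in place of atoms. The symbols are introduced here for illustration and may differ from the paper's notation: M is the n × n nucleotide adjacency matrix, a^k_ij is entry (i, j) of its k-th power M^k, and x = [x_1, …, x_n]^T is a vector of nucleotide property values.

```latex
\begin{aligned}
f_{kL}(x_i) &= \sum_{j=1}^{n} a^{k}_{ij}\, x_j
  && \text{(local index of nucleotide } i\text{)}\\
f_{k}(\mathbf{x}) &= \sum_{i=1}^{n}\sum_{j=1}^{n} a^{k}_{ij}\, x_j
  = \sum_{i=1}^{n} f_{kL}(x_i)
  = \mathbf{1}^{\mathsf{T}} M^{k}\mathbf{x}
  && \text{(total index)}
\end{aligned}
```

Because M^0 = I, the zeroth-order local index reduces to x_i itself. Each higher power spreads the property over nucleotides joined by walks of length k, since a^k_ij counts such walks between nucleotides i and j.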


Computational methods

A nucleic acid is a long, unbranched polynucleotide, that is, a polymer of nucleotides. Each nucleotide has three components: (1) a cyclic five-carbon sugar; (2) a purine or pyrimidine base attached to the 1′-carbon atom of the sugar by an N-glycosidic bond; and (3) a phosphate attached to the 5′-carbon of the sugar by a phosphoester linkage. The nucleotides in nucleic acids are covalently linked by a second phosphoester bond that joins the 5′-phosphate of one nucleotide and
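A minimal NumPy sketch of how a nucleotide adjacency matrix for a linear polynucleotide, and linear indices derived from it, might be computed. It rests on three assumptions not confirmed by this excerpt. First, two nucleotides are adjacent when joined by a phosphodiester bond, with extra edges optionally added, for example for base pairs. Second, the k-th local indices are the entries of M^k x. Third, the total index is their sum. The function names and the weight vector x are hypothetical; this is not the TOMOCOMD-CANAR software.

```python
import numpy as np


def nucleotide_adjacency(n, pairs=()):
    """Adjacency matrix of a hypothetical macromolecular graph with n nucleotides.

    Nucleotide i is linked to i + 1 (phosphodiester backbone); each (i, j) in
    `pairs` adds an extra edge, e.g. for a base pair (0-based indices).
    """
    m = np.zeros((n, n), dtype=int)
    for i in range(n - 1):
        m[i, i + 1] = m[i + 1, i] = 1
    for i, j in pairs:
        m[i, j] = m[j, i] = 1
    return m


def local_linear_indices(m, x, k):
    """k-th local (nucleotide-level) linear indices: the vector M^k x."""
    return np.linalg.matrix_power(m, k) @ np.asarray(x, dtype=float)


def total_linear_index(m, x, k):
    """k-th total linear index: the sum of all local indices, 1^T M^k x."""
    return float(local_linear_indices(m, x, k).sum())


m = nucleotide_adjacency(4)
x = [1, 2, 3, 4]  # hypothetical nucleotide weights
print(local_linear_indices(m, x, 1))  # [2. 4. 6. 3.]
print(total_linear_index(m, x, 2))    # 25.0
```

In this four-nucleotide chain, each first-order local index is simply the sum of the weights of a nucleotide's backbone neighbours. Higher orders reach nucleotides that lie farther along the chain.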

Results and discussion

The data set of footprinted and binding nucleotides was extracted from the literature.37 Figure 1 depicts the secondary structure of the HIV-1 Ψ-RNA packaging region as well as the binding sites of paromomycin. The local affinity constant values [Log K (10⁻⁴ M⁻¹)] were also obtained from the literature.37 In studies of drug–RNA interactions, it is very important to know the strength of each interaction. To demonstrate the applicability of this new approach, a quantitative linear model was

Conclusions

Although there have been many discoveries in the field of bioinformatics in recent years, novel macromolecular descriptors are still needed to explain different biomacromolecular properties by means of a QSAR approach. In this sense, the approach described here represents a novel and very promising method for bioinformatics research. It presents a new set of macromolecular descriptors that are calculated from the macromolecular graph's nucleotide adjacency matrix. We

Footprinting data

The data set of footprinted and binding nucleotides was extracted from the literature.37 Figure 1 depicts the secondary structure of the HIV-1 Ψ-RNA packaging region as well as the binding sites of paromomycin. A representation of the Ψ-RNA appears along with a summary of binding/enhancement information for paromomycin. The RNA consists of the 'main stem', positions 213–238 and 361–388; SL-1, which contains the dimer initiation site; SL-2, which contains the 5′ splice donor site; SL-3; and SL-4, the
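If base pairing is also encoded in the macromolecular graph, the pairs of a secondary structure such as the one in Figure 1 could be read from a dot-bracket string. This is an assumption: the excerpt does not state how hydrogen-bonded pairs are treated. The sketch below is hypothetical. The function name, the dot-bracket input and the `first_position` offset are all illustrative and not taken from the paper. The offset lets pairs in a contiguous segment be reported in the RNA's own numbering, for example a segment starting at position 213.

```python
def dot_bracket_pairs(structure, first_position=1):
    """Return sorted base pairs (i, j) from a dot-bracket string.

    Positions are numbered from `first_position`, so pairs can be reported in
    the RNA's own numbering instead of 0-based string indices.
    """
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)  # opening base waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {idx}")
            i = stack.pop()  # innermost open base pairs with this one
            pairs.append((i + first_position, idx + first_position))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at index {idx}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return sorted(pairs)


print(dot_bracket_pairs("((..))", first_position=213))  # [(213, 218), (214, 217)]
```

Subtracting the first position converts these pairs back to 0-based matrix indices. They could then be added as edges to the adjacency matrix alongside the backbone links.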

Acknowledgements

The authors would like to offer their sincere thanks to the referees for their critical opinions about the manuscript, which have contributed significantly to improving its presentation and quality.

References and notes (41)

  • Z. Yuan. FEBS Lett. (1999)
  • M. Brenowitz et al. Methods Enzymol. (1986)
  • J.M. Sullivan et al. Bioorg. Med. Chem. Lett. (2002)
  • S.R. Lynch et al. Methods Enzymol. (2000)
  • Y. Marrero-Ponce. Bioorg. Med. Chem. (2004)
  • Y. Marrero-Ponce et al. Bioorg. Med. Chem. (2004)
  • Y. Marrero-Ponce et al. Bioorg. Med. Chem. (2005)
  • Y. Marrero-Ponce et al. Bioorg. Med. Chem. (2005)
  • A. Golbraikh et al. J. Mol. Graphics Modell. (2002)
  • D.A. Benson et al. Nucleic Acids Res. (2000)
  • S. Saxonov et al. Nucleic Acids Res. (2000)
  • N.J. Schisler et al. Nucleic Acids Res. (2000)
  • T.D. Tullius. Annu. Rev. Biophys. Biophys. Chem. (1989)
  • A. Henn et al. Nucleic Acids Res. (2001)
  • D.J. Galas et al. Nucleic Acids Res. (1978)
  • O.N. Ozoline et al. Nucleic Acids Res. (2001)
  • E.F. Gale et al. The Molecular Basis of Antibiotic Action (1981)
  • Y. Marrero-Ponce, V. Romero. TOMOCOMD software. Central University of Las Villas (2002). TOMOCOMD (TOpological...
  • Y. Marrero-Ponce. Molecules (2003)
  • Y. Marrero-Ponce. J. Chem. Inf. Comput. Sci. (2004)