Journal of Molecular Biology
Volume 327, Issue 5, 11 April 2003, Pages 1021-1030

Structural Location of Disease-associated Single-nucleotide Polymorphisms

https://doi.org/10.1016/S0022-2836(03)00240-7

Abstract

Non-synonymous single-nucleotide polymorphisms (nsSNPs) of genes introduce amino acid changes to proteins, and play an important role in providing genetic functional diversity. To understand the structural characteristics of disease-associated SNPs, we have mapped a set of nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database to the structural surfaces of encoded proteins. These nsSNPs are disease-associated or have distinctive phenotypes. As a control dataset, we mapped a set of nsSNPs derived from the SNP database dbSNP to the structural surfaces of their encoded proteins. Using the alpha shape method from computational geometry, we examine the geometric locations of the structural sites of these nsSNPs. We classify each nsSNP site into one of three categories of geometric locations: those in a pocket or a void (type P); those on a convex region or a shallow depressed region (type S); and those that are buried completely in the interior (type I). We find that the majority (88%) of disease-associated nsSNPs are located in voids or pockets, and they are infrequently observed in the interior of proteins (3.2% in the dataset). We find that nsSNPs mapped from dbSNP are less likely to be located in pockets or voids (68%). We further introduce a novel application of hidden Markov models (HMM) for analyzing sequence homology of SNPs on various geometric sites. For SNPs in surface pockets or voids, we find that there is no strong tendency for them to occur on conserved residues. For SNPs buried in the interior, we find that disease-associated mutations are more likely to occur on conserved residues. The approach of classifying nsSNPs with alpha shape and HMM developed in this study can be integrated with additional methods to improve the accuracy of predictions of whether a given nsSNP is likely to be disease-associated.

Introduction

Single-nucleotide polymorphisms (SNPs) are the most common form of human genetic variation. The coding regions of the human genome contain about 500,000 SNPs.1 Among these, the non-synonymous SNPs (nsSNPs) cause changes in the amino acid residues, and are likely to be an important factor contributing to the functional diversity of encoded proteins in the human population.2 There are well known examples where nsSNPs affect the functional roles of proteins in signal transduction of visual, hormonal and other stimulants,3,4 in gene regulation by altering DNA and transcription factor binding,5 and in maintaining the structural integrity of cells and tissues.6 In addition, by affecting drug-target proteins such as G-protein coupled receptors,7 enzymes,8 ion channels9 and proteins involved in the detoxification pathways,10 nsSNPs play important roles in the diverse responses in efficacy and toxicity of the human population to therapeutic agents.

nsSNPs can affect human physiology through many different mechanisms. nsSNPs may inactivate functional sites of enzymes11 or alter splice sites and thereby form defective gene products.12 They may destabilize proteins, or reduce protein solubility.13 To understand the mechanism of phenotypic variations due to nsSNPs, it is important to assess the structural consequences of altering an amino acid residue. A classical example is sickle-cell anemia, the first molecular disease discovered.14 First studied by Sir John Kendrew 50 years ago, sickle-cell anemia results from a single base change in which residue E is changed to V at position 6 of the beta chain of hemoglobin. This residue is located at the interface of the alpha and beta chains, and the E6V mutation markedly reduces the solubility of the deoxygenated form of hemoglobin. Knowledge of the structural role of this mutation is essential for understanding the disease mechanism of sickle-cell anemia.

With the advent of high-throughput SNP detection techniques, the number of known nsSNPs is growing rapidly, providing an important source of information for studying the relationship between genotypes and phenotypes of human diseases. An important recent study has shown that there is a strong correlation between disease-associated polymorphism and sites of low solvent-accessibility.15 In this study, we introduce new geometric classifications for characterizing disease-associated SNPs. Here, we attempt to map SNPs to protein surface pockets and voids that may be potential functional binding regions.

Section snippets

Many disease-associated nsSNPs are located in pockets or voids

Compared to control nsSNPs, disease-associated nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database are more likely to be located in well-formed surface pockets or voids. Of the disease-associated nsSNPs derived from OMIM, 88% are located in pockets or voids (95% confidence interval of 77–100%), while 68% of non-disease control SNPs are located in pockets or voids (95% confidence interval of 55–83%). An example of this type of nsSNP is insulin receptor
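
The fractions quoted in this section are estimates of a binomial proportion (the share of mapped nsSNPs that fall in a pocket or void), each reported with a 95% confidence interval. This excerpt does not state how those intervals were computed. As a rough illustration only, the sketch below uses the Wilson score interval, one common choice for proportions, so it should not be expected to reproduce the published intervals. The function name `wilson_interval` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    This is an assumed method, not necessarily the one used in the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts, chosen only to illustrate the call:
# 22 of 25 mapped nsSNPs observed in a pocket or void.
low, high = wilson_interval(22, 25)
```

Unlike the simple normal approximation, the Wilson interval never extends outside the range 0 to 1, which matters when the observed fraction is close to 100% and the sample is small.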

Discussion

In this study, we have described a new approach for SNP classification. For SNPs that can be mapped to protein structures, we classify them into three geometric sites: those in a pocket or a void, those on a convex region or a shallow depressed region, and those buried in the interior. Specifically, we find that the majority of disease-associated nsSNPs are located in voids or pockets on proteins, and only a small number of SNPs are buried completely in the interior (Figure 2). For disease SNPs

Geometric locations of mutation sites

Amino acid residues occupy different geometric locations within a protein. Some of them are located in the interior of a protein and have zero solvent-accessibility. Others may be on the outer boundary surface of the protein, or on the wall surface of an interior void. In this study, we formally classify each amino acid residue altered by an nsSNP as located at one of three geometric sites: (1) in the interior of proteins (type I); (2) on the wall of a surface pocket or an interior void (type P); and
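
The three site types (P, S and I, as named in the abstract) can be expressed as a simple decision rule. The following is a minimal sketch, not the authors' implementation. It assumes that an upstream geometric computation, such as the alpha shape method mentioned in the abstract, has already determined for each residue whether it lines a pocket or void and how much solvent-accessible area it has. The names `SiteType`, `ResidueGeometry` and `classify_site`, and both input fields, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SiteType(Enum):
    POCKET_OR_VOID = "P"  # on the wall of a surface pocket or an interior void
    SURFACE = "S"         # on a convex or shallow depressed region
    INTERIOR = "I"        # buried completely in the interior


@dataclass(frozen=True)
class ResidueGeometry:
    # Hypothetical per-residue summary from an upstream geometry tool.
    lines_pocket_or_void: bool  # True if the residue lines a pocket or void wall
    accessible_area: float      # solvent-accessible surface area (square angstroms)


def classify_site(res: ResidueGeometry) -> SiteType:
    """Assign one of the three geometric site types to a residue."""
    # Pocket/void membership is checked first: a residue lining an interior
    # void may have no area exposed to outside solvent, yet it is type P.
    if res.lines_pocket_or_void:
        return SiteType.POCKET_OR_VOID
    # Exposed, but not part of any pocket or void wall: type S.
    if res.accessible_area > 0.0:
        return SiteType.SURFACE
    # No exposed area and not lining a pocket or void: type I.
    return SiteType.INTERIOR
```

The order of the checks is a design choice made for this sketch: testing pocket or void membership before solvent accessibility keeps residues on interior void walls from being labelled as type I.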

Acknowledgements

This work was supported by funding from the National Science Foundation (CAREER DBI0133856, DBI0078270, and MCB998008).

References (49)

  • M. Facello

    Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions

    Comput. Aided Geom. Des.

    (1995)
  • A. Krogh et al.

    Hidden Markov models in computational biology. Applications to protein modeling

    J. Mol. Biol.

    (1994)
  • F.S. Collins et al.

    A DNA polymorphism discovery resource for research on human genetic variation

    Genome Res.

    (1998)
  • E.S. Lander

    The new genomics: global views of biology

    Science

    (1996)
  • T.P. Dryja et al.

    Mutations within the rhodopsin gene in patients with autosomal dominant retinitis pigmentosa

    N. Engl. J. Med.

    (1990)
  • E.P. Smith et al.

    Estrogen resistance caused by a mutation in the estrogen-receptor gene in a man

    N. Engl. J. Med.

    (1994)
  • I. Barroso et al.

    Dominant negative mutations in human PPARgamma associated with severe insulin resistance, diabetes mellitus and hypertension

    Nature

    (1999)
  • A. Bonnardeaux et al.

    Angiotensin II type 1 receptor gene polymorphisms in human essential hypertension

    Hypertension

    (1994)
  • K.P. Vatsis et al.

    Diverse point mutations in the human gene for polymorphic N-acetyltransferase

    Proc. Natl Acad. Sci. USA

    (1991)
  • Q. Wang et al.

    Positional cloning of a novel potassium channel gene: KVLQT1 mutations cause cardiac arrhythmias

    Nature Genet.

    (1996)
  • C. Hassett et al.

    Human microsomal epoxide hydrolase: genetic polymorphism and functional expression in vitro of amino acid variants

    Hum. Mol. Genet.

    (1994)
  • A. Yoshida et al.

    Molecular abnormality of an inactive aldehyde dehydrogenase variant commonly found in Orientals

    Proc. Natl Acad. Sci. USA

    (1984)
  • R.L. Proia et al.

    Synthesis of beta-hexosaminidase in cell-free translation and in intact fibroblasts: an insoluble precursor alpha chain in a rare form of Tay-Sachs disease

    Proc. Natl Acad. Sci. USA

    (1982)
  • L. Stryer

    Biochemistry

    (1995)