Structural Location of Disease-associated Single-nucleotide Polymorphisms

doi:10.1016/S0022-2836(03)00240-7

Journal of Molecular Biology

Volume 327, Issue 5, 11 April 2003, Pages 1021-1030

https://doi.org/10.1016/S0022-2836(03)00240-7

Abstract

Non-synonymous single-nucleotide polymorphisms (nsSNPs) introduce amino acid changes into proteins and play an important role in generating functional genetic diversity. To understand the structural characteristics of disease-associated SNPs, we have mapped a set of nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database to the structural surfaces of the encoded proteins. These nsSNPs are disease-associated or have distinctive phenotypes. As a control dataset, we mapped a set of nsSNPs derived from the SNP database dbSNP to the structural surfaces of those encoded proteins. Using the alpha shape method from computational geometry, we examine the geometric locations of the structural sites of these nsSNPs. We classify each nsSNP site into one of three categories of geometric location: those in a pocket or a void (type P); those on a convex region or a shallow depressed region (type S); and those that are buried completely in the interior (type I). We find that the majority (88%) of disease-associated nsSNPs are located in voids or pockets, and that they are infrequently observed in the interior of proteins (3.2% in the data set). nsSNPs mapped from dbSNP are less likely to be located in pockets or voids (68%). We further introduce a novel application of hidden Markov models (HMMs) for analyzing the sequence homology of SNPs at various geometric sites. For SNPs in surface pockets or voids, we find no strong tendency for them to occur at conserved residues. For SNPs buried in the interior, we find that disease-associated mutations are more likely to occur at conserved residues. The approach of classifying nsSNPs with alpha shapes and HMMs developed in this study can be integrated with additional methods to improve the accuracy of predictions of whether a given nsSNP is likely to be disease-associated.

Introduction

Single-nucleotide polymorphisms (SNPs) are the most common form of human genetic variation. The coding regions of the human genome contain about 500,000 SNPs.¹ Among these, the non-synonymous SNPs (nsSNPs) cause changes in the amino acid residues, and are likely to be an important factor contributing to the functional diversity of encoded proteins in the human population.² There are well known examples where nsSNPs affect the functional roles of proteins in signal transduction of visual, hormonal and other stimulants,³,⁴ in gene regulation by altering DNA and transcription factor binding,⁵ and in maintaining the structural integrity of cells and tissues.⁶ In addition, by affecting drug-target proteins such as G-protein coupled receptors,⁷ enzymes,⁸ ion channels⁹ and proteins involved in the detoxification pathways,¹⁰ nsSNPs play important roles in the diverse responses in efficacy and toxicity of the human population to therapeutic agents.

nsSNPs can affect human physiology through many different mechanisms. nsSNPs may inactivate functional sites of enzymes¹¹ or alter splice sites and thereby form defective gene products.¹² They may destabilize proteins, or reduce protein solubility.¹³ To understand the mechanism of phenotypic variations due to nsSNPs, it is important to assess the structural consequences of altering an amino acid residue. A classical example is sickle-cell anemia, the first molecular disease discovered.¹⁴ First studied by Sir John Kendrew 50 years ago, sickle-cell anemia results from a single base change in which residue E is changed to V at position 6 of the beta chain of hemoglobin. This residue is located at the interface of the alpha and beta chains, and the E6V mutation markedly reduces the solubility of the deoxygenated form of hemoglobin. Knowledge of the structural role of this mutation is essential for understanding the disease mechanism of sickle-cell anemia.

With the advent of high-throughput SNP detection techniques, the number of known nsSNPs is growing rapidly, providing an important source of information for studying the relationship between genotypes and phenotypes of human diseases. An important recent study has shown that there is a strong correlation between disease-associated polymorphism and sites of low solvent-accessibility.¹⁵ In this study, we introduce new geometric classifications for characterizing disease-associated SNPs. Here, we attempt to map SNPs to protein surface pockets and voids that may be potential functional binding regions.

Section snippets

Many disease-associated nsSNPs are located in pockets or voids

Compared to control nsSNPs, disease-associated nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database are more likely to be located in well-formed surface pockets or voids. Of the disease-associated nsSNPs derived from OMIM, 88% are located in pockets or voids (with a 95% confidence interval of 77–100%), while 68% of non-disease control SNPs are located in pockets or voids (with a 95% confidence interval of 55–83%). An example of this type of nsSNP is insulin receptor
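The snippet above reports each proportion together with a 95% confidence interval but does not say how the interval was obtained. As a hedged illustration only, the sketch below computes a Wilson score interval for a proportion, which is one common choice and not necessarily the method used in the study. The function name `wilson_interval` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion.

    Assumption: this is a generic textbook interval, not the procedure
    reported by the study, which the snippet does not describe.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2.0 * total)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / total
                          + z * z / (4.0 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical counts for illustration: 40 of 50 sites fall in pockets or voids.
low, high = wilson_interval(40, 50)
print(f"{low:.3f} to {high:.3f}")
```

With small samples the interval is wide, which is why the proportions for the disease and control sets should be compared together with their intervals rather than as point estimates alone.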

Discussion

In this study, we have described a new approach for SNP classification. For SNPs that can be mapped to protein structures, we classify them into three geometric sites: those in a pocket or a void, those on a convex region or a shallow depressed region, and those buried in the interior. Specifically, we find that the majority of disease-associated nsSNPs are located in voids or pockets on proteins, and only a small number of SNPs are buried completely in the interior (Figure 2). For disease SNPs

Geometric locations of mutation sites

Amino acid residues are located at different geometric locations. Some of them are located in the interior of a protein and have zero solvent-accessibility. Others may be on the outer boundary surface of the protein, or on the wall surface of an interior void. In this study, we formally classify amino acid residues altered by nsSNPs to be located at three different geometric sites: (1) in the interior of proteins (type I); (2) on the wall of a surface pocket or an interior void (type P); and
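The three site types named in the abstract (P, S and I) can be expressed as a small decision rule. The sketch below is illustrative only: it assumes the pocket or void membership and the solvent-accessible area of each residue have already been computed by some alpha-shape tool, and the `Site` fields, the function names, and the choice to test pocket or void membership before the zero-accessibility test are all assumptions rather than details given in the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Precomputed geometric descriptors for one nsSNP residue (hypothetical fields)."""
    in_pocket_or_void: bool          # residue lies on the wall of a surface pocket or interior void
    solvent_accessible_area: float   # in square angstroms; 0.0 means no solvent exposure


def classify_site(site: Site) -> str:
    """Return 'P', 'I' or 'S' for a residue.

    Assumption: pocket or void membership takes precedence, so a residue
    on the wall of an enclosed void is type P even if its area is zero.
    """
    if site.in_pocket_or_void:
        return "P"   # wall of a surface pocket or an interior void
    if site.solvent_accessible_area == 0.0:
        return "I"   # buried completely in the interior
    return "S"       # convex region or shallow depression on the surface


def type_fractions(sites):
    """Fraction of sites of each type; empty dict if no sites are given."""
    counts = Counter(classify_site(s) for s in sites)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: counts[t] / total for t in ("P", "S", "I")}


# Hypothetical residues, for illustration only.
example = [Site(True, 3.0), Site(True, 0.0), Site(False, 0.0), Site(False, 8.0)]
print(type_fractions(example))
```

Applying `type_fractions` separately to a disease-associated set and a control set would give the kind of per-type percentages compared in the study.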

Acknowledgements

This work was supported by funding from the National Science Foundation (CAREER DBI0133856, DBI0078270, and MCB998008).

References (49)

R. Thomas et al. Identification of mutations in the repeated part of the autosomal dominant polycystic kidney disease type 1 gene PKD1, by long-range PCR. Am. J. Hum. Genet. (1999)
J. Jaruzelska et al. In vitro splicing deficiency induced by a C to T mutation at position-3 in the intron 10 acceptor site of the phenylalanine hydroxylase gene in a patient with phenylketonuria. J. Biol. Chem. (1995)
S. Sunyaev et al. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet. (2000)
A. Cama et al. Substitution of glutamic acid for alanine 1135 in the putative catalytic loop of the tyrosine kinase domain of the human insulin receptor. A mutation that impairs proteolytic processing into subunits and inhibits receptor tyrosine kinase activity. J. Biol. Chem. (1993)
J.C. Reese et al. Characterization of a temperature-sensitive mutation in the hormone binding domain of the human estrogen receptor. Studies in cell extracts and intact cells and their implications for hormone-dependent transcriptional activation. J. Biol. Chem. (1992)
D. Chasman et al. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol. (2001)
J. Liang et al. Computation of molecular electrostatics with boundary element methods. Biophys. J. (1997)
L. Adamian et al. Helix–helix packing and interfacial pairwise interactions of residues in membrane proteins. J. Mol. Biol. (2001)
J. Liang et al. Are proteins well-packed? Biophys. J. (2001)
H. Edelsbrunner et al. On the definition and the construction of pockets in macromolecules. Disc. Appl. Math. (1998)
M. Facello. Implementation of a randomized algorithm for Delaunay and regular triangulations in three dimensions. Comput. Aided Geom. Des. (1995)
A. Krogh et al. Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. (1994)
F.S. Collins et al. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. (1998)
E.S. Lander. The new genomics: global views of biology. Science (1996)
T.P. Dryja et al. Mutations within the rhodopsin gene in patients with autosomal dominant retinitis pigmentosa. N. Engl. J. Med. (1990)
E.P. Smith et al. Estrogen resistance caused by a mutation in the estrogen-receptor gene in a man. N. Engl. J. Med. (1994)
I. Barroso et al. Dominant negative mutations in human PPARgamma associated with severe insulin resistance, diabetes mellitus and hypertension. Nature (1999)
A. Bonnardeaux et al. Angiotensin II type 1 receptor gene polymorphisms in human essential hypertension. Hypertension (1994)
K.P. Vatsis et al. Diverse point mutations in the human gene for polymorphic N-acetyltransferase. Proc. Natl Acad. Sci. USA (1991)
Q. Wang et al. Positional cloning of a novel potassium channel gene: KVLQT1 mutations cause cardiac arrhythmias. Nature Genet. (1996)
C. Hassett et al. Human microsomal epoxide hydrolase: genetic polymorphism and functional expression in vitro of amino acid variants. Hum. Mol. Genet. (1994)
A. Yoshida et al. Molecular abnormality of an inactive aldehyde dehydrogenase variant commonly found in Orientals. Proc. Natl Acad. Sci. USA (1984)
R.L. Proia et al. Synthesis of beta-hexosaminidase in cell-free translation and in intact fibroblasts: an insoluble precursor alpha chain in a rare form of Tay-Sachs disease. Proc. Natl Acad. Sci. USA (1982)
L. Stryer. Biochemistry (1995)
