Journal of Molecular Biology
Structural Location of Disease-associated Single-nucleotide Polymorphisms
Introduction
Single-nucleotide polymorphisms (SNPs) are the most common form of human genetic variation. The coding regions of the human genome contain about 500,000 SNPs.1 Among these, the non-synonymous SNPs (nsSNPs) change amino acid residues and are likely to be an important factor contributing to the functional diversity of encoded proteins in the human population.2 There are well-known examples in which nsSNPs affect the functional roles of proteins in signal transduction of visual, hormonal and other stimuli,3,4 in gene regulation by altering DNA and transcription factor binding,5 and in maintaining the structural integrity of cells and tissues.6 In addition, by affecting drug-target proteins such as G-protein coupled receptors,7 enzymes,8 ion channels9 and proteins involved in detoxification pathways,10 nsSNPs play important roles in the diverse responses of the human population to therapeutic agents, in both efficacy and toxicity.
nsSNPs can affect human physiology through many different mechanisms. nsSNPs may inactivate functional sites of enzymes11 or alter splice sites and thereby form defective gene products.12 They may destabilize proteins, or reduce protein solubility.13 To understand the mechanism of phenotypic variations due to nsSNPs, it is important to assess the structural consequences of altering an amino acid residue. A classical example is sickle-cell anemia, the first molecular disease discovered.14 First studied by Sir John Kendrew 50 years ago, sickle-cell anemia results from a single base change that replaces residue E with V at position 6 of the beta chain of hemoglobin. This residue is located at the interface of the alpha and beta chains, and the E6V mutation markedly reduces the solubility of the deoxygenated form of hemoglobin. Knowledge of the structural role of this mutation is essential for understanding the disease mechanism of sickle-cell anemia.
With the advent of high-throughput SNP detection techniques, the number of known nsSNPs is growing rapidly, providing an important source of information for studying the relationship between genotypes and phenotypes of human diseases. A recent study has shown that there is a strong correlation between disease-associated polymorphisms and sites of low solvent-accessibility.15 In this study, we introduce new geometric classifications for characterizing disease-associated SNPs. Here, we attempt to map SNPs to protein surface pockets and voids that may be potential functional binding regions.
Many disease-associated nsSNPs are located in pockets or voids
Compared to control nsSNPs, disease-associated nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database are more likely to be located in well-formed surface pockets or voids. Of the disease-associated nsSNPs derived from OMIM, 88% are located in pockets or voids (95% confidence interval of 77–100%), while 68% of non-disease control SNPs are located in pockets or voids (95% confidence interval of 55–83%). An example of this type of nsSNP is insulin receptor
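The percentages above are proportions with 95% confidence intervals. The snippet does not state the counts or the method used to obtain the intervals, so the following is only a minimal sketch of one common approach, the Wilson score interval, and not necessarily the procedure used in this study. The function name `wilson_interval` and the counts in the usage example are hypothetical and chosen purely for illustration.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    `successes` could be the number of nsSNPs found in pockets or voids and
    `total` the number of nsSNPs mapped to structures. z = 1.96 corresponds
    to a two-sided 95% interval.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper.
    low, high = wilson_interval(successes=22, total=25)
    print(f"proportion = {22 / 25:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With small samples the interval is wide, which is one reason two proportions that look different can still have overlapping confidence intervals.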
Discussion
In this study, we have described a new approach for SNP classification. For SNPs that can be mapped to protein structures, we classify them into three geometric sites: those in a pocket or a void, those on a convex region or a shallow depressed region, and those buried in the interior. Specifically, we find that the majority of disease-associated nsSNPs are located in voids or pockets on proteins, and only a small number of SNPs are buried completely in the interior (Figure 2). For disease SNPs
Geometric locations of mutation sites
Amino acid residues occupy different geometric locations. Some of them are located in the interior of a protein and have zero solvent-accessibility. Others may be on the outer boundary surface of the protein, or on the wall surface of an interior void. In this study, we formally classify amino acid residues altered by nsSNPs into three geometric sites: (1) in the interior of proteins (type I); (2) on the wall of a surface pocket or an interior void (type P); and
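The classification described here can be sketched as a small decision rule. This is an illustrative sketch only, assuming that pocket or void membership and solvent-accessible area have already been computed for each residue by some upstream geometric tool; the `Residue` fields, the `classify` function, and the `OTHER_SURFACE` label for the third site are hypothetical names, since the snippet is truncated before the third type is named.

```python
from dataclasses import dataclass
from enum import Enum


class Site(Enum):
    INTERIOR = "I"             # type I in the text
    POCKET_OR_VOID = "P"       # type P in the text
    OTHER_SURFACE = "other"    # placeholder label; not named in the snippet


@dataclass(frozen=True)
class Residue:
    name: str                        # e.g. "E6" (hypothetical identifier format)
    solvent_accessible_area: float   # in square angstroms, assumed precomputed
    on_pocket_or_void_wall: bool     # assumed precomputed by a pocket finder


def classify(residue: Residue) -> Site:
    """Assign one of the three geometric sites to a residue.

    Pocket or void membership is tested first, because a residue lining an
    interior void can have zero solvent accessibility from the outside and
    would otherwise be mistaken for a fully buried residue.
    """
    if residue.on_pocket_or_void_wall:
        return Site.POCKET_OR_VOID
    if residue.solvent_accessible_area == 0.0:
        return Site.INTERIOR
    return Site.OTHER_SURFACE


if __name__ == "__main__":
    # Made-up residues for illustration only.
    examples = [
        Residue("A10", 0.0, False),
        Residue("K42", 0.0, True),
        Residue("D77", 35.2, False),
    ]
    for r in examples:
        print(r.name, classify(r).name)
```

The order of the checks is the main design choice: testing the pocket or void flag before solvent accessibility keeps interior void walls out of the buried class.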
Acknowledgements
This work was supported by funding from the National Science Foundation (CAREER DBI0133856, DBI0078270, and MCB998008).
References
- et al. Identification of mutations in the repeated part of the autosomal dominant polycystic kidney disease type 1 gene PKD1, by long-range PCR. Am. J. Hum. Genet. (1999)
- et al. In vitro splicing deficiency induced by a C to T mutation at position-3 in the intron 10 acceptor site of the phenylalanine hydroxylase gene in a patient with phenylketonuria. J. Biol. Chem. (1995)
- et al. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet. (2000)
- et al. Substitution of glutamic acid for alanine 1135 in the putative catalytic loop of the tyrosine kinase domain of the human insulin receptor. A mutation that impairs proteolytic processing into subunits and inhibits receptor tyrosine kinase activity. J. Biol. Chem. (1993)
- et al. Characterization of a temperature-sensitive mutation in the hormone binding domain of the human estrogen receptor. Studies in cell extracts and intact cells and their implications for hormone-dependent transcriptional activation. J. Biol. Chem. (1992)
- et al. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol. (2001)
- et al. Computation of molecular electrostatics with boundary element methods. Biophys. J. (1997)
- et al. Helix–helix packing and interfacial pairwise interactions of residues in membrane proteins. J. Mol. Biol. (2001)
- et al. Are proteins well-packed? Biophys. J. (2001)
- et al. On the definition and the construction of pockets in macromolecules. Disc. Appl. Math. (1998)