Journal of Molecular Biology
Databases/Web Servers
AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators
Introduction
Sequence- and structure-based comparisons of protein homologs are frequently used to predict amino acids critical to function. With advances in high-throughput sequence and structure determination, the amount of data available has exploded. To translate these data into meaningful information, myriad computational tools have been developed (i) to detect patterns of amino acid change and (ii) to make predictions about homolog function and mutational outcomes. Development and validation of these programs require experimental datasets against which to test predictions.
One commonly used dataset (e.g., Refs. [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]) comprises in vivo characterization of ~4100 mutational variants of the lactose repressor protein (LacI) [13], [14], [15], [16]. In addition, scores of mutational variants of LacI and its paralogs have been the subject of detailed biochemical and expanded phenotypic studies over the last three decades. These additional results provide in-depth insights that would be extremely valuable for assessing computational predictions, but the computational community has under-utilized them because the relevant information is scattered throughout the literature and is difficult to curate. Further, structures for numerous LacI/GalR homologs have become available, mainly through the Protein Structure Initiative [17], [18].
Here, we present AlloRep, a repository of published experimental information for homologs of the LacI/GalR family. AlloRep contains (i) manually curated sequence alignments for >3100 sequences, (ii) experimental results for >5750 LacI/GalR mutational variants, and (iii) residue–residue contact networks derived from 65 crystallographic structures of full-length homologs and/or their regulatory domains, representing 17 LacI/GalR subfamilies (Fig. 1). This database† can be queried using MySQL. The information in AlloRep complements the predicted regulons for >1300 LacI/GalR homologs that were recently added to the RegPrecise database [19].
The data in AlloRep also have important applications in protein design: they can be used to test the robustness of protein engineering approaches and to generate new hypotheses for engineering synthetic transcription repressors. As proof of principle, we used AlloRep to merge structural, mutational, and sequence data to identify a position that can be mutated to alter allosteric regulation. Most LacI/GalR homologs are allosterically regulated: the DNA binding domains of the apoproteins bind to their cognate DNA sequences with high affinity, and DNA binding is modulated when a distant site on the regulatory domain is occupied by a small molecule effector (or in some cases, a heteroprotein). The LacI/GalR paralogs have evolved specificities for different DNA sequences and allosteric effectors [20]. Although domain recombination shows that the allosteric mechanism may largely be the same, the magnitude and direction of allosteric response can be modulated [20], [21]. In general, predicting the locations of allosteric positions has been challenging. We tested our prediction successfully in a synthetic, chimeric repressor that was previously constructed from LacI and the cellobiose repressor (CelR).
Overview of the AlloRep database
The AlloRep database comprises 14 tables and can be queried using MySQL. Example queries and a database schema are supplied in the accompanying “Data in Brief” publication [22]. A key advantage of AlloRep is that all entries have been mapped to the analogous position of a single homolog, the full-length Escherichia coli LacI protein. This is a powerful way to compare different homologs, as well as different structural conformations of the same protein. This mapping allows a single query to
Conclusion
The AlloRep database† organizes available sequence, structural, and experimental data for the LacI/GalR protein family. This dataset will be useful for the development and validation of computational analyses of protein families. We are committed to the continued integration of mutagenesis, structural, and sequence information as they become available for LacI/GalR homologs. We invite the scientific community to send their mutagenesis data to AlloRep so that this experimental resource remains
Sequence retrieval and alignments
The sequence identity boundaries of the new LacI/GalR subfamilies were defined as described in Ref. [26]. For the subfamilies represented by the new PSI PDB structures, a structure-based reference alignment was constructed with PROMALS3D [46] and integrated into the whole family alignment with the program MARS-Prot [25]. For all new homologs, subfamily alignments were constructed using MUSCLE [47] and representative sequences were integrated into the whole family alignment with MARS-Prot.
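Once a homolog sits in the whole family alignment, mapping its residues to the analogous positions of a reference sequence (E. coli LacI in AlloRep) amounts to walking the aligned columns. The following is a minimal sketch of that idea, not the AlloRep pipeline itself: the function name `map_to_reference`, the gap character, and the toy aligned strings are all assumptions made for illustration, and the strings are not real LacI/GalR sequences.

```python
from typing import Dict


def map_to_reference(aligned_ref: str, aligned_homolog: str, gap: str = "-") -> Dict[int, int]:
    """Map homolog residue numbers to reference residue numbers.

    Both inputs are rows taken from the same alignment, so they must have
    equal length. Residue numbers are 1-based and count non-gap characters.
    Columns where either row has a gap produce no mapping entry.
    """
    if len(aligned_ref) != len(aligned_homolog):
        raise ValueError("aligned sequences must have equal length")

    mapping: Dict[int, int] = {}
    ref_pos = 0  # running residue number in the reference
    hom_pos = 0  # running residue number in the homolog
    for ref_char, hom_char in zip(aligned_ref, aligned_homolog):
        if ref_char != gap:
            ref_pos += 1
        if hom_char != gap:
            hom_pos += 1
        if ref_char != gap and hom_char != gap:
            mapping[hom_pos] = ref_pos
    return mapping


# Toy aligned rows (hypothetical placeholders, not real sequences).
toy_reference = "MKP-VTL"
toy_homolog = "M-PAVSL"
print(map_to_reference(toy_reference, toy_homolog))
```

In this sketch, a homolog residue that falls in an insertion relative to the reference has no entry in the mapping, which is one simple way to mark positions without a reference equivalent.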
Contact maps
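As a generic illustration of what a residue–residue contact network derived from a crystallographic structure looks like, the sketch below lists residue pairs whose representative atoms lie within a distance cutoff. The contact criterion used for AlloRep is not stated in this excerpt, so the 8 Å cutoff on a single coordinate per residue, the minimum sequence separation, the function name `contact_pairs`, and the toy coordinates are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float, float]


def contact_pairs(
    coords: Dict[int, Coordinate],
    cutoff: float = 8.0,       # assumed distance cutoff in angstroms
    min_separation: int = 2,   # assumed: skip sequence-adjacent residues
) -> List[Tuple[int, int]]:
    """Return residue pairs (i, j), i < j, whose coordinates lie within cutoff."""
    pairs: List[Tuple[int, int]] = []
    for (i, xyz_i), (j, xyz_j) in combinations(sorted(coords.items()), 2):
        if j - i < min_separation:
            continue
        if math.dist(xyz_i, xyz_j) <= cutoff:
            pairs.append((i, j))
    return pairs


# Toy coordinates (hypothetical, one point per residue; not from a PDB entry).
toy_coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (7.6, 0.0, 0.0),
    4: (7.6, 3.8, 0.0),
    5: (20.0, 0.0, 0.0),
}
print(contact_pairs(toy_coords))
```

If the residue numbers used as keys are first converted to reference numbering, contact lists from different homologs or different conformations can be compared position by position.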
Acknowledgments
This work was supported by the Fundação para a Ciência e Tecnologia grant SFRH/BPD/73058/2010 (F.L.S.), the National Institutes of Health grant GM 079423 (L.S.K.), the University of Kansas Medical Center Biomedical Research Training Program (D.J.P.), the joint National Science Foundation/National Institute of General Medical Sciences Mathematical Biology Program R01GM104974 (M.R.B.), the Robert A. Welch Foundation grant C-1729 (M.R.B.), and private funds. We thank Tina Perica for many
References (50)
- et al. Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as “spacers” which do not require a specific sequence. J. Mol. Biol. (1994)
- Genetic studies of the lac repressor. XII. Amino acid replacements in the DNA binding domain of the Escherichia coli lac repressor. J. Mol. Biol. (1984)
- et al. Genetic studies of the lac repressor. XIII. Extensive amino acid replacements generated by the use of natural and synthetic nonsense suppressors. J. Mol. Biol. (1990)
- et al. Genetic studies of the Lac repressor. XV: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure. J. Mol. Biol. (1996)
- et al. Allostery in the LacI/GalR family: Variations on a theme. Curr. Opin. Microbiol. (2009)
- et al. Comparing the functional roles of nonconserved sequence positions in homologous transcription repressors: Implications for sequence/function analyses. J. Mol. Biol. (2010)
- et al. Evolution of protein structures and interactions from the perspective of residue contact networks. Curr. Opin. Struct. Biol. (2013)
- et al. Network analysis of protein structures identifies functional residues. J. Mol. Biol. (2004)
- et al. Characterization and cloning of celR, a transcriptional regulator of cellulase genes from Thermomonospora fusca. J. Biol. Chem. (1999)
- et al. Site-directed mutagenesis by overlap extension using the polymerase chain reaction. Gene (1989)
- Prediction of functional specificity determinants from protein sequences using log-likelihood ratios. Bioinformatics
- Determinants, discriminants, conserved residues: A heuristic approach to detection of functional divergence in protein families. PLoS ONE
- An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol. Biol.
- The use of orthologous sequences to predict the impact of amino acid substitutions on protein function. PLoS Genet.
- Bi-directional SIFT predicts a subset of activating mutations. PLoS ONE
- Tracing evolutionary pressure. Bioinformatics
- Qualifying the relationship between sequence conservation and molecular function. Genome Res.
- Multi-RELIEF: A method to recognize specificity determining residues from multiple sequence alignments using a machine-learning approach for feature weighting. Bioinformatics
- Predicting the effect of missense mutations on protein function: Analysis with Bayesian networks. BMC Bioinformatics
- Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res.
- A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function. Bioinformatics
- Predicting deleterious amino acid substitutions. Genome Res.
- 1,000 structures and more from the MCSG. BMC Struct. Biol.
- Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative. Proc. Natl. Acad. Sci. U. S. A.
- Comparative genomics and evolution of regulons of the LacI-family transcription factors. Front. Microbiol.
7. Present address: D. J. Parente and J. A. Hessman, University of Kansas School of Medicine, Kansas City, KS 66160, USA.