Copyright © 2004 Elsevier Ltd. All rights reserved.
Progress in the development and application of computational methods for probabilistic protein design
Received 22 September 2003;
Abstract
Proteins exhibit a wide range of physical and chemical properties, including highly selective molecular recognition and catalysis, and are also key components in biological metabolic, catabolic, and signaling pathways. Given that proteins are well-structured and can be rapidly synthesized, they are excellent targets for engineering both molecular structure and biological function. Computational analysis of the protein design problem allows scientists to explore sequence space and systematically discover novel protein molecules. Nonetheless, the complexity of proteins, the subtlety of the determinants of folding, and the exponentially large number of possible sequences impede the search for peptide sequences compatible with a desired structure and function. Directed search algorithms, which directly identify a small number of sequences, have achieved some success in identifying sequences with desired structures and functions. Alternatively, one can adopt a probabilistic approach. Instead of yielding a finite number of sequences, such calculations result in a probabilistic description of the sequence ensemble. In particular, by casting the formalism in the language of statistical mechanics, the site-specific amino acid probabilities of sequences compatible with a target structure may be readily estimated. These computed probabilities are well suited both for de novo protein design of particular sequences and for combinatorial, library-based protein engineering. The computed site-specific amino acid profile may be converted to a nucleotide base distribution to allow assembly of a partially randomized gene library. The ability to readily synthesize such degenerate oligonucleotide sequences according to the prescribed distribution is key to constructing a biased peptide library genuinely reflective of the computational design.
Herein we illustrate how a standard DNA synthesizer can be used, with only a slight modification to the synthesis protocol, to generate a pool of degenerate DNA sequences that encodes a predetermined amino acid distribution with high fidelity.
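The link between a nucleotide base distribution and the amino acid distribution it encodes can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes the standard genetic code and that the base at each of the three codon positions is drawn independently. The names `CODON_TABLE`, `amino_acid_distribution`, and `example_mix` are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated with the first base varying
# slowest and the third base fastest, each in T, C, A, G order.
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(base_probs):
    """Amino acid probabilities induced by per-position base probabilities.

    base_probs is a list of three dicts (codon positions 1 to 3), each
    mapping a base to its probability. Bases are assumed to be chosen
    independently at each position (an assumption of this sketch).
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= base_probs[pos].get(base, 0.0)
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


# Illustrative mixture: G at position 1, equal C/T at position 2, T at position 3.
# Only GCT (Ala) and GTT (Val) are possible codons.
example_mix = [{"G": 1.0}, {"C": 0.5, "T": 0.5}, {"T": 1.0}]
induced = amino_acid_distribution(example_mix)

# Fully random codon: every base equally likely at every position.
uniform = [{b: 0.25 for b in BASES}] * 3
induced_uniform = amino_acid_distribution(uniform)
```

Designing a library runs this calculation in reverse: one searches for base probabilities whose induced distribution is as close as possible to the computed amino acid profile at each site, typically while keeping the stop-codon probability low.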
Keywords: Computational protein design; Combinatorial library; Protein engineering; Biased codon library
Article Outline
- 1. Introduction
- 1.1. Protein design
- 1.2. “Directed” methods of protein design
- 1.3. Probabilistic approaches to protein design
- 1.4. Combinatorial experiments
- 2. Methods for probabilistic protein design
- 2.1. Alignment of related sequences
- 2.2. Directed search methods to build profiles
- 2.3. Statistical theory of sequence ensembles
- 3. Gene libraries from site-specific probabilities
- 3.1. Computational design of gene libraries
- 3.2. Synthesis of oligonucleotides subject to arbitrary nucleotide probabilities
- 4. Summary
- Acknowledgements
- Appendix A. Appendix
- A.1. Energy functions
- A.2. Solvation and hydrophobic energy
- A.3. Reference energy
- A.4. Rotamer and identity probabilities
- References













