Maximum-likelihood crystallization
Introduction
The National Institutes of Health (NIH) Structural Genomics (SG) Initiative (Norvell and Zapp-Machalek, 2000), under way since fall of 2000, provides significant public funding to nine P50 Structural Genomics Centers. One of the main objectives of these centers is the advancement of high-throughput crystallography, including methods for high-throughput protein crystallization. Two major conclusions can already be drawn at this early stage from initial communications from the public efforts. First, the production of proteins (in particular, “inherently crystallizable” proteins; Segelke, 2001) has emerged as a bottleneck potentially more significant to high throughput than protein crystallization itself. Second, a number of different crystallization methods can reasonably be adapted to achieve high throughput. Few reports are yet available on new crystallization statistics and predictions from the initiatives, and a distinct probability exists that under pressure to produce structures (which is the ultimate goal of an SG Center), the opportunity to create comprehensive and consistent crystallization databases through the centers may be lost. This concern is not unfounded: both the omission of negative results and the lack of the most basic quantity in statistics, the number of trials, have made the publicly available databases (Biological Macromolecule Crystallization Database or BMCD, Gilliland et al., 1994; Protein Data Bank or PDB, Berman et al., 2000) very difficult to use for rigorous statistical analysis and inference without significant restructuring and annotation (Hennessy et al., 2000).
This brief report discusses general aspects of the experimental design of crystallization experiments in view of statistical analysis and machine learning, interspersed with implementation details and preliminary results obtained at the crystallization facility of the TB Structural Genomics Consortium (TBSGC; Goulding et al., 2002), one of the nine NIH-funded structural genomics pilot projects. A major objective of the TBSGC crystallization facility has been to maximize overall operational efficiency within the budget constraints of public funding. An overview of our underlying philosophy and the challenges faced in the first two years of the TBSGC crystallization facility is provided in a separate review (Rupp, 2003). Technical implementation details, including crystallization robotics, crystal recognition, data collection, and structure solution, are described elsewhere (Rupp et al., 2002).
Section snippets
High-throughput crystallization
In a high-throughput environment, large numbers of samples, with limited prior knowledge of each, need to be processed. A compromise is necessary between the effort, time, and material spent on in-depth physicochemical analysis of the crystallization process of each specific protein, which leads to rational approaches directed toward improving crystallization (Dale et al., 1999; D’Arcy, 1994), and the need to still achieve high throughput and the required high overall success rate for the effort. This is a most
Crystallization cocktail production
As a consequence of de novo cocktail design for each protein construct, a large number of crystallization cocktails need to be prepared for random screening and optimization. We thus implemented customizable random screen generation in the computer program CRYSTOOL (Segelke and Rupp, 1998) and interfaced it with a liquid-handling robot to automatically produce crystallization cocktails in 96-well format (Rupp et al., 2002). Production of de novo random screens is time consuming (20–40 min per
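The idea of generating a fresh random screen in 96-well format can be illustrated with a minimal sketch. It does not reproduce CRYSTOOL: the reagent names, concentration ranges, buffer pH ranges, and the function name `random_screen` are all assumptions made up for illustration.

```python
import random

# Hypothetical reagents and ranges, chosen only for illustration;
# they are NOT the stock list or sampling scheme used by CRYSTOOL.
PRECIPITANTS = {
    "PEG 4000": (5.0, 30.0, "% w/v"),
    "ammonium sulfate": (0.5, 3.0, "M"),
    "MPD": (10.0, 50.0, "% v/v"),
}
# Each buffer is sampled only within an assumed useful pH range.
BUFFERS = {
    "sodium acetate": (4.0, 5.5),
    "HEPES": (6.8, 8.2),
    "Tris": (7.5, 9.0),
}
ROWS = "ABCDEFGH"       # 8 rows
COLUMNS = range(1, 13)  # 12 columns, giving 96 wells


def random_screen(seed=None):
    """Return a dict mapping well labels (A1..H12) to random cocktails."""
    rng = random.Random(seed)  # a seed makes the screen reproducible
    screen = {}
    for row in ROWS:
        for col in COLUMNS:
            precipitant = rng.choice(sorted(PRECIPITANTS))
            low, high, unit = PRECIPITANTS[precipitant]
            buffer_name = rng.choice(sorted(BUFFERS))
            ph_low, ph_high = BUFFERS[buffer_name]
            screen[f"{row}{col}"] = {
                "precipitant": precipitant,
                "concentration": round(rng.uniform(low, high), 1),
                "unit": unit,
                "buffer": buffer_name,
                "pH": round(rng.uniform(ph_low, ph_high), 1),
            }
    return screen


if __name__ == "__main__":
    screen = random_screen(seed=42)
    for well in ("A1", "A2", "H12"):
        print(well, screen[well])
```

Storing the seed together with the plate identifier would allow every cocktail composition to be regenerated later, which matters if negative results are to be recorded as carefully as hits.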
Analysis of experiments
At the lowest level of crystallization data analysis, one seeks to make inferences about hot spots in success rates by simple frequency statistics, resembling the way the sparse matrix set (Jancarik and Kim, 1991) was assembled. Frequency analysis can be carried out globally for all proteins or by categorizing proteins into groups with distinct properties. Important for the long run are correlations (generalizations) within the phase space and with known properties of the protein, i.e., priors
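Simple frequency statistics of this kind can be sketched as follows. The example assumes each trial is recorded as a (condition, outcome) pair, with failures kept alongside successes so that the number of trials is known; the function name `success_rates` and the sample data are hypothetical.

```python
from collections import Counter


def success_rates(trials):
    """Tally successes and trials per condition.

    `trials` is an iterable of (condition, crystallized) pairs, where
    `crystallized` is True for a hit and False for a negative result.
    Returns {condition: (hits, n_trials, hits / n_trials)}.
    """
    n_trials = Counter()
    n_hits = Counter()
    for condition, crystallized in trials:
        n_trials[condition] += 1
        if crystallized:
            n_hits[condition] += 1
    return {
        condition: (n_hits[condition], n, n_hits[condition] / n)
        for condition, n in n_trials.items()
    }


if __name__ == "__main__":
    # Made-up outcomes for illustration only, not TBSGC data.
    example = [
        ("PEG 4000", True),
        ("PEG 4000", False),
        ("PEG 4000", False),
        ("ammonium sulfate", False),
        ("ammonium sulfate", True),
        ("MPD", False),
    ]
    for condition, (hits, n, rate) in sorted(success_rates(example).items()):
        print(f"{condition}: {hits}/{n} = {rate:.2f}")
```

The same tally can be run globally or per protein category by filtering the trial list first. Without the negative results the denominator is unknown and no rate can be computed, which is the limitation of the public databases noted in the Introduction.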
Outlook and suggestions for the future
From our initial operation, as limited as the analysis of results has to be at this time, we still can draw a number of conclusions and propose some considerations for future work. The concept of random sampling seems to perform at least as well as other methods: about 1/3 of the TB proteins received could be crystallized robotically without any prior assumptions, tips, or tricks, and of those, about 1/3 diffracted beyond 2.5 Å without need for optimization. In addition, by consistent random
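As a rough worked estimate from the two approximate fractions quoted above, the share of all received TB proteins that yielded crystals diffracting beyond 2.5 Å without optimization is the product of the two:

```latex
\frac{1}{3} \times \frac{1}{3} = \frac{1}{9} \approx 11\%
```

Because both fractions are stated only as "about 1/3", this figure should be read as an order-of-magnitude estimate rather than a measured rate.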
Acknowledgements
I thank the members of the TB Structural Genomics Consortium crystallization facility team (B.W. Segelke, H.I. Krupka, T. Lekin, J. Schafer, and D. Toppani) for their contributions to the development of techniques described in this paper. K.A. Kantardjieff, CSUF, has provided assistance with statistical data analysis and manuscript revisions. The cloning and protein production facilities under J. Perry, C. Goulding, D. Eisenberg (UCLA), T. Terwilliger, M. Park, and G. Waldo (LANL) have supplied
References (35)
et al. Re-clustering the database for crystallization of macromolecules. J. Cryst. Growth (1998)
et al. Macromolecular crystallization in a high throughput laboratory—the search phase. J. Cryst. Growth (2001)
Efficiency analysis of sampling protocols used in protein crystallization screening. J. Cryst. Growth (2001)
Curr. Opin. Struct. Biol. (2000)
et al. Strategies in the crystallization of glycoproteins and protein complexes. J. Cryst. Growth (1992)
Randomness (1998)
et al. The protein data bank. Nucleic Acids Res. (2000)
et al. The prospects of nanocrystallography. Acta Crystallogr. D (2002)
Experimental design, quantitative analysis, and the cartography of crystal growth
et al. Protein crystallization using incomplete factorial experiments. J. Biol. Chem. (1979)
Comparative studies of protein crystallization by vapour-diffusion and microbatch techniques. Acta Crystallogr. D
Protein crystallization for genomics: towards high-throughput optimization techniques. Acta Crystallogr. D
Screening and optimization strategies for macromolecular crystal growth. Acta Crystallogr. D
Crystal engineering: deletion mutagenesis of the 24 kDa fragment of the DNA gyrase B subunit from Staphylococcus aureus. Acta Crystallogr. D
Crystallizing proteins—a rational approach? Acta Crystallogr. D
Protein production: feeding the crystallographers and NMR spectroscopists. Nat. Struct. Biol. Suppl.
The biological macromolecule crystallization database, version 3.0: new features, data, and the NASA archive for protein crystal growth data. Acta Crystallogr. D