Maximum-likelihood crystallization

https://doi.org/10.1016/S1047-8477(03)00047-9

Abstract

The crystallization facility of the TB Structural Genomics Consortium, one of nine NIH-sponsored structural genomics pilot projects, employs a combinatorial random sampling technique in high-throughput crystallization screening. Although data are still sparse and a comprehensive analysis cannot be performed at this stage, preliminary results appear to validate the random-screening concept. A discussion of statistical crystallization data analysis aims to draw attention to the need for comprehensive and valid sampling protocols. In view of limited overlap in techniques and sampling parameters between the publicly funded high-throughput crystallography initiatives, exchange of information should be encouraged, aiming to effectively integrate data mining efforts into a comprehensive predictive framework for protein crystallization.

Introduction

The National Institutes of Health (NIH) Structural Genomics (SG) Initiative (Norvell and Zapp-Machalek, 2000), under way since fall 2000, provides significant public funding to nine P50 Structural Genomics Centers. One of the main objectives of these centers is the advancement of high-throughput crystallography, including methods for high-throughput protein crystallization. Two major conclusions can already be drawn at this early stage from initial communications from the public efforts. First, it has become obvious that a bottleneck potentially more significant to high throughput than protein crystallization is the production of proteins (in particular, “inherently crystallizable” proteins; Segelke, 2001), and second, that a number of different crystallization methods can reasonably be adapted to achieve high throughput. Few reports are yet available on new crystallization statistics and predictions from the initiatives, and a distinct probability exists that, under pressure to produce structures (the ultimate goal of an SG Center), the opportunity to create comprehensive and consistent crystallization databases through the centers may be lost. This concern is not entirely unfounded, as both the omission of negative results and the lack of the most basic quantity in statistics, the number of trials, have made the publicly available databases (Biological Macromolecule Crystallization Database or BMCD, Gilliland et al., 1994; Protein Data Bank or PDB, Berman et al., 2000) very difficult to use for rigorous statistical analysis and inference without significant restructuring and annotation (Hennessy et al., 2000).
This brief report discusses general aspects of experimental design of crystallization experiments in view of statistical analysis and machine learning, interspersed with implementation and preliminary results obtained at the crystallization facility of the TB Structural Genomics Consortium (TBSGC; Goulding et al., 2002), one of the nine NIH-funded structural genomics pilot projects. A major objective of the TBSGC crystallization facility has been to maximize overall operational efficiency within the budget constraints of public funding. An overview of our underlying philosophy and the challenges faced in the first 2 years of the TBSG crystallization facility is provided in a separate review (Rupp, 2003). Technical implementation details, including crystallization robotics, crystal recognition, data collection, and structure solution, are provided in the special literature (Rupp et al., 2002).

Section snippets

High-throughput crystallization

In a high-throughput environment, large numbers of samples, with limited prior knowledge of each, need to be processed. A compromise is necessary between the effort, time, and material spent on in-depth physicochemical analysis of each specific protein's crystallization process, which enables rational approaches to improving crystallization (Dale et al., 1999; D’Arcy, 1994), and the need to maintain high throughput and a high overall success rate for the effort. This is a most …

Crystallization cocktail production

As a consequence of de novo cocktail design for each protein construct, a large number of crystallization cocktails need to be prepared for random screening and optimization. We thus implemented customizable random screen generation in the computer program CRYSTOOL (Segelke and Rupp, 1998) and interfaced it with a liquid-handling robot to automatically produce crystallization cocktails in 96-well format (Rupp et al., 2002). Production of de novo random screens is time-consuming (20–40 min per …
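The idea of generating a de novo random screen can be sketched as follows. This is only an illustrative sketch of combinatorial random sampling, not CRYSTOOL's actual algorithm; the reagent lists and concentration ranges are invented for the example, and the real chemical phase space is far larger and weighted by prior knowledge.

```python
import random

# Hypothetical reagent lists for illustration only; CRYSTOOL samples a much
# richer, weighted chemical space.
PRECIPITANTS = ["PEG 4000", "PEG 8000", "ammonium sulfate", "MPD", "NaCl"]
BUFFERS = ["HEPES pH 7.5", "Tris pH 8.5", "citrate pH 5.6", "MES pH 6.5"]
ADDITIVES = [None, "MgCl2", "glycerol", "NaF"]

def random_cocktail(rng):
    """Draw one cocktail by independent random sampling of each component."""
    return {
        "precipitant": rng.choice(PRECIPITANTS),
        "precipitant_conc_pct": round(rng.uniform(5.0, 30.0), 1),
        "buffer": rng.choice(BUFFERS),
        "additive": rng.choice(ADDITIVES),
    }

def random_screen(n_wells=96, seed=None):
    """Generate an n-well random screen (one cocktail per well).

    Seeding makes a screen reproducible, so the exact conditions dispensed
    by the liquid handler can be regenerated and recorded.
    """
    rng = random.Random(seed)
    return [random_cocktail(rng) for _ in range(n_wells)]

screen = random_screen(96, seed=1)
print(len(screen))  # 96
```

Seeding the generator matters in practice: it lets the same 96-well screen be reproduced and logged, which is exactly the kind of trial-level bookkeeping needed for later statistical analysis.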

Analysis of experiments

At the lowest level of crystallization data analysis, one seeks to make inferences about hot spots in success rates by simple frequency statistics, resembling the way the sparse matrix set (Jancarik and Kim, 1991) was assembled. Frequency analysis can be carried out globally for all proteins or by categorizing proteins into groups with distinct properties. Important for the long run are correlations (generalizations) within the phase space and with known properties of the protein, i.e., priors …
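The simple frequency statistics described above can be sketched as follows. The trial records and reagent names here are invented for illustration; the point is that computing a success rate requires the number of attempts per condition, the very quantity missing from many public databases.

```python
from collections import Counter

# Toy trial records: (precipitant used, crystal observed?). In a real
# facility these would come from the tracked screening database.
trials = [
    ("PEG 4000", True), ("PEG 4000", False), ("PEG 4000", True),
    ("ammonium sulfate", False), ("ammonium sulfate", True),
    ("MPD", False), ("MPD", False),
]

attempts = Counter(p for p, _ in trials)           # total trials per condition
hits = Counter(p for p, ok in trials if ok)        # successes per condition

# Success frequency per precipitant: hits / attempts. Without the attempt
# counts (the denominator), raw hit counts alone are uninterpretable.
rates = {p: hits[p] / attempts[p] for p in attempts}
for p, r in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{p}: {hits[p]}/{attempts[p]} = {r:.2f}")
```

Grouping the trial records by protein properties instead of (or in addition to) reagents gives the categorized analysis mentioned above, and the per-condition rates serve as empirical priors for later predictive work.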

Outlook and suggestions for the future

From our initial operation, as limited as the analysis of results has to be at this time, we can still draw a number of conclusions and propose some considerations for future work. The concept of random sampling seems to perform at least as well as other methods (about 1/3 of the TB proteins received could be crystallized robotically without any prior assumptions, tips, or tricks), and of those, about 1/3 diffracted beyond 2.5 Å without need for optimization. In addition, by consistent random …
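The sampling logic behind random screening lends itself to a simple calculation: if each well is an independent trial with per-well hit probability p, the chance of at least one hit in n wells is 1 − (1 − p)^n, and the maximum-likelihood estimate of p from observed counts is simply hits/trials. The numbers below are illustrative assumptions, not results from the facility.

```python
def screen_success_prob(p, n):
    """Chance of at least one crystal in n independent trials,
    each with per-well hit probability p: 1 - (1 - p)**n."""
    return 1.0 - (1.0 - p) ** n

def mle_hit_rate(hits, trials):
    """Maximum-likelihood estimate of the per-well hit probability
    from a recorded number of hits and total trials."""
    return hits / trials

# Assumed example: a 0.5% per-well hit rate over three 96-well plates.
print(round(screen_success_prob(0.005, 288), 3))  # 0.764
```

This is why recording the number of trials is not optional bookkeeping: without the denominator, neither the maximum-likelihood hit rate nor the screen size needed for a target success probability can be estimated.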

Acknowledgements

I thank the members of the TB Structural Genomics Consortium crystallization facility team (B.W. Segelke, H.I. Krupka, T. Lekin, J. Schafer, and D. Toppani) for their contributions to the development of techniques described in this paper. K.A. Kantardjieff, CSUF, has provided assistance with statistical data analysis and manuscript revisions. The cloning and protein production facilities under J. Perry, C. Goulding, D. Eisenberg (UCLA), T. Terwilliger, M. Park, and G. Waldo (LANL) have supplied …

References (35)

  • N.E. Chayen, Comparative studies of protein crystallization by vapour-diffusion and microbatch techniques, Acta Crystallogr. D (1998)
  • N.E. Chayen et al., Protein crystallization for genomics: towards high-throughput optimization techniques, Acta Crystallogr. D (2002)
  • R. Cudney et al., Screening and optimization strategies for macromolecular crystal growth, Acta Crystallogr. D (1994)
  • G.E. Dale et al., Crystal engineering: deletion mutagenesis of the 24 kDa fragment of the DNA gyrase B subunit from Staphylococcus aureus, Acta Crystallogr. D (1999)
  • A. D’Arcy, Crystallizing proteins—a rational approach?, Acta Crystallogr. D (1994)
  • A.M. Edwards et al., Protein production: feeding the crystallographers and NMR spectroscopists, Nat. Struct. Biol. Suppl. (2000)
  • G.L. Gilliland et al., The biological macromolecule crystallization database, version 3.0: new features, data, and the NASA archive for protein crystal growth data, Acta Crystallogr. D (1994)