Molecules of Interest
Cupins: the most functionally diverse protein superfamily?
This review describes the diversity of structure and function in this superfamily that comprises a wide range of metalloenzymes and non-enzymic factors such as the seed globulins.
Introduction
Identification of the cupin superfamily was originally based on the realisation that the wheat protein germin, an unusual thermostable protein produced during the early phase of germination in wheat embryos, shared a nine amino acid sequence (HI/THPRATEI) with a stress-related protein, a spherulin, produced during starvation of the slime mould Physarum polycephalum. This similarity was then extended to a group of germin-like proteins (GLPs) in dicotyledonous plants, and subsequent comparative sequence analysis noted a much weaker, though consistent, level of similarity to the globulin storage proteins such as the desiccation-tolerant vicilins and legumins from plant seeds and spores. Knowledge of the 3D structure of these seed proteins enabled a more detailed structure-based alignment to be produced, and this revealed a larger grouping of proteins, including many microbial examples (Dunwell and Gane, 1998); collectively they were given the annotation of cupins on the basis of their β-barrel shape (‘cupa’ is the Latin term for small barrel) (Dunwell, 1998).
The characteristic cupin domain comprises two conserved motifs, each corresponding to two β-strands, separated by a less conserved region composed of another two β-strands with an intervening variable loop. The total size of the intermotif region varies from a minimum of 11 AAs in some microbial enzymes, to ca. 50 AAs in the non-enzymatic seed storage proteins, and to >100 AAs in certain eukaryotic transcription factors and dioxygenases (Dunwell et al., 2000, Dunwell et al., 2001). The characteristic conserved sequence was originally designated as G(X)5HXH(X)3,4E(X)6G for Motif 1 and G(X)5PXG(X)2H(X)3N for Motif 2 (Fig. 1). These two motifs were defined before confirmation (Woo et al., 2000) of the previous prediction (Gane et al., 1998) that the two His residues and the Glu residue in Motif 1, together with the His residue in Motif 2, act as ligands for the active-site manganese ion in the archetypal cupin, germin (Fig. 2a) (see below). It is now clear that the primary sequence of the two motifs is much less conserved than first suggested. For example, one of the two His residues in Motif 1 is substituted by Gln or Asp/Glu in some isomerases or dioxygenases (Fig. 1).
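The two motif designations above can be read as simple sequence patterns. The following is a minimal sketch, assuming that X stands for any residue and that a candidate domain needs Motif 1 followed by Motif 2 with at least 11 residues between them (the smallest spacing quoted above). The function name `find_cupin_candidates` and the test sequence are hypothetical and serve only as illustration. Because the motifs are now known to be less conserved than these original patterns imply, a strict scan of this kind would be expected to miss genuine cupins, for instance those in which a Motif 1 His is replaced by Gln or Asp/Glu.

```python
import re

# Patterns transcribed from the original motif designations in the text.
# Assumption: "X" means any single residue, written here as ".".
MOTIF_1 = re.compile(r"G.{5}H.H.{3,4}E.{6}G")   # G(X)5HXH(X)3,4E(X)6G
MOTIF_2 = re.compile(r"G.{5}P.G.{2}H.{3}N")     # G(X)5PXG(X)2H(X)3N

# Smallest inter-motif spacing quoted in the text (some microbial enzymes).
MIN_INTERMOTIF = 11


def find_cupin_candidates(sequence, min_gap=MIN_INTERMOTIF):
    """Return (motif1_start, motif2_start, gap) for each Motif 1 / Motif 2 pair.

    Positions are 0-based; gap is the number of residues between the end of
    Motif 1 and the start of Motif 2. Only non-overlapping matches of each
    motif are considered.
    """
    seq = sequence.upper()
    hits = []
    for m1 in MOTIF_1.finditer(seq):
        # Search for Motif 2 only downstream of this Motif 1 match.
        for m2 in MOTIF_2.finditer(seq, m1.end()):
            gap = m2.start() - m1.end()
            if gap >= min_gap:
                hits.append((m1.start(), m2.start(), gap))
    return hits


if __name__ == "__main__":
    # Synthetic, hypothetical sequence built to satisfy both patterns.
    motif1 = "GAAAAAHAHAAAEAAAAAAG"
    motif2 = "GAAAAAPAGAAHAAAN"
    demo = "MK" + motif1 + "S" * 15 + motif2 + "LL"
    print(find_cupin_candidates(demo))
```

In practice, structure-based alignments of the kind described in this review are more sensitive than such a pattern scan; the sketch only makes the notation of the two motifs concrete.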
The various enzymatic and non-enzymatic functions will be described below in an overall survey of the cupin superfamily. It has been estimated previously that there is a minimum of 18 different functional subclasses (Dunwell et al., 2001), but this figure is now known to be much higher. However, experimental biochemical evidence can only be provided for a minority of these classes. It should be emphasised that much of the recent information on this superfamily has been derived from the rapidly increasing number of cupin structures (Fig. 2b) that have been resolved by X-ray crystallography or NMR techniques (Table 1), and from the subsequent use of structure-based alignments. Data from these studies have confirmed many of the early predictions about possible members of the superfamily, but have also extended membership to proteins that have very limited primary sequence similarity and were not included in the original reviews.
The different major cupin subgroups will be considered in turn below, based upon whether the proteins comprise a single cupin domain, or whether they have a duplicated (bicupin) or multicupin (>2 cupin domains) structure. In a review of this length, only a minimum of information can be provided about each type; emphasis will be given to recent publications and to proteins known to be present in plants or in microbes associated with plants. The reader is referred to previous reviews (Dunwell et al., 2000, Dunwell et al., 2001) for more detail.
Monocupins
These are proteins that include a single cupin domain, either at the centre of a simple protein (ca. 80–200 AAs in length) or as one of several domains in a complex protein (200–ca. 1000 AAs). They mostly comprise enzymes, but microbial non-enzymatic transcription factors (e.g. AraC), in which a sugar-binding effector domain is linked to a DNA-binding domain, are also well known.
Bicupins
These proteins are most likely to have evolved from the duplication and then fusion of a single domain ancestor (Dunwell et al., 2000, Dunwell et al., 2001), though the possibility remains in some instances of the fusion of two different monocupin precursors.
Multicupins
Recent analysis of genomic data has identified several examples of proteins that contain multiple (>2) copies of cupin domains (Dunwell, unpublished). These include potential examples such as the Arabidopsis protein gi|25406927 that comprises four 2OG-Fe2+ domains; to date it has no assigned function.
Concluding remarks
This review has described the remarkable chemical diversity found in a single protein superfamily, and confirmed the cupins as being amongst the most versatile of all protein folds described to date (Anantharaman et al., 2003). This realisation has come largely from the recent ability to integrate efficiently information from genomic sequencing projects with the great increase in the number of 3D protein structures (Table 1). Without this integration it would not have been possible to
Acknowledgements
Part of the work described here was supported by grants from the Biotechnology and Biological Sciences Research Council (BBSRC).
References (48)
- et al. (2001). JmjC: cupin metalloenzyme-like domains in jumonji, hairless and phospholipase A2β. Trends Biochem. Sci.
- et al. (2003). A 2-oxoglutarate-dependent dioxygenase is integrated in DIMBOA-biosynthesis. Phytochem.
- et al. (2002). Hypoxia-inducible factor (HIF) asparagine hydroxylase is identical to factor inhibiting HIF (FIH) and is related to the cupin structural family. J. Biol. Chem.
- et al. (2002). Crystal structure of the copper-containing quercetin 2,3-dioxygenase from Aspergillus japonicus. Structure
- et al. (1994). Structure of phaseolin at 2.2 Å resolution. Implications for a common vicilin/legumin structure and the genetic engineering of seed storage proteins. J. Mol. Biol.
- et al. (2002). Isolation of wound-responsive genes from chestnut (Castanea sativa) microstems by mRNA display and their differential expression upon wounding and infection with the chestnut blight fungus (Cryphonectria parasitica). J. Physiol. Mol. Plant Pathol.
- et al. (2002). In vivo functional dissection of human inner kinetochore protein CENP-C. J. Struct. Biol.
- et al. (2002). Structure and mechanism of anthocyanidin synthase from Arabidopsis thaliana. Structure
- et al. (2002). Identification and analysis of a conserved immunoglobulin E-binding epitope in soybean G1a and G2a and peanut Ara h 3 glycinins. Archiv. Biochem. Biophys.
- et al. (2003). Human and bacterial oxidative demethylases repair alkylation damage in both RNA and DNA. Nature