Elsevier

Phytochemistry

Volume 65, Issue 1, January 2004, Pages 7-17

Molecules of Interest
Cupins: the most functionally diverse protein superfamily?

https://doi.org/10.1016/j.phytochem.2003.08.016

Abstract

The cupin superfamily of proteins, named on the basis of a conserved β-barrel fold (‘cupa’ is the Latin term for a small barrel), was originally discovered using a conserved motif found within germin and germin-like proteins from higher plants. Previous analysis of cupins had identified some 18 different functional classes that range from single-domain bacterial enzymes such as isomerases and epimerases involved in the modification of cell wall carbohydrates, through to two-domain bicupins such as the desiccation-tolerant seed storage globulins, and multidomain transcription factors including one linked to the nodulation response in legumes. Recent advances in comparative genomics, and the resolution of many more 3-D structures, have now revealed that the largest subset of the cupin superfamily is the 2-oxoglutarate-Fe2+ dependent dioxygenases. The substrates for this subclass of enzyme are many and varied, and in total the subclass probably accounts for 50–100 different biochemical reactions, including several involved in plant growth and development. Although the majority of enzymatic cupins contain iron as an active site metal, other members contain either copper, zinc, cobalt, nickel or manganese ions as a cofactor, with each cofactor allowing a different type of chemistry to occur within the conserved tertiary structure. This review discusses the range of structures and functions found in this most diverse of superfamilies.

This review describes the diversity of structure and function in this superfamily that comprises a wide range of metalloenzymes and non-enzymic factors such as the seed globulins.

Introduction

Identification of the cupin superfamily was originally based on the realisation that the wheat protein germin, an unusual thermostable protein produced during the early phase of germination in wheat embryos, shared a nine amino acid sequence (HI/THPRATEI) with a stress-related protein, a spherulin, produced during starvation of the slime mould Physarum polycephalum. This similarity was then extended to a group of germin-like proteins (GLPs) in dicotyledonous plants, and subsequent comparative sequence analysis noted a much weaker, though consistent, level of similarity to the globulin storage proteins such as the desiccation-tolerant vicilins and legumins from plant seeds and spores. Knowledge of the 3D structure of these seed proteins enabled a more detailed structure-based alignment to be produced and this revealed a larger grouping of proteins, including many microbial examples (Dunwell and Gane, 1998); collectively they were given the annotation of cupins on the basis of their β-barrel shape (‘cupa’ is the Latin term for small barrel) (Dunwell, 1998). The characteristic cupin domain comprises two conserved motifs, each corresponding to two β-strands, separated by a less conserved region composed of another two β-strands with an intervening variable loop. The total size of the intermotif region varies from a minimum of 11 AAs in some microbial enzymes, to ca. 50 AAs in the non-enzymatic seed storage proteins, and to >100 AAs in certain eukaryotic transcription factors and dioxygenases (Dunwell et al., 2000, Dunwell et al., 2001). The characteristic conserved sequence was originally designated as G(X)5HXH(X)3,4E(X)6G for Motif 1 and G(X)5PXG(X)2H(X)3N for Motif 2 (Fig. 1); these two motifs were defined before confirmation (Woo et al., 2000) of the previous prediction (Gane et al., 1998) that the two His residues and the Glu residue in Motif 1, together with the His residue in Motif 2, acted as ligands for the binding of the active site manganese ion in the archetypal cupin, germin (Fig. 2a) (see below). It is now clear that the primary sequence of the two motifs is much less conserved than first suggested. For example, one of the two His residues in Motif 1 is substituted by Gln or Asp/Glu in some isomerases or dioxygenases (Fig. 1).
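
As an illustration of how the two originally designated motifs can be read, the following minimal Python sketch transcribes them into regular expressions (X, any residue, becomes ".") and scans a protein sequence for Motif 1 followed by Motif 2. The function name `find_cupin_domains` and the `max_gap` value of 150 are assumptions made for this example; the text states only that the intermotif region ranges from 11 AAs to more than 100 AAs. Because the primary sequence of the motifs is now known to be much less conserved than first suggested, a strict pattern of this kind would miss many genuine cupins and should be read as a sketch, not as a detection method.

```python
import re

# Motif patterns transcribed from the text; X (any residue) becomes "." in regex.
MOTIF_1 = r"G.{5}H.H.{3,4}E.{6}G"  # G(X)5HXH(X)3,4E(X)6G
MOTIF_2 = r"G.{5}P.G.{2}H.{3}N"    # G(X)5PXG(X)2H(X)3N


def find_cupin_domains(sequence, min_gap=11, max_gap=150):
    """Return (start, end, gap_length) for each candidate cupin domain.

    min_gap follows the 11-residue minimum given in the text.
    max_gap is a hypothetical upper bound chosen for this sketch.
    """
    # Lazy quantifier on the gap so the nearest downstream Motif 2 is used.
    pattern = re.compile(
        f"({MOTIF_1})(.{{{min_gap},{max_gap}}}?)({MOTIF_2})"
    )
    hits = []
    for match in pattern.finditer(sequence):
        hits.append((match.start(1), match.end(3), len(match.group(2))))
    return hits
```

Applied to a one-letter amino acid string, the function returns zero-based start and end offsets of each candidate domain together with the length of its intermotif region; relaxing the His positions in `MOTIF_1` (for example to allow Gln, Asp or Glu) would be one way to reflect the substitutions noted above.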

The various enzymatic and non-enzymatic functions will be described below in an overall survey of the cupin superfamily. It has been estimated previously that there are a minimum of 18 different functional subclasses (Dunwell et al., 2001), but this figure is now known to be much higher. However, experimental biochemical evidence can only be provided for a minority of these classes. It should be emphasised that much of the recent information on this superfamily has been derived from the rapidly increasing number of cupin structures (Fig. 2b) that have been resolved by X-ray crystallography or NMR techniques (Table 1), and by the subsequent use of structure-based alignments. Data from these studies have confirmed many of the early predictions about possible members of the superfamily, but have also extended membership to proteins that have very limited primary sequence similarity and were not included in the original reviews.

The different major cupin subgroups will be considered in turn below, based upon whether the proteins comprise a single cupin domain, or whether they have a duplicated (bicupin) or multicupin (>2 cupin domains) structure. In a review of this length, only a minimum of information can be provided about each type; emphasis will be given to recent publications and to proteins known to be present in plants or in microbes associated with plants. The reader is referred to previous reviews (Dunwell et al., 2000, Dunwell et al., 2001) for more detail.

Monocupins

These are proteins that include a single cupin domain, either at the centre of a simple protein (ca. 80–200 AAs in length) or as one of several domains in a complex protein (200–ca. 1000 AAs). They mostly comprise enzymes, but microbial non-enzymatic transcription factors (e.g. AraC), in which a sugar-binding effector domain is linked to a DNA-binding domain, are also well known.

Bicupins

These proteins are most likely to have evolved from the duplication and then fusion of a single domain ancestor (Dunwell et al., 2000, Dunwell et al., 2001), though the possibility remains in some instances of the fusion of two different monocupin precursors.

Multicupins

Recent analysis of genomic data has identified several examples of proteins that contain multiple (>2) copies of cupin domains (Dunwell, unpublished). These include potential examples such as the Arabidopsis protein gi|25406927 that comprises four 2OG-Fe2+ domains; to date it has no assigned function.

Concluding remarks

This review has described the remarkable chemical diversity found in a single protein superfamily, and confirmed the cupins as being amongst the most versatile of all protein folds described to date (Anantharaman et al., 2003). This realisation has come largely from the recent ability to integrate efficiently the information from genomic sequencing projects with the great increase in the number of 3D protein structures (Table 1). Without this integration it would not have been possible to

Acknowledgements

Part of the work described here was supported by grants from the Biotechnology and Biological Sciences Research Council (BBSRC).

References (48)

  • R Anand et al.

    Structure of oxalate decarboxylase from Bacillus subtilis at 1.75 Å resolution

    Biochemistry

    (2002)
  • V Anantharaman et al.

    Emergence of diverse biochemical activities in evolutionarily conserved scaffolds of proteins

    Curr. Opin. Chem. Biol.

    (2003)
  • L Aravind et al.

    The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases

    Genome Biol. 2, Research

    (2001)
  • T.J Begley et al.

    AlkB mystery solved: oxidative demethylation of N1-methyladenine and N3-methylcytosine adducts by a direct reversal process

    Trends Biochem. Sci.

    (2003)
  • S.L Bishop-Hurley et al.

    Isolation and molecular characterization of genes expressed during somatic embryo development in Pinus radiata

    Plant Cell Tissue Organ Cult

    (2003)
  • Boscariol, R.L., Almeida, W.A., Derbyshire, M.T., Mourao Filho, F.A., Mendes, B.M. 2003. The use of the PMI/mannose...
  • E.R Cober et al.

    Partial resistance to white mold in a transgenic soybean line

    Crop Sci.

    (2003)
  • B.G De Los Reyes et al.

    Cultivar-specific seedling vigor and expression of a putative oxalate oxidase germin-like protein in sugar beet (Beta vulgaris L.)

    Theor. Appl. Genet.

    (2003)
  • J.M Dunwell

    Cupins: a new superfamily of functionally-diverse proteins that include germins and plant seed storage proteins

    Biotechnol. Genet. Engin. Rev.

    (1998)
  • J.M Dunwell

    Future prospects for transgenic crops

    Phytochem. Rev.

    (2002)
  • J.M Dunwell

    Structure, function and evolution of the legumin seed storage proteins

  • J.M Dunwell et al.

    Evolution of functional diversity in the cupin superfamily

    Trends Biochem. Sci.

    (2001)
  • J.M Dunwell et al.

    Microbial relatives of seed storage proteins: conservation of motifs in a functionally diverse superfamily of enzymes

    J. Mol. Evol.

    (1998)
  • J.M Dunwell et al.

    Microbial relatives of the seed storage proteins of higher plants: conservation of structure, and diversification of function during evolution of the cupin superfamily

    Microbiol. Mol. Biol. Rev.

    (2000)