Introduction

High-throughput omic technologies (metabolomics, proteomics, transcriptomics, etc.) continue to generate a wealth of data that, when analysed in unison, potentially offer the key to our most comprehensive understanding to date of the fundamental complexities of life. Climate change, developing cleaner energy sources, combating environmental pollution, halting loss of biodiversity, improving nutrition and formulating directed health care are some of the major challenges facing society today. In order to address these challenges using omic technologies, it is important to be able to place the data they produce in context. Questions one might want to ask include: What was the original purpose of the experiment? What was the biological sample used? Where was it located? When was it obtained? What were the environmental conditions? This type of ‘data about data’ (or meta-data) is most often stored in laboratory notebooks and occasionally summarised for human consumption in scientific journals but, all too frequently, some information remains only in the mind of the investigator. With an increasing number of omic experiments now being carried out, the quantity of associated experimental meta-data has reached unprecedented levels, and traditional approaches to biological meta-data management are no longer adequate. In order to make use of this knowledge, there is a pressing need for formal approaches to experimental meta-data annotation, storage and retrieval. Here we (the Metabolomics Standards Initiative-Environmental Context Working Sub-Group, MSI-ECWSG) present a first step towards addressing this need in the context of environmental metabolomics.

Scope

The scope of our effort is to identify, develop and disseminate a core set of reporting requirements necessary for the minimal description of biological samples and procedures particular to environmental metabolomic experiments.

This effort should be considered within the wider context of the reporting requirements for all types of biological samples in metabolomics experiments, currently being developed by the Metabolomics Standards Initiative (MSI) of the Metabolomics Society (http://www.metabolomicssociety.org/), of which it is part (see Preface, this issue). Additionally, the requirements we have identified should be thought of as minimal and should not be considered an exhaustive set.

Aim

It is our intention that these reporting requirements should guide standardised annotation and support the development of a data exchange format and ontology for the dissemination and meaningful interpretation of environmentally derived metabolomic data. Furthermore, they should aim to do so in a range of identifiable contexts, such as academic journals, software tools and public databases.

What is environmental metabolomics?

Consistent with a general description of a sample for omics technologies (Morrison et al. 2006a), we define here a ‘biological sample’ as a discrete entity comprised of one or more organism(s), or parts thereof. Examples include: a species of plant or animal, a community of bacteria, a tissue biopsy or a biofluid (e.g. urine). Furthermore, a biological sample may also include a physical substrate as a component part. Examples include soil, sediment, seawater or ice-core samples.

A search of the literature provides many and varied definitions of the term ‘environment’, indicating the context dependency and semantic heterogeneity with which it is used. Famously, when asked for his definition, Albert Einstein replied: “The environment is everything that is not me”. This erudite response captures the fact that the environment is defined as the totality of circumstances external to a definable entity. In the context of these reporting requirements we define the environment as everything external to the ‘biological sample’ under investigation. The challenge, therefore, is not to provide reporting requirements for everything, but to provide a framework in which the environmental features considered relevant to a particular sample, which will be highly context dependent, can be captured in a structured form. We have also tried to arrange our requirements in a manner that makes them flexible and extensible, so that they can adapt to the needs of the community in the future.

This working group considers ‘environmental metabolomics’ to be the application of metabolomics to the investigation of both free-living organisms obtained directly from the natural environment (whether studied in that environment or transferred to a laboratory for further experimentation), and of organisms reared under laboratory conditions (whether studied in the laboratory or transferred to the environment for further experimentation), where any laboratory experiments specifically serve to mimic scenarios encountered in the natural environment.

Data generated by metabolomics experiments are often heavily influenced by a variety of other factors such as diet (Stella et al. 2006), gender (Plumb et al. 2003; Stanley et al. 2005), age (Pears et al. 2005), parasite load (Kant et al. 2004; Wang et al. 2004) and the breed/strain of the animal under study (Gavaghan McKee et al. 2006; Plumb et al. 2003), as well as diurnal (Plumb et al. 2003) and oestrus cycles (Bollard et al. 2001).

Environmental stressors such as temperature may also play an important role in affecting metabolic phenotypes. For example, an NMR-based metabolomics study has documented the effect of elevated seawater temperature on host-pathogen-drug interactions in bacterially infected red abalone (Haliotis rufescens), a shellfish, treated with oxytetracycline (OTC). At only 4°C above ambient temperature, subtle metabolic differences between OTC-treated and untreated abalone were detectable, but these were completely absent in animals maintained in ambient temperature seawater (Rosenblum et al. 2006). Population-specific metabolic phenotypes, based on metabolite fingerprinting (Dunn et al. 2005), were detected within the same species (Arabidopsis lyrata ssp. petraea) in plants grown from seed collected in different locations across Europe (http://www.petraea.shef.ac.uk/). These metabolic fingerprints for A. l. petraea were affected by exposure to low temperatures; other metabolic studies in different populations of Arabidopsis thaliana have also observed this (Hannah et al. 2006).

The above examples illustrate the importance of being as comprehensive as possible in recording information about environmental metabolomics experiments.

Diversity of participation

As a working group, we aim, through our diverse membership, to represent the views of the environmental life science community working in the area of environmental metabolomics in an unbiased and open fashion, and we have consulted widely within the environmental science community. The group itself includes members from academia and government bodies, and is composed of computer scientists, knowledge engineers, bioinformaticians, plant physiologists, environmental toxicologists, as well as environmental, physical and biophysical chemists.

Environmental metabolomic ‘real-world examples’

The group has engaged with environmental metabolomics practitioners through an iterative process: collecting and analysing real-world examples, generating reporting requirements and then validating the examples against them. This has been an extremely important process in the development of the requirements. Four examples were used to help define the reporting requirements, two of which (3 and 4 below) have been validated against the requirements. Details of all the examples can be found on the project website (http://msi-workgroups.sourceforge.net/bio-metadata/reporting/env/). The examples cover a range of taxa and environmental conditions and include: (1) Identifying and defining the bases of individual and population susceptibility and adaptation to environmental pollutants in fish: An integrated ‘omic’ approach (http://www.biosciences.bham.ac.uk/fishtoxicogenomics/); (2) Plant responses to abiotic stress at range margins (http://www.petraea.shef.ac.uk/); (3) Bioprospection of cyanobacteria and microalgae (Soukup et al. in preparation); and (4) Comparison of the effects of two compounds with established mode of action on global and specific biochemical responses in animal models (http://nomiracle.jrc.it/).

We have also employed ‘mind-mapping’ techniques (Castro et al. 2006) for sharing and discussing the semi-formal conceptual structure of the requirements. These can also be found on the project website (http://msi-workgroups.sourceforge.net/bio-metadata/reporting/env/).

Omic standards initiatives

In order to be most effective, the standardised annotation of environmental metabolomic data should integrate with and draw upon experience from other omic standardisation efforts. In developing these reporting requirements we have considered knowledge gained by a number of allied initiatives, all of which aim to standardise best practice within their respective communities, including the following: the Minimum Information About a Microarray Experiment (MIAME) of the Microarray Gene Expression Data (MGED) Society (Brazma et al. 2001); the Minimum Information about Genomic Sequences (MIGS) of the Genomic Standards Consortium (GSC) (Field et al. 2006); the Minimum Information About a Metabolomics Experiment (MIAMET) (Bino et al. 2004); a data model for plant metabolomics known as ArMet (Architecture for Metabolomics) (Jenkins et al. 2004); and the Minimum Information About a Proteomics Experiment (MIAPE) of the Human Proteome Organisation Proteomics Standards Initiative (HuPO-PSI) (Taylor 2006).

In particular, we drew upon knowledge and experience gained from the extension of MIAME for environmental transcriptomics (MIAME/Env) (Morrison et al. 2006b), which parallels the efforts of this group. We identified and inherited core reporting requirements fundamental to the description of environmental data from this effort, such as geographical location and environmental habitat.

Recognising the need for consistency, we are also interacting with two organisations that aim to define commonalities and harmonise the efforts of what is becoming an increasingly diverse array of initiatives, namely the Reporting Structure for Biological Investigations working group (RSBI) (Sansone et al. 2006) and the umbrella organisation Minimum Information about Biological and Biomedical Investigations (MIBBI, previously MIcheck) (Taylor et al. in revision). The MIBBI portal (http://micheck.sf.net/) is a ‘one-stop shop’ of extant and in-progress projects with the goal of fostering collaborative development and, ultimately, promoting integration. This checklist has been registered with the MIBBI portal.

Standard

Use of ontologies

‘An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms’ (Uschold et al. 1998).

We recommend that all of the reporting requirements mentioned in this document should reference publicly available ontologies or controlled vocabularies (CVs) wherever possible, such as those registered under the Open Biomedical Ontologies umbrella (OBO, http://obo.sourceforge.net). Currently in the early stages of development, the Ontology for Biomedical Investigations (OBI, http://obi.sourceforge.net/) promises to be particularly useful for describing the biological sample and associated environment. Our terminology requirements and recommendations will also be collected by the MSI Ontology Working Group (http://msi-ontology.sourceforge.net/) (Sansone et al. this issue), which is registered under OBO.

Reporting requirements for environmental metabolomics

The following are the reporting requirements developed by the ECWSG for environmental metabolomic experiments.

Instructions to users of the reporting requirements

We have presented these reporting requirements in a number of sections each focussed on one portion of the reporting process that we term ‘requirement groups’. The sub-grouping reflected in these sections has been designed to allow their interconnection, thus enabling as wide a variety of experimental models to be described as possible. It is not the aim of these reporting requirements to be prescriptive about how one should perform such experiments. Rather, we hope the manner in which we have organised them will be flexible enough to meet the demands of a dynamic, innovative and rapidly evolving discipline, both now and in the future.

These requirement groups and sub-groups are not a linear checklist. Therefore, one should not attempt to work through them in sequence. Instead, when describing a study, please consider the requirements from any of the following groups which are relevant. Please note that you may need to use some of the groups more than once in your description and some you may not use at all. For worked examples, see the project website (http://msi-workgroups.sourceforge.net/bio-metadata/reporting/env/).

The requirements have been split into three top-level sections:

  1. Sample (S): descriptions of the biological sample(s) involved in the study.

  2. Environment (E): descriptions of the environment(s) involved in the study.

  3. Process (P): descriptions of the processes involved in the study.

For each part of your experimental description, you should identify which group is most suitable, and follow the requirements of that group.

For example:

  • If your study involved obtaining shellfish from an aquaculture farm and cultivating them in a laboratory before analysis (Viant et al. 2003), then you should examine in particular those groups that describe: the biological sample (S1), the laboratory environment (E2), the aquatic environment (E3.2), and the processes carried out on the sample (P1–6).

  • If your study involved collecting whole plants from the side of a mountain, dissecting tissue and examining metabolites (e.g. see data reporting requirements from the ‘petraea’ project, available at http://www.petraea.shef.ac.uk/), then you should examine in particular those groups that describe: the biological sample (S1), the field environment (E1), the terrestrial environment (E3.1), the process of sampling (P1), and the process of dissection (P6).
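The selection of groups in the two examples above can be sketched in code. This is only an illustrative sketch: the dictionary `REQUIREMENT_GROUPS`, the function `describe_groups` and the variable names are hypothetical and are not part of the requirements; only the group codes and their meanings are taken from this document.

```python
# Hypothetical lookup table; codes and labels follow the requirement
# groups defined in this document.
REQUIREMENT_GROUPS = {
    "S1": "biological sample",
    "E1": "field environment",
    "E2": "laboratory environment",
    "E3.1": "terrestrial environment",
    "E3.2": "aquatic environment",
    "E3.3": "atmospheric environment",
    "E4": "biotic environment",
    "P1": "capture/sampling",
    "P2": "storage/preservation",
    "P3": "maintenance",
    "P4": "transportation",
    "P5": "acclimation",
    "P6": "general manipulation",
}


def describe_groups(codes):
    """Return (code, label) pairs, rejecting codes that are not defined."""
    unknown = [c for c in codes if c not in REQUIREMENT_GROUPS]
    if unknown:
        raise ValueError(f"Unknown requirement group(s): {unknown}")
    return [(c, REQUIREMENT_GROUPS[c]) for c in codes]


# The two example studies described above (P1-6 written out in full).
shellfish_study = ["S1", "E2", "E3.2", "P1", "P2", "P3", "P4", "P5", "P6"]
plant_study = ["S1", "E1", "E3.1", "P1", "P6"]

for code, label in describe_groups(plant_study):
    print(f"{code}: {label}")
```

Because the groups are not a linear checklist, a study is represented here simply as the list of groups that apply to it; a group may appear more than once or not at all.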

All requested information outlined below should be considered as ‘strongly recommended’ for submission and any missing information should be justified. The exception is any requested information marked in arial font which should be considered as optional further information.
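As a rough illustration of the rule that missing information should be justified, the following sketch lists requested items that are neither supplied nor justified. The function `unjustified_gaps`, the field names and the example values are all hypothetical; the requirements themselves do not prescribe any particular data format.

```python
def unjustified_gaps(record, requested, justifications):
    """List requested items that are missing and have no justification."""
    gaps = []
    for item in requested:
        value = record.get(item)
        missing = value is None or value == ""
        if missing and not justifications.get(item):
            gaps.append(item)
    return gaps


# Hypothetical description of an aquatic environment (field names assumed).
requested = ["water_temperature", "tidal_phase", "salinity"]
record = {"water_temperature": "15 C", "tidal_phase": None}
justifications = {"tidal_phase": "Laboratory aquarium; no tide"}

print(unjustified_gaps(record, requested, justifications))  # ['salinity']
```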

Requirement group—Sample (S)

S1—Description of the biological sample involved in the study (Footnote 1)

  • Taxonomic classification of organism(s). Please give details of all organisms sampled, as far along the taxonomic scale as possible, ideally to the levels of genus, species and sub-species. Refer to a taxonomic classification, such as the NCBI taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/) or the Integrated Taxonomic Information System (http://www.itis.gov/)

    • Also, include where possible:

      • Common name(s) (vernacular)

      • Genotype(s)

      • Ecotype(s)

  • Sample composition. Please provide details of the organism(s) that constitute the sample. Amounts may be described in absolute terms (number of individuals) or relative terms (e.g. 50% Organism X; 25% Organism Y; 25% Unknown).

  • Sample type. Please give details of the type of sample (for example: community, population, whole organism, organ, biofluid, etc.).

  • Condition of specimen(s). Please give details of general observations on health, etc.

  • Phenotypic characteristic(s)

  • Weight of specimen(s)

  • Age(s) of specimen(s)

  • Sex(es) of specimen(s)

  • Stage(s) of development

  • Image data. Please provide copies of any photographs of samples taken in the field during collection (or URLs to such images).
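One possible way to hold an S1 description in a structured form is sketched below. The class `SampleDescription`, its field names and the completeness check are assumptions made for illustration only; the composition values reuse the relative example given in the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SampleDescription:
    """Hypothetical container for a subset of the S1 items."""
    taxonomy: List[str]                 # e.g. genus, species, sub-species
    sample_type: str                    # e.g. "community", "biofluid"
    composition_percent: Dict[str, float] = field(default_factory=dict)
    common_names: List[str] = field(default_factory=list)
    condition: Optional[str] = None     # general observations on health

    def composition_is_complete(self, tolerance: float = 0.01) -> bool:
        """True if the relative composition adds up to 100%."""
        total = sum(self.composition_percent.values())
        return abs(total - 100.0) <= tolerance


sample = SampleDescription(
    taxonomy=["Organism X", "Organism Y"],
    sample_type="community",
    composition_percent={"Organism X": 50, "Organism Y": 25, "Unknown": 25},
)
print(sample.composition_is_complete())  # True
```

Recording an explicit "Unknown" share, as in the example, keeps the relative composition summing to 100% even when part of the sample has not been identified.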

Requirement group—Environment (E)

E1—Description of ANY field environment

  • Geographic location. Please specify latitude and longitude in decimal degrees. If relevant, you can also provide position in a local coordinate system e.g. the UK’s Ordnance Survey grid (http://www.ordnancesurvey.co.uk/)

  • Altitude/depth. Please specify in metres above/below sea level

  • Habitat. Please provide a descriptor of habitat type.

  • Meteorological conditions. For example:

    • Weather type (for example, sunny, snowing etc)

    • Humidity

    • Precipitation

    • Wind speed and direction

  • Lunar/solar phase

  • All other measured parameters. For example:

    • Pollutant concentration(s)
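Field notes often record position in degrees, minutes and seconds, whereas E1 asks for decimal degrees. The conversion can be sketched as follows; the function names and the example coordinates are hypothetical, and the sign convention (south and west negative) is the usual one but is an assumption here, as the requirements do not state it.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Assumes the common convention that S and W are negative.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def is_valid_location(latitude, longitude):
    """Check that decimal-degree coordinates fall within valid ranges."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


# Hypothetical sampling site: 52 deg 27' 0" N, 1 deg 55' 48" W
latitude = dms_to_decimal(52, 27, 0, "N")
longitude = dms_to_decimal(1, 55, 48, "W")
print(round(latitude, 4), round(longitude, 4))
print(is_valid_location(latitude, longitude))
```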

E2—Description of ANY laboratory environment

Please also refer to the MSI—In vivo context requirements and the MSI Plant context requirements (Both at: http://msi-workgroups.sourceforge.net/bio-metadata/reporting/) (Griffin et al. this issue; Nikolau et al. this issue).

  • Laboratory address and contact details

E3.1—Description of terrestrial environment

See also the requirements in E1, E2.

  • Inclination and aspect

  • Substrate type

  • Substrate temperature

  • All other measured parameters. For example:

    • Substrate pH

    • Substrate organic content

E3.2—Description of aquatic environment

See also the requirements in E1, E2.

  • Whether the sample(s) was submerged or emerged (at what depth, and for how long in this condition)

  • Water temperature

  • Tidal phase

  • All other measured parameters. For example:

    • pH

    • Salinity

    • Dissolved (in)organic content

E3.3—Description of atmospheric environment

See also the requirements in E1, E2.

  • Atmospheric temperature

  • All other measured parameters. For example:

    • Atmospheric pressure

    • (In)organic content

E4—Description of biotic environment (Footnote 2)

Please also refer to the MSI—In vitro Biology/Microbiology context requirements (http://msi-workgroups.sourceforge.net/bio-metadata/reporting/) (van der Werf et al. this issue).

  • Description of host organism

  • Relationship of organism(s) to host. For example: parasitism

  • All other measured parameters. For example:

    • pH

    • Temperature

Requirement group—Process (P)

Any description of a process must be accompanied by information identifying who performed the action, and the time-point (relative or absolute) or interval over which it occurred. Relative time points can be expressed in terms of a specific interval since, or until, an identifiable event, e.g. 4 h since dose of environmental toxicant, or 20 min after sunrise.

All processes should include a concise, ‘free text’ description, detailing the specific process that has been applied or taken place. As detailed above, if possible one should aim to include terms in the description that reference and are obtained from an ontology or CV.
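The common requirements for every process (who performed it, when it occurred, and a free-text description) can be sketched as a small record type. The classes `RelativeTime` and `ProcessRecord`, the field names and the example values are hypothetical; the convention that a positive offset means 'since' and a negative offset means 'until' an event is likewise an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RelativeTime:
    """An interval since (positive) or until (negative) a named event."""
    offset: timedelta
    event: str

    def resolve(self, event_time: datetime) -> datetime:
        """Absolute time, once the time of the reference event is known."""
        return event_time + self.offset


@dataclass
class ProcessRecord:
    group: str                # "P1" to "P6"
    description: str          # concise free-text description
    performed_by: str
    absolute_time: Optional[datetime] = None
    relative_time: Optional[RelativeTime] = None

    def __post_init__(self):
        if not self.performed_by:
            raise ValueError("a process must state who performed it")
        if self.absolute_time is None and self.relative_time is None:
            raise ValueError("a process needs an absolute or relative time")


# "4 h since dose of environmental toxicant"
sampling = ProcessRecord(
    group="P1",
    description="Fish netted from exposure tank",
    performed_by="Investigator A",
    relative_time=RelativeTime(timedelta(hours=4), "dose of toxicant"),
)
dose_time = datetime(2006, 1, 1, 8, 0)
print(sampling.relative_time.resolve(dose_time))  # 2006-01-01 12:00:00
```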

P1—Description of capture/sampling of sample or organism(s)

  • Description of capture/sampling procedure

    • Include details such as the method used, for example: Netted, electrically stunned, anaesthetised, razor cut, etc.

  • Reason for capture

  • Other capture parameters

P2—Description of storage/preservation of sample(s)

  • Description of storage/preservation procedure

    • Include details such as the storage/preservation medium, for example: Liquid nitrogen, dry ice, formaldehyde, etc.

  • Reason for storage/preservation

  • Other storage/preservation parameters. For example:

    • Temperature

P3—Description of maintenance of organism(s)

Please also refer to the MSI—In vivo context requirements and the MSI Plant context requirements (Both at: http://msi-workgroups.sourceforge.net/bio-metadata/reporting/) (Griffin et al. this issue; Nikolau et al. this issue).

  • Description of maintenance procedure

    • Include details such as the type of housing, for example: Cage, aquaria, continuous culture, seed bag or plant pot, etc.

  • Reason for maintenance

  • Other maintenance parameters. For example:

    • Feeding regime

    • Cage dimensions

P4—Description of transportation of samples or organism(s)

Transportation involves storage and/or maintenance; see also the requirements in either P2 or P3 as appropriate.

  • Description of transportation procedure

    • Include details such as the means of transport, for example: Refrigerated container, etc.

  • Reason for transportation

  • Other transportation parameters

P5—Description of acclimation of organism(s)

Acclimation involves maintenance; see also the requirements in P3 as appropriate.

  • Description of acclimation procedure

    • Include details such as the type of housing, for example: Cage, aquaria, continuous culture, seed bag or plant pot, etc.

  • Reason for acclimation

  • Other acclimation parameters

P6—Description of general manipulation of sample or organism(s)

Record details of any controlled manipulation carried out as part of the study.

  • Manipulation type. For example: Perturbation such as exposure to a toxicant, dissection, sacrifice etc.

  • Description of manipulation procedure. For example: Perturbation of specific environmental parameter; dissection of specific tissue

  • Reason for manipulation

  • Other manipulation parameters

Request for feedback

Environmental metabolomics is a diverse and heterogeneous discipline and we welcome input and representation from all members of the community. Please get involved in this working group by joining the mailing list at: http://msi-workgroups.sourceforge.net/. Alternatively please send any comments or feedback about these requirements to: Msi-workgroups-feedback@lists.sourceforge.net

The reporting requirements detailed above will be subject to revision by the ECWSG in response to the needs of the community. For the most up to date version of the requirements please refer to the project website: http://msi-workgroups.sourceforge.net/bio-metadata/reporting/.

Discussion

We have formed an active working group that has made significant progress in the development of a standard set of reporting requirements for the description of biological samples. These requirements have been developed in order to aid the standardised annotation, dissemination and interpretation of data with respect to the domain of environmental metabolomic experiments. However, we consider that the modular structure of the requirement groups should promote their reuse and will facilitate easier integration and harmonisation with other reporting requirements outside of this domain.

We believe that the decision to devolve the requirements analysis into four sub-groups (in-vivo/mammalian biology; plant biology; in-vitro/microbiology; environmental) has ensured that each of these domains has received appropriate representation from its respective community. As a next step, we recommend that the efforts of the groups should come together to form a unified set of reporting requirements to represent the ‘biological context of metabolomics experiments’.

There is a growing number of reporting requirements developed in association with distinct technological domains, such as transcriptomics and proteomics. For a comprehensive list, see those detailed on the MIBBI portal (http://micheck.sf.net/). In order to avoid duplication of effort, we suggest that the wider biological community would best be served if these efforts came together to identify a single ‘technology independent’ set of reporting requirements for biological samples and their manipulations.

These reporting requirements will also inform the development of data exchange standards (Hardy et al. this issue) in order to provide a mode of transport which will meet an urgent demand from the metabolomics community.