1 Introduction

Nutrition research in the 20th century led to the discovery of the functions of essential nutrients. Nutritional recommendations have been issued for populations to cover the requirements for these essential nutrients and to ensure the proper functioning of the organism. Beyond these essential effects, it is also clear today that many of these nutrients, together with non-essential bioactive compounds also present in foods and the diet, interact with a number of metabolic pathways and influence health, reducing or increasing the risk of diseases such as cancer or cardiovascular disease. Deciphering these complex interactions between nutrients and the human organism constitutes a considerable challenge for the 21st century (Doets et al. 2008).

The classical approaches in nutrition research are hypothesis-driven. Methods used to prove or disprove a hypothesis were largely derived from those used in pharmacology. However, nutrients do not interact specifically with a single defined target, as many drugs do; they most likely interact with a number of targets, metabolic pathways and functions. Furthermore, the magnitude of their effects is often much lower than that commonly observed for drugs. Recent research on vitamin E illustrates the limits of these approaches: tocopherols not only have a vitamin function but are also powerful free radical scavengers. Added to fats, they limit peroxidation and extend shelf life. It is known from animal studies and short-term clinical trials that they can also limit LDL peroxidation in the artery wall and improve some surrogate markers of cardiovascular disease risk (Huang et al. 2002; Meydani 2004). However, despite such evidence, the results of large intervention studies were largely disappointing and did not show a reduction in disease or mortality outcomes (Bjelakovic et al. 2007; Miller et al. 2004). Short-term intervention studies and the current surrogate markers therefore failed to predict the effects of vitamin E supplementation on disease and mortality.

Omics approaches, and metabolomics in particular, should make it possible to characterize the effects of a nutrient, a food or a diet with much greater precision. Metabolomics allows hundreds of metabolites to be analysed in a given biological sample (biofluid, tissue, cells, etc.). When applied to urine or plasma samples, it differentiates individual phenotypes better than conventional clinical endpoints or small sets of metabolites (Assfalg et al. 2008; Brindle et al. 2002; Yang et al. 2004). It also allows the metabolic effects of a nutrient to be explored in a more global way. In the field of nutrition, metabolomics has been used to characterize the effects of both deficiency and supplementation of different nutrients, and to compare the metabolic effects of closely related foods such as whole-grain and refined wheat flours (Fardet et al. 2007; Rezzi et al. 2007). It may also allow the effects of the diet to be better separated from those of confounding factors such as age, gender, physiological states and lifestyle, once the effects of these factors on the metabolome have been characterized in sufficient detail. Metabolomics and the food metabolome, made up of all the products of food digestion, may also be used to estimate food or nutrient intake from a urine, serum or plasma sample (Fardet et al. 2008b; Wishart 2008). Metabolomics may thus help solve problems associated with the methods currently used for measuring food intake (Manach et al. 2009). A literature search retrieved 128 papers dealing with metabolomics in human nutrition and published since 2001 (Web of Science, December 22, 2008). They included 45 original papers, 60 reviews and 23 papers focused on method development or the characterization of metabolome variability. Nearly two-thirds of these papers were published in the last 2 years.

In the majority of original papers (62%), proton NMR was used for data acquisition. However, owing to its limited sensitivity, no more than about 60 different metabolites are commonly estimated in biological samples (Martin et al. 2007). HPLC separations coupled with coulometric electrode array detectors are extremely sensitive (detecting subnanomolar electrochemically active species in sera) and can detect >1000 compounds in sera (Milbury 1997; Vigneau-Callahan et al. 2001), but their use remains limited by low throughput, the inability to observe non-electrochemically active species and difficulties associated with metabolite identification. Mass spectrometry (MS) techniques are also highly sensitive and provide spectral information (exact mass of the molecular ion, fragmentation patterns) that contributes to the identification of metabolites (Dettmer et al. 2007). For these reasons, the number of MS-based metabolomics studies is growing quickly and now exceeds that of NMR-based studies (Dettmer et al. 2007). Both targeted profiling (in which metabolites are known a priori) and fingerprinting (in which the identity of the metabolites of interest is established a posteriori) have been carried out in MS-based metabolomics. Targeted profiling is usually developed for the quantification of a given class of metabolites (lipids, fatty acids, acylcarnitines, bile acids, organic acids, nucleosides, etc.) and has been used for many years in nutrition research. However, the increasing power of MS techniques, which today allows the simultaneous analysis of several hundred metabolites, explains why it is now referred to as metabolomics (Altmaier et al. 2008; Watkins et al. 2002). MS-based fingerprinting has only recently been applied to nutrition, with about 10 papers published in 2008 (Clish et al. 2004; Fardet et al. 2008a; Kuhl et al. 2008; Shen et al. 2008). This approach offers considerable potential, but progress is still hampered by many unsolved problems (Table 1), most notably the lack of well-established and standardized methods or procedures, and the difficulties still encountered in identifying the discriminating metabolites (Jiye et al. 2005; Lawton et al. 2008; Wishart et al. 2008).

Table 1 Problems and recommendations for further developing MS-based metabolomics in nutrition research

This paper is based on the discussions held during the workshop “Tools and Methods for Mass Spectrometry Metabolomics in Nutrition” organized by NuGO, the European Nutrigenomics Organization (www.nugo.org) on December 12–14, 2007 in Clermont-Ferrand (France). Each section of the manuscript summarizes the problems identified and proposes recommendations to solve them.

2 Sampling strategy for metabolomics

In metabolomic studies, minimisation of unwanted sources of variation is important. Such variation can be broadly summarised as analytical and biological, and examples include (Maher et al. 2007): (1) sample collection, storage and stability, (2) sample pre-treatment, including metabolite extraction, prior to analysis, (3) instrument variation/stability, (4) intra-individual variation due to environmental factors such as diurnal variation and stress, (5) inter-individual variation due to genetic factors, and (6) inter-individual variation due to the presence/absence of disease. Failure to minimise such unwanted variation can have a negative impact on the outcome of the study, resulting, for example, in the identification of fewer biomarkers. This section of the workshop addressed some of the points above in the context of designing metabolomic studies for nutrition research. Most metabolomic studies are designed to address a specific hypothesis and, while much attention is devoted to the design from the hypothesis viewpoint, it is easy to overlook sample collection, stability and storage issues. Recently, a number of studies have highlighted the importance of these points (Gika et al. 2007, 2008; Lauridsen et al. 2007; Maher et al. 2007; Saude et al. 2007; Saude and Sykes 2007; Shurubor et al. 2007; Teahan et al. 2006; Zhang et al. 2007). While a number of these studies were carried out using NMR spectroscopy, the same issues and recommendations also apply to LC–MS and GC–MS approaches.

When designing a metabolomic study, the first issue to consider is the timing of sampling. Diurnal variation has been documented in urine samples taken from healthy volunteers (Maher et al. 2007; Walsh et al. 2006). However, one cannot simply decide to collect fasting morning samples, as changes have been reported between samples collected only 2 h apart (Maher et al. 2007). To minimise variation, a very specific description of urine sample collection needs to be given, such as “first void midstream urine samples”.

The next issue to be addressed is the effect of preservatives on urinary metabolic profiles. Lauridsen et al. investigated the effects of adding NaN3 and NaF to human urine samples (Lauridsen et al. 2007). The authors concluded that for long-term storage at −25 and −80°C there was no need for a preservative. However, if a preservative has to be added, for example because of extended storage at room temperature, the recommended preservative is NaN3 because of its limited interference with NMR spectra. NaN3 was shown to slow down changes in metabolite concentrations in urine kept at room temperature (Saude and Sykes 2007).

With respect to blood sample collection, recent studies have also highlighted the importance of how collection is performed. The generic advice to collect fasting blood samples is not enough if systematic bias is to be avoided. In fact, details such as clotting time, clotting temperature and treatment conditions prior to centrifugation need to be considered. Teahan et al. demonstrated that clotting time had an impact on the metabolic profile and that clotting on ice delayed the observed changes (Teahan et al. 2006). When collecting serum samples for metabolomics studies, one must record the clotting time and the temperature at which clotting occurred, and standardise these across all samples. Another issue that is often overlooked is that blood collection tubes can release materials into the samples and interfere with the mass spectrometry analysis (Drake et al. 2004). In addition, batch differences have been reported for certain vacutainers, and these concerns should likewise be addressed in the study design. With respect to plasma samples, one must consider the anticoagulant to be used and the possibility of unwanted peaks arising from it.

Sample storage and stability for human biofluids have been addressed for both NMR-based and LC–MS-based metabolomics approaches. Lauridsen et al. showed that urine samples should be stored at or below −25°C and recommended against storage at 4°C for prolonged periods (Lauridsen et al. 2007). No beneficial effect of storage at lower temperatures was seen for the duration of that study (up to 26 weeks) using NMR-based metabolomics. Similar results were reported by Maher et al. using NMR-based metabolomics (Maher et al. 2007). A recent study by Gika et al., using untargeted LC–MS and UPLC–MS approaches, showed that human urine samples were stable for up to 6 months when stored at −20 and −80°C (Gika et al. 2008). Human urine samples stored short-term at 4°C were shown by LC–MS profiling to be stable for at least 48 h, meaning that samples for this type of work can be considered stable in a chilled autosampler (Gika et al. 2008).

The effects of freeze-thaw cycles on metabolic profiles have been investigated using NMR, LC–MS and GC–MS. In the LC–MS study, up to nine freeze-thaw cycles of human urine samples taken from six individuals did not affect the clustering of the data in a PCA plot (Gika et al. 2007, 2008). Investigating the effects of a freeze-thaw cycle on the relative concentrations of 26 compounds in rat urine using GC–MS, Zhang et al. showed that one freeze-thaw cycle did not result in significant differences for the compounds analysed (Zhang et al. 2007). Studies in our laboratory showed that one freeze-thaw cycle had minimal impact on the NMR spectra, with within-bin correlations above 0.97 for 90% of the regions studied (unpublished data). Saude and Sykes used a targeted NMR-based approach and reported that repeated freeze-thaw cycles over 4 weeks resulted in an intermediate degree of metabolite change compared with storage at room temperature and at −80°C (Saude and Sykes 2007). Overall, to minimise confounding factors in urinary metabolomics, it is recommended to keep freeze-thaw cycles to a minimum.

Other unwanted sources of variation that are important to consider in study design are recent consumption of certain foods and medication. It has recently been demonstrated that standardisation of the subjects' diet reduced the variation in urinary metabolic profiles (Walsh et al. 2006). In addition, there are many reports where diet, alcohol and medication have resulted in outlying samples. When designing metabolomics studies, it is therefore essential to consider food restrictions and medication, and to have subjects record their dietary and medication intake for the 24 h prior to biofluid collection and, where possible, avoid the use of medication. A well planned study with collection of sufficient metadata (dietary, medication and physical activity at a minimum) should allow one to avoid the need for a pilot study and to use metabolomics to its full potential.

Blood samples contain a range of low molecular weight compounds and proteins, and efficient removal of the proteins is necessary prior to LC–MS or GC–MS analysis for metabolomic studies. Since human blood contains low molecular weight compounds with a wide range of concentrations, stabilities and protein-binding affinities, the development of an extraction procedure is complex. Recently, Want et al. examined a range of deproteinization methods in combination with LC–MS profiling and found that methanol precipitation was the most effective and reproducible approach, resulting in the detection of over 2000 metabolite features with less than 2% residual protein (Want et al. 2006). A recent study comparing two protein precipitation methods for plasma samples found the methanol method superior in terms of number of signals, sensitivity and reproducibility for use with UPLC–MS (Bruce et al. 2007). The methanol extraction procedure was also found to be highly efficient and reproducible for use with GC–MS (Jiye et al. 2005). However, it should be pointed out that these studies used an untargeted profiling approach; for targeted approaches, the deproteinization protocol will vary according to the metabolite classes targeted. While protein depletion is not necessary prior to acquisition of NMR spectra, previous studies have shown that it can be useful for gaining information on low-concentration metabolites (Daykin et al. 2002; Tiziani et al. 2008).

Faecal samples are another sample type relevant to nutrition research that requires pre-treatment prior to metabolomic analysis. Recent papers have used both water extraction and methanol extraction of frozen faecal samples in conjunction with NMR-based profiling (Jacobs et al. 2008; Saric et al. 2008). Both methods gave reproducible profiles containing complementary information (Jacobs et al. 2008). In addition, Saric et al. investigated the differences between water extractions from fresh and frozen stool samples and found only higher concentrations of amino acids and glucose in frozen samples. They therefore concluded that the use of frozen samples was acceptable (Saric et al. 2008).

All of the issues described above need to be considered, and standard operating procedures (SOPs) drawn up, for the biofluids used in each metabolomic study. The importance of this cannot be overstated, especially in the case of multi-centre nutritional intervention studies, if meaningful results are to be obtained. In addition, the adoption of a standard reporting system is essential for describing the collection of biofluids relevant to nutrition research. To this end, steps have been taken by the Metabolomics Standards Initiative, and a recent publication describes the recommended reporting requirements for biological samples (Griffin et al. 2007). In addition, NuGO intends to compile SOPs relevant to sample collection, storage and pre-treatment for nutritional metabolomics studies.

3 Mass spectrometry for metabolomics

Mass spectrometry (MS) has played an important role in the development of methods for profiling metabolites because of its selectivity and sensitivity. Broad screening approaches for metabolites in biofluids, combined with biostatistical tools for data evaluation, were first developed using GC and GC–MS in the 1970s. In the 1990s, major improvements in interfacing LC and MS (electrospray and atmospheric pressure chemical ionization) enabled the use of LC–MS for metabolite profiling. However, more comprehensive metabolite profiling methods were only reported after 2000 (van der Greef and Smilde 2005). In fact, acquiring a comprehensive metabolite profile requires more than one method (van der Greef et al. 2007). As the physicochemical properties (e.g. pKa, polarity, size) of metabolites cover a wide range, no single separation method (GC, LC or CE) can separate all metabolites. Nor is there one detector that can measure all metabolites: even a mass spectrometer cannot detect every metabolite, as some do not ionize under a given MS method or are present at concentrations that are too low. The dynamic range of most mass spectrometers is also still only 3–4 orders of magnitude, whereas the range of metabolite concentrations in biological samples is often much larger. In addition, a specific challenge in nutritional metabolomics is the diversity of dietary compounds and of the metabolites formed from them after ingestion. Many of these compounds have not yet been fully described. As an example, 869 metabolites have been detected in tomato, including 494 that are not found in the main metabolite databases and still await characterization (Iijima et al. 2008). With most metabolite profiling methods there are thus many unknown metabolites detected, and for many provisionally identified metabolites, standards for identification or quantification are still lacking.

To analyse the different types of metabolites found in biological samples collected in nutrition experiments, proper sample preparation, metabolite separation and MS detection are needed. This implies the choice of the most appropriate separation method (GC, LC or CE), interface between separation method and MS, and type of MS detector. Alternatively, no separation at all is used, and a direct infusion approach is followed (Boernsen et al. 2005). Important criteria for selection of a method are coverage, selectivity, dynamic range, detection limit, accuracy, precision, and price per sample.

The advantages of using GC are high separation efficiency and reproducible retention times, which can be compared between different labs via the retention index concept using retention time markers. With GC–MS, universal electron ionization (EI) is most often used. It provides a response for all metabolites and yields characteristic, reproducible and standardized mass spectral fingerprints. These mass spectra allow a peak to be identified by searching public and commercial databases. For the analysis of medium-polar and polar metabolites with GC–MS, metabolites are first derivatized, mostly by oximation and subsequent silylation (Fiehn et al. 2000). The oximation step protects certain carbonyl groups and ensures that only two peaks, in a reproducible ratio, are obtained for carbohydrates. The advantage of silylation is that a wide range of functional groups are derivatized, such as hydroxyl, amine, amide, phosphate and thiol groups. The disadvantage, however, is that some derivatized products, such as certain amino acids, are not very stable and can degrade during injection and separation. Therefore, internal quality markers are added to the sample to monitor the performance of the system (Kanani et al. 2008; Koek et al. 2006). For GC–MS, there are generally fewer options for MS detectors than for LC–MS. Most often, a quadrupole MS or a time-of-flight (TOF)-MS detector is used; alternatives are triple quadrupole MS and ion trap MS detectors. For comprehensive GC×GC, a fast TOF-MS is used. For the identification of metabolites with GC–MS, chemical ionization is often used to reveal the molecular weight of a metabolite. To increase the peak capacity of the GC separation, i.e. the number of separated metabolites, comprehensive multi-dimensional GC×GC approaches are increasingly used (Hagan et al. 2007; Koek et al. 2008). It has been shown that GC×GC–MS is in principle more robust for metabolite profiling than GC–MS. However, the data are more complex, and no optimal solution for data processing is available yet (Koek et al. 2008).
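To make the retention index concept concrete, the short Python sketch below converts a retention time to a retention index by linear interpolation between bracketing n-alkane markers (the van den Dool and Kratz formulation commonly used for temperature-programmed GC). This is a minimal sketch; the marker retention times are hypothetical examples, not values from any particular method.

```python
from bisect import bisect_right

# Hypothetical retention times (min) of n-alkane markers, keyed by carbon number.
ALKANES = {10: 5.20, 12: 7.85, 14: 10.40, 16: 12.70, 18: 14.80}

def retention_index(rt: float) -> float:
    """Linear (van den Dool and Kratz) retention index for a peak at rt."""
    carbons = sorted(ALKANES)
    times = [ALKANES[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the marker range")
    i = min(bisect_right(times, rt) - 1, len(times) - 2)  # bracketing marker pair
    n_lo, n_hi = carbons[i], carbons[i + 1]
    t_lo, t_hi = times[i], times[i + 1]
    return 100 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))

# A peak eluting at 9.10 min, between the C12 and C14 markers:
print(round(retention_index(9.10)))   # -> 1298
```

Because the index is anchored to the markers rather than to absolute times, it remains comparable between labs even when column dimensions or temperature programs differ slightly.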

For LC–MS, the advantage is that in principle no derivatization is required for the analysis of polar or high molecular weight metabolites, allowing fast analysis of small samples. A wide range of detectors are available for LC–MS, ranging from ultra-high-resolution MS such as Fourier transform ion cyclotron resonance (FT-ICR) and Orbitrap FT, through high-resolution MS (TOF), to low-resolution MS (ion traps, triple quadrupoles) and hybrid systems. The most recent additions are ion-mobility TOF-MS systems (Dwivedi et al. 2008). Different methods have been developed depending on the nature of the metabolites of interest. Reversed-phase liquid chromatography (RPLC)–MS has been used for global profiling of metabolites (Plumb et al. 2006). An RPLC–MS approach is also used for the profiling of lipids, allowing the detection of more than a hundred lipids of various classes (Hu et al. 2009; Laaksonen et al. 2006; Verhoeckx et al. 2004). Polar metabolites are mostly analyzed by hydrophilic interaction chromatography (HILIC–MS) (Bajad et al. 2006; Tolstikov and Fiehn 2002) or after derivatization before RPLC–MS (Carlson and Cravatt 2007b).

A disadvantage of LC–MS is ion suppression: the ionization of metabolites may depend on the presence of matrix compounds, particularly with electrospray ionization (ESI) and, to a lesser extent, atmospheric pressure chemical ionization (APCI). This can be overcome to some extent by miniaturizing electrospray ionization to nanospray ionization and by better separation of the metabolites. Obviously, the best quantitative results are obtained using isotopically labelled reference compounds for each metabolite in a targeted approach, but this cannot be applied to the profiling of a large number of metabolites, either because the labelled metabolites are not available or for cost reasons. Instead, isotope tagging methods can be developed for targeted metabolite classes (Guo et al. 2007). To increase the peak capacity in LC, options include smaller particles in LC columns (often requiring higher operating pressures) (Plumb et al. 2006) and longer columns such as monolithic columns (Tolstikov et al. 2003). A comprehensive LC×LC approach can also be used (Stoll et al. 2007), but it is less straightforward than GC×GC, as fewer options are available for refocusing peaks in the second dimension. High separation efficiency can also be obtained by CE–MS. However, migration times are often less reproducible and sensitivity is often lower than in LC–MS. Still, promising applications of CE–MS for metabolites carrying a charge at a given pH have been reported (Soga et al. 2003).

The strategy for identifying metabolites in LC–MS differs from that used in GC–MS, as most often only the molecular ion is detected. Additional MS/MS fragmentation experiments are required to obtain more information about the structure of the metabolites. However, these MS/MS spectra depend on the equipment and experimental conditions used, and are usually not as comparable as GC-EI-MS spectra (see Sect. 6 below). The use of TOF-MS and especially of Fourier transform MS (FT-ICR or Orbitrap) allows the acquisition of metabolite profiles with high mass accuracy, i.e. better than 3 ppm. This often allows the elemental composition of a metabolite to be determined using internal mass calibration; with TOF-MS data, the isotopic pattern is often taken into account as well (Kind and Fiehn 2007). In combination with databases and possibly MS/MS information, provisional assignment of those metabolites present in the databases is then often possible. However, several isomers often share the same elemental composition, and not all metabolites, especially dietary compounds and plant metabolites biotransformed after ingestion, are present in databases yet. Finally, the alignment of a set of LC–MS chromatograms is in principle more straightforward if the data are acquired at high MS resolution.
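As a minimal illustration of how such mass accuracy is used, the sketch below computes the ppm error of a measured m/z against a small list of candidate [M+H]+ masses and keeps only candidates within a 3 ppm window; the two candidates are illustrative examples, not a curated database.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass accuracy in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

# Illustrative [M+H]+ monoisotopic masses (hippuric acid C9H9NO3, glucose C6H12O6).
CANDIDATES = {"hippuric acid": 180.06552, "glucose": 181.07066}

measured = 180.0650
hits = {name: round(ppm_error(measured, mz), 2)
        for name, mz in CANDIDATES.items()
        if abs(ppm_error(measured, mz)) <= 3.0}   # the 3 ppm window quoted above
print(hits)   # -> {'hippuric acid': -2.89}
```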

Direct infusion (DI)-nanospray ionization-high resolution MS can also be used for metabolomics profiling (Boernsen et al. 2005), including for lipids (Han and Gross 2005). Its advantage is high sample throughput. In recent years the analytical performance of DI-MS has improved, and several applications in metabolomics have been published. Extension of the dynamic range was achieved with a wide-scan FT-ICR MS method in which multiple adjacent selected ion monitoring windows are collected and stitched together (Southam et al. 2007). If proper internal standards are used, linearity over nearly three decades can be achieved (Han et al. 2008). However, for compounds for which no proper correction with (preferably isotopically labelled) internal standards is possible, quantification is commonly limited by ion suppression effects. These can be prevented, at least to some extent, by more extensive sample preparation. Matrix-assisted laser desorption ionization (MALDI)-MS has also been used for metabolite profiling; however, when applied to small molecules, it is generally less quantitative than ESI (Wang et al. 2009).

A key aspect of metabolite profiling methods with MS is how to achieve comparability of studies and data. Recently, a set of minimum requirements for reporting the chemical analysis of a metabolite profiling experiment was suggested (Sumner et al. 2007). However, minimum requirements for the validation of metabolite profiling methods have not yet been described, and one often encounters publications in which the validation of an analytical metabolite profiling method is poorly described. Validation should include at least a description of the calibration model (linearity and range), repeatability and intermediate precision, accuracy (use of matrix variation studies, etc.) and the lower limit of quantification. In addition, the recovery, reproducibility, robustness (e.g. different technicians, freeze-thaw cycles, etc.) and carry-over should be characterized. When large series of samples are analysed, the use of quality control samples is recommended to monitor the analytical system's performance and to compensate for drift of the chromatographic system or the mass spectrometer (van der Greef et al. 2007). Comparisons between metabolomics studies are best achieved by ultimately reporting concentrations of identified peaks rather than relative peak areas. In addition, internal standards added to the sample prior to sample preparation improve quantification accuracy.
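A common way to use such QC samples is to fit, for each feature, a smooth trend to the QC intensities as a function of injection order and to divide all injections by that trend. The sketch below is a minimal version of this idea, assuming pooled QC samples injected at regular intervals; production workflows typically use more robust smoothers than a low-order polynomial.

```python
import numpy as np

def qc_drift_correct(intensity, order, is_qc, deg=2):
    """Correct within-batch signal drift for one feature: fit a low-order
    polynomial to the QC intensities versus injection order and divide
    every sample by the fitted trend, rescaled to the median QC level."""
    intensity = np.asarray(intensity, dtype=float)
    order = np.asarray(order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)
    coef = np.polyfit(order[is_qc], intensity[is_qc], deg)
    trend = np.polyval(coef, order)
    return intensity * np.median(intensity[is_qc]) / trend

# Toy run: a linear downward drift imposed on a constant true signal.
order = np.arange(12)
drifted = 1000.0 * (1 - 0.03 * order)   # 3% signal loss per injection
is_qc = (order % 3 == 0)                # every third injection is a pooled QC
print(np.round(qc_drift_correct(drifted, order, is_qc, deg=1), 1))
# -> all values flattened to the median QC level
```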

Nutritional metabolomics experiments can be conducted in two ways: (i) a non-targeted profiling approach (using direct infusion MS or LC/GC/CE–MS), where the data obtained may be only semi-quantitative, or (ii) a more targeted profiling approach aiming at quantitative data through the use of more internal standards and reference compounds, ultimately yielding absolute concentrations. Often a non-targeted, semi-quantitative profiling approach is applied first for hypothesis finding, followed by a targeted quantitative approach to zoom into biochemical processes and mechanisms and to validate mechanisms of action or markers. The number of metabolites covered by the various MS analytical methods varies widely, depending in part on whether a non-targeted or a targeted approach is used. With a GC–MS global profiling approach applied to plasma, the identity of only about 30–70% of the detected peaks is commonly assigned. In LC–MS experiments, far fewer peaks are generally assigned (Wishart et al. 2008). With targeted profiling, often 50–200 metabolites can be analyzed with one analytical method. Using a set of complementary methods, e.g. GC–MS and LC–MS, about 100–500 metabolites can often be analysed in blood in a targeted approach, and about 600–1000 can be detected in fingerprinting mode (Jiye et al. 2005; Lawton et al. 2008; Shaham et al. 2008; van der Greef et al. 2007). Clearly, the coverage of these methods has to be improved to include all metabolites of importance in nutrition research, some of which are present at low concentrations and have not been measured in the metabolomics studies published so far. A list of relevant metabolites for the nutrition research field will be helpful (see Sect. 7) to guide the development of analytical metabolomics platforms.

4 Extraction of information from mass spectrometry metabolomics data

Extraction of information from metabolomics data has always been a problematic process. This was true throughout the long history of comprehensive analysis of biological materials and still holds today. For example, in the study of medicinal plants and in plant cell biotechnology, a metabolomics-like approach was daily routine. In the ‘pre’ HPLC, GC, MS and NMR era, the low-tech gold standard was thin layer chromatography (1D or 2D), because it was cheap and provided detailed information about sample composition (primary and secondary plant metabolites). Metabolites were visualized by colouring reactions with a multitude of different spraying reagents and/or fluorescence, and the data were captured by photography and notes in logbooks. Data extraction was very laborious, inefficient and ineffective. Progress in analytical technology brought HPLC, GC–MS, LC–MS and NMR to these labs, but in the early days of these techniques data were still captured on paper recorder rolls, photographic plates, etc. Only after the computer revolution of the 1980s did digital data systems become commonplace for capturing analytical data. We finally had digital data, and the main purpose of the software that came with analytical instruments was visualization of data, and computer-aided extraction of data in a manner similar to what had previously been done with a ruler, a pair of scissors and other utensils.

Currently NMR and MS (LC–MS and GC–MS) are the metabolomics workhorses. In the past decade, advances in analytical technology have provided us with reliable instrumentation capable of producing vast amounts of very rich data at ever higher speeds, simply because we need this (that is what we told the instrument companies). The resulting data burden is commonly regarded as one of the major bottlenecks in analytical laboratories, and it is not specific to metabolomics. The complexity and richness of metabolomics data (and also of mass spectrometry proteomics data) make the ‘omics’ community the hardest hit.

Extraction of information from mass spectrometry metabolomics data may be done with a diverse set of approaches, methods and tools. The fundamental problem of data extraction is that the same data, extracted with n different methods, give n different data sets and probably n different statistical models (if a single statistical procedure is applied). In applications where major changes to the metabolome occur (e.g. toxicity models), the differences in the final outcome are relatively small. Unfortunately, in applications where metabolome changes are more subtle (e.g. nutrition research), data extraction errors have a dramatic impact on the outcome of a study. For 1D-NMR, popular data extraction methods are binning (total signal in fixed chemical shift regions of 0.01–0.05 ppm) and peak-picking (peak finding, baseline subtraction and alignment) (Forshed et al. 2005; Vogels et al. 1996). Recently, deconvolution of NMR spectra into compounds has been added to these two (Weljie et al. 2006). This approach is superior from a fundamental, theoretical perspective because it delivers quantitative metabolite data and not just NMR signal vectors. It is obvious that these three fundamentally different methods will give different study outcomes when applied to the same data.
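As an illustration of the simplest of these three methods, the sketch below bins a 1D-NMR spectrum into fixed chemical-shift regions by summing intensities (0.04 ppm bins, within the 0.01–0.05 ppm range quoted above); the spectrum itself is a synthetic toy example.

```python
import numpy as np

def bin_spectrum(ppm, intensity, bin_width=0.04, lo=0.5, hi=10.0):
    """Sum a 1D-NMR spectrum into fixed chemical-shift bins."""
    edges = np.arange(lo, hi + bin_width, bin_width)
    idx = np.digitize(ppm, edges)          # bin index for every data point
    n_bins = len(edges) - 1
    binned = np.zeros(n_bins)
    for i in range(1, n_bins + 1):
        binned[i - 1] = intensity[idx == i].sum()
    return edges[:-1], binned              # left bin edges, binned signal

# Toy spectrum: 20,000 points with two narrow peaks at 3.2 and 7.3 ppm.
ppm = np.linspace(0.5, 10.0, 20000)
intensity = (1.0 / (1 + ((ppm - 3.2) / 0.01) ** 2)
             + 0.5 / (1 + ((ppm - 7.3) / 0.01) ** 2))
edges, binned = bin_spectrum(ppm, intensity)
print(round(edges[binned.argmax()], 2))    # bin containing the largest peak
```

Binning trades resolution for robustness to small peak shifts, which is precisely why it yields a different data set, and potentially a different statistical model, than peak-picking or deconvolution applied to the same spectra.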

Hyphenated techniques such as LC–MS, GC–MS and in particular GC×GC–MS, as well as 2D-NMR, produce far more complex data than 1D-NMR because of the increased dimensionality. On top of the differences between data extraction methods, all these techniques have a large number of operation and acquisition modes, and other factors further increase the data diversity, e.g. for mass spectrometry:

  • centroid or profile data acquisition

  • fixed scan time (ToF, Q) versus variable scan time (IT)

  • positive, negative and alternating/mixed polarity

  • full scan MS1 versus SRM in MSn

  • nominal mass resolution to ultra high resolution

  • the ability to mix all of the above

  • proprietary data formats and instrument specific acquisition modes/experiments

All instruments come with software for quantitative data processing as used in routine analysis (e.g. bioanalysis, residue analysis, environmental analysis, etc.). Unfortunately, this software was not made for processing very rich, comprehensive metabolomics LC–MS and GC–MS data (e.g. 10,000 features in a single file) of unknown composition. It can be used, but this limits the number of compounds to a few hundred. It should be realized that multiple-target processing with the standard software yields the best data quality because the process is supervised and transparent. Integration errors are relatively easy to detect and correct through well designed and functional user interfaces. Furthermore, the output does not contain contaminant peaks, isotopes, (auto)adducts, fragments, multiple charge states, etc.; in other words, it is very clean data. However, the price of good data quality is time, on the order of 2–4 weeks (or more), depending on the number of target metabolites and the number of samples. In addition to being time consuming, this data extraction approach has another, more fundamental drawback: it does not take full advantage of the richness of the data and the conceptual aspect of metabolomics; the data are incomplete, and information and markers are lost in the process. The only solution to this is the use of brute-force, comprehensive, automated data extraction tools.

Instrument companies have only recently become active in producing automated data extraction tools for metabolomics and proteomics (e.g. Waters, AB, Thermo, Agilent, Bruker) in order to sell Plug & Play metabolomics systems. The major disadvantages of these proprietary tools are that (1) they only work for specific types of data and data formats and (2) they are black-box systems (little is known about the underlying algorithms). At the same time, more and more third-party software is becoming available for automated data extraction, e.g. GeneData, ACD-Labs, Rosetta and Non-Linear Dynamics, to name a few. These programs offer more flexibility with respect to the format of the input LC–MS data, which is a clear advantage when using instruments from different vendors, because data from the different instruments can be processed in exactly the same way.

The proteomics and metabolomics communities (and others) have been very active over the last 15 years in developing their own tools, mainly because commercial software was (and is) not available and/or because of the poor performance of some of the available software. Some of these DIY data extraction tools are freely available on the web (the CODA algorithm (Windig et al. 1996), XCMS (Smith et al. 2006), OpenMS (Sturm et al. 2008), MZmine 1 and 2 (Katajamaa et al. 2006), MetAlign (Tolstikov et al. 2003), etc.) and appear to be cheap solutions to the data extraction problem. In reality they are not cheap, because they all require training and a good understanding of what the software is doing right and wrong. On top of that, support and long-term continuity are often uncertain. For an excellent overview and description of these tools and software, see Katajamaa and Oresic (2007).

The various automated data extraction tools (commercial, public domain and DIY) use different algorithms. Popular approaches include:

  1. Peak detection and integration, followed by alignment in the feature space.

  2. Warping of the m/z–retention time plane (alignment at the raw data level), background and blank subtraction, followed by peak detection and integration.

  3. De-isotoping of the data after step 1 or 2, e.g. using correlation analysis or accurate mass data (de-isotoping may also be done as a first step in data extraction).

  4. Metabolite extraction using spectrum information: deconvolution, spectrum libraries, accurate mass, etc.

  5. …

Analogous to NMR (see above), the extraction of metabolite information (approach 4) is strongly preferred over the extraction of all features in the data: obtaining 15,000 features from a single LC–MS file may seem very impressive, but these features represent a much smaller number of metabolites.
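To make approach 1 concrete, the sketch below performs a naive alignment in the feature space: features from different runs are merged into one group when they agree within an m/z tolerance (in ppm) and a retention-time tolerance. The tolerances and toy features are illustrative assumptions; real tools use far more sophisticated matching and handle isotopes, adducts and missing values.

```python
def group_features(runs, mz_tol_ppm=10.0, rt_tol=0.2):
    """Greedy feature-space alignment across samples.

    runs : list of feature lists, one per sample; each feature is
           a tuple (mz, rt_min, peak_area).
    """
    groups = []   # each group: {'mz': ..., 'rt': ..., 'areas': {run_index: area}}
    for run_idx, features in enumerate(runs):
        for mz, rt, area in features:
            for g in groups:
                if (abs(mz - g["mz"]) / g["mz"] * 1e6 <= mz_tol_ppm
                        and abs(rt - g["rt"]) <= rt_tol):
                    g["areas"][run_idx] = area   # same feature, another run
                    break
            else:
                groups.append({"mz": mz, "rt": rt, "areas": {run_idx: area}})
    return groups

# Two toy runs; the first feature pair should merge, the others should not.
run_a = [(180.0655, 5.01, 1.2e5), (255.2330, 9.80, 3.0e4)]
run_b = [(180.0657, 5.05, 1.1e5), (301.1410, 12.40, 8.0e3)]
for g in group_features([run_a, run_b]):
    print(round(g["mz"], 4), round(g["rt"], 2), g["areas"])
```

Even in this toy form, the choice of tolerances changes which features merge, which is exactly how different extraction settings produce different data sets from the same raw files.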

A quick search on the web turned up approximately 20 different tools for automated data extraction (commercial and public domain). None of these tools is generic; i.e., each can only be used for either low- or high-resolution data, centroid or profile data, etc. The functionality and performance of all these tools remain a white spot on the metabolomics map, and it is widely known that all automated data extraction tools have their specific problems, which result in data with a variable number of errors. The fact that high quality raw data are corrupted by these errors is very alarming. Typical problems include:

  • missed peaks

  • integrated noise peaks

  • database mismatches

  • misalignment

  • integration errors

These errors become more and more dominant in the data at low signal-to-noise ratios. All algorithms eventually stumble on classification problems such as: is this noise or signal? Is this peak A or B or neither of the two? Are these spectra the same? And so on. Unfortunately, many metabolites of interest happen to be present at low concentrations and thus at low signal-to-noise. These extraction errors have major detrimental effects on the outcome of metabolomics studies. The majority of these tools, unlike the standard quantification tools provided by instrument companies, are applied without supervision in batch processing mode, and typically lack interactive review and error correction functionality.

It is obvious that there is a great need for improvements in the area of data extraction. The ultimate goal is a (set of) perfect tool(s), but this is expected to take quite some time. Development of new and better algorithms and software (user interfaces, databases, etc.) is a relatively slow process, and a lag between the availability of appropriate data extraction tools and rapid technological/methodological innovation is inevitable. A serious implication of this lag is that there is no point in waiting for the perfect solution. It is equally important to focus on workflow optimization and quality control measures to detect, characterize and reduce data extraction errors with the currently available metabolomics data extraction tools. International collaboration between data extraction software experts and users is essential, and would ideally involve programs for selecting well characterized test data (LC–MS, GC–MS, etc.) and benchmarking existing and new data extraction tools/algorithms. Analysis and sharing of benchmarking results (the good and the bad) is essential for improving data extraction software. This iterative process will eventually lead to many, almost flawless, solutions for high throughput automated data extraction.

5 Multivariate analysis considerations for metabolomics studies

The multivariate analysis considerations and procedures for metabolomics studies are roughly comparable to those in the microarray field (Broadhurst and Kell 2006; D’Haeseleer 2005; Goodacre et al. 2007; Westerhuis et al. 2008). Similarly, the vast majority of issues that affect the multivariate analysis component of any metabolomics study also affect nutritional metabolomics (Broadhurst and Kell 2006; Garosi et al. 2005). Basic approaches to data analysis include classification, dimensional reduction, visualization, pattern recognition and modeling (Broadhurst and Kell 2006; Brown et al. 2000; Goodacre et al. 2007; Jansen et al. 2004; Lindon et al. 2001; Tamayo et al. 1999; Trygg and Wold 2002; Wold et al. 2001). This section addresses multivariate analysis in the context of the mathematical (dimensional) reduction of large multivariate or megavariate data sets, e.g. into smaller parcels of data that can be comprehended by humans and/or into models that can predict behavior or classify a sample into a group of interest (e.g. control, diseased). Bioinformatics issues related to the biological interpretation of the data, e.g. pathway and/or literature annotation and systems biology, and issues related to inter-lab data sharing and integration are discussed in Sect. 7.

Arguably the most important conceptual issue that distinguishes nutritional metabolomics studies from those in other areas, such as toxicology, pharmacology or disease diagnostics (Bhattacharjee et al. 2001; Garber et al. 2001; Garosi et al. 2005; Kenny et al. 2008; Lindon et al. 2001), is that the signals may be more subtle, an issue that makes the specifics of experimental systems even more demanding. In general, the choice of experimental question is, or more precisely needs to be, directly and inextricably linked with the experimental design (Bidaut et al. 2006; Brown et al. 2000; Jansen et al. 2004; Taylor et al. 2002; Trygg and Wold 2002; Wold et al. 2001). It is at best extremely difficult and, often, impossible to rescue a study that is poorly designed. By building the multivariate analysis in as part of the experimental design, it is possible to avoid studies that are doomed from the start and to make sure the study will be sufficiently powerful (statistically speaking) to draw conclusions, define models, or make predictions—all while avoiding over-fitting, the bane of large multivariate studies (Broadhurst and Kell 2006; Westerhuis et al. 2008).

One constant trade-off that exemplifies the issues of experimental/analytical design is between specificity and power (such as occur when building a model in a tightly controlled biological model system) versus robustness and generalized applicability (such as occur when building a model in a broad human population) (Bhattacharjee et al. 2001; Garber et al. 2001; Kristal et al. 2005). A general study seeking to describe the metabolome present in a person or animal on one or another given diet, for example, is inherently different from a study aimed at building a profile that can distinguish individuals on those diets (Shi et al. 2004). This, in turn, differs from one that seeks to distinguish the effects of these diets in only one specific gender or at a specific age (Paolucci et al. 2004a, b). Similarly, what constitutes a control group (Bhattacharjee et al. 2001; Garber et al. 2001)? Is someone who is medicated or someone with a known disease included (Rozen 2005)? Are they equally represented (Paolucci et al. 2004a, b)? Are the numbers representative of the general population? Such questions continue as one defines progressively more specific/narrow criteria (or, alternatively, progressively broader, more robust criteria). Another choice is whether one wants to find the few most powerful single markers (Kenny et al. 2008) or whether one aims at a general profile (Goodacre et al. 1996; Paolucci et al. 2004b). Is the long-term goal clinical or scientific, a choice that often changes the acceptable options for model building? Clinical models generally prioritize robustness, simplified upstream analysis, and often cost per analysis and throughput, whereas scientific models might prioritize maximum information about a very limited sample set and accept initially costly and complicated analytical approaches.

Another closely related issue in terms of matching the multivariate analysis approach and the experimental design consists of determining, ideally in advance, what the quality control metric will be, and how errors will be viewed (Broadhurst and Kell 2006; Paolucci et al. 2004b; Rubingh et al. 2006; Tominaga 1999; van den Berg et al. 2006; Wold et al. 2001). In practice, the standard to which one’s data analysis must be held is often considered—for better or worse—an extremely subjective one. In some cases, the chosen metric will be an internal validation, e.g. examining overfit of class assignment by permutation analysis (Broadhurst and Kell 2006; Paolucci et al. 2004b; Rubingh et al. 2006; Westerhuis et al. 2008). Alternatively, the standard might be the same metric, but from a blinded cohort (Shi et al. 2004). In a descriptive study, it may be the development of a model that captures class identity within a cohort, or, alternatively, between two different cohorts (Shi et al. 2002, 2004). In other cases, it may be clinical measures such as specificity, sensitivity, positive predictive value, or negative predictive values. In some cases, a measure of distance from one or more classes might be of interest; in others, a more Bayesian viewpoint of probabilistic class assignment may be considered more useful. Finally, reciprocal errors cannot always be considered equivalent, e.g. the clinical cost of a false positive and false negative clinical test differ substantially.

A second general concept is that of over-fitting (Broadhurst and Kell 2006; Rubingh et al. 2006; Westerhuis et al. 2008). Over-fitting is probably the single greatest multivariate analysis problem that we observe today. Basically, over-fitting is the consequence of an incorrect (over-zealous) use of a multivariate analysis approach to describe a dataset and/or make predictions (e.g. about classification), for example using PLS-DA to find metabolites that distinguish two classes of interest. The problem lies in the ability of algorithms such as PLS-DA to find solutions in spaces where no real solution exists. For example, PLS-DA can separate two groups composed entirely of random data (Westerhuis et al. 2008). The gold standard here is biological replication in a blind, new dataset (Kenny et al. 2008; Shi et al. 2002, 2004).
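This behaviour is easy to reproduce. The sketch below (in Python with scikit-learn, as an assumed toolchain) fits a PLS-DA style model to purely random data and obtains near-perfect apparent accuracy on the training samples; permuting the class labels does not degrade it, showing that the apparent separation carries no class information and that validation must rely on held-out or replicated data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))        # 20 "subjects", 500 purely random features
y = np.array([0] * 10 + [1] * 10)     # two arbitrary class labels

def plsda_train_accuracy(X, y, n_components=2):
    """PLS-DA as PLS regression on 0/1 labels; accuracy on the TRAINING set."""
    model = PLSRegression(n_components=n_components).fit(X, y)
    return np.mean((model.predict(X).ravel() > 0.5) == y)

print("apparent accuracy on random data:", plsda_train_accuracy(X, y))  # ~1.0

# Permutation check: shuffled labels fit just as "well", so the apparent
# separation is a fitting artefact, not information about the classes.
perm = [plsda_train_accuracy(X, rng.permutation(y)) for _ in range(100)]
print("mean accuracy with permuted labels:", np.mean(perm))             # ~1.0
```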

The third general concept is that of disclosure. At the level of the field as a whole (or the individual who may be reading a grant or paper in it), one of the solutions, indeed arguably the best solution, to this suite of problems is full disclosure of what is done. The overall goal of the Metabolomic Standard Initiative (http://msi-workgroups.sourceforge.net/) is to enable different metabolomics groups to talk with each other. Within the multivariate analysis component of the standards initiative the goal is, in part, to request, and, ideally, eventually demand, sufficient information so as to be able to really understand what was done and catch key problems (Goodacre et al. 2007). Thus we seek to define and encourage standardization at the level of reporting data, not at the level of conducting analysis.

The fourth and final general concept that we can now address is the most powerful way to attack metabolomics data. This question may be broken into at least three pieces. (i) Do we know, for any question of interest, of tools that always work? To our knowledge, the answer is no. (ii) Do we know, for any specific class of metabolomics question (e.g. modeling nutritional intake, discriminating individuals on either of two diets), whether there are tools (i.e. algorithms/multivariate approaches) that always outperform other tools with respect to a given metric (e.g. accuracy, robustness, overfit characteristics)? In the experience of one of us (BSK), components-based analyses (PCA, PLS, PLS-DA, etc. (Trygg and Wold 2002; Wold et al. 2001)) always perform as well as or better than purely distance-based algorithms, such as clustering and self-organizing maps, on metabolomics data (Paolucci et al. 2004a; Shi et al. 2002, 2004). This observation is, however, specific to a very defined problem: caloric intake in rats. That said, we have looked at other (non-metabolomics) data sets where self-organising maps and clustering outperform projections. Others have had success with machine learning approaches, support vector machines, etc. (Ellis et al. 2002; Kenny et al. 2008; Lindon et al. 2001; Tominaga 1999). Together, even this anecdotal argument suggests that we are not yet ready to answer this question.

Indeed, while it is conceptually true that we should be able to match algorithm to question, it appears that we remain a long way from being able to assign this match with confidence [i.e. which method(s) work for which question(s)]. Not everyone agrees. There are, for example, those who are convinced that genetic algorithms, random forests or PLS-DA answer all questions about group classification, but in reality each has strengths and weaknesses. In some datasets, all three will yield essentially equivalent variables as being important and equivalent class assignments. We have also seen the opposite, where each identifies different variables as important and makes different class predictions (and all with relatively high accuracy). Thus, for the moment, we have no robust answers, only answers that are valid within a defined experimental lens.

6 Metabolite identification in mass spectrometry-based metabolomics

6.1 Current state of the art

Metabolite identification is an essential part of any metabolomics experiment. Unfortunately, because metabolite identification is one of the most difficult and time-consuming steps in metabolomics, this crucial process is often deferred to the final stages of a study and therefore left largely unfinished. In some cases, it is ignored altogether. Without formal compound identification, the discovery of any metabolically interesting patterns or clusters (via PCA or PLS-DA) is largely meaningless.

The challenge of metabolite identification, especially for mass spectrometry, lies in the fact that there are potentially thousands of compounds that can match a given parent ion mass or a given atomic composition. The situation is made even worse given that most metabolomics experiments generate hundreds or even thousands of different masses. In the past, compound identification via mass spectrometry required the use of complementary analytical techniques, such as NMR, IR, or specialized chemical assays. However, this is now changing. In particular, with the introduction of improved separation technologies, higher resolution mass spectrometers, smarter “mass analysis” algorithms, more innovative chemical labelling schemes, more comprehensive MS databases and a better understanding of the “standard” metabolic composition of most organisms, it is now possible to confidently and rapidly identify many metabolites via MS (Guo et al. 2007; Kind and Fiehn 2007; Kopka 2006; Tikunov et al. 2005).

As previously discussed, MS-based experiments can be done in any number of ways, using many different kinds of separation (GC, LC, CE, 1D, 2D), ionization (EI, MALDI, ESI, APCI, CI) and detection (parent ion only, parent ion + EI fragmentation, or parent ion + soft secondary fragmentation) techniques. Consequently, a given MS-based metabolomics experiment can generate three general types of data or mass spectral tags (MSTs): (1) parent mass only; (2) parent mass + chromatographic retention time; or (3) parent mass + fragment masses + chromatographic retention time. These properties, if properly documented, can allow identification of both previously known and hitherto unidentified compounds. The identification of unknowns is done by measuring the MSTs of postulated pure/authentic compounds that may match features of the unknown MST (Kopka 2006).

This section discusses the needs and requirements regarding data, databases and software for MS-based metabolite identification. In particular, we have attempted to provide a framework, or a series of recommendations, for how metabolite identification can be facilitated in an MS-based metabolomics experiment. In doing so, we have used the database and software models developed for MS-based proteomics as a template for MS-based metabolomics.

6.2 MS-based “pure” metabolite databases: current limitations and recommendations

There are three types of MS-based metabolite databases: (1) those that provide raw, unannotated MS (GC–MS or LC–MS) spectral data of biofluids or tissue extracts; (2) those that provide annotated MS (GC–MS or LC–MS) spectral data or MSTs of biofluids or tissue extracts; and (3) those that provide reference MS spectra or MSTs (GC–MS or LC–MS) of pure compounds. The first two types represent archival reference data from MS-based metabolomics experiments. They can be used to facilitate compound identification and are often used in general metabolomic analysis. However, the most useful databases for compound identification are those in category 3. These include the NIST GC–MS database, the Metlin database (Smith et al. 2005), the Golm Metabolome Database (Kopka et al. 2005), BinBase (Fiehn et al. 2005), SetupX (Scholz and Fiehn 2007), the HMDB LC–MS/MS library (Wishart et al. 2007) and others (Moco et al. 2006). Minimally, these kinds of databases should include the name, chemical formula and monoisotopic mass (to 5 decimal places) of a significant number (>500) of metabolites or chemically modified (trimethylsilylated) metabolites. Ideally, they should also include chromatographic retention times or retention indices (for both GC and LC separations) as well as fragment ion mass data (EI or MS/MS fragmentation). As with similar kinds of proteomics MS databases, these metabolomics MS databases should be searchable by (1) elemental composition, (2) parent ion mass, (3) retention time/index and/or (4) mass fragment pattern. A major limitation of LC–MS relative to GC–MS is that retention times and retention indices for LC methods are not generally reproducible and therefore cannot be used as reliably as GC retention indices. Potentially, the establishment of a “universal” calibration standard of 5–7 compounds of varying polarity that could be spiked into LC separations would provide a means of making LC retention times both meaningful and shareable. Another particularly useful addition to these kinds of MS-based metabolite databases would be the possibility of restricting a search to certain kinds of compounds (e.g. endogenous metabolites only, known mammalian metabolites only, drugs only, toxins only, metabolites specific to a certain tissue or biofluid, etc.) or to combinations of metabolite classes. Furthermore, these databases should provide experimental details (images of spectra, collection conditions, MS spectrometer parameters, collision energies, instrument type, date of collection, lab/individual who collected the data) concerning the origin or source of their pure compound reference spectra. While it would be preferable if these MS-based metabolomics databases were freely accessible and freely downloadable (as many MS-based proteomics databases are), we appreciate that commercial possibilities exist for the sale and distribution of these kinds of MS resources. Nevertheless, we would encourage the community (both academic and industrial) to do their best to make their database resources publicly accessible and adherent to these standards.
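As a toy illustration of the search functionality recommended above, the sketch below queries an in-memory stand-in for a category-3 reference database by neutral monoisotopic mass within a ppm tolerance, optionally restricted by retention index and compound class. All records, field names and retention indices here are hypothetical; real resources such as NIST, Metlin, the Golm Metabolome Database or HMDB have their own schemas and interfaces.

```python
# Hypothetical stand-in for a category-3 pure-compound reference database.
REFERENCE_DB = [
    {"name": "hippuric acid", "formula": "C9H9NO3", "mono_mass": 179.05824,
     "ri_gc": 1725, "origin": "endogenous"},
    {"name": "caffeine", "formula": "C8H10N4O2", "mono_mass": 194.08038,
     "ri_gc": 1845, "origin": "dietary"},
]

def search(parent_mass, tol_ppm=5.0, ri=None, ri_tol=20, origin=None):
    """Query by neutral monoisotopic mass, optionally restricted by
    retention index and compound class."""
    hits = []
    for rec in REFERENCE_DB:
        if abs(parent_mass - rec["mono_mass"]) / rec["mono_mass"] * 1e6 > tol_ppm:
            continue                                  # outside the mass window
        if ri is not None and abs(ri - rec["ri_gc"]) > ri_tol:
            continue                                  # retention index mismatch
        if origin is not None and rec["origin"] != origin:
            continue                                  # wrong compound class
        hits.append(rec["name"])
    return hits

print(search(179.0583, ri=1730, origin="endogenous"))   # -> ['hippuric acid']
```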

Because the compounds of interest to metabolomics researchers easily number in the thousands, and because different organisms have profoundly different metabolomes, it is almost impossible for a single lab or a single investigator to have access to, or an interest in collecting, reference MS spectra for ALL metabolites. Therefore, there is a need to establish a process by which individuals from many different metabolomic interests or backgrounds may contribute to a common repository of MS (GC or LC) reference compound spectra. While several compound reference databases already exist (NIST, the SDBS), these are general chemical databases and are not limited to metabolites or compounds of biological interest. What is really needed is the equivalent of a GenBank or PDB for MS spectral deposition. Such a model allows users to deposit data in a common repository so that it can be searched, shared and used by other scientists. The NMR metabolomics community has already started depositing its reference compound spectra into a common repository, the BioMagResBank (Ulrich et al. 2008). We would advocate the establishment of a similar entity (let us call it the BioMSBank) to support the submission (and searching) of reference MS (LC–MS and GC–MS) metabolite spectra, and would further advocate that such a resource store its data in a common data exchange format such as XML (extensible mark-up language).

6.3 MS-based metabolite identification software: limitations and recommendations

Currently, most MS-based metabolite identification software consists of proprietary stand-alone applications tied to proprietary MS instruments. This is somewhat contrary to the trends seen in other “omics” software development. Indeed, over the past decade, the trend for most “omics” software has been towards web-based applications or web-server delivery. This has numerous advantages (accessibility, speed, improved maintenance, no need for installation support) and it is the way in which most MS-based proteomics software is now available, both commercially and academically. We believe that MS-based metabolite identification software should move to this model. We also believe that MS-based metabolite identification software should emulate many of the features of MS-based proteomics software. These include support for database or subdatabase selection, support for different mass list input formats, support for chemical modification corrections (in this case TMS or dimethylation), and inclusion of additional MST data (retention index, parent ion mass, fragment ion patterns, etc.). Obviously, this kind of software needs to sit atop an appropriately maintained database of reference metabolite spectra (see above).
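As a concrete illustration of such a chemical modification correction (our own sketch, not taken from any particular package), the mass shift introduced by TMS derivatization can be applied as a simple per-group correction, since each trimethylsilyl group replaces one active hydrogen:

```python
# Net monoisotopic mass added per trimethylsilyl (TMS) group:
# Si(CH3)3 (73.04735 Da) replaces one active hydrogen (1.00783 Da).
TMS_NET_MASS = 73.04735 - 1.00783  # = 72.03952 Da

def tms_derivatized_mass(neutral_mass: float, n_tms: int) -> float:
    """Monoisotopic mass of a metabolite carrying n_tms TMS groups."""
    return neutral_mass + n_tms * TMS_NET_MASS

# Example: glycine (75.03203 Da) as its common 2TMS derivative
print(f"{tms_derivatized_mass(75.03203, 2):.5f}")  # -> 219.11107
```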

A separate kind of software tool that is unique to metabolomics (and small molecule MS work) is elemental or chemical formula prediction software. Given sufficiently high mass accuracy (1–2 ppm) and resolution, it is possible to use the parent ion mass spectrum to determine the elemental composition and, in many cases, the identity or approximate identity of a compound. Recently, Kind and Fiehn developed a series of seven heuristic rules for chemical formula extraction and compound identification (or ranking) from high resolution MS spectra (Kind and Fiehn 2007). With the increasing availability of FT-ICR and Orbitrap MS instruments, there is a distinct possibility that this approach may offer a powerful adjunct to, or even replacement for, MST-based metabolite identification. In particular, with growing knowledge of what is found or findable in many metabolomes (~1000 metabolites in microbes, ~3000 metabolites in mammals), the use of chemical formula prediction, followed by rapid scanning for formula matches against known metabolite lists, could prove to be a very fast, simple and robust method for metabolite identification, especially for organisms with well characterized metabolomes. We would advocate that more effort be devoted to this particular kind of MS-based metabolite identification and that the software be made available through easily accessible web-server applications.
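The final matching step can be sketched in a few lines (a minimal illustration; the three-entry dictionary stands in for a full metabolome-wide formula list):

```python
# Known metabolome formulas with names and monoisotopic masses (Da);
# an illustrative stand-in for a complete organism-specific list.
KNOWN_METABOLITES = {
    "C6H12O6": ("D-glucose", 180.06339),
    "C5H9NO4": ("L-glutamate", 147.05316),
    "C9H11NO3": ("L-tyrosine", 181.07389),
}

def match_formula(neutral_mass: float, tol_ppm: float = 2.0):
    """Return (formula, name) pairs whose monoisotopic mass matches the
    observed neutral mass within the given ppm tolerance."""
    return [
        (formula, name)
        for formula, (name, mass) in KNOWN_METABOLITES.items()
        if abs(neutral_mass - mass) / mass * 1e6 <= tol_ppm
    ]

print(match_formula(180.0634))  # -> [('C6H12O6', 'D-glucose')]
```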

6.4 Metabolite quantification and standards: limitations and recommendations

While MS-based approaches are widely recognized for their sensitivity and their capacity to identify large numbers (>100) of metabolites, they are not generally recognized as being useful for metabolite quantification. This has often been the Achilles heel of many MS-based approaches. However, with the recent successes in using isotopic affinity tags (such as ICAT) to make MS-based proteomics reasonably quantitative, there is a distinct possibility that similar isotope tagging methods could make MS-based metabolomics equally quantitative (Carlson and Cravatt 2007a; Guo et al. 2007). We believe that the issue of metabolite quantification should be made a priority in MS labs and that efforts focusing on methodological or software improvements to make rapid and accurate metabolite quantification possible should receive both encouragement and support from the metabolomics community. Indeed, identification without quantification is the bane of many analytical chemists, as well as many metabolomics researchers.

Recently the proteomics community has pushed for the establishment of standards to assess the performance of instruments, methods and labs in identifying and quantifying proteins from defined mixtures. We believe that similar initiatives should be undertaken in the metabolomics community. The use of defined metabolite mixtures (specific to plants, microbes and mammals) would provide a means to objectively assess the performance and reliability of algorithms, databases and protocols used in MS-based metabolite identification and quantification. Indeed, the use of standardized metabolite mixtures would provide much-needed validation of existing methods and a means of objective assessment of emerging methods.

6.5 Metabolite identification: consensus recommendations

  1. Metabolite identification (and quantification) must be a priority in any MS-based metabolomics experiment.

  2. Metabolites should be identified or classified as either: (a) unknown; (b) belonging to a certain chemical class; (c) putatively identified by a match to a database MST; or (d) confirmed with an authentic standard.

  3. GC–MS and LC–MS databases should become open source (using XML), open for public deposition and much more standardized in terms of the information they provide.

  4. The use of a standard set of externally spiked-in retention markers (5–7 compounds) would help standardize the reporting of LC retention times and improve the utility of LC–MS MSTs.

  5. A set of “gold standards” consisting of synthetic plasma, urine and cerebrospinal fluid (CSF) should be used by the metabolomics community to assess existing protocols, check new protocols and verify inter- or intra-lab reproducibility.

  6. These gold standards would each contain 50 chemically diverse, biofluid-specific compounds spanning a wide concentration range (pM to mM).

  7. Databases containing detailed physico-chemical information on known food components (phytochemicals, nutrients) and food additives must be established to facilitate nutrition research by MS-based metabolomics.

7 Biological interpretation of metabolomics results

Once the relevant metabolite changes caused by a nutritional intervention have been identified and quantified, biological conclusions need to be drawn in relation to the research hypothesis. As yet, dedicated tools for the biological interpretation of metabolomics data are scarce. In this section we discuss the challenges posed by biological interpretation and present suggestions for improvement.

7.1 Challenges

7.1.1 On the level of study design

Nutrigenomics is the study of the molecular relationships between nutrition and the complexity of molecular processes, commonly measured with “omics” techniques, with the aim of extrapolating how such subtle changes can affect human health (Afman and Muller 2006; Chavez and de Chavez 2003; Müller and Kersten 2003; van Ommen and Stierum 2002). Metabolomics provides an essential contribution, as nutrition is almost synonymous with metabolism. In both the design of studies and the interpretation of metabolomics data, a number of aspects related to the control of unwanted sources of variability, the time of sampling and the nature of the samples analysed should be taken into account. Some sources of variation are relatively easy to control and record, such as analytical instrument performance, sample storage and processing, and data-processing. Other sources of variability are much more problematic and require careful thought at the study design stage. Firstly, inter-individual differences are usually larger than the treatment effect. These differences are partly explained by variability induced by uncontrolled conditions in the study; this component can be reduced in urine, but not in plasma or saliva, through the use of a standard diet (Walsh et al. 2006). However, large inter-individual differences remain, some being age- and gender-dependent (Assfalg et al. 2008; Lawton et al. 2008), while others may be related to diurnal variability, lifestyle, timing of sampling in relation to eating habits and fluid intake (especially with regard to urine) or non-compliance in dietary interventions. Discussions are ongoing on how best to capture these variations when describing the study design. The new tools developed by the European Bioinformatics Institute to capture study metadata (ISA-TAB, ISA-creator and BioInvestigation Index: http://www.ebi.ac.uk/net-project/projects.html) are currently being adapted by NuGO to carefully capture and describe nutritional intervention studies. The use of standardized diets is also a frequent topic of discussion. Standard run-in (pre-intervention) diets will indeed reduce the variation, especially in the urine metabolome (Walsh et al. 2006). Many food components present in the diet are traceable in plasma and urine, either intact or metabolized, and contribute to this variability (Manach et al. 2009; Mennen et al. 2008). A series of human studies will still be needed to characterize this variability and to produce valid recommendations on how best to limit interference with diet-induced physiological adaptations.

Secondly, homeostatic (fasting) metabolomics analysis may be less informative than metabolomics analysis under perturbed conditions. Unfortunately, only a few examples exploiting this concept have been published so far (Kuhl et al. 2008; Shaham et al. 2008; Wopereis et al. 2009). Thirdly, although our main interest lies in organ-related processes and biochemistry, these are not readily accessible in human studies and we thus have to make do with body fluids. Plasma and urine have distinct biological characteristics. Urine generally reports on exposure and environmental challenges (Fardet et al. 2008b; Walsh et al. 2006, 2007), while plasma reveals endogenous processes, including inter-organ communication, energy metabolism, inflammation and disease state (Abdel-Sayed et al. 2008; Kuhl et al. 2008; Wood et al. 2008).

7.1.2 On a single metabolite level

Do we understand the biological function of the metabolite in the studied matrix?

This question can be addressed at two levels. The first is the metabolite's “intrinsic” function. For example, glucose has a primary biological function as an energy source. The second is its relevance in relation to the matrix. Increased glucose in urine indicates diabetes, while its increase in plasma is primarily correlated with the post-prandial state and insulin sensitivity. Many other metabolites have less well known or more complex functions. Moreover, metabolite concentrations in distinct tissues may have different relevance, and even intra-cellular compartmentalisation plays a role. For instance, metabolite concentrations in mitochondria may be very different from cytoplasmic concentrations. Plasma carnitine is not straightforwardly related to the mitochondrial carnitine concentration. Glutathione concentrations are higher in liver than in plasma, and fluctuate strongly with diurnal rhythm (Blanco et al. 2007).

Metabolomics needs a knowledge base on metabolite information in the context of its matrix. Clinical chemistry databases like Young’s effects online (www.fxol.org) are valuable. The HMDB database (Wishart et al. 2007) provides a wealth of information on metabolite properties. NuGOwiki (www.nugowiki.org) potentially adds to this knowledge base, as any researcher can add biological observations related to a metabolite. Other applications dealing with plasma metabolite concentration changes are appearing, such as the HORA suite (Bruschi et al. 2008). At this point, a large knowledge gap still exists in translating metabolite concentration changes in body fluids into organ biochemistry and (molecular) physiological interpretation.

Do we understand the relevance of the metabolite’s changed body fluid concentration?

Can we translate organ processes to body fluid readouts? Biochemistry textbooks present glycolysis as an intracellular process that does not occur in plasma. Yet GC–MS metabolomics detects most of the glycolysis and citric acid cycle metabolites in plasma, indicating that these compounds can be transported through the membrane and may be a measure of glycolytic activity in certain tissues. Or is this due to the way of sampling, i.e. lysis of red and white blood cells? Usually we cannot make a simple comparison between plasma changes and organ changes. The body strives to maintain homeostasis, and the organs therefore need to work hard to keep plasma concentrations within acceptable boundaries, so that metabolite fluctuations are much larger in the organs than in plasma. For instance, absorbed glucose triggers release of insulin, resulting in inhibition of triglyceride hydrolysis in adipose tissue, free fatty acid esterification, decreased gluconeogenesis, decreased proteolysis and many other processes in various organs, all with their “imprint” on the plasma metabolome (Shaham et al. 2008). Urine functions as an accumulating waste basket for exogenous compounds and their metabolites, while plasma levels of these compounds need to remain as low as possible. Furthermore, time lags can occur between organ changes and plasma changes. Therefore, statistical relationships between the various compartments should be treated with care.

A practical approach to this problem is the use of animal models to establish the link between organ physiology and pathology on the one hand, and body fluid (plasma) metabolome changes on the other. Transgenic mouse models in particular are key to unravelling specific mechanisms, and combined omics technologies then provide many indications of both the mechanism and the metabolites that should be measured in plasma (Chen et al. 2008a; de Vogel-van den Bosch et al. 2008; Hansson et al. 2005; Kleemann et al. 2007).

7.1.3 On a complex level (metabolic fingerprinting)

Do fingerprints really tell us more than the sum of the individual components?

Roughly two different strategies can currently be distinguished for metabolite investigations: (1) metabolic profiling and (2) metabolic fingerprinting (Dettmer et al. 2007). They should be clearly distinguished here. Metabolic profiling focuses on a group or category of metabolites of interest defined a priori (e.g. fatty acids, oxidized lipids, nucleosides, etc.), all of which are precisely quantified. Metabolic profiling is a targeted way to study different aspects of metabolism, and a whole suite of quantitative methods would need to be assembled to turn metabolic profiling into metabolomics. In general, when people speak of metabolomics they refer to metabolic fingerprinting, where metabolite profiles are compared with limited a priori knowledge of the metabolites of interest. Semi-quantitative data are acquired by high-throughput analytical methods (such as LC–MS or 1H NMR), and (bio)markers (ions or chemical shift signals) are revealed by demanding multivariate statistical tools. The identity of the different signals in the fingerprint can subsequently be revealed by metabolite identification procedures, allowing biological interpretation.

The question “Do fingerprints really tell us more than the sum of the individual components?” will be answered positively by statistics, but biologists struggle with this truth. Biologists simply want to understand the relevance of each individual change, and to collate these changes into pathways and processes. Furthermore, biologists also exploit findings that metabolites, pathways and processes are unchanged, whereas metabolic fingerprinting, owing to its statistical selection procedures, focuses only on the detection of changes. However, nutritional interventions will often lead to complex changes, and an individual-metabolite approach will therefore seldom be enough to understand the underlying mechanism. We thus need to unravel the biochemical relationships between the components of a “profile”. Pathway tools like Pathvisio (www.pathvisio.org) and biological network tools like Metacore (www.genego.com) and IPA (www.ingenuity.com) provide a first attempt to connect these components but (also given the complex relationship between body fluids and organ biochemistry described above) are not (yet) up to the task. These tools originate from transcriptome visualisation, focus primarily on intracellular processes, and cannot cope with the characteristics of plasma and urine “biochemistry”.

Matching metabolomics technology to biology

In describing nutrition and health relationships, biologists can often produce a “wishlist” of metabolites to be identified and quantified. So far, metabolomics has primarily been a technology-driven science: the list of metabolites analyzed has been determined on the basis of their physico-chemical properties, and it usually does not completely match the biologist's wishlist. Biologists have often complained at seeing only the same amino acid changes emerge from NMR fingerprinting strategies. A metabolic profiling technique such as lipidomics provides a more biologically coherent metabolite set and is a method of first choice in many papers. The inflammation-related oxylipids separated and quantified by Newman and Pedersen are a jewel of a biologically relevant targeted lipidome (Newman et al. 2007). As the metabolites on such platforms are more strictly linked to certain biological processes, the interpretation of data from these platforms is much easier.

Although NMR and MS spectroscopy can detect all kinds of metabolites, no single universal technique exists today that can provide estimates of all the compounds making up the human metabolome. From a biological point of view, therefore, a comprehensive metabolomics analysis should be an assemblage of several quantitative methods that analyze the key metabolites from the biochemical pathways or signalling processes of interest for the research question. Nutrition deals with metabolism, oxidation and inflammation as primary processes that maintain health or promote disease. The advantage of this semi-targeted approach is that quantitative data are collected for well-annotated metabolites, allowing the construction of databases that can be further mined. In addition, highly sensitive individual assays may also be needed to probe important low-abundance metabolites with regulatory functions, such as eicosanoids, which are not easily detected in current metabolic fingerprinting approaches.

A major limitation of metabolic fingerprinting strategies is the small number of metabolites identified. Biological interpretation therefore has to be performed on a small number of metabolites, and it is challenging to reach a good biological interpretation based on only fragments of the overall picture.

Can we separate the exogenous metabolome from the endogenous health/disease effects hidden in the metabolome?

The exogenous metabolome can be defined as all metabolites directly derived from extrinsic factors such as diet (nutrients and non-nutrients), drugs, toxicants and metabolites produced by the colonic flora. The endogenous metabolome comprises the intrinsic metabolites involved in or resulting from primary and intermediary metabolism, formed under direct cell genome/proteome control (Gibney et al. 2005; Manach et al. 2009). The consumption of a particular diet or nutrient will induce changes in the endogenous metabolome, but these effects may be blurred by the presence, particularly in urine, of a large number of exogenous metabolites resulting from the digestion of food. Furthermore, food also contains macronutrients that are transformed into compounds partly identical to some endogenous metabolites. On the other hand, these exogenous metabolites can also be useful as markers of food intake. Different metabolite profiles were observed after consumption of high-meat, vegetarian, Atkins or high-fish diets (Rezzi et al. 2007). In fact, one of the very few methods to precisely estimate food intake is through quantification of specific metabolites (Landberg et al. 2008; Noguchi et al. 2006; Sun et al. 2007a, b), and this may become a valuable application of metabolomics (Mennen et al. 2006). Moreover, it will not be simple to separate the effects of age, gender, physical activity, stress, drugs, region, etc. We should also not forget the interactions between the gut flora and host metabolism. The large-bowel microflora produces metabolic signals that might overwhelm the true metabolic signals of nutrients in human biofluids (Dumas et al. 2006; Goodacre 2007). Consequently, it will be hard to distinguish diet-induced changes in the metabolome that impact health from changes deriving directly from the food itself.

What extra information can be extracted from time–course metabolomics profiles?

Healthy subjects have a remarkable capacity to maintain homeostasis, through regulation of metabolism and transport, and through effective defence and repair mechanisms under oxidative and inflammatory stress. In the development of nutrition-related diseases this homeostasis may become deregulated. Early, intermediate and late stages of disease progression have been shown to have distinct metabolome profiles (Harrigan et al. 2005; Lamers et al. 2005). Processes involved in these regulatory activities in late stages of disease seem to be essentially different from those involved in early-onset disease. As metabolomics technology matures, more comprehensive inventories of “normal” concentrations spanning the life stages and both sexes are becoming available (Lawton et al. 2008). To judge whether homeostasis in a patient is disturbed, we need good estimates of these “normal” concentrations. Such data add to the available clinical chemistry data and are indispensable if we want to move beyond the level of “differential display profiling”.

How to link a metabolomics study to the results of other (published) metabolomics studies?

Currently, it is very difficult to compare metabolomics results from one study with those from another, for two reasons. Firstly, many different metabolomics platforms are in use: each company or institute has its own favorite technique and derivatization method. Secondly, metabolite data are often semi-quantitative and not expressed in absolute concentrations. This makes it difficult to compare metabolite concentrations with clinical chemistry reference ranges, as well as with published metabolomics studies. We therefore advocate that initiatives be undertaken in the metabolomics community to make it possible to report metabolites in absolute concentrations, rather than in arbitrary units or fold changes (see also Sect. 6.4).

Linking the metabolome to other measurements

Understanding the metabolic and signalling networks that regulate health and disease is a principal goal of nutritional research, and of post-genomics research in general. Although it may seem to increase complexity, a breakthrough may be obtained by integrating metabolomics with other phenotypic, and possibly genotypic, information. Metabolomics can be embedded with other sources of data such as histology, functional tests, gene expression and (targeted) proteomics, and this may help in further understanding the biology behind the metabolome. Metabolomics started as a technology push but is now becoming part of more comprehensive phenotyping. Genotype–phenotype linking studies are now emerging in the area of transcriptomics (Chen et al. 2008b; Emilsson et al. 2008; Wang et al. 2007), and this linkage is being explored in large cohort studies (Tracy 2008). The handling of data and results should change from technology-focused to study-focused. Only after this transition will multi-study comparison become possible for cross-validation and meta-analysis based on accurate phenotypic matching. Standardized formats and mark-up language development will facilitate this evolution (Sansone et al. 2008; Taylor et al. 2008). Databases and LIMS will thus gradually shift from individual, technology-oriented data handling to study-oriented integration and combined evaluation of all relevant results.

7.2 Solutions

As mentioned earlier, nutrition deals with metabolism, oxidation and inflammation as primary processes that maintain health or promote disease. Metabolites and metabolomics therefore need to focus on these processes, primarily (in the case of human studies) in plasma. An integrated bioinformatics solution that helps to map and visualise changes in metabolite levels, with links to knowledge bases, would be useful. This means translating metabolite changes into process changes, translating profiles into metabolic processes and transport (including biochemical information), and translating plasma observations into organ (mechanism) processes. All of this may sound like science fiction, but it can serve as an umbrella for integrated toolbox development. We therefore propose to federate efforts in metabolome bioinformatics in a number of ways to achieve this goal. The toolbox should address the key problems summarised below.

Analytics: biology-driven metabolomics platforms. Biologists should continue creating wishlists of the metabolites that are most relevant to the overarching processes of metabolism, oxidation and inflammation. Metabolomics platforms need to be further developed, combined and fine-tuned to cover the most important compounds for these overarching processes. Furthermore, metabolite data should be quantitative and expressed in absolute units, so that concentrations can be compared with the reference concentration ranges used in clinical chemistry and in publications.

Study design: focus on intra-individual variation by including at least two samples per person. To control the biological variation that occurs between the volunteers in a human intervention study, at least two samples are needed per person: one sample before the start of the intervention and one sample at the end of the intervention. In this way it is possible to specifically identify metabolites that are changed by the dietary intervention, by subtracting the inter-individual variation from the data, as shown in the sketch below.
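The following minimal sketch (hypothetical concentrations in µM; scipy is assumed to be available) shows how within-subject differences remove the stable inter-individual component before testing:

```python
import numpy as np
from scipy import stats

# Hypothetical concentrations (µM) of one metabolite in six volunteers,
# sampled before and at the end of a dietary intervention.
before = np.array([102.0, 88.5, 130.2, 95.4, 110.9, 99.7])
after  = np.array([ 96.1, 84.0, 121.5, 90.8, 104.2, 95.3])

# Within-subject differences: the stable inter-individual component
# (e.g. subject 3 always high, subject 2 always low) cancels out.
diff = after - before

t, p = stats.ttest_rel(after, before)  # paired t-test
print(f"mean change = {diff.mean():.1f} µM, t = {t:.2f}, p = {p:.4f}")
```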

Study design: time–course metabolomics (and fluxomics). Taking the time–course of metabolism into account should provide a dynamic view of changes in metabolic pathways. In this way we can separate time effects, representing processes and pathology (i.e. the relatively slow changes from healthy physiology to a state of pathology), from diet effects, which usually occur in the range of hours as they reflect normal kinetic processes, representing concentration differences caused by changes in diet. Linking these data to our biochemical knowledge will result in understanding of the underlying dynamic processes (fluxomics).

Study design: use of challenge tests. The use of challenge tests with multiple time points, such as the oral glucose tolerance test (OGTT), facilitates the detection of subtle differences in metabolomics data (Kuhl et al. 2008; Shaham et al. 2008). This should become a part of comprehensive nutritional phenotyping (van Ommen et al. 2008b).

Study design: parallel human and mouse intervention studies. To gain a better understanding of the relation between body fluid metabolite changes and organ physiology and pathology, human intervention studies need to be carried out in parallel with animal model studies, which give access to the target tissues and allow changes in plasma metabolite concentrations to be linked to organ physiology.

Bioinformatics: metabo-ontology (MO). In the world of transcriptomics, the Gene Ontology Consortium developed controlled classifications (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions. A similar resource is needed for metabolites. This would then become the basis for statistical approaches at the pathway and process level: metabolite set enrichment analysis (MSEA), illustrated in the sketch below.
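An over-representation flavour of such an MSEA reduces to a one-sided hypergeometric test; the following is a minimal sketch (assuming scipy; the example counts are hypothetical):

```python
from scipy.stats import hypergeom

def msea_overrepresentation(n_universe, n_in_set, n_changed, n_overlap):
    """P-value that at least n_overlap of the n_changed significant
    metabolites fall into a metabolite set of size n_in_set, out of
    n_universe measured metabolites (one-sided hypergeometric test)."""
    return hypergeom.sf(n_overlap - 1, n_universe, n_in_set, n_changed)

# Example: 300 metabolites measured, 25 annotated to "TCA cycle",
# 20 significantly changed, 6 of which are TCA-cycle members.
p = msea_overrepresentation(300, 25, 20, 6)
print(f"enrichment p = {p:.4f}")
```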

Bioinformatics: plasma-oriented biological networks. We need to build bioinformatics tools that connect plasma and organs. So far, all pathway and process visualisation tools focus on intracellular processes only, whereas organ-specific affected processes need to be translated into plasma-specific metabolite profiles. Network biology and modelling approaches for nutrition and health relationships (van Ommen et al. 2008a) will become important.

We have started to construct these maps, which allow visualisation of intracellular changes based on changes quantified in plasma, for micronutrient metabolomics. This is an open source (wiki-based) effort in Wikipathways (Pico et al. 2008), with export options to major pathway visualisation tools like GenMAPP and Cytoscape (http://wikipathways.org/index.php/Portal:Micronutrient).

Bioinformatics: integrated workbench. All data analysis tools needed for nutritional metabolomics might be unified into an integrated workbench. This could be established on a number of levels:

  1. Use of unified nomenclature and markup language. In the case of metabolomics, a series of initiatives are helpful, such as the Metabolomics Standards Initiative (Sansone et al. 2006, 2007; Taylor et al. 2008). Simple matters like a unified coding of metabolites still need to be agreed on, and the HMDB coding (connected to a synonym finder; a minimal illustration follows this list) (Wishart et al. 2007) is the obvious choice for the biologist. Besides all other metabolite identifiers (KEGG, BioCyc, BIGG, Metlin, Pubchem, ChEBI, CAS and InChI) and chemical information, this database also contains biological information such as cellular, tissue and biofluid locations, normal reference ranges, associated disorders, metabolic enzymes, etc.

  2. Agreement on basic issues in data warehousing and (pre)processing, changing from analysis orientation to study orientation and allowing metadata and other parameters to “travel along” with the metabolomics data (Sansone et al. 2006, 2008).

  3. A unified statistical and bioinformatics workbench, following the example of Genepattern (Reich et al. 2006). Genepattern is a software package developed for transcriptomics data analysis, which provides a comprehensive environment that can support (i) a broad community of users at all levels of computational experience and sophistication, (ii) access to a repository of analytic and visualization tools and easy creation of complex analytic methods from them, and (iii) the rapid development and dissemination of new methods. This will function once points (1) and (2) have been taken care of. Such a suite will allow LIMS-independent manoeuvring within and between datasets, provided all are ISA-TAB compliant. Given the large variety of vendors and analytical applications, this is a must.

  4. A plasma-oriented bioinformatics platform. If our technological output is indeed a list of plasma metabolites, this list provides a good reference starting point. Applications like HORA (Bruschi et al. 2008) have understood this and provide useful connected tools. The plasma-oriented biological networks mentioned above (Pico et al. 2008; van Ommen et al. 2008a) continue this exercise.

  5. A common repository/point of access for all available tools, knowledge and results. The NuGO metabolomics portal (www.nugo.org/metabolomics) was created for this purpose and will only flourish if it becomes a community effort. Hopefully the metabolomics community, while still in its infancy, is flexible enough to organize itself properly on common ground for the benefit of all.
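The unified metabolite coding of point 1 can be as simple as a synonym table resolving the many names of a compound to one HMDB accession. The sketch below is illustrative only: the accession shown (HMDB00122, D-glucose) is real, but in practice the table would be generated from the HMDB synonym lists rather than written by hand:

```python
# Illustrative synonym table mapping free-text names to one HMDB accession
# (HMDB00122 = D-glucose); a real table would be built from HMDB itself.
SYNONYMS = {
    "glucose": "HMDB00122",
    "d-glucose": "HMDB00122",
    "dextrose": "HMDB00122",
    "blood sugar": "HMDB00122",
}

def to_hmdb_id(name: str):
    """Resolve a free-text metabolite name to a unified HMDB accession,
    or None if the synonym is not in the table."""
    return SYNONYMS.get(name.strip().lower())

print(to_hmdb_id("Dextrose"))  # -> HMDB00122
```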

8 Conclusions

MS techniques, which combine sensitivity and selectivity, appear today to be the most appropriate for capturing the bulk of the highly heterogeneous human metabolome. A combination of several MS techniques coupled with different chromatographic methods will likely be needed to offer maximum coverage of the human metabolome. No standards exist yet for the characterization of the human metabolome (Fiehn et al. 2007). Techniques are still evolving rapidly, and standards are needed to allow the sharing and comparison of data between laboratories or studies. The main obstacles faced today by nutritionists trying to obtain biologically meaningful results from metabolomics studies have been analysed here, and various recommendations have been made (Table 1). They should be collectively addressed in the coming years. Indeed, the goals of nutritionists should be shared with analysts, statisticians, (bio)informaticians and the companies developing MS equipment and software. This is precisely one of the raisons d’être of the European Nutrigenomics Organization (NuGO, www.nugo.org): to facilitate the sharing of ideas and the emergence of collaborative projects in the field of nutrition. Exchanges with scientists in other disciplines such as toxicology, medicine or pharmacology should also be encouraged. Links between major initiatives for technical developments, database building, training, etc. should be strengthened and, as much as possible, coordinated at the international level.