Article

Recommendations for Improving Identification and Quantification in Non-Targeted, GC-MS-Based Metabolomic Profiling of Human Plasma

1
Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC 27701, USA
2
Department of Surgery, Duke University School of Medicine, Durham, NC 27710, USA
3
Division of Cardiology, Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
*
Authors to whom correspondence should be addressed.
Metabolites 2017, 7(3), 45; https://doi.org/10.3390/metabo7030045
Submission received: 29 June 2017 / Revised: 18 August 2017 / Accepted: 23 August 2017 / Published: 25 August 2017

Abstract

The field of metabolomics as applied to human disease and health is rapidly expanding. In recent efforts of metabolomics research, greater emphasis has been placed on quality control and method validation. In this study, we report an experience with quality control and a practical application of method validation. Specifically, we sought to identify and modify steps in gas chromatography-mass spectrometry (GC-MS)-based, non-targeted metabolomic profiling of human plasma that could influence metabolite identification and quantification. Our experimental design included two studies: (1) a limiting-dilution study, which investigated the effects of dilution on analyte identification and quantification; and (2) a concentration-specific study, which compared the optimal plasma extract volume established in the first study with the volume used in the current institutional protocol. We confirmed that contaminants, concentration, repeatability and intermediate precision are major factors influencing metabolite identification and quantification. In addition, we established methods for improved metabolite identification and quantification, which were summarized to provide recommendations for experimental design of GC-MS-based non-targeted profiling of human plasma.

1. Introduction

High-throughput molecular profiling is being increasingly used in large numbers of human samples to identify novel biomarkers and mechanisms of health and disease. Metabolomics, a subfield in molecular profiling, investigates the metabolome, the total quantitative collection of small-molecule metabolites in biofluids such as plasma [1,2], in an identified and quantified manner [3]. Since metabolomic changes are downstream of alterations at the genomic, transcriptomic and proteomic level [4], metabolomics is particularly suited for identification of biomarkers, examination of molecular physiology, and investigation of genetic and environmental modifications [5]. Commonly utilized technologies in metabolomics include liquid or gas chromatography (LC and GC) coupled to mass spectrometry (MS), capillary electrophoresis (CE), and nuclear magnetic resonance (NMR) spectroscopy [6].
Two general approaches exist in metabolomic profiling: targeted and non-targeted. The targeted approach identifies and quantifies select known metabolites, usually via isotope-labeled internal standards; the non-targeted approach aims to profile as many metabolites as possible, the identities of which are not established prior to analysis. The main advantage of the non-targeted approach is a broader coverage of the metabolome with opportunities for discovering novel pathways [7]. However, the non-targeted approach comes with inherent challenges surrounding metabolite identification and quantification, the first step of metabolomic profiling that directly impacts further biological insight. These challenges arise from both the complexity of biofluids with a wide range of compound classes and metabolite abundance, and intrinsic limitations of available analytical techniques [8,9]. For example, unknowns or analytes with no chemical identification discovered in metabolomic profiling studies frequently exceed the number of known metabolites with positive or putative identification by 2–3 times [10,11]. In recent years, with the advancements in metabolomic profiling approaches, more insights have been gained into the human blood metabolome [12,13]. Comprehensive databases for the human blood metabolome such as the Human Metabolome Database (http://www.hmdb.ca) have also been constructed to improve the identification of metabolites and characterization of metabolic pathways. Systematic metabolite identification and quantification in non-targeted metabolomic profiling have resulted in the discovery of novel disease biomarkers and pathways [14], while overlooking these key steps prior to drawing biological inferences has led to early pitfalls [15].
Compared to the targeted approach, non-targeted metabolomic profiling is also associated with greater difficulties in quality control and method validation where common parameters considered in method validation of targeted analysis such as accuracy or trueness cannot be adapted easily [16]. Efforts to overcome these difficulties, both at the experimental and computational level, have become a focus of metabolomic research [17,18,19,20,21,22,23]. These efforts include research on standardizing the experimental protocols [19,24,25], strategies for incorporating quality controls [18,26,27,28,29], and recommendations for employing statistics in the experiment design [30]. These efforts have also resulted in the formation of many working groups and data repositories (e.g., Metabolomics Workbench [31]) for standardization [17,32]. Currently, more research is still needed in the practical applications of quality control and method validation. For example, while chemical contamination has been suggested to interfere with metabolomic profiling [14], no previous study has investigated the effect of contaminants on metabolite identification and quantification systematically. As a result, recommendations for experimental design frequently include the incorporation of blanks containing identical reagents as biological samples [7,18,27], but few recommendations exist concerning ways to process and utilize blank data. The effect of concentration on metabolite identification has also been reported, where increased concentration resulted in higher numbers of identified components [33], but most of these studies were restricted to standard solutions [33] or a subset of isotope-labeled metabolites [34]. 
While some studies have advocated the use of linearity and repeatability and intermediate precision in quality control samples to monitor analytical performance [7,19,27], few have investigated their effects on metabolite quantification in complex biological samples or provided practical guidelines for improvement.
In this study, we sought to identify and modify steps in non-targeted metabolomic profiling of human plasma that could influence metabolite identification and quantification. We performed non-targeted metabolomic profiling using gas chromatography-mass spectrometry (GC-MS), due to its broad coverage, high sensitivity, and reproducibility [2,35,36]. Our hypothesis is that contaminants, concentration, and repeatability and intermediate precision are major factors influencing metabolite identification and quantification. In addition to developing methods for improved identification and quantification, we hope to provide recommendations for experimental design in GC-MS-based non-targeted metabolomic profiling of human plasma.

2. Results

Our experimental design included two studies: (1) the limiting-dilution study, which investigated the effects of dilution on analyte identification and quantification, and (2) the concentration-specific study, which compared the optimal concentration established in the first study with the standard volume used in the current institutional protocol [37]. For both studies, aliquots of human plasma were deproteinized with methanol, dried, methoximated, trimethylsilylated, and run on a 6890N GC/5975 Inert MS (Agilent Technologies, Santa Clara, CA, USA).
Results from all aliquots were included in the analysis. A total of 320 analytes were detected in the limiting-dilution study, consisting of 183 known analytes and 137 unknowns. After excluding 29 analytes present in less than 20% of non-blanks (known: 24, unknown: 5), 291 analytes (known: 159, unknown: 132) were included in further analysis.

2.1. Selectivity: Contaminant Profile

The selectivity of an analytical method is defined as the ability to quantify the analytes accurately in the presence of interferences, such as process impurities and chemical contamination [18,38]. Examination of the contaminant profile can prevent false-positive discoveries and increase the selectivity of non-targeted metabolomic profiling. In the limiting-dilution study, 156 out of the 291 (53.6%) profiled analytes were present in at least one blank (148) or annotated as non-metabolites after manual curation (8). These analytes were characterized as contaminants, and were further classified into definite (present in greater than or equal to 20% or 6 blanks or annotated as a non-metabolite after manual curation, 123) or potential contaminants (present in greater than or equal to 1 but less than 6 blanks, 33). Classes of these contaminants (Figure 1) include process impurities (e.g., silicone oils and alkane hydrocarbons) present in blanks or discovered after manual curation (Supplemental Table S1), metabolites present in blanks (Table 1), and unknowns present in blanks. The majority of unknown (66.7%, 88) and 42.8% (68) of known analytes were contaminants.
Five contaminants exhibited positive run-order effects (Spearman’s rho greater than 0.5, p-value less than 0.05), including four unknowns and one equipment component; 16 contaminants exhibited negative run-order effects (Spearman’s rho less than −0.5, p-value less than 0.05), including 1 equipment component, 4 metabolites and 11 unknowns (Supplemental Figure S1).
The majority (49, 72.1%) of the 68 known contaminants (41 definite, 8 potential) from the limiting-dilution study were reproducible in the concentration-specific study. The majority of non-reproducible known contaminants were metabolites (16) undetected in blanks in the concentration-specific study. The concentration-specific study also produced 3 new contaminants. Additionally, 12 of the 74 (16.2%) unknown definite contaminants, as characterized by a match from the auxiliary library of unknowns, were reproducible. These results were used to establish a contaminant repository consisting of highly reproducible and potential contaminants for reference in future studies.
Definite non-metabolite contaminants (equipment components and unknowns, 98) and reagent derivatives (EDTA, MSTFA and pyridine derivatives, 5) were excluded from further analysis. Potential contaminants were included after background adjustment by subtracting the mean batch-specific blank level from the analyte level. Five potential contaminants with unadjusted levels lower than the background were excluded. Combined with noncontaminants, 183 analytes remained as features to describe potentially authentic metabolites. Known analytes identified in the NIST SRM1950 plasma were consistent with those reported in previous publications [28]. The identities of these analytes, together with analytes identified in the volunteer plasma, are listed in Supplemental Table S2.
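The background adjustment for potential contaminants can be illustrated with a minimal sketch. It assumes analyte levels are held in plain dictionaries keyed by batch; the function names, the data layout, and the rule used to flag an analyte as below background (overall mean of samples lower than overall mean of blanks) are illustrative assumptions, not the authors' implementation. The scale on which the subtraction is done (raw or log-transformed peak area) is also not specified here, so the sketch works on whatever scale it is given.

```python
from statistics import mean


def background_adjust(samples, blanks):
    """Subtract each batch's mean blank level from that batch's sample levels.

    samples: dict mapping batch -> list of analyte levels in plasma aliquots
    blanks:  dict mapping batch -> list of analyte levels in that batch's blanks
    Batches without a detected blank level are left unadjusted (assumption).
    """
    adjusted = {}
    for batch, levels in samples.items():
        batch_blanks = blanks.get(batch, [])
        background = mean(batch_blanks) if batch_blanks else 0.0
        adjusted[batch] = [level - background for level in levels]
    return adjusted


def below_background(samples, blanks):
    """Flag an analyte whose unadjusted sample level is lower than the blank level.

    Hypothetical rule: compare the overall sample mean with the overall blank mean.
    """
    all_samples = [x for levels in samples.values() for x in levels]
    all_blanks = [x for levels in blanks.values() for x in levels]
    if not all_blanks:
        return False
    return mean(all_samples) < mean(all_blanks)


# Made-up numbers for one potential contaminant measured in two batches.
samples = {"batch1": [1200.0, 1100.0, 1300.0], "batch2": [900.0, 1000.0]}
blanks = {"batch1": [100.0, 200.0], "batch2": []}
adjusted = background_adjust(samples, blanks)
excluded = below_background(samples, blanks)
```

In this toy example the batch 1 levels are reduced by the batch 1 blank mean of 150, while batch 2, which has no detected blank signal, is left unchanged.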

2.2. Linearity: Signal-Concentration Relationship

Linearity refers to the ability to obtain measured analytical signals directly proportional to the concentration of analytes [39]. Linearity is a multifactorial problem affected by ionization efficiency of the analyte, ion transport from the ion source to the mass analyzer, and linear response of the detector. Assessment of the linearity of this signal-concentration relationship provides validation to simultaneous measurement of multiple metabolite concentrations in non-targeted metabolomic profiling [19]. In the limiting-dilution study, the linear regression model was deemed appropriate by F-test in 112 (61.2%) analytes, including 74 known analytes and 38 unknowns. After excluding 16 definite or potential contaminants, 55 analytes exhibiting lack of fit for the linear model were refitted with sigmoid curves using logistic regression models, as well as polynomial models (quadratic, cubic or 4th order), to test the hypothesis that saturation of the chromatography column is responsible for the lack of fit. F-test revealed that sigmoid curves were appropriate for 26 analytes and polynomial models were appropriate for 18 analytes in this subgroup, confirming the effects of saturation.
For the 112 analytes where the use of the linear regression model was appropriate, the adjusted $R^2$ was used to assess the degree of linearity (Figure 2). Approximately half of the analytes (47.9%, 23) with low linearity (adjusted $R^2$ less than 0.5) were definite or potential contaminants. Known analytes had significantly higher linearity than unknown analytes (p = 0.01, Table 2). Examination of the estimated parameter $\beta_1$ revealed that all analytes except one, a potential contaminant, had positive slopes.

2.3. Linear Dynamic Range

The linear dynamic range can be used to determine the optimal range for analyte detection. Outside the linear dynamic range, estimation of the analyte concentration becomes uncertain and may deviate significantly from the actual value [39]. In the limiting-dilution study, the linear dynamic range (LDR) of the majority (90.5%) of analytes was between concentrations of 4.98 × 10⁻⁹ and 7.48 × 10⁻⁹ (v/v, corresponding to a plasma extract volume of 100–150 µL) or between 7.48 × 10⁻⁹ and 9.97 × 10⁻⁹ (corresponding to a plasma extract volume of 150–200 µL; Table 3). Only one analyte’s LDR was above 1.50 × 10⁻⁸ (plasma extract volume 300 µL). Based on this information, the concentration of 7.48 × 10⁻⁹ (plasma extract volume 150 µL) was determined to be optimal.

2.4. Repeatability and Intermediate Precision

Since all plasma extracts used in this study were obtained from one sample (single blood draw from one individual), biological variability was minimized. Therefore, repeatability and intermediate precision in this study mainly reflected process variability in sample preparation and instrument variability; each plasma extract aliquot served as a quality control. Median within-batch RSD for all analytes was significantly higher at low plasma extract volumes than at high volumes (Figure 3, Kruskal-Wallis rank sum test, p-value less than 0.001). Post-hoc pairwise comparisons using Conover’s test for multiple comparisons revealed that this difference was significant for the lowest three volumes (25, 50 and 75 µL) and no longer significant starting at 100 µL.
Averaged across all plasma extract volumes, within-batch RSD was significantly higher in definite and potential contaminants (median = 3.42, 25th/75th: 2.48/5.13) than in non-contaminants (median = 3.06, 25th/75th: 2.33/3.92, Wilcoxon rank sum test, p-value = 0.04). Analytes with low linearity also had significantly higher within-batch RSD (median = 4.50, 25th/75th: 3.15/4.93) than analytes with high linearity (median = 2.33, 25th/75th: 1.73/2.83, Wilcoxon rank sum test, p-value less than 0.001). There was no significant difference in within-batch RSD for known analytes vs. unknowns (Wilcoxon rank sum test, p-value = 0.22).
The median between-batch RSD for all analytes was significantly higher at lower volumes than at high volumes (Figure 4, Kruskal-Wallis rank sum test, p-value less than 0.001). Post-hoc pairwise comparisons revealed that this difference was significant for all volumes below 400 µL. Between-batch RSD was larger than within-batch RSD for 141 (76.2%) analytes.
Averaged across all plasma extract volumes, between-batch RSD was significantly higher in definite and potential contaminants (median = 5.22, 25th/75th: 3.82/7.63) than in non-contaminants (median = 3.72, 25th/75th: 2.81/4.75, Wilcoxon rank sum test, p-value less than 0.001). Analytes with low linearity also had significantly higher between-batch RSD (median = 6.46, 25th/75th: 4.64/7.95) than analytes with high linearity (median = 2.86, 25th/75th: 2.33/3.42, Wilcoxon rank sum test, p-value less than 0.001). There was no significant difference in between-batch RSD for known analytes vs. unknowns (Wilcoxon rank sum test, p-value = 0.18).
An analysis-of-variance (ANOVA) test comparing the basic linear regression model with a model that added a batch variable revealed that 173 (93.5%) analytes exhibited between-batch variability significant enough to warrant the inclusion of a batch variable in the analysis.
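As a rough illustration of the two precision measures used in this section, the sketch below computes a within-batch and a between-batch relative standard deviation (RSD, in percent) for one analyte at one plasma extract volume. The exact aggregation is not spelled out in this section, so two assumptions are made and should be read as such: within-batch RSD is summarized as the median of the per-batch RSDs, and between-batch RSD is the RSD of the batch means. All function names are hypothetical.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation: sample standard deviation over mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


def within_batch_rsd(batches):
    """Median of the RSDs computed separately inside each batch (assumed summary)."""
    return median(rsd_percent(levels) for levels in batches.values())


def between_batch_rsd(batches):
    """RSD of the per-batch mean levels (assumed definition)."""
    return rsd_percent([mean(levels) for levels in batches.values()])


# Made-up triplicate levels for one analyte in two batches.
batches = {"batch1": [9.0, 10.0, 11.0], "batch2": [19.0, 20.0, 21.0]}
within = within_batch_rsd(batches)    # per-batch RSDs are 10% and 5%
between = between_batch_rsd(batches)  # batch means are 10 and 20
```

With real data the batch means would be far closer together than in this toy example, which exaggerates the batch shift so the difference between the two measures is easy to see.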

2.5. Concentration-Specific Study

After exclusion of contaminants, 133 known analytes detected in the concentration-specific study were compared to the limiting-dilution study. The majority of these analytes (117, 88.0%) were detected previously in the limiting-dilution study. Analytes not previously detected (16) were considered non-reproducible and excluded from the concentration comparisons.
Comparison of analyte detection at plasma extract volume 150 vs. 700 µL (plasma concentration 7.48 × 10⁻⁹ vs. 3.49 × 10⁻⁸) revealed 7 known analytes and 13 unknowns that were detected inconsistently (less than 20%) at 150 µL and consistently (greater than 50%) at 700 µL. All seven known analytes except one were of low to moderate linearity (adjusted $R^2$ less than 0.7) in the limiting-dilution study.

3. Discussion

In this study, we investigated the steps in GC-MS-based non-targeted metabolomic profiling of human plasma that could influence metabolite identification and quantification. We tested and confirmed that contaminants, concentration, and repeatability and intermediate precision are major factors influencing the identification and quantification of metabolites. The findings of this study lead to recommendations for experimental design in GC-MS-based non-targeted metabolomic profiling of human plasma.
Through methodical inclusion and systematic analysis of blanks, we discovered that the majority of unknowns and close to half of known analytes detected were contaminants. This result highlights the importance of including blanks in GC-MS-based non-targeted metabolomic profiling, a step that is not universally incorporated in practice currently. While the majority of contaminants were equipment components, unknowns, or reagent derivatives, 19% were metabolites with levels above the detection limit but below true biological levels. These metabolite contaminants consist of a wide range of metabolites, such as amino acids, carbohydrates, fatty acids, lipids and organic acids. The most likely sources of metabolite contaminants are the polypropylene tubes used in sample preparation, with oils used as extrusion aids or mould-release agents. Our results provide direct evidence that contaminants could share similar chemical and physical properties to true metabolites, as proposed previously by Dunn et al. [27]. Without background correction, these metabolite impurities could affect the selectivity of metabolite quantification by providing false positive signals. While inclusion of blanks may increase the cost of metabolomic assays, the additional information gained in both metabolite identification and quantification warrants investigators considering routinely including them in study designs. In addition to improving selectivity, our results also demonstrated that using blanks could provide insight into the nature of unknowns and significantly narrow their search space. Unknowns are often considered spurious peaks from reagent contaminants, chemical artifacts during derivatization or deconvolution artifacts as opposed to true metabolites, and most current studies exclude all unknowns routinely from further analysis. While some studies have reported the number of unknowns [40], few have reported their characteristics or distribution. 
In this study, we discovered that while the majority of unknowns were contaminants, some were absent in blanks, results that were reproducible in the second study. By including reproducible unknowns in metabolomic profiling, the statistical power could be increased, potentially leading to the discovery of novel biomarkers and pathways.
Comparison of the limiting-dilution and concentration-specific study showed that the contaminant profile is highly reproducible. This result prompted us to establish a contaminant repository consisting of highly reproducible and potential contaminants for reference in future studies.
Few previous studies have explored the signal-concentration relationship in complex biological samples such as human plasma [34]. Our study utilized analytical replicates to examine the appropriateness of a linear model through comparing the pure error variability and variability from lack of fit. In our study, the signal-concentration relationship was linear for only 61.2% of analytes. Potential explanations for nonlinearity include contaminant effect and saturation effect. Contaminant effect arises from the metabolite impurities present in equipment and reagents that could affect the samples differently. At lower concentrations, false positive signals may arise from these impurities, thus affecting metabolite quantification. Conversely, as concentration increases beyond a certain threshold, the chromatography column may become saturated, resulting in peak broadening, decreased sensitivity and poor quantification. In this study, we examined saturation effect using sigmoid and polynomial models as alternatives to the linear regression model. Our results showed that saturation effect could explain close to half of the nonlinearity.
Our results showed that known analytes had significantly higher linearity than unknowns. This is likely because many unknowns may be spurious peaks arising from deconvolution artifacts or impurities. The classes of metabolites represented by linear analytes are diverse, suggesting that the functional group is not the only factor that affects linearity. Previous studies have advocated using dilution in quality control samples of metabolomic profiling to generate a list of highly linear “targets” that can be used for further method validation [7]. These known analytes showing high linearity in this study were used to construct a list of targets that we will use for performance monitoring in the future; the unknowns showing high linearity were added to our institutional library as potential metabolites of biological importance.
By examining the linear dynamic range for all analytes, we determined that the optimal concentration for quantification was 7.48 × 10⁻⁹ for the majority of analytes, corresponding to a plasma extract volume of 150 µL. The optimal protocol established for sample preparation and derivatization (SOP) can be found at: http://dmpi.duke.edu/files/dmpi_gc-ms_protocol.pdf. The subsequent concentration-specific study confirmed that when the plasma extract volume was decreased from 700 to 150 µL (concentration from 3.49 × 10⁻⁸ to 7.48 × 10⁻⁹), only a few low-abundance, low-linearity metabolites and unknowns were less consistently detected. One of the main challenges in metabolomic profiling is the trade-off between detection and quantification. Using higher plasma volumes may increase the detection rate of low-abundance analytes. However, at higher volumes, peaks for highly abundant analytes may become saturated, resulting in decreased accuracy in quantification. In the application of metabolomic profiling to human diseases, quantification of most analytes may be more important than detection of low-abundance analytes, especially when the goal is to differentiate as many metabolite levels between cases and controls as possible. Conversely, for studies on samples with low-abundance metabolites (e.g., neonates), using a higher plasma volume and thus a higher metabolite concentration may be required to achieve improved identification and quantification. Of note, the optimal plasma volume established in this study may not be generalizable to other studies using different analytical instruments and experimental conditions. Therefore, we recommend establishing the linear dynamic range specific to individual instruments prior to initiating large-scale non-targeted metabolomic profiling studies.
In this study, within-batch variability (RSD) was greatest at the lowest three volumes; that is, repeatability was poorest there. This result is consistent with previous reports [34]. Sources of within-batch variability include variability in sample preparation and data acquisition. Specifically, contaminants affected repeatability significantly, as evidenced by higher within-batch RSD in contaminants than in non-contaminants. The fact that within-batch RSD did not differ between known analytes and unknowns suggests that repeatability is intrinsic to the experimental process, rather than analyte-specific. The overall low within-batch RSD confirms that the method is highly reproducible and meets requirements similar to those of targeted methods.
Between-batch variability (intermediate precision) was higher than within-batch variability (repeatability) for the majority of analytes in this study. Sources of between-batch variability are similar to those of within-batch variability and include variability in sample preparation and data acquisition. In addition, since different batches were performed on different days, changes in sensitivity over time may also contribute to between-batch variability as sample components aggregate in the GC injector or electrospray ion source [27]. While between-batch RSD was below 10% for the majority of analytes at all concentrations, the significant batch effect on quantification for most analytes suggests that batch controls should be included routinely in reporting and analysis of metabolomic profiling.
Broad-scan, non-targeted GC/MS metabolomics is useful for examining small compounds in plasma whose concentrations range from low micromolar to millimolar. However, GC has numerous limitations, including the need to extract and derivatize analytes to render them sufficiently nonpolar for GC. GC is poorly suited for some compounds, including those that are highly volatile and elute in the solvent front, as well as thermolabile or highly polar metabolites, such as quaternary amines, guanidino compounds, internal zwitterions, and molecules with phosphodiester bonds. Protocols and instruments vary widely. In assays for the hundred-plus plasma metabolites that are readily accessible by GC/MS, optimization experiments are essential during development of a stable analytic platform.

4. Materials and Methods

Our experimental design included two studies: (1) the limiting-dilution study (Figure 5), which investigated the effects of dilution on analyte identification and quantification, and (2) the concentration-specific study, which compared the optimal concentration established in the first study (7.48 × 10⁻⁹, corresponding to a plasma extract volume of 150 µL) with the standard volume used in the current institutional protocol [37] (3.49 × 10⁻⁸, corresponding to a plasma extract volume of 700 µL).

4.1. Sample Acquisition, Preparation, and Derivatization

Both studies utilized a single EDTA-anticoagulated blood sample obtained from one healthy volunteer after 10 h of fasting. The blood sample was collected at the beginning of the limiting-dilution study and plasma was extracted after centrifugation. The plasma sample was then separated into 1.2 mL aliquots and stored at −80 °C prior to sample preparation.
The limiting-dilution study was divided into 10 batches with identical experimental design (Figure 5) spanning 16 consecutive days, while the concentration-specific study was conducted within a two-day period. For both studies, plasma aliquots (100 µL each) were first extracted with 750 µL methanol spiked with a retention-time-lock internal standard of 6.25 mg/L perdeuterated myristic acid (C14:0-D27-TMS) to remove proteins. Following centrifugation at 2081 × g for 5 min at room temperature, the supernatants were pooled into a 10 mL glass tube. Varying amounts of the pooled methanolic extract were then dispensed into new microcentrifuge tubes, and ballasted with 7.5:1 MeOH/H2O (v/v) to a total volume of 700 µL. The limiting-dilution and concentration-specific studies differed in the volumes of pooled methanolic extract used, corresponding to different plasma concentrations. For the limiting-dilution study, each batch consisted of 33 aliquots with 11 different plasma extract volumes (0–700 µL), corresponding to 11 plasma concentrations repeated three times (Table 4). The concentration-specific study consisted of one batch of 32 aliquots: 15 replicates for each of the two concentrations 7.48 × 10⁻⁹ and 3.49 × 10⁻⁸ (corresponding to plasma extract volumes of 150 µL and 700 µL, respectively) and two blanks (reagents only). For both studies, each aliquot of methanolic extract ballasted with MeOH/H2O was dried with a SpeedVac SPD111V sample concentrator (Thermo Fisher Scientific, Asheville, NC, USA) for 5 h, followed by the addition of 100 µL ethyl acetate as an azeotropic drying agent, and another 45 min of SpeedVac drying. The dried plasma extracts were derivatized with 25 µL of 18 mg/mL methoxyamine hydrochloride in pyridine at 50 °C for 30 min, followed by trimethylsilylation with 75 µL of N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) at 50 °C for 30 min.
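The dilution arithmetic in this paragraph can be written out as a short sketch. It takes the 700 µL total volume from the text and assumes that plasma concentration scales linearly with the volume of pooled extract, anchored at the stated pair of 150 µL and 7.48 × 10⁻⁹. That proportionality is an assumption made here for illustration, so the computed values only approximate the rounded concentrations reported in the paper. The constant and function names are hypothetical.

```python
TOTAL_VOLUME_UL = 700.0      # final volume of every aliquot, from the text
REF_VOLUME_UL = 150.0        # reference plasma extract volume, from the text
REF_CONCENTRATION = 7.48e-9  # concentration reported for 150 uL, from the text


def ballast_volume(extract_ul):
    """Volume of 7.5:1 MeOH/H2O needed to bring an aliquot to 700 uL."""
    if not 0.0 <= extract_ul <= TOTAL_VOLUME_UL:
        raise ValueError("extract volume must lie between 0 and 700 uL")
    return TOTAL_VOLUME_UL - extract_ul


def approx_concentration(extract_ul):
    """Approximate plasma concentration, assuming linear scaling with volume."""
    return REF_CONCENTRATION * extract_ul / REF_VOLUME_UL


ballast_for_150 = ballast_volume(150.0)    # 550 uL of ballast
conc_for_700 = approx_concentration(700.0)  # close to the reported 3.49e-8
```

A blank corresponds to an extract volume of 0 µL, so it receives the full 700 µL of ballast.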

4.2. GC-MS Analysis

The derivatized aliquots were analyzed with a 6890N GC-5975 Inert MS (Agilent Technologies, Santa Clara, CA, USA) using previously described methods [37]. A high-volume ProSep inlet (liner dimensions 2 × 6.0 × 243 mm, Patent No: US 6,484,560 B1, Apex Technologies, Inc., Edison, NJ, USA) [37] was used to allow for programmed-temperature vaporization and diversion of the heavy contaminants away from the GC-MS. Volumes of 5 µL were injected in 25:1 split mode into a DB5-MS capillary column (two 15 m × 250 µm × 0.25 µm columns; J & W Scientific, Folsom, CA, USA) connected in series by a microfluidic flow controller (Agilent Technologies, Santa Clara, CA, USA). The split ratio was determined empirically in prior experiments. Initial inlet pressures were adjusted empirically to achieve a retention time of 16.727 min for the internal standard. Helium was used as the carrier gas, and the pressure was programmed to maintain a constant helium flow of 2.0 mL/min. The initial GC oven temperature was 60 °C, and the temperature was increased at a rate of 10 °C/min to a final temperature of 325 °C. At the end of each run, both the inlet and the oven were held at 325 °C for a “bake-out” to minimize carryover. During this “bake-out”, the upstream GC column was back-flushed via the mid-column microfluidic splitter, while the inlet was purged with high-flow helium at 50 mL/min. Positive ions were generated with conventional electron ionization (EI) at 70 eV; detection was achieved in full-scan mode over m/z 50 to 600. Aliquots were run in a randomized order to ensure that the orders of sample preparation and data acquisition did not introduce biases (Figure 5). Method blanks containing the reagents only were processed following the same procedure as the biological aliquots and included at the beginning, middle, and end of each run.
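The run-order scheme described above (randomized aliquots with method blanks at the beginning, middle, and end of each run) could be generated along the following lines. This is only a sketch: the aliquot labels, the use of exactly three blanks, and the function name are illustrative assumptions, and the actual randomization procedure used in the study is not described here.

```python
import random


def build_run_order(aliquots, blanks, seed=None):
    """Shuffle plasma aliquots and place blanks first, in the middle, and last."""
    if len(blanks) != 3:
        raise ValueError("this sketch expects exactly three blanks per batch")
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    shuffled = list(aliquots)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return [blanks[0], *shuffled[:mid], blanks[1], *shuffled[mid:], blanks[2]]


# Hypothetical labels: 10 non-zero extract volumes in triplicate plus 3 blanks,
# giving the 33 aliquots per batch mentioned in the text.
aliquots = [f"level{lvl:02d}_rep{rep}" for lvl in range(1, 11) for rep in range(1, 4)]
blanks = ["blank_1", "blank_2", "blank_3"]
order = build_run_order(aliquots, blanks, seed=7)
```

Recording the seed alongside the batch makes it possible to reconstruct the injection sequence later when checking for run-order effects.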
Instrument maintenance was performed after every week of analysis, and included cleaning the ionization source components, tuning the mass spectrometry analyzer, and changing the GC liner. After instrument maintenance, injections of the same volunteer plasma were performed prior to continuing the study to compare the retention times, analyte detection, and peak shapes to ensure consistency.

4.3. Metabolite Identification and Quantification

GC-MS data were first deconvoluted with AMDIS (build 140.24, version 2.72, National Institute of Standards and Technology, Gaithersburg, MD, USA), with the following settings, which experience has shown to be suitable: component width, 12 scans; exclusions of the total-ion chromatogram and m/z 73, 74, 75, 147, 148, and 149; adjacent peak subtraction, none; resolution, medium; sensitivity, high; and shape requirements, low. Peak annotation was achieved using our institutional library. The institutional library consists of the Fiehn RTL spectral library [41] with additions established using purified standard compounds in the DMPI metabolomics laboratory and spectra from the Golm Metabolome Database [42] and similar public spectral libraries. Metabolite identification was based on retention index and spectral match scores. Identified (known) analytes with reverse scores greater than or equal to 75 were included in further analysis. Unidentified (unknown) analytes were catalogued using an auxiliary library of spectra corresponding to unidentified peaks that were conserved across samples. These were categorized according to retention index and the dominant m/z spectral fragment. Retention indices were assigned by a quadratic equation defining the retention index (RI) as a function of retention time (RT), derived from injections of a ladder of fatty acid methyl esters (FAMEs), where $RI = 2.246 \times RT^2 + 21.61 \times RT + 507.9$, with the RIs of FAMEs defined as 800 for methyl octanoate, 900 for methyl nonanoate, and so on. Analyte levels were reported as the log-base-2 transformed values of integrated peak areas. Analytes detected in less than 20% of non-blanks were excluded from further analysis.
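The three numerical steps at the end of this paragraph (retention index from retention time, log-base-2 analyte levels, and the 20% detection filter) are simple enough to write out. The coefficients come from the equation in the text; the function names and the representation of detection as a list of booleans are assumptions made for this sketch.

```python
from math import log2


def retention_index(rt_min):
    """Retention index from retention time (min), using the quadratic in the text."""
    return 2.246 * rt_min ** 2 + 21.61 * rt_min + 507.9


def analyte_level(peak_area):
    """Analyte level as the log-base-2 of the integrated peak area."""
    return log2(peak_area)


def keep_analyte(detected_flags, min_fraction=0.20):
    """Keep an analyte detected in at least 20% of non-blank aliquots.

    detected_flags: one boolean per non-blank aliquot (True if detected).
    """
    return sum(detected_flags) / len(detected_flags) >= min_fraction


ri_at_10_min = retention_index(10.0)              # 224.6 + 216.1 + 507.9
level = analyte_level(1024.0)                     # 2**10 gives a level of 10
kept = keep_analyte([True] + [False] * 4)         # detected in 1 of 5, i.e. 20%
dropped = not keep_analyte([True] + [False] * 9)  # detected in 1 of 10, i.e. 10%
```

The quadratic is specific to the instrument and FAME ladder described here, so the coefficients would need to be refitted for a different column or temperature program.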
To validate findings in these two studies, a third study was conducted using paired samples consisting of (1) the volunteer plasma used in the first two studies, and (2) the NIST SRM1950 plasma standard (5 × 1 mL) [28]. These paired samples were prepared and analyzed in three batches using the same methods as the limiting dilution study. Identities of known metabolites detected in the NIST SRM1950 plasma standard were compared to previous reports in the literature [28].

4.4. Parameters Assessed for Method Development

To test our hypothesis, we examined five parameters previously proposed for bioanalytical method development [7] in the limiting-dilution study: selectivity, linearity, linear dynamic range, repeatability, and intermediate precision.
Selectivity is defined as the ability to identify and quantify analytes in the presence of potential contaminants such as process impurities, reagent derivatives, and sample carryover [7]. We assessed selectivity by examining analytes detected in blanks, with the assumption that any analyte detectable in at least one blank is a contaminant. These contaminants may include components from collection tubes and plastic ware, reagent derivatives, and metabolites introduced through the preparation process that mimic the same metabolites present in biological samples. Additional contaminants were discovered by manual curation (examination of the annotation): analytes with non-metabolite annotations (e.g., silicone oils) were also classified as contaminants. All contaminants were further classified as definite (present in at least 20% of blanks, i.e., 6 or more of the 30 blanks, or annotated as a non-metabolite) or potential (present in at least 1 but fewer than 6 blanks). Run-order effects in blanks were estimated as Spearman's correlation coefficient between run order and contaminant levels.
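The contaminant rules and the run-order check described above can be expressed compactly. The following Python sketch is illustrative only: the function names and the label "not flagged" are hypothetical, the default of 30 blanks reflects the blank counts reported in Table 1, and Spearman's coefficient is computed by hand (Pearson correlation of average ranks) so the example needs no external packages.

```python
import math


def classify_contaminant(n_blanks_detected: int, n_blanks_total: int = 30,
                         non_metabolite: bool = False,
                         definite_fraction: float = 0.20) -> str:
    """Definite: in >= 20% of blanks or a non-metabolite annotation.
    Potential: in at least one blank but below the 20% threshold."""
    if non_metabolite or n_blanks_detected / n_blanks_total >= definite_fraction:
        return "definite"
    if n_blanks_detected >= 1:
        return "potential"
    return "not flagged"


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman correlation; undefined (division by zero) for constant input."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(classify_contaminant(27))                      # frequent in blanks
    print(classify_contaminant(0, non_metabolite=True))  # found by curation
    print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A positive coefficient would suggest that a contaminant accumulates over the run, and a negative one that it is depleted; how blanks with no detected peak were handled in the correlation is not stated in the text.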
While accuracy of quantification is not easily achievable in non-targeted metabolomic profiling, linearity, or the ability to obtain signals directly proportional to the concentration of analytes within a given range [7], can be assessed as a measure of quantification. Linearity was commonly assessed using the coefficient of determination, or $R^2$, in previous studies [34]. Although convenient, $R^2$ is a limited measure in assessing goodness-of-fit of a linear regression model, as non-linear relationships can present with a high $R^2$ value. In this study, we took advantage of the analytical replicates in the study design and assessed linearity of the signal-concentration relationship using a linear regression model: $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$, where $Y_{ij}$ denotes the analyte level for the $i$th aliquot for the $j$th level of $X$ ($i = 1, \ldots, 30$; $j = 1, \ldots, 10$), $X_j$ is the $\log_2$ of plasma extract volume, and $\varepsilon_{ij} \sim \text{iid } N(0, \sigma^2)$. The parameters $\beta_0$ and $\beta_1$ were estimated using the least squares solution. The appropriateness of the linear regression model was examined using residual plots by plotting the residuals against fitted values. Additionally, the F-test for lack of fit was used to test the full model, $Y_{ij} = \mu_j + \varepsilon_{ij}$, where $E[Y_{ij}] = \mu_j$, versus the reduced model, $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$. Analytes for which the linear model was deemed appropriate were further assessed using the adjusted coefficient of determination, or adjusted $R^2$; analytes exhibiting lack of fit for the linear model were refitted with sigmoid curves using logistic regression or polynomial models.
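To make the lack-of-fit comparison concrete, the sketch below computes the classical lack-of-fit F statistic for one analyte from replicated measurements. It is a minimal illustration, not the authors' code: the function name and the input layout (a dictionary mapping each value of X, here the log2 plasma extract volume, to its replicate analyte levels) are assumptions. The p-value, which would come from the F distribution with the returned degrees of freedom, is deliberately left out so that the example uses only the standard library.

```python
def lack_of_fit_f(groups):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit.

    groups maps each x value to a list of replicate y values.
    Full model: one mean per x level. Reduced model: y = b0 + b1 * x.
    """
    pairs = [(x, y) for x, ys in groups.items() for y in ys]
    n, c = len(pairs), len(groups)

    # Ordinary least squares for the reduced (linear) model.
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in pairs) / sxx
    b0 = ybar - b1 * xbar

    # Residual sum of squares of the linear model.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs)
    # Pure-error sum of squares: scatter of replicates around level means.
    sspe = sum((y - sum(ys) / len(ys)) ** 2
               for ys in groups.values() for y in ys)
    sslf = sse - sspe  # lack-of-fit sum of squares

    df_lf, df_pe = c - 2, n - c
    f_stat = (sslf / df_lf) / (sspe / df_pe)
    return f_stat, df_lf, df_pe


if __name__ == "__main__":
    curved = {1: [0.9, 1.1], 2: [3.9, 4.1], 3: [8.9, 9.1], 4: [15.9, 16.1]}
    print(lack_of_fit_f(curved))  # large F: level means deviate from a line
```

A large F relative to the F distribution with (c − 2, n − c) degrees of freedom indicates that the level means depart from a straight line by more than replicate scatter alone would explain, which is the situation a high $R^2$ can hide.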
Linear dynamic range for each analyte was evaluated using response factors obtained by dividing the analyte levels by their concentrations [38]. The linear range was defined as the range between 0.95 and 1.05 times the average value of the response factors. The optimal concentration was determined as the concentration where the majority of analytes were in their linear dynamic range.
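The response-factor criterion can be illustrated with a short Python sketch. The function name and the example numbers are hypothetical, and the text does not state whether response factors were computed from raw or log2-transformed levels, so the sketch simply takes whatever levels it is given.

```python
def concentrations_in_linear_range(concentrations, levels, tolerance=0.05):
    """Return the concentrations whose response factor (level / concentration)
    lies within 0.95 to 1.05 times the mean response factor."""
    response_factors = [lvl / conc for conc, lvl in zip(concentrations, levels)]
    mean_rf = sum(response_factors) / len(response_factors)
    low, high = (1 - tolerance) * mean_rf, (1 + tolerance) * mean_rf
    return [conc for conc, rf in zip(concentrations, response_factors)
            if low <= rf <= high]


if __name__ == "__main__":
    # Hypothetical analyte: response flattens at the highest volume.
    volumes = [25, 50, 100, 150, 200]
    signal = [25.0, 51.0, 98.0, 150.0, 180.0]
    print(concentrations_in_linear_range(volumes, signal))
```

Because the band is centred on the mean response factor, a few strongly deviating points can shift it; the sketch reports every concentration inside the band and does not check that they form a contiguous interval.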
Repeatability and intermediate precision in analyte quantification were assessed using the coefficient of variation, or relative standard deviation (RSD). Specifically, repeatability, or within-batch variability, was assessed by calculating the RSD for each analyte at each plasma extract volume within each batch and averaging across the 10 batches. Intermediate precision, or between-batch variability, was assessed by calculating the RSD for each analyte at each plasma extract volume from the mean analyte levels of each batch. A Kruskal-Wallis test with post-hoc pairwise comparisons using Conover's test was performed to compare the repeatability and intermediate precision RSD at different volumes. In addition, batch effects were estimated with an analysis-of-variance (ANOVA) test comparing the basic linear regression model with a model that additionally included a batch variable. A two-tailed alpha of 0.05 was used.
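The two RSD summaries can be sketched as follows for a single analyte at a single plasma extract volume. This is an illustration under stated assumptions: the function names are hypothetical, the sample (n − 1) standard deviation is assumed, and the input is a list of batches, each holding that batch's replicate levels.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation in percent (sample SD / mean * 100)."""
    return 100.0 * stdev(values) / mean(values)


def repeatability_rsd(batches):
    """Within-batch RSD, averaged across batches."""
    return mean([rsd(batch) for batch in batches])


def intermediate_precision_rsd(batches):
    """Between-batch RSD, computed from the per-batch mean levels."""
    return rsd([mean(batch) for batch in batches])


if __name__ == "__main__":
    # Hypothetical levels: three replicates in each of three batches.
    batches = [[10.1, 9.9, 10.0], [11.0, 11.2, 10.8], [9.5, 9.4, 9.6]]
    print(repeatability_rsd(batches))
    print(intermediate_precision_rsd(batches))
```

In this toy data set the replicates agree closely within each batch while the batch means differ, so the between-batch RSD is the larger of the two.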
The concentration-specific study compared analyte detection, defined as the presence of an analyte above the 20% cut-off, at the optimal concentration established in the first study ($7.48 \times 10^{-9}$, corresponding to a plasma extract volume of 150 µL) with the standard used in the current institutional protocol ($3.49 \times 10^{-8}$, corresponding to a plasma extract volume of 700 µL). Reproducibility of the contaminant and metabolite profile was also assessed by comparing the results of the limiting-dilution and concentration-specific studies.

5. Conclusions

Using a limiting-dilution study and a concentration-specific study, we confirmed that contaminants, concentration, repeatability, and intermediate precision are major factors influencing metabolite identification and quantification, and we established methods for improved metabolite identification and quantification. These methods are summarized (Table 5) to provide recommendations for experimental design of GC-MS-based non-targeted profiling of the human plasma metabolome.

Supplementary Materials

The following are available online at www.mdpi.com/2218-1989/7/3/45/s1. Figure S1: Example of positive (left) and negative (right) run order effect on contaminant levels; the contaminant on the left is an unknown with retention time 8.125 min, and the contaminant on the right is beta-monopalmitin. Table S1: Non-metabolite known contaminants detected in blanks. Table S2: Known analytes identified in the volunteer and NIST SRM 1950 plasma.

Acknowledgments

This study was supported by the TSFRE Braunwald Research Fellowship and National Institutes of Health Grant T32HL007101 (H.W.).

Author Contributions

H.W., E.R.H., J.R.B. and S.S. conceived and designed the experiments; H.W. and S.K.O’N. performed the experiments; H.W., M.J.M., S.K.O’N. and E.R.H. analyzed the data; C.B.N. contributed reagents/materials/analysis tools; H.W., J.R.B. and M.J.M. wrote the paper. All authors contributed to the preparation of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Beecher, C.W.W. The human metabolome. In Metabolic Profiling: Its Role in Biomarker Discovery and Gene Function Analysis; Springer: Berlin/Heidelberg, Germany, 2003; pp. 311–318.
  2. Wishart, D.S. Computational approaches to metabolomics. In Methods in Molecular Biology; Springer: Berlin/Heidelberg, Germany, 2010; pp. 263–282.
  3. Fiehn, O. Metabolomics—The link between genotypes and phenotypes. Plant Mol. Biol. 2002, 48, 155–171.
  4. Barba, I.; Garcia-dorado, D. Metabolomics in cardiovascular disease: Towards clinical application. Coron. Artery Dis. 2012. Available online: https://cdn.intechopen.com/pdfs-wm/32774.pdf (accessed on 23 August 2017).
  5. Worley, B.; Powers, R. Multivariate analysis in metabolomics. Curr. Metabolom. 2013, 1, 92–107.
  6. Dunn, W.B.; Bailey, N.J.; Johnson, H.E. Measuring the metabolome: Current analytical technologies. Analyst 2005, 130, 606–625.
  7. Naz, S.; Vallejo, M.; Garcia, A.; Barbas, C. Method validation strategies involved in non-targeted metabolomics. J. Chromatogr. A 2014, 1353, 99–105.
  8. Koek, M.M.; Muilwijk, B.; Van Der Werf, M.J.; Hankemeier, T. Microbial metabolomics with gas chromatography/mass spectrometry. Anal. Chem. 2006, 78, 1272–1281.
  9. Dunn, W.B.; Erban, A.; Weber, R.J.M.; Creek, D.J.; Brown, M.; Breitling, R.; Hankemeier, T.; Goodacre, R.; Neumann, S.; Kopka, J.; et al. Mass appeal: Metabolite identification in mass spectrometry-focused untargeted metabolomics. Metabolomics 2013, 9, 44–66.
  10. Du, X.; Zeisel, S.H. Spectral deconvolution for gas chromatography mass spectrometry-based metabolomics: Current status and future perspectives. Comput. Struct. Biotechnol. J. 2013, 4, 1–10.
  11. Kopka, J. Current challenges and developments in GC-MS based metabolite profiling technology. J. Biotechnol. 2006, 124, 312–322.
  12. Psychogios, N.; Hau, D.D.; Peng, J.; Guo, A.C.; Mandal, R.; Bouatra, S.; Sinelnikov, I.; Krishnamurthy, R.; Eisner, R.; Gautam, B.; et al. The human serum metabolome. PLoS ONE 2011, 6, e16957.
  13. Wishart, D.S.; Jewison, T.; Guo, A.C.; Wilson, M.; Knox, C.; Liu, Y.; Djoumbou, Y.; Mandal, R.; Aziat, F.; Dong, E.; et al. HMDB 3.0-the human metabolome database in 2013. Nucleic Acids Res. 2013, 41, 801–807.
  14. Wang, Z.; Klipfell, E.; Bennett, B.J.; Koeth, R.; Levison, B.S.; Dugar, B.; Feldstein, A.E.; Britt, E.B.; Fu, X.; Chung, Y.-M.; et al. Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature 2011, 472, 57–63.
  15. Brindle, J.T.; Antti, H.; Holmes, E.; Tranter, G.; Nicholson, J.K.; Bethell, H.W.L.; Clarke, S.; Schofield, P.M.; McKilligin, E.; Mosedale, D.E.; et al. Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1H-NMR-based metabonomics. Nat. Med. 2002, 8, 1439–1444.
  16. Riedl, J.; Esslinger, S.; Fauhl-Hassek, C. Review of validation and reporting of non-targeted fingerprinting approaches for food authentication. Anal. Chim. Acta 2015, 885, 17–32.
  17. Reza, P.R.; Masanori, M.S.; Correa, E.; Dayalan, S.; Tim, A.G. Data standards can boost metabolomics research, and if there is a will, there is a way. Metabolomics 2016, 12, 1–13.
  18. Lind, M.V.; Savolainen, O.I.; Ross, A.B. The use of mass spectrometry for analysing metabolite biomarkers in epidemiology: Methodological and statistical considerations for application to large numbers of biological samples. Eur. J. Epidemiol. 2016, 31, 717–733.
  19. Kanani, H.; Chrysanthopoulos, P.K.; Klapa, M.I. Standardizing GC-MS metabolomics. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 2008, 871, 191–201.
  20. Gika, H.G.; Wilson, I.D.; Theodoridis, G.A. The Role of Mass Spectrometry in Nontargeted Metabolomics, 1st ed.; Elsevier: Philadelphia, PA, USA, 2014; Volume 63, pp. 213–233.
  21. Fiehn, O. Extending the breadth of metabolite profiling by gas chromatography coupled to mass spectrometry. TrAC Trends Anal. Chem. 2008, 27, 261–269.
  22. Sumner, L.W.; Amberg, A.; Barrett, D.; Beale, M.H.; Beger, R.; Daykin, C.; Fan, T.W.M.; Fiehn, O.; Goodacre, R.; Griffin, J.L.; et al. Proposed minimum reporting standards for chemical analysis: Chemical analysis working group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics 2007, 3, 211–221.
  23. Lisec, J.; Schauer, N.; Kopka, J.; Willmitzer, L.; Fernie, A.R. Gas chromatography mass spectrometry-based metabolite profiling in plants. Nat. Protoc. 2006, 1, 387–396.
  24. Dunn, W.B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J.D.; Halsall, A.; Haselden, J.N.; et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 2011, 6, 1060–1083.
  25. Ammerlaan, W.; Trezzi, J.-P.; Lescuyer, P.; Mathay, C.; Hiller, K.; Betsou, F. Method validation for preparing serum and plasma samples from human blood for downstream proteomic, metabolomic, and circulating nucleic acid-based applications. Biopreserv. Biobank. 2014, 12, 269–280.
  26. Godzien, J.; Alonso-Herranz, V.; Barbas, C.; Armitage, E.G. Controlling the quality of metabolomics data: New strategies to get the best out of the QC sample. Metabolomics 2014, 11, 518–528.
  27. Dunn, W.B.; Wilson, I.D.; Nicholls, A.W.; Broadhurst, D. The importance of experimental design and QC samples in large-scale and MS-driven untargeted metabolomic studies of humans. Bioanalysis 2012, 4, 2249–2264.
  28. Simon-Manso, Y.; Lowenthal, M.S.; Kilpatrick, L.E.; Sampson, M.L.; Telu, K.H.; Rudnick, P.A.; Mallard, W.G.; Bearden, D.W.; Schock, T.B.; Tchekhovskoi, D.V.; et al. Metabolite profiling of a NIST standard reference material for human plasma (SRM 1950): GC-MS, LC-MS, NMR, and clinical laboratory analyses, libraries, and web-based resources. Anal. Chem. 2013, 85, 11725–11731.
  29. Sangster, T.; Major, H.; Plumb, R.; Wilson, A.J.; Wilson, I.D. A pragmatic and readily implemented quality control strategy for HPLC-MS and GC-MS-based metabonomic analysis. Analyst 2006, 131, 1075–1078.
  30. Trutschel, D.; Schmidt, S.; Grosse, I.; Neumann, S. Experiment design beyond gut feeling: Statistical tests and power to detect differential metabolites in mass spectrometry data. Metabolomics 2015, 11, 851–860.
  31. Sud, M.; Fahy, E.; Cotter, D.; Azam, K.; Vadivelu, I.; Burant, C.; Edison, A.; Fiehn, O.; Higashi, R.; Nair, K.S.; et al. Metabolomics workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res. 2016, 44, D463–D470.
  32. Salek, R.M.; Arita, M.; Dayalan, S.; Ebbels, T.; Jones, A.R.; Neumann, S.; Rocca-Serra, P.; Viant, M.R.; Vizcaíno, J.A. Embedding standards in metabolomics: The metabolomics society data standards task group. Metabolomics 2015, 11, 782–783.
  33. Lu, H.; Liang, Y.; Dunn, W.B.; Shen, H.; Kell, D.B. Comparative evaluation of software for deconvolution of metabolomics data based on GC-TOF-MS. TrAC Trends Anal. Chem. 2008, 27, 215–227.
  34. Aa, J.; Trygg, J.; Gullberg, J.; Johansson, A.I.; Jonsson, P.; Antti, H.; Marklund, S.L.; Moritz, T. Extraction and GC/MS analysis of the human blood plasma metabolome. Anal. Chem. 2005, 77, 8086–8094.
  35. Pasikanti, K.K.; Ho, P.C.; Chan, E.C.Y. Gas chromatography/mass spectrometry in metabolic profiling of biological fluids. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 2008, 871, 202–211.
  36. Wishart, D.S. Computational strategies for metabolite identification in metabolomics. Bioanalysis 2009, 1, 1579–1596.
  37. McNulty, N.P.; Yatsunenko, T.; Hsiao, A.; Faith, J.J.; Muegge, B.D.; Goodman, L.; Henrissat, B.; Oozeer, R.; Cools-Portier, S.; Gobert, G.; et al. The impact of a consortium of fermented milk strains on the gut microbiome of gnotobiotic mice and monozygotic twins. Sci. Transl. Med. 2011, 3.
  38. Gustavo González, A.; Ángeles Herrador, M. A practical guide to analytical method validation, including measurement uncertainty and accuracy profiles. TrAC Trends Anal. Chem. 2007, 26, 227–238.
  39. Christenson, R.H.; Duh, S.H. Methodological and analytic considerations for blood biomarkers. Prog. Cardiovasc. Dis. 2012, 55, 25–33.
  40. Styczynski, M.P.; Moxley, J.F.; Tong, L.V.; Walther, J.L.; Jensen, K.L.; Stephanopoulos, G.N. Systematic identification of conserved metabolites in GC/MS data for metabolomics and biomarker discovery. Anal. Chem. 2007, 79, 966–973.
  41. Kind, T.; Wohlgemuth, G.; Lee, D.Y.; Lu, Y.; Palazoglu, M.; Shahbaz, S.; Fiehn, O. FiehnLib: Mass spectral and retention index libraries for metabolomics based on quadrupole and time-of-flight gas chromatography/mass spectrometry. Anal. Chem. 2009, 81, 10038–10048.
  42. Kopka, J.; Schauer, N.; Krueger, S.; Birkemeyer, C.; Usadel, B.; Bergmüller, E.; Dörmann, P.; Weckwerth, W.; Gibon, Y.; Stitt, M.; Willmitzer, L.; et al. GMD@CSB.DB: The Golm metabolome database. Bioinformatics 2005, 21, 1635–1638.
Figure 1. Contaminants represent 54% of all analytes detected. Classes of contaminants: process impurities (e.g., silicone oils and alkane hydrocarbons) present in blanks or discovered after manual curation (25), metabolites present in blanks (43), and unknowns present in blanks (88).
Figure 2. Distribution of $R^2$ values for all analytes. Approximately half of analytes (47.9%, 23) with low linearity ($R^2$ less than 0.5) were definite or potential contaminants.
Figure 3. Boxplot of repeatability (within-batch relative standard deviation or RSD) by plasma extract volume. The horizontal lines represent the median and the lower and upper hinges correspond to the 25th and 75th percentiles. The asterisks * denote RSD that was significantly different in post-hoc pairwise comparisons using the Conover’s test for multiple comparisons.
Figure 4. Boxplot of intermediate precision (between-batch relative standard deviation or RSD) by plasma extract volume. The horizontal lines represent the median and the lower and upper hinges correspond to the 25th and 75th percentiles. The asterisks * denote RSD that was significantly different in post-hoc pairwise comparisons using the Conover’s test for multiple comparisons.
Figure 5. (a) Schematic of the sample preparation steps for the limiting dilution study; (b) an example of the injection order of the plasma extract aliquots. Aliquots were analysed in a randomized order to minimize biases in sample preparation and data acquisition. Blanks containing the reagents only were included at the beginning, middle, and end of each run. The concentration-specific study used a similar protocol except for different plasma extract volumes (0, 150 and 700 µL only).
Table 1. Metabolite contaminants detected in blanks by type (definite or potential) and chemical class. Detection rate in blanks varied by metabolite type. Empty Type and Class cells repeat the value above.

| Type | Class | Metabolite | No. of Blanks (%) |
| --- | --- | --- | --- |
| Definite | Amino acids | Glycine | 6 (20%) |
| | Benzene derivatives | Benzoic acid | 22 (73.3%) |
| | Carbohydrates | Glucose and other aldohexoses | 20 (66.7%) |
| | | Sucrose and similar disaccharides | 10 (33.3%) |
| | Fatty acids | Heptadecanoic acid or Octadecanol | 23 (76.7%) |
| | | Myristic acid or Pentadecanol | 27 (90%) |
| | | Nonanoic acid | 12 (40%) |
| | | Oleic acid | 12 (40%) |
| | | Palmitic acid | 27 (90%) |
| | | Pentadecanoic acid or Hexadecanol | 14 (46.7%) |
| | | Stearic acid | 27 (90%) |
| | Lipids | alpha-Monopalmitin | 27 (90%) |
| | | beta-Monopalmitin | 27 (90%) |
| | | beta-Monostearin | 27 (90%) |
| | | Glycerol | 26 (86.7%) |
| | | Thymol | 15 (50%) |
| | Organic acids | Pyruvic acid | 20 (66.7%) |
| | | Succinic acid | 7 (23.3%) |
| | Other | Phosphoric acid | 23 (76.7%) |
| | | Uridine | 27 (90%) |
| Potential | Amino acids | Aspartic acid | 3 (10%) |
| | Benzene derivatives | Gentisic acid | 4 (13.3%) |
| | | Phenol | 2 (6.7%) |
| | Carbohydrates | Fructose or similar ketohexose | 1 (3.3%) |
| | Fatty acids | Arachidic acid or 1-Heneicosanol | 3 (10%) |
| | | Decanoic acid | 1 (3.3%) |
| | | Lauric acid | 4 (13.3%) |
| | | Methyl palmitate | 2 (6.7%) |
| | | Methyl stearate | 2 (6.7%) |
| | Lipids | Gamma-Tocopherol | 2 (6.7%) |
| | Organic acids | Acetoacetate or 2-Aminoisobutanoic acid | 3 (10%) |
| | | Glycolic acid | 2 (6.7%) |
| | | Lactic acid | 5 (16.7%) |
| | | Urea | 2 (6.7%) |
| | Other | 1,2-Propanediol | 1 (3.3%) |
| | | 4-Hydroxypyridine or 3-Hydroxypyridine | 3 (10%) |
| | | Ethanolamine | 2 (6.7%) |
| | | O-Methylphosphate | 3 (10%) |
| | | Prunetin or similar isoflavone | 1 (3.3%) |
Table 2. Distribution of adjusted $R^2$ by analyte type. Known analytes had significantly higher linearity than unknowns (Fisher’s exact test comparing knowns and unknowns, p-value = 0.01).

| Adjusted $R^2$ | Total, No. | Known, No. (%) | Unknown, No. (%) |
| --- | --- | --- | --- |
| Greater than 0.95 | 3 | 2 (1.6%) | 1 (1.8%) |
| (0.7, 0.95) | 64 | 52 (41.3%) | 12 (21.1%) |
| (0.5, 0.7) | 30 | 22 (29.7%) | 8 (21.1%) |
| Less than 0.5 | 50 | 24 (32.5%) | 24 (63.2%) |
Table 3. Linear dynamic range for all analytes. The majority (90.5%) of analytes’ linear range was between 100 and 200 µL.

| Plasma Extract Volume (µL) | No. Analytes (%) |
| --- | --- |
| 75–100 | 8 (4.5%) |
| 100–150 | 100 (55.9%) |
| 150–200 | 62 (34.6%) |
| 200–300 | 6 (3.4%) |
| 300+ | 1 (0.6%) |
Table 4. Experimental design for each batch in the limiting-dilution study. Eleven different plasma extract volumes repeated three times were included in each batch (total number of aliquots = 33). Each plasma extract volume was ballasted with 7.5:1 methanol/H2O (v/v) to bring the total volume to 700 µL. The entire limiting-dilution study consisted of 10 batches with identical experimental design.

| Methanolic Plasma Extract Volume (µL) | Methanol/H2O Volume ¹ (µL) | Equivalent Plasma Volume Injected ² (nL) | Equivalent Plasma Concentration ³ (v/v) |
| --- | --- | --- | --- |
| 0 | 700 | 0 | 0 |
| 25 | 675 | 5.7 | $1.25 \times 10^{-9}$ |
| 50 | 650 | 11.3 | $2.49 \times 10^{-9}$ |
| 75 | 625 | 17.0 | $3.74 \times 10^{-9}$ |
| 100 | 600 | 22.6 | $4.98 \times 10^{-9}$ |
| 150 | 550 | 33.9 | $7.48 \times 10^{-9}$ |
| 200 | 500 | 45.2 | $9.97 \times 10^{-9}$ |
| 300 | 400 | 67.9 | $1.50 \times 10^{-8}$ |
| 400 | 300 | 90.5 | $1.99 \times 10^{-8}$ |
| 600 | 100 | 135.7 | $2.99 \times 10^{-8}$ |
| 700 | 0 | 158.4 | $3.49 \times 10^{-8}$ |

¹ 7.5:1 methanol/H2O (v/v) solution was used to bring the total volume up to 700 µL prior to drying. ² Calculated using injection volume (5 µL out of 100 µL derivatized plasma) and split ratio (25:1): 0.1 mL of plasma × $10^6$ nL/mL × (plasma extract volume/850) × (5/100)/26. ³ Calculated using weight-based estimate of total plasma volume 4.54 L.
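The arithmetic behind the last two columns of Table 4 can be checked with a short Python sketch that follows the footnote formulas literally. The function and parameter names are hypothetical; the constants (0.1 mL plasma, 850 µL total extract, 5 µL injected out of 100 µL, divisor 26 for the 25:1 split, 4.54 L total plasma volume) are those quoted in the footnotes.

```python
def injected_plasma_nl(extract_ul: float,
                       plasma_ml: float = 0.1,
                       total_extract_ul: float = 850.0,
                       injection_ul: float = 5.0,
                       derivatized_ul: float = 100.0,
                       split_divisor: float = 26.0) -> float:
    """Equivalent plasma volume injected (nL), per Table 4 footnote 2."""
    return (plasma_ml * 1e6                    # mL of plasma -> nL
            * (extract_ul / total_extract_ul)  # fraction of extract used
            * (injection_ul / derivatized_ul)  # fraction injected
            / split_divisor)                   # 25:1 split


def plasma_concentration(extract_ul: float,
                         total_plasma_l: float = 4.54) -> float:
    """Equivalent plasma concentration (v/v), per Table 4 footnote 3."""
    return injected_plasma_nl(extract_ul) * 1e-9 / total_plasma_l


if __name__ == "__main__":
    for volume in (25, 150, 700):
        print(volume,
              round(injected_plasma_nl(volume), 1),
              f"{plasma_concentration(volume):.2e}")
```

Rounded to the precision used in the table, the computed values for 25, 150, and 700 µL agree with the tabulated entries.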
Table 5. Recommendations for experimental design of GC-MS-based non-targeted profiling of human plasma metabolome, including recommendations on the inclusion of blanks, applications of linearity, control for repeatability and intermediate precision, establishment of linear range and treatment of unknowns.

| Experimental Design | Recommendations |
| --- | --- |
| Establish method blanks | Include 3 blank samples, at the beginning, middle and end of every sequence run |
| | Use both blanks and manual curation for contaminant profiling |
| | Establish a list of highly reproducible and potential contaminants |
| Linearity | Incorporate dilution into QC samples |
| | Metabolites showing linearity can be used as targets to validate the methodology and monitor changes |
| | Lack of linearity may indicate contaminant effect or saturation effect |
| Repeatability and intermediate precision | Batch should be included in reporting and analysis of non-targeted GC-MS profiling |
| Range | Linear dynamic range should be established through dilution studies |
| | Optimal concentration established through dilution studies should be used for metabolic profiling |
| Unknowns | Unknowns presenting as contaminants can be excluded from further analysis |
| | Highly-linear unknowns may be biologically important metabolites |
| | Reproducible, highly linear and non-contaminant unknowns should be added to the library or databases for future references |

Share and Cite

MDPI and ACS Style

Wang, H.; Muehlbauer, M.J.; O’Neal, S.K.; Newgard, C.B.; Hauser, E.R.; Bain, J.R.; Shah, S.H. Recommendations for Improving Identification and Quantification in Non-Targeted, GC-MS-Based Metabolomic Profiling of Human Plasma. Metabolites 2017, 7, 45. https://doi.org/10.3390/metabo7030045
