A comprehensive automatic data analysis strategy for gas chromatography-mass spectrometry based untargeted metabolomics

doi:10.1016/j.chroma.2019.460787

Journal of Chromatography A

Volume 1616, 12 April 2020, 460787

https://doi.org/10.1016/j.chroma.2019.460787

Highlights

• We propose a comprehensive data analysis workflow for GC-MS-based metabolomics.
• An automatic TIC peak detection and resolution methodology is proposed.
• A new time-shift correction and component registration algorithm is developed.
• A MATLAB GUI is developed based on the developed strategy for users.

Abstract

Automatic data analysis for gas chromatography-mass spectrometry (GC-MS) is a challenging task in untargeted metabolomics. In this work, we provide a novel comprehensive data analysis strategy for GC-MS-based untargeted metabolomics (autoGCMSDataAnal) by developing a new automatic strategy for total ion chromatogram (TIC) peak detection and resolution and by proposing a novel time-shift correction and component registration algorithm. autoGCMSDataAnal takes the originally acquired GC-MS data files as input and automatically performs TIC peak detection, component resolution, time-shift correction and component registration, statistical analysis, and compound identification. We use standards and complex plant samples to comprehensively investigate the performance of autoGCMSDataAnal. The results suggest that the developed strategy is comparable with several state-of-the-art methods that are widely used in GC-MS-based untargeted metabolomics. Based on the proposed strategy, we develop a user-friendly MATLAB GUI to facilitate routine analysis for users who are unfamiliar with programming languages; it can be freely downloaded at: http://software.tobaccodb.org/software/autogcmsdataanal.

Introduction

GC-MS-based untargeted metabolomics has been widely used in many laboratories for thoroughly characterizing large numbers of semi-volatile and volatile compounds in complex samples [1]. A notable advantage of this technique is that compounds can be accurately identified by matching their acquired mass spectra against those in a library, such as that of the National Institute of Standards and Technology (NIST). However, automatic data mining for GC-MS, which involves compound feature extraction, resolution of coeluted components, and peak alignment, is still a bottleneck for untargeted metabolomics [2].

One of the most challenging tasks is accurately retrieving the components under TIC peaks. Compared with the well-developed TIC peak detection algorithms [3], [4], [5], [6], automatic component resolution for each detected TIC peak remains an unresolved task in GC-MS-based untargeted metabolomics. Thus far, the Automated Mass Spectral Deconvolution and Identification System (AMDIS) is one of the most widely used tools for the deconvolution of coeluted components. Its notable advantage is that hundreds of TIC peaks can be analyzed simultaneously, and the retrieved mass spectral profiles can be directly imported into the NIST library for compound identification. Broeckling et al. [7] took advantage of AMDIS and developed MET-IDEA, which provides a component list table to support subsequent data analysis, such as metabolite screening. Du et al. and Domingo-Almenara et al. developed a number of high-performance algorithms [8], [9], [10], [11], [12], [13] to automatically resolve coeluted components. All these methods can automatically perform peak detection and component resolution, which is very helpful in practical applications. However, their performance in resolving coeluted components depends on the selection of model ions, which should be selective ions of the underlying components.

Chemometric methods provide another choice. Algorithms such as trilinear decomposition [14,15] and multivariate curve resolution [16,17] have been widely used for component resolution in complex samples. These methods exploit the bilinear structure of the instrumental response of each sample to retrieve the underlying chromatographic and mass spectral profiles of components by iteratively optimizing randomly initialized values. A prerequisite of these chemometric methods is that one has to manually set a number of initial parameters for each TIC peak, such as the elution range and the number of components, which greatly limits their application in GC-MS-based untargeted metabolomics.
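The bilinear structure mentioned above is commonly written as X = C Sᵀ, where X holds the intensities (scans × m/z channels), C the elution profiles, and S the mass spectra of the components. The following is a minimal sketch of that model, not the authors' implementation; the Gaussian profiles, the four m/z channels, and all intensities are made-up values chosen for illustration. It also shows why a selective ion is useful: its trace follows the pure elution profile of a single component.

```python
from math import exp

def gaussian(t, center, width):
    """Gaussian elution profile evaluated at scan index t."""
    return exp(-0.5 * ((t - center) / width) ** 2)

n_scans = 20
# C: elution profiles (scans x components) of two coeluting components.
# Centers and widths are hypothetical.
C = [[gaussian(t, 8.0, 2.0), gaussian(t, 11.0, 2.0)] for t in range(n_scans)]
# S: mass spectra (components x m/z channels), hypothetical intensities.
# Channel 2 is selective for component 1, channel 3 for component 0.
S = [
    [1.0, 0.2, 0.0, 0.5],
    [0.1, 0.9, 0.6, 0.0],
]

# Bilinear model: X[i][j] = sum_k C[i][k] * S[k][j]
X = [
    [sum(C[i][k] * S[k][j] for k in range(len(S))) for j in range(len(S[0]))]
    for i in range(n_scans)
]

# A selective ion traces the pure elution profile of its component, up to scale.
selective_trace = [row[2] for row in X]
pure_profile = [0.6 * C[i][1] for i in range(n_scans)]
max_dev = max(abs(a - b) for a, b in zip(selective_trace, pure_profile))
print(f"max deviation between selective-ion trace and scaled profile: {max_dev:.2e}")
```

Resolution methods work in the opposite direction: they start from X and estimate C and S, which is why the elution range and the number of components have to be supplied or determined first.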

Time shift across samples is another problem that obstructs the direct comparison of multiple samples to find biomarkers. Mass spectra can be used as a valid tool for aligning components. In GC-MS-based untargeted metabolomics, however, components with similar chemical structures are frequently encountered; these can generate similar mass spectra and thus may lead to inaccurate peak alignment if they elute closely. In conclusion, new comprehensive methods for automatic data mining in GC-MS-based untargeted metabolomics are urgently required [18,19].
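The ambiguity described above can be illustrated with a small sketch. It is not the alignment algorithm of this work: it uses cosine similarity as a generic spectral similarity measure, and the spectra, retention times, tolerance, and threshold are hypothetical. Two structurally similar compounds, A and B, give nearly identical spectra, so a slightly noisy spectrum of A can score higher against B unless the retention-time window is tight.

```python
from math import sqrt

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = sqrt(sum(v * v for v in spec_a.values()))
    norm_b = sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, candidates, rt_tol=0.1, min_sim=0.9):
    """Return the candidate within rt_tol minutes of the query that has the
    highest spectral similarity, or None if none reaches min_sim."""
    best, best_sim = None, min_sim
    for cand in candidates:
        if abs(cand["rt"] - query["rt"]) > rt_tol:
            continue
        sim = cosine_similarity(query["spectrum"], cand["spectrum"])
        if sim >= best_sim:
            best, best_sim = cand, sim
    return best

# Hypothetical reference components with nearly identical spectra.
comp_a = {"name": "A", "rt": 10.00, "spectrum": {74: 100, 87: 60, 143: 20}}
comp_b = {"name": "B", "rt": 10.30, "spectrum": {74: 100, 87: 65, 143: 18}}
# Hypothetical measurement of A in another sample, with small intensity noise.
query = {"rt": 10.02, "spectrum": {74: 100, 87: 64, 143: 19}}

wide = best_match(query, [comp_a, comp_b], rt_tol=1.0)
tight = best_match(query, [comp_a, comp_b], rt_tol=0.1)
print("wide window match: ", wide["name"])
print("tight window match:", tight["name"])
```

A tight retention-time window is only usable once time shifts between samples have been corrected, which is one reason time-shift correction and component registration are treated together.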

Our research group has developed a number of methods [18], [20], [21], [22] for GC-MS data analysis, such as employing chemometric methods like HELP and MCR-ALS for the resolution of screened TIC peaks. However, these methods cannot automatically perform TIC peak resolution, which means one has to manually set calculation parameters, such as the number of components under a TIC peak and the corresponding selective zones, for either HELP or MCR-ALS. Aiming to develop a comprehensive automatic GC-MS data analysis workflow, we propose a novel TIC peak detection and peak resolution strategy in this work. Additionally, we provide a new time-shift correction and component registration algorithm. Based on these novel algorithms, we develop a new comprehensive automatic GC-MS data analysis method, called autoGCMSDataAnal, for automatically performing GC-MS-based untargeted metabolomics. The performance of autoGCMSDataAnal is demonstrated using both standards and complex plant samples.

Section snippets

Standards

A mixture of 11 organic acid compounds (see Table 1) is prepared and then diluted to obtain a series of calibration samples with different concentration levels. Methyl esterification is performed on these calibration samples. Finally, 1 μL of solution is injected into an Agilent 7890-5977 GC-MS. A DB-5MS column (50 m × 0.25 mm, 0.25 μm) is used. The column temperature is linearly increased from 50 °C to 280 °C at a rate of 3 °C min⁻¹. Only the full scan mode is set for the mass

Theory

Fig. 1 provides a brief workflow of autoGCMSDataAnal, which consists of single sample analysis and batch analysis steps. The single sample analysis is further divided into component resolution and time-shift correction, whereas the batch analysis is divided into component registration and peak filling for statistical analysis to screen metabolites. The retrieved components can be imported to NIST for identification.
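As a rough illustration of the first step in this workflow, the sketch below computes a TIC from scan data and picks peaks as local maxima above a height threshold. This is a deliberately simple stand-in, not the TIC peak detection method of autoGCMSDataAnal; the function names, the threshold, and the intensity values are assumptions made for the example.

```python
def total_ion_chromatogram(scans):
    """Sum the intensities over all m/z channels for each scan."""
    return [sum(scan) for scan in scans]

def detect_peaks(tic, min_height):
    """Return indices of local maxima whose height is at least min_height."""
    peaks = []
    for i in range(1, len(tic) - 1):
        if tic[i] >= min_height and tic[i] > tic[i - 1] and tic[i] >= tic[i + 1]:
            peaks.append(i)
    return peaks

# Hypothetical data: 8 scans x 3 m/z channels.
scans = [
    [1, 0, 0],
    [2, 1, 0],
    [5, 3, 1],
    [2, 1, 0],
    [1, 0, 1],
    [1, 4, 6],
    [0, 1, 2],
    [0, 0, 1],
]

tic = total_ion_chromatogram(scans)
peak_indices = detect_peaks(tic, min_height=5)
print("TIC:", tic)
print("peak scan indices:", peak_indices)
```

Each detected TIC peak would then be passed on to component resolution, since a single TIC peak may contain several coeluted components.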

Standard sample analysis

The performance of the developed strategy in compound identification and quantification is investigated using standards. Resolution results for the 11 compounds are provided in Table 1. These compounds can be automatically and precisely identified by autoGCMSDataAnal. The match factors (MF) provided by NIST are larger than 900 for all compounds. Moreover, the coefficients of determination of the regression lines are larger than 0.9900.
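For readers who want to reproduce this kind of check, the coefficient of determination of a calibration line can be computed with an ordinary least-squares fit, as in the sketch below. The concentrations and peak areas are invented for illustration and are not taken from Table 1.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical calibration series: concentration vs. resolved peak area.
concentration = [1.0, 2.0, 5.0, 10.0, 20.0]
peak_area = [10.2, 19.8, 50.5, 99.0, 201.0]

slope, intercept, r2 = linear_fit_r2(concentration, peak_area)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

A value above 0.99 for every compound indicates that the resolved peak areas scale linearly with concentration across the calibration range.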

An example for automatic TIC detection and component resolution

Conclusion

Automatic data analysis of complex samples remains one of the most challenging tasks in GC-MS-related untargeted metabolomics. This work provides a novel comprehensive data analysis strategy, autoGCMSDataAnal, for users, which can automatically perform TIC peak detection, TIC peak resolution, time-shift correction and component registration, statistical analysis, and compound identification. Performance of autoGCMSDataAnal has been demonstrated by standards and complex plant samples. Results

CRediT authorship contribution statement

Yu-Ying Zhang: Methodology, Writing - original draft. Qian Zhang: Methodology, Writing - original draft. Yue-Ming Zhang: Methodology, Writing - original draft. Wei-Wei Wang: Data curation. Li Zhang: Data curation. Yong-Jie Yu: Conceptualization, Software. Chang-Cai Bai: Formal analysis. Ji-Zhao Guo: Formal analysis. Hai-Yan Fu: Writing - review & editing. Yuanbin She: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no competing financial interests.

Acknowledgments

The authors gratefully acknowledge the financial support of the Foundation of the National Natural Science Foundation of China (Grant nos. 21868028, 21606137, 21776259, 21776259, and 21305160), NGY2016124, and Ningxia Medical University (XT2016003). The author Hai-Yan Fu wants to acknowledge the financial support of the Talented Youth Cultivation Program of South-Central University for Nationalities (No. CRZ18002).

References (27)

S. Abou-el-karam et al., Marker discovery in volatolomics based on systematic alignment of GC-MS signals: application to food authentication, Anal. Chim. Acta (2017)
L. Yi et al., Chemometric methods in data processing of mass spectrometry-based metabolomics: a review, Anal. Chim. Acta (2016)
X. Domingo-Almenara et al., Compound identification in gas chromatography/mass spectrometry-based metabolomics by blind source separation, J. Chromatogr. A (2015)
X. Domingo-Almenara et al., Automated resolution of chromatographic signals by independent component analysis–orthogonal signal deconvolution in comprehensive gas chromatography/mass spectrometry-based metabolomics, Comput. Meth. Prog. Bio. (2016)
X. Domingo-Almenara et al., Avoiding hard chromatographic segmentation: a moving window approach for the automated resolution of gas chromatography–mass spectrometry-based metabolomics signals by multivariate methods, J. Chromatogr. A (2016)
A.C. Olivieri et al., MVC2: a MATLAB graphical interface toolbox for second-order multivariate calibration, Chemometr. Intell. Lab. (2009)
Y. Hu et al., A flexible and novel strategy of alternating trilinear decomposition method coupled with two-dimensional linear discriminant analysis for three-way chemical data analysis: characterization and classification, Anal. Chim. Acta (2018)
E. Ortiz-Villanueva et al., Knowledge integration strategies for untargeted metabolomics based on MCR-ALS analysis of CE-MS and LC-MS data, Anal. Chim. Acta (2017)
J. Jaumot et al., MCR-ALS GUI 2.0: new features and applications, Chemometr. Intell. Lab. (2015)
H.-Y. Fu et al., Mass-spectra-based peak alignment for automatic nontargeted metabolic profiling analysis for biomarker screening in plant samples, J. Chromatogr. A (2017)
Z.-M. Zhang et al., Multiscale peak alignment for chromatographic datasets, J. Chromatogr. A (2012)
Y.-J. Yu et al., A chemometric-assisted method based on gas chromatography–mass spectrometry for metabolic profiling analysis, J. Chromatogr. A (2015)
L. Han et al., Automatic untargeted metabolic profiling analysis coupled with chemometrics for improving metabolite identification quality to enhance geographical origin discrimination capability, J. Chromatogr. A (2018)


¹: These authors contributed equally to this work.
