A new approach to untargeted integration of high resolution liquid chromatography–mass spectrometry data
Introduction
The systems biology framework aims at describing the behavior of biological systems (e.g. organisms, organs, cells) as a whole rather than the behavior of their (functional) biochemical components in isolation. During the last decade functional analysis of the transcriptome, proteome, and metabolome has increased [1], [2]. Because the metabolome is expected and found to be more sensitive to environmental (diet, drug, lifestyle) perturbations than the transcriptome or proteome, the emphasis on the phenotype at a more global systems biological level has shifted the focus toward the metabolome [1], [3], [4], [5], [6], [7], [8]. With this increasing awareness of the importance of the metabolome, the number of methods to detect and quantify metabolites is increasing. Hyphenated mass spectrometry (GC–MS, CE–MS or LC–MS) has become the predominant technology for determining metabolite abundances, mainly because its sensitivity allows the measurement of low-abundance metabolites in small sample volumes. Targeted modes of data acquisition (MRM/SRM) allow the MS to detect pre-selected compounds with an even higher sensitivity, but the list of target compounds reported is limited by the maximum number of MS/MS scan experiments possible. The full scan data acquisition mode, however, enables a wider, untargeted coverage of different metabolites.
Despite the limited number of compounds reported, targeted approaches are widespread. Obvious reasons are the easier interpretation of data on known metabolites/compounds and the possibility to quantify them (using internal standards), often with better precision and accuracy than in untargeted modes. To a large extent the lesser use of untargeted approaches is also due to the lack of appropriate software that would enable untargeted extraction and integration without introducing artifacts and errors. As a result, integration is often limited to a set of known metabolites (targets) only and in most cases vendor software (MassLynx [9], Compass DataAnalysis [10], MassHunter [11], etc.) is used.
The lack of software that enables untargeted integration has been recognized by various academic groups and different algorithms and solutions have been suggested. For GC–MS measurements Metabolite Detector [12] or TNO-Deco [13] and Metalign [14] could be used and for high-resolution LC–MS software like XC-MS [15], Metalign [14] or MZmine [16] are available. However, these solutions do require specific user input, sometimes even sample specific, and often much user experience is needed before the data is properly extracted and integrated. All LC–MS untargeted solutions result in a huge list of features sometimes with additional putative identification (e.g. XC-MS). Several packages extract and/or report features based on differential analysis between sample groups (e.g. diseased vs. healthy). This not only limits the scalability but renders the method useless if no such grouping factor exists.
In this paper we describe a method for untargeted feature extraction and, in addition, propose a new strategy that addresses the aforementioned shortcomings. The method integrates untargeted data, can be incorporated in an automated environment, and keeps user interaction to a minimum because only a few parameters need to be configured. Our proposed strategy is a two-step approach that in fact automates common analytical practice. The first step after feature extraction groups, per sample, single features into feature-sets according to their isotopic patterns and retention times. Here, we introduce the term feature-set for a group of two or more features in a single sample with isotopically related masses that share the same retention time. The second step of our strategy consists of matching these feature-sets over samples. This imposes more constraints on the search space and so increases the probability of a proper match over samples. Conversely, noisy signals have a lower chance of being propagated.
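The feature-set definition above can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the `Feature` class, the function name, the tolerances `mz_tol` and `rt_tol`, and the single-charge assumption are all hypothetical choices made for illustration. The only fixed quantity is the 13C–12C mass difference of about 1.003355 Da, which sets the expected spacing between isotope peaks.

```python
from dataclasses import dataclass

C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


@dataclass(frozen=True)
class Feature:
    mz: float    # measured m/z
    rt: float    # retention time (minutes, assumed unit)
    area: float  # integrated peak area


def group_feature_sets(features, mz_tol=0.01, rt_tol=0.05, charge=1):
    """Greedily chain features of ONE sample into feature-sets.

    A feature joins a chain when its m/z lies one isotope spacing above the
    previous member and its retention time matches that of the chain's seed.
    Chains with fewer than two members are discarded (hypothetical rule
    following the "two or more features" definition in the text).
    """
    spacing = C13_C12_DELTA / charge
    ordered = sorted(features, key=lambda f: f.mz)
    used = set()
    feature_sets = []
    for i, seed in enumerate(ordered):
        if i in used:
            continue
        chain, members, last = [seed], [i], seed
        for j in range(i + 1, len(ordered)):
            if j in used:
                continue
            cand = ordered[j]
            if (abs(cand.mz - last.mz - spacing) <= mz_tol
                    and abs(cand.rt - seed.rt) <= rt_tol):
                chain.append(cand)
                members.append(j)
                last = cand
        if len(chain) >= 2:
            used.update(members)
            feature_sets.append(chain)
    return feature_sets
```

In this sketch an isomer with the same m/z but a different retention time does not join the chain, and a feature without isotope partners is dropped, which mirrors the idea that isolated noisy signals are less likely to be propagated.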
We demonstrate our method using data obtained with full scan global lipidomics profiling acquired with high-resolution LC–MS (Quadrupole Time-Of-Flight (qTOF)). Lipid profiles are especially challenging for untargeted processing due to the presence of a large number of isomers. We compare our untargeted integration results for a set of known compounds to those that were obtained by vendor software (the reference set) and those obtained using XC-MS [15]. The developed Matlab package is available for download on request and efforts are directed toward a user-friendly Windows executable.
Workflow (and methods)
Comparable to any software package that analyzes hyphenated MS data, the basic workflow comprises reading data, detecting, extracting and integrating peaks, and relating them over samples. For a list of known targets, i.e. compounds with known masses and retention times, this seems a straightforward task. Integration, however, is complicated by issues like retention time shifts, poor chromatographic separation of isomers, poor peak shapes, noise and small shifts in registered masses. In vendor
Experimental
The software was written in Matlab 2011a (64 bit) using the bioinformatics-, image processing- and statistical toolboxes. All calculations were done on a DELL workstation equipped with a 4 core Intel® Xeon® CPU X5482 @3.2 GHz processor and 16 GB of memory running Windows 7 Professional 64 bit.
To demonstrate the proposed method, its functionality was tested using data of a clinical study obtained from a global lipid platform measured in positive mode [20]. The spectra obtained from this method
Feature extraction, grouping and comparison over samples
The mass spectra were acquired at a mass resolving power of 10,000; consequently, the mass-resolution parameter was set to 10,000. Because we wanted to compare the results to those obtained by vendor software using highly optimized integration parameters for some compounds, the optional split-ratio was set to a very sensitive 0.01. This meant that if a feature contained two or more peaks with intensities as small as 1% of the highest peak in that feature, it was split into multiple features. For
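The effect of the split-ratio can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual algorithm: the function name `split_feature`, the local-maximum test, and the choice to cut at the lowest point between neighbouring peaks are hypothetical, and a real chromatographic trace would normally be smoothed before peak detection.

```python
def split_feature(intensities, split_ratio=0.01):
    """Split one feature's intensity trace into per-peak index ranges.

    Returns a list of (start, end) pairs with `end` exclusive. A local
    maximum counts as a peak only if it reaches at least `split_ratio`
    times the highest intensity in the feature (assumed interpretation).
    """
    n = len(intensities)
    if n == 0:
        return []
    top = max(intensities)
    # Local maxima that are high enough relative to the tallest peak.
    peaks = [
        i for i in range(n)
        if intensities[i] >= split_ratio * top
        and (i == 0 or intensities[i] > intensities[i - 1])
        and (i == n - 1 or intensities[i] >= intensities[i + 1])
    ]
    # Cut at the lowest point between each pair of neighbouring peaks.
    cuts = [
        min(range(a + 1, b), key=lambda k: intensities[k])
        for a, b in zip(peaks, peaks[1:])
    ]
    bounds = [0] + cuts + [n]
    return list(zip(bounds, bounds[1:]))
```

With a split-ratio of 0.01, a shoulder peak at 3% of the main peak produces two features, whereas a less sensitive value such as 0.05 leaves the feature intact.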
Discussion
Having demonstrated the effectiveness of our approach, we recognize that this is not the only software capable of untargeted analysis of high resolution LC–MS spectra. What makes this method different, however, is that only a very limited amount of expert knowledge is required to use it and that the implementation is untargeted to the very end. Subsequent matching of feature-sets instead of single features across samples arguably increases the probability of a proper match. Using
Conclusions
We introduced a new method to integrate high resolution full scan profiling LC–MS data in an untargeted manner. To demonstrate the effectiveness of our strategy of only comparing feature-sets over samples we used complex lipidomics full scan profiling LC–MS data of 128 samples. We compared the automatically integrated areas for a set of 174 known target lipids to those obtained by optimized and manually controlled quantification using vendor software. For 87% of the targets the correlation
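A comparison of this kind, automatically integrated areas against reference areas per target, could be computed along the following lines. This is a sketch only: the data layout (one list of areas per target, ordered by sample), the function names, and the `threshold` argument are assumptions made for illustration, and no particular cut-off from the paper is implied.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences of peak areas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant areas: correlation undefined
    return sxy / math.sqrt(sxx * syy)


def fraction_above(auto_areas, ref_areas, threshold):
    """Fraction of shared targets whose correlation reaches `threshold`.

    Both arguments map a target name to its areas over the same samples,
    in the same order (hypothetical layout).
    """
    shared = sorted(auto_areas.keys() & ref_areas.keys())
    if not shared:
        raise ValueError("no targets in common")
    rs = {t: pearson(auto_areas[t], ref_areas[t]) for t in shared}
    good = sum(1 for r in rs.values() if r >= threshold)
    return good / len(rs), rs
```

Targets for which one method reports constant areas yield an undefined correlation and are counted as not reaching the threshold in this sketch.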
Acknowledgements
We would like to thank Professor K. Willems van Dijk at Leiden University Medical Center for providing the samples. We acknowledge Professor A.H.C. van Kampen and Mia Pras-Raves at the Academic Medical Center, Amsterdam for processing the samples with XC-MS. We further would like to thank Jorne Troost for his feedback on the analytical issues we faced and Adrie Dane for constructive discussions. This project was (co)financed by the Netherlands Metabolomics Centre (NMC) which is part of the
References (28)
- et al., Chemom. Intell. Lab. Syst. (2010)
- et al., Anal. Chim. Acta (2007)
- et al., Int. J. Pharm. (2011)
- Front. Physiol. (2010)
- et al., Pharmacogenomics (2007)
- Nature (2002)
- Expert Rev. Mol. Diagn. (2007)
- et al., Nat. Rev. Microbiol. (2005)
- et al., PLoS Genet. (2008)
- et al., Nat. Genet. (2010)
- Nature