Elsevier

Analytica Chimica Acta

Volume 801, 1 November 2013, Pages 34-42
A new approach to untargeted integration of high resolution liquid chromatography–mass spectrometry data

https://doi.org/10.1016/j.aca.2013.09.028

Highlights

  • We introduce a new method for untargeted feature extraction and integration.

  • Few parameters are needed to configure the method.

  • Feature extraction and integration are performed per sample, which makes the implementation highly scalable.

  • We successfully demonstrate our method using complex lipidomics data.

  • Matlab package available on request.

Abstract

Because of its high sensitivity and specificity, hyphenated mass spectrometry has become the predominant method to detect and quantify metabolites in bio-samples across life science studies. In contrast to targeted methods, which are dedicated to specific features, global profiling acquisition methods allow new, unspecified metabolites to be analyzed. The challenge with these so-called untargeted methods is the proper and automated extraction and integration of features that could be of relevance. We propose a new algorithm that enables untargeted integration of samples measured with high resolution liquid chromatography–mass spectrometry (LC–MS). In contrast to other approaches, limited user interaction is needed, which also allows less experienced users to integrate their data. The large number of single features found within a sample is combined into a smaller list of compound-related, grouped feature-sets representative for that sample. These feature-sets allow for easier interpretation and identification and, equally important, easier matching over samples. We show that the automatically obtained integration results for a set of known target metabolites match those generated with vendor software, and that at least 10 times more feature-sets are extracted as well. We demonstrate our approach using high resolution LC–MS data acquired for 128 samples on a lipidomics platform. The data was also processed in a targeted manner (with a combination of automatic and manual integration) using vendor software for a set of 174 targets. Because our untargeted extraction procedure is run per sample and per mass trace, its implementation is scalable. Because of the generic approach, we envision that this data extraction method will be used in targeted as well as untargeted analysis of many different kinds of TOF-MS data, even CE- and GC–MS data or MRM.
The Matlab package is available for download on request and efforts are directed toward a user-friendly Windows executable.

Introduction

The systems biology framework aims at describing the behavior of biological systems (e.g. organisms, organs, cells) as a whole rather than the behavior of their (functional) biochemical components in isolation. During the last decade, functional analysis of the transcriptome, proteome, and metabolome has increased [1], [2]. Because the metabolome is expected and found to be more sensitive to environmental (diet, drug, lifestyle) perturbations than the transcriptome or proteome, the emphasis on the phenotype at a more global systems biological level has shifted the focus toward the metabolome [1], [3], [4], [5], [6], [7], [8]. With this increasing awareness of the importance of the metabolome, the number of methods to detect and quantify metabolites is increasing. Hyphenated mass spectrometry (GC–, CE– or LC–MS) has become the predominant technology for determining metabolite abundances, mainly because its sensitivity allows the measurement of low-abundance metabolites in small sample volumes. Targeted modes of data acquisition (MRM/SRM) allow the MS to detect pre-selected compounds with an even higher sensitivity, but at the same time limit the list of target compounds reported (the limit being determined by the maximum number of MS/MS scan experiments possible). The full scan data acquisition mode, however, enables a wider, untargeted coverage of different metabolites.

Despite the limited number of compounds reported, targeted approaches are widespread. Obvious reasons are the added advantage of interpreting data for known metabolites/compounds and the possibility to quantify them (using internal standards), often with better precision and accuracy than in untargeted modes. To a large extent, the lesser use of untargeted approaches is also due to the lack of appropriate software that would enable untargeted extraction and integration without introducing artifacts and errors. As a result, integration is often limited to a set of known metabolites (targets) only, and in most cases vendor software (MassLynx [9], Compass DataAnalysis [10], MassHunter [11], etc.) is used.

The lack of software that enables untargeted integration has been recognized by various academic groups, and different algorithms and solutions have been suggested. For GC–MS measurements, Metabolite Detector [12], TNO-Deco [13] or Metalign [14] could be used, and for high-resolution LC–MS, software like XC-MS [15], Metalign [14] or MZmine [16] is available. However, these solutions do require specific user input, sometimes even sample-specific input, and often much user experience is needed before the data is properly extracted and integrated. All LC–MS untargeted solutions result in a huge list of features, sometimes with additional putative identification (e.g. XC-MS). Several packages extract and/or report features based on differential analysis between sample groups (e.g. diseased vs. healthy). This not only limits the scalability but also renders the method useless if no such grouping factor exists.

In this paper we describe a method for untargeted feature extraction and, in addition, propose a new strategy that addresses the aforementioned shortcomings. The method is able to integrate untargeted data and can be incorporated in an automated environment; with only a few parameters to configure, user interaction is kept to a minimum. Our proposed strategy is a two-step approach that in fact automates common analytical practice. The first step after feature extraction is a per-sample grouping of single features into feature-sets according to their isotopic patterns and retention times. Here, we introduce the term feature-set for a group of two or more features in a single sample with isotopically related masses that share the same retention time. The second step of our strategy consists of matching these feature-sets over samples. In this way, more constraints are imposed on the search space, which increases the probability of a proper match over samples. Conversely, noisy signals have a lower chance of being propagated.
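The first step of this strategy can be illustrated with a minimal Python sketch. It is not the Matlab implementation described in this paper: the `Feature` class, the function name `group_feature_sets` and the tolerances `rt_tol` and `mz_tol` are hypothetical, and the sketch assumes singly charged ions whose isotope peaks are spaced by the 13C–12C mass difference of about 1.00335 Da.

```python
from dataclasses import dataclass

# Mass difference between 13C and 12C in Da (singly charged ions assumed).
C13_SPACING = 1.0033548


@dataclass(frozen=True)
class Feature:
    mz: float    # mass-to-charge ratio
    rt: float    # retention time (e.g. minutes)
    area: float  # integrated peak area


def group_feature_sets(features, rt_tol=0.05, mz_tol=0.01):
    """Group the features of ONE sample into feature-sets.

    A feature-set is a chain of two or more features with the same
    retention time (within rt_tol) whose masses differ by one isotope
    spacing (within mz_tol). Both tolerances are illustrative values.
    """
    ordered = sorted(features, key=lambda f: f.mz)
    used = set()
    feature_sets = []
    for i, seed in enumerate(ordered):
        if i in used:
            continue
        members = [i]
        last = seed
        for j in range(i + 1, len(ordered)):
            cand = ordered[j]
            if j in used or abs(cand.rt - seed.rt) > rt_tol:
                continue
            # Extend the chain only by the next isotope peak.
            if abs((cand.mz - last.mz) - C13_SPACING) <= mz_tol:
                members.append(j)
                last = cand
        if len(members) >= 2:  # single features do not form a set
            used.update(members)
            feature_sets.append([ordered[k] for k in members])
    return feature_sets
```

In this sketch a feature without an isotopic partner at the same retention time is simply left ungrouped, which mirrors the idea that isolated, possibly noisy signals are less likely to be carried into the matching step.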

We demonstrate our method using data obtained with full scan global lipidomics profiling acquired with high-resolution LC–MS (Quadrupole Time-Of-Flight (qTOF)). Lipid profiles are especially challenging for untargeted processing due to the presence of a large number of isomers. We compare our untargeted integration results for a set of known compounds to those that were obtained by vendor software (the reference set) and those obtained using XC-MS [15]. The developed Matlab package is available for download on request and efforts are directed toward a user-friendly Windows executable.

Workflow (and methods)

Comparable to any software package that analyzes hyphenated MS data, the basic workflow comprises reading data, detecting, extracting and integrating peaks, and relating them over samples. For a list of known targets, i.e. compounds with known masses and retention times, this seems a straightforward task. Integration, however, is complicated by issues like retention time shifts, poor chromatographic separation of isomers, poor peak shapes, noise and small shifts in registered masses. In vendor

Experimental

The software was written in Matlab 2011a (64 bit) using the bioinformatics-, image processing- and statistical toolboxes. All calculations were done on a DELL workstation equipped with a 4 core Intel® Xeon® CPU X5482 @3.2 GHz processor and 16 GB of memory running Windows 7 Professional 64 bit.

To demonstrate the proposed method, its functionality was tested using data of a clinical study obtained from a global lipid platform measured in positive mode [20]. The spectra obtained from this method

Feature extraction, grouping and comparison over samples

The mass spectra were acquired at a mass resolving power of 10,000; consequently, the mass-resolution parameter was set to 10,000. Because we wanted to compare the results to those obtained by vendor software using highly optimized integration parameters for some compounds, the optional split-ratio was set to a very sensitive 0.01. This meant that if a feature contained two or more peaks with intensities as small as 1% of the highest peak in that feature, it was split into multiple features. For
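The two parameters named above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the package's code: the function names are hypothetical, resolving power is taken in its usual sense R = m/Δm, and a feature is assumed to be cut at the lowest point between neighbouring local maxima that reach the split-ratio.

```python
def mass_window(mz, resolving_power=10_000):
    """Peak width in Da implied by a resolving power R = m / delta_m."""
    return mz / resolving_power


def split_points(intensities, split_ratio=0.01):
    """Indices at which the intensity trace of one feature would be cut.

    A local maximum counts as a peak when its intensity is at least
    split_ratio times the highest point of the trace. The cutting rule
    (lowest point between neighbouring peaks) is an assumption.
    """
    top = max(intensities)
    peaks = [
        i for i in range(1, len(intensities) - 1)
        if intensities[i - 1] < intensities[i] >= intensities[i + 1]
        and intensities[i] >= split_ratio * top
    ]
    cuts = []
    for left, right in zip(peaks, peaks[1:]):
        segment = intensities[left:right + 1]
        cuts.append(left + segment.index(min(segment)))
    return cuts
```

With a split-ratio of 0.01, a shoulder of only a few percent of the main peak is enough to trigger a split; raising the ratio to, say, 0.05 would leave the same trace as a single feature.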

Discussion

Having demonstrated the effectiveness of our approach, we realize that this is not the only software capable of untargeted analysis of high resolution LC–MS spectra. What makes this method different, however, is that only a very limited amount of expert knowledge is required to use it and to carry the untargeted implementation through to the very end. Subsequent matching of feature-sets instead of single features across samples arguably increases the probability of a proper match. Using

Conclusions

We introduced a new method to integrate high resolution full scan profiling LC–MS data in an untargeted manner. To demonstrate the effectiveness of our strategy of only comparing feature-sets over samples we used complex lipidomics full scan profiling LC–MS data of 128 samples. We compared the automatically integrated areas for a set of 174 known target lipids to those obtained by optimized and manually controlled quantification using vendor software. For 87% of the targets the correlation
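A comparison of this kind can be sketched in Python as follows. The sketch assumes a Pearson correlation computed per target over the samples; the correlation measure, the `threshold` argument and the dictionary layout are assumptions made for illustration and are not taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_above(auto_areas, ref_areas, threshold):
    """Fraction of targets whose automated areas correlate with the
    reference areas at or above a (hypothetical) threshold.

    Both arguments map a target name to its list of areas, one value
    per sample, in the same sample order.
    """
    hits = sum(
        1 for target in ref_areas
        if pearson(auto_areas[target], ref_areas[target]) >= threshold
    )
    return hits / len(ref_areas)
```

Computing the correlation per target rather than over all areas at once keeps abundant lipids from dominating the comparison.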

Acknowledgements

We would like to thank Professor K. Willems van Dijk at Leiden University Medical Center for providing the samples. We acknowledge Professor A.H.C. van Kampen and Mia Pras-Raves at the Academic Medical Center, Amsterdam for processing the samples with XC-MS. We further would like to thank Jorne Troost for his feedback on the analytical issues we faced and Adrie Dane for constructive discussions. This project was (co)financed by the Netherlands Metabolomics Centre (NMC) which is part of the

References (28)

  • R.H. Jellema et al., Chemom. Intell. Lab. Syst. (2010)
  • S. Smit et al., Anal. Chim. Acta (2007)
  • T. Rajalahti et al., Int. J. Pharm. (2011)
  • R. Breitling, Front. Physiol. (2010)
  • D.I. Ellis et al., Pharmacogenomics (2007)
  • H. Kitano, Nature (2002)
  • D.B. Kell, Expert Rev. Mol. Diagn. (2007)
  • D.B. Kell et al., Nat. Rev. Microbiol. (2005)
  • C. Gieger et al., PLoS Genet. (2008)
  • T. Illig et al., Nat. Genet. (2010)
  • K. Suhre et al., Nature (2011)
  • Waters, 2013,...
  • Bruker, 2013,...
  • Agilent, 2013,...