Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis

  • Research Paper
Analytical and Bioanalytical Chemistry

Abstract

With the increasing availability of high-resolution mass spectrometers, suspect screening and non-targeted analysis are becoming popular compound identification tools for environmental researchers. Samples of interest often contain a large (unknown) number of chemicals spanning the detectable mass range of the instrument. In an effort to separate these chemicals prior to injection into the mass spectrometer, a chromatography method is often utilized. There are numerous types of gas and liquid chromatographs that can be coupled to commercially available mass spectrometers. Depending on the type of instrument used for analysis, the researcher is likely to observe a different subset of compounds based on the amenability of those chemicals to the selected experimental techniques and equipment. It would be advantageous if this subset of chemicals could be predicted prior to conducting the experiment, in order to minimize potential false-positive and false-negative identifications. In this work, we utilize experimental datasets to predict the amenability of chemical compounds to detection with liquid chromatography-electrospray ionization-mass spectrometry (LC–ESI–MS). The assembled dataset totals 5517 unique chemicals either explicitly detected or not detected with LC–ESI–MS. The resulting detected/not-detected matrix has been modeled using specific molecular descriptors to predict which chemicals are amenable to LC–ESI–MS, and to which form(s) of ionization. Random forest models, including a measure of the applicability domain of the model for both positive and negative modes of the electrospray ionization source, were successfully developed. The outcome of this work will help to inform future suspect screening and non-targeted analyses of chemicals by better defining the potential LC–ESI–MS detectable chemical landscape of interest.
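The modeling idea described in the abstract (a detected/not-detected label per ionization mode, predicted from molecular descriptors by a random forest) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the descriptor names, the toy data, the function names, and the use of depth-1 trees in pure Python. It is not the authors' code, which is stated to be available on request. The vote fraction returned at the end is one simple confidence signal that could feed an applicability-domain check; the abstract does not say which measure the authors used.

```python
import random
from collections import Counter

# Hypothetical toy data for ONE ionization mode (e.g. ESI positive).
# Each row: ([descriptor values], detected flag). Names and values are invented.
DESCRIPTORS = ["logP", "n_basic_N", "n_acidic_OH"]
ESI_POS = [
    ([1.2, 2, 0], 1), ([0.5, 1, 0], 1), ([2.1, 1, 1], 1), ([3.0, 3, 0], 1),
    ([4.5, 0, 0], 0), ([5.2, 0, 1], 0), ([3.8, 0, 0], 0), ([6.0, 0, 2], 0),
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(rows, feature_ids):
    """Best single threshold split (a depth-1 tree) over the given features."""
    overall = majority([y for _, y in rows], 0)
    best = None
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, overall), majority(right, overall))
    return best[1:]  # (feature index, threshold, left label, right label)

def fit_forest(rows, n_trees=200, n_features=2, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_total = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]      # bootstrap resample
        feats = rng.sample(range(n_total), n_features)  # random feature subset
        forest.append(fit_stump(sample, feats))
    return forest

def detection_vote_fraction(forest, x):
    """Fraction of trees voting 'detected' for descriptor vector x."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)

forest = fit_forest(ESI_POS)
for name, x in [("basic, polar", [0.4, 2, 0]), ("hydrophobic neutral", [5.5, 0, 0])]:
    print(name, round(detection_vote_fraction(forest, x), 2))
```

In practice a separate forest would be fitted for each electrospray mode (positive and negative), and an established library implementation with full-depth trees would replace the stumps used here to keep the sketch short.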

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Code availability

The computer code created for the current study is available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to acknowledge Ralph Hindle from Vogon Laboratory Services along with Tarun Anumol and Craig Marvin from Agilent Technologies, Inc. for assisting with the spectral collection and curation of the ToxCast Screening Library data used in this study. We would also like to thank Katherine Phillips, Katie Paul-Friedman, and Risa Sayre for preliminary conversations surrounding this study.

Funding

The US EPA Office of Research and Development funded and managed the research described here.

Author information

Corresponding author

Correspondence to Charles N. Lowe.

Ethics declarations

Consent for publication

All authors have consented to publication of this work.

Conflict of interest

The authors declare no competing interests.

Disclaimer

The information in this document has been funded wholly or in part by the US Environmental Protection Agency. It does not signify that the contents necessarily reflect the views of the agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The paper has been subjected to the agency’s review process and approved for publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Lowe, C.N., Isaacs, K.K., McEachran, A. et al. Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis. Anal Bioanal Chem 413, 7495–7508 (2021). https://doi.org/10.1007/s00216-021-03713-w

  • DOI: https://doi.org/10.1007/s00216-021-03713-w
