Abstract
With the increasing availability of high-resolution mass spectrometers, suspect screening and non-targeted analysis are becoming popular compound identification tools for environmental researchers. Samples of interest often contain a large, unknown number of chemicals spanning the detectable mass range of the instrument. To separate these chemicals before they reach the mass spectrometer, a chromatographic method is typically used, and numerous types of gas and liquid chromatographs can be coupled to commercially available mass spectrometers. Depending on the instrument used, the researcher is likely to observe a different subset of compounds, determined by how amenable those chemicals are to the selected experimental techniques and equipment. Predicting this subset before conducting the experiment would help minimize false-positive and false-negative identifications. In this work, we use experimental datasets to predict the amenability of chemical compounds to detection with liquid chromatography-electrospray ionization-mass spectrometry (LC–ESI–MS). The assembled dataset totals 5517 unique chemicals, each either explicitly detected or not detected with LC–ESI–MS. The resulting detected/not-detected matrix was modeled using specific molecular descriptors to predict which chemicals are amenable to LC–ESI–MS, and to which form(s) of ionization. Random forest models were successfully developed for both the positive and negative modes of the electrospray ionization source, each including a measure of the model's applicability domain. The outcome of this work will help inform future suspect screening and non-targeted analyses by better defining the landscape of chemicals potentially detectable by LC–ESI–MS.
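The abstract does not spell out how the detected/not-detected matrix is assembled or which applicability-domain measure is used. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' code:

* The `(chemical_id, esi_mode, detected)` record layout is hypothetical.
* A single detection in a mode is treated as "detected" for that mode.
* A mode with no records is treated as not detected.
* A k-nearest-neighbour distance criterion is used for the applicability domain. This is one common measure, not necessarily the one used in this work.
* Descriptors are assumed to be already scaled.

The random forest classifier itself is not shown.

```python
import math
from collections import defaultdict


def build_detection_matrix(observations):
    """Collapse per-experiment records into one detected/not-detected call per ESI mode.

    Assumption: records are (chemical_id, esi_mode, detected) with esi_mode "ESI+"
    or "ESI-", and any single detection in a mode counts as detected in that mode.
    """
    matrix = defaultdict(dict)
    for chem_id, mode, detected in observations:
        matrix[chem_id][mode] = matrix[chem_id].get(mode, False) or bool(detected)
    return dict(matrix)


def ionization_class(calls):
    """Summarize per-mode calls; a mode with no records is treated as not detected."""
    pos = calls.get("ESI+", False)
    neg = calls.get("ESI-", False)
    if pos and neg:
        return "both"
    if pos:
        return "ESI+ only"
    if neg:
        return "ESI- only"
    return "neither"


def knn_distance(x, reference, k):
    """Mean Euclidean distance from descriptor vector x to its k nearest reference vectors."""
    if len(reference) < k:
        raise ValueError("reference set must contain at least k vectors")
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


def fit_ad_threshold(train, k=3, quantile=0.95):
    """Hypothetical AD threshold: a quantile of leave-one-out kNN distances in the training set."""
    loo = sorted(
        knn_distance(x, train[:i] + train[i + 1:], k) for i, x in enumerate(train)
    )
    idx = min(len(loo) - 1, max(0, math.ceil(quantile * len(loo)) - 1))
    return loo[idx]


def in_domain(x, train, threshold, k=3):
    """True if x lies within the applicability domain defined by the training descriptors."""
    return knn_distance(x, train, k) <= threshold


if __name__ == "__main__":
    obs = [
        ("chemA", "ESI+", True), ("chemA", "ESI-", False), ("chemA", "ESI-", True),
        ("chemB", "ESI+", False), ("chemB", "ESI-", True),
        ("chemC", "ESI+", False), ("chemC", "ESI-", False),
    ]
    matrix = build_detection_matrix(obs)
    for chem, calls in sorted(matrix.items()):
        print(chem, ionization_class(calls))  # chemA both / chemB ESI- only / chemC neither

    # Toy two-descriptor training set (already scaled, by assumption).
    train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
    thr = fit_ad_threshold(train, k=2)
    print(in_domain((0.05, 0.02), train, thr, k=2))  # True
    print(in_domain((5.0, 5.0), train, thr, k=2))    # False
```

In a setup like this, a compound whose descriptors fall outside the domain would be flagged as an unreliable prediction rather than simply labelled detected or not detected. The same `k` must be used when fitting the threshold and when querying it.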
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.
Code availability
The computer code created for the current study is available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to acknowledge Ralph Hindle from Vogon Laboratory Services along with Tarun Anumol and Craig Marvin from Agilent Technologies, Inc. for assisting with the spectral collection and curation of the ToxCast Screening Library data used in this study. We would also like to thank Katherine Phillips, Katie Paul-Friedman, and Risa Sayre for preliminary conversations surrounding this study.
Funding
The US EPA Office of Research and Development funded and managed the research described here.
Ethics declarations
Consent for publication
All authors have consented to publication of this work.
Conflict of interest
The authors declare no competing interests.
Disclaimer
The information in this document has been funded wholly or in part by the US Environmental Protection Agency. It does not signify that the contents necessarily reflect the views of the agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The paper has been subjected to the agency’s review process and approved for publication.
About this article
Cite this article
Lowe, C.N., Isaacs, K.K., McEachran, A. et al. Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis. Anal Bioanal Chem 413, 7495–7508 (2021). https://doi.org/10.1007/s00216-021-03713-w