Identification and description of the uncertainty, variability, bias and influence in quantitative structure-activity relationships (QSARs) for toxicity prediction

https://doi.org/10.1016/j.yrtph.2019.04.007

Highlights

  • Uncertainties, variabilities and potential areas of bias and influences of QSARs are identified.

  • The creation, description and application of QSARs is evaluated.

  • 13 types of uncertainty, variability, bias and influence of QSARs are established.

  • 49 assessment criteria for QSARs are presented.

  • Application of the assessment criteria will improve the uptake and use of QSARs.

Abstract

Improving regulatory confidence in, and acceptance of, a prediction of toxicity from a quantitative structure-activity relationship (QSAR) requires assessment of its uncertainty and determination of whether that uncertainty is acceptable. Thus, it is crucial to identify potential uncertainties fundamental to QSAR predictions. Based on expert review, sources of uncertainties, variabilities and biases, as well as areas of influence, in QSARs for toxicity prediction were established. These were grouped into three thematic areas: uncertainties, variabilities, potential biases and influences associated with 1) the creation of the QSAR, 2) the description of the QSAR, and 3) the application of the QSAR; barriers to their use are also shown. Each thematic area was divided into main areas of concern, 13 in total, with 49 assessment criteria covering all aspects of QSAR development, documentation and use. Two case studies were undertaken on different types of QSARs. These demonstrated that the assessment criteria can identify potential weaknesses in the use of a QSAR for a specific purpose, such that the weaknesses may be addressed and mitigation strategies proposed, and that they enable an informed decision on the adequacy of the model in the considered context.

Graphical abstract

Summary of overarching assessment criteria for the two QSAR Case Studies showing areas of high (red), moderate (yellow) and low (green) uncertainty, variability, bias or influence.


Introduction

To understand the confidence that may be assigned to a prediction, there is a need to assess the underlying model and its suitability to make the prediction in question (Patterson and Whelan, 2017). With regard to the risk assessment of chemicals, many predictions can be made relating to hazard identification and potency as well as exposure assessment. For the prediction of toxicity and data gap filling in particular, the use of Quantitative Structure-Activity Relationship (QSAR) models is a well-established technique (Cronin and Yoon, 2019). QSARs attempt to formalise the relationship between toxicity and chemical structure and properties such that a model may make a prediction, from structure, when data are missing. For the purposes of this paper, the term "QSAR" is taken in its broadest possible sense to include relationships between chemical structure and properties and toxicity that have been formalised into some type of model – this may range in complexity from structural alerts to machine learning, cover both categoric definitions of activity and continuous potency, and be based on any type of descriptor or property. Being at the forefront of in silico toxicology for several decades, predictions from QSARs have found use in chemical regulations, such as for the adaptation of information requirements in Annex XI of REACH (Spielmann et al., 2011). Despite this, in practice QSARs are used mostly as supporting information or for screening purposes, although they have been successfully integrated into regulatory practice in some areas, for example to predict the mutagenicity of drug impurities according to the ICH M7 guideline (ICH, 2017). However, for wider uptake and successful regulatory use, an assessment, and user-friendly communication, of the confidence in the model and its application is needed in order to enable the user or regulator to make an informed decision upon, and feel comfortable with, its use for the specific purpose in question (Worth, 2010).

Whilst it is not the purpose of this paper to give a full review of QSARs, it is accepted that they stand at the juncture of biology, chemistry and statistics. QSARs for toxicity require (preferably high quality) data to be modelled with regard to appropriate descriptors, parameters and/or properties of a set of chemicals. In theory and practice, any set of high-quality toxicity data for a coherent set of compounds is applicable. Descriptors for the various molecular properties are selected either empirically using mechanistic understanding, for example based on Molecular Initiating or Key Events of Adverse Outcome Pathways (AOPs) (Cronin and Richarz, 2017), or by statistical methods. Over five decades of progress has resulted in a multitude of descriptors of molecular structure and properties that include empirical, quantum chemical, or non-empirical parameters. Whilst empirical descriptors may be measured or estimated and include physico-chemical properties, non-empirical descriptors are typically structural features derived from knowledge of the 2D structure. Statistical methods to develop QSARs are typically either correlative or use pattern recognition approaches. The most common correlative method is regression analysis, whereas pattern recognition techniques are varied, often complex and may be multi-dimensional and non-linear (Cronin and Madden, 2010). Overall, it can be surmised that the QSAR modeller has an enormous number of techniques and approaches to use, resulting in a wide diversity of QSAR models. These approaches must be used appropriately to develop models that are robust and fit for purpose; where the purpose is to support regulatory assessment, a means of assigning confidence to a prediction is required.
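As an illustration of the correlative approach described above, the following minimal sketch fits a single-descriptor regression QSAR by ordinary least squares. The descriptor (logKow), the toxicity values and the function name are invented for illustration only and are not taken from any model discussed in this paper.

```python
# Minimal sketch of a correlative QSAR: least-squares regression of a toxicity
# endpoint on one hypothetical descriptor (logKow). All data are invented.

def fit_linear_qsar(x, y):
    """Fit y = slope * x + intercept by least squares; return slope, intercept, r2."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Coefficient of determination (goodness of fit)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2

# Hypothetical training set: descriptor values and measured log(1/EC50)
log_kow = [0.5, 1.2, 2.1, 2.8, 3.6, 4.3]
toxicity = [1.0, 1.6, 2.5, 3.1, 3.8, 4.5]

slope, intercept, r2 = fit_linear_qsar(log_kow, toxicity)
print(f"log(1/EC50) = {slope:.2f} * logKow + {intercept:.2f}  (r2 = {r2:.3f})")
```

In practice such a fit would be performed with an established statistics package and accompanied by validation statistics; this sketch only shows the shape of the simplest correlative model.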

The assignment of confidence to a prediction of toxicity requires the definition and assessment of the model and its suitability to make a prediction for a particular chemical (i.e., whether the chemical falls within the applicability domain) (Netzeva et al., 2005). The definition of a QSAR, and discussions regarding the means of assessing their relative quality, go back many decades and are well, and extensively, reviewed with regard to toxicity prediction elsewhere (cf. Cronin and Madden, 2010; Cronin et al., 2013). The assessment of the statistical fit of a QSAR and its overall performance are essential components of assessing its quality and are ubiquitous in the science. As computational models, it was acknowledged early in the evolution of QSARs that statistical fit must be adequate for the intended purpose, but not so high as to imply over-fitting (Eriksson et al., 2003). Once computational techniques to perform statistical analysis became more widely available, more sophisticated statistical analyses were undertaken and there was a move from statistical fit alone to the assessment of predictivity on large external test sets of compounds not used in model development (Tropsha, 2010). Although an essential component of evaluating a QSAR, statistical veracity is only one part of the process of determining whether a prediction from a QSAR may be acceptable for a particular purpose.
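The applicability domain concept mentioned above can be illustrated with the simplest possible check: whether each descriptor of a query chemical lies within the range spanned by the training set. This is a hypothetical sketch; the descriptors, data and function names are assumptions for illustration, and real domain definitions are considerably more sophisticated (cf. Hanser et al., 2016).

```python
# Illustrative descriptor-range applicability domain check: a query chemical is
# "in domain" only if every descriptor value lies within the range spanned by
# the training set. Descriptor names and values are invented.

def descriptor_ranges(training):
    """Per-descriptor (min, max) over a training set of descriptor tuples."""
    return [(min(col), max(col)) for col in zip(*training)]

def in_domain(query, ranges):
    """True if every descriptor of the query falls inside the training range."""
    return all(lo <= q <= hi for q, (lo, hi) in zip(query, ranges))

# Hypothetical training descriptors: (logKow, molecular weight)
training_set = [(0.5, 120.0), (1.8, 150.0), (2.9, 210.0), (4.1, 300.0)]
ranges = descriptor_ranges(training_set)

print(in_domain((2.0, 180.0), ranges))  # inside both ranges -> True
print(in_domain((6.5, 180.0), ranges))  # logKow outside training range -> False
```

A range check of this kind is only a first filter; interpolation within the convex hull of the training data, distance-based and probabilistic definitions all give stricter notions of the domain.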

The use of QSARs to make predictions of toxicity to support legislation goes back at least to the 1980s (Cronin and Yoon, 2019; Worth, 2010). However, the modern paradigm of the regulatory use of QSARs for toxicity prediction was defined by the "Setubal Workshop" in 2002 (Jaworska et al., 2003; Cronin et al., 2003a, Cronin et al., 2003b), with a particular focus on preparing for the requirements of the European Union's then upcoming Registration, Evaluation, Authorisation and restriction of CHemicals (REACH) legislation. The Setubal Workshop sparked interest in assessing the limitations and practical considerations in using QSARs in a regulatory setting (e.g., Tong et al., 2005; Tunkel et al., 2005; Worth, 2010), in particular the issue of uncertainty (e.g., Sahlin et al., 2011; Ball et al., 2014). Assessing and ensuring the quality of a QSAR required, alongside the analysis of statistical performance, a better appreciation of the need for accurate representation of chemical structure, the intrinsic variability of the biological data and the mechanistic basis on which a QSAR is founded. This knowledge was crystallised into a set of six Principles for the "validation" of QSARs at the Setubal Workshop. These six principles were condensed to five when taken up as the OECD Principles for the Validation of (Q)SARs for Regulatory Use (OECD, 2007; Worth, 2010). The Principles are often used as a framework to describe the content and performance of a model, for instance as applied using the QSAR Model Reporting Format (QMRF). In this context, the Principles, if applied appropriately by the model developer, may allow for formal validation of a model in terms of it being fit-for-purpose, or acceptable, in a regulatory context. The process of formal validation usually requires assessment against pre-defined validation criteria such that a non-expert in the field will have confidence in the use of the method.
One of the shortcomings in the use of the Principles has been the failure to evaluate models fully against such criteria. As a result, to a certain extent they are a descriptive, rather than a diagnostic, means of defining and analysing a QSAR. Despite this, during the past 15 years the Principles have served the QSAR community (both users and developers) well; however, they were not intended to cover the breadth of QSAR approaches now available or to be implemented within the context of 21st Century Toxicology – some examples of the types of uncertainties and of how the science has developed are provided in Table 1. In addition, they do not necessarily allow or describe the ability to reproduce a QSAR or ensure its transparency (Patel et al., 2018; Piir et al., 2018), and have been applied in different ways and with different levels of detail by QSAR developers. As such, there is an opportunity to broaden the Principles and incorporate newer thinking around toxicological problems. For instance, toxicology is moving to a more considered use of information, with an emphasis on building weight of evidence (WoE) from individual lines of evidence. As part of this, there has been a growing emphasis on describing the uncertainty(ies) associated with a model in an attempt to qualify, or even quantify, the areas where more information may be beneficial (Patterson and Whelan, 2017). In addition, there is a growing prominence of broader topics such as Good Computer Modelling Practice (GCMP) (Judson, 2009; Judson et al., 2015), as well as an understanding of how bias and variability affect a model. In the broader context, the needs for all types of models, in order to use them to make decisions, have been defined, with checklists to ensure model users and developers have considered the most significant factors for success (Calder et al., 2018).
Thus, the assignment of the confidence in a model is dependent to a large extent on the identification of uncertainties, variability and biases and understanding, for QSARs, of how they may affect the prediction or the decision made based on the prediction, which may depend on the context considered. This information is necessary for the decision-maker to make an informed evaluation taking into account the potential limitations and, for example, an adequate risk-benefit evaluation.

There are various definitions of uncertainty with regard to toxicological assessment; it is beyond the scope of this paper to review all of these, or indeed to standardise them, but some key definitions include the following. The World Health Organisation International Programme on Chemical Safety (WHO IPCS) gave a general definition of uncertainty as "imperfect knowledge concerning the present or future state of an organism, system, or (sub)population under consideration" (WHO IPCS, 2004). Relevant to this paper, the WHO IPCS (2004) definition was refined specifically for hazard characterisation as "lack of knowledge regarding the 'true' value of a quantity, lack of knowledge regarding which of several alternative model representations best describes a system of interest, or lack of knowledge regarding which probability distribution function and its specification should represent a quantity of interest" (WHO IPCS, 2017). ECHA (2012) builds on the WHO IPCS definition, stating "Uncertainty can be caused by limitations in knowledge (e.g. limited availability of empirical information), as well as biases or imperfections in the instruments, models or techniques used. An example is an emission estimate that is based on a reasonable-worst case assumption." Whilst these (and other) formal definitions are not discounted, this paper has preferentially considered the possibly broader terminology of EFSA (2018a), which defined uncertainty as "all types of limitations in available knowledge that affect the range and probability of possible answers to an assessment question". The EFSA Guidance is based around identifying, assessing, describing and, in some cases, quantifying uncertainty.
A variety of approaches have attempted to define uncertainty in toxicological QSAR, including those based on epistemological and other analyses (Vallverdu, 2012) and various statistical approaches (e.g., Sahlin, 2013; Sahlin et al., 2013; Sahlin, 2015). Aligned to the concept of uncertainty is its quantification; increasingly robust quantitative uncertainty analysis frameworks have been developed which have provided a unified basis for hazard characterisation (WHO IPCS, 2017; Chiu and Slob, 2015). It is acknowledged that this study does not itself attempt full quantification, but there is a growing need for it to enable decisions regarding the acceptability of a prediction to be made. Whilst the areas of uncertainty relating to read-across have been identified (Schultz et al., 2019), and a checklist of elements to consider as part of an expert review of QSARs has been developed (Myatt et al., 2018), as yet there has been no coherent mapping or definition of uncertainty with regard to QSAR models.
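One simple statistical route to putting a number on the uncertainty of an individual QSAR prediction, in the spirit of the statistical approaches cited above, is bootstrap resampling of the training data: refitting the model on resampled sets yields a spread of predictions for a query chemical. The sketch below is a hypothetical illustration with invented data and a deliberately simple linear model; it is not a method taken from the cited frameworks.

```python
# Hypothetical sketch: bootstrap resampling to estimate the spread of a QSAR
# prediction. A simple linear model is refit on resampled training data; the
# resulting distribution of predictions for one query chemical gives an
# approximate uncertainty interval. All data are invented.
import random
random.seed(42)  # fixed seed for a reproducible illustration

def fit(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    xm = sum(p[0] for p in points) / n
    ym = sum(p[1] for p in points) / n
    sxx = sum((p[0] - xm) ** 2 for p in points)
    sxy = sum((p[0] - xm) * (p[1] - ym) for p in points)
    slope = sxy / sxx
    return slope, ym - slope * xm

# Hypothetical training data: (descriptor, toxicity) pairs
data = [(0.5, 1.1), (1.2, 1.7), (2.1, 2.4), (2.8, 3.2), (3.6, 3.7), (4.3, 4.6)]
query_x = 2.5  # descriptor value of the query chemical

preds = []
for _ in range(1000):
    sample = [random.choice(data) for _ in data]  # resample with replacement
    if len({p[0] for p in sample}) < 2:
        continue  # degenerate resample: a line cannot be fitted
    a, b = fit(sample)
    preds.append(a * query_x + b)

preds.sort()
lo, hi = preds[int(0.025 * len(preds))], preds[int(0.975 * len(preds))]
mean_pred = sum(preds) / len(preds)
print(f"prediction ~{mean_pred:.2f}, 95% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```

A narrow interval suggests the prediction is stable with respect to the composition of the training set; it says nothing, of course, about uncertainties in the underlying data or in model choice, which the assessment criteria in this paper address qualitatively.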

As with uncertainty, there are various definitions of variability (cf. ECHA, 2012; EFSA, 2018b; NRC, 2009; US EPA, 2001, amongst many others). Again, this paper does not propose a formal definition of variability but has taken it to refer, in a broad sense, to an actual variation or heterogeneity that can be measured or assessed in some manner; the uncertainty related to it may, however, be reduced with more information, i.e. through its better characterisation, in common with EFSA (2018b). This paper also considered areas of bias and influence. Bias can be defined as the possibility of introducing systematic error in the results (e.g., for the purposes of this study, a prediction) arising from methodological criteria (Higgins and Green, 2008), with a number of approaches for measuring it (cf. Hooijmans et al., 2014; Krauth et al., 2014). For this study, influence implies any aspect (particularly one not relating to the algorithm or data behind a model) that made a particular model preferable to another, e.g. a cognitive bias (cf. Arnott, 2006), meaning that other predictions could have been made which may have had higher confidence. In the context of bias and influence in toxicological QSAR, these concepts have been taken to mean any direct or indirect aspects and/or motivations relating to the development, use or interpretation of a model that could be subject to change in a different context or situation – in other words, areas where human decisions have affected the model or the use of a model.

The assessment of the uncertainties, variabilities, biases and areas of influence of QSARs will undoubtedly allow for a more didactic and context-dependent evaluation of their use for toxicity prediction. Overall, this would increase confidence in QSAR models and, as a consequence, further their uptake and use in practice. It is not the purpose of the scheme proposed in this paper to consider the uncertainties, variabilities, biases or influences together or to combine them in some way to provide an overall score – this would require a more mathematical approach. However, it is intended that a review of a QSAR model according to the assessment criteria defined will allow a user of a model, and a user of a prediction from a model, to identify any aspect where confidence may be lacking and to assess whether this is acceptable for the decision to be made (e.g., whether the level of confidence is sufficient to make a specific regulatory decision).

The aim of this paper, therefore, was to identify and describe the areas of uncertainty, variability, bias and influence with regard to the prediction of toxicity by QSARs, namely the issues that affect the development, description and utilisation of (Q)SARs. A list of criteria was compiled for each of these three areas that can be applied systematically in order to identify key uncertainties in a QSAR model. These assessment criteria were applied to two published QSAR studies in order to illustrate their utility in identifying areas of high uncertainty, variability, bias and influence, allowing the confidence in the model to be evaluated and mitigation strategies to be formulated.

Section snippets

Identification of Assessment Criteria for the Uncertainties, Variabilities and Areas of Potential Bias and Influence in a QSAR for Toxicity Prediction

An expert review was undertaken of various types of QSAR analyses for the prediction of toxicity. The specific QSARs are not listed here but drew on the experience of the authors. The review included published and unpublished QSARs, as well as free-to-use and with-payment computational products. From the review based on the experience of the authors, types of uncertainties, variabilities, bias and influences on the development and use of a QSAR for toxicity prediction were compiled and

Results and discussion

Toxicology is a science based on measurement, description and analysis of experimental findings. As a scientific discipline it inevitably relies on interpretation of evidence to make a decision. With the paucity of resources available for toxicological assessment, much effort has been placed into maximising the possible information that may be obtained without recourse to animal testing and/or by utilising existing data. Part of this process of maximising the value and utility of data has been to

Conclusions

The uncertainties and areas of variability, bias and influence of QSAR models have been identified and a set of assessment criteria to evaluate QSAR models is presented. In total, 49 assessment criteria have been defined to account for aspects of the data behind the model and the model development approach, the description of the model and its application. The criteria allow for the identification of areas of high uncertainty or other issues that may be of concern for the confidence in a

Disclaimer

The views expressed are solely those of the authors and the contents of this manuscript do not necessarily represent the views or position of the European Commission.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The detailed, knowledgeable and extremely helpful comments from the anonymous reviewers assisted greatly in the improvement of this manuscript. The authors thank them for their considerable efforts and contribution.

References (62)

  • T.W. Schultz et al., Assessing uncertainty in read-across: questions to evaluate toxicity predictions based on knowledge gained from case studies, Comput. Toxicol. (2019)
  • D. Arnott, Cognitive biases and decision support systems development: a design science approach, Inf. Syst. J. (2006)
  • N. Ball et al., Toward good read-across practice (GRAP) guidance, ALTEX (2016)
  • M. Calder et al., Computational modelling for decision-making: where, why, what, who and how, R. Soc. Open Sci. (2018)
  • W.A. Chiu et al., A unified probabilistic framework for dose–response assessment of human health effects, Environ. Health Perspect. (2015)
  • M.T.D. Cronin et al., Relationship between Adverse Outcome Pathways and chemistry-based in silico models to predict toxicity, Appl. in Vitro Toxicol. (2017)
  • M.T.D. Cronin et al., Use of QSARs in international decision-making frameworks to predict health effects of chemical substances, Environ. Health Perspect. (2003)
  • M.T.D. Cronin et al., Use of QSARs in international decision-making frameworks to predict ecologic effects and environmental fate of chemical substances, Environ. Health Perspect. (2003)
  • M.T.D. Cronin et al., Chemical Toxicity Prediction: Category Formation and Read-Across (2013)
  • ECHA (European Chemicals Agency), Guidance on Information Requirements and Chemical Safety Assessment (2011)
  • ECHA (European Chemicals Agency), Guidance on Information Requirements and Chemical Safety Assessment (2012)
  • ECHA (European Chemicals Agency), Read-across Assessment Framework (RAAF) (2017)
  • EFSA (European Food Safety Authority) Scientific Committee et al., Guidance on uncertainty analysis in scientific assessments, EFSA J. (2018)
  • EFSA (European Food Safety Authority) Scientific Committee et al., Scientific opinion on the principles and methods behind EFSA's guidance on uncertainty analysis in scientific assessment, EFSA J. (2018)
  • C.M. Ellison et al., Definition of the structural domain of the baseline non-polar narcosis model for Tetrahymena pyriformis, SAR QSAR Environ. Res. (2008)
  • L. Eriksson et al., Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs, Environ. Health Perspect. (2003)
  • T. Hanser et al., Applicability domain: towards a more formal definition, SAR QSAR Environ. Res. (2016)
  • M. Hewitt et al., Repeatability analysis of the Tetrahymena pyriformis population growth impairment assay, SAR QSAR Environ. Res. (2011)
  • J.P. Higgins et al., Cochrane Handbook for Systematic Reviews of Interventions (2008)
  • C.R. Hooijmans et al., SYRCLE's risk of bias tool for animal studies, BMC Med. Res. Methodol. (2014)