Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

https://doi.org/10.1016/j.chemolab.2017.10.010

Highlights

  • Review of several statistical methods for inverse prediction.

  • Comparing independent prediction methods to assess credibility of predictions.

  • Agreement amongst several methods indicates predictions are credible.

  • Disagreement amongst methods indicates predictions are not credible.

Abstract

Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors (X) of some underlying causal model producing the observables or responses (Y=g(X)+error). This paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research in which inverse predictions, along with an assessment of predictive capability, are desired.

Four methods are reviewed in this article. Two are forward modeling methods and two are direct inverse modeling methods. Forward modeling involves building a forward causal model of the responses (Y) as a function of the source characteristics (X) using content knowledge and data ideally obtained from a well-designed experiment. The model is then inverted to produce estimates of X given a new set of responses. Direct inverse modeling involves building prediction models of the source characteristics (X) as a function of the responses (Y), bypassing estimation of any underlying causal relationship. Through use of simulations and a data set from an actual plutonium production experiment, it is shown that agreement of predictions across methods is an indication of strong predictive capability, whereas disagreement indicates the current data are not conducive to making good predictions.

Introduction

The U.S. Government is conducting a series of experiments at the U.S. National Laboratories for nuclear forensics research. The objective is to assess the ability to infer source characteristics, ranging from material origin to production parameters, of interdicted special nuclear material from the nuclear signature, or measured observables. Statistically, this objective can be thought of as an inverse problem where the source characteristics (X) of a material are predicted from the measured observables (Y). Additionally, it is desired to assess confidence in the prediction using, for example, statistical confidence intervals or a probability distribution of plausible X values. Beyond nuclear forensics applications [1], [2], [3], inverse prediction is of interest in more general forensic science activities such as estimating time of death of homicide victims [4], [5]. Inverse prediction also spans a wide range of areas outside of forensics including computer model calibration [6], [7], chemometrics [8], [9], and geophysical applications [10], [11].

Inverse prediction methods can be divided into two categories: 1) causal (forward) modeling and 2) direct inverse modeling. Causal models attempt to capture the notion that ‘Y is caused by X’ and are often expressed in terms of a low-order polynomial, which can be thought of as a Taylor series approximation to the true but unknown underlying relationship. Alternatively, theory regarding the causal relationship can be used to develop the mathematical form of the model. Assuming a q-dimensional response Y=(y1,y2,…,yq) and a p-dimensional set of input factors X=(x1,x2,…,xp), the relationship between the responses and factors can be expressed as Y=g(X;θ)+ε, where g represents the true underlying relationship, θ is a vector of unknown parameters, and ε is a random vector that captures the noise in the observed data. The sign equating Y to g(X;θ)+ε means ‘equal in distribution’. For example, it is common to assume that ε is a mean zero multivariate normal random vector. This implies Y is multivariate normal with the same covariance as ε and mean g(X;θ). Direct inverse modeling begins by building a model for X as a function of Y: X=h(Y;γ)+η, where h represents the true underlying relationship, γ is a set of parameters, and η is a random vector capturing the noise. Again, the equality sign means equal in distribution. If η is assumed to be mean zero multivariate normal, then X is also multivariate normal with the same covariance as η and mean h(Y;γ).
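The distinction between the two categories can be illustrated with a minimal simulated sketch (all data and coefficients here are hypothetical, chosen only for illustration): the forward approach fits a causal model for Y and inverts it algebraically, while the direct inverse approach regresses X on Y and predicts directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data: scalar input x, noisy response y = g(x) + e
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Causal (forward) model: regress y on x, then invert algebraically.
b1, b0 = np.polyfit(x, y, 1)       # y ≈ b0 + b1 * x
y_new = 17.0                       # a new observed response
x_forward = (y_new - b0) / b1      # inverted forward model

# Direct inverse model: regress x on y and predict directly.
d1, d0 = np.polyfit(y, x, 1)       # x ≈ d0 + d1 * y
x_inverse = d0 + d1 * y_new

print(x_forward, x_inverse)        # both near (17 - 2) / 3 = 5.0
```

In this noiseless-inversion setting the two estimates nearly coincide; the approaches diverge more noticeably with small calibration sets, extrapolation, or multivariate responses, which is what the comparisons in later sections probe.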

Regardless of the approach taken, the goal is to estimate an unknown X*=(x1*,x2*,…,xp*) that most likely produced a new observation Y*=(y1*,y2*,…,yq*). The relationship between X and Y (i.e., g or h) must be estimated using calibration data. The direct inverse approach is often used in practice because of the convenience of building a model that directly predicts X with common software packages rather than having to invert forward models. One drawback to this approach is that a standard regression assumption is violated by commonly used software: the predictors (Y) in these models are measured with error while the responses (X) are often measured with negligible error [12]. Additionally, inference on the causal relationship is lost with the direct inverse approach. However, in some applications, such as instrument calibration [13], [14], causal inference is not a priority and there is an indication the direct inverse method is more efficient [15], [16], [17]. Dimension reduction may be needed when using the direct approach because the number of responses q is often much larger than the number of inputs p [2]. For example, in near-infrared reflectance (NIR) applications [8], measurements at hundreds (or thousands) of wavelengths constitute the multivariate response, whereas the input factors correspond to just a handful of constituents in the material being measured. As alluded to in the application, the two methods will typically handle missing data in the responses and inputs differently. Additional references discussing the properties of the two methods include: [18], [19], [20].
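The dimension-reduction step for the q ≫ p case can be sketched with principal components regression, as used later in Section 2. The simulated "spectra" below are hypothetical: two inputs drive one hundred responses, the centered responses are projected onto their leading principal components, and the inputs are regressed on the resulting scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: p = 2 inputs drive q = 100 responses (e.g., spectra).
n, p, q = 60, 2, 100
X = rng.uniform(0.0, 1.0, size=(n, p))
loadings = rng.normal(size=(p, q))
Y = X @ loadings + rng.normal(scale=0.05, size=(n, q))

# Principal components regression: project the centered responses onto the
# first k principal components, then regress the inputs on the scores.
k = 2
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
scores = (Y - Y_mean) @ Vt[:k].T          # n x k score matrix
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)

# Predict the inputs for a new response vector y_star.
x_true = np.array([0.3, 0.7])
y_star = x_true @ loadings
x_hat = np.concatenate([[1.0], (y_star - Y_mean) @ Vt[:k].T]) @ coef
print(np.round(x_hat, 2))                 # close to the true inputs (0.3, 0.7)
```

Here the hundred correlated responses collapse to two scores carrying essentially all the input-related signal; partial least squares differs only in choosing components to maximize covariance with X rather than variance of Y alone.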

In practice, multiple methods (both forward and inverse) can be used independently to predict source characteristics X of a particular material of interest. In the machine learning community, prediction accuracy of many methods is often compared. The best performing method is then chosen and used for future prediction. It is often difficult to assess if some other prediction method (not originally assessed) would improve prediction. Likewise, there may be a set of responses (Y) not originally used, or known about, that could have improved prediction of X. This paper asserts that while choosing the best method amongst a set is useful, analysts should routinely assess the level of consistency across the set of chosen prediction methods. Each method comes with a set of assumptions that are often difficult to formally justify or verify. Consistency in prediction performance amongst several methods with differing assumptions provides additional confidence that the results are robust over those assumptions. Lack of consistency should be investigated thoroughly to understand why certain methods do not perform as well. One common reason for poorer performance is the exclusion of important variables in the model. Once these are identified and used, it is often the case that each method's prediction performance improves. While this may not seem as important to the machine learning community, where an exhaustive set of variables is often available and variable selection techniques can be utilized, the situation is different for nuclear forensic applications. Material can be analyzed in many ways, providing a large list of potential predictors. However, many measurements are expensive to take and no one knows a priori if the current list is sufficient. Lack of consistency is a sign that subject matter experts should be consulted about possible unmeasured characteristics that might prove useful despite their additional cost.

Undoubtedly, consistency between several methods could be assessed in many ways; this paper takes a pragmatic approach. Assume each method is assessed with a common prediction capability metric (e.g., root mean squared error (RMSE) on a hold-out set), and that this same metric is also calculated for predictions made using no modeling information. The magnitude and variation of the metric across the different methods can then be compared to the prior mean prediction. Small magnitudes and small variation (relative to the prior mean prediction) are signs of both consistency and good prediction. On the other hand, large variation represents inconsistency, and large magnitudes indicate the models do not improve prediction capability.
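The diagnostic can be sketched concretely. The hold-out predictions below are hypothetical (invented for illustration, not from the paper's experiments); the baseline always predicts the prior mean, and the checks compare each method's RMSE magnitude, their spread, and whether every method beats the baseline.

```python
import numpy as np

# Hypothetical hold-out predictions of an input x from three methods,
# plus a no-model baseline that always predicts the mean of the truth.
x_test = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
preds = {
    "forward_freq":  np.array([1.1, 2.0, 2.9, 4.2, 4.8]),
    "forward_bayes": np.array([0.9, 2.1, 3.1, 3.9, 5.1]),
    "direct_pcr":    np.array([1.2, 1.8, 3.0, 4.1, 5.2]),
}
baseline = np.full_like(x_test, x_test.mean())  # prior-mean prediction

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

rmse_by_method = {m: rmse(p, x_test) for m, p in preds.items()}
rmse_baseline = rmse(baseline, x_test)

# Small, similar RMSEs well below the baseline indicate consistent,
# credible predictions; large or widely varying RMSEs do not.
spread = max(rmse_by_method.values()) - min(rmse_by_method.values())
improves = all(r < rmse_baseline for r in rmse_by_method.values())
print(rmse_by_method, rmse_baseline, spread, improves)
```

In this fabricated case all three methods agree closely and clearly beat the prior-mean baseline, the pattern the paper treats as evidence of credible predictions; the reverse pattern (spread comparable to, or RMSEs near, the baseline) would flag the responses as insufficiently discriminating.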

The remainder of this article is organized as follows. Section 2 reviews four inverse prediction methods, two causal and two direct inverse approaches. The two causal approaches are frequentist and Bayesian normal theory linear models. The two direct inverse approaches, principal components regression (PCR) and partial least squares regression (PLSR), are dimension reduction techniques. These four methods represent a common but clearly non-exhaustive set of methods. The set of methods to explore is ultimately an analyst choice; for the applications discussed here, this set of four has been useful. Section 3 presents a demonstration study to describe how predictions from several methods can be used to assess predictive capability. Section 4 applies the methods to a real nuclear forensics data set where inverse prediction is difficult due to an insufficient set of available discriminating responses. Section 5 provides a discussion.

Section snippets

Methods

This section reviews two causal and two direct inverse modeling methods for inverse prediction. The two causal modeling approaches use frequentist and Bayesian linear models. The frequentist and Bayesian methods differ philosophically in how the unknown parameters of a statistical model are estimated and how predictions of the input variables are made. In this work, both methods assume the data, given unknown parameters, is generated from a parametric probability distribution. This is known as

Demonstration study

This section presents a study to exercise each of the methods described above and demonstrates how they can be used collectively to assess the capability of inverse predictions. Training data are used to fit (calibrate) each model and make inverse predictions on test data. Sixteen different responses are generated for the training and test data. The means of the response are of the form (2) with coefficients given in Table 1 and contours in Fig. 2. The surfaces are sets of rotations of one

Application: Pu(III) oxalate precipitation

In this section, inverse predictions of processing parameters using experimental data from Ref. [34] are made. Burney characterizes the effects of several precipitation factors on particle size and other morphological properties of calcined plutonium powder produced using the reverse-strike precipitation method. It is concluded that the set of responses available is not sufficiently informative and discriminating for inverse prediction. The diagnostics used to make this conclusion are also

Discussion

This paper reviews several methods for inverse prediction and demonstrates that they can be compared to help assess predictive performance. In general, consistently good predictions across methods increase confidence in the robustness of the predictions and that the responses provide adequate discriminating ability. On the other hand, poor and inconsistent predictions across several methods provide strong evidence that the responses do not provide adequate discriminating ability and

References (37)

  • D.M. Haaland et al., Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information, Anal. Chem. (1988)

  • N. Sun (1999)

  • P.A. Parker et al., The prediction properties of classical and inverse regression for the simple linear calibration problem, J. Qual. Technol. (2010)

  • R. Krutchkoff, Classical and inverse regression methods of calibration, Technometrics (1967)

  • R.G. Krutchkoff, Classical and inverse regression methods of calibration in extrapolation, Technometrics (1969)

  • V. Centner et al., Inverse calibration predicts better than classical calibration, Fresenius’ J. Anal. Chem. (1998)

  • J. Tellinghuisen, Inverse vs. classical calibration for small data sets, Fresenius’ J. Anal. Chem. (2000)

  • N. Kannan et al., A comparison of classical and inverse estimators in the calibration problem, Commun. Statistics – Theory Methods (2007)