Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

https://doi.org/10.1016/j.chemolab.2017.10.010

Highlights

  • Review of several statistical methods for inverse prediction.

  • Comparing independent prediction methods to assess credibility of predictions.

  • Agreement amongst several methods indicates predictions are credible.

  • Disagreement amongst methods indicates predictions are not credible.

Abstract

Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors (X) of some underlying causal model producing the observables or responses (Y=g(X)+error). This paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research in which inverse predictions, along with an assessment of predictive capability, are desired.

Four methods are reviewed in this article. Two are forward modeling methods and two are direct inverse modeling methods. Forward modeling involves building a forward causal model of the responses (Y) as a function of the source characteristics (X) using content knowledge and data ideally obtained from a well-designed experiment. The model is then inverted to produce estimates of X given a new set of responses. Direct inverse modeling involves building prediction models of the source characteristics (X) as a function of the responses (Y), bypassing estimation of any underlying causal relationship. Through use of simulations and a data set from an actual plutonium production experiment, it is shown that agreement of predictions across methods is an indication of strong predictive capability, whereas disagreement indicates the current data are not conducive to making good predictions.

Introduction

The U.S. Government is conducting a series of experiments at the U.S. National Laboratories for nuclear forensics research. The objective is to assess the ability to infer source characteristics, ranging from material origin to production parameters, of interdicted special nuclear material from the nuclear signature, or measured observables. Statistically, this objective can be thought of as an inverse problem where the source characteristics (X) of a material are predicted from the measured observables (Y). Additionally, it is desired to assess confidence in the prediction using, for example, statistical confidence intervals or a probability distribution of plausible X values. Beyond nuclear forensics applications [1], [2], [3], inverse prediction is of interest in more general forensic science activities such as estimating time of death of homicide victims [4], [5]. Inverse prediction also spans a wide range of areas outside of forensics including computer model calibration [6], [7], chemometrics [8], [9], and geophysical applications [10], [11].

Inverse prediction methods can be divided into two categories: 1) causal (forward) modeling and 2) direct inverse modeling. Causal models attempt to capture the notion that ‘Y is caused by X’ and are often expressed in terms of a low-order polynomial, which can be thought of as a Taylor series approximation to the true but unknown underlying relationship. Alternatively, theory regarding the causal relationship can be used to develop the mathematical form of the model. Assuming a q-dimensional response Y=(y1,y2,…,yq) and a p-dimensional set of input factors X=(x1,x2,…,xp), the relationship between the responses and factors can be expressed as Y=g(X;θ)+ε, where g represents the true underlying relationship, θ is a vector of unknown parameters, and ε is a random vector that captures the noise in the observed data. The sign equating Y to g(X;θ)+ε means ‘equal in distribution’. For example, it is common to assume that ε is a mean zero multivariate normal random vector. This implies Y is multivariate normal with the same covariance as ε and mean g(X;θ). Direct inverse modeling begins by building a model for X as a function of Y: X=h(Y;γ)+η, where h represents the true underlying relationship, γ is a set of parameters, and η is a random vector capturing the noise. Again, the equality sign means equal in distribution. If η is assumed to be mean zero multivariate normal, then X is also multivariate normal with the same covariance as η and mean h(Y;γ).
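The distinction between the two categories can be illustrated with a minimal simulated sketch (all data and coefficients here are hypothetical, chosen only for illustration): the forward approach fits a causal model for Y and inverts it algebraically, while the direct inverse approach regresses X on Y and predicts directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data: scalar input x, noisy response y = g(x) + e
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Causal (forward) model: regress y on x, then invert algebraically.
b1, b0 = np.polyfit(x, y, 1)       # y ≈ b0 + b1 * x
y_new = 17.0                       # a new observed response
x_forward = (y_new - b0) / b1      # inverted forward model

# Direct inverse model: regress x on y and predict directly.
d1, d0 = np.polyfit(y, x, 1)       # x ≈ d0 + d1 * y
x_inverse = d0 + d1 * y_new

print(x_forward, x_inverse)        # both near (17 - 2) / 3 = 5.0
```

In this noiseless-inversion setting the two estimates nearly coincide; the approaches diverge more noticeably with small calibration sets, extrapolation, or multivariate responses, which is what the comparisons in later sections probe.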

Regardless of the approach taken, the goal is to estimate an unknown X*=(x1*,x2*,…,xp*) that most likely produced a new observation Y*=(y1*,y2*,…,yq*). The relationship between X and Y (i.e., g or h) must be estimated using calibration data. The direct inverse approach is often used in practice because of the convenience of building a model that directly predicts X with common software packages rather than having to invert forward models. One drawback to this approach is that a standard regression assumption is violated by commonly used software: the predictors (Y) in these models are measured with error while the responses (X) are often measured with negligible error [12]. Additionally, inference on the causal relationship is lost with the direct inverse approach. However, in some applications, such as instrument calibration [13], [14], causal inference is not a priority and there is an indication the direct inverse method is more efficient [15], [16], [17]. Dimension reduction may be needed when using the direct approach because the number of responses q is often much larger than the number of inputs p [2]. For example, in near-infrared reflectance (NIR) applications [8], measurements at hundreds (or thousands) of wavelengths constitute the multivariate response, whereas the input factors correspond to just a handful of constituents in the material being measured. As alluded to in the application, the two methods will typically handle missing data in the responses and inputs differently. Additional references discussing the properties of the two methods include: [18], [19], [20].
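The dimension-reduction step for the q ≫ p case can be sketched with principal components regression, as used later in Section 2. The simulated "spectra" below are hypothetical: two inputs drive one hundred responses, the centered responses are projected onto their leading principal components, and the inputs are regressed on the resulting scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: p = 2 inputs drive q = 100 responses (e.g., spectra).
n, p, q = 60, 2, 100
X = rng.uniform(0.0, 1.0, size=(n, p))
loadings = rng.normal(size=(p, q))
Y = X @ loadings + rng.normal(scale=0.05, size=(n, q))

# Principal components regression: project the centered responses onto the
# first k principal components, then regress the inputs on the scores.
k = 2
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
scores = (Y - Y_mean) @ Vt[:k].T          # n x k score matrix
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)

# Predict the inputs for a new response vector y_star.
x_true = np.array([0.3, 0.7])
y_star = x_true @ loadings
x_hat = np.concatenate([[1.0], (y_star - Y_mean) @ Vt[:k].T]) @ coef
print(np.round(x_hat, 2))                 # close to the true inputs (0.3, 0.7)
```

Here the hundred correlated responses collapse to two scores carrying essentially all the input-related signal; partial least squares differs only in choosing components to maximize covariance with X rather than variance of Y alone.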

In practice, multiple methods (both forward and inverse) can be used independently to predict source characteristics X of a particular material of interest. In the machine learning community, prediction accuracy of many methods is often compared. The best performing method is then chosen and used for future prediction. It is often difficult to assess if some other prediction method (not originally assessed) would improve prediction. Likewise, there may be a set of responses (Y) not originally used, or known about, that could have improved prediction of X. This paper asserts that while choosing the best method amongst a set is useful, analysts should routinely assess the level of consistency across the set of chosen prediction methods. Each method comes with a set of assumptions that are often difficult to formally justify or verify. Consistency in prediction performance amongst several methods with differing assumptions provides additional confidence that the results are robust over those assumptions. Lack of consistency should be investigated thoroughly to understand why certain methods do not perform as well. One common reason for poorer performance is the exclusion of important variables in the model. Once these are identified and used, it is often the case that each method's prediction performance improves. While this may not seem as important to the machine learning community, where an exhaustive set of variables is often available and variable selection techniques can be utilized, the situation is different for nuclear forensic applications. Material can be analyzed in many ways, providing a large list of potential predictors. However, many measurements are expensive to take and no one knows a priori if the current list is sufficient. Lack of consistency is a sign that subject matter experts should be consulted about possible unmeasured characteristics that might prove useful despite their additional cost.

Undoubtedly, consistency between several methods could be assessed in many ways; this paper takes a pragmatic approach. Assume each method is assessed with a common prediction capability metric (e.g., root mean squared error (RMSE) on a hold-out set), and that this same metric is also calculated for predictions made using no modeling information. The magnitude and variation of the metric across the different methods can then be compared to the prior mean prediction. Small magnitudes and small variation (relative to the prior mean prediction) are signs of both consistency and good prediction. On the other hand, large variation represents inconsistency, and large magnitudes indicate the models do not improve prediction capability.
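The diagnostic can be sketched concretely. The hold-out predictions below are hypothetical (invented for illustration, not from the paper's experiments); the baseline always predicts the prior mean, and the checks compare each method's RMSE magnitude, their spread, and whether every method beats the baseline.

```python
import numpy as np

# Hypothetical hold-out predictions of an input x from three methods,
# plus a no-model baseline that always predicts the mean of the truth.
x_test = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
preds = {
    "forward_freq":  np.array([1.1, 2.0, 2.9, 4.2, 4.8]),
    "forward_bayes": np.array([0.9, 2.1, 3.1, 3.9, 5.1]),
    "direct_pcr":    np.array([1.2, 1.8, 3.0, 4.1, 5.2]),
}
baseline = np.full_like(x_test, x_test.mean())  # prior-mean prediction

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

rmse_by_method = {m: rmse(p, x_test) for m, p in preds.items()}
rmse_baseline = rmse(baseline, x_test)

# Small, similar RMSEs well below the baseline indicate consistent,
# credible predictions; large or widely varying RMSEs do not.
spread = max(rmse_by_method.values()) - min(rmse_by_method.values())
improves = all(r < rmse_baseline for r in rmse_by_method.values())
print(rmse_by_method, rmse_baseline, spread, improves)
```

In this fabricated case all three methods agree closely and clearly beat the prior-mean baseline, the pattern the paper treats as evidence of credible predictions; the reverse pattern (spread comparable to, or RMSEs near, the baseline) would flag the responses as insufficiently discriminating.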

The remainder of this article is organized as follows. Section 2 reviews four inverse prediction methods, two causal and two direct inverse approaches. The two causal approaches are frequentist and Bayesian normal theory linear models. The two direct inverse approaches, principal components regression (PCR) and partial least squares regression (PLSR), are dimension reduction techniques. These four methods represent a common but clearly non-exhaustive set of methods. The set of methods to explore is ultimately an analyst choice; for the applications discussed here, this set of four has been useful. Section 3 presents a demonstration study to describe how predictions from several methods can be used to assess predictive capability. Section 4 applies the methods to a real nuclear forensics data set where inverse prediction is difficult due to an insufficient set of available discriminating responses. Section 5 provides a discussion.

Section snippets

Methods

This section reviews two causal and two direct inverse modeling methods for inverse prediction. The two causal modeling approaches use frequentist and Bayesian linear models. The frequentist and Bayesian methods differ philosophically in how the unknown parameters of a statistical model are estimated and how predictions of the input variables are made. In this work, both methods assume the data, given unknown parameters, is generated from a parametric probability distribution. This is known as

Demonstration study

This section presents a study to exercise each of the methods described above and demonstrates how they can be used collectively to assess the capability of inverse predictions. Training data are used to fit (calibrate) each model and make inverse predictions on test data. Sixteen different responses are generated for the training and test data. The means of the response are of the form (2) with coefficients given in Table 1 and contours in Fig. 2. The surfaces are sets of rotations of one

Application: Pu(III) oxalate precipitation

In this section, inverse predictions of processing parameters using experimental data from Ref. [34] are made. Burney characterizes the effects of several precipitation factors on particle size and other morphological properties of calcined plutonium powder produced using the reverse-strike precipitation method. It is concluded that the set of responses available is not sufficiently informative and discriminating for inverse prediction. The diagnostics used to make this conclusion are also

Discussion

This paper reviews several methods for inverse prediction and demonstrates that they can be compared to help assess predictive performance. In general, consistently good predictions across methods increase confidence in the robustness of the predictions and that the responses provide adequate discriminating ability. On the other hand, poor and inconsistent predictions across several methods provide strong evidence that the responses do not provide adequate discriminating ability and

References (37)

  • D.M. Haaland et al., Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information, Anal. Chem. (1988)

  • N. Sun (1999)

  • P.A. Parker et al., The prediction properties of classical and inverse regression for the simple linear calibration problem, J. Qual. Technol. (2010)

  • R. Krutchkoff, Classical and inverse regression methods of calibration, Technometrics (1967)

  • R.G. Krutchkoff, Classical and inverse regression methods of calibration in extrapolation, Technometrics (1969)

  • V. Centner et al., Inverse calibration predicts better than classical calibration, Fresenius’ J. Anal. Chem. (1998)

  • J. Tellinghuisen, Inverse vs. classical calibration for small data sets, Fresenius’ J. Anal. Chem. (2000)

  • N. Kannan et al., A comparison of classical and inverse estimators in the calibration problem, Commun. Statistics – Theory Methods (2007)