1 Introduction

Experimental studies of physical systems are often concerned with answering simple questions: Does the Higgs boson exist? Can we observe gravitational waves? Ideal experiments are designed whereby the results depend on the answer to these questions, and so by making such measurements these answers can be inferred. It is, however, often also the case that these fundamental properties are just one of many complex and independent parameters that affect the experimental data. The other parameters could be anything from other fundamental physical constants, which are perhaps unknown or known to poor precision, to experimental effects such as the detector resolution and efficiency. Therefore, in order to answer the ‘interesting’ questions, one must first answer many ‘uninteresting’ questions about the measurements, and in fact often it is these uninteresting questions which dominate the efforts of researchers in their fields.

In this article we present a novel technique which uses machine learning [1] to bypass the difficult and uninteresting parts of the analysis, and address the fundamental questions directly. Machine learning refers to a set of numerical algorithms which allow computers to learn patterns and make predictions without encoding those patterns explicitly. These techniques have exceptional analytical potential, and have been used to great effect in a plethora of fields, for example to perform image analysis and facial recognition [2], to understand the sentiment of a paragraph of text [3], to automatically identify interesting events in high energy physics experiments, such as the LHC [4], and to automatically distinguish between true gravitational wave signatures and those produced by non-astrophysical noise in LIGO data [5].

Here the fundamental question we wish to address is: given an experimental energy spectrum produced by the resonant scattering of a nucleus with \(^{4}\)He, is \(\alpha \)-clustering observed in the structure of the compound nucleus formed in this reaction? Alpha-clustering is the phenomenon whereby protons and neutrons form sub-structures within the nucleus, and it can usually be ascribed to specific nuclear energy levels, known as \(\alpha \)-clustered states. This has been shown to play a pivotal role in dictating the properties and interactions of light nuclei [6, 7], yet it has not been observed to the same extent in heavy nuclei. It is tempting, therefore, to suggest that systems which contain few nucleons are more likely to form cluster structures than those composed of many nucleons, and efforts to understand this trend have led to considerable experimental and theoretical work investigating \(\alpha \)-clustering in medium mass nuclei, some of which is detailed in Ref. [8]. It is unclear, however, whether the reduction of experimentally observed \(\alpha \)-cluster structures in heavy nuclei truly reflects a shift in structural preference away from \(\alpha \)-clustering, or whether experimental difficulties which arise with increasing nuclear mass have concealed the cluster structures in this region.

One experimental difficulty which is unique to heavier systems is the increasing nuclear level density. This leads to more complex experimental spectra, and also means that \(\alpha \)-clustered states often serve as doorway states [9] in the \(\alpha \) decay-channel, and as such, rather than searching for a single \(\alpha \)-clustered state, one must instead search for groups of fragmented states all sharing the strength of the original clustered state. Usually the analysis of these experimental spectra requires the extraction of the properties of all of the energy levels which are populated in the reaction, and then the energy levels are compared with a theoretical nuclear model in order to ascertain whether or not they exhibit signs of \(\alpha \)-clustering. However, the significant increase in the complexity of the spectra means that unambiguously extracting all of the energy levels is a very challenging prospect, and is often the primary obstacle when analysing experimental data in this mass region.

In this scenario, the uninteresting properties are the energy levels, which are difficult to extract and the majority of which will correspond to non-clustered structures. So rather than attempting to extract the states, in this article a technique is developed which simulates many spectra, each time assuming a unique and random combination of nuclear states, but in each case controlling for whether or not an \(\alpha \)-clustered structure is present. Machine learning is then employed to learn the differences between spectra which do or do not contain \(\alpha \)-clustered states, independent of the properties of the other states in the spectrum. This algorithm can then be applied to the measured data to ascertain the existence of \(\alpha \)-clustering.

In this article this technique is employed to examine the evolution of \(\alpha \)-clustering in titanium isotopes. Previous work on \(^{44}\)Ti has identified a range of \(\alpha \)-clustered states, many of which have been shown to be fragmented [10,11,12]. These observations are in agreement with predictions made by \(\alpha \)-cluster model calculations [8, ch. 2] and a deformed basis Antisymmetrised Molecular Dynamics calculation [9], indicating a good understanding of the underlying \(\alpha \)-cluster structure. There has, however, been comparatively little work done to investigate similar structures in neutron rich titanium isotopes. Analyses of \(\alpha \)-transfer reactions have indicated that the degree of \(\alpha \)-clustering in titanium isotopes decreases with increasing nuclear mass, both in the ground state [13, 14] and in excited states [11], however, a measurement of \(^{48}\)Ca(\(\alpha \),\(\alpha \)) elastic scattering shows significant resonant structure [15]. This may be indicative of \(\alpha \)-clustered states in \(^{52}\)Ti above the \(\alpha \)-decay threshold, however no formal analysis was performed on this measurement.

The present work investigates \(^{44,48,52}\)Ti by measuring the resonant scattering reactions, \(^{4}\)He(\(^{40,44,48}\)Ca,\(\alpha \)). This allows the degree of \(\alpha \)-clustering above the \(\alpha \)-decay threshold to be compared consistently between the three isotopes, and is an ideal testing ground for a novel machine learning technique as \(^{44}\)Ti can be used to test the reliability of the procedure, as it is already well understood, before the technique is applied to the neutron rich isotopes.

2 Experimental measurements and results

The \(^{4}\)He(\(^{40,44,48}\)Ca,\(\alpha \)) measurements were made using the Thick Target Inverse Kinematics (TTIK) technique [16]. The reaction chamber was filled with \(^{4}\)He gas, which acted firstly as a medium to smoothly decrease the energy of the calcium ions as they travel through the chamber via electronic interactions, and secondly as the target for the desired nuclear reactions. This allows a measurement to be made of the entire excitation spectrum without changing the beam energy. The scattered \(\alpha \)-particles were measured using two 1mm thick Double-sided Silicon Strip Detectors (DSSDs), placed at the opposite end of the reaction chamber to the beam entrance in the E-\(\Delta \)E configuration. This ensured that the measurements consisted purely of \(\alpha \)-particles and allowed the measurements to be made at a scattering angle of 180\(^\circ \) in the centre-of-mass frame. The measured spectra are shown in Fig. 1, and more details on the experimental work can be found in Ref. [17, 18].

Fig. 1
figure 1

Measurements of the differential cross-section, made at a centre-of-mass scattering angle of 180\(^\circ \), of the \(^{4}\)He(\(^{40}\)Ca,\(\alpha \)) (top), \(^{4}\)He(\(^{44}\)Ca,\(\alpha \)) (middle) and \(^{4}\)He(\(^{48}\)Ca,\(\alpha \)) (bottom) reactions. The resonances in these measurements are relevant to the structure of \(^{44}\)Ti, \(^{48}\)Ti and \(^{52}\)Ti respectively

A crucial aspect of the TTIK technique is that the measured spectra are in fact a convolution of the true excitation function with the experimental resolution. This serves to reduce the height of any resonances which are much narrower than the experimental resolution. This behaviour can severely hinder the analysis of TTIK spectra if the experimental resolution is poor, however, if it is small enough such that it only impacts states which are too narrow to be considered \(\alpha \)-cluster candidates, and it does not cause neighbouring states to become indistinguishable, then it can be considered a useful property as its only effect will be to remove non-clustered states from the spectra. In the present work, REX [19], a Monte-Carlo simulation of thick target resonant scattering experiments, was used to calculate the experimental resolution as 45 keV at Full Width Half Maximum.

3 Alpha clustered doorway states model

The cross-section, \(d\sigma /d\Omega \), of the resonant reactions measured in this work can be calculated directly from the energy levels in the compound nucleus using R-matrix theory [20]. It is, therefore, possible to simulate \(d\sigma /d\Omega \) by first generating a set of ‘non-clustered’ energy levels, and then optionally coupling these levels to an \(\alpha \)-clustered doorway state. The simulated spectra are generated from the energy levels using the Simplified R-Matrix [21], and classified as either non-clustered (no \(\alpha \)-clustered doorway states), or clustered (one \(\alpha \)-clustered doorway state). Many clustered and non-clustered spectra were generated, each time with a unique and random set of energy levels.

The Simplified R-Matrix calculates \(d\sigma /d\Omega \) for reactions where all initial and final state nuclei are spin-0. The cross-section is calculated as a function of excitation energy, \(E_x\), and centre-of-mass scattering angle, \(\theta \), from the excitation energies, \(E_\lambda \), orbital angular momenta, \(L_\lambda \), partial decay widths, \(\Gamma _{\lambda \mu }\), and total decay widths, \(\Gamma _{\lambda }\), of the energy levels, where the energy levels are indexed by \(\lambda \) and the decay channels are indexed by \(\mu \). This is written explicitly as

$$\begin{aligned} \frac{d\sigma }{d\Omega } =&\left| \delta _{\mu \mu '} f_b({E_x}, \theta ) - \frac{i}{2 k_\mu } \sum _\lambda (2 L_\lambda + 1) \right. \nonumber \\&\left. \times \frac{\Gamma _{\lambda \mu }}{\Gamma _\lambda } (e^{2 i \beta _\lambda ({E_x})} - 1) e^{2 i \phi _{L_\lambda }} P_{L_\lambda }(\cos {\theta })\right| ^2, \end{aligned}$$
(1)

where

$$\begin{aligned} \beta _\lambda ({E_x}) = \arctan { \left( \frac{\Gamma _\lambda / 2}{E_\lambda - {E_x}} \right) } \end{aligned}$$
(2)

and \(f_b\) is the background amplitude, \(k_\mu = \sqrt{2 m_\mu E_\text {c.m.}} / \hbar \), \(E_\text {c.m.}\) is the centre-of-mass energy of the system, \(m_\mu \) is the reduced mass, \(P_{L_\lambda }\) is a Legendre polynomial of order \(L_\lambda \) and \(\phi _{L_\lambda }\) is the partial wave phase shift. The partial wave phase shifts exist only in the simplified version of the R-matrix to account for the behaviour of the interference between the resonances and the background amplitude. In this work they were randomised between 0 and \(\pi \) to account for all possible types of interference.

In practice the cross-section is not measured as a continuous quantity, and instead is measured in a finite number of excitation energy bins. In order to ensure that the simulations match the experimental data the cross section was calculated discretely for each experimental bin, \(d{\sigma }/d{\Omega }_n\), where \({E_x}_{n}\) and \({\theta }_n\) are the excitation energy and scattering angle of the bin respectively. Additionally, the background amplitude was defined by fitting a smoothing spline to the experimental spectra which approximated the background, and sampling this at \({E_x}_n\). Finally the simulated cross-section was convoluted with the experimental resolution, and noise was added based on the experimental signal to noise ratio, in order to make the simulations as directly comparable to the measured spectra as possible.

The non-clustered energy levels were simulated by generating a set of shell-model like energy levels, known as class-I energy levels and indexed by \(\lambda _\text {I}\), characterised by ensuring that the levels adhere to the appropriate statistical distributions (described below) indicative of the shell model.

The partial widths, \(\Gamma _{\lambda _\text {I} \mu }\), for each decay channel \(\mu \) were constructed to follow Porter-Thomas statistics [22] by Gaussianly distributing the reduced widths, \(\gamma _{\lambda _\text {I} \mu }\), with a mean of 0 and variance given by \(\langle \gamma ^2_{\mu } \rangle \). The partial widths are calculated from the reduced widths using

$$\begin{aligned} \Gamma _{\lambda _\text {I} \mu } = 2 P_{\mu L_{\lambda \text {I}}} \gamma _{\lambda _\text {I} \mu }^2 \, , \end{aligned}$$
(3)

where \(P_{\mu L_{\lambda \text {I}}}\) is the penetrability through the combined Coulomb and centrifugal barrier, and \(L_{\lambda _\text {I}}\) is the orbital angular momentum in channel \({\mu }\). The penetrability was calculated from the regular and irregular Coulomb wavefunctions [23].

The values of \(\langle \gamma ^2_{\mu } \rangle \) dictate the average strength of each decay channel. In these simulations they were chosen by defining the mean square ratio to the Wigner limit for single particle decays, \(\langle \theta ^2_\text {sp} \rangle \), and the ratio to the single particle strength for each decay channel, \(R_{\mu /\text {sp}}\). The Wigner limit, \(\gamma ^2_{\mu w}\), is a theoretical upper bound on the reduced width. Written formally, this gives

$$\begin{aligned} \langle \gamma ^2_{\mu } \rangle = R_{\mu /\text {sp}} \langle \theta ^2_{\mu } \rangle \gamma ^2_{\mu w} \, . \end{aligned}$$
(4)

For all of the spectra in this work the only open decay channels are the proton, neutron and \(\alpha \) channels. Since the proton and neutron decays are both decays to single particles, \(R_{p / \text {sp}}, R_{n / \text {sp}} \sim 1\), however, one would expect average \(\alpha \)-decay strength to be weaker than the proton and neutron strengths for purely shell-model type states as the \(\alpha \)-particle is a more complex particle, and so \(R_{\alpha / \text {sp}} < 1\).

The excitation energies, \(E_{\lambda _\text {I}}\), and spins and parities, \(J^\pi \), were generated such that the nearest neighbour state spacings of states with the same \(J^\pi \) followed the Wigner distribution [24], defined as

$$\begin{aligned} P_w \left( D_{J^\pi } \right) = \frac{\pi D_{J^\pi }}{2 \langle D_{J^\pi } \rangle ^2} \exp { \left( -\frac{\pi D_{J^\pi }^2}{4 \langle D_{J^\pi } \rangle ^2} \right) }, \end{aligned}$$
(5)

where \(\langle D_{J^\pi } \rangle \) is the mean nearest neighbour state spacing for states with the same \(J^\pi \), and is calculated from the overall mean state spacing, \(\langle D \rangle \), using the Gaussian cutoff factor from the Fermi-gas model [23],

$$\begin{aligned} \langle D_{J^\pi } \rangle = \langle D \rangle \frac{2J + 1}{2 \sigma ^2_\text {spc}} \frac{1}{\sqrt{2 \pi \sigma ^2_\text {spc}}} \exp {\left( - \frac{J(J+1)}{2 \sigma ^2_\text {spc}} \right) } \, , \end{aligned}$$
(6)

where the spin cutoff factor \(\sigma _\text {spc}\) is defined by assuming that the nucleus is a rigid rotating sphere.

The clustered spectra were generated by coupling an \(\alpha \)-clustered doorway state, known as a class-II state, to the set of class-I states, to produce a set of compound states, indexed by \(\lambda \). The class-II state was assumed to exist in a highly deformed secondary minimum in the deformation potential energy surface, and was characterised as being \(\alpha \)-clustered by a large ratio to the Wigner limit in the \(\alpha \)-channel, \(\theta ^2_{\text {II}, \alpha }\), and zero decay widths in all other channels. Its spin and parity, \(J^\pi _\text {II}\), were randomised, and its excitation energy, \(E_\text {II}\), was randomised uniformly within the measured energy range.

The coupling between the class-I and class-II states was based on the work by Bjørnholm and Lynn [25] for the treatment of fission isomers. The compound states were generated by solving the eigenvalue equation

$$\begin{aligned} \begin{bmatrix} {\varvec{E_\mathbf{I }}} &{} {\varvec{H_\mathbf{c }}} \\ {\varvec{H^T_\mathbf{c }}} &{} E_\text {II} \end{bmatrix} \begin{bmatrix} {\varvec{C^\mathbf (I) _\lambda }} \\ C^\text {(II)}_\lambda \end{bmatrix} = E_\lambda \begin{bmatrix} {\varvec{C^\mathbf (I) _\lambda }} \\ C^\text {(II)}_\lambda \end{bmatrix}, \end{aligned}$$
(7)

where \({\varvec{E_\mathbf{I }}}\) is a diagonal matrix containing \(E_{\lambda _\text {I}}\), \(E_\lambda \) is the excitation energy of the compound state and \({\varvec{C^\mathbf (I) _\lambda }}\) and \(C^\text {(II)}_\lambda \) are the coefficients which produce the compound state from the class-I and class-II states. The matrix \({\varvec{H_\mathbf{c }}}\) is a \(1\times N_\text {I}\) matrix, where \(N_\text {I}\) is the total number of class-I states. The elements of \({\varvec{H_\mathbf{c }}}\) are 0 for class-I states which have a different \(J^\pi \) to the class-II state, and otherwise are taken from a normal distribution, centred on 0 with a variance given by \(\langle H^2_c \rangle \). This ensures that the class-II state only couples to class-I states of the same \(J^\pi \), and the use of a normal distribution is justified in Ref. [25] to account for the random behaviour of the overlap between the class-I and class-II state wavefunctions. The value of \(\langle H^2_c \rangle \) defines the strength of the coupling, and, therefore, the number of class-I states which will couple significantly to the doorway state, known as the fragmented states. However, the number of fragmented states depends also on the state spacing of the class-I states. Therefore, \(N_c\) is defined for each clustered spectrum, which is directly proportional to the expected number of fragmented states, and from this \(\langle H^2_c \rangle \) is defined as

$$\begin{aligned} \langle H^2_c \rangle = \frac{N_c \langle D_{J^\pi _\text {II}} \rangle ^2}{\pi }. \end{aligned}$$
(8)

The reduced width amplitudes of the compound states are calculated from \({\varvec{C^\mathbf (I) _\lambda }}\) and \(C^\text {(II)}_\lambda \) as

$$\begin{aligned} \gamma _{\lambda \mu } = \sum _{\lambda _\text {I}} \left( {\varvec{C^\mathbf (I) _\lambda }}\right) _{\lambda _\text {I}} \gamma _{\lambda _\text {I} \mu } + C^\text {(II)}_\lambda \gamma _{_\text {II}, \mu }. \end{aligned}$$
(9)
Fig. 2
figure 2

An example of a simulated clustered spectrum (top) and non-clustered spectrum (bottom). The position of the doorway state is indicated by the dotted line. These spectra were calculated for \(^{44}\)Ti, using the parameters: \(\langle D \rangle \) = 0.05 MeV, \(\langle \theta ^2_\text {sp} \rangle \) = 0.035, \(R_{\alpha /\text {sp}}\) = 5%, \(N_c\) = 2, \(E_\text {II}\) = 12 MeV, \(J^\pi _\text {II}\) = \(3^-\) and \(\theta ^2_{_\text {II}, \alpha }\) = 0.2. Clustering likelihood: 4% vs 89%

An ensemble of spectra, containing an equal number of clustered and non-clustered spectra, were generated using this model. The input parameters, \(\left\{ \langle D \rangle , \langle \theta _\text {sp}^2 \rangle , R_{\alpha /\text {sp}} \right\} \) for both types of spectra and additionally \(\left\{ \theta _{\text {II}, \alpha }^2, N_c, E_\text {II}, J^\pi _\text {II} \right\} \) for the clustered spectra, were randomised within sensible ranges to ensure that all reasonable scenarios were accounted for. Choosing the ranges for each of these parameters is akin to choosing a prior distribution in Bayesian statistics. The ranges used and their justifications are given in Table 1, and an example of the clustered and non-clustered spectra produced are shown in Fig. 2.

Table 1 The parameter ranges used to produce the ensemble of spectra

This spectrum ensemble was used as ‘training data’ to train a Random Forest Classifier (RFC) to classify spectra as either clustered or not clustered, where each spectrum is characterised by a set of ‘features’ calculated from \(d\sigma / d\Omega _n\). More details on the RFC are given in Sect. 4.

The features used were calculated from \(d\sigma / d\Omega _n\) using a combination of the Continuous Wavelet Transform (CWT) [28] and a Principle Component Analysis (PCA) [29]. It was shown in Refs. [17, 18] that the CWT is an effective tool for the identification of \(\alpha \)-clustered doorway states from TTIK measurements. The CWT calculates wavelet coefficients, \(W_{\Psi ,nm}\), from \(d\sigma / d\Omega _n\) by folding it with an appropriately chosen wavelet, \(\Psi (E)\). The wavelet is scaled by \(\delta \! E_m\), known as the scale parameter, which allows features in the spectrum to be expanded as a function of scale. The wavelet coefficients are calculated as

$$\begin{aligned} W_{\Psi ,nm} = \frac{1}{\sqrt{\delta \! E_m}} \int ^\infty _{-\infty } \frac{d\sigma }{d\Omega }(\epsilon ) \Psi \left( \frac{\epsilon - {E_x}_n}{\delta \! E_m} \right) \, d \epsilon , \end{aligned}$$
(10)

where \(\epsilon \) is a dummy variable used to facilitate the integration, and in practice the integral was calculated numerically using the trapezoidal rule. In this work the complex Morelet wavelet [28], which can be thought of as a windowed Fourier transform, was used. This is defined formally as

$$\begin{aligned} \Psi (E) = \left( d \sqrt{\pi } \right) ^{1/2} \exp \left( -i 2 \pi E \right) \exp \left( -\frac{E}{2d^2} \right) , \end{aligned}$$
(11)

where d defines the size of the window, and in this work \(d = 0.8\) MeV. In this case \(\delta \! E_m\) is the equivalent of the period in a typical Fourier transform, and \(W_{\Psi , nm}\) is similar to a Fourier transform coefficient, but localised at \({E_x}_n\). In this work 70 values of \(\delta \! E_m\) were used, uniformly spaced between 0 and 1 MeV. The CWTs of the \(^{4}\)He(\(^{40,44,48}\)Ca,\(\alpha \)) spectra are shown in Fig. 3.

Fig. 3
figure 3

The CWTs of the measured spectra, \(^{4}\)He(\(^{40}\)Ca,\(\alpha \)) (top), \(^{4}\)He(\(^{44}\)Ca,\(\alpha \)) (middle) and \(^{4}\)He(\(^{48}\)Ca,\(\alpha \)) (bottom). In each case the heatmap shows the magnitude of \(W_{\Psi , nm}\) as a function of \(\delta E_m\) and \({E_x}_n\)

In this work the magnitude of the wavelet coefficients, \(\left| W_{\Psi , nm} \right| \), are used and the phases are discarded, as it was observed that the phases contained little useful information regarding the \(\alpha \)-clustered nature of the spectrum. It would, however, be inefficient to use \(\left| W_{\Psi , nm} \right| \) directly in the RFC as they are not orthogonal, with large correlations between neighbouring values of \(\left| W_{\Psi , nm} \right| \), and a large number of coefficients are required to adequately characterise a spectrum, which leads to an unnecessarily computationally intensive analysis. Instead a PCA is performed on \(\left| W_{\Psi , nm} \right| \) as a form of dimensionality reduction. This constructs a new set of orthogonal features from \(\left| W_{\Psi , nm} \right| \), chosen to ensure that the largest fraction of the variance in the original feature set is retained in the fewest possible features. In this case 300 PCA features were used, which accounted for 99.3% of the variance in the \(\left| W_{\Psi , nm} \right| \) feature set. More details on the PCA algorithm can be found in Ref. [29]. The PCA algorithm is very sensitive to the initial distributions of the features, and works optimally when these are approximately normally distributed and normalised to a mean of 0 and a variance of 1. In order to accomplish this, the logarithm was taken of \(\left| W_{\Psi , nm} \right| \), and the logged values were independently normalised to have a mean of zero and unit variance across the training data. The PCA was then performed on these normalised log wavelet coefficients.

The result of this process is a set of PCA features, \(\text {PCA}_k\), which each correspond to a certain \(\left| W_{\Psi , nm} \right| \) distribution. Some examples of these distributions are shown in Fig. 4 for \(k = 0,1,2,20\), and an example of the stages of producing the PCA variables from a raw spectrum are shown in Fig. 5. The ensemble of \(\text {PCA}_k\) for all of the simulated spectra, combined with their classification label, clustered or not clustered, makes up the training data from which the Random Forest is trained.

The consequence of using PCA features as opposed to directly using the \(\left| W_{\Psi , nm} \right| \) features is that they much more naturally describe the overall properties of the spectrum than they do the properties of individual resonances within the spectrum. For example it is evident from Fig. 4 that \(\text {PCA}_{0}\) represents the average amplitude of the resonances throughout the spectrum, relative to the amplitude of the noise in the spectrum, and \(\text {PCA}_{1}\) represents whether or not the average resonant amplitude increases or decreases throughout the spectrum. The higher order PCA variables then begin to account for the shapes of the resonances, the spacings between the resonances and the widths of the resonances, however, these properties are all merged by the PCA algorithm, obscuring the properties of individual resonances. While this may lead to a reduction in the sensitivity of this algorithm to the more subtle effects of \(\alpha \)-clustering on the spectra, the dominant effects ought to still be captured by the PCA features.

Fig. 4
figure 4

Examples of some PCA variables for \(^{44}\)Ti. A heatmap colour scale is used to highlight the shape of the PCA components. See text for details

Fig. 5
figure 5

The 4 stages of feature engineering used to prepare the spectra for the RFC. Left to right: the raw spectrum, the CWT of the spectrum, the normalised logarithm of the CWT, the PCA of the normalised CWT. For each stage of processing the top plot shows an example spectrum, and the bottom plot shows the distribution of some randomly chosen features across the entire training data set

4 Machine learning

A RFC [30] is an ensemble machine learning method, which combines many randomised decision trees to produce a more robust and sophisticated classification than is possible using a single decision tree. Each tree is randomised by training it on a random subset of the training data, and at each node in the tree the optimal splitting criterion is chosen from a subset of the available features.

The RFC classifies a spectrum by allowing the individual decision trees to perform the classification independently, and then averaging the results. This method produces a pseudo-likelihood that the spectrum is clustered, \(L_c'\), which is calculated as the fraction of the decision trees which predict that the spectrum is clustered. It is possible to calibrate the pseudo-likelihood to give the true likelihood that the spectrum is clustered, \(L_c\). This calibration was performed by calculating \(L_c'\) for every spectrum in the training data via fivefold cross-validation [31], which splits the training data into 5 segments and then trains the RFC on 4 of those, before using it to calculate \(L_c'\) for the spectra in the 5th segment. This process is repeated, leaving out each of the segments one at a time, until \(L_c'\) has been calculated for every spectrum in the training data. All of the clustered and non-clustered spectra were then binned separately as a function of \(L_c'\), producing two histograms, \(N^\text {c}_n\) and \(N^\text {nc}_n\) respectively, with bin centroids at \(L_{c,n}'\). The true clustering likelihood was then calculated from these histograms as the fraction of the spectra in each bin that are clustered, given formally as

$$\begin{aligned} L_{c,n} = \frac{N^\text {c}_n}{N^\text {c}_n + N^\text {nc}_n}. \end{aligned}$$
(12)

Finally a logistic function was fit to \(L_{c,n}\) as a function of \(L_{c,n}'\), producing the continuous function \(L_c(L_c')\), under the constraints that \(L_c(0) = 0\) and \(L_c(1) = 1\). This function was then used to convert between \(L_c'\) and \(L_c\), an example of which is shown in Fig. 6.

Fig. 6
figure 6

An example of the likelihood calibration. The logistic function fit is shown in red, \(L_{c,n}\) are shown in blue. The dashed black line indicates \(L_c' = L_c\)

Five-fold cross-validation was also used to tune the RFC hyper-parameters by calculating the percentage of the cross-validated classifications which were correct, known as the classification accuracy. The hyper-parameters that were tuned were the total number of decision trees which compose the RFC, and the minimum number of events which may be contained within a single node of a decision tree. The optimal values chosen were 1000 decision trees and 75 events respectively. While traditional RFCs use fully grown decision trees, rather than limiting them by defining a minimum number of events per node, it was found in this work that fully grown trees sometimes overfit to the training data, producing unreliable results.

In addition to the classification accuracy, two other quantities were used to assess the quality of the RFC, the fraction of the clustered spectra which were classified correctly (sensitivity) and the fraction of the non-clustered spectra which were classified correctly (specificity). These are often also referred to as the True Positive Rate (TPR) and True Negative Rate (TNR) respectively. These are used, in addition to the accuracy, to probe the behaviour of the RFC in the following section.

5 Results

Three RFCs were produced, one each for \(^{44}\)Ti, \(^{48}\)Ti and \(^{52}\)Ti, with cross-validated classification accuracies of 76%, 77% and 79% respectively, sensitivities of 74%, 78% and 81% respectively and specificities of 77%, 76% and 79% respectively. It is interesting to observe the dependence of the sensitivity of the RFCs on some key simulation parameters, as the sensitivity can be treated as a measure of how easy it is to observe an \(\alpha \)-clustered doorway state. The three RFCs all behaved similarly, so one can assume that the conclusions drawn here are applicable to all three measurements, and only the results for \(^{44}\)Ti are presented.

Firstly it was important to ascertain that the RFCs were capable of identifying \(\alpha \)-clustered states in spectra containing more than one, despite being trained only on spectra with a single \(\alpha \)-clustered state. The sensitivity of the RFC was plotted as a function of the number of class-II states in the spectra in Fig. 7, demonstrating that the sensitivity increases with the number of class-II states in the spectrum. This is to be expected for a sensible RFC since if there are many class-II states present it becomes less likely that the RFC will miss all of them.

Fig. 7
figure 7

The sensitivity of the RFC as a function of the number of class-II states in the spectrum

The sensitivity was calculated as a function of \(\theta _{\text {II}, \alpha }^2\) by binning the training data uniformly into 40 \(\theta _{\text {II}, \alpha }^2\) bins and calculating the sensitivity independently for each bin. These values were then smoothly interpolated using a Gaussian process with a Matern kernel, which assumes that the data points ought to be correlated highly with those close in \(\theta _{\text {II}, \alpha }^2\), and uses the magnitude of the errors on the data points to infer the smoothness of the interpolation and the size of the confidence interval. The data and the Gaussian process fit are shown in Fig. 8. Below \(\theta _{\text {II}, \alpha }^2 \sim 0.25\) the sensitivity decreases, while above it plateaus. This indicates that if the \(\alpha \)-clustered doorway state one is attempting to observe has a large ratio to the Wigner limit in the \(\alpha \)-channel, above 0.25, it is much easier to observe than if one attempts to observe a similar state with a smaller \(\theta _{\text {II}, \alpha }^2\). This is a sensible result, as states with small \(\alpha \)-widths will look similar to class-I states, and, therefore, be more difficult to identify.

Fig. 8
figure 8

The sensitivity of the RFC as a function of \(\theta _{\text {II}, \alpha }^2\), fit with a Gaussian process using a Matern kernel (line). The shaded region indicates a 1\(\sigma \) confidence interval

Fig. 9
figure 9

The sensitivity of the RFC as a function of the excitation energy of the class-II state, for each \(J^\pi \) of the class-II state (data points with error bars). The values are fit smoothly using a Gaussian process with a Matern kernel (solid line), and a \(1\sigma \) confidence interval is shown (shaded region). The Gaussian process fits are compared in the bottom-right plot

Finally the sensitivity was calculated for each \(J^\pi _\text {II}\), as a function of \(E_\text {II}\). This is plotted in Fig. 9, and shows that at high energies, low-spin doorway states are difficult to observe, and conversely at low energies high-spin doorway states are difficult to observe. This is because the resonant amplitude is proportional to \((2J + 1)^2\), which amplifies high-spin states, however, the increased centrifugal barrier for high spin states dramatically decreases their penetrability factor and, therefore, their decay widths. Therefore, at low energies, where the barrier penetrability is especially dominant, the high spin states are difficult to populate, whereas at high energies they are populated and their increased amplitude dominates the spectrum, obscuring the low-spin resonances.

Upon their application to the experimentally measured data, the RFCs predicted clustering likelihoods of 92%, 41% and 83% respectively, indicating that it is very likely that \(^{44}\)Ti and \(^{52}\)Ti contain at least one \(\alpha \)-clustered doorway state and unlikely that \(^{48}\)Ti does. This is consistent with previous observations of \(\alpha \)-clustered doorway states in \(^{44}\)Ti [10,11,12], as well as with a previous analysis of these data, which identified doorway states in \(^{44}\)Ti and \(^{52}\)Ti but not in \(^{48}\)Ti by examining the characteristic CWT scales of these measurements [17, 18].

Fig. 10
figure 10

The clustering likelihood for \(^{44}\)Ti (red), \(^{48}\)Ti (green) and \(^{52}\)Ti (blue), as a function of the limits used for the training data. In each case one limit is varied, and the others are held constant at their default values given in Table 1. In each plot the horizontal black dashed line indicates \(L_c = 0.5\), and the vertical black dotted line indicates the default parameter value

Next, the sensitivity of these results to the ranges used to produce the ensemble of training spectra was investigated. The upper and lower limits of \(\langle D \rangle \) and \(\langle \theta _\text {sp}^2 \rangle \), the lower limit of \( \theta _{\text {II}, \alpha }^2\), the upper limit of \(R_{\alpha /\text {sp}}\) and the value of \(N_c\) were all varied, and new training ensembles were generated, to which new RFCs were fit and clustering likelihoods were recalculated for each isotope. The clustering likelihoods are plotted as a function of the parameter limits in Fig. 10.

Firstly, while \(L_c\) is almost completely insensitive to the choice of limits on \(\langle D \rangle \), it does exhibit a dependence on the other parameter limits, to varying degrees of severity. The clustering likelihood decreases slightly for all isotopes as both \(\langle \theta _\text {sp}^2 \rangle \) limits increase. This is because as these limits increase, the average widths of the non-clustered resonances increases, reducing the difference between clustered and non-clustered spectra. The clustering likelihood also decreases for all isotopes as the lower limit on \( \theta _{\text {II}, \alpha }^2\) increases. Increasing this limit effectively increases the threshold at which a state is considered \(\alpha \)-clustered, and consequently the clustering likelihood ought to naturally decrease as this increases and the criteria for \(\alpha \)-clustering gets harsher. It is also the case that the clustering likelihoods increase for low values of \(R_{\alpha /\text {sp}}\). This is because the value of \(R_{\alpha /\text {sp}}\) dictates the average size of the non-clustered resonances. If the simulated resonances in the non-clustered spectra are all very small, then any resonances in the measured spectra will produce a large clustering likelihood. Finally, it can be seen from the clustering likelihoods as a function of \(N_c\) that while fragmented states are observed in \(^{44}\)Ti and \(^{52}\)Ti, if one looks for non-fragmented \(\alpha \)-clustered states instead (i.e. small values of \(N_c\)), then the clustering likelihood falls below 0.5 for all three isotopes, indicating none are observed. This is consistent with the expectation that if \(\alpha \)-clustered states exist in this mass region, they ought to behave as doorway states. Overall however, while there are some small variations in \(L_c\) for extreme values of the parameters, the fundamental results that \(^{44}\)Ti and \(^{52}\)Ti contain \(\alpha \)-clustered doorway states, while \(^{48}\)Ti does not, are preserved, indicating a robust analysis.

Fig. 11
figure 11

Top panel: The relative importance of the PCA features for the classification of \(\alpha \)-clustered spectra, calculated based on the average height at which each feature is used in the decision trees. Bottom panel: The contribution each feature makes to the overall clustering likelihood

It is possible to calculate the relative importances of each PCA parameter, which indicates which parameter has the most influence over the resulting classification. This is calculated by evaluating the average ‘height’ of each parameter in the decision trees, and assuming that the most important parameters are those that are used earlier (or higher). These importances are plotted in the lower panel in Fig. 11. It is clear that the importance is highest for the lowest order PCA variables, suggesting that it’s the overall group properties which contribute most significantly to the classification, for example the average resonance amplitudes, and the higher order terms are not as important. This demonstrates that the RFC is predicting the existence of \(\alpha \)-clustered doorway states by examining the average resonant amplitude observed in the spectra, and how the resonant amplitude varies as a function of excitation energy.

It is also possible to calculate the contribution each PCA feature makes to \(L_c\), \(L_{c,k}\), such that \(L_c = 0.5 + \sum _k L_{c,k}\). For example, a negative contribution for a given parameter means that parameter represents a swing towards not clustered, and a positive contribution represents a swing towards clustered. These clustering likelihood contributions are plotted for each nucleus in the bottom panel of Fig. 11. These values can be used to assess exactly how the RFCs made the classification decisions for \(^{44,48,52}\)Ti. In all three cases \(\text {PCA}_0\) contributes negatively, indicating that alone the average amplitude of the resonances is not large enough to demonstrate the existence of an \(\alpha \)-clustered doorway state. However, in the cases of \(^{44}\)Ti and \(^{52}\)Ti \(\text {PCA}_1\) makes a very large positive contribution to \(L_c\). It is clear from looking at the spectra that both of these nuclei have large resonances at low excitation energies, and so it seems reasonable to conclude that the existence of large resonances at low excitation energies is indicative of \(\alpha \)-clustered doorway states in \(^{44}\)Ti and \(^{52}\)Ti. Note this work has used a binomial classification system, where the result must be one of two results (clustered or not) which could introduce a systematic bias. In future work it could be generalised to a multinomial classification problem, where predictions are attempted if the data are (A) shell model, (B) alpha clustered, (C) alpha clustered and coupled to shell model, (D) ...etc., with a different class for each nuclear structure or model to be evaluated, giving a likelihood for each different possible model.

6 Discussion

To summarise, by training an RFC to evaluate the differences between spectra simulated either with or without \(\alpha \)-clustered states, \(\alpha \)-clustering has been identified in \(^{44}\)Ti and \(^{52}\)Ti. The results for \(^{48}\)Ti are less conclusive, but tentatively suggest that \(\alpha \)-clustering is not present in this energy region. If one searches for a single \(\alpha \)-clustered state in the spectra, rather than sets of fragmented \(\alpha \)-clustered states indicative of a doorway state, then none of the measurements return a positive result, indicating that the \(\alpha \)-clustered structures observed in \(^{44}\)Ti and \(^{52}\)Ti act as doorway states. This suggests that the doubly-magic nature of \(^{40}\)Ca and \(^{48}\)Ca is particularly important for the existence of \(\alpha \)-clustered states.

The use of machine learning here has allowed these conclusions to be drawn without requiring the extraction of the individual spins, parities, energies and widths of the nuclear energy levels. This is very powerful, as it is likely that those parameters could not be robustly extracted from the current measurements alone, yet using this technique it was still possible to quantitatively answer the crucial, fundamental questions of \(\alpha \)-clustering in this mass region.

It is important to note that the combination of the PCA and the RFC here constituted quite a ‘blunt’ machine learning algorithm, since it effectively focused only on the average resonant amplitude of the measurements and ignored the more subtle features such as the state spacing and the resonance shapes. It may be possible to improve upon the results shown here by employing a more sophisticated machine learning technique, such as convolutional neural networks, which have been used with great success for image analysis in other fields [32].