A lightweight transformer for faster and robust EBSD data collection

Dong, Harry; Donegan, Sean; Shah, Megna; Chi, Yuejie

doi:10.1038/s41598-023-47936-6

Download PDF

Article
Open access
Published: 01 December 2023

A lightweight transformer for faster and robust EBSD data collection

Harry Dong¹,
Sean Donegan²,
Megna Shah² &
…
Yuejie Chi¹

Scientific Reports volume 13, Article number: 21253 (2023) Cite this article

568 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Three dimensional electron back-scattered diffraction (EBSD) microscopy is a critical tool in many applications in materials science, yet its data quality can fluctuate greatly during the arduous collection process, particularly via serial-sectioning. Fortunately, 3D EBSD data is inherently sequential, opening up the opportunity to use transformers, state-of-the-art deep learning architectures that have made breakthroughs in a plethora of domains, for data processing and recovery. To be more robust to errors and accelerate this 3D EBSD data collection, we introduce a two step method that recovers missing slices in an 3D EBSD volume, using an efficient transformer model and a projection algorithm to process the transformer’s outputs. Overcoming the computational and practical hurdles of deep learning with scarce high dimensional data, we train this model using only synthetic 3D EBSD data with self-supervision and obtain superior recovery accuracy on real 3D EBSD data, compared to existing methods.

Q-RBSA: high-resolution 3D EBSD map generation using an efficient quaternion transformer network

Article Open access 30 January 2024

Adaptable physics-based super-resolution for electron backscatter diffraction maps

Article Open access 15 December 2022

Efficient few-shot machine learning for classification of EBSD patterns

Article Open access 14 April 2021

Introduction

Experimental methodologies for three dimensional tomography of the internal microstructure of materials has been refined considerably in the past few decades^{1,2,3,4,5,6,7,8}, growing to include a variety of modalities such as X-ray computed tomography (CT), optical imaging, electron imaging, energy dispersive X-ray spectroscopy (EDS), and electron back-scattered diffraction (EBSD) data, among others. In all of these cases, a volume of material is interrogated slice by slice, and these slices are then stacked and reconstructed using a variety of software tools. Three dimensional microstructure data, while often laborious to generate compared to collecting a single 2D section, can provide unique insight, including things like the topology of microstructural features like grains, pores, precipitates and the true shape and size distributions of such features. Properties such as fatigue and oxide transport are sensitive to the three dimensional arrangement of these features^9,10,11. And novel manufacturing processes like additive manufacturing also demand three dimensional characterization of microstructure to fully link the processing to the structure^3,12.

While advancements have been made in robustness and closed loop collection of data², corruptions in data can still happen. This may not be as significant of a problem in non-destructive techniques where the data can be recollected, but in destructive techniques like serial sectioning, it may often be the case that a few slices out of a thousand have no data or are of very poor quality, lowering the accuracy of the reconstruction. The corruptions can happen for a number of reasons including removing more material (via mechanical polishing or laser/ion abalation) than planned, the electron source nearing the end of its life resulting in low signal images, the magnification on the microscope being incorrect for a slice, or the brightness/contrast settings result in an over/under saturated image. While adjustments to the control software can be made to prevent these and other issues, it is hard to a-priori imagine every reason a slice or subset of data can be corrupted.

However, if most of the data was collected properly, we hypothesize it should be possible to infer a substantial amount of the missing data. The current common practice is to fill in missing slices is to just copy the layer above or below the missing slice². This is reasonable in most serial sectioning cases where the slices are being collected at a high enough frequency that most of the data stays the same from slice to slice. Still, there is some room for improvement on nearest neighbor replacement type approaches, and the transformer model, which has been used to fill in text data, is a tantalizing framework for also filling in missing data in sequential image data.

Since its introduction¹³, transformers have been the backbone of many impressive breakthroughs in natural language processing (NLP). In their general form, transformers are excellent at processing sequential information in parallel. Consequently, its use has spread across multiple domains, such as in computer vision¹⁴, speech processing¹⁵, and bioinformatics¹⁶. In our case, transformers are appealing to EBSD data because the readings are inherently sequential as it represents a real physical structure.

Early large transformer models such as BERT¹⁷ and GPT¹⁸ were pretrained on copious amounts of text data with the intention of learning features to model the language through associations between words and phrases with a collection of self-supervised tasks¹⁹. Unlike supervised tasks, such as classification, which requires labeled datasets, self-supervised tasks involve automatically generating labels from unlabeled data, such as unscrambling a sequence of shuffled sentences, which force the model to vaguely learn the structure of the data. This technique remained fairly intact even when scaling to billion-parameter large language models (LLMs), enormous models that process much longer sequences over the course of hundreds of billions of tokens (language units comprising of a sequence of characters)^20,21,22. Taking inspiration from a similar popular self-supervised task in NLP, our training procedure involves randomly masking a slice in an EBSD volume and having the model predict the masked slice.

Our slice recovery task is closely related to that of masked image modeling, masked autoencoders, and inpainting in computer vision^{23,24,25,26,27}. Even though these may be applicable and there exists plenty of transformers in computer vision^14,28, we want our model to also be computationally efficient as we scale our method and leverage the fact that EBSD volumes have sparse structures²⁹, the dynamics of which operate differently from typical images or videos. Regarding efficiency, the fact that EBSD produces high dimensional data means the final model’s computational footprint cannot be ignored.

Taking these into consideration, our contributions are the following:

1.
We propose a novel method to recover missing EBSD data consisting of a scalable transformer model and straightforward projection algorithm that produces superior results compared to existing methods. This is accentuated when zoning in on the recovery accuracy of grain boundaries where other methods appear to perform poorly.
2.
We demonstrate that despite being trained solely on synthetic data, our transformer can generalize to real EBSD data without additional training while still outperforming all the baselines. This robustness to out-of-distribution EBSD data overcomes a major limitation of relatively low amount of available real EBSD data that deep learning requires.
3.
Our results suggest that serial sectioning experiments using similar experimental parameters to those in our test datasets, collection time can effectively be slashed by up to 25% with very little error in predicted voxel orientation, as the collection of every fourth slice can be bypassed. With the current lengthy procedure of EBSD, this is a significant improvement in efficiency.

Background

Terminology and notation

For this paper, we use bold capital letters (e.g. $\varvec{A}$) for matrices and capital calligraphic letters (e.g. $\varvec{\mathscr {A}}$) for tensors. Scalars are represented by plain uppercase and lowercase letters (e.g. A and a). Tuples are used to describe the shape of a structure, so if $\varvec{\mathscr {A}}\in \mathbb {R}^{N_1 \times N_2 \times N_3}$, then $\varvec{\mathscr {A}}$ is of shape $(N_1, N_2, N_3)$ and vice versa. A dimension is an index (starting at 1) in the shape tuple, and the size of a dimension i is the value of the shape tuple at that index i. When we refer to multiple volumes that share the same first k dimensions (e.g. $\varvec{\mathscr {A}}\in \mathbb {R}^{N_1 \times N_2}$, $\varvec{\mathscr {B}}\in \mathbb {R}^{N_1 \times N_2 \times N_3}$, and $\varvec{\mathscr {C}}\in \mathbb {R}^{N_1 \times N_2 \times N_3'}$), we use $*$ to capture all remaining and possibly different dimensions (e.g. $\varvec{\mathscr {A}}$, $\varvec{\mathscr {B}}$, and $\varvec{\mathscr {C}}$ have shape $(N_1, N_2, *)$).

Transformers

In this section, we outline the technical details of the encoder-only transformer based on its first introduction¹³. The same source describes encoder-decoder and decoder-only transformers for interested readers.

Input processing

The first layer of a transformer is the embedding layer. For a single length N sequence of $D_{in}$-dimensional vectors, $\varvec{X}\in \mathbb {R}^{N \times D_{in}}$, each vector is transformed into a vector of size $D_{out}$. If the input is a sequence of tokens such as in language tasks, then $D_{in} = 1$. The transformation could be a linear one, or a more complex nonlinear one such as another neural network. The result is an embedded sequence of shape $(N, D_{out})$. Next, positional encoding is added to the embedding to inject positional information to each vector in the sequence. Positional encodings can either be fixed or learned.

Multi-head self-attention

The idea of self-attention is to find varying degrees of associations between elements in a sequence. Transformers typically implement multi-headed self-attention within each encoder layer, which allows the model to learn different types of associations. Let $\varvec{X}\in \mathbb {R}^{N \times D}$ be the input sequence into the attention mechanism and $\varvec{W}_h^Q, \varvec{W}_h^K, \varvec{W}_h^V \in \mathbb {R}^{D \times \frac{D}{H}}$ for head $h = 1, 2, \dots , H$. Additionally, let $\sigma$ be the row-wise softmax operator. Then, the attention for head h, $\varvec{A}_h$ is defined as

$$\begin{aligned} \varvec{A}_h = \sigma \left( \frac{\varvec{Q}_h \varvec{K}_h^\top }{\sqrt{D / H}} \right) \varvec{V}_h, \end{aligned}$$

(1)

where $\varvec{Q}_h = \varvec{X}\varvec{W}_h^Q, \varvec{K}_h = \varvec{X}\varvec{W}_h^K, \varvec{V}_h = \varvec{X}\varvec{W}_h^V$ are the query, key, and value matrices, respectively. Next, the results are concatenated and linearly transformed to produce the output $\varvec{Y}= [\varvec{A}_1 \cdots \varvec{A}_H] \varvec{W}_O$ for $\varvec{W}_O \in \mathbb {R}^{D \times D}$. Note that $\varvec{A}_1, \dots , \varvec{A}_H$ can be computed in parallel.

The main source of inefficiency is the computation of $\varvec{Q}_h \varvec{K}_h^\top$ which levies a computational and memory cost of $\mathscr {O}(N^2)$. Consequently, the utility of transformers can be limited for long sequences without proper hardware. This becomes a dire issue as the dimensionality of the input grows, such as in images and videos which can be flattened into a long sequence. Thankfully, there has been substantial work in reducing the complexity with more efficient attention approximation mechanisms, such as Linformer³⁰, Reformer³¹, Big Bird³², and many others³³.

Axial self-attention

When it comes to high-dimensional data, such as the EBSD volumes, the sequence length explodes if we flatten (or patchify in the case of many transformers in computer vision¹⁴) the input and apply vanilla self-attention, making attention computation a serious bottleneck. Furthermore, since we observe a lot of structure with EBSD data, it is reasonable to utilize some simplified attention mechanism. Our model uses axial attention^34,35 which runs the self-attention mechanism along the dimensions of the input tensor. For instance, in a cube, each voxel only attends to voxels in the same row, column, or depth. This greatly reduces the amount of computation and memory, especially for higher order tensors. Intuitively, axial attention is appropriate because we hypothesize that since local information can be quite uniform (nearby voxels are likely to be in the same grain), long range information should also be included. With axial attention, it is highly likely to also obtain information in other grains and its boundaries.

In terms of implementation, the formula for multi-headed attention (1) can be reused. Let $\varvec{\mathscr {X}}\in \mathbb {R}^{N_1, \dots , N_K, D}$ be a single K-dimensional input where $(N_1, \dots , N_K)$ defines the shape of the volume and D is the embedding size. As an example, suppose we are interested in finding axial attention along the k-th dimension for $1<k<K$. With proper dimension permutation and flattening, we can reshape $\varvec{\mathscr {X}}$ to a tensor of shape $(\prod _{i\ne k} N_i, N_k, D)$ and compute multi-headed attention by treating the first dimension as the batches. In our model, we repeatedly use the outputs of axial attention to compute axial attention along the next dimension.

Feedforward blocks

The next major component to create a transformer is the feedforward block, a multilayer perceptron with a single hidden layer. The activation function is often either a Gaussian Error Linear Unit (GELU) or Rectified Linear Unit (ReLU) function, and although the input and output sizes are identical, the hidden size can be much larger. Consequently, although self-attention is a computational bottleneck, the feedforward blocks are the dominant source of parameters in the model.

Putting it all together

With all these components, we are ready to define the general architecture of a transformer. First, an input is processed by an embedding layer and positional encoding layers at the start. Then, the input progresses into the encoder, a sequence of alternating self-attention and feedforward blocks. For training stability, residual connections and normalization layers are inserted in between each attention block and feedforward block. Finally, the features from the encoder are linearly projected to be the desired shape of the output. A visualization of a vanilla and axial transformer, which mainly differ in their attention implementation, is depicted in Fig. 1.

Method

We propose a transformer model to learn missing slices of EBSD data, followed by a projection step to smooth out the voxel values. Described in more detail in the following sections, our methodology is summarized in Fig. 2. Due to limited real EBSD data, our goal is to train this model on a large and diverse synthetic dataset and demonstrate that it can generalize to real EBSD data, which we evaluate on two nickel superalloy EBSD volumes, one for alloy IN625^36,37 and one for alloy IN718^38,39. The following repository contains the code for our method: https://github.com/hdong920/ebsd_slice_recovery.

Data description and preparation

Each dataset (both synthetic and experimental) includes orientation information at every voxel in a 3D image. For the experimental data, a substatial amount of preprocessing was done to handle the alignment of the data and clean up noise; a complete description of the preprocessing may be found in Chapman et al.² and Stinville et al.³⁸ In particular, we remove any grains smaller than 27 voxels ($3^3$) and average orienations per grain. While operating on grain-average orientations simplifies the orientation prediction problem, it does not represent the real complexity of orientation fields, which often exhibit subtle local variations. We chose to focus on grain-averaged orientations in this first implementation primarily to simplify our initial interpretation of the transformer predictions. Additionally, the publicly available version of the IN625 dataset only contained grain-averaged orientations. The original volume containing Euler angles is of shape $(N_1, N_2, N_3, 3)$, where $N_1, N_2, N_3$ are the physical dimensions, and the final dimension represents the 3 Euler angles needed to define an orientation. Additional arrays are then computed from the input data:

Cubochorics: A volume of shape $(N_1, N_2, N_3, 3)$ where the last dimension contains the cubochoric coordinates⁴⁰ converted from the original Euler angles at each voxel. Cubochoric coordinates are chosesn since the Euclidean metric is used for regressing the transfomer model. The Euclidean distance between points in Euler angle space does not necessarily relate to the angular distance between those points. While this is also true for points in cubochoric space, as the cubochoric representation is an equal-volume mapping of SO(3) onto a grid as opposed to an equal-angle mapping, the Euclidean distance in cubochoric space approximates the Euclidean distance between unit quaternions for small misorientations⁴¹. Since Euclidean distance is a valid metric in SO(3)⁴², we operate completely in cubochoric space, under the assumption that the Euclidean metric is a reasonable approximation for similarity between points in this space.
IDs: A volume of shape $(N_1, N_2, N_3)$ which assigns identification numbers (IDs) to denote which grain each voxel belongs to. Each ID number has a unique vector of cubochoric coordinates associated with it. While not needed during training, these ID numbers will be used to smooth our model outputs and evaluate model accuracy. For experimental data, the IDs are found by segmenting the grains using a misorientation tolerance^2,38, and for synthetic data, the IDs are generated alongside the orientation data.
Boundaries: A volume of shape $(N_1, N_2, N_3)$ containing Boolean values indicating if a voxel is on the grain boundary. More specifically, a voxel is a boundary voxel is at least one of its neighbors is of a different ID number than itself, where two voxels are neighbors if they share a face.

To illustrate, Fig. 3 contains example slices of (scaled and shifted) Cubochorics and Boundaries of IN625 and IN718.

Synthetic volumes are generated via DREAM.3D⁴³. Each synthetic training and validation volume is of shape $(192, 192, 192, *)$ and $(64, 192, 64, *)$, respectively. Within DREAM.3D, we generate 9 training volumes while independently varying the mean grain size and mean transformations per grain. In particular, we generate volumes with mean grain sizes 2, 2.5, and 3 with no twins; we also generate volumes with mean twins frequencies 0, 1, 2, 3, 4, and 5 while fixing mean grain size to be 2.3. Due to the nature of the software, these parameters are unitless, as we can always synthetically increase or decrease the granularity of the volume. For example slices of these synthetic volumes, see Fig. 4. The grain size distributions for each dataset are shown in Fig. 4 as probability plots. The natural logarithm of the normalized grain sizes are shown; ideal lognormally distributed data would lie on straight lines in such plots. In general, the grain sizes are primarily lognormal near their means, with noted deviations from lognormality in their tails, which is a known phenomenon⁴⁴.

Training details

We first describe a few data augmentation steps. The number of unique cubochoric coordinates is quite limited, so color shift transformations, along with other augmentations, are critical. Note that use of the word “color” is loose here because we treat the cubochoric coordinates the same as color channels in computer vision. In particular, our augmentations include random linear color shifts, rotations, and flips.

Using the 9 synthetic volumes generated by DREAM.3D, we train our axial transformer model in a self-supervised fashion similar to masked language modeling tasks¹⁷. These volumes are first normalized along each of the three cubochoric indices. Each training input is sampled from a randomly chosen volume with physical dimensions randomly permuted. Not only is cropping necessary due to computational limits, it also acts as another form of augmentation. For a sample $\varvec{\mathscr {X}}_\star \in \mathbb {R}^{64\times 7 \times 64 \times 3}$, one of the central 5 slices (along the second dimension) is randomly masked. If m is the index of the masked or unobserved slice, define $\varvec{\mathscr {M}}\in {[0,1]}^{64\times 7 \times 64 \times 3}$ to be a mask such that $[\varvec{\mathscr {M}}]_{\cdot , m} = 0$ and 1 elsewhere. Then, the model input and output will be $\varvec{\mathscr {X}}_\star \odot \varvec{\mathscr {M}}$ and $\widehat{\varvec{\mathscr {X}}} \in \mathbb {R}^{64\times 7 \times 64 \times 3}$, respectively.

Since EBSD produces very structured data, we can leverage this to design a more effective loss function. A simple mean-squared-error (MSE) loss function across all voxels would not be sufficient because most voxels in the input are observed, so we would risk learning an identity function which would produce a fairly low loss value. A better approach would be to find the MSE of only the missing slice, analogous to masked language modeling objective functions¹⁷. However, based on Fig. 3, we see that many of voxels remain unchanged across multiple slices, and the slice-to-slice changes all lie along the grain boundaries (see Fig. 6). Since voxels within grains are easy to predict the values of, the source of difficulty is recovering the boundary voxels. Therefore, we opt for a MSE loss function that only considers boundary voxels in the missing slice. More formally, letting k be the index of the missing slice and $\varvec{\mathscr {E}}\in {[0,1]}^{64 \times 64 \times 1}$ be the missing slice’s boundaries map in Boundaries, the loss function we use is

$$\begin{aligned} \mathscr {L}(\widehat{\varvec{\mathscr {X}}}, \varvec{\mathscr {X}}_\star ) = \frac{\left\| [\widehat{\varvec{\mathscr {X}}} - \varvec{\mathscr {X}}_\star ]_{\cdot , m} \odot \varvec{\mathscr {E}}\right\| _{{{\,\mathrm{\textsf{F}}\,}}}^2}{\left\| \varvec{\mathscr {E}}\right\| _0}, \end{aligned}$$

(2)

following broadcasting rules. By the way we defined $\varvec{\mathscr {E}}$, $\left\| \varvec{\mathscr {E}}\right\| _0$ is the number of unobserved boundary voxels. In summary, this loss function essentially averages MSE error across all unobserved boundary voxels.

Using this scheme, we train our 8-headed 8-layer model using stochastic gradient descent with a momentum parameter of 0.9 and weight decay of $1\textrm{e}{-5}$. Using cosine schedules, we warm-up the learning rate up to 0.01 for 8000 steps. While we decay the learning rate until 160000 total gradient steps are taken with batches of 1 sample, performance plateaus around halfway. The model uses learnable positional encoding, 4 attention heads, an embedding size of 128, and a feedforward size of 512. The feedforward block is consists of two 3D convolutions with a window size of 3 along each dimension separated by a GELU. See Fig. 1 for a visualization of the architecture. Notably, our model is compact for a transformer, only consisting of slightly under 30 million parameters. We also apply 10% dropout. Recall that the transformer returns predicted cubochoric coordinates at each voxel of the same shape as the input, but only the masked slice contributes to the loss function (2).

Nearest neighbor projection

The final step is to use outputs of our transformer, which are continuous values, to assign each voxel to a grain and produce a smoother slice with fewer intra-grain variations. To do this, we prepare a dictionary relating observed grain IDs to cubochoric coordinates. First, we assign voxels whose neighbors (voxels that share a face) that reside in the previous and next slice have the same ID to be that ID. We observe these voxels act like anchors that provide more neighbors for voxels that are more difficult to classify, which empirically improves the recovery. Next, we begin projecting voxels, prioritizing the ones with the most neighbors that have been assigned an ID, which include observed voxels and previously projected voxels. This way, we first project voxels with the most known information in their neighborhoods first to gradually build up information in more obscure neighborhoods. Projections are determined by the minimum $\ell _2$ distance to relevant cubochoric coordinates pulled from the dictionary based on neighboring IDs. Summarized in Fig. 7, the projection algorithm essentially turns the transformer outputs of the missing slice into IDs which can then be converted into cubochoric coordinates or Euler angles. For an example, see the bottom row of Fig. 7.

Experiments

With our method defined, we now evaluate its performance on synthetic and real EBSD data with no additional training or fine tuning. Even though our method overwhelmingly outperforms the baselines in terms of recovery, we also point out some weaknesses and avenues for improvement.

There are three baselines we compare to. One is k-nearest neighbors (KNN) where a voxel’s ID is determined by vote based only on observed or previously assigned IDs among its neighboring voxels. This is the exact process of the projection step (including the anchoring procedure), except instead of using a distance metric, voxels are assigned IDs by a vote system. Ties are broken randomly. Another method, which is currently employed as a simple solution to missing slices, is to copy an adjacent slice to replace the missing slice. This usually maintains fairly decent accuracy since the changes from slice to slice are minuscule compared to the number of voxels. As such, copying the previous and next slice are our other two baselines.

Performance

Because the changes from slice to slice are captured by boundary voxels, we define two different accuracy metrics using the IDs. We denote the accuracy of each voxel in the recovered slice as the overall accuracy and the accuracy only considering boundary voxels in the recovered slice as the boundary accuracy. The latter poses a greater challenge for all models, as it is consistently lower than the overall accuracy where the influence of non-boundary voxels often dominates.

While we report the mean accuracies and standard deviation across validation samples, the performance of one method is heavily correlated with the performance of others. For instance, if a particular slice is difficult to recover for one method, it is likely to be difficult for all other methods. To better compare the performances between our transformer and each baseline, we obtain differences in accuracy for each sample. Namely, we find $d(m, \varvec{\mathscr {X}}_\star , \widehat{\varvec{\mathscr {X}}}, \widehat{\varvec{\mathscr {X}}}_{b}) := m(\varvec{\mathscr {X}}_\star , \widehat{\varvec{\mathscr {X}}}) - m(\varvec{\mathscr {X}}_\star , \widehat{\varvec{\mathscr {X}}}_{b})$ for an accuracy metric m, ground truth $\varvec{\mathscr {X}}_\star$, and outputs, $\widehat{\varvec{\mathscr {X}}}$ and $\widehat{\varvec{\mathscr {X}}}_b$, from our transformer and baseline b, respectively. Computing this across all samples, the result is a distribution of accuracy improvements.

Synthetic volumes

First, we generate 4 independent synthetic volumes of shape (64, 192, 64) for each setting. Each volume is divided into 27 nonoverlapping segments of shape (64, 7, 64) each with one of the five interior slices masked. This results in 108 validation samples for each setting. These validation metrics are displayed in Fig. 8 when varying the mean twins frequency and in Fig. 9 when varying mean grain size.

We observe that our method more accurately recovers missing slices than all other baselines for every synthetic validation sample since each slice observed a positive improvement in overall and boundary accuracy. The performance gain is much more apparent for boundary voxels. Furthermore, our transformer performance is much lower variance. Among the baselines, KNN achieves the closest accuracy to our method, but it still underperforms in comparison. As expected, the difference between our method and the baselines diminishes as the scenario get simpler (larger grain sizes and fewer twins) since the baselines are already producing very accurate results.

Real EBSD datasets

Now, we seek to understand how well our model transfers to real data by running our trained model on IN625^36,37 and IN718^38,39. For both volumes, we subdivide each volume into nonoverlapping subvolumes such that all slices are of height and width 64 voxels. Then, each subvolume is then further partitioned into nonoverlapping segments of shape $(64, 7, 64, *)$, each representing a single test sample. Again, each sample contains one masked slice among the five central slices. In the end, there are 800 samples for IN625 and 3298 samples for IN718 after discarding a small set of samples whose missing slice did not have any boundary voxels (these samples would trivially result in exact recovery regardless of the model we use). Average accuracies and accuracy improvements for both volumes are shown in Table 1 and Fig. 10, respectively. For recovery examples, see Fig. 11.

Table 1 Average overall accuracy and boundary accuracy across samples of real world datasets, including their standard deviations.

Full size table

Surprisingly, even though our model is trained exclusively on synthetic data, we observe it transfers well to real EBSD data, as it still outperforms every baseline on nearly every sample, and the interquartile range is positive. Again, the difference is accentuated for boundary voxels. Test results for the IN718 dataset had much higher variance, likely due to each slice having proportionally fewer boundary voxels than IN625 slices. Qualitatively, our model has a stronger capability to recovery thin features than KNN, which visually tends to ignore more subtle structures.

Limitations

While our method provides superior results compared to the baselines, we also seek to understand the circumstance in which it may underperform. One challenge is rapid changes between slices which make up a small minority of test inputs. For instance, if the k-th slice is missing and the set of grains present in the $(k-1)$-th is different from the set of grains in $(k+1)$-th slice, the model has a lot of freedom to decide on which ones are present and to what degree in the missing slice. A case in which this can arise is when the faces of grains or twins are perpendicular to the slicing direction. However, we observe this is an issue for all methods. Using the second row of Fig. 11 as an example, the bright green grain is a large crescent shape on the slice before the missing slice but is hardly present in the following slice. Our model and KNN strives to find some middleground, but ultimately, both misclassify many of the voxels. A similar scene plays out in the fourth row of Fig. 11. Thus, future work includes designing a better loss function to mitigate errors for this or emphasizing these scenarios during training.

Another limitation arises from our projection method. Recall our local projection method projects a voxel to have the ID as one of its observed or previously assigned voxels in order to encourage grain connectedness. This means long range dependencies may be ignored in favor of more smooth structures. Furthermore, connectedness is not guaranteed for very thin features at an angle. Examples of both edge cases for matrices are illustrated in Fig. 12. Though these scenarios are fairly rare in practice, further work could be done to improve our projection algorithm. In particular, our use of grain-averaged orientations directly impacts the design of the projection step. While our transformer predicts real-valued orientations, we utilize a projection step that “snaps” these predicted values to the nearest grain-averaged value to compare with the original training data. Moving to orientation values that are not grain-averaged will require adaptation of the projection step, and potentially require modifications of the underlying transformer architecture.

Future directions can also better integrate the dynamics of orientation data into our method. For example, our use of MSE loss operating on cubochoric values is not a mathematically rigorous metric representation. We initially attempted to utilize an angular metric for loss, but encountered instability during training that we believe related to the trigonometric functions used during the loss computation. An immediate improvement to the current approach would be operating on an orientation representation for which we can compute a stable loss that represents the true angular difference in that representation space. In addition, our current data augmentation method borrowed from computer vision techniques, particularly color shifting, cannot guarantee preservation of misorientation relationships between twins and their parent grains. Taking such additional special considerations may enhance the performance of our method.

Conclusion

We have presented a novel method using transformers followed by a projection algorithm to recover missing 3D EBSD data which vastly outperforms all baselines. Notably, even though our model is trained on synthetic data, it still recovers more accurate slices for real 3D EBSD data than the baselines by a wide margin, making it a powerful data processing tool for faulty 3D EBSD readings. Furthermore, our model opens the possibility to skip every fourth slice during data collection (by using every skipped slice and the three collected slices on each side as the input to our model), potentially reducing the collection time by 25%. Future work involves addressing the limitations, scaling our method, and considering more general cases, such as altering the projection algorithm to apply to consecutive missing slices within a sample. In terms of scaling, we hope to observe emergent behavior similar to scaling LLMs and their training data, which have seen vast improvements to performance and versatility⁴⁵. Beyond EBSD, it would be interesting to investigate our method’s applicability for other high dimensional material science datasets.

Data availability

The IN625 and IN718 datasets are available in the repositories, https://acdc.alcf.anl.gov/mdf/detail/shade_afrl_am_package_v2.1/ and https://datadryad.org/stash/dataset/doi:10.5061/dryad.83bk3j9sj, respectively. Synthetic EBSD data are generated using DREAM.3D and are located at https://drive.google.com/drive/u/1/folders/1S-jKZ7wxIT4ra4q5VJ_vEl4rO3yYQSgQ.

References

Uchic, M. et al. An automated multi-modal serial sectioning system for characterization of grain-scale microstructures in engineering materials. In De Graef, M., Poulsen, H. F., Lewis, A., Simmons, J. & Spanos, G. (eds.) Proceedings of the 1st International Conference on 3D Materials Science 195–202. https://doi.org/10.1007/978-3-319-48762-5_30 (Springer International Publishing, Cham, 2016).
Chapman, M. G. et al. Afrl additive manufacturing modeling series: challenge 4, 3d reconstruction of an in625 high-energy diffraction microscopy sample using multi-modal serial sectioning. Integr. Mater. Manuf. Innov. 10, 129–141 (2021).
Article Google Scholar
Polonsky, A. T. et al. Scan strategies in EBM-printed IN718 and the physics of bulk 3D microstructure development. Mater. Charact. 190, 112043. https://doi.org/10.1016/j.matchar.2022.112043 (2022).
Article CAS Google Scholar
Polonsky, A. T. et al. Three-dimensional analysis and reconstruction of additively manufactured materials in the cloud-based BisQue infrastructure. Integr. Mater. Manuf. Innov. 8, 37–51. https://doi.org/10.1007/s40192-019-00126-7 (2019).
Article Google Scholar
Jolley, B. R., Uchic, M. D., Sparkman, D., Chapman, M. & Schwalbach, E. J. Application of serial sectioning to evaluate the performance of x-ray computed tomography for quantitative porosity measurements in additively manufactured metals. JOMhttps://doi.org/10.1007/s11837-021-04863-z (2021).
Article Google Scholar
Nguyen, L. T. & Rowenhorst, D. J. The alignment and fusion of multimodal 3D serial sectioning datasets. JOM 73, 3272–3284. https://doi.org/10.1007/s11837-021-04865-x (2021).
Article ADS Google Scholar
Kotula, P. G., Keenan, M. R. & Michael, J. R. Tomographic spectral imaging with multivariate statistical analysis: Comprehensive 3d microanalysis. Microsc. Microanal. 12, 36–48 (2006).
Article ADS CAS PubMed Google Scholar
Calcagnotto, M., Ponge, D., Demir, E. & Raabe, D. Orientation gradients and geometrically necessary dislocations in ultrafine grained dual-phase steels studied by 2d and 3d ebsd. Mater. Sci. Eng. A 527, 2738–2746 (2010).
Article Google Scholar
Naragani, D. et al. Investigation of fatigue crack initiation from a non-metallic inclusion via high energy x-ray diffraction microscopy. Acta Mater. 137, 71–84 (2017).
Article ADS CAS Google Scholar
Sandgren, H. R. et al. Characterization of fatigue crack growth behavior in lens fabricated Ti-6Al-4V using high-energy synchrotron x-ray microtomography. Addit. Manuf. 12, 132–141 (2016).
CAS Google Scholar
Wilson, J. R. et al. Three-dimensional reconstruction of a solid-oxide fuel-cell anode. Nat. Mater. 5, 541–544 (2006).
Article ADS CAS PubMed Google Scholar
Teferra, K. & Rowenhorst, D. J. Optimizing the cellular automata finite element model for additive manufacturing to simulate large microstructures. Acta Mater.https://doi.org/10.1016/j.actamat.2021.116930 (2021).
Article Google Scholar
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
Khan, S. et al. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 54, 1–41 (2022).
Article MathSciNet Google Scholar
Latif, S. et al. Transformers in speech processing: A survey. arXiv preprint arXiv:2303.11607 (2023).
Zhang, S. et al. Applications of transformer-based language models in bioinformatics: A survey. Bioinformatics Advances (2023).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I. et al. Improving language understanding by generative pre-training (2018).
Kalyan, K. S., Rajasekharan, A. & Sangeetha, S. Ammus: A survey of transformer-based pretrained models in natural language processing. arXiv preprint arXiv:2108.05542 (2021).
Zhang, S. et al. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).
Brown, T. et al. Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020).
Google Scholar
Touvron, H. et al. Llama: Open and efficient foundation language models (2023). arXiv:2302.13971.
Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T. & Efros, A. A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2536–2544 (2016).
He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16000–16009 (2022).
Kong, L. et al. Understanding masked autoencoders via hierarchical latent variable models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7918–7928 (2023).
Chang, Y.-L., Liu, Z. Y., Lee, K.-Y. & Hsu, W. Free-form video inpainting with 3d gated convolution and temporal patchgan. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 9066–9075 (2019).
Liu, R. et al. Decoupled spatial-temporal transformer for video inpainting. arXiv preprint arXiv:2104.06637 (2021).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
Dong, H., Shah, M., Donegan, S. & Chi, Y. Deep unfolded tensor robust PCA with self-supervised learning. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5 (IEEE, 2023).
Wang, S., Li, B. Z., Khabsa, M., Fang, H. & Ma, H. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020).
Kitaev, N., Kaiser, Ł. & Levskaya, A. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020).
Zaheer, M. et al. Big bird: Transformers for longer sequences. Adv. Neural. Inf. Process. Syst. 33, 17283–17297 (2020).
Google Scholar
Tay, Y., Dehghani, M., Bahri, D. & Metzler, D. Efficient transformers: A survey. ACM Comput. Surv. 55, 1–28 (2022).
Article Google Scholar
Ho, J., Kalchbrenner, N., Weissenborn, D. & Salimans, T. Axial attention in multidimensional transformers. arXiv preprint arXiv:1912.12180 (2019).
Wang, H. et al. Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV, 108–126 (Springer, 2020).
Menasche, D. B. et al. Afrl additive manufacturing modeling series: challenge 4, in situ mechanical test of an in625 sample with concurrent high-energy diffraction microscopy characterization. Integr. Mater. Manuf. Innov. 10, 338–347 (2021).
Article Google Scholar
Shade, P. A. et al. AFRL am modeling challenge series: Challenge 4 data package https://doi.org/10.18126/K5R2-32IU (2019).
Stinville, J. et al. Multi-modal dataset of a polycrystalline metallic material: 3d microstructure and deformation fields. Sci. Data 9, 460 (2022).
Article CAS PubMed PubMed Central Google Scholar
Stinville, J. et al. Multi-modal dataset of a polycrystalline metallic material: 3d microstructure and deformation fields. https://doi.org/10.5061/dryad.83bk3j9sj (2022).
Roşca, D., Morawiec, A. & De Graef, M. A new method of constructing a grid in the space of 3d rotations and its applications to texture analysis. Modell. Simul. Mater. Sci. Eng. 22, 075013 (2014).
Article ADS Google Scholar
Polonsky, A. T. et al. Solidification-driven orientation gradients in additively manufactured stainless steel. Acta Mater. 183, 249–260 (2020).
Article ADS CAS Google Scholar
Huynh, D. Q. Metrics for 3d rotations: Comparisons and analysis. J. Math. Imaging Vis. 35, 155–164 (2009).
Article MathSciNet MATH Google Scholar
Groeber, M. A. & Jackson, M. A. Dream. 3d: a digital representation environment for the analysis of microstructure in 3d. Integrating materials and manufacturing innovation3, 56–72 (2014).
Donegan, S., Tucker, J., Rollett, A., Barmak, K. & Groeber, M. Extreme value analysis of tail departure from log-normality in experimental and simulated grain size distributions. Acta Materialia 61, 5595–5604 (2013).
Article ADS CAS Google Scholar
Wei, J. et al. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682 (2022).

Download references

Acknowledgements

This work is supported in part by the Air Force D3OM2S Center of Excellence under FA8650-19-2-5209, and by the Carnegie Mellon University Manufacturing Futures Initiative, made possible by the Richard King Mellon Foundation. The work of H. Dong is also supported by Liang Ji-Dian Graduate Fellowship and Michel and Kathy Doreau Graduate Fellowship in Electrical and Computer Engineering at Carnegie Mellon University. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Authors thank Michael Uchic, Marc DeGraef, Greg Rohrer, and Zachary Varley for engaging discussions throughout the course of this work. Diagrams were created in draw.io 21.5.1.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, 15289, USA
Harry Dong & Yuejie Chi
Air Force Research Laboratory, Materials and Manufacturing Directorate, Wright-Patterson AFB, Dayton, 45433, USA
Sean Donegan & Megna Shah

Authors

Harry Dong
View author publications
You can also search for this author in PubMed Google Scholar
Sean Donegan
View author publications
You can also search for this author in PubMed Google Scholar
Megna Shah
View author publications
You can also search for this author in PubMed Google Scholar
Yuejie Chi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

H.D. designed and implemented the model, projection algorithm, and experiments. S.D. and M.S. processed the raw data and helped integrate domain knowledge to our method. S.D., M.S., and Y.C. advised and guided the project. All authors wrote and reviewed the manuscript.

Corresponding author

Correspondence to Harry Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Dong, H., Donegan, S., Shah, M. et al. A lightweight transformer for faster and robust EBSD data collection. Sci Rep 13, 21253 (2023). https://doi.org/10.1038/s41598-023-47936-6

Download citation

Received: 16 August 2023
Accepted: 19 November 2023
Published: 01 December 2023
DOI: https://doi.org/10.1038/s41598-023-47936-6

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.