Asymmetric M-estimation and change-point analysis applied to atomic force microscopy data enhancement

Dervillé Alexandre; Clausel Marianne; Coeurjolly Jean-François

doi:10.1088/1681-7575/aad0ac

1. Introduction

The atomic force microscopy (AFM) is a poweful technology used to characterize materials at the nanoscale. We refer for example to the books by Yablon (2013) and Bowen and Hilal (2009) for more details on the data acquisition. AFM is used to form an image of the topography of a sample material, by scanning the surface line by line, with a constant spacing and by providing at each point information of its height.

Unfortunately, AFM images are affected by different kinds of defects, as presented by e.g. West and Starostina (2002) and Golek et al (2014). In this paper, we consider two kinds of defects: linear type deviations and jumps which are due to a break during measurement. Both these artifacts require real-time corrections. We present in figure 1 characteristic examples. Those images are a representative sample of the kind of images often encountered in AFM image analysis. They are further used in the paper to illustrate our methodology to remedy the aforementioned problems. We clearly see in figures 1(a) and (b) a linear drift along the lines. In addition to the drift, figures 1(c) and (d) exhibit a jump (a change-point). We also see two other characteristics: the objects can be randomly or regularly located and are above the surface (see figures 1(a) and (c)) or below it (see figures 1(b) and (c)).

Several approaches have been developed to address these issues. Classically, one first extracts the objects to isolate the background of the image. Then one corrects the linear drift and runs a jump detection procedure on the background image, see e.g. Rahe et al (2010) and Chen et al (2013). An obvious criticism is that this method is subject to the quality of object detection. Commercial, free or open-source, software propose different correction strategies for the enhancement of AFM signals. The main drawback of all existing solutions is the need of the user intervention to tune particular parameters. In this paper, we propose a new strategy which does not require any preprocessing routine to remove objects. Objects are considered as outliers, so, we estimate the linear drift using M-estimators, and in particular asymmetric M-estimators. Thereafter we combine this line by line estimation with smoothing techniques and a nonparametric change-point detection method.

Next two sections detail the methodology which is further compared to standard softwares on images presented in figure 1.

2. Drift estimation

The first step is the drift estimation. Following the literature, e.g. West and Starostina (2002) and Golek et al (2014), we assume that this drift is linear along each line.

The algorithm we propose is independent of the objects locations and their relative position with respect to substrate.

We consider the following model for the AFM image:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:model} Z_{i,j} - H(x_j,y_i) = \theta_{i,0} + \theta_{i,1} x_{j} + \varepsilon_{i,j} \nonumber \end{align} \tag{ 1 }$

for $i=1, \dots, n_I$ , $j=1, \dots, n_J$ where n_I and n_J denote respectively the number of lines and columns of the matrix image. Here, Z_i,j is the observation for the ith line and jth column, x_j is the abscissa position, $\theta_{i, 0}$ and $\theta_{i, 1}$ are the two real parameters which vary from line to line, $H(x_j, y_i)$ is the height of the object at the location $(x_j, y_i)$ . The quantity $H(x_j, y_i)$ represents the height of the object (positive if objects are above the surface and negative if objects are under the surface) and is potentially zero when there is no object. Finally, $\varepsilon_{i, j}$ represents the random fluctuation. The random variables $\varepsilon_{i, j}$ are assumed to be centered, independent and identically distributed.

Let $\boldsymbol{\theta_i}=(\theta_{i, 0}, \theta_{i, 1}){}^\top$ for $i=1, \dots, n_I$ . Figure 2 provides an example of one line profile from two different raw images. We clearly see a linear trend. The bumps, which are above for figure 2(a) and below for figure 2(b) correspond to objects (respectively nanoparticles or square holes). If the function H were known, a classical approach would consist in estimating $\boldsymbol\theta_{i}$ using ordinary least squares. We relax this assumption: our approach consists in considering the objects as outliers, that is, in equation (1), the response variable is not $Z_{i, j}-H(x_j, y_i)$ but simply Z_i,j. Doing so, a standard approach has no chance to estimate correctly $\boldsymbol\theta_i$ . Our suggestion is to estimate $\boldsymbol{\theta_i}$ by a robust regression, i.e. using M-estimators. We refer to Hampel et al (1986) for an overview on the subject. The M-estimator $\boldsymbol{\hat \theta_i}$ of $\boldsymbol{\theta_i}$ minimizes for each line

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:rho} \sum_j \rho\left(Z_{i,j} - \theta_{i,0} - \theta_{i,1} x_{j} \right) \nonumber \end{align} \tag{ 2 }$

where $\rho : \mathbb{R} \rightarrow {\mathbb{R}^+}$ is a given objective function, assumed to be continously differentiable. The minimization of equation (2) is done by solving

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:psiFunc} \sum_j \psi \left(Z_{i,j} - \theta_{i,0} - \theta_{i,1} x_{j} \right) = 0 \nonumber \end{align} \tag{ 3 }$

where ψ is the derivative of ρ. Many choices are possible for the function ψ, see Andrews et al (1972) for some examples. As a first attempt, we use the Tuckey biweight ψ function (see equation (4) with $c_1=c_2=c$ ) and solve equation (3) using the iterative reweighted least squares method presented by Green (1984). The constant c is set to 4.685, see e.g. Hampel et al (1986).

For the first example (see the estimated linear fit corresponding to the input 'M-estimate' in figure 2(a)), the linear trend is much more accurately estimated than if we simply used the least squares estimator. For the second example, the raw image has a lot of objects (half of the data are 'outliers') and these objects are located below the surface. For that example, both the least squares approach and the basic M-estimator fail to recover the linear trend at the surface which for these data are located at the top of the curve. To circumvent this problem, we suggest to use an asymmetric ψ function, as suggested by Allende et al (2006) in another context. The asymmetric Tukey ψ function, introduced by Tukey et al (1972) is given by

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:TukeyAS} \psi(x) = \left\{\begin{array}{@{}lcl@{}} x \left(1-\frac{x^2}{c_1^2} \right)^2 & {{\rm ~if~}} & -c_1 \leqslant x \leqslant 0 \nonumber \\ x \left(1-\frac{x^2}{c_2^2} \right)^2 & {{\rm ~if~}} & 0 \leqslant x \leqslant c_2 \nonumber \\ 0 & {{\rm ~otherwise.~}} & \nonumber \\ \end{array} \right. \nonumber \end{align} \tag{ 4 }$

In the following, we call left asymmetric (resp. right asymmetric) M-estimator the M-estimator derived from the asymmetric function define in equation (4) with $(c_1, c_2)=(10\times 4.685, 4.685)$ (resp. $(c_1, c_2)=(4.685, 10\times 4.685)$ ). We use the left asymmetric (resp. the right asymmetric) M-estimator when the objects are located above (resp. below) the surface.

To get a data-driven procedure, we consider both asymmetric versions and suggest to choose the one for which the vector of estimated intercepts or slopes $\newcommand{\e}{{\rm e}} \hat \theta_{i, \ell}$ for $i=1, \dots, n_I$ and $\newcommand{\e}{{\rm e}} \ell=0, 1$ vary the less. The reason behind that is that the surface of the substrate is more regular than the objects. So, if we choose the wrong side for the M-estimator, the intercepts and slopes estimates will fluctuate much more. As a final criterion, we suggest to choose the asymmetric M-estimator leading to the smallest standard deviation for the slopes estimate (which vary more around a constant value than the intercepts estimates).

This is illustrated by figure 3: the magnitude of the estimated slopes is much smaller with a right asymmetric M-estimator than with the left one. The right one is therefore chosen which actually means that for the corresponding image the objects are indeed below the surface. The linear trend for the input 'asymmetric M-estimate' in figure 2(b) shows the final result. By giving a strong weight to 'top' data, the method works pretty well. It is to be noticed that the asymmetric M-estimate works also well if a small number of objects is observed: figure 2(a) shows indeed that both the standard M-estimate as well as its asymmetric version perform similarly.

We have shown in this section that for each line the parameter $\boldsymbol{\theta_i}$ is efficiently estimated using asymmetric M-estimation. In the empirical study, this methodology is denoted by AMest (Asymmetric M-estimation).

To get a smooth estimate of the background, we can smooth the curves $(\hat \theta_{i, 0}, i=1, \dots, n_I)$ and $(\hat \theta_{i, 1}, i=1, \dots, n_I)$ . One problem, illustrated by figures 1(c) and (d), remains: due to some break measurements, we can observe some jump in the values of the intercepts estimates $(\hat \theta_{i, 0}, i=1, \dots, n_I)$ . The next section will address this issue.

3. Jumps detection and correction

As illustrated by figure 4, smoothing the coefficients $\hat \theta_{i, 0}$ and $\hat \theta_{i, 1}$ is not straightforward due to some possible jumps in the values of the $\hat \theta_{i, 0}$ . Therefore, we first propose to estimate the locations and amplitude of these potential jumps. The chosen approach is the nonparametric spline interpolation technique proposed by Ma and Yang (2011). In particular, we use the constant splines version. The interpolation of the vector of intercepts estimates by constant splines is denoted by $\hat m_1$ . It is based on a number of nodes N. To tune the number of nodes we apply the BIC criterion, $BIC = -2\ln(\hat\sigma) + {N} \log(n_I)/{n_I}$ where $\hat\sigma$ is the empirical quadratic error.

Then, to detect the jumps, we consider the successive differences:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle T_{1,j} = \frac{|\hat{m}_1(\,j+1) - \hat{m}_1(\,j)|}{\hat \sigma_{1}} \nonumber \end{align*}$

for $j=1, \dots, n_I-1$ , where $\hat \sigma_{1}^2 = {2\hat\sigma^2}(N+1)/{n_I^2}$ . Jumps correspond to indices j where the T_1,j exceed a given threshold. The threshold is defined as the $1-\alpha$ quantile of the asymptotic distribution of T_1,j obtained by Ma and Yang (2011). In the empirical results, the smoothed version obtained after potential jump corrections of the algorithm AMest is denoted by SAMest (for Smoothed Asymmetric M-estimation).

4. Experimental results

The results of our algorithm on AFM images are presented in figure 5. The empirical results are very satisfactory since the surface of the substrate seems to be flat. To validate in a proper way our approach, we compare it to existing softwares. Before that, we introduce a few comparison criteria which summarize the main objectives of any AFM flattening algorithm:

The background (of the image), which is the part of the image where there are no objects should have a zero-mean.
The objects, above or under the surface, should not be modified. The objects, their height and their dimension should be the same before and after the flattening.

To reach these objectives, we propose the three following criteria:

(i)
Variance of the background of the image, $\sigma_{B}^2$ : We expect this value to be the smallest possible because it represents the square of the roughness of the surface. To extract the background of the image, we use the maximally stable extremal region (also called MSER) algorithm on the raw image, see Nistér and Stewénius (2008). The MSER is not 100% robust to object detection. This explains that we do not use it to remove object before interpolation and to remove potential errors close to the boundary of objects, we apply a morphological dilation to the mask to grow up the objects with a size $9 \times 9$ . The mask obtained from this procedure is used in all criteria which require the knowledge of background.
(ii)
Maximal variance of the lines $\sigma _{L_{\rm max}}^{2}$ :
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \sigma _{L_{\rm max}}^{2} = \max_{l=1,\dots,n_I} | \sigma _{l}^{2} - \sigma_{l,B}^{2} |\nonumber \end{align*}$
where for each line l, $\sigma _{l}^{2}$ and $\sigma_{l, B}^{2}$ are respectively the variances for the lth line of the raw and background images (we use the same mask as for the evaluation of $\sigma^2_B$ ). The higher this value, the more the background of a line is different from the other ones and shows a problem for at least one specific line.
(iii)
The last criteria is based on a gradient approach. Let D be the difference between the original image and the flattening one. This image should only contain the interpolation image. We define the ratio ρ by
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \rho = \frac{{\sigma_{GB}^2}}{\sigma_{G}^2}\nonumber \end{align*}$
where $\sigma^2_G$ is the total variance of D and where $\sigma_{GB}^2$ is the variance of the background of D. Since a good flattening algorithm must not affect more the objects than the background of the image, we expect $\sigma_{GB}^2$ to be close to $\sigma_{G}^2$ (i.e. $\rho \approx 1$ ).

We now compare our methodology with existing softwares. We consider free or open source software, such as Gwyddion⁴ and WsXM (see e.g. Horcas et al (2007)), as well as the commercial softwares MountainsMap, SPIP and Nanoscope. In each case, we apply the flattening algorithm, and evaluate $\sigma^2_B$ , $\sigma^2_{L_{\max}}$ and ρ. Note that for Nanoscope, due to a restriction access, we only have results for the images presented in figures 1(a) and (b).

For the nanoparticles image (figure 1(a)), i.e. results in table 1), we observe that all methods except Nanoscope have a small $\sigma^2_B$ . The criterion $\sigma^2_{L_{\max}}$ does not reveal significative differences while the criterion ρ seems to favor our methods. Regarding the empirical results for the three other images, i.e. tables 2–4, the SAMest appears to behave very satisfactorily and outperforms the other methods in almost all cases. When this method does not exhibit the optimal value for the criterion (a value reached by MountainsMap or SPIP depending on the situations), it is close to it.

Table 1. Criteria $\sigma^2_B$ , $\sigma^2_{L_{\max}}$ and ρ evaluated for different algorithms and applied to the figure 1(a).

Method	Image		Gradient
Method	$\sigma_{B}^{2}$	$\sigma _{L_{\rm max}}^{2}$	ρ
Gwyddion	0.070 105 939 0183	24.207 005 506	0.526 823 796 944
MountainsMap	0.027 709 819 6248	24.429 950 5607	0.535 609 692 284
SPIP	0.070 822 948 1827	24.207 985 3826	0.527 303 993 248
WsXM_1	0.137 341 536 251	27.962 412 8816	0.607 684 051 546
WsXM_2	0.125 704 340 446	27.990 323 7131	0.627 033 482 507
Nanoscope	3.638 185 6679	23.778 752 3834	0.447 494 698 353
AMest	0.249 692 982 61	25.375 030 5137	0.857 717 585 092
SAMest	0.254 703 279 264	25.370 621 1168	0.865 675 095 975

Table 2. Criteria $\sigma^2_B$ , $\sigma^2_{L_{\max}}$ and ρ evaluated for different algorithms and applied to the figure 1(c).

Method	Image		Gradient
Method	$\sigma _{B}^{2}$	$\sigma _{L_{\rm max}}^{2}$	ρ
Gwyddion	0.856 391 395 958	46.445 620 2193	0.981 838 673 676
MountainsMap	0.668 606 995 632	18.100 254 7204	0.981 026 422 638
SPIP	0.531 157 237 632	50.234 843 3013	0.980 145 279 788
WsXM	0.929 946 905 931	20.933 634 5078	0.938 832 674 763
AMest	0.683 435 005 798	42.595 099 1517	0.980 175 882 357
SAMest	0.667 635 265 76	12.555 307 2977	0.955 668 461 215

Table 3. Criteria $\sigma^2_B$ , $\sigma^2_{L_{\max}}$ and ρ evaluated for different algorithms and applied to the figure 1(b).

Method	Image		Gradient
Method	$\sigma_{B}^{2}$	$\sigma _{L_{\rm max}}^{2}$	ρ
Gwyddion	6.194 889 215 53	14.336 698 1272	0.414 054 505 479
MountainsMap	0.351 224 585 387	13.491 307 3841	0.492 141 302 027
SPIP	0.351 479 298 28	13.475 986 1486	0.433 669 297 976
WsXM	0.691 405 743 069	15.575 915 8562	0.537 863 346 001
Nanoscope	1.911 900 704 17	14.613 107 4577	0.419 217 499 687
AMest	0.403 526 783 29	15.626 636 0064	0.799 386 673 26
SAMest	0.404 465 163 257	15.616 910 5409	0.799 629 013 139

Table 4. Criteria $\sigma^2_B$ , $\sigma^2_{L_{\max}}$ and ρ evaluated for different algorithms and applied to the figure 1(d).

Method	Image		Gradient
Method	$\sigma_{B}^{2}$	$\sigma _{L_{\rm max}}^{2}$	ρ
Gwyddion	40.806 580 2648	62.487 663 3844	0.854 828 446 698
MountainsMap	0.325 514 519 82	62.457 021 2466	0.966 471 936 555
SPIP	0.319 027 149 357	69.049 662 8255	0.982 801 050 429
WsXM	0.632 524 394 91	81.333 806 2141	0.687 079 529 243
AMest	0.341 521 538 476	63.264 230 075	0.973 355 167 915
SAMest	0.322 791 249 518	63.434 237 6189	0.874 777 830 478

5. Uncertainty evaluation

We want to analyse the impact of an additional noise on the estimation procedure proposed in this paper. To test it, we choose to use a Monte-Carlo approach on the image of the figure 1(c) which contains the two main defects that our algorithm aims at correcting: it exhibits a linear drift with one jump and objects are randomly located on the surface. The protocol is the following one:

(i)
add a gaussian white noise to the original image. We use $\sigma = 3$ because the standard deviation of the substrate is close to 1 (previously computed as described in the previous section);
(ii)
use our algorithm to flatten the image and store the jump position, the intercept and slope estimates;
(iii)
repeat step (i) and (ii) 500 times.

Then, we compute the coefficient of variation of intercept and slopes estimates and the one for the jump position. We also compute at each point the standard deviation of the flattened images. The coefficient of variation for the intercept estimates is 0.22%, whereas it equals 0.0057% for the slope estimates. For each simulated image our procedure detects only one jump, and the coefficient of variation of its location is negligible. At each point, the mean standard deviation of the flattened image equals the expected error $3/\sqrt{500}$ whereas the maximum standard deviation is 1.45 times larger than the mean.

In conclusion, our algorithm clearly shows its robustness to an additional noise.

6. Conclusion

In this paper, we propose a data-driven methodology based on asymmetric M-estimators and change-point analysis designed to preprocess AFM images. The procedure has the advantages to be fast, simple and does not require any tuning parameter. All other softwares we have tested require human intervention to either detect objects and/or breakpoints by hand or tune some parameter which vary from one image to another one. In that sense, our method is clearly less sensitive and more simple to use. This approach is already used in industrial context, by the company Pollen Metrology⁵.

Acknowledgments

The authors are grateful to the associate editor and referees for constructive remarks and suggestions that improved a previous version of the paper. The authors also thank the LNE (Laboratoire National de mtrologie et d'Essais) for providing them data and access to standard softwares.

Asymmetric M-estimation and change-point analysis applied to atomic force microscopy data enhancement

Article metrics

Permissions

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Drift estimation

3. Jumps detection and correction

4. Experimental results

5. Uncertainty evaluation

6. Conclusion

Acknowledgments

Footnotes

Asymmetric M-estimation and change-point analysis applied to atomic force microscopy data enhancement

Article metrics

Permissions

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Drift estimation

3. Jumps detection and correction

4. Experimental results

5. Uncertainty evaluation

6. Conclusion

Acknowledgments

Footnotes