Properties of the Box–Cox transformation for pattern classification

doi:10.1016/j.neucom.2016.08.081

Neurocomputing

Volume 218, 19 December 2016, Pages 390-400


Abstract

The Box–Cox transformation [1,2] (Box and Cox, 1964; Sakia, 1992) has been regarded as a parametric pre-processing technique aimed at making the distribution of a set of points approximately Gaussian. Since normality is an assumption underlying many statistical data analysis tools, this technique has been widely applied in different fields of Computer Science. In this paper we provide evidence that the technique can also be useful in Pattern Classification, where Gaussianity of datasets is less critical. By letting the Box–Cox transform work in operational ranges which do not necessarily correspond to an increase in Gaussianity, we show that class separability can be improved: this is likely due to the nonlinear nature of the Box–Cox transformation, which deforms the space in a nonuniform way. We also provide some suggestions on criteria that can be used to automatically estimate the best parameter of the Box–Cox transformation in the Pattern Classification context.

Introduction

Many important results and techniques in statistical data analysis follow from the assumption that data are normally distributed. When this condition does not hold, one possible option [3], [4] is to transform the data so that their distribution is closer to normal. The Box–Cox transformation, abstracted from the original context of the linear regression model where it was first introduced in the 1960s [1], [2], belongs to this class of approaches and can be regarded as a parametric way to nonlinearly transform a set of points with the aim of making their distribution approximately Gaussian (see for instance [5]). Since its introduction, the transformation has been widely studied and applied in many different data analysis settings, mostly in economics, econometrics, statistics and medicine, but also, closer to the pattern recognition area, in medical image segmentation [6], EEG signal analysis [7], geoscience [8], system dynamics modeling and prediction [9], time series forecasting [10], [11] and expression microarrays [12]. It is important to note that in many of the applications referenced above the final goal was not the design of a classification system: the usefulness of the Box–Cox transformation as a pre-processing tool for pattern classification has received much less attention, with few papers published in the literature (see Section 2 for a detailed list). Clearly, in the specific Pattern Classification context, increasing the Gaussianity of a dataset is no longer the crucial aspect, since Gaussianity of the dataset does not imply Gaussianity of the classes (see for example the datasets in Fig. 1); it is the latter property that is assumed by several standard classifiers, such as the nearest mean classifier.

In this paper, we provide some evidence that the Box–Cox transformation may also be useful in the Pattern Classification context, typically operating in parameter ranges that can be very far from those optimal for Gaussianity. This success is likely due to the nonlinear nature of the Box–Cox transformation, which deforms the problem space in a nonuniform way, highlighting useful structures or making the data more suitable for a given classifier, e.g. by increasing the class separability (see the example in Fig. 3). To investigate these aspects, we start from a broad analysis involving a set of different datasets and classifiers, and we empirically relate the behaviour of the accuracies, as the Box–Cox parameter λ varies, to three criteria (Gaussianity, Gaussianity of every class, class separability), showing that the accuracy curves are linked more to class separability than to Gaussianity. In the second part of the paper, we also derive some simple, practical criteria to select effective values of the parameter λ, exploiting a characteristic of the specific scenario, i.e. the availability of labels.

The remainder of the paper is organized as follows: in Section 2 we briefly review the basics of the Box–Cox transformation; in Section 3 we empirically investigate its properties in the Pattern Classification domain. The analysis of the automatic estimation of the best parameter is then reported in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

The Box–Cox transformation

In our paper we focus on the basic formulation of the Box–Cox transform as given in the original Box–Cox paper [1], which transforms a given variable $x$ into $x^{(\lambda)}$ via the following equation: $x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log (x) & \text{if } \lambda = 0 \end{cases}$ the transformation being defined for $x > 0$. Many other slightly different versions or formulations have appeared over the years [2], which are nevertheless all minor variations of the same idea. In some versions the parameter was restricted to the range $(0, 1]$, in some others a shift
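As an illustration of the basic formulation above, the following minimal Python sketch applies the transform element-wise with NumPy. The function name `box_cox` and the tolerance used to detect λ = 0 are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def box_cox(x, lam):
    """Basic Box-Cox transform of strictly positive data for a given lambda.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the Box-Cox transform is only defined for x > 0")
    # Values of lambda numerically indistinguishable from 0 use the log branch,
    # which is the limit of (x**lam - 1) / lam as lam -> 0.
    if np.isclose(lam, 0.0):
        return np.log(x)
    return (x ** lam - 1.0) / lam

x = np.array([0.5, 1.0, 2.0, 4.0])
print(box_cox(x, 1.0))   # lambda = 1 only shifts the data (x - 1)
print(box_cox(x, 0.0))   # lambda = 0 is the natural logarithm
print(box_cox(x, 1e-6))  # a small lambda stays close to log(x)
```

The two branches agree in the limit, so the transform is continuous in λ; this is what makes a grid or numerical search over λ meaningful.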

Experimental evaluation part I: understanding the Box–Cox transform

In this section, starting from an extensive analysis of different datasets, different classifiers, and different parameter configurations, we analyse the behaviour of the Box–Cox transform in the Pattern Classification scenario, trying to link accuracy improvements to different criteria relative to different aspects.
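The kind of analysis described here can be sketched as a sweep over λ in which, for each value, the data are transformed, a classifier is evaluated, and a separability score is recorded. The sketch below is only an illustration under stated assumptions: the synthetic log-normal data, the nearest mean classifier, the trace-ratio separability score and all function names are choices made for this example, not the datasets, classifiers or criteria used in the paper.

```python
import numpy as np

def box_cox(X, lam):
    """Element-wise Box-Cox transform of strictly positive data."""
    X = np.asarray(X, dtype=float)
    if np.isclose(lam, 0.0):
        return np.log(X)
    return (X ** lam - 1.0) / lam

def nearest_mean_accuracy(X_tr, y_tr, X_te, y_te):
    """Fit class means on the training set, score on the test set."""
    classes = np.unique(y_tr)
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance of every test point to every class mean.
    d2 = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[d2.argmin(axis=1)] == y_te))

def scatter_ratio(X, y):
    """trace(S_b) / trace(S_w): an assumed, Fisher-style separability score."""
    overall = X.mean(axis=0)
    s_b = s_w = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        s_b += prior * np.sum((Xc.mean(axis=0) - overall) ** 2)
        s_w += prior * np.sum(Xc.var(axis=0))
    return s_b / s_w

def sweep(X_tr, y_tr, X_te, y_te, lambdas):
    """Return (lambda, accuracy, separability) for each candidate lambda."""
    results = []
    for lam in lambdas:
        Z_tr, Z_te = box_cox(X_tr, lam), box_cox(X_te, lam)
        acc = nearest_mean_accuracy(Z_tr, y_tr, Z_te, y_te)
        results.append((float(lam), acc, scatter_ratio(Z_tr, y_tr)))
    return results

# Synthetic, strictly positive two-class data (log-normal features).
rng = np.random.default_rng(0)

def sample(n):
    X0 = rng.lognormal(mean=0.0, sigma=1.5, size=(n, 2))
    X1 = rng.lognormal(mean=1.5, sigma=1.5, size=(n, 2))
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

X_tr, y_tr = sample(1000)
X_te, y_te = sample(1000)
results = sweep(X_tr, y_tr, X_te, y_te, [-1.0, -0.5, 0.0, 0.5, 1.0])
for lam, acc, sep in results:
    print(f"lambda={lam:+.1f}  accuracy={acc:.3f}  separability={sep:.3f}")
```

On this toy problem both classes become Gaussian at λ = 0, so Gaussianity and separability happen to point to the same value; the paper's argument concerns real datasets where the two can disagree.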

Experimental evaluation part II: automatic estimation of λ

The main goal of this section is to provide some feasible and practical hints on how to set the λ parameter, a crucial problem in the actual application of the Box–Cox Transformation. Typically, in the literature, such a parameter is (i) set by hand, or (ii) found by an exhaustive search, or (iii) automatically estimated via numerical optimization of a criterion – many criteria have been studied, starting from the historical works [1], [20], [21] up to more recent ones [9]. In the peculiar
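For reference, option (ii) can be sketched with the classical normality-based criterion: an exhaustive search over a grid of λ values that maximizes the Box–Cox profile log-likelihood, $\ell(\lambda) = -\frac{n}{2}\log \hat{\sigma}^2(\lambda) + (\lambda - 1)\sum_i \log x_i$ (up to an additive constant). This is a Gaussianity-oriented baseline for a single variable, not the label-aware criteria discussed in this section; the function names, the grid and the synthetic data are assumptions made for the example.

```python
import numpy as np

def box_cox(x, lam):
    """Element-wise Box-Cox transform of strictly positive data."""
    x = np.asarray(x, dtype=float)
    if np.isclose(lam, 0.0):
        return np.log(x)
    return (x ** lam - 1.0) / lam

def box_cox_loglik(x, lam):
    """Profile log-likelihood of lambda under a normal model (constant dropped)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma2 = np.var(box_cox(x, lam))  # maximum-likelihood variance (ddof=0)
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(x))

def select_lambda(x, grid):
    """Exhaustive search: evaluate the criterion on every grid value."""
    scores = np.array([box_cox_loglik(x, lam) for lam in grid])
    return float(grid[int(np.argmax(scores))]), scores

# Log-normal data: the logarithm (lambda = 0) makes it exactly Gaussian,
# so the selected lambda is expected to fall near 0.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=2000)
grid = np.linspace(-2.0, 2.0, 81)
best_lam, scores = select_lambda(x, grid)
print("selected lambda:", best_lam)
```

Replacing `box_cox_loglik` with a different scoring function, for instance one that uses class labels, turns the same loop into a search for a classification-oriented λ.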

Conclusions

In this paper we provided an empirical analysis of the behaviour of the Box–Cox transformation for pattern classification, showing that it is a useful pre-processing tool even when used in operational ranges far from those leading to maximum Gaussianity. These results open the door to the analysis of different nonlinear data pre-processing methods, which can be successfully exploited in the pattern recognition field.

Acknowledgements

MB would like to thank Bob Duin for pointing out similarities between the Box–Cox transformation and the power transformation, and P. Lovato for helpful discussions on the experimental evaluation. The authors would also like to thank the Observatorio Vulcanológico y Sismológico de Manizales, Colombia (in particular John Makario Londoño-Bonilla) for providing the volcano data set.

Manuele Bicego received his Laurea degree and PhD degree in Computer Science from University of Verona in 1999 and 2003, respectively. From 2004 to 2008 he was at the University of Sassari. Currently he is assistant professor (ricercatore) at the University of Verona. From June 2009 to February 2011 he was also team leader at the Istituto Italiano di Tecnologia (IIT - Genova Italy).


References (32)

X. Wang et al. Rule induction for forecasting method selection: meta-learning the characteristics of univariate time series. Neurocomputing (2009)
C.-L. Liu et al. Handwritten digit recognition: investigation of normalization and feature extraction techniques. Pattern Recognit. (2004)
S. Velilla. A note on the multivariate Box–Cox transformation to normality. Stat. Probab. Lett. (1993)
M. Orozco-Alzate et al. The DTW-based representation space for seismic pattern classification. Comput. Geosci. (2015)
M. Bicego et al. Combining information theoretic kernels with generative embeddings for classification. Neurocomputing (2013)
G. Box et al. An analysis of transformations. J. R. Stat. Soc.: Ser. B (Methodol.) (1964)
R. Sakia. The Box–Cox transformation technique: a review. Statistician (1992)
A. Graybill. The Theory and Applications of the Linear Model (1976)
K. Fukunaga. Statistical Pattern Recognition (1990)
R.V.D. Heiden et al. The Box–Cox metric for nearest neighbour classification improvement. Pattern Recognit. (1997)
J.-D. Lee et al. MR image segmentation using a power transformation approach. IEEE Trans. Med. Imaging (2009)
L. Li et al. Analysis of amplitude-integrated EEG in the newborn based on approximate entropy. IEEE Trans. Biomed. Eng. (2010)
A. Barb et al. Visual-semantic modeling in content-based geospatial information retrieval using associative mining techniques. IEEE Geosci. Remote Sens. Lett. (2010)
X. Hong. A fast identification algorithm for Box–Cox transformation based radial basis function neural network. IEEE Trans. Neural Netw. (2006)
A. da Costa et al. The bias in reversing the Box–Cox transformation in time series forecasting: an empirical study based on neural networks. Neurocomputing (2014)
B. Durbin et al. Estimation of transformation parameters for microarray data. Bioinformatics (2003)


His research interests include statistical pattern recognition, mainly probabilistic models (GMM, HMM) and kernel machines (e.g. SVM), with application to video analysis, biometrics and, recently, bioinformatics. Manuele Bicego is the author of several papers on the above subjects, published in international journals and conferences.

Sisto Baldo received his Laurea degree in Mathematics from the University of Pisa in 1989. From 1989 to 1991 he was a PhD student at the Scuola Normale Superiore in Pisa. In 1991 he was appointed assistant professor (ricercatore) at the University of Trento. Since 1999 he has been associate professor of mathematical analysis: 1999–2002 at the Università della Basilicata in Potenza, 2002–2008 at the University of Trento, and since 2008 at the University of Verona. His main research interests are in calculus of variations, geometric measure theory, elliptic partial differential equations and applications to physics (phase transitions, superconductivity, Bose-Einstein condensation). He is the author of several scientific papers on the above subjects, published in international journals.
