Improvement of prediction ability of PLS models employing the wavelet packet transform: A case study concerning FT-IR determination of gasoline parameters
Introduction
The wavelet transform (WT) is a multiresolution analysis tool that has found several applications in analytical chemistry [1], such as signal denoising [2], database compression [3], localization of inflection points [4], and multivariate calibration [5], [6]. An extension of WT that has also been investigated in the literature [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] is the wavelet packet transform (WPT) [17], which offers more flexibility for analytical signal representation and feature extraction [18]. As with WT, applications of WPT have included signal compression [8] and denoising [9], [10], [11]. Applications in pattern recognition and classification have also become popular [12], [13], [14], [15], [16].
In the context of multivariate calibration, either linear or nonlinear, the use of WPT has received comparatively little attention, with very few contributions reported in the literature [19], [20]. In [19], WPT was employed at the input stage of an artificial neural network for simultaneous kinetic determination of Cu(II), Fe(III), and Ni(II). In [20], WPT was employed for noise extraction prior to PLS regression for spectrophotometric determination of Ni(II), Cd(II), Cu(II) and Zn(II) with xylenol orange and cetyltrimethyl ammonium bromide in synthetic samples. The range 492–650 nm was employed with a 2 nm resolution. The WPT coefficients kept in the regression procedure were selected on the basis of a hard thresholding method [21] based on the coefficient magnitudes.
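For illustration, magnitude-based hard thresholding can be sketched as follows. This is a minimal sketch of the general technique, not the specific procedure of [20] or [21]: the function names, the strict inequality and the coefficient values are assumptions made for the example.

```python
def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude does not exceed `threshold`.

    Surviving coefficients are left unchanged (no shrinkage), which is
    what distinguishes hard from soft thresholding.
    """
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def kept_indices(coeffs, threshold):
    """Indices of the coefficients that survive the threshold."""
    return [i for i, c in enumerate(coeffs) if abs(c) > threshold]


# Hypothetical WPT coefficients of one spectrum (illustrative values only).
coeffs = [4.2, -0.03, 0.5, -3.1, 0.01, 0.8]
thresholded = hard_threshold(coeffs, threshold=0.6)
selected = kept_indices(coeffs, threshold=0.6)
```

Note that the criterion uses only the instrumental response: a coefficient is kept or discarded regardless of how it relates to the analyte concentration.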
In the present paper, the use of WPT for feature extraction prior to PLS regression is investigated in a case study concerning the simultaneous determination of four quality parameters of gasoline by FT-IR spectrometry. The matrix involved in this application is considerably more complex than the metal solutions studied in [20]. Moreover, as compared with the UV–vis spectra in [20] (80 wavelengths), the FT-IR spectra have a much larger number of variables (6144 wavelengths), which presents a more challenging task in terms of feature extraction. Furthermore, rather than selecting the WPT coefficients on the basis of their magnitude, the selection is based on correlation with the dependent variable (gasoline parameter in each PLS model). Therefore, the joint statistics of the instrumental response and the parameter of interest are exploited, which is of relevance given the nature of the problem (regression). The effect of varying the threshold on the prediction ability of the resulting PLS model is also investigated.
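The correlation-based selection described above can be sketched as follows. This is a minimal sketch under stated assumptions: the use of the absolute Pearson correlation, the "greater than or equal" rule, the function names and the data values are all hypothetical, since this section does not specify the exact criterion.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0.0 or svv == 0.0:
        return 0.0  # a constant sequence carries no correlation information
    return suv / sqrt(suu * svv)


def select_by_correlation(W, y, threshold):
    """Column indices of W whose |correlation| with y is >= threshold.

    W: list of rows, one per calibration sample, of WPT coefficients.
    y: reference values of the parameter being modelled.
    """
    n_coeffs = len(W[0])
    columns = [[row[j] for row in W] for j in range(n_coeffs)]
    return [j for j, col in enumerate(columns)
            if abs(pearson(col, y)) >= threshold]


# Hypothetical data: 5 samples, 4 coefficients (illustrative values only).
W = [
    [1.0, 9.0, 0.2, 5.0],
    [2.0, 9.1, 0.9, 4.0],
    [3.0, 8.9, 0.1, 3.1],
    [4.0, 9.0, 0.8, 1.9],
    [5.0, 9.1, 0.3, 1.0],
]
y = [10.0, 20.0, 30.0, 40.0, 50.0]

# Sweep the threshold to see how the retained subset shrinks.
subsets = {t: select_by_correlation(W, y, t) for t in (0.0, 0.1, 0.5)}
```

In this toy example the second coefficient has the largest mean magnitude but only a weak correlation with y, so a magnitude criterion would favour it while the correlation criterion discards it at the higher threshold. The retained columns would then be passed to the PLS regression, and the sweep repeated to assess the effect of the threshold on prediction ability.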
The gasoline parameters under consideration are specific mass (SM) and the distillation temperatures at which 10%, 50% and 90% of the sample has evaporated (T10%, T50%, T90%). The determination of these parameters is required by the Brazilian national fuel authority (Agência Nacional de Petróleo, Gás Natural e Biocombustíveis, ANP) as part of fuel quality control. The specific mass is directly related to the total energy content of a sample with given mass or volume. Deviations from normal values may point to contamination. The distillation temperatures depend on the volatility of the fuel and are used to verify whether light and heavy fractions are within allowed limits. Volatility is the main determinant of the tendency of a hydrocarbon mixture to generate potentially explosive vapours. It facilitates cold starts and also affects engine performance because of its influence on fuel evaporation in the admission collectors and cylinders before and after combustion. Small T10% values indicate good volatility for engine start-up, whereas small T50% values favour acceleration performance. The T90% temperature is associated with the content of fuel constituents with high boiling points. High T90% values may contribute towards better fuel economy and antiknock features, but excessively high values may lead to the formation of deposits in the combustion chamber and gum in the fuel admission system.
It is worth noting that the relation between infrared absorbance and the physical parameters is not based on a fundamental linear law (such as Beer's law, which arises in the context of spectroscopic determination of absorbing species). However, a large body of empirical evidence [22], [23], [24], [25], [26], [27], [28], [29] suggests that linear PLS regression may be appropriate for FT-IR or NIR determination of physical parameters in petroleum products, provided that the resulting model is employed within the calibration limits.
Section snippets
Notation
Matrices are represented by bold capital letters, vectors by bold lowercase letters, and scalars by italic characters. Elements of a sequence or vector are denoted by italic characters with a subscript index. A vertical bar (|) represents the concatenation of two vectors. The hat symbol indicates a predicted value.
Filter bank implementation of the discrete wavelet transform (DWT)
As described in the wavelet literature [1], [18], the DWT of a data vector x of dimension 1 × J can be calculated in a fast manner by using a filter bank of the form depicted in Fig. 1a. The
Data set
The data set employed in this study consists of 103 gasoline samples collected from gas stations that are representative of the Pernambuco and Alagoas states in Brazil. The gasoline contained approximately 25% v/v ethanol, in conformity with the standard defined by ANP. The samples were stored in amber glass flasks under refrigeration at 5 °C.
The reference values for specific mass and distillation temperatures (T10%, T50%, T90%) were obtained according to the methods recommended by the American
Results and discussion
Table 1 summarizes the results obtained by conventional PLS modelling.
For illustration, Fig. 4 depicts the optimal wavelet packet trees obtained for the Coiflet family. As can be seen, as the filter length is increased (from coif1 to coif5), branches on the high-pass side of the tree (right-hand side) are removed and new branches are introduced on the low-pass side. Such a finding can be explained by noting that shorter filters have worse frequency selectivity and therefore there is a
Conclusions
The results of the case study presented in this paper indicate that the use of the wavelet packet transform coupled with an appropriate thresholding technique may lead to considerable improvements in the prediction ability of PLS models. In contrast to [20], these improvements were demonstrated here in a large-scale analytical problem involving a complex fuel matrix, an extensive dataset of real samples and spectra with a very large number of variables.
Further investigations could extend the scope of
Acknowledgments
This work was supported by CNPq (grants Universal 475204/2004-2, Pronex 015/98 and research fellowships) and CAPES (PROCAD 0081/05-1 and scholarships). The authors also wish to thank Dr. Fernanda Araujo Honorato and Prof. Maria Fernanda Pimentel of the Departamento de Engenharia Química of the Universidade Federal de Pernambuco, Brazil, for providing the gasoline sample spectra and the reference values for the properties considered in this study.
References (38)
- et al., Chemom. Intell. Lab. Syst. (1999)
- et al., Chemom. Intell. Lab. Syst. (2004)
- et al., Chemom. Intell. Lab. Syst. (1998)
- et al., Chemom. Intell. Lab. Syst. (2001)
- et al., Chemom. Intell. Lab. Syst. (2004)
- et al., Anal. Chim. Acta (2004)
- et al., Spectrochim. Acta A (2005)
- et al., Chemom. Intell. Lab. Syst. (2005)
- et al., Chemom. Intell. Lab. Syst. (2005)
- et al., Anal. Chim. Acta (1998)
- Chemom. Intell. Lab. Syst.
- Anal. Chim. Acta
- Chemom. Intell. Lab. Syst.
- Chemom. Intell. Lab. Syst.
- Wavelets in Chemistry
- Chemom. Intell. Lab. Syst.
- J. Chem. Inf. Comp. Sci.
- J. Am. Stat. Assoc.
- Chemom. Intell. Lab. Syst.
1 Present address: Divisão de Engenharia Eletrônica, Instituto Tecnológico de Aeronáutica, Brazil.
2 Present address: Universidade Federal da Paraíba, CCEN, Departamento de Química, Laboratório de Automação e Instrumentação em Química Analítica/Quimometria (LAQA), Brazil.