Elsevier

Talanta

Volume 71, Issue 3, 28 February 2007, Pages 1136-1143
Improvement of prediction ability of PLS models employing the wavelet packet transform: A case study concerning FT-IR determination of gasoline parameters

https://doi.org/10.1016/j.talanta.2006.06.023

Abstract

The wavelet packet transform (WPT) is a variant of the standard wavelet transform that offers greater flexibility in the decomposition of instrumental signals. Although encouraging results have been published concerning the use of WPT for signal compression and denoising, its application in multivariate calibration problems has received comparatively little attention, with very few contributions reported in the literature. This paper presents an investigation concerning the use of WPT as a feature extraction tool to improve the prediction ability of PLS models. The optimization of the wavelet packet tree is accomplished by using the classic dynamic programming algorithm and an entropy cost function modified to take into account the variance explained by the WPT coefficients. The selection of WPT coefficients for inclusion in the PLS model is carried out on the basis of correlation with the dependent variable, in order to exploit the joint statistics of the instrumental response and the parameter of interest. This WPT-PLS strategy is applied in a case study involving FT-IR spectrometric determination of four gasoline parameters, namely specific mass (SM) and the distillation temperatures at which 10%, 50%, 90% of the sample has evaporated. The dataset comprises 103 gasoline samples collected from gas stations and 6144 wavelengths in the range 2500–15000 nm. By applying WPT to the FT-IR spectra, considerable compression with respect to the original wavelength domain is achieved. The effect of varying the wavelet and the threshold level on the prediction ability of the resulting models is investigated. The results show that WPT-PLS outperforms standard PLS in most wavelet-threshold combinations for all determined parameters.
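The tree optimization mentioned in the abstract can be illustrated with a minimal sketch of bottom-up best-basis selection. Two assumptions are made for brevity: the Haar filter pair stands in for the wavelets studied in the paper, and the cost is the plain additive Shannon-type entropy. The paper's modified cost, which accounts for the variance explained by the coefficients, is not given in this excerpt and is not reproduced. The names `haar_split`, `entropy_cost` and `best_basis` are hypothetical.

```python
import math

def haar_split(x):
    """One filter-bank stage: low-pass and high-pass halves (Haar, assumed)."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def entropy_cost(coeffs, total_energy):
    """Additive Shannon-type cost: -sum(p * log p), with p = c^2 / total energy."""
    if total_energy == 0.0:
        return 0.0
    cost = 0.0
    for c in coeffs:
        p = c * c / total_energy
        if p > 0.0:
            cost -= p * math.log(p)
    return cost

def best_basis(x, max_level, total_energy=None, path=""):
    """Return (cost, leaves); leaves maps a path of 'L'/'H' moves to coefficients."""
    if total_energy is None:
        total_energy = sum(v * v for v in x)
    own_cost = entropy_cost(x, total_energy)
    # Stop at the depth limit or when the node can no longer be halved.
    if max_level == 0 or len(x) < 2 or len(x) % 2:
        return own_cost, {path: x}
    low, high = haar_split(x)
    low_cost, low_leaves = best_basis(low, max_level - 1, total_energy, path + "L")
    high_cost, high_leaves = best_basis(high, max_level - 1, total_energy, path + "H")
    # Keep the split only if the children are strictly cheaper than the parent.
    if low_cost + high_cost < own_cost:
        return low_cost + high_cost, {**low_leaves, **high_leaves}
    return own_cost, {path: x}

# A rapidly alternating toy signal: its energy sits in the high-pass branch.
signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
cost, leaves = best_basis(signal, 3)
print(sorted(leaves))
```

For this toy signal the selected tree is refined on the high-pass side, a split that the standard wavelet transform cannot make because it only re-splits the low-pass branch.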

Introduction

The wavelet transform (WT) is a multiresolutional analysis tool that has found several applications in analytical chemistry [1], such as signal denoising [2], database compression [3], localization of inflection points [4], and multivariate calibration [5], [6]. An extension of WT that has also been investigated in the literature [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] is the wavelet packet transform (WPT) [17], which offers more flexibility for analytical signal representation and feature extraction [18]. Similarly to WT, the applications of WPT have involved signal compression [8] and denoising [9], [10], [11]. Applications in pattern recognition and classification have also become popular [12], [13], [14], [15], [16].
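The difference between WT and WPT can be made concrete with a small sketch. It assumes the Haar filter pair purely for brevity (the paper works with other wavelet families), and the function names are hypothetical. In the standard transform only the low-pass branch is split again, whereas the full wavelet packet tree splits every band at every level.

```python
import math

def haar_split(x):
    """One filter-bank stage: low-pass (approximation) and high-pass (detail) halves."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def dwt(x, levels):
    """Standard wavelet transform: only the low-pass branch is split again."""
    bands = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_split(approx)
        bands.insert(0, detail)  # coarser details go in front
    return [approx] + bands

def wpt(x, levels):
    """Full wavelet packet tree: every band is split at every level."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
dwt_bands = dwt(signal, 2)
wpt_bands = wpt(signal, 2)
print([len(b) for b in dwt_bands])  # band sizes for the DWT
print([len(b) for b in wpt_bands])  # band sizes for the full WPT tree
```

Both decompositions keep the same number of coefficients and, because the Haar filters are orthonormal, the same total energy. The extra flexibility of WPT lies in how the frequency axis is partitioned.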

In the context of multivariate calibration, either linear or nonlinear, the use of WPT has received comparatively little attention, with very few contributions reported in the literature [19], [20]. In [19], WPT was employed at the input stage of an artificial neural network for simultaneous kinetic determination of Cu(II), Fe(III), and Ni(II). In [20], WPT was employed for noise extraction prior to PLS regression for spectrophotometric determination of Ni(II), Cd(II), Cu(II) and Zn(II) with xylenol orange and cetyltrimethyl ammonium bromide in synthetic samples. The range 492–650 nm was employed with a 2 nm resolution. The WPT coefficients kept in the regression procedure were selected on the basis of a hard thresholding method [21] based on the coefficient magnitudes.
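For reference, magnitude-based hard thresholding of the kind mentioned above can be sketched in a few lines. The function name, the threshold value and the toy coefficients are illustrative assumptions and are not taken from [20] or [21].

```python
def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]

coeffs = [5.2, -0.3, 0.1, -4.1, 0.05, 2.7]
kept = hard_threshold(coeffs, 1.0)
print(kept)
```

Note that this rule looks only at the instrumental response: a large coefficient is kept whether or not it carries information about the property being predicted.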

In the present paper, the use of WPT for feature extraction prior to PLS regression is investigated in a case study concerning the simultaneous determination of four quality parameters of gasoline by FT-IR spectrometry. The matrix involved in this application is considerably more complex than the metal solutions studied in [20]. Moreover, as compared with the UV–vis spectra in [20] (80 wavelengths), the FT-IR spectra have a much larger number of variables (6144 wavelengths), which presents a more challenging task in terms of feature extraction. Furthermore, rather than selecting the WPT coefficients on the basis of their magnitude, the selection is based on correlation with the dependent variable (gasoline parameter in each PLS model). Therefore, the joint statistics of the instrumental response and the parameter of interest are exploited, which is of relevance given the nature of the problem (regression). The effect of varying the threshold on the prediction ability of the resulting PLS model is also investigated.
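A minimal sketch of correlation-based coefficient selection follows. It assumes that each transformed coefficient is treated as a column across samples and is kept when the absolute Pearson correlation with the dependent variable reaches a threshold. The exact thresholding rule used in the paper is not given in this excerpt, so the function names, the threshold value and the toy data are all hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0  # a constant column carries no correlation information
    return cov / (su * sv)

def select_by_correlation(coeff_rows, y, threshold):
    """Return indices of coefficient columns with |r| >= threshold against y."""
    n_cols = len(coeff_rows[0])
    selected = []
    for j in range(n_cols):
        column = [row[j] for row in coeff_rows]
        if abs(pearson(column, y)) >= threshold:
            selected.append(j)
    return selected

# Toy data: 5 samples (rows) x 4 transformed coefficients (columns).
coeff_rows = [
    [1.0, 9.0, 0.2, 5.0],
    [2.0, 1.0, 0.1, 5.0],
    [3.0, 7.0, 0.3, 5.0],
    [4.0, 2.0, 0.2, 5.0],
    [5.0, 6.0, 0.1, 5.0],
]
y = [10.1, 19.8, 30.2, 39.9, 50.0]  # hypothetical reference values
selected = select_by_correlation(coeff_rows, y, 0.9)
print(selected)
```

In this toy example the second column has the largest magnitudes but is only weakly correlated with `y`, so a magnitude rule would keep it while the correlation rule discards it. The retained columns would then serve as the predictors of the PLS model.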

The gasoline parameters under consideration are specific mass (SM) and the distillation temperatures at which 10%, 50%, and 90% of the sample has evaporated (T10%, T50%, T90%). The determination of these parameters is required by the Brazilian national fuel authority (Agência Nacional de Petróleo, Gás Natural e Biocombustíveis, ANP) as part of fuel quality control. The specific mass is directly related to the total energy content of a sample with given mass or volume. Deviations from normal values may point to contamination. The distillation temperatures depend on the volatility features of the fuel and are used to verify whether light and heavy fractions are within allowed limits. Volatility is the main determinant of the tendency for a hydrocarbon mixture to generate potentially explosive vapours. It facilitates cold starts and also affects engine performance because of its influence on fuel evaporation in the admission collectors and cylinders before and after combustion. Small T10% values indicate good volatility for engine start-up, whereas small T50% values favour acceleration performance. The T90% temperature is associated with the content of fuel constituents with high boiling points. High T90% values may contribute towards better fuel economy and antiknock features, but excessively high values may lead to the formation of deposits in the combustion chamber and gum in the fuel admission system.

It is worth noting that the relation between infrared absorbance and the physical parameters is not based on a fundamental linear law (such as Beer's law, which arises in the context of spectroscopic determination of absorbing species). However, a large body of empirical evidence [22], [23], [24], [25], [26], [27], [28], [29] suggests that linear PLS regression may be appropriate for FT-IR or NIR determination of physical parameters in petroleum products, provided that the resulting model is employed within the calibration limits.

Notation

Matrices are represented by bold capital letters, vectors by bold lowercase letters, and scalars by italic characters. Elements of a sequence or vector are denoted by italic characters with a subscript index. A vertical bar (|) represents the concatenation of two vectors. The hat symbol (^) indicates a predicted value.

Filter bank implementation of the discrete wavelet transform (DWT)

As described in the wavelet literature [1], [18], the DWT of a data vector xJ can be calculated in a fast manner by using a filter bank of the form depicted in Fig. 1a. The

Data set

The data set employed in this study consists of 103 gasoline samples collected from gas stations that are representative of the Pernambuco and Alagoas states in Brazil. The gasoline contained approximately 25% v/v ethanol, in conformity with the standard defined by ANP. The samples were stored in amber glass flasks under refrigeration at 5 °C.

The reference values for specific mass and distillation temperatures (T10%, T50%, T90%) were obtained according to the methods recommended by the American

Results and discussion

Table 1 summarizes the results obtained by conventional PLS modelling.

For illustration, Fig. 4 depicts the optimal wavelet packet trees obtained for the Coiflet family. As can be seen, as the filter length is increased (from coif1 to coif5), branches on the high-pass side of the tree (right-hand side) are removed and new branches are introduced on the low-pass side. Such a finding can be explained by noting that shorter filters have worse frequency selectivity and therefore there is a

Conclusions

The results of the case study presented in this paper indicate that the use of the wavelet packet transform coupled with an appropriate thresholding technique may lead to considerable improvements in the prediction ability of PLS models. Unlike [20], such improvements were now demonstrated in a large-scale analytical problem involving a complex fuel matrix, an extensive dataset of real samples and spectra with a very large number of variables.

Further investigations could extend the scope of

Acknowledgments

This work was supported by CNPq (grants Universal 475204/2004-2, Pronex 015/98 and research fellowships) and CAPES (PROCAD 0081/05-1 and scholarships). The authors also wish to thank Dr. Fernanda Araujo Honorato and Prof. Maria Fernanda Pimentel of the Departamento de Engenharia Química of the Universidade Federal de Pernambuco, Brasil for providing the gasoline sample spectra and the reference values for the properties considered in this study.

References (38)

  • L. Pasti et al., Chemom. Intell. Lab. Syst. (1999)
  • R.K.H. Galvão et al., Chemom. Intell. Lab. Syst. (2004)
  • A.K.M. Leung et al., Chemom. Intell. Lab. Syst. (1998)
  • M. Cocchi et al., Chemom. Intell. Lab. Syst. (2001)
  • M. Cocchi et al., Chemom. Intell. Lab. Syst. (2004)
  • A. Antonelli et al., Anal. Chim. Acta (2004)
  • L. Gao et al., Spectrochim. Acta A (2005)
  • C.C. Felicio et al., Chemom. Intell. Lab. Syst. (2005)
  • F.A. Honorato et al., Chemom. Intell. Lab. Syst. (2005)
  • B.K. Alsberg et al., Anal. Chim. Acta (1998)
  • J. Trygg et al., Chemom. Intell. Lab. Syst. (1998)
  • J. Moros et al., Anal. Chim. Acta (2005)
  • A. Aulinger et al., Chemom. Intell. Lab. Syst. (2004)
  • M.C.U. Araujo et al., Chemom. Intell. Lab. Syst. (2001)
  • B. Walczak, Wavelets in Chemistry (2000)
  • A.K.M. Leung et al., Chemom. Intell. Lab. Syst. (1998)
  • V.L. Martins et al., J. Chem. Inf. Comp. Sci. (2003)
  • P.J. Brown et al., J. Am. Stat. Assoc. (2001)
  • B. Walczak et al., Chemom. Intell. Lab. Syst. (1997)
1 Present address: Divisão de Engenharia Eletrônica, Instituto Tecnológico de Aeronáutica, Brazil.

2 Present address: Universidade Federal da Paraíba, CCEN, Departamento de Química, Laboratório de Automação e Instrumentação em Química Analítica/Quimometria (LAQA), Brazil.