Abstract
Cachaça, like other distilled beverages, often undergoes an aging process; a peculiarity is that, in addition to oak, Brazilian native woods are used to make the aging barrels. Cachaças can be aged in barrels of amendoim (Pterogyne nitens), balsam (Myroxylon peruiferum), jequitibá (Cariniana estrellensis), umburana (Amburana cearensis), and oak (Quercus sp.). Aging in wooden barrels substantially increases the final consumer price of cachaça, making it susceptible to fraud. Methods to check the authenticity of aged cachaça are therefore increasingly required. In this paper, 109 samples of commercial aged cachaças were used for the construction of supervised classification models, PLS-DA (partial least squares for discriminant analysis) and NPLS-DA (N-way PLS-DA), using UV-Vis spectrophotometry and spectrofluorimetry. The best results were obtained for the model based on low-level data fusion, providing good reliability rates for the training and test sets (ranging from 56 to 100 %) for all the analyzed classes.
Introduction
Cachaça, like other spirit drinks, often undergoes aging in wooden barrels. This step, in addition to improving its taste and consumer acceptability, increases its commercial value; for example, a famous cachaça brand from Salinas (Brazil) that is aged in balsam wood for 8 years can cost up to twenty times more than common ones.
According to Brazilian legislation (MAPA 2005), truly aged cachaça is the beverage stored in wooden barrels for 1 year or more, and it has different classifications depending on the aging time and the proportion of aged distillate in the final product. Stating the specific class of aged cachaça on the label is allowed only for products whose aging process has been inspected and certified by MAPA (Ministry of Agriculture, Livestock and Food Supply). Due to the difficulties inherent to this certification process, many small producers age their cachaça and label it as “stored” rather than aged.
The wood most commonly used for producing barrels for aging beverages is oak (Quercus sp.). However, as oak is not a typical tree of tropical climates, oak barrels come at a very high price for Brazilian producers. An alternative is therefore the use of native woods to produce barrels. Native woods such as umburana (Amburana cearensis), jequitibá (Cariniana estrellensis), balsam (Myroxylon peruiferum), and amendoim (Pterogyne nitens) are the most used for aging cachaças produced in Minas Gerais and other states of Brazil.
With the increasing production of quality cachaças and the growing demand from foreign and local markets, fast and reliable methods are needed to discriminate cachaças by the type of wood used in the aging process. Such methods prevent a low-quality product from being sold as a higher-quality one, contributing to the consolidation of a good image for this iconic Brazilian distillate. In order to classify cachaças according to the wood used for their aging, some authors have published studies using chemometric methods of pattern recognition (PCA, principal component analysis) and discrimination (LDA, linear discriminant analysis), mainly with UV-Vis spectroscopy (Faria et al. 2003; Silva et al. 2012), but also with ESI-MS (electrospray ionization mass spectrometry) (Souza et al. 2007) and HPLC (high-performance liquid chromatography) (Silva et al. 2009). These papers reported the use of different analytical techniques to discriminate or classify cachaças according to the wood used in their aging, but no study has compared techniques or combined them to obtain more robust models. Thus, the aim of this work was to explore the capabilities and performance of UV-Vis spectrophotometry and spectrofluorimetry coupled to PLS-DA (partial least squares for discriminant analysis) and NPLS-DA (N-way PLS-DA) for discriminating cachaças according to the wood used in their aging. In addition to the models built individually with the data obtained from these two techniques, two data fusion strategies, at low and mid levels, were also considered in order to improve the results. Another important aspect of this study is the use of a significant number, more than a hundred, of commercial cachaça samples for constructing the classification models.
Materials and Methods
Samples Description
One hundred and nine aged cachaça samples (originating from six different states of Brazil) were purchased in local markets and analyzed. These samples were labeled as aged in different woods: amendoim (19), balsam (23), oak (21), jequitibá (23), and umburana (23). In addition, 15 cachaça samples were aged in these five woods (three samples in each one) in our laboratory and analyzed. These samples were kept in 1.5-L casks for 6 months.
Spectra Acquisition
The samples were diluted 30 times with ethanol/water (40:60 %, v/v) directly in the cuvette, and the UV-Vis spectra were recorded from 190 to 500 nm (2 nm steps) on a Hewlett-Packard model 8451 A diode array spectrophotometer, using a 10-mm quartz cuvette. Fluorescence spectra were obtained on a Varian Cary Eclipse spectrofluorimeter, using a 10-mm quartz cuvette, without any sample pre-treatment. All the excitation-emission matrices (EEM) were obtained in the excitation range from 240 to 500 nm (10 nm steps) and in the emission range from 260 to 600 nm (2 nm steps). The excitation and emission monochromator slit widths were both 5.0 nm and the scanning rate was 9600 nm min−1.
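The scan ranges and step sizes above determine how many variables each technique contributes per sample. The following is a minimal sketch of that arithmetic, assuming both endpoints of each range are included; the function name `n_points` is hypothetical and used only for illustration.

```python
def n_points(start_nm: int, stop_nm: int, step_nm: int) -> int:
    """Number of wavelengths on an evenly spaced grid with inclusive endpoints."""
    return (stop_nm - start_nm) // step_nm + 1


n_uv = n_points(190, 500, 2)   # UV-Vis absorbance axis
n_ex = n_points(240, 500, 10)  # EEM excitation axis
n_em = n_points(260, 600, 2)   # EEM emission axis
print(n_uv, n_ex, n_em)        # 156 27 171
```

Under this assumption the counts agree with the matrix dimensions reported in the Results section (156 UV-Vis wavelengths; 171 emission and 27 excitation wavelengths).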
Data Treatment and Analysis
Data were handled using MATLAB, version 7.9 (The MathWorks, Natick, MA, USA). The PLS-DA and NPLS routines came from the PLS Toolbox, version 6.5 (Eigenvector Technologies, Manson, WA, USA). UV-Vis spectra were preprocessed by mean centering. The EEM were preprocessed to remove Rayleigh and Raman scattering (Bahram et al. 2006), and the models were obtained after mean-centering the first mode of the EEM. The fused matrices were preprocessed by autoscaling. All the pre-treatments used are listed in Table 1.
UV-Vis spectra were split into two data sets, 82 samples for the training set and 42 samples for the test set, using the Kennard-Stone algorithm. This is a uniform mapping algorithm that selects the most representative samples for the training set based on their Euclidean distance (Kennard and Stone 1969). To ensure the representativeness of each class, the algorithm was applied to each class separately, and the same assignment of samples to the training and test sets was kept for all the models, so that the results are fully comparable.
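The selection rule can be illustrated with a short sketch. This is a generic textbook form of the Kennard-Stone procedure written in plain Python, not the routine actually used in the study; the function name and the toy data are hypothetical. It starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbor is farthest away.

```python
import math


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected sample is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy "spectra" for a single class (hypothetical values, two variables each).
toy_class = [[0.0, 0.0], [1.0, 0.0], [0.1, 0.1], [0.5, 0.9], [1.0, 1.0]]
train_idx = kennard_stone(toy_class, 3)
test_idx = [k for k in range(len(toy_class)) if k not in train_idx]
print(train_idx, test_idx)  # [0, 4, 1] [2, 3]
```

Applying the function to each class separately, as described above, keeps every class represented in both sets.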
PLS-DA is a variant of PLS used for supervised classification, in which a model is constructed correlating the spectral data to class membership vectors. One vector of dummy variables is built for each class, with a value of 1 if a sample belongs to that class and 0 otherwise. The spectral data and the dummy variable vectors are simultaneously decomposed into latent variables (LV), searching for the maximum discrimination among classes. The number of LV is chosen based on the lowest value of the cross validation classification error (CVCE), estimated using venetian blinds cross validation (Wise et al. 2006). A complete description of the method is given by Barker and Rayens (2003).
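Two ingredients of this description are easy to show in code: the dummy class-membership matrix and the venetian blinds partition of the samples. The sketch below is illustrative only; the function names, the toy labels, and the number of splits are assumptions, since the source does not state how many splits were used.

```python
def dummy_matrix(labels, classes):
    """One column per class: 1 if the sample belongs to the class, else 0."""
    return [[1 if lab == c else 0 for c in classes] for lab in labels]


def venetian_blinds(n_samples, n_splits):
    """Yield (train, test) index lists; split s tests samples s, s+n_splits, ..."""
    for s in range(n_splits):
        test = list(range(s, n_samples, n_splits))
        train = [i for i in range(n_samples) if i % n_splits != s]
        yield train, test


classes = ["amendoim", "balsam", "jequitibá", "oak", "umburana"]
labels = ["oak", "balsam", "oak", "umburana"]  # hypothetical toy labels
Y = dummy_matrix(labels, classes)
splits = list(venetian_blinds(6, 3))           # hypothetical: 6 samples, 3 splits
print(Y[0])       # [0, 0, 0, 1, 0]
print(splits[0])  # ([1, 2, 4, 5], [0, 3])
```

In a full workflow the classification error would be averaged over the splits for each candidate number of LV, and the number giving the lowest CVCE retained.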
NPLS is a method for building multi-way calibration models and is an extension of the classical two-way PLS for cases in which the matrix of independent variables is a tensor of third or higher order. In applications treating EEM, the independent variables are in a three-way data array formed by the fluorescence intensity measurements of various samples and the dependent variable is in a concentration vector. NPLS decomposes a multidimensional data array X into a set of triads. Each triad is equivalent to a latent variable (a component) in two-way PLS and consists of a score vector, t, related to the first way, and two weight vectors, w^J and w^K, related to the other two ways (emission and excitation wavelengths). A complete description of this method is given by Bro (1996). When used for discriminant analysis (NPLS-DA), this method allows the supervised classification of samples, separating classes with a high degree of similarity (Hall et al. 2005).
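To make the idea of a triad concrete, the sketch below builds a three-way array as a sum of two rank-one terms, each the outer product of a score vector and two weight vectors. It shows only the trilinear structure of the approximation of X, not the NPLS fitting algorithm; all names and numbers are hypothetical.

```python
def triad(t, w_j, w_k):
    """Rank-one three-way array: x[i][j][k] = t[i] * w_j[j] * w_k[k]."""
    return [[[ti * wj * wk for wk in w_k] for wj in w_j] for ti in t]


def add_arrays(a, b):
    """Element-wise sum of two three-way arrays of equal shape."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ma, mb)]
            for ma, mb in zip(a, b)]


# Hypothetical toy values: 2 samples, 3 emission and 2 excitation wavelengths.
t1, wj1, wk1 = [1.0, 2.0], [0.5, 1.0, 0.0], [1.0, 3.0]
t2, wj2, wk2 = [0.0, 1.0], [1.0, 0.0, 2.0], [2.0, 1.0]

# Two components (triads) summed give the modeled part of the array.
X_hat = add_arrays(triad(t1, wj1, wk1), triad(t2, wj2, wk2))
print(X_hat[1][0][1])  # 2*0.5*3 + 1*1*1 = 4.0
```

Each sample is thus summarized by one score per component, which is what allows the scores to be reused as features elsewhere.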
With the wide availability of analytical techniques and the increasing capacity for data processing, it has become easy to obtain large amounts of data, which has allowed the emergence of data fusion models. These models process data provided by different analytical instruments or sensors, which requires different chemometric methods. For each sample, all spectral variables from different instrument types and sources are concatenated into a single vector, known as a meta-spectrum (Godinho et al. 2014). Combining data from different nonspecific techniques provides complementary interpretations and facilitates full product description (Vera et al. 2011).
There are three strategies for data fusion: low-level, mid-level, and high-level. Low-level data fusion directly combines the original signals after preprocessing and then builds the model with the concatenated matrix. Mid-level fusion involves extracting the characteristic features of the data sets obtained for each technique before data fusion and model construction. Finally, in the high-level strategy, a multivariate model is built separately for each technique and the individual responses are combined to produce the final result. Biancolillo et al. (2014) have presented a discussion on this subject, highlighting the benefits and drawbacks of each data fusion level.
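The difference between the low and mid levels lies only in what is concatenated: signals in the first case, extracted features in the second. The sketch below illustrates this with toy numbers, followed by autoscaling of the fused matrix as stated in the data treatment description. The function names and all values are hypothetical, and any preprocessing specific to each block is omitted.

```python
import statistics


def autoscale(X):
    """Mean-center each column and scale it to unit standard deviation."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # assumes no constant column
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def fuse(block_a, block_b):
    """Concatenate two blocks sample by sample, then autoscale the result."""
    fused = [list(a) + list(b) for a, b in zip(block_a, block_b)]
    return autoscale(fused)


# Hypothetical toy data for 3 samples.
uv_spectra = [[0.10, 0.40], [0.20, 0.50], [0.30, 0.90]]              # signals
fl_spectra = [[12.0, 7.0, 3.0], [15.0, 9.0, 2.0], [11.0, 4.0, 6.0]]  # signals
uv_scores = [[-1.2], [0.1], [1.1]]                                   # features
fl_scores = [[0.4, -0.3], [-0.9, 0.2], [0.5, 0.1]]                   # features

low_level = fuse(uv_spectra, fl_spectra)  # signals side by side
mid_level = fuse(uv_scores, fl_scores)    # extracted features side by side
print(len(low_level[0]), len(mid_level[0]))  # 5 3
```

Autoscaling matters here because the two blocks are on very different intensity scales; without it the block with the larger numbers would dominate the model.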
Data fusion has been applied especially to the analysis of complex food matrices, such as olive oils (Casale et al. 2010; Casale et al. 2012; Pizarro et al. 2013), meat (Alamprese et al. 2013), spices (Di Anibal et al. 2011), and beers (Vera et al. 2011; Biancolillo et al. 2014).
Results and Discussion
UV-Vis PLS-DA Model
The typical UV-Vis spectra for the analyzed aged cachaças are shown in Fig. 1. The majority of the samples studied present two characteristic bands, with maxima at 200 nm and between 250 and 300 nm. These wavelengths are associated with absorption by benzene rings substituted with groups such as hydroxyl, commonly found in phenolic compounds present in aged cachaças. The PLS2-DA algorithm was employed because it provided a classification performance similar to PLS1-DA. Unlike PLS1-DA, which builds a specific model for each class separately, PLS2-DA allows all classes to be predicted simultaneously in a single model. The training set was built with 82 samples (15 samples of amendoim; 17 of balsam; 16 of oak; 17 of jequitibá; and 17 of umburana), while 42 samples (seven samples of amendoim; nine of balsam; eight of oak; nine of jequitibá; and nine of umburana) were used in the test set. The analyzed matrix has dimensions of 124 × 156 (samples × wavelengths). The best model was constructed with six LV, which accounted for 100 and 37 % of the X and Y variance, respectively (Table 1). The variable importance in projection (VIP) scores of the PLS-DA model for the classification of each type of aged cachaça are shown in Fig. 2. From this figure, it was possible to identify the wavelengths that contributed most to the classification. Assigning these specific wavelengths to the markers found in aged cachaças is difficult, considering that UV-Vis spectrophotometry is not a technique commonly used for fingerprinting. As noted above, these wavelengths are associated with absorption by hydroxyl-substituted benzene rings, commonly found in phenolic compounds present in aged cachaças, such as phenolic acids (vanillic, gallic, syringic, and ellagic), phenolic aldehydes (vanillin, syringaldehyde, coniferaldehyde, and sinapaldehyde), and coumarin (Aquino et al. 2006).
PLS-DA estimates a “prediction probability” and a classification threshold for each modeled class. These thresholds are calculated using a Bayesian method. The probability was calculated employing the plsdthres function present in the PLS Toolbox software (Wise et al. 2006). Basically, this function takes the predicted y values from the PLS-DA model, fits a normal distribution to the values of each group (samples belonging to the class, A, and samples not belonging to it, B), and then uses these distributions to calculate the probability of observing a given y value. The two distributions typically “cross” in only one place, which leads to a single point where both P(B|y) and P(A|y) are 0.5. This point is selected as the threshold for the PLS-DA. For a complete description of this method see Pérez et al. (2009).
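The threshold idea can be sketched with the Python standard library. This is not the plsdthres routine itself; it is an illustration under stated assumptions: one normal density is fitted to the predicted y values of each group, the prior probabilities are taken as equal, and the crossing point is assumed to lie between the two group means. The function names and the y values are hypothetical.

```python
from statistics import NormalDist


def posterior_in_class(y, in_class, out_class, prior_in=0.5):
    """P(A|y) from Bayes' rule with one normal density per group."""
    num = prior_in * in_class.pdf(y)
    den = num + (1.0 - prior_in) * out_class.pdf(y)
    return num / den


def threshold(y_in, y_out, prior_in=0.5, tol=1e-9):
    """Bisect between the group means for the y where P(A|y) = 0.5."""
    in_class = NormalDist.from_samples(y_in)
    out_class = NormalDist.from_samples(y_out)
    lo, hi = out_class.mean, in_class.mean  # assumes out-of-class mean is lower
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_in_class(mid, in_class, out_class, prior_in) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical predicted y values for one class model.
y_in = [0.7, 0.9, 1.0, 1.1, 1.3]      # samples that belong to the class
y_out = [-0.3, -0.1, 0.0, 0.1, 0.3]   # all other samples
thr = threshold(y_in, y_out)
print(round(thr, 3))  # 0.5
```

With equal spreads and equal priors the threshold falls midway between the group means; unequal spreads or priors shift it toward one group, which is why a fixed cut-off of 0.5 is not always appropriate.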
Once the model was built, three important performance parameters for qualitative methodologies were estimated: sensitivity, selectivity, and reliability rates. Sensitivity is the method’s ability to detect truly positive samples as positive, while the sensitivity rate (SNR) is the probability that the method classifies known positive samples as positive. Similarly, selectivity is the ability to detect truly negative samples as negative, and the selectivity rate (SLR) is the probability that the method classifies known negative samples as negative (Trullols et al. 2004; Gondim et al. 2014). The reliability rate (RLR) indicates how reliable a method is and is defined as the difference between the total of results (100 %) and the sum of the false positive and false negative rates (Gondim et al. 2011; Botelho et al. 2015).
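These definitions translate directly into a few lines of arithmetic on the confusion counts of one class. The sketch below assumes that the false negative rate is expressed relative to the known positive samples and the false positive rate relative to the known negative samples, so that they are the complements of SNR and SLR; the function name and the counts are hypothetical.

```python
def rates(tp, fn, tn, fp):
    """Return (SNR, SLR, RLR) in percent for one class."""
    snr = 100.0 * tp / (tp + fn)   # known positives classified as positive
    slr = 100.0 * tn / (tn + fp)   # known negatives classified as negative
    fnr = 100.0 - snr              # false negative rate (assumed definition)
    fpr = 100.0 - slr              # false positive rate (assumed definition)
    rlr = 100.0 - (fpr + fnr)
    return snr, slr, rlr


# Hypothetical counts for one class in a 42-sample test set:
# 9 samples of the class (8 detected) and 33 of other classes (30 rejected).
snr, slr, rlr = rates(tp=8, fn=1, tn=30, fp=3)
print(round(snr, 1), round(slr, 1), round(rlr, 1))  # 88.9 90.9 79.8
```

Under these assumptions RLR equals SNR + SLR − 100, so a class needs both high sensitivity and high selectivity to reach a high reliability rate.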
These three parameters were calculated for the training and test sets, and the results are listed in Tables 2 and 3. The results in Table 3 show that the RLR ranged from 46 to 87 % for the training sets and from 56 to 89 % for the test sets. The UV-Vis PLS-DA model provided better results for cachaças aged in amendoim and jequitibá barrels. Silva et al. (2012) obtained accuracies between 80 and 100 % in the classification of aged cachaças certified by the manufacturer using UV-Vis spectra and LDA.
In search of better results, a new classification model was built using the EEM.
EEM NPLS-DA Model
Typical contour maps for the analyzed aged cachaças are shown in Fig. 3. The fluorophores present in aged cachaças are derived primarily from the oxidation of lignin in the presence of alcohol, which leads to phenolic aldehydes and phenolic acids (Aquino et al. 2006). Standard solutions (10.00 mg L−1) of various phenolic compounds and coumarin were prepared in ethanol/water (40:60 %), and their EEM were recorded under the same conditions previously described. It was found that phenolic acids (gallic, syringic, vanillic, ellagic) and coumarin were characterized by maximum excitation at 250 nm and emission at 300 nm, while phenolic aldehydes (sinapaldehyde, coniferaldehyde, syringaldehyde, vanillin) showed maximum excitation at 350 nm and emission between 400 and 450 nm.
The analyzed array has dimensions of 124 × 171 × 27 (samples × emission wavelengths × excitation wavelengths). An NPLS2-DA model was obtained, i.e., all classes were predicted through a single model. The best model was constructed with ten factors that accounted for 99 and 56 % of the X and Y variance, respectively (Table 1).
The PLS Toolbox software does not calculate VIP scores for NPLS models, but Favilla et al. (2013) have developed a routine for this purpose. This routine was used, and the VIP scores for the developed model are shown in Fig. 4. From this figure, it was possible to identify the wavelengths that contributed most to the classification: 250, 280, 330, 360, and 450 nm (excitation wavelengths), and 352 and 446 nm (emission wavelengths). As described earlier in this section, most of these excitation and emission wavelengths are characteristic of the phenolic aldehydes and acids that characterize aged cachaça. The results in Table 3 show that the RLR ranged from 48 to 91 % for the training sets and from 37 to 91 % for the test sets.
The threshold for each class in this model was also calculated through Bayesian statistics, employing the plsdthres function. The EEM NPLS-DA model provided better results for cachaças aged in oak and balsam barrels.
Fused PLS-DA Models
Based on the results of the two previous models, UV-Vis spectrophotometry provided better results for two classes of aged cachaça, amendoim and jequitibá, while spectrofluorimetry provided better results for two others, balsam and oak. For cachaça aged in umburana, both models provided good results. Thus, an attempt was made to merge the two types of data to build a model able to correctly classify the largest number of samples with good sensitivity and selectivity rates: a fused PLS-DA model.
First, a mid-level data fusion strategy was tried, using the scores of the two previously generated models. The analyzed matrix has dimensions of 124 × 16 (samples × variables), which matches the six LV of the UV-Vis model plus the ten factors of the EEM model. The best model was constructed with six LV that accounted for 76 and 48 % of the X and Y variance, respectively (Table 1). The results in Table 3 show that the RLR ranged from 55 to 93 % for the training sets and from 43 to 100 % for the test sets.
Since these results did not provide 100 % correct classification for all the test samples, the low-level data fusion strategy was also investigated. The row-wise augmented data matrix was formed by fusing the UV-Vis spectra with the fluorescence emission spectra at five excitation wavelengths (250, 280, 330, 360, and 450 nm), totaling 1011 variables. The choice of the excitation wavelengths was based on the analysis of the NPLS-DA VIP scores. In Fig. 5, the two spectral data sets are plotted before the fusion. For better visualization of the spectral profiles, the emission bands were normalized to the same intensity scale as the UV-Vis spectra.
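The total of 1011 variables can be checked from the acquisition grids. The sketch below assumes that each selected excitation contributes its full emission spectrum and that the grids include both endpoints; the variable names are hypothetical.

```python
n_uv = 156                                   # UV-Vis wavelengths, 190-500 nm in 2 nm steps
n_em = 171                                   # emission wavelengths, 260-600 nm in 2 nm steps
excitation_axis = list(range(240, 501, 10))  # 240-500 nm in 10 nm steps
selected_ex = [250, 280, 330, 360, 450]      # chosen from the NPLS-DA VIP scores

# Positions of the selected excitations along the excitation mode of the EEM.
ex_index = [excitation_axis.index(nm) for nm in selected_ex]

# One UV-Vis spectrum plus one emission spectrum per selected excitation.
n_fused = n_uv + len(selected_ex) * n_em
print(ex_index, n_fused)  # [1, 4, 9, 12, 21] 1011
```

Selecting five excitation slices instead of all 27 keeps the fused matrix small enough that the fluorescence block does not overwhelm the UV-Vis block by sheer number of variables.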
The best model was constructed with eight LV that accounted for 82 and 61 % of the X and Y variance, respectively (Table 1). The results in Table 3 show that the best models were obtained with the low-level fused data: the RLR ranged from 56 to 100 % for the training sets and from 60 to 94 % for the test sets. For all the classes except oak, this model showed the best RLR for both the training and test sets. Among the classes correctly classified, cachaças aged in amendoim gave the worst RLR. This may indicate that producers make more blends with this type of wood without stating so on the labels.
To check the developed models, the classification of the samples aged in our laboratory was evaluated. The model with the highest success rate was the low-level data fusion model, with 100 % correct classification.
Conclusion
In conclusion, UV-Vis spectra and molecular fluorescence in combination with supervised classification chemometric methods provided good models for the prediction of the wood used in the aging of commercial cachaças.
In particular, low-level data fusion, which proved to be the best fusion protocol, gave the highest RLR for four out of the five analyzed classes. This model can be used for routine analysis, facilitating the quality control of this drink and its supervision by inspection agencies, and helping to combat fraud.
References
Alamprese C, Casale M, Sinelli N, Lanteri S, Casiraghi E (2013) Detection of minced beef adulteration with turkey meat by UV–vis, NIR and MIR spectroscopy. LWT-Food Sci Technol 53:225–232
Aquino FWB, Rodrigues S, Nascimento RF, Casimiro ARS (2006) Simultaneous determination of aging markers in sugar cane spirits. Food Chem 98:569–574
Bahram M, Bro R, Stedmon C, Afkhami A (2006) Handling of Rayleigh and Raman scatter for PARAFAC modeling of fluorescence data using interpolation. J Chemom 20:99–105
Barker M, Rayens W (2003) Partial least squares for discrimination. J Chemom 17:166–173
Biancolillo A, Bucci R, Magrì AL, Magrì AD, Marini F (2014) Data-fusion for multiplatform characterization of an Italian craft beer aimed at its authentication. Anal Chim Acta 820:23–31
Botelho BG, Reis N, Oliveira LS, Sena MM (2015) Development and analytical validation of a screening method for simultaneous detection of five adulterants in raw milk using mid-infrared spectroscopy and PLS-DA. Food Chem 181:31–37
Bro R (1996) Multiway calibration. Multilinear PLS. J Chemom 10:47–61
Casale M, Casolino C, Oliveri P, Forina M (2010) The potential of coupling information using three analytical techniques for identifying the geographical origin of Liguria extra virgin olive oil. Food Chem 118:163–170
Casale M, Oliveri P, Casolino C, Sinelli N, Zunin P, Armanino C, Forina M, Lanteri S (2012) Characterisation of PDO olive oil Chianti Classico by non-selective (UV–visible, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques. Anal Chim Acta 712:56–63
Di Anibal CV, Callao MP, Ruisánchez I (2011) 1H NMR and UV-visible data fusion for determining Sudan dyes in culinary spices. Talanta 84:829–833
Faria JB, Cardello H, Boscolo M, Isique WD, Odello L, Franco DW (2003) Evaluation of Brazilian woods as an alternative to oak for cachacas aging. Eur Food Res Technol 218:83–87
Favilla S, Durante C, Li Vigni M, Cocchi M (2013) Assessing feature relevance in NPLS models by VIP. Chemom Intell Lab 129:76–86
Godinho MS, Blanco MR, Gambarra Neto FF, Lião LM, Sena MM, Tauler R, Oliveira AE (2014) Evaluation of transformer insulating oil quality using NIR, fluorescence, and NMR spectroscopic data fusion. Talanta 129:143–149
Gondim CS, Junqueira RG, Souza SVC (2011) Trends in implementing the validation of qualitative methods of analysis. Rev Inst Adolfo Lutz 70:433–447
Gondim CS, Coelho OAM, Alvarenga RL, Junqueira RG, Souza SVC (2014) An appropriate and systematized procedure for validating qualitative methods: its application in the detection of sulfonamide residues in raw milk. Anal Chim Acta 830:11–22
Hall GJ, Clow KE, Kenny JE (2005) Estuarial fingerprinting through multidimensional fluorescence and multivariate analysis. Environ Sci Technol 39:7560–7567
Kennard RW, Stone LA (1969) Computer aided design of experiments. Technometrics 11:137–148
MAPA (2005) Secretaria de Defesa Agropecuária. Ministério da Agricultura, Pecuária e Abastecimento. Instrução Normativa No 13, Brazil
Pérez NF, Ferré J, Boqué R (2009) Calculation of the reliability of classification in discriminant partial least-squares binary classification. Chemom Intell Lab 95:122–128
Pizarro C, Rodríguez-Tecedor S, Pérez-del-Notario N, Esteban-Díez I, González-Sáiz J (2013) Classification of Spanish extra virgin olive oils by data fusion of visible spectroscopic fingerprints and chemical descriptors. Food Chem 138:915–922
Silva AA, Nascimento ESP, Cardoso DR, Franco DW (2009) Coumarins and phenolic fingerprints of oak and Brazilian woods extracted by sugarcane spirit. J Sep Sci 32:3681–3691
Silva AA, Keukeleire D, Cardoso DR, Franco DW (2012) Multivariate analyses of UV-Vis absorption spectral data from cachaca wood extracts: a model to classify aged Brazilian cachacas according to the wood species used. Anal Methods 4:642–646
Souza PP, Siebald HGL, Augusti DV, Neto WB, Amorim VM, Catharino RR, Eberlin MN, Augusti R (2007) Electrospray ionization mass spectrometry fingerprinting of Brazilian artisan cachaca aged in different wood casks. J Agric Food Chem 55:2094–2102
Trullols E, Ruisanchez I, Rius FX (2004) Validation of qualitative analytical methods. Trac-Trend Anal Chem 23:137–145
Vera L, Aceña L, Guasch J, Boqué R, Mestres M, Busto O (2011) Discrimination and sensory description of beers through data fusion. Talanta 87:136–142
Wise BM, Gallagher NB, Bro R, Shaver JM, Windig W, Koch RS (2006) Chemometrics tutorial for PLS_Toolbox and Solo. Eigenvector Research Inc, Wenatchee
Acknowledgments
The authors thank CAPES and CNPq for fellowships.
Conflict of Interest
Cristina D. Bernardes declares that she has no conflict of interest. Dr. Paulo J. S. Barbeira declares that he has no conflict of interest. This article does not contain any studies with human or animal subjects.
Cite this article
Bernardes, C.D., Barbeira, P.J.S. Different Chemometric Methods for the Discrimination of Commercial Aged Cachaças. Food Anal. Methods 9, 1053–1059 (2016). https://doi.org/10.1007/s12161-015-0284-6
DOI: https://doi.org/10.1007/s12161-015-0284-6