Original papers
Element selection and concentration analysis for classifying South America wine samples according to the country of origin

https://doi.org/10.1016/j.compag.2018.03.027

Highlights

  • A novel feature selection method is proposed for geographical origin classification of wine.

  • We use the Kruskal-Wallis test and LDA to reduce the original set of features.

  • Nearest neighbor, Naïve Bayes, LDA and SVM classification algorithms are tested.

  • We classify wine samples from four South American countries using element concentrations.

  • We achieved an average classification accuracy of 99.9% while retaining, on average, 6.73 of the 45 elements.

Abstract

This paper proposes a feature selection approach for classifying wine samples according to their place of origin. The method relies on the Kruskal-Wallis non-parametric test to remove non-significant features, and on Linear Discriminant Analysis (LDA) to derive a feature importance index. Features ranked by that index are added iteratively, and classification performance is assessed after each insertion. The number of selected features is the one that maximizes accuracy in a repeated 10-fold cross-validation. To improve categorization accuracy, several classification techniques are tested. When applied to a dataset of 53 wine samples from four South American countries (Argentina, Brazil, Chile, and Uruguay), described by the concentrations of 45 chemical elements determined by ICP-OES and ICP-MS, the proposed framework yielded an average accuracy of 99.9% in the testing set while retaining, on average, 6.73 of the 45 original elements. The retained chemical elements were then assessed qualitatively.

Introduction

The growth of international trade and of potential markets for food and beverage products has motivated producing regions to develop and apply regulations that ensure product traceability (Zhao et al., 2013). In trading markets, associating brands with their places of origin tends to boost product acceptance, leading to premium prices and commercial advantages (Diniz et al., 2014, Karoui and De Baerdemaeker, 2007). Food and beverage manufacturers have therefore shown increasing interest in precisely categorizing products according to their place of origin, as well as in improving mechanisms for confirming product authenticity (Borràs et al., 2015, Marcelo et al., 2014).

Wine attributes and quality depend strongly on grape characteristics, soil properties and climate conditions, among other variables; combined with specific cultivation, production and preservation techniques, these aspects become fundamental for product promotion and distinction (Marini et al., 2006). Wines produced in specific geographical regions under strict regulations are certified with a Controlled Denomination of Origin (CDO), which is intended to guarantee their superior quality and adherence to best practices (Gómez-Meire et al., 2014). Developing reliable, fast and straightforward techniques to verify a wine's declared origin is thus relevant to preserving the reputation of a CDO (Marini et al., 2006).

One analytical approach to tracing the origin of food and beverage products is to assess their elemental composition, i.e., the concentrations of chemical elements (Drivelos and Georgiou, 2012). Element concentrations determined by inductively coupled plasma optical emission spectrometry (ICP-OES) and/or inductively coupled plasma mass spectrometry (ICP-MS) have been widely used to determine the quality of products such as organic coffee (Barbosa et al., 2014a), eggs (Barbosa et al., 2014b), rice (Maione et al., 2016), and tea (Diniz et al., 2014, Moreda-Pineiro et al., 2003). Owing to its high sensitivity and its ability to measure isotopes, ICP-MS is regarded as one of the most appropriate techniques for determining trace elements in wine (Gonzálvez et al., 2009). It quantifies several chemical elements (e.g. Cu, Fe, Mn, Sn and Zn) that may affect wine stability in terms of color, taste and other organoleptic aspects. Their concentrations may be determined by geochemical features as well as by variations in winemaking procedures, which makes element concentrations a valuable resource for corroborating a wine's authenticity. Focusing on the chemical elements with the highest discriminant ability is therefore a crucial step toward classifying wine samples by producing country or region. Although some studies have applied statistical and data mining techniques to classify wines according to organoleptic features and geographical origin (e.g. Marini et al., 2006, Coetzee et al., 2014, Azcarate et al., 2015), few have focused on selecting the relevant features (i.e. chemical elements) that enable accurate discrimination and classification of wine samples.

This paper proposes a novel feature selection framework for categorizing wine samples according to their place of origin. The method combines filter- and wrapper-based feature selection and consists of two operational steps. In the first step, the Kruskal-Wallis (KW) non-parametric test is applied to each feature, and features with a p-value higher than a given threshold h are removed from the analysis. The aim is to discard features with no significant ability to discriminate wine samples by place of origin, which reduces computational effort and may improve classifier performance. Linear Discriminant Analysis (LDA) is then applied to the remaining features, and a feature importance index is derived from the LDA parameters; this index guides the selection carried out in the second step. In the second step, a forward procedure based on the LDA importance ranking is employed: the best-ranked features are inserted one by one into the subset used for classification, and classification performance is assessed after each insertion. The number of selected features is the one that maximizes accuracy in the repeated cross-validation. To improve categorization accuracy, several classification techniques are tested.
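The LDA ranking and forward insertion described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not give the formula of the LDA importance index, so the sketch assumes it is the absolute LDA scaling of each feature weighted by the explained variance ratio of each discriminant function and summed over functions. The function names `lda_importance` and `forward_select`, the 1-nearest-neighbor classifier, and the synthetic data are all hypothetical, and the input is assumed to have already passed the KW filter.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def lda_importance(X, y):
    """Rank features with an LDA-derived importance index (assumed formula)."""
    Xs = StandardScaler().fit_transform(X)  # elements span µg/L to mg/L, so standardize first
    lda = LinearDiscriminantAnalysis(solver="svd").fit(Xs, y)
    k = min(lda.scalings_.shape[1], lda.explained_variance_ratio_.shape[0])
    # Assumption: |weight of feature j on discriminant function d|, weighted by the
    # share of between-class variance explained by d, summed over all functions.
    index = np.abs(lda.scalings_[:, :k]) @ lda.explained_variance_ratio_[:k]
    ranking = np.argsort(index)[::-1]  # feature indices, most important first
    return index, ranking


def forward_select(X, y, ranking, clf=None, n_repeats=10, seed=0):
    """Insert ranked features one at a time; keep the subset with the best CV accuracy."""
    if clf is None:
        # 1-nearest neighbor as a stand-in for any of the tested classifiers
        clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=seed)
    accuracies = []
    for m in range(1, len(ranking) + 1):
        scores = cross_val_score(clf, X[:, ranking[:m]], y, cv=cv, scoring="accuracy")
        accuracies.append(scores.mean())
    # np.argmax returns the first maximum, i.e. the smallest subset reaching it
    best_m = int(np.argmax(accuracies)) + 1
    return ranking[:best_m], accuracies


# Usage on synthetic data (NOT the paper's measurements): class sizes 13/15/13/12,
# ten features of which only the first three separate the classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], [13, 15, 13, 12])
X = rng.normal(size=(y.size, 10))
X[:, :3] += 3.0 * y[:, None]
index, ranking = lda_importance(X, y)
selected, accuracies = forward_select(X, y, ranking, n_repeats=3)
print(ranking[:3], len(selected), round(max(accuracies), 3))
```

Because this toy example computes the ranking on all samples before cross-validating, its accuracy estimate is optimistic; computing the ranking on training data only and reporting accuracy on a held-out testing set, as the abstract describes, avoids that bias. Any other classifier (Naïve Bayes, LDA, SVM) can be passed through `clf`.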

Section snippets

Samples, materials and sample preparation

Fifty-three (53) red wine samples from four wine-producing countries in South America were purchased in local markets: 13 from Argentina, 15 from Brazil, 13 from Chile, and 12 from Uruguay. The cultivars (mostly Vitis vinifera) and geographical origin were stated on the bottle labels. The number of samples differs across countries because some cultivars were not available in local markets.
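The classes are unbalanced (12 to 15 samples per country), and the smallest class has only 12 samples. This excerpt does not state whether the repeated 10-fold cross-validation was stratified; the sketch below assumes it was, using scikit-learn's `RepeatedStratifiedKFold`, and only checks how the folds come out for these class sizes. The labels reproduce the real counts; the feature matrix is a placeholder.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

counts = {"Argentina": 13, "Brazil": 15, "Chile": 13, "Uruguay": 12}
y = np.repeat(list(counts), list(counts.values()))  # 53 country labels
X = np.zeros((y.size, 1))  # placeholder features; only the labels drive the split

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
fold_sizes, classes_per_fold = [], []
for _, test_idx in cv.split(X, y):
    fold_sizes.append(test_idx.size)
    classes_per_fold.append(len(set(y[test_idx])))  # countries present in this test fold

print(min(fold_sizes), max(fold_sizes))  # 5 6
print(min(classes_per_fold))             # 4
```

With at least 12 samples per class and 10 folds, every stratified test fold holds 5 or 6 samples and contains all four countries; an unstratified split could leave a country out of some folds.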

Nitric acid (Merck) was used for sample preparation and dilution. High-purity water

Results and discussion

Table 1 (adapted from Bentlin et al., 2011) lists the concentration ranges of the assessed elements in μg L−1 (except Ca, Na, Mg, P and K, which are given in mg L−1); means are shown in bold and standard deviations in parentheses.
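Because Table 1 reports Ca, Na, Mg, P and K in mg L−1 and all other elements in μg L−1, pooling elements in one analysis calls for a common unit (1 mg L−1 = 1000 μg L−1) or for standardization. The pandas sketch below assumes a hypothetical samples × elements DataFrame whose columns are element symbols; the helper names and the two sample rows are illustrative only, not data from the paper.

```python
import pandas as pd

# Elements that Table 1 reports in mg/L; every other element is reported in µg/L.
MG_PER_L = ["Ca", "Na", "Mg", "P", "K"]


def to_ug_per_l(df):
    """Return a copy with the mg/L columns converted to µg/L (1 mg/L = 1000 µg/L)."""
    out = df.copy()
    cols = [c for c in MG_PER_L if c in out.columns]
    out[cols] = out[cols] * 1000.0
    return out


def summarize(df):
    """Per-element range, mean and sample standard deviation, laid out like Table 1."""
    return df.agg(["min", "max", "mean", "std"]).T


# Two hypothetical samples, for illustration only
raw = pd.DataFrame({"Ca": [60.0, 80.0], "Cu": [120.0, 200.0]})
summary = summarize(to_ug_per_l(raw))
print(summary)
```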

The proposed method starts by applying the KW test to all features and selecting those with p-values smaller than a given threshold h. To better assess the influence of h on the results, we tested three
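The KW filter step can be sketched with SciPy's `kruskal`, testing each feature across the four countries and keeping those with p < h. This is an illustrative sketch, not the authors' code: the three h values the paper tested are cut off in this excerpt, so the thresholds in the loop are placeholders, and the helper names and synthetic data are hypothetical. The guard for constant features is an added assumption for elements that might take identical values in every sample.

```python
import numpy as np
from scipy.stats import kruskal


def kw_pvalue(groups):
    """Kruskal-Wallis p-value for one feature split by class."""
    try:
        p = kruskal(*groups).pvalue
    except ValueError:
        # a feature with identical values in every sample (e.g. all below the
        # detection limit) carries no rank information; treat it as not significant
        return 1.0
    return 1.0 if np.isnan(p) else float(p)


def kw_filter(X, y, h):
    """Keep the columns of X whose KW p-value is smaller than the threshold h."""
    classes = np.unique(y)
    pvalues = np.array(
        [kw_pvalue([X[y == c, j] for c in classes]) for j in range(X.shape[1])]
    )
    return np.flatnonzero(pvalues < h), pvalues


# Synthetic example (not the paper's data): features 0-2 differ by country,
# features 3-8 are noise, feature 9 is constant.
rng = np.random.default_rng(1)
y = np.repeat(["Argentina", "Brazil", "Chile", "Uruguay"], [13, 15, 13, 12])
codes = np.unique(y, return_inverse=True)[1]
X = rng.normal(size=(y.size, 10))
X[:, :3] += 2.0 * codes[:, None]
X[:, 9] = 5.0

kept = {}
for h in (0.01, 0.05, 0.10):  # placeholder thresholds, not the paper's values
    kept[h], pvalues = kw_filter(X, y, h)
    print(h, kept[h])
```

A larger h keeps every feature a smaller h keeps, plus possibly more, so the choice trades the risk of discarding useful elements against passing more noise features to the LDA ranking.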

Conclusion

The development of frameworks for verifying the declared origin of wines is fundamental to preserving the reputation of a CDO. This paper proposed a feature selection framework to classify wine samples according to their place of origin. The framework first applies the Kruskal-Wallis (KW) test to each feature to remove features with no significant ability to discriminate wine samples. Next, a feature importance index is derived from

References (32)

Cited by (26)

  • Determination of the most informative chemical elements for discrimination of rice samples according to the producing region

    2023, Food Chemistry
    Citation excerpt:

    Like other foods, the denomination of origin impacts not only rice price but also influences commercial trades. In that sense, several statistics-based techniques have been applied to confirm the authenticity, composition, and quality of food products, leading to a better comprehension of chemical elements and features that differentiate products (Kahmann, Anzanello, Marcelo, & Pozebon, 2017; Soares et al., 2018; Yamashita et al., 2019; Rodrigues et al., 2020). This study proposes an approach to identify the chemical elements that best discriminate rice samples according to their region of origin in RS, Brazil.

  • Selecting relevant wavelength intervals for PLS calibration based on absorbance interquartile ranges

    2022, Chemometrics and Intelligent Laboratory Systems
    Citation excerpt:

    It is likely that some wavelengths in such threshold subset are not relevant, and a localized approach to verify their contribution to the model becomes worthwhile. In addition, we also intend to integrate a filter phase relying on the Kruskal-Wallis non-parametric test [48] in the suggested framework tailored to remove less relevant wavelengths before conducting the wrapper phase.

  • Predictive modeling for wine authenticity using a machine learning approach

    2021, Artificial Intelligence in Agriculture
    Citation excerpt:

    When there are no linear decision boundaries, the original dataset (x) is converted into a new space f(x) which there is a linear decision boundary that separates the samples into their classes. SVM has been successfully used in many different applications, such as: food science (Araújo et al., 2019; Richter et al., 2019; Soares et al., 2018; Turra et al., 2017), medicine (Froz et al., 2017; Vogado et al., 2018), forensic science (Maione et al., 2018), among others. In previous studies, some Vitis Vinífera wines from South America were analyzed with machine learning using feature selection and support vector machines based on their antioxidant activity, phenolic substances, anthocyanins and color (Costa et al., 2018, 2019; da Costa et al., 2016).
