Abstract
Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g., alcohol levels) and sensory (e.g., human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures how the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve production.
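As an illustration of the sensitivity analysis idea described in the abstract, the following is a minimal sketch rather than the authors' procedure. It assumes that the other inputs are held at their mean while one input is swept over evenly spaced levels across its observed range, and that relative importance is measured by the variance of the model's responses. The function name `sensitivity`, the `levels` parameter, and the toy model are hypothetical choices made for this example.

```python
from statistics import mean, pvariance


def sensitivity(predict, rows, levels=6):
    """Return a relative importance score for each input of a fitted model.

    predict: callable mapping a list of input values to one number.
    rows:    list of input vectors (lists of floats), all the same length.
    levels:  number of evenly spaced values tried per input (assumed value).
    """
    n_inputs = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_inputs)]
    # Assumption: inputs that are not being varied are held at their mean.
    baseline = [mean(col) for col in columns]

    variances = []
    for j, col in enumerate(columns):
        lo, hi = min(col), max(col)
        responses = []
        for k in range(levels):
            point = list(baseline)
            # Sweep input j from its minimum to its maximum.
            point[j] = lo + (hi - lo) * k / (levels - 1)
            responses.append(predict(point))
        # A larger response variance means the model is more sensitive to input j.
        variances.append(pvariance(responses))

    total = sum(variances)
    return [v / total if total > 0 else 0.0 for v in variances]


# Toy stand-in for a fitted regression model (hypothetical, not the paper's).
def toy_model(x):
    return 3.0 * x[0] + 0.0 * x[1] + 1.0 * x[2]


toy_rows = [[0.0, 5.0, 1.0], [1.0, 7.0, 2.0], [2.0, 9.0, 3.0]]
importance = sensitivity(toy_model, toy_rows)
```

In this toy case the second input has no effect on the output, so its score is zero, and the first input dominates because its coefficient is the largest. Scores of this kind could then guide which variables to discard during selection, which is the role the abstract assigns to the sensitivity analysis.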
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cortez, P., Teixeira, J., Cerdeira, A., Almeida, F., Matos, T., Reis, J. (2009). Using Data Mining for Wine Quality Assessment. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_8
DOI: https://doi.org/10.1007/978-3-642-04747-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04746-6
Online ISBN: 978-3-642-04747-3
eBook Packages: Computer Science (R0)