SMOTE for Regression

Conference paper, published in Progress in Artificial Intelligence (EPIA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8154)

Abstract

Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a class imbalance problem that has been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks through sampling approaches, which change the distribution of the given training data set to decrease the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known Smote algorithm that allows its use on these regression tasks. In an extensive set of experiments we provide empirical evidence for the superiority of our proposals on these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.
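The core idea sketched in the abstract — generating synthetic training cases by interpolating both the features and the continuous target between a rare case and one of its rare neighbours — can be illustrated roughly as follows. This is a minimal sketch, not the authors' reference implementation: the function name `smoter_oversample`, the externally supplied `rare_idx` index list (which the paper derives from a relevance function over the target), and the inverse-distance weighting of the synthetic target are assumptions; the full method also combines this over-sampling with under-sampling of the frequent cases, which is omitted here.

```python
import math
import random

def smoter_oversample(X, y, rare_idx, k=3, n_synth=10, seed=0):
    """Sketch of SmoteR-style over-sampling: create n_synth synthetic
    (x, y) pairs by interpolating between a rare case and one of its
    k nearest rare neighbours. Illustrative only, not the paper's code."""
    rng = random.Random(seed)
    synth_X, synth_y = [], []
    for _ in range(n_synth):
        i = rng.choice(rare_idx)
        # distances from seed case i to the other rare cases
        dists = sorted((math.dist(X[i], X[j]), j) for j in rare_idx if j != i)
        _, j = rng.choice(dists[:k])          # pick one of the k nearest
        frac = rng.random()                   # interpolation fraction in [0, 1)
        new_x = [a + frac * (b - a) for a, b in zip(X[i], X[j])]
        # target value: average of the two seeds, weighted inversely by
        # the distance of the new case to each seed (assumption)
        d_i, d_j = math.dist(new_x, X[i]), math.dist(new_x, X[j])
        if d_i + d_j == 0:
            new_y = (y[i] + y[j]) / 2
        else:
            new_y = (d_j * y[i] + d_i * y[j]) / (d_i + d_j)
        synth_X.append(new_x)
        synth_y.append(new_y)
    return synth_X, synth_y
```

Because both feature and target values are interpolated, every synthetic case stays inside the convex hull of its two seed cases, which is what makes the scheme usable with any downstream regression algorithm.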




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Torgo, L., Ribeiro, R.P., Pfahringer, B., Branco, P. (2013). SMOTE for Regression. In: Correia, L., Reis, L.P., Cascalho, J. (eds) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_33

  • DOI: https://doi.org/10.1007/978-3-642-40669-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40668-3

  • Online ISBN: 978-3-642-40669-0

  • eBook Packages: Computer Science (R0)
