Abstract
Data mining techniques are currently a powerful tool for seasonal-scale forecasting. In this work, neural networks, support vector regression, and generalized additive models are considered alongside the more commonly used multiple linear regression to obtain precipitation forecasting models for the “Gran Chaco Argentino” region. The results indicate that data mining techniques improve on forecasts derived from other methodologies, although the performance of each methodology depends strongly on the month and the region. The non-linear techniques improve the forecasts and show a lower mean square error than multiple linear regression and support vector regression. The root mean square error is higher in the east of the study area than in the west, because precipitation is higher there. The coefficient of variation is quite low in all months in the central and southwestern parts of the area. The precipitation interval with the highest probability of occurrence showed a value of 1.5. Generating ensemble means of several models and deriving categorical forecasts from them is a highly advisable option for prediction in this region of Argentina, and the use of ensemble means is recommended. The derived forecasts improve on the dynamical models of the world forecasting centers only in some parts of the study area.
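The abstract compares models by (root) mean square error, reports a coefficient of variation, and recommends ensemble means and categorical forecasts. The following is a minimal sketch of those computations on made-up numbers, not the authors' procedure. The assumptions are: three equally weighted hypothetical models named after the methods, a population-form coefficient of variation, and tercile ("below", "normal", "above") categories taken from a hypothetical climatology. All function names are illustrative.

```python
"""Illustrative sketch only: RMSE comparison, ensemble mean, coefficient of
variation and a tercile categorical forecast. All data are hypothetical."""
import math
from statistics import mean, pstdev, quantiles


def rmse(forecast, observed):
    """Root mean square error of paired forecast/observed values."""
    if not forecast or len(forecast) != len(observed):
        raise ValueError("forecast and observed must be non-empty and of equal length")
    return math.sqrt(mean((f - o) ** 2 for f, o in zip(forecast, observed)))


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be non-zero)."""
    return pstdev(values) / mean(values)


def ensemble_mean(*forecasts):
    """Equally weighted average of the models' forecasts at each time step."""
    return [mean(step) for step in zip(*forecasts)]


def tercile_category(value, climatology):
    """Assumed categorical scheme: compare a value with the climatology's terciles."""
    lower, upper = quantiles(climatology, n=3)
    if value < lower:
        return "below"
    if value > upper:
        return "above"
    return "normal"


if __name__ == "__main__":
    # Hypothetical monthly precipitation (mm) at one station over four years.
    observed = [80.0, 120.0, 60.0, 100.0]
    # Hypothetical climatological sample used only to set tercile limits.
    climatology = [40.0, 55.0, 60.0, 75.0, 80.0, 95.0, 100.0, 110.0, 120.0]
    forecasts = {
        "MLR": [95.0, 100.0, 75.0, 90.0],
        "NN": [85.0, 110.0, 70.0, 95.0],
        "GAM": [75.0, 125.0, 65.0, 105.0],
    }
    for name, fc in forecasts.items():
        print(f"{name}: RMSE = {rmse(fc, observed):.1f} mm")
    ens = ensemble_mean(*forecasts.values())
    print(f"Ensemble mean: RMSE = {rmse(ens, observed):.1f} mm")
    print(f"CV of observations = {coefficient_of_variation(observed):.2f}")
    print("Category of last ensemble value:", tercile_category(ens[-1], climatology))
```

Equal weighting is the simplest ensemble choice. A skill-weighted mean, or a different category scheme, would change only `ensemble_mean` and `tercile_category`.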
Data availability
The datasets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.
Code availability
Not applicable.
Acknowledgements
Rainfall data were provided by the National Meteorological Service of Argentina (SMN) and the National Institute of Agricultural Technology (INTA). We thank the Copernicus Climate Change Service for providing the dynamical model data. The forecasts from the US National Centers for Environmental Prediction (NCEP), the Japan Meteorological Agency (JMA), and Environment and Climate Change Canada (ECCC) are in-kind contributions to the Copernicus Climate Change Service, which we acknowledge with gratitude.
Funding
This work was supported by the UBACyT projects 20020190100090BA (2020–2022) and 20620170100012BA (2018–2020).
Author information
Contributions
All the authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by all the authors. All the authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
All the authors have consented to participate in the study.
Consent for publication
All the authors have consented to publish the study.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
González, M.H., Rolla, A.L. Data mining techniques applied to statistical prediction of monthly precipitation in Gran Chaco Argentina. Theor Appl Climatol 150, 1027–1043 (2022). https://doi.org/10.1007/s00704-022-04209-y
DOI: https://doi.org/10.1007/s00704-022-04209-y