Spatial Air Quality Index and Air Pollutant Concentration prediction using Linear Regression based Recursive Feature Elimination with Random Forest Regression (RFERF): a case study in India

  • Original Paper
  • Natural Hazards

Abstract

Over the last decade, air pollution has become one of the most pressing environmental issues, and its reach continues to grow. Air quality prediction plays a crucial role in warning people about pollution levels and in designing mechanisms that reduce the overall impact of poor air quality on individuals’ health. This paper develops a mechanistic and quantitative model, with a clear environmental interpretation, for predicting the Air Quality Index (AQI) and Air Pollutant Concentration (NOx) levels. The proposed model is based on Linear Regression based Recursive Feature Elimination with Random Forest Regression (RFERF). For the experimental analysis, seven well-established machine learning models are compared with the proposed model to assess their suitability and correctness. The Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Coefficient of Determination (R2 score) are used to validate the performance of the prediction models. The AQI and NOx data are taken from the Central Pollution Control Board of India. The proposed model outperforms the other prediction models, with better accuracy and a higher prediction rate. This work also shows that linking machine learning with sensor-generated AQI data is an adequate and appropriate way to address related environmental problems. In addition, the paper covers the impact of high levels of AQI, NOx, and other pollutants on individuals’ health, along with possible solutions.
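
The five validation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions, not code from the paper; the function name `regression_metrics` and the sample AQI values are hypothetical, and MAPE is assumed to be reported as a percentage over strictly nonzero observed values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (in percent) and R2 for two equal-length sequences.

    Assumes every observed value is nonzero (needed for MAPE) and that the
    observed values are not all identical (needed for R2).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Each absolute error is scaled by the magnitude of the observed value.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical observed and predicted AQI values, for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MAE, MSE, RMSE, and MAPE indicate a closer fit, while an R2 score nearer to 1 means the model explains more of the variance in the observed values.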

Data availability

The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.


Funding

No funding was received.

Author information

Corresponding author

Correspondence to Shwet Ketu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Written informed consent for publication was obtained from all participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ketu, S. Spatial Air Quality Index and Air Pollutant Concentration prediction using Linear Regression based Recursive Feature Elimination with Random Forest Regression (RFERF): a case study in India. Nat Hazards 114, 2109–2138 (2022). https://doi.org/10.1007/s11069-022-05463-z

  • DOI: https://doi.org/10.1007/s11069-022-05463-z
