Air passenger forecasting using Neural Granger causal Google trend queries

https://doi.org/10.1016/j.jairtraman.2021.102083

Highlights

  • A novel method based on Google Trend Queries is proposed to identify internet search queries that can forecast air passengers.

  • 171 Neural Granger Causal Google Trends Search queries are identified out of an initial 1317 queries using a word2vec model.

  • Using Neural Granger queries as inputs to a forecasting model produced higher forecasting performance.

Abstract

Air passenger forecasting provides important insights for both governments and aerospace industries to plan their future activities. Google Trends can provide a large database of historical search query frequencies which can be used as explanatory variables for air passenger forecasting. This paper explores the use of a Neural Granger Causality model to select the best search queries for forecasting arrival air passengers at Singapore Changi Airport. Neural Granger Causality models are an extension of the original Granger Causality model that use neural networks instead of linear Vector Auto-Regressive (VAR) models to capture non-linear relations between the targets and the tested explanatory variables. In this paper, 1317 Google Trends search queries are tested for Neural Granger Causality, of which 171 queries are deemed Neural Granger Causal for forecasting Singapore Changi Airport monthly arrival passengers. The model that used all 171 Neural Granger queries achieved the highest R² value (R²=0.919) with the lowest Standard Deviation (SD=0.363) compared to the other models, which were not filtered for Neural Granger Causality. The 171 queries found are search terms that reflect a unidirectional Neural Granger causal relationship with the number of arrival air passengers at Changi Airport.

Introduction

The air transportation industry has witnessed tremendous growth over the past decades. Significant infrastructural developments have followed to support this growth and accommodate the ever-increasing passenger numbers. The International Air Transport Association (IATA) forecasts 8.2 billion passengers worldwide by the year 2037, at a compound annual growth rate of 3.5% (IATA (2019)). This long-term passenger growth forecast stayed the same after a revision due to the Covid-19 pandemic (IATA (2020)). Consequently, it has become imperative to develop tools and methodologies to efficiently generate short- and long-term passenger traffic forecasts at airports. Such forecasts aim to assist airside and landside operational planning, short-term maintenance plans and flight schedules. Over the long term, these forecasts also assist in the future planning of airport infrastructure, assist airline companies in equipment purchase and route structure, and assist aircraft manufacturers in designing future aircraft that are optimally profitable to their customers (ICAO (2006) and Kim et al. (2016)).

Traditionally, econometric predictors (GDP, oil price, etc.) are used for air passenger forecasts, but calculating these predictors is generally tedious. Alternatively, data from internet search queries provide a source of predictors that are up to date, require little post-processing and reflect the general psychology of the user population (Bragazzi (2014)). However, the large number of possible search terms means that the best search terms need to be identified if search query data are to be used for air passenger forecasting. Once identified, the best search queries could also be used for further causality analysis, which may be useful for policy makers in the aviation industry.

The purpose of this research is to: 1) Find the relevant search queries that are the most useful predictors for air passenger forecasting at Singapore Changi Airport using Neural Granger Causality (NGC) analysis which is adapted from Tank et al. (2018). 2) Investigate the efficacy of using neural granger causal search query trends as predictors to forecast the number of air passengers.
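For contrast with the neural variant, the classical linear Granger test can be sketched in a few lines: it asks whether adding lagged values of a candidate series reduces the residual error of an autoregression on the target. The code below is a minimal illustration of that linear baseline only, not the NGC method of Tank et al. (2018). The function names, the lag order and the synthetic series are assumptions made for this example.

```python
import numpy as np

def lagged_matrix(series, lags):
    """Columns are series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def granger_f_stat(y, x, lags=2):
    """F-statistic for 'x Granger-causes y' under a linear autoregression."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    ones = np.ones((len(target), 1))
    # Restricted model: intercept + own lags of y.
    restricted = np.hstack([ones, lagged_matrix(y, lags)])
    # Unrestricted model: additionally uses lags of the candidate series x.
    unrestricted = np.hstack([restricted, lagged_matrix(x, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic example (hypothetical data): x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_x_to_y = granger_f_stat(y, x)  # large: lags of x help predict y
f_y_to_x = granger_f_stat(x, y)  # small: lags of y do not help predict x
```

A large statistic in one direction and a small one in the other corresponds to the unidirectional relationship discussed in the abstract. The neural approach replaces the linear regressions with neural networks so that non-linear dependencies can also be detected.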

This paper is structured as follows: Section 2 highlights several related works and their relation to the research in this paper. Section 3 describes the methodologies used to collect data and the details of Neural Granger Causality. The results are then presented in Section 6, where a feasibility study of NGC analysis is demonstrated on the synthetic Lorenz-96 dataset and the Changi/Google Trends dataset. Section 7 discusses the applicability of NGC analysis in the post-Covid era and, finally, the paper concludes in Section 8.

Section snippets

Literature review

Econometric variables have been used as predictors for air passenger forecasts before. Fernandes and Pacheco (2010) used a linear Granger causality test to determine causality between GDP and domestic air passenger traffic in Brazil and concluded that there is a unidirectional causal relationship from economic growth to domestic air transport demand. Hakim and Merkert (2016) also examined the causal relationship between air transport and economic growth in the South-East Asian region

Finding initial queries

One of the objectives of this research is to find the relevant search query predictors to forecast the number of Changi Airport air passengers. Even if the search queries are constrained to a single language (in this case, English), there is a large number of possible search queries, and enumerating all of them is not an efficient way to find the initial search queries for Neural Granger Causality analysis. A smarter solution would be to use a

Analysis of target dataset - main characteristics of air passenger numbers

The target outputs (number of air passengers) of the forecasting model in this study are obtained from Govtech (2020). Each numerical entry in the dataset represents the total number of air passenger arrivals in one month, and the dataset spans January 1961 to January 2020. A seasonal and trend decomposition of this data is illustrated in Fig. 3. The seasonal decomposition reveals a repeating cycle every 4 months in the air passenger data. The residuals plot

Lorenz-96

The Lorenz-96 model is commonly used for data assimilation (Lorenz (1996)) and in this case can be used to generate time series data. Since the target variables can be artificially constructed, we can test the efficacy of Neural Granger Causal analysis on this synthetic dataset (generated by integrating Equation (7)) to find out if it can identify the original predictors. An experiment was set up with 10 variables (x1, x2, x3, ..., x10) and the forcing value F was set to 5. η was set to 0.8 and λ was set to

Proof of concept on Lorenz-96 synthetic dataset

The plots in Fig. 5 and Fig. 6 strongly suggest that the Neural Granger model can correctly identify potentially causal time series, as the weight values of the input variables (x4, x5, x6) are significantly higher than those of the other variables. Weight sparsity can also be observed when comparing the weights of the input time lags (Equation (8)) shown in Fig. 7, suggesting that Neural Granger Causality analysis can identify the best features and their corresponding time lags.
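A synthetic dataset of this kind can be generated by numerically integrating the standard Lorenz-96 equations, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices. The sketch below uses 10 variables and F = 5 as stated in the text. The Runge-Kutta integrator, step size, number of steps, initial condition and function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def lorenz96_rhs(x, forcing):
    """Right-hand side of Lorenz-96 with cyclic indexing over the variables."""
    # np.roll(x, -1)[i] == x[i+1], np.roll(x, 2)[i] == x[i-2], np.roll(x, 1)[i] == x[i-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def simulate_lorenz96(n_vars=10, forcing=5.0, n_steps=1000, dt=0.05, seed=0):
    """Integrate Lorenz-96 with classical 4th-order Runge-Kutta.

    dt, n_steps, seed and the perturbed-equilibrium start are assumed values.
    """
    rng = np.random.default_rng(seed)
    # Start near the equilibrium x_i = F with a small random perturbation.
    x = forcing + 0.01 * rng.normal(size=n_vars)
    out = np.empty((n_steps, n_vars))
    for t in range(n_steps):
        k1 = lorenz96_rhs(x, forcing)
        k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
        k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
        k4 = lorenz96_rhs(x + dt * k3, forcing)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out[t] = x
    return out

# One row per time step, one column per variable x1..x10.
series = simulate_lorenz96()
```

Because the generating equation is known, the dependencies recovered by a causality analysis can be checked against the variables that actually appear in each equation.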

Both L1 and SGL

Discussion - applicability in air traffic management

A comparison of Table 4 and Table 5 suggests that the search queries filtered by Neural Granger Causality analysis are specifically important search terms that may have a causal relationship with the number of air passengers at Changi Airport. One interesting characteristic of the Neural Granger queries (Table 3) is that other South-East Asian vacation destinations and airports show up as Neural Granger Causal (“Redang Island”, “Villas Hua Hin”, “Kuta Bali”, “Soekarno Hatta Airport”,

Conclusion

The purpose of this paper is to investigate and evaluate the effects of using Neural Granger causal Google Trends queries to forecast arrival passengers at Changi International Airport, Singapore. Neural Granger Causality models employ component-wise neural networks with sparsity-inducing regularization to identify the best predictors for forecasting a time series.
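The sparsity mechanism can be illustrated with a group-lasso proximal step applied to the first-layer weights of one component network. All weights attached to a given input series form a group, and a series whose group is shrunk exactly to zero is read as not Granger causal for the target. This is a simplified sketch in the spirit of Tank et al. (2018). The weight layout, function names, penalty strength and toy weights are assumptions, and the exact penalty used in the paper is not given in this excerpt.

```python
import numpy as np

def group_lasso_penalty(W, lam):
    """lam * sum over input series j of the Frobenius norm of W[:, j, :].

    W has shape (hidden_units, n_series, n_lags) (assumed layout).
    """
    return lam * np.sqrt((W ** 2).sum(axis=(0, 2))).sum()

def prox_group_lasso(W, lam, step):
    """Proximal step: shrink each series' weight group, zeroing weak groups."""
    norms = np.sqrt((W ** 2).sum(axis=(0, 2), keepdims=True))
    scale = np.clip(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0, None)
    return W * scale

def granger_causal_inputs(W, tol=1e-8):
    """Indices of input series whose first-layer weight group is non-zero."""
    norms = np.sqrt((W ** 2).sum(axis=(0, 2)))
    return np.flatnonzero(norms > tol)

# Toy weights (hypothetical): series 2 and 3 carry large weights, the rest tiny ones.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5, 3))
W[:, [0, 1, 4], :] *= 0.01

W_sparse = prox_group_lasso(W, lam=1.0, step=0.5)
selected = granger_causal_inputs(W_sparse)
```

In a full training loop this step would alternate with gradient updates on the forecasting loss, so that only series that genuinely reduce the forecast error keep non-zero weights.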

The experimental results strongly suggest that using the Neural Granger Causality model can identify Google
