1 Introduction

The wide spread of the infectious coronavirus disease (COVID-19), caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has affected more than 150 million people (positive cases) and resulted in over 3.5 million deaths worldwide (as of the last week of May 2021). The pandemic has brought substantial changes to all aspects of our lifestyle. The healthcare sectors of all countries have been significantly affected, and several strategies, such as restricted mobility, isolation of regions and lockdown measures, have been adopted. Although these measures have reduced the spread of the disease, they have had a great impact on the socio-economic conditions of countries, specifically developing countries where the gross national income and the level of technological infrastructure are comparatively low.

The recent progression of the COVID-19 pandemic has created an unprecedented challenge for healthcare systems across the world. This is even more critical for developing and under-developed countries, which often have very high population densities and limited healthcare infrastructure. Further, a large number of citizens often need to be quarantined or provided with home-based treatment. Thus, there is a need for a low-cost, delay-aware and effective solution [36, 41] that can help in preliminary diagnosis and monitoring while patients are in a quarantine facility or in home isolation.

The recent decade has seen significant growth in the accumulation and analysis of spatio-temporal datasets for resolving several challenging real-life problems, such as weather forecasting, trip-planning [38], time-critical applications and health management [40]. A spatio-temporal dataset consists of data-instances associated with a location and with changes over temporal intervals. A few examples of such spatio-temporal data-instances are climate data (precipitation, temperature, cloud-cover etc.) at a particular location and at different time-scales, or GPS-footprints of people in varied time-intervals. In this context, there is a new concept, namely the Internet of Spatial Things (IoST), where IoT is combined with spatial information [1, 37]. To reduce the spread of the disease, effective measures need to be taken, and location information plays a crucial role in mapping hotspot areas, identifying suspected (infected) people and imposing zone-wise lockdown measures.

However, there are several challenges in fighting the pandemic with data-driven technology. Firstly, heterogeneous information from different stakeholders needs to be accumulated and analysed. For instance, the number of active, fatal and critical cases from the government health sector, the population density of different regions from the census board, health-related resources from regional officers and movement statistics from the transport department all need to be accumulated and stored in an efficient manner. The scarcity of resources, widespread misinformation and false reports add to the woes of the pandemic. Secondly, due to the information gap, the distribution of health resources (vaccines, oxygen, medical supplies) is disproportionate, which is causing an alarming situation in several places, specifically in developing countries like India. The overburdened healthcare system is unable to support such a precarious condition. Densely populated regions and the unavailability of medical resources are already burgeoning issues in developing countries. In this regard, it is absolutely necessary to identify, report and isolate suspected people early, to distribute resources (medical and necessary daily resources) effectively, and finally to circulate reliable information to reduce unwarranted fear and panic among citizens. In this work, we mainly address three challenges: (a) the non-uniform distribution of medical resources and health or transport facilities, which becomes fatal in many cases, and how to assist users in such scenarios; (b) capturing and providing reliable information; and (c) identifying probable infected people from their mobility history, health status and other contexts.

In this paper, we propose an end-to-end Spatial Data Infrastructure enabled Cloud–Fog–Edge Computing framework for COVID-19 pandemic management, named STROVE, to address these challenges and facilitate an effective health-care and decision support system.

The major contributions of this paper are summarized as follows:

  • Spatial data infrastructure: We propose a unified model consisting of a spatial data infrastructure to manage, store and analyse COVID-19 related information and to assist users in emergency situations. We define several modules and functionalities of the architecture and clearly depict the backbone of such an infrastructure. To the best of our knowledge, no existing work has studied and explored such an architecture and developed it from scratch.

  • Mobility and context analytics module: We propose an efficient mobility and context (health parameter, activity, environmental data) analytics module which can index and store a huge volume of mobility traces and other contextual information, analyse them using advanced machine learning techniques and predict an appropriate route in times of exigency. It can also identify suspected people from their movement history and other contexts.

  • Cloud–Fog–Edge enabled computing architecture: A generic and hierarchical framework based on the Cloud–Fog–Edge architecture is developed to support such data-driven decision making with low delay. The framework collects data from wearable sensor devices and alerts the user and health professionals on detection of any abnormality. The framework has three layers: the edge layer captures the sensory information, the fog layer reduces the delay and thus makes the framework suitable for real-time analysis, and the cloud server performs time-intensive and compute-intensive tasks.

  • We have extensively evaluated the proposed framework using real-life data in terms of path-prediction accuracy, indexing performance and accuracy in identifying suspected infected users. Furthermore, we present a theoretical analysis supported by simulation results to demonstrate the efficacy in terms of delay and energy consumption.

The rest of the paper is organized as follows: Sect. 2 presents the related works in the context of COVID-19. Our proposed framework is introduced and described in Sect. 3 in different sub-parts. Section 4 presents the evaluation of the proposed modules and the paper is concluded in Sect. 5 with future research directions.

Table 1 Comparisons of existing works and STROVE framework for COVID-19 pandemic management

2 Related works

Significant research efforts have been put forth to fight the deadly virus. In this section, we highlight a few of the research studies in the domains of the Internet of Health Things, data mining and learning techniques in the context of COVID-19.

Yang et al. [2] emphasized physical therapy at home using Internet of Medical Things techniques. To track the health status of patients, IoT technology is augmented with the cloud paradigm in [3]; the framework achieved an accuracy and sensitivity of 99.2% and 93.5%, respectively. Another work [4] presents a novel technique to track social distancing among users and provide alerts when the distancing norms are violated, with an accuracy of 92% and 98% with and without the transfer learning module, respectively. It has already been observed that people with co-morbidity or other health issues are severely affected by COVID-19. In this context, another interesting study is presented in [5], where the proposed technique helps people with obesity to manage their health status using machine-learning-enabled IoT technology. An advanced machine learning technique has been used in [6] to diagnose COVID-19 from X-rays with an accuracy of 95.6%. A fog-assisted system is presented in [7] to reduce the delay in reporting health status in exigent situations. There is also work on collaborative machine learning, such as Federated Learning, to facilitate learning locally at the edge [8]. Several interesting research works have been carried out in the context of COVID-19 and AI. Wang et al. [9] proposed a two-step approach to detect mask wearing by leveraging hybrid transfer learning and broad learning. An adaptive attention network has been presented in [10] for detecting COVID-19 from chest X-ray images. A fuzzy clustering method involving multivariate time series of mobility indicators has been proposed in the context of COVID-19 [11]. As social distancing is one of the effective measures to curb the pandemic, researchers [12] proposed an efficient UAV pedestrian detection method for social distance monitoring. Wang et al. [13] proposed a short-term prediction model of COVID-19 and analyzed the impact of multi-source urban data, namely temperature, air quality, humidity and inflow rate. A stochastic nonlinear model predictive controller is presented in [14] to support policymakers in mitigating the impact of the pandemic. Zhang et al. [15] proposed a novel theoretical framework, named Context-Transition Predictability, which incorporates both sequential mobility patterns and contextual information of human behavioural patterns. Ghosh et al. [16] present a workflow management framework, named CLAWER, for process management and IoT-based data analytics in the health-care domain, which achieves an accuracy of 0.83–0.93. Another interesting study by Li et al. [17] empirically shows the need for COVID-19 information dashboards and summarizes the visualization and interaction patterns commonly applied in dashboards. A dynamic SEIR model with information entropy has been presented in [18] in the context of COVID-19.

There are also several works that deploy deep learning methods to fight COVID-19. Kraemer et al. [19] studied the distribution of disease spread in China and analysed mobility traces at different locations. Identification of suspected crowds has been done in [20] based on the movement history of the users. Hamda et al. [21] present a mathematical model of movement dynamics and disease outbreak in different cities of the USA. Apart from these, there are also interesting works on forecasting COVID-19 cases using graph-based analytics [22] and on distributed deep learning in the edge nodes using a 5G network [23]. From the computing infrastructure side, Tuli et al. [24] present a cloud-based framework to analyse the growth of the pandemic and propose a prediction model of the same. Mobile and fog computing technologies have been utilized in [25, 39] to prevent COVID-19 community transmission effectively. Furthermore, a multi-layer architecture for collecting real-time information about COVID-19 using drones has been developed in [26].

Table 1 compares the features of existing studies with those of our proposed framework, STROVE. To the best of our knowledge, there is no other end-to-end framework which supports spatio-temporal data analytics and provides such a spatial-data infrastructure. Further, IoT and advanced data analytics techniques are the two major scientific tools to fight COVID-19. In this direction, STROVE presents a novel framework supporting spatial data and spatio-temporal operations in the Cloud–Fog–Edge paradigm to assist users in this emergency situation.

Fig. 1 Spatial data infrastructure: basic backbone

3 Proposed framework

The proposed STROVE framework is developed over a Cloud–Fog–Edge-based backbone network. At the bottom layer of the Cloud–Fog–Edge model, there are several sensors in the edge nodes (say, a mobile device or an ambulance). The sensors capture health parameters (blood pressure, heart-rate, oxygen level, body temperature) and movement parameters (accelerometer and GPS) at a given time-interval. The users' mobile devices accumulate this information and send it to the fog nodes. Here, we suggest the use of a femtolet [27] as the fog node. A femtolet is a small cell base station which has storage and processing ability [27]. The femtolet is connected with the cloud. The fog nodes perform the basic analysis on the sensory data. The cloud server performs mobility analysis for predicting an appropriate path and nearby healthcare centers in case any abnormality is detected. Further, the cloud server stores the information about the COVID-19 cases at different locations, the users' health profiles and their movement histories. The computational and storage power of the cloud servers is utilized for aggregate operations, such as community health analysis and trend analysis of a region. The aggregate analysis performed in the cloud servers provides recommendations of lockdown measures by estimating the risk of the infectious disease spreading in a spatial region. The framework is thus suited to provisioning \(24 \times 7\) home-health monitoring for ailing and elderly persons.

Fig. 2 Structural model of integrated geospatial information framework in the context of COVID-19 pandemic

3.1 Spatial data infrastructure

Figure 1 illustrates the basic backbone of the spatial data infrastructure and how it is developed. The core system architecture can support all spatio-temporal data and operations. It may be noted that spatial data types are different from traditional data types. There are three main types of spatial data: point-data (say, a specific location, usually represented by latitude and longitude), polyline-data (say, road or river data, represented by a series of latitude and longitude values) and polygon-data (say, a building, defined by a bounding box). To store such data types, the database or storage system needs to have a geom (geometry) property where the location-specific information is stored and managed. Furthermore, the system should be able to perform spatio-temporal operations, such as creating a buffer or finding an intersection. As shown in Fig. 1, the other components of the system are developed considering all these features of spatio-temporal instances. STROVE is developed and implemented to support mobility data management, storage and indexing of such traces and analysis of heterogeneous information. The major objectives of such an infrastructure are:

  • Search and discovery of spatio-temporal services and information efficiently and with minimal manual inference

  • Reducing data duplication among the national agencies (government) and eliminating false information by capturing and sharing reliable data

  • Heterogeneous data sharing in seamless manner

  • Data integrity and privacy conservation

In the era of sensor network development, Volunteered Geographic Information (VGI) is also referred to as "humans as sensors" or "citizens as sensors" [28]. Here, the information about the surroundings is logged by the citizens themselves. In the context of COVID-19, obtaining information is a big challenge since remote places cannot be reached due to restricted mobility. Therefore, VGI has a massive impact on public health monitoring and on managing the COVID-19 situation. STROVE has the capability to accumulate and rectify VGI data from several sources. In short, VGI utilizes the smart-devices of citizens to assemble, modify and share geographical information provided voluntarily by the citizens. Nevertheless, there are a few challenges: (a) the accumulated data is heterogeneous in nature and may also have standardization issues, and (b) the quality and reliability of the data need to be maintained. STROVE mainly focuses on (a) accumulating the status of COVID-19 cases in the local neighbourhood of a user, and (b) collecting any issues regarding medical resources like vaccine, oxygen or bed availability. In our app, we provide a list of events (ev) such as new cases, fatal cases (if any), vaccination or medical resource related issues and any abnormal health issue. The volunteers provide the information about an event, i.e., the event-type and the severity (\(Se_{ev}\)) of the event. Moreover, there is also a provision to provide the spatial (location) and temporal (time-interval) information about the event. For maintaining the data quality, STROVE deploys a hierarchical approach, where a small group of trusted individuals (who act as moderators) can rectify the collected information. Furthermore, when a large amount of data is collected from a region and such moderators are not present, we follow a crowdsourced approach, where the convergence on the reliability of the data is fully dependent on the crowd (or volunteers), who identify and correct errors collectively.

Next, spatial clusters are formed from the collected dataset, where each cluster consists of the information \(<log_i, Se_{log_i}, cardinality>\). Here, \(log_i\) is an event i with severity \(Se_{log_i}\), and cardinality is the aggregated number of data entries. For the selection of correct and reliable information, STROVE also deploys a mechanism based on auction theory. Here, the items are the pieces of information collected, and the participants or players (who collect the information) bid for their collected information. Each player has an independent private value for their information. STROVE compares the quality of information for different players. Then, STROVE generates a heatmap using this information from all of the places and finds the correlation with other contextual parameters.
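
The aggregation into \(<log_i, Se_{log_i}, cardinality>\) tuples can be illustrated with a minimal sketch. The names `Report` and `cluster_reports`, the grid-cell size of 0.01 degrees and the use of the mean severity as the cluster severity are assumptions made for illustration only; they are not taken from the STROVE implementation, and the moderation and auction steps are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Report:
    """One volunteered event report (hypothetical structure)."""
    event: str      # event type, e.g. "new_case"
    severity: int   # volunteer-assigned severity Se_ev
    lat: float
    lon: float


def cluster_reports(reports, cell_deg=0.01):
    """Group reports by (grid cell, event type).

    Returns a dict mapping (cell, event) to the tuple
    (event, aggregated severity, cardinality).
    """
    groups = defaultdict(list)
    for r in reports:
        # Assumed spatial clustering: snap each report to a fixed grid cell.
        cell = (int(r.lat // cell_deg), int(r.lon // cell_deg))
        groups[(cell, r.event)].append(r.severity)
    # Assumed aggregation: mean severity; cardinality is the entry count.
    return {key: (key[1], mean(sev), len(sev)) for key, sev in groups.items()}
```

Two reports of the same event type falling in the same cell are merged into one cluster with cardinality 2, which is the quantity a heatmap could then be drawn from.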

Figure 2 illustrates the integrated geospatial information framework of STROVE. Here, the basic backbone is the spatial-data infrastructure deployed over the Cloud–Fog–Edge system. As shown in the figure, there are three major domains: governance, technology and people. Across them, there are nine modules in total, and we show how they are associated in combating COVID-19. The governance and institutions model attains political endorsement, strengthens institutional mandates and builds a cooperative data sharing environment. The policy module mandates responsibility for the production of data and for keeping abreast of issues and challenges. The financial plans required to establish and maintain integrated geospatial information management, as well as the longer-term investment program, are monitored by the financial module. In the technology domain, the major emphasis is on data acquisition, data-driven and AI-enabled model development and effective standardization of metadata information. In the bottom layer, community engagement and the involvement of domain experts are shown. Here, the overall outcome is refined based on the domain expertise, and a feedback loop is generated to refine the model parameters.

The spatial data infrastructure of STROVE follows a service-oriented architecture with three entities: the spatial service broker, the spatial service provider and the spatial service requestor. Here, the service provider is the owner of the service or hosts the service. All the service descriptions and metadata are stored in a registry, where the service providers can publish their services. The service requestor can be either an application searching for a specific piece of information or service, or the user. The services cover a broad range, such as finding a nearby hospital with oxygen availability, finding the next vaccination slot at a location or getting the route to reach a particular healthcare center with minimum delay. STROVE can provision a huge number of services through the backend SDI. The healthcare centers update their medical resource availability through the registry.
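
The publish/find interaction between provider, registry and requestor can be sketched as follows. This is only an illustrative in-memory model: the class names `Service` and `Registry`, the metadata keys and the exact-match lookup are assumptions, and a real SDI registry would typically be a networked catalogue service.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A published spatial service description (hypothetical schema)."""
    name: str                      # e.g. "find_hospital"
    provider: str                  # owner or host of the service
    metadata: dict = field(default_factory=dict)


class Registry:
    """Minimal in-memory stand-in for the service registry."""

    def __init__(self):
        self._services = {}

    def publish(self, service):
        # Re-publishing under the same (provider, name) key updates the
        # entry, e.g. when a healthcare center changes its availability.
        self._services[(service.provider, service.name)] = service

    def find(self, name, **required):
        # Return services with the given name whose metadata matches
        # every required key/value pair exactly.
        return [
            s for s in self._services.values()
            if s.name == name
            and all(s.metadata.get(k) == v for k, v in required.items())
        ]
```

A requestor looking for a hospital with oxygen would call `registry.find("find_hospital", oxygen_available=True)` under these assumed names.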

3.2 Mobility data analysis

In this module, we mainly explore how to find an appropriate route and a nearby healthcare centre in times of exigency, and how suspected people can be identified from their movement history. STROVE deploys a deep learning based model to carry out the mobility analysis task. The path prediction algorithm runs on the cloud server, and the predicted path is sent to the nearby fog nodes. The fog nodes cache the information and communicate it to the fog devices present along the predicted path. Therefore, the number of communications with the distant cloud server is reduced. We begin the discussion by defining the preliminary concepts as follows:

  1. GPS Trajectory (G): A sequence of time-stamped locations \(loc:<latitude, longitude>\), or (lat, lon), is termed a GPS trajectory (or trace). We use the terms GPS trajectory, GPS log and GPS trace interchangeably. A trajectory is defined as:

    $$\begin{aligned} \begin{aligned}&((loc_1,t_1), (loc_2,t_2) \dots , (loc_n,t_n))=\{(lat_1,lon_1,t_1) \\&\quad \longrightarrow (lat_2,lon_2,t_2) \longrightarrow \dots \longrightarrow (lat_n,lon_n,t_n)\} \end{aligned}\nonumber \\ \end{aligned}$$
    (1)

    where the i-th point-location \(loc_i=(lat_i,lon_i) \in \mathbb {R}^2\), time \(t_i \in \mathbb {R}\) and \(t_1< t_2< \dots < t_n\). The trajectory is formed by connecting the locations in increasing time order.

  2. POI-taxonomy: A social-functional region of a city is denoted by a Point-of-interest (POI) [29]. Typically, a specific type of landuse (point feature), such as a residential building, commercial place or entertainment area, serves as the POI-tag of a place. In this work, we classify the POIs of academic campuses and their neighboring regions in a tree-structured taxonomy (POI-taxonomy), where the specific POI-tags are at the leaf nodes and the internal nodes are formed by more generic POI-tags. Examples of paths from the root to a leaf node are {root, building, academic, department, Computer_Science, <lat, lon>} or {root, building, commercial, store, stationery, <lat, lon>}, where building, academic, commercial, stationery etc. are the internal nodes. Each of the internal nodes is represented by a unique identifier following the prefix property.

  3. Region-of-interest (ROI): It is defined by the bounding-box of the complete study-region (e.g., an academic campus). For ease of storage, we have partitioned the ROI into uniform square grids with a spatial resolution of \(100m \times 100m\). Each grid is represented by a grid-id and the lower-left and upper-right coordinates of the grid. The grid partitioning is carried out by integrating the vector processing tools of QGIS with a Python function, where the input is the shapefile of the ROI. It produces an array-list containing the grid-ids and the two coordinates of the bounding box of each grid. We also compute the mean coordinates of each grid. Since each POI or stay-point is associated with latitude and longitude information, a simple checking function can find the grid-id in which any particular POI or stay-point resides.

  4. Stay point: A stay point \(S(lat, lon)\) of the GPS trajectory (\(G: (p_a, t_a) \dots \rightarrow (p_b, t_b) \dots \rightarrow (p_c, t_c)\)) is formed when the following conditions are satisfied:

    $$\begin{aligned}&t_b - t_a \ge T_{thresh} \; and \; dist (p_a, p_b) \le Ds_{thresh}\; \nonumber \\&\quad then, \; S.lat:= \sum _{k=a}^b \frac{p_k.lat}{|N|},\; S.lon:= \sum _{k=a}^b \frac{p_k.lon}{|N|} \nonumber \\&T:= t_b - t_a. \end{aligned}$$
    (2)

    where |N| is the number of GPS points from \(p_a\) to \(p_b\) in the log, and \(Ds_{thresh}\) and \(T_{thresh}\) are two scale parameters. That is, if a set of consecutive points lies within a distance of \(Ds_{thresh}\) and the time spent there is at least \(T_{thresh}\), then that set of GPS points is characterized as a stay-point and denoted by (S.lat, S.lon, T).
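
The stay-point rule of Eq. (2) can be sketched as a single scan over the trajectory. This is a minimal illustration, not the STROVE code: the function names, the use of the haversine formula for \(dist\) and the choice to measure every point against the anchor \(p_a\) are assumptions.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon, ...) points."""
    r_earth = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r_earth * math.asin(math.sqrt(a))


def stay_points(traj, ds_thresh, t_thresh):
    """Extract stay points from traj, a time-ordered list of (lat, lon, t).

    Returns a list of (S.lat, S.lon, T) tuples as in Eq. (2).
    """
    stays = []
    a, n = 0, len(traj)
    while a < n:
        # Extend b while the next point stays within ds_thresh of p_a.
        b = a
        while b + 1 < n and haversine_m(traj[a], traj[b + 1]) <= ds_thresh:
            b += 1
        duration = traj[b][2] - traj[a][2]
        if b > a and duration >= t_thresh:
            window = traj[a:b + 1]          # the |N| points p_a .. p_b
            s_lat = sum(p[0] for p in window) / len(window)
            s_lon = sum(p[1] for p in window) / len(window)
            stays.append((s_lat, s_lon, duration))
            a = b + 1                       # continue after the stay
        else:
            a += 1
    return stays
```

For example, with `ds_thresh=100` (metres) and `t_thresh=1200` (seconds), four fixes within about 15 m of each other spread over 30 minutes collapse into one stay point located at their mean coordinates.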

It may be noted that an efficient indexing mechanism is required to manage the volume of data, including the road-network structure, POIs and movement information. STROVE builds an efficient indexing scheme and deploys it to manage and retrieve such spatio-temporal data. Here, we have divided the study region into hexagonal grids of uniform area. The geo-hash code of the center point of each grid is computed, and the geo-hash codes are stored in the cloud server. Other information, like the location of health-care centers, population density and percentage of vaccinated people, is stored in the fog nodes using a hash-based indexing scheme. To be specific, STROVE uses four layers of hashing, where each layer stores more granular spatial information. There is also a temporal bucket where the dynamic data-instances are stored in chronological order. The hash-functions used in STROVE are as follows:

$$\begin{aligned}&hashFunc(Loc_{id})= I, where \nonumber \\&\quad {[}I.t_1< Loc_{id}.t_{information} < I.t_2) \end{aligned}$$
(3)
$$\begin{aligned}&hashFunc(loc_x,loc_y)\nonumber \\&\quad = (loc_x+loc_y)(loc_x+loc_y+1)/2 + loc_y \end{aligned}$$
(4)

Here, \(Loc_{id}\) represents a polygonal geometry and I is the index of the location entity where it is stored. \(I.t_1\) and \(I.t_2\) delimit the time-interval in which \(Loc_{id}.t_{information}\) is valid, where \(Loc_{id}.t_{information}\) is the information (say, COVID-cases, medical resources, health-care centers, mobility traces etc.) of that location. The real-time data on road conditions and traffic are accumulated in the fog nodes. Here, STROVE builds a mobility-graph, where the mobility-events (in-flow and out-flow) are modelled as a graph structure. We deploy a probabilistic graphical model which determines the path that will take the minimum time to reach the destination.
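
The spatial hash of Eq. (4) has the form of the Cantor pairing function, which maps a pair of non-negative integers to a unique integer. The sketch below assumes that \(loc_x\) and \(loc_y\) are non-negative integer grid indices, and it reads the temporal condition of Eq. (3) as a half-open interval; both readings, as well as the function names, are assumptions made for illustration.

```python
import math
from bisect import bisect_right


def cantor_pair(x, y):
    """Eq. (4): (x + y)(x + y + 1)/2 + y for non-negative integers."""
    if x < 0 or y < 0:
        raise ValueError("grid indices must be non-negative")
    return (x + y) * (x + y + 1) // 2 + y


def cantor_unpair(z):
    """Inverse of cantor_pair, recovering (x, y) from the hash value."""
    w = (math.isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y


def temporal_bucket(boundaries, t):
    """Return index I such that boundaries[I] <= t < boundaries[I + 1].

    boundaries is a sorted list of bucket start times whose last entry
    closes the final bucket (assumed layout of the temporal bucket).
    """
    i = bisect_right(boundaries, t) - 1
    if i < 0 or i >= len(boundaries) - 1:
        raise KeyError(f"timestamp {t} is outside all buckets")
    return i
```

Because the pairing is a bijection on non-negative integer pairs, two distinct grid cells never collide, and the cell can be recovered from the key with `cantor_unpair`.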

Next, we attempt to identify suspected people from their movement history. We aim to learn the normal sequences of movement in the daily life of a user and to recognize possible contact with an infected person. In this direction, we have used an adaptive and stacked Long Short-term Memory (LSTM) [30] network, which is effective in the given scenario. In the preliminary step, the features are extracted from the varied sensor readings using the temporal sliding window approach. The window size is varied from 5 s to 15 s, with \(10\%\), \(20\%\), \(30\%\), \(40\%\), \(45\%\) and \(50\%\) overlap. After several iterations, a sliding window of 8 s with \(50\%\) overlap is selected. The framework then selects the best 15 features from the input data. The LSTM network maps an input sequence to an output sequence by computing the network activations at different time instances.
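
The sliding-window segmentation with the selected setting (8 s window, \(50\%\) overlap) can be sketched as below. The sampling rate `fs_hz` and the four summary statistics are assumptions for illustration; the text does not list the 15 features that the framework actually selects.

```python
def sliding_windows(samples, fs_hz, window_s=8.0, overlap=0.5):
    """Split a 1-D list of sensor samples into overlapping windows.

    fs_hz is the (assumed) sampling rate of the sensor in Hz.
    """
    size = round(window_s * fs_hz)          # samples per window
    step = round(size * (1.0 - overlap))    # hop between window starts
    if size <= 0 or step <= 0:
        raise ValueError("window size and step must be positive")
    return [samples[s:s + size]
            for s in range(0, len(samples) - size + 1, step)]


def window_features(window):
    """Illustrative per-window statistics (not the paper's feature set)."""
    avg = sum(window) / len(window)
    var = sum((v - avg) ** 2 for v in window) / len(window)
    return {"mean": avg, "std": var ** 0.5,
            "min": min(window), "max": max(window)}
```

With a 1 Hz signal of 20 samples, this yields four windows of 8 samples starting at samples 0, 4, 8 and 12; trailing samples that do not fill a whole window are dropped.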

For a stay-point \(S_a\), the location (\(loc_a\)) is denoted by its longitude and latitude, \(st_a\) and \(et_a\) are the start-time and end-time of the stay at \(S_a\), and \(dT_a\) and \(tiT_a\) denote the distance travelled and the time taken to reach \(S_a\) from the previous stay-point (\(S_{a-1}\)). The array-list of labels (\(L_i\)) stores information regarding the movement of the user on the \(i^{th}\) day in sequential order. We have adapted the basic encoder-decoder (many-to-many) model to our problem context. The problem is represented as multi-class single-label classification where, given the trajectory trace, the model outputs the corresponding probability of becoming infected at each location.

Fig. 3 Detailed representation of deep learning module

Figure 3 illustrates the architecture of the proposed deep learning module, which uses two layers, namely the collective task layer and the individual task layer. In the collective task layer, aggregated movement traces (mobility traces of all users) are fed as input, and a convolutional neural network is used to identify the flow of the movement traces. It considers the spatial distribution of footprint density and the temporal dependency (stay-time and time of visit) to learn the classification task. We have fitted the basic architecture of [31] into STROVE along with some modifications to incorporate the mobility-facts. In the individual task layer, the input is the labelled trajectory traces (TRL) of an individual, where the labels (probability of infection) are the output of the decoder. Initially, a linear scan through the trajectory trace extracts five parameters for all the stay-points of TR, which pass through the embedding layer. The bidirectional LSTM takes the spatial and temporal features sequentially, and the representations of the hidden states are generated. The hidden state from the encoder is the input of the decoder. We have used a feature-based attention in the auto-encoder module to select the most crucial features in the hidden state representation. The proposed deep learning architecture is also driven by the teacher-forcing principle, where ground-truth training sequences and output sequences generated by the model are used to guide the classification more accurately.

While analysing trajectory traces, the stay-points, the distances between them and the time spent at them are important factors. The limitation of a conventional LSTM is that it cannot use information from future time-steps. Here, a bidirectional LSTM is required, as we need information from later time-steps as well. In finding a suspected person, the influences of the previous (\(S_{a-1}\)) and next stay-points (\(S_{a}\)) and of the distance covered are greater than those of other stay-points. Here, we append two gates, namely a time gate (TiG) and a distance gate (TD), to control the influence of these two consecutive stay-points.

The embedding module embeds the spatial and location features into a single vector (\(x_t\)). Here, the location (loc) is a continuous spatial point; therefore, for discretization, we divide the study-ROI into fixed-size grids. Next, we use the skip-gram model to learn the representation of locations as:

$$\begin{aligned} \begin{aligned}&\frac{1}{T} \sum _{t=1}^T \log [p(loc_{t-c}, \dots , loc_{t-1}, loc_{t+1}, \dots , loc_{t+c}|loc_t)] \\&\quad = \frac{1}{T} \sum _{t=1}^T \sum _{con} \log p(loc_{t+j}|loc_t) \; \; \\&\quad where \; con: -c\le j \le c, j \ne 0 \end{aligned} \end{aligned}$$
(5)

where \(loc_{t+j}\) is the neighboring location of the present location \(loc_t\).

Based on the spatial proximity, we use softmax function as defined:

$$\begin{aligned} p(loc_{t+j}|loc_t) = \frac{\exp (\varPhi _{loc_{t+j}}^N \varPhi _{loc_t})}{\sum _{loc_i \in loc} \exp (\varPhi _{loc_{i}}^N \varPhi _{loc_t})} \end{aligned}$$
(6)

\(\varPhi _{loc_t}\) is the vector representation of location \(loc_t\) and N is the length of the sequence. Next, we embed the temporal information. The representation of the temporal information should comprise the timestamps (st, et) and the time-duration spent at a location. Here, the skip-gram model is not efficient, so we use the paragraph-vector model [32] to obtain the vector representation (\(\tau _t\)) of the temporal information. The other two features, the distance travelled (dT) and the time taken to cover the distance (tiT), are encoded into the vector \(\zeta _t\). Next, these vector representations are concatenated to obtain the ensemble vector \(x_t\).

$$\begin{aligned} x_t = \tanh ([W_\varPhi \varPhi _t + b_\varPhi ] \oplus [W_\tau \tau _t + b_\tau ] \oplus [W_\zeta \zeta _t + b_\zeta ])\nonumber \\ \end{aligned}$$
(7)

Here, \(\oplus \) denotes the concatenation operation. After embedding the spatial and temporal features, we deploy a bidirectional LSTM layer, where each LSTM layer has a time gate (TiG) and a distance gate (TD). The update equations of these gates are as follows:

$$\begin{aligned} \begin{aligned} TiG_t&=sigmoid(x_t \mathbf{W} _{xt} + sigmoid(\varDelta t \mathbf{W} _{tiG})+b_G); \\&\qquad s.t. \mathbf{W} _{tiG} \le 0 \\ TD_t&=sigmoid(x_t \mathbf{W} _{xd} + sigmoid(\varDelta d \mathbf{W} _{tD})+b_D); \\&\qquad s.t. \mathbf{W} _{tD} \le 0 \end{aligned} \end{aligned}$$
(8)

Here, the time and distance intervals are \(\varDelta t\) and \(\varDelta d\). The conditions \(\mathbf{W} _{tiG} \le 0\) and \(\mathbf{W} _{tD} \le 0\) ensure that the smaller the time interval and the distance interval, the greater the influence. Since STROVE uses a bidirectional LSTM, the output of the layer is modified as:

$$\begin{aligned} h_t= \overrightarrow{h_t} + \overleftarrow{h_t} \end{aligned}$$
(9)

Here, the outputs of the forward and backward propagation layers are represented by \(\overrightarrow{h_t}\) and \(\overleftarrow{h_t}\), respectively. Next, the attention layer is used to measure the importance of the feature vectors. Here, we have used the dot product attention function \(f_{att}\), and the representation is defined as:

$$\begin{aligned} r_{att} = \sum _{t=1}^T \frac{\exp (f_{att}(h_{t},x_{t}))}{\sum _{i=1}^T \exp (f_{att}(h_{i}, x_{i}))} h_{t} \end{aligned}$$
(10)
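As an illustration of Eqs. (9) and (10), the following minimal sketch sums the forward and backward hidden states and pools them with softmax-normalised dot-product scores. It is not the STROVE implementation: the function names are hypothetical, and it assumes that \(h_t\) and \(x_t\) have the same dimension so that their dot product is defined.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(h_fwd, h_bwd, xs):
    """Combine bidirectional states (Eq. 9) and pool them with attention (Eq. 10).

    h_fwd, h_bwd, xs are lists of T vectors (one per time step).
    Assumption: h_t and x_t share the same dimension.
    """
    # Eq. (9): element-wise sum of forward and backward hidden states.
    hs = [[f + b for f, b in zip(hf, hb)] for hf, hb in zip(h_fwd, h_bwd)]
    # Dot-product attention score f_att(h_t, x_t) for each time step.
    scores = [dot(h, x) for h, x in zip(hs, xs)]
    # Softmax over time steps; subtracting the maximum keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Eq. (10): attention-weighted sum of the hidden states.
    dim = len(hs[0])
    r_att = [sum(w * h[j] for w, h in zip(weights, hs)) for j in range(dim)]
    return weights, r_att
```

The returned weights always sum to one, so \(r_{att}\) is a convex combination of the hidden states.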

The decoder input layer is replaced by the weighted representation (\(r_{att}\)). Finally, a softmax layer is used to obtain the output labels. The network is trained to minimize the cross-entropy loss between the ground-truth label and the predicted label. Thus, the architecture encodes and learns different mobility semantics in varied contexts, and maps the trajectory segments of users. The major challenge is to map the different trajectory histories of users such that the suspected infected crowd can be identified. In our module, the LSTM architecture addresses this issue because it is capable of learning long-range dependencies in sequential patterns. Therefore, the output of the deep learning module of STROVE can identify suspected persons from their historical movement traces. Algorithm 1 presents the basic steps of the mobility analytics module of STROVE.

figure a
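The feature fusion of Eq. (7) and the interval gates of Eq. (8) can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names, the parameter layout and the clamping used to enforce the non-positivity constraint are assumptions, and how the gates enter the LSTM cell update is not specified in this section, so it is not shown.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b for a matrix W (list of rows), vector v and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def fuse(phi, tau, zeta, params):
    """Eq. (7): project each feature vector, concatenate, then apply tanh.

    params maps "phi", "tau", "zeta" to a (W, b) pair (hypothetical layout).
    """
    parts = []
    for name, vec in (("phi", phi), ("tau", tau), ("zeta", zeta)):
        W, b = params[name]
        parts.extend(affine(W, vec, b))  # concatenation of the projections
    return [math.tanh(p) for p in parts]


def interval_gate(x, W_x, delta, w_delta, bias):
    """Eq. (8): gate driven by x_t and an interval (delta t or delta d).

    W_x has one row per entry of x and one column per gate unit.
    w_delta is clamped to <= 0 (an assumed way to enforce the constraint),
    so a larger interval yields a smaller gate value.
    """
    w_delta = [min(w, 0.0) for w in w_delta]
    pre = [sum(xi * W_x[i][j] for i, xi in enumerate(x)) for j in range(len(bias))]
    return [sigmoid(p + sigmoid(delta * wd) + b)
            for p, wd, b in zip(pre, w_delta, bias)]
```

With a negative interval weight, `interval_gate` returns a larger value for a small \(\varDelta t\) (or \(\varDelta d\)) than for a large one, which matches the stated intent of the constraint.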

3.3 Health data analysis

In this section, we discuss the health parameter analysis module (HELP: HEaLth Parameter) of STROVE. The inputs of the module are the health parameters (HP), activity (AP) and environmental context (EC) at different timestamps. The objective of the module is to identify whether the person is at risk of infection or of any other abnormal health condition.

Identifying a user’s health status is not straightforward, as it depends on several factors. Firstly, the health parameter values depend on the age, gender and pre-existing diseases of the user. The environmental context also plays an important role in understanding whether the user is actually at risk. We therefore model the problem of identifying users’ health problems from the accumulated data using a Hidden Markov Model (HMM). In brief, an HMM is a statistical Markov model in which the system is assumed to follow a Markov process with hidden states. A Markov process assumes that future predictions depend only on the most recent observations. Based on the changes of the health parameters along with other variables over time, the HMM can efficiently detect whether a person has an abnormal health condition. The proposed framework uses the HMM to predict an individual’s health status because of its ability to treat various influencing factors as unobserved parameters.

We aim to model the user’s activity, environment variables and collected health parameter values to predict the health status of the user. Typically, an HMM consists of two kinds of stochastic variables: state variables (hidden) and observable variables. In the architecture of the HMM-based health status prediction module, each node represents a random variable at a given time (t) [33]. On the left side, there are two separate layers. \(t_i\) represents the observed variable (timestamp and movement sensor), and the hidden state is the activity sequence of the user. Specifically, this layer constructs the basic activity sequences (standing, walking, sitting etc.) from the movement sensor information. The bottom-left layer extracts the context (hidden state) of the activity sequences from the observed environment variables (air temperature, humidity, low light/sound intensity etc.). Next, the basic activity sequences are refined based on the extracted contexts. For instance, the heart rate may increase while exercising or decrease while sleeping, and the body temperature of a user may increase or decrease while having a bath. We also append the user’s pre-existing diseases/medical history in the second layer of the model. This is beneficial for identifying the health status of users more efficiently.

Hidden States: These are defined by the health status of the user. For example, (high blood pressure, \(t_1 - t_6\)) and (low oxygen level, \(t_{10} - t_{20}\)) are two hidden states of a user’s medical record.

Observable States: Sensory information such as accelerometer readings, the activities performed by the user and the health parameter values are considered as observed variables. All of these variables can be easily accumulated from our low-cost customized wearable device.

The proposed model or HELP (health parameter analysis module) is formally defined as \(\varTheta = \{<H,\kappa>, <O,\kappa >, \chi , \kappa \}\), where \(<H,\kappa>\) represents the set of hidden variables of HELP. The layers are represented by \(\kappa \). \(<O,\kappa>\) denotes the observed variables obtained from various sensory information. The state transition probabilities and observation probabilities at different layers of HELP are denoted by \(\chi \).

Next, we extract several inferences from HELP about the users’ health status. To compute the likelihood of an observed sequence (\(P(O|\varTheta )\)), HELP utilizes the forward algorithm along with a k-order Markovian assumption. Inspired by the work of Ghosh et al. [34], we use a variable \(a^k\) to extract a sequence of observed variables of length k. This allows HELP to be modelled from a historical sequence of length k, which is beneficial for monitoring the health status of a user during a medical test (say, 6MWT). Thus, the observation probability obtained from the forward algorithm can be represented as:

$$\begin{aligned} {P(a^k|\varTheta )=\sum _{t=1}^{length_{max}} P(a^k|h_t^k) * P(h_t^k) } \end{aligned}$$
(11)

where \(length_{max}\) is the maximum length of the hidden states and \(h_t^k\) denotes sequences of hidden states of length k. Note that we use the idea of a k-order Markov chain to predict the output depending on the k most recent sequences.

Fig. 4 Smart ambulance equipped with BAN and femtolet

After aggregation, we get:

$$\begin{aligned} {\begin{aligned} P(a^k|\varTheta )=&\sum _{t=1}^{length_{max}}\left[ \prod _{i=1}^{k} P(a(i)|h_t(i)) \right. \\&\left. * P(h_t(i)|h_t(i-1),h_t(i-2),\ldots ,1)\right] \end{aligned} } \end{aligned}$$
(12)
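To make the likelihood computation concrete, the sketch below implements the standard forward algorithm for a first-order HMM and applies it to the window of the k most recent observations. It is a simplification of Eqs. (11) and (12), since the transitions here condition on one previous state only rather than on k of them. The state names, observation symbols and all probabilities are hypothetical toy values, not parameters learned by HELP.

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) for a first-order HMM via the forward algorithm."""
    # Initialisation: probability of starting in s and emitting obs[0].
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Recursion: propagate through transitions, then weight by the emission.
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    # Termination: marginalise over the final hidden state.
    return sum(alpha.values())


def window_likelihood(obs, k, states, start_p, trans_p, emit_p):
    """Likelihood of the k most recent observations (the a^k window)."""
    return forward_likelihood(obs[-k:], states, start_p, trans_p, emit_p)


# Hypothetical toy model: two health states, two observation symbols.
STATES = ("normal", "abnormal")
START_P = {"normal": 0.8, "abnormal": 0.2}
TRANS_P = {
    "normal": {"normal": 0.9, "abnormal": 0.1},
    "abnormal": {"normal": 0.3, "abnormal": 0.7},
}
EMIT_P = {
    "normal": {"ok": 0.9, "high_temp": 0.1},
    "abnormal": {"ok": 0.2, "high_temp": 0.8},
}
```

For example, `window_likelihood(["high_temp", "high_temp", "ok"], 2, STATES, START_P, TRANS_P, EMIT_P)` scores only the last two readings, which mirrors the idea of evaluating a bounded history during a test such as the 6MWT.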

The modules of STROVE, coupled with the data collected by all other sensors, help in predicting whether the patient can be classified as a potential or expected positive case of COVID-19 or of a pandemic of similar nature. For example, a patient with an oxygen level below 90%, high body temperature and abnormal pulse readings will be classified as a positive case of COVID-19 by the disease prediction algorithm used. Furthermore, the mobility analysis module detects whether there has been possible contact with an infected person. It may be noted that the disease prediction algorithm employed is for research demonstration purposes only. The positive case results are notified to the respective users and healthcare authorities by the fog node, so that confirmatory tests can be initiated at the health care centre for such individuals and adequate preventive measures can be taken at the earliest. If the confirmatory test results are negative, this can be fed back to the prediction algorithm so that it can learn and improve its prediction accuracy.
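The rule in the example above can be written as a small screening function. This is a demonstration sketch, not the prediction algorithm used by STROVE and not medical guidance: only the 90% oxygen threshold comes from the text, while the fever threshold (38.0 °C), the resting pulse range (60 to 100 bpm) and all names are assumptions introduced for illustration.

```python
def screen_vitals(spo2_pct, body_temp_c, pulse_bpm, possible_contact=False):
    """Flag a single reading using simple threshold rules.

    Only the SpO2 threshold (below 90%) is taken from the text; the other
    two thresholds are assumed values chosen for this sketch.
    """
    low_oxygen = spo2_pct < 90.0                       # stated in the text
    fever = body_temp_c >= 38.0                        # assumed threshold
    abnormal_pulse = not (60 <= pulse_bpm <= 100)      # assumed resting range
    return {
        "low_oxygen": low_oxygen,
        "fever": fever,
        "abnormal_pulse": abnormal_pulse,
        # All three signs together mark the reading as a suspected case.
        "suspected": low_oxygen and fever and abnormal_pulse,
        # Contact information comes from the mobility analysis module.
        "possible_contact": possible_contact,
    }
```

A reading such as `screen_vitals(88, 38.6, 112)` is marked as suspected, whereas `screen_vitals(97, 36.8, 72)` is not.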

3.4 Faster health status prediction

During a pandemic there is a scarcity of beds in health care centres, so the nearby hospital may not have beds available. In that case, the patient may have to travel to a health centre that is far away. In such a situation, health status prediction and, consequently, health service provisioning during the travel time are required. Up-to-date information regarding the availability of beds in the hospitals is also important, so that the patient does not have to move from one hospital to another to get admission, or wait for a long time for admission after reaching a hospital, which has become a common picture in developing countries. In this paper, we suggest the use of a femtolet [27] inside the ambulance. The femtolet works as a fog node and is connected with the cloud. In the smart ambulance, sensor nodes are attached to the patient’s body to collect the health parameter values and send them to a microcontroller, which accumulates the collected values from all sensors and sends them to the femtolet (see Fig. 4). Inside the femtolet, the collected health data are processed based on the user’s health profile and contextual information. Based on the predicted health status, the femtolet displays the current condition and can communicate with the microcontroller if required, so that the required health care can be provided to the patient according to the current SPO2 level, blood pressure, body temperature etc. The user’s GPS data are transmitted to the cloud through the femtolet. The femtolet collects up-to-date information from the cloud regarding the availability of beds in the currently nearby hospitals. Based on the proposed mobility prediction module, the optimal path to a hospital that is able to provide the required health care facilities and has beds available is predicted.
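The routing idea in the last step can be illustrated with a classical shortest-path search that stops at the closest hospital reporting a free bed. This sketch uses Dijkstra's algorithm as a stand-in for the learned mobility prediction module, which it does not reproduce; the graph layout, node names and bed counts are hypothetical.

```python
import heapq


def nearest_hospital_with_bed(graph, source, beds):
    """Dijkstra search from `source` to the closest node with a free bed.

    graph: dict mapping node -> {neighbour: non-negative travel cost}
    beds:  dict mapping hospital node -> number of free beds
    Returns (cost, path) or None if no reachable hospital has a bed.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if beds.get(u, 0) > 0:
            # First settled hospital with a bed is the closest one.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None
```

Because nodes are settled in order of increasing cost, a nearer hospital without beds is passed over automatically and the search continues to the next candidate.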

During travel, connection interruption is a major issue. Using the femtolet to process the health data provides faster health status prediction than using distant cloud servers. This in turn reduces both the overhead of the cloud servers and the network traffic. Moreover, continuous health monitoring is very important for such a disease, and the femtolet facilitates it because it is attached to the ambulance.

To calculate the delay in health status prediction inside the smart ambulance, we consider the data collection delay, data transmission delay, data processing delay and propagation delay. Let the delay in collecting the health data for all parameters and accumulating them in the microcontroller be \(L_c\). The data transmission delay from the microcontroller to the femtolet is given as,

$$\begin{aligned} L_u=(1+f_u)\cdot (D_h/R_u) \end{aligned}$$
(13)

where \(D_h\) is the collected health data, \(R_u\) is the rate of data transmission from the microcontroller to the femtolet, and \(f_u\) is the link failure rate from the microcontroller to the femtolet.

The delay in health data processing inside the femtolet is given as,

$$\begin{aligned} L_p=D_{p}/R_p \end{aligned}$$
(14)

where \(D_p\) is the amount of data processed inside the femtolet to predict the health status and \(R_p\) is the data processing speed of the femtolet.

The propagation delay is given as,

$$\begin{aligned} L_{pr}=d/S_p \end{aligned}$$
(15)

where d is the distance between the microcontroller and the femtolet and \(S_p\) is the propagation speed.

The femtolet sends the predicted health status data to the microcontroller. The delay in transmission of the result is given as,

$$\begin{aligned} L_d=(1+f_d)\cdot (D_r/R_d) \end{aligned}$$
(16)

where \(D_r\) is the amount of result data, \(R_d\) is the rate of data transmission from the femtolet to the microcontroller, and \(f_d\) is the link failure rate from the femtolet to the microcontroller.

Hence, the total delay in health status prediction and display is given as,

$$\begin{aligned} L_t=L_c+L_u+L_p+L_{pr}+L_d+L_s \end{aligned}$$
(17)

where \(L_s\) is the delay to display the result.
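Equations (13) to (17) translate directly into a few lines of code. The sketch below evaluates them for one set of inputs; the function names and units (bits, bits per second, metres, seconds) are assumptions, since the text does not fix units.

```python
def transmission_delay(data, rate, failure_rate):
    """Eqs. (13) and (16): (1 + f) * (D / R)."""
    return (1.0 + failure_rate) * (data / rate)


def total_delay(L_c, D_h, R_u, f_u, D_p, R_p, d, S_p, D_r, R_d, f_d, L_s):
    """Eq. (17): total delay in health status prediction and display."""
    L_u = transmission_delay(D_h, R_u, f_u)   # Eq. (13): uplink to the femtolet
    L_p = D_p / R_p                           # Eq. (14): processing in the femtolet
    L_pr = d / S_p                            # Eq. (15): propagation
    L_d = transmission_delay(D_r, R_d, f_d)   # Eq. (16): result back to microcontroller
    return L_c + L_u + L_p + L_pr + L_d + L_s
```

With illustrative values such as `total_delay(0.1, 8000, 1e6, 0.05, 1e6, 1e8, 3.0, 3e8, 800, 1e6, 0.0, 0.02)`, the collection and display terms dominate, while the propagation term over a few metres is negligible.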

Fig. 5 presents the delay in health status prediction and display for the proposed system and the cloud-only system. It is observed that the proposed femtolet-based system reduces the delay in health status prediction by \(\sim 50\)% with respect to the cloud-only system.

Fig. 5 Delay in individual’s health status prediction inside ambulance

Fig. 6 Android app for preliminary health status monitoring

4 Experimental evaluations

In this section, we present the performance evaluation of the STROVE framework to demonstrate the efficacy of the system. Specifically, we evaluate the framework with a few real data instances collected during the study. Further, the scalability of STROVE is evaluated using a simulation study in the iFogSim toolkit.

Fig. 7 Comparison of execution time of mobility analytics module

Fig. 8 Comparison of precision (path prediction) of mobility analytics module

Table 2 Comparison of accuracy measure for identifying suspected person and health condition

4.1 Experimental results

During this study, we collected users’ activity, health and other contextual datasets from the Kharagpur (22.31454, 87.306) and Kolkata (22.5379, 88.3682) regions of India. The collected dataset includes all parameters captured by the wearable sensor [35] at different time intervals over a duration of 14 days. The dataset was also captured under varied environmental conditions to validate our proposed methodology. It contains health data (body temperature, blood pressure, pulse rate, SPO2), activity data (walking, running, climbing upstairs, downstairs etc.), environmental parameters (air temperature, humidity, air pressure etc.), and movement traces (latitude, longitude). The sampling interval of the data samples is 60 s. The total number of participants in the study is 128. The total size of the dataset is \(\sim 2.1 \; GB\) (Footnote 4).

Table 3 Performance metrics of the proposed mobility analytics module for path prediction with baseline methods

Fig. 9 Knowledge representation to extract useful information

To evaluate the efficiency of the proposed framework, we set up different conditions and measure the accuracy of identifying the activity and the actual health condition. We compare STROVE with five baselines and report the accuracy for all of these conditions. Table 2 presents the accuracy of the proposed framework along with the baseline methods, namely Bayesian Model, KNN, Decision Tree (DT), SVM and NN. The parameter k for KNN is set to 3. We have chosen the radial basis function (RBF) as the activation function in NN. A linear kernel is selected for SVM. The results of different runs are captured and the average accuracy is reported. In the experiment, we also evaluate the accuracy of identifying suspected people, and we capture the accuracy of the methods in different contexts, such as when the user is infected or when the user has been in close contact with an infected person. It is observed that STROVE performs exceptionally well in identifying the health conditions of users compared to the baseline methods, outperforming them by a significant margin of \(\approx \) 24.8%. The key reason for this observation is that the refinement layer of the proposed HELP helps in removing false positives and identifies the health status of the users efficiently.

Fig. 10 Simulated scenario of the proposed healthcare framework in iFogSim

Fig. 11 End-to-end delay in healthcare framework

Fig. 12 Energy consumption of the system

An Android application (app) called STROVE has also been developed for personal health assistance. The app collects the health parameters, the profile of the user and other environmental context at different time-scales. The accumulated data are sent to fog nodes for processing. Next, STROVE analyses the information and determines whether the health status is normal. For instance, if it detects that the body temperature is high, the pulse rate is abnormal and the SPO2 level is low, the user is advised to stay at home immediately. STROVE also contacts the nearby health-care center through the back-end SDI so that medical help is provided to the user (refer Fig. 6). The user can thus get medical help faster than in the traditional system, where the user has to search for a facility and contact a medical practitioner for further medical tests. STROVE provides an easy and efficient system for preliminary health checking, obtaining information, and receiving medical attention immediately. STROVE also analyses the mobility and contact patterns of individuals and predicts persons suspected of COVID-19 infection. In addition, the Android app is used to send alert notifications to users when the health parameters are unusual, with a recommendation to stay at home and take a COVID-19 test. Users can report directly in the app whether they have tested positive or negative. STROVE also utilizes VGI data, which it stores and incorporates in the architecture. Therefore, information from individuals and the crowd can be utilized to identify corner cases (false positives and true negatives) that are not identified by the data analytics module.

Next, we evaluate the mobility analytics module for path prediction. The execution time of the mobility analysis module is shown in Fig. 7. It is observed that STROVE significantly outperforms the other approaches. The key reason is that the information is modelled and stored using an efficient indexing and graph-based storage framework. STROVE is also compared in terms of precision for predicting the path and nearby health-care centers. Figure 8 shows the precision of the path finding module. Our method achieves precision values of 0.95 and 0.88 with 10 and 50 stay-points in the road-graph, respectively. The results are shown as a stacked bar chart to represent the differences between configurations more clearly. The results demonstrate high precision as well as lower execution time compared to other existing approaches. Table 3 presents the experimental results of the mobility analytics module of STROVE compared with the baseline methods. The accuracy denotes the effectiveness of extracting the optimal path in the region. Stability represents the robustness of the system. The learning cost of the model estimates the time needed to learn the model parameters from the training dataset. Here, the input cardinality of the model is the area of the study region. The learning cost and modelling cost are grouped into three categories according to the model training time and the complexity of the model implementation. In several aspects, STROVE outperforms the other approaches by a significant margin, which demonstrates the effectiveness of its modelling and analysis techniques.

We also show a sample of the knowledge graph structure deployed at the backend of STROVE. Figure 9 illustrates a snapshot of such a graph. There are different entities, such as location (marked in blue), healthcare center (marked in green), users/persons (marked in pink) and resources (marked in gray). The entities have specific relationships amongst them, shown as the edges connecting the nodes. The mobility history, available medical resources etc. are managed and represented in the spatial data infrastructure of STROVE.
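A knowledge graph of this kind can be sketched as a set of typed entities and labelled edges. The following is a minimal in-memory illustration and not the storage framework used by STROVE: the four entity kinds follow the text, whereas the class, the relation names (`visited`, `located_in`, `has_resource`) and the sample data are hypothetical.

```python
class KnowledgeGraph:
    """Tiny typed graph stored as (subject, relation, object) triples."""

    def __init__(self):
        self.kind = {}    # entity name -> entity kind
        self.edges = []   # list of (subject, relation, object)

    def add_entity(self, name, kind):
        self.kind[name] = kind

    def relate(self, subj, relation, obj):
        # Both endpoints must be registered entities.
        if subj not in self.kind or obj not in self.kind:
            raise KeyError("unknown entity")
        self.edges.append((subj, relation, obj))

    def objects(self, subj, relation):
        return [o for s, r, o in self.edges if s == subj and r == relation]

    def subjects(self, relation, obj):
        return [s for s, r, o in self.edges if r == relation and o == obj]


def centres_near_user(kg, user, resource):
    """Healthcare centers in locations the user visited that hold `resource`."""
    found = []
    for loc in kg.objects(user, "visited"):
        for centre in kg.subjects("located_in", loc):
            if resource in kg.objects(centre, "has_resource"):
                found.append(centre)
    return found


# Hypothetical sample data using the four entity kinds named in the text.
kg = KnowledgeGraph()
kg.add_entity("user_1", "person")
kg.add_entity("loc_A", "location")
kg.add_entity("centre_1", "healthcare_center")
kg.add_entity("centre_2", "healthcare_center")
kg.add_entity("oxygen_bed", "resource")
kg.relate("user_1", "visited", "loc_A")
kg.relate("centre_1", "located_in", "loc_A")
kg.relate("centre_2", "located_in", "loc_A")
kg.relate("centre_1", "has_resource", "oxygen_bed")
```

Here `centres_near_user(kg, "user_1", "oxygen_bed")` walks the person, location, center and resource edges in turn, which is the kind of traversal that links mobility history to available medical resources.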

4.2 Simulation results

The proposed framework is simulated in iFogSim. Fig. 10 presents the simulated scenario. As observed from the figure, five types of sensors are used for collecting body temperature, systolic and diastolic blood pressure, SPO2 and ECG, and an actuator is also used. The sensor nodes and the actuator are connected to a microcontroller. The microcontroller is connected to the femtolet, and the femtolet is connected to the cloud. The end-to-end delay (data collection, processing and predicting the result) and the energy consumption of the proposed femtolet-based framework are observed for different uplink and downlink data transmission rates. Fig. 11 presents the end-to-end delay of the proposed framework and the cloud-only system. It is observed that the proposed framework reduces the delay by \(\sim \)55% compared with the cloud-only system. Fig. 12 presents the energy consumption of the proposed framework and the cloud-only system. It is observed that the proposed framework reduces the energy consumption by \(\sim 90\%\) compared with the cloud-only system. Because the data processing for health status prediction takes place inside the nearby femtolet instead of the remote cloud, the proposed approach outperforms the cloud-only system with respect to both delay and energy consumption. Thus, we can refer to the proposed framework as a delay-aware and energy-aware health status prediction framework.

5 Conclusion

This paper proposes a novel framework that facilitates an efficient COVID-19 management system. The hierarchical framework with Cloud–Fog–Edge layers reduces the network usage, the cost of execution and the delay. The scope of the proposed solution is wide; it assists in the early identification of individual COVID-19 suspects who require intervention to control the spread. The hierarchical architecture can be used to constantly monitor home-quarantined patients and to give timely notice when intensive hospitalised care is required. This will significantly reduce the pressure on health resources during a pandemic, which is a major challenge in developing countries. The proposed framework can be modified and used with other models in the future to develop low-cost solutions for the clinical diagnosis of other pandemics and ailments. The framework will continue to be relevant even when there is no threat of a pandemic, as it can be used for remote and continuous monitoring of senior citizens’ health. As the suggested framework also employs a GPS sensor and mobility analysis, it can be utilised for identifying disease patterns and their endemic nature. Once such patterns are identified, medical supply chains can be automated much more efficiently.