1 Introduction

In recent years, dynamic social networks have been leveraged in various fields to analyze time-varying events involving social connectivity, including computational epidemiology, where they are used to study the propagation of diseases and other public health issues [1, 2]. Public health, in particular, has come under increased scrutiny as advances in information technology play a growing role in health care and medicine. Obesity has been identified, alongside cancer, as one of the biggest epidemics of the \(21^{st}\) century [3, 4]. The World Health Organization (WHO) defines overweight and obesity as medical conditions in which abnormal or excessive fat accumulation may impair health. Body mass index (BMI) is a weight-for-height index commonly used to classify adults as overweight (BMI over 25) or obese (BMI over 30) [5]. The major health concern associated with being overweight or obese is the increased risk of developing other, more chronic ailments [6, 7]. No longer exclusive to urban areas in high income countries, obesity has reached a pandemic stage, with populations from middle income countries showing increased cases of obesity [7]. Classified as a disease by the American Medical Association in 2013, obesity has been identified even amongst an increasing population of children [8]. With over \(10\%\) of the world’s population being obese or overweight, the spread of obesity is a major cause for concern in the modern world [9].

In this context, as overweight and obesity propagate over human interaction networks, predicting which individuals are going to be overweight or obese in the future is an important problem. However, the problem has received limited attention from the network analysis and machine learning perspective to date. A few recent works applied regression based techniques to such prediction problems [10]. To the best of the authors’ knowledge, the performance of alternative scalable supervised learning models such as Convolutional Neural Networks (CNN) has been explored only to a limited extent in this context.

In this work, we particularly focus on the problem of predicting overweight status amongst individuals connected through a dynamically changing temporal interaction network. We propose a scalable convolutional regression framework for the prediction task. Further, referring to [11], the proposed model considers the major contributing factors to the propagation of weight gain in the current or past neighborhoods of nodes in the network. We propose various schemes for encoding self-influence and social influence factors, both present and historical, in the propagation of overweight. The proposed CNN based model, followed by multiple hidden layers of non-linear transformations, results in a rich and compact feature representation of the self and social influences of the population at previous time periods. Finally, the regression layer predicts the weight gain state of the population in the future. We extend previous work in this area that uses convolutional regression frameworks by exploring combinations of the influence schemes used previously, as well as by varying the depth and width of the architecture of the predictive model [12]. Thorough experiments on synthetic as well as real networks demonstrate the superiority of the proposed method over the existing baseline approaches.

The paper is organized as follows. Section 2 formally defines the problem. The proposed method is illustrated in Sect. 3 and the implementation details are described in Sect. 4. Sections 5, 6 and 7 describe the data sets used in the experiment, the experimental setup, and the results and observations, respectively. Section 8 gives a brief discussion of related work. Section 9 concludes with a general summary of the paper and possible future work.

2 Problem Definition

In real life, social networks evolve over time – that is, the actors may form new ties or even lose old ties with the passage of time. Additionally, the actors (or nodes) exhibit changes in behavior (or features) over time. Let \(G_t = (V_t, E_t)\) be the undirected graph representing social ties \(E_t\) amongst the set of n actors \(V_t\) at time instance t. Let \(F_t=\{F_t^1,F_t^2,...,F_t^n\}\) represent the feature sets of all actors \(V_t\) at time t. Here \(F_t^i\) is the set of features of node i, where \(F_t^i=\{f_t^1,f_t^2,...,f_t^k\}\) and each \(f^j_t\in \mathbb {R}\), \(1 \le j \le k\), is a particular feature of node i at time t.

The general problem of behavior prediction can be defined as follows: given a temporal network \(G_{1..T}\) observed during time instances \(t=\{1, \ldots , T\}\) and the set of node features \(F_{1..T}\) observed during \(t=\{1, \ldots , T\}\), predict one or multiple features of nodes at time \(T+1\), represented by \(F_{T+1}\).

In the current scope of work, we particularly focus on the prediction of the propagation of weight gain over a social contact network, defined as follows: given \(<G_{1, \ldots , T}, F_{1, \ldots , T}>\) and a node i in the network, predict whether i will have an increase in weight at time \(T+1\).

The varying degrees of influence by the neighbors of a node in deciding its future behavior, the temporal change in contact network and the sparsity of connections make the weight gain prediction on a temporal network a particularly challenging problem.

Fig. 1. (a) Flowchart of weight gain prediction via convolutional regression network. (b) Illustration of the convolution and pooling layers of the convolutional regression model

3 The Proposed Method

We propose a convolutional regression framework to model the behavior prediction problem. The basic steps of the method are described using the flowchart in Fig. 1a. The flowchart begins with a basic preprocessing step where non-repeating nodes are removed, height and weight are converted to BMI etc. This is followed by feature extraction for nodes and their neighbor connectivity data at multiple time instances from Health and Bluetooth Proximity Reports. These features are used to generate neighborhood influence models. Multiple influence schemes have been explored in this work for modeling weight gain. The neighborhood influence features obtained are aggregated with user features to form the dataset for a convolutional regression framework which is used to predict the propagation of weight gain.

3.1 Modeling Influence

In a social network, the behavior of a node can be influenced by multiple factors. Along with the influence of their own behavior at past instances of time, there is a tendency for nodes to exhibit behavior similar to what is demonstrated by their neighbors, which is the well-known principle of homophily [13]. Thus, we consider self-influence and influence by neighbors (social influence) separately in our model. Both self-influence and social influence are derived from the current interaction record and current feature values along with the past instances of interaction and feature values. Given \(<G_{1, \ldots , t}, F_{1, \ldots , t}>\), we make the simplifying assumption that \(F_{t+1} = f(G_{t-1}, F_{t-1}, G_t, F_t)\). That is, \(F_{t+1}\) is determined by the current state of the system and the immediate past state of the system only, where a state comprises both the interaction record and the feature values, and \(F_{t+1}\) is independent of all earlier historical states.

Further, the proposed model considers four major factors that contribute to the propagation of weight gain (obesity), namely, unhealthy diets, recent weight gains, obesity and inactivity, in the current or past neighborhoods of nodes in the network. The various schemes used for determining the social influence are discussed below.

Weighted Interaction Scheme (WIS). Binary connections alone are not a sufficiently good indicator of the degree of neighbors’ influence. For example, let nodes j and k both influence node i to gain weight at time t. While j only exhibits the factor recent weight gain, k exhibits the factors obesity, inactivity and unhealthy diet. In such a scenario, the degree of influence of node k is assumed to be greater than that of node j. In other words, two neighbors of node i, node j and node k, will have differing influence on node i if they have different numbers of associated weight gain causative factors.

To address the issue, we propose to use weighted connections where weights are derived based on the number of weight gain causative factors a neighbor of a node exhibits. This enables us to retain the connectivity information of a node, while incorporating the varying degree of influence of each neighbor.

Formally, a connection weight \(W_i^t(j)\) associated with edge \(i-j\) at time stamp t is derived as

$$\begin{aligned} W_i^t(j) = \sum _{p \in \varPi } \mathbf {I}_p^t(j) \end{aligned}$$
(1)

where the set of causative factors for weight gain is \(\varPi = \{\) overweight, recent weight gain, bad diet, inactivity \(\}\) and the indicator function \(\mathbf {I}_p^t(j) = 1\) if node j demonstrates a trait \(p \in \varPi \) at time t and 0 otherwise. For a node pair (i, j) where no connection exists between i and j, \(W_i^t(j) = 0\). The input features \(X_i^t\) for node i under this scheme are generated by concatenating \(F_i^{t-1}, W_i^{t-1}, F_i^t, W_i^t\).
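As an illustration of Eq. 1, the connection weights for one snapshot can be computed from an adjacency matrix and a binary trait matrix. This is a minimal sketch rather than the implementation used in the experiments; the function name `wis_weights`, the array layout and the toy data are assumptions made for the example.

```python
import numpy as np

def wis_weights(adjacency, traits):
    """Connection weights W[i, j] of Eq. 1 for one snapshot.

    adjacency: (n, n) 0/1 array, adjacency[i, j] = 1 if i and j are connected.
    traits:    (n, 4) 0/1 array, one column per causative factor in Pi.
    """
    adjacency = np.asarray(adjacency)
    traits = np.asarray(traits)
    factor_count = traits.sum(axis=1)         # number of factors exhibited by each node j
    return adjacency * factor_count[None, :]  # stays 0 wherever no edge i-j exists

# Toy snapshot (hypothetical): node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
# Columns: overweight, recent weight gain, bad diet, inactivity.
P = np.array([[0, 0, 0, 0],
              [0, 1, 0, 0],   # node 1: recent weight gain only
              [1, 0, 1, 1]])  # node 2: three factors
W = wis_weights(A, P)
print(W)
```

In this toy case node 2 receives a larger weight than node 1 in the row of node 0, mirroring the j versus k example given for the scheme. Note that the weights are not symmetric, since \(W_i^t(j)\) depends only on the traits of j.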

Average Neighbor Feature Scheme - I (ANFS-I). Another simple way of representing the social influence of neighbors is to use a weighted average of the feature values of the neighbors of a node at a given point in time. The intuition behind this scheme is that for any node i, if neighbor j has more weight gain causation factors than the average number of weight gain causation factors in the neighborhood of i, then j is more likely to have a higher influence on i. This scheme allows us to emphasize the features of a highly influential neighbor.

Formally, this scheme determines social influence on node i by assigning weights \(N_i^t\) to the corresponding feature set, such that,

$$\begin{aligned} N_i^t = \sum _{j\in \mathbb {N}(i)} \frac{W_i^t(j) }{\bar{W}_i^t} \times F_j^t \end{aligned}$$
(2)

where \(F_j^t\) is the feature vector for node j at time t, \(W_i^t(j)\) is given by Eq. 1, \(\mathbb {N}(i)\) is the set of neighbors j of node i for which \(W_i^t(j) > 0\), and \(\bar{W}_i^t\) is the average connection weight of the neighbors of node i, given by \(\frac{1}{|\mathbb {N}(i)|} \sum _{k \in \mathbb {N}(i)} W_i^t(k)\).
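Eq. 2 can be sketched in a few lines of NumPy. The function name `anfs1` and the toy matrices are assumptions for illustration, and returning a zero vector for a node with no influential neighbor is a choice made here, since the text does not specify that case.

```python
import numpy as np

def anfs1(W, F):
    """Social-influence vectors N[i] of Eq. 2.

    W: (n, n) WIS weights from Eq. 1, W[i, j] > 0 only for influential neighbors.
    F: (n, k) feature matrix, row j is the feature vector of node j.
    """
    W = np.asarray(W, dtype=float)
    F = np.asarray(F, dtype=float)
    n_neighbors = (W > 0).sum(axis=1)  # |N(i)|
    total = W.sum(axis=1)
    # Average connection weight over N(i); left at 0 for isolated nodes.
    mean_w = np.divide(total, n_neighbors,
                       out=np.zeros_like(total), where=n_neighbors > 0)
    # W_i(j) / mean_w_i, left at 0 where the average is 0.
    scaled = np.divide(W, mean_w[:, None],
                       out=np.zeros_like(W), where=mean_w[:, None] > 0)
    return scaled @ F

W = np.array([[0, 1, 3],
              [0, 0, 0],
              [0, 0, 0]])
F = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 4.0]])
N = anfs1(W, F)
print(N)
```

For node 0 the average weight is 2, so the neighbors are scaled by 0.5 and 1.5 respectively, and the neighbor with more causative factors dominates the resulting vector.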

Average Neighbor Feature Scheme - II (ANFS-II). An alternative to ANFS-I is based on the intuition that for any node i, if the neighborhood of neighbor j has more weight gain causation factors than the mean of the neighborhood of other neighbors of i, then j is more likely to have higher influence on i. In this scheme, the feature weight is computed using a normalized degree count of node j to weigh the influence it has on node i.

Formally, this scheme determines social influence on node i by assigning weights \(N_i^t\) to the corresponding feature set, such that,

$$\begin{aligned} N_i^t = \sum _{j\in \mathbb {N}(i)} \frac{D_j^t }{\bar{D}_i^t} \times F_j^t \end{aligned}$$
(3)

where \(F_j^t\) is the feature vector for node j at time t, \(D_j^t\) is given by \(\sum _{k \in \mathbb {N}(j)} W_j^t(k)\), \(W_i^t(j)\) is given by Eq. 1, \(\mathbb {N}(i)\) is the set of neighbors j of node i for which \(W_i^t(j) > 0\), and \(\bar{D}_i^t\) is given by \(\frac{1}{|\mathbb {N}(i)|} \sum _{k \in \mathbb {N}(i)} D_k^t\).
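The following sketch mirrors Eq. 3 under the same conventions as Eq. 1. The name `anfs2`, the four-node toy graph and the zero fallback for isolated nodes are assumptions made for the example.

```python
import numpy as np

def anfs2(W, F):
    """Social-influence vectors N[i] of Eq. 3.

    W: (n, n) WIS weights from Eq. 1.
    F: (n, k) feature matrix.
    """
    W = np.asarray(W, dtype=float)
    F = np.asarray(F, dtype=float)
    D = W.sum(axis=1)             # D_j: total causative-factor weight around node j
    mask = (W > 0).astype(float)  # mask[i, j] = 1 if j is in N(i)
    n_neighbors = mask.sum(axis=1)
    # Mean of D over the neighbors of i; left at 0 for isolated nodes.
    mean_d = np.divide(mask @ D, n_neighbors,
                       out=np.zeros_like(D), where=n_neighbors > 0)
    # D_j / mean_d_i for each neighbor j of i, 0 elsewhere.
    ratio = np.divide(mask * D[None, :], mean_d[:, None],
                      out=np.zeros_like(W), where=mean_d[:, None] > 0)
    return ratio @ F

# Hypothetical graph with edges 0-1, 0-2, 2-3; every node shows one factor, so W equals the adjacency.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
F = np.array([[1.0], [3.0], [6.0], [0.0]])
N = anfs2(W, F)
print(N.ravel())
```

Here node 2 has a richer neighborhood than node 1, so in the influence vector of node 0 its feature is scaled by 4/3 while the feature of node 1 is scaled by 2/3.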

ANFS-I can be considered to model the direct influence of the neighbors on a node, while ANFS-II can be considered a measure of transitive neighbor influence on a node. The input features \(X_i^t\) for node i under the schemes ANFS-I and ANFS-II are generated by concatenating \(F_i^{t-1}, N_i^{t-1}, F_i^t, N_i^t\).

3.2 Feature Extraction Using CNN

Convolutional Neural Networks. A Convolutional Neural Network (CNN) is an adaptation of the multi-layered deep neural architecture that reduces the number of free parameters [14]. This is achieved by the weight sharing principle, which makes use of local receptive fields whose parameters are forced to be identical for all possible locations. Schematically, the CNN architecture is a succession of alternating convolution layers (to capture salient information) and sub-sampling layers (to reduce dimension), both with trainable weights.

Although CNN models were historically used in the areas of image and video processing, recent deeper CNN architectures have enjoyed success in multiple research areas [15,16,17]. To the best of the authors’ knowledge, the applicability of CNNs in modeling human behavior dynamics over social networks remains largely unexplored, in spite of the nature of the problem being inherently well matched to how a CNN operates. Human behavior prediction over a social network involves learning a global function which emerges from the locally correlated behavior of a node and its neighbors, while a CNN exploits the local correlation within a small data neighborhood, where the data neighborhood serves as the receptive field that reads feature values from the input space. This makes the CNN a promising model for learning and predicting human behavior. Besides, the shared weight architecture of a CNN helps us model the contribution of self and neighborhood influence.

Table 1. Mean and Max AUROC with various influence schemes on the CNN model
Table 2. Variation in AUROC with change in model architecture – consisting of Convolutional (Conv), Pooling (Pool) and Fully Connected (FC) layers

Shared Weights for Social Network Data Representations. The key requirement for modeling data with a convolutional neural network is to adopt the concept of shared weights for that particular data representation. Hence, the use of shared features for the problem must be justified. This requires the filter shape, which searches for common features, to be customized in an intuitive manner. We also discuss the appropriateness of our data representation schemes in the context of weight sharing. For the current problem, the usability of shared weights is interpreted in two areas: (1) user features denoting self-influence, (2) neighbor influence representations.

Nodes demonstrating similar behavioral outcome, irrespective of the neighborhood, are expected to share common features of self-influence as those nodes can be said to work in similar ways under self-influence. For example, frequent exercise habits of nodes i and j are expected to result in weight loss for both the nodes.

Under the WIS scheme, for two identical nodes, the degree of social influence exerted by a common neighbor is expected to be similar. For example, if nodes i and j both have a connection to node k, who is obese, k can be expected to influence both i and j similarly. For the ANFS schemes, the weighting already captures the influence of the different neighbors, and the resultant feature vector can be said to contribute towards social influence for different nodes. In the ANFS representation, an example similar to that for self-influence applies: in the presence of a feature vector which provides a compact representation of neighbor influence, nodes i and j will be influenced similarly by each component of that feature vector. Hence, shared weights can be applied in this case as well.

Using filters of size \(n \times 1\), where n is the length of feature vector representing a node, each filter f can be said to learn the mapping,

$$\begin{aligned} g :X_t \longrightarrow \mathbb {R}^m \end{aligned}$$
(4)

where, m is the number of nodes in the social network and each component of the co-domain is the output of a neuron.

Thus, if \((x_1,x_2, ..., x_m) \in \mathbb {R}^m\), then \(x_i = \phi (M_i^t W_f)\), where \(M_i^t\) is a row vector of the features of node i at time t, \(W_f\) is a column vector of the weights of filter f and \(\phi (.)\) is the activation unit of the neuron. In the present work rectified linear activation units (ReLU) are used [18]. This entire procedure is computationally performed as a convolution, thus speeding up calculations when generating the feature maps. The extracted feature maps are higher level representations of the raw behavioral input and hence, aid in extracting salient features for the prediction problem.
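The per-filter computation \(x_i = \phi (M_i^t W_f)\) amounts to one matrix-vector product followed by a ReLU. The sketch below uses made-up numbers and the hypothetical names `feature_map`, `M` and `w_f`; it only illustrates that the same filter weights are applied to every node.

```python
import numpy as np

def feature_map(M, w_f):
    """Apply one shared filter to every node: x_i = relu(M_i . w_f).

    M:   (m, n) matrix, row i holds the n input features of node i.
    w_f: (n,) weights of filter f, shared across all m nodes.
    """
    return np.maximum(M @ w_f, 0.0)

M = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [-3.0, 1.0]])
w_f = np.array([1.0, -1.0])
x = feature_map(M, w_f)
print(x)  # one activation per node
```

Stacking several filters as the columns of an (n, number of filters) matrix would give one feature map per column with a single matrix product.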

Max Pooling of Feature Maps. Max Pooling is used for pooling the outputs of multiple neurons across feature maps [19]. The pooling operation helps reduce the problem complexity and overlapping pooling filters were used for better performance of the model.
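Overlapping max pooling can be illustrated on a single feature map. The sketch assumes pooling runs along the node axis with a window of 4 and a stride of 2, which is one reading of the \(4\times 1\) pool shape and \(2 \times 1\) stride given in Sect. 4; the function name and the input values are hypothetical.

```python
import numpy as np

def max_pool_1d(x, size=4, stride=2):
    """Max pooling over a 1-D feature map without padding.

    With stride < size, consecutive windows overlap.
    """
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - size + 1, stride)
    return np.array([x[s:s + size].max() for s in starts])

pooled = max_pool_1d([0.0, 1.5, 0.0, 2.0, 0.3, 0.1, 4.0, 0.2])
print(pooled)
```

Eight activations are reduced to three, and each interior activation is seen by two windows because the stride is half the window size.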

Depth of the Network. In theory, the CNN could be deepened significantly with further convolutional, pooling and fully connected layers. However, in this study, the depth of the network, and thus the number of free parameters, is constrained by the short time series data. One convolution layer and one pooling layer followed by two fully connected layers give decent results while limiting generalization issues.

3.3 RMSProp for Faster Convergence

RMSProp is a technique that adapts the learning rate for each weight in the network. It maintains a moving average of the squared gradient for each weight and divides the global learning rate by the square root of this average at each weight update step. This prevents very large or very small error gradients from drastically increasing or decreasing the size of the weight updates. RMSProp helps maintain consistency in weight updates and speeds up convergence, and it has been used while learning the proposed model.
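A single RMSProp update can be written as follows. This is a generic sketch of the standard rule; the decay of 0.9 and the small constant `eps` are common defaults assumed here, not values reported for the proposed model.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update for an array of weights.

    avg_sq is the running average of squared gradients, one entry per weight.
    """
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

w0 = np.array([1.0, 1.0])
grad = np.array([100.0, 0.01])  # one very large and one very small gradient
w1, avg_sq = rmsprop_step(w0, grad, np.zeros_like(w0))
step_sizes = w0 - w1
print(step_sizes)
```

Although the two gradients differ by four orders of magnitude, the resulting steps are almost the same size, which is the consistency in weight updates referred to above.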

3.4 Multivariate Multiple Regression

The output layer used in this model is a multivariate multiple linear regression layer. The neural network is trained to model all nodes available at a given snapshot of the network. Hence, the regression layer contains response variables corresponding to the states of weight gain of all the nodes in the entire network at a point of time. This can be represented as,

$$\begin{aligned} Y_t^{(n\times 1)} = A^{(n\times 1)} + B^{(n\times m)}\psi _t^{(m\times 1)} + \epsilon ^{(n\times 1)} \end{aligned}$$
(5)

where, \(Y_t\) is a column vector of response variables of size n where n is the number of nodes in the social network, A is a column vector of size n representing the biases, \(\psi _t\) is a column vector of size m representing the output of the m hidden units of the previous layer, B is \(n \times m\) matrix of regression coefficients and \(\epsilon \) is a column vector of size n of errors. The loss function used in the Regression layer is the l2 loss function.
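The deterministic part of Eq. 5 and the loss can be sketched as below. The toy sizes (n = 3 nodes, m = 2 hidden units) and all values are hypothetical, and the loss is taken to be the plain sum of squared errors, which is one common definition of the l2 loss.

```python
import numpy as np

def regression_output(A, B, psi):
    """Predicted responses for all n nodes: A + B psi (Eq. 5 without the error term)."""
    return A + B @ psi

def l2_loss(y_true, y_pred):
    """Sum of squared errors over all response variables."""
    return float(np.sum((y_true - y_pred) ** 2))

A = np.array([0.1, 0.0, -0.1])  # biases, one per node
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])      # (n, m) regression coefficients
psi = np.array([0.5, 0.2])      # outputs of the m hidden units
y_pred = regression_output(A, B, psi)
loss = l2_loss(np.array([1.0, 0.0, 1.0]), y_pred)
print(y_pred, loss)
```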

3.5 Preventing Over-Fitting with Dropout

Over-fitting is a particularly important problem in neural networks which are trained with a small amount of data. Because the model is constrained by the short time series data, the dropout technique [20] has been used to reduce over-fitting; it improves generalization by reducing co-adaptations between neurons.
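The effect of dropout on a layer of activations can be sketched as follows. The code uses the inverted dropout variant, in which kept units are rescaled during training so that nothing changes at test time; this variant, the function name and the fixed seed are assumptions for illustration and are not stated in the text.

```python
import numpy as np

def dropout(h, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return h
    keep = 1.0 - rate
    mask = rng.random(h.shape) < keep
    return h * mask / keep

rng = np.random.default_rng(0)
h = np.ones(10000)
out = dropout(h, rate=0.8, rng=rng)
print(out.mean())  # stays close to 1 on average
```

With a rate of 0.8, roughly four out of five units are silenced on every pass, so no unit can rely on the presence of any particular other unit.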

3.6 Combination of Influence Schemes

We try combinations of the ANFS-I, ANFS-II and WIS influence models at both the model and feature level in an attempt to improve prediction performance.

Model Level. We combine the ANFS and WIS influence schemes at the model level by using the mean as well as max of the regression outputs for their three corresponding models, to average out errors in the ensembled model.

Feature Level. While it is not possible to combine ANFS and WIS models at the feature level, it is possible to combine the two ANFS models. This is done by combining the weights derived using the ANFS-I with those derived using ANFS-II before calculating the weighted features. We explore simple additive and multiplicative combinations of the two influence models.
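Both levels of combination reduce to simple array operations. In the sketch below, `ratio1` and `ratio2` stand for the per-neighbor weights of Eqs. 2 and 3 (that is, \(W_i^t(j)/\bar{W}_i^t\) and \(D_j^t/\bar{D}_i^t\)); the function names and toy values are hypothetical.

```python
import numpy as np

def combine_predictions(pred_wis, pred_anfs1, pred_anfs2):
    """Model level: element-wise mean and max of the three regression outputs."""
    stacked = np.stack([pred_wis, pred_anfs1, pred_anfs2])
    return stacked.mean(axis=0), stacked.max(axis=0)

def combine_anfs_weights(ratio1, ratio2, F, mode="sum"):
    """Feature level: merge the ANFS-I and ANFS-II weights, then weight the features."""
    if mode == "sum":
        ratio = ratio1 + ratio2
    elif mode == "prod":
        ratio = ratio1 * ratio2
    else:
        raise ValueError("mode must be 'sum' or 'prod'")
    return ratio @ F

mean_pred, max_pred = combine_predictions(np.array([0.2, 0.9]),
                                          np.array([0.4, 0.3]),
                                          np.array([0.6, 0.6]))
ratio1 = np.array([[0.0, 1.0], [2.0, 0.0]])
ratio2 = np.array([[0.0, 3.0], [1.0, 0.0]])
F = np.array([[1.0], [2.0]])
n_sum = combine_anfs_weights(ratio1, ratio2, F, mode="sum")
n_prod = combine_anfs_weights(ratio1, ratio2, F, mode="prod")
print(mean_pred, max_pred, n_sum.ravel(), n_prod.ravel())
```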

4 Implementation Details

The Convolutional Neural Network architecture used has been carefully designed for the short time series data. The general idea of the architecture is keeping the layers of the network narrow (that is, least number of hidden units possible), and the total network as deep as possible without suffering any severe degradation in performance due to over-fitting. This minimizes the trainable weights for the entire network.

With \(|V_t|\) nodes (assumed constant for all t) and \(|X_i^t|\) features per node, the model consists of an input layer of size \(|V_t|\times |X_i^t|\), a convolutional layer of 8 filters with size of \(1 \times |X_i^t|\) and a pooling layer with pool shape of \(4\times 1\) and pool stride of \(2 \times 1\) (which enables overlapping pooling), 2 fully connected layers of 50 and 25 fully connected units respectively and a Linear Regression layer with \(|V_t|\) response variables.
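To give a feel for how small this architecture is, the layer sizes and weight counts can be tallied as below. This is a back-of-the-envelope sketch under several assumptions that are not stated in the text: 80 nodes (the approximate size of the SE dataset), a hypothetical 20 input features per node, pooling without padding along the node axis, and one bias per filter or unit.

```python
def layer_summary(num_nodes, num_features, filters=8, pool=4, stride=2, fc=(50, 25)):
    """Return (layer name, output shape, trainable parameters) for each layer."""
    rows = [("conv", (num_nodes, filters), filters * (num_features + 1))]
    pooled = (num_nodes - pool) // stride + 1  # windows that fit without padding
    rows.append(("pool", (pooled, filters), 0))
    width = pooled * filters                   # flattened input to the first FC layer
    for units in fc:
        rows.append((f"fc{units}", (units,), width * units + units))
        width = units
    rows.append(("regression", (num_nodes,), width * num_nodes + num_nodes))
    return rows

summary = layer_summary(num_nodes=80, num_features=20)
total_params = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(name, shape, params)
print("total", total_params)
```

Under these assumptions most of the weights sit in the first fully connected layer, which is consistent with the emphasis on keeping the layers narrow.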

A high dropout rate of \(80\%\) is used in the proposed model to regularize the effects of low amount of data and to achieve increased generalization.

The training was done with approximately \(60\%\) of the available data (say t time instances) in each data split variation (monthly, fortnightly and weekly), where instances from 0 to 0.6t were included. Stochastic gradient descent was used to train each model for 100 iterations. The testing was performed on the remaining data samples which contained instances from 0.6t to t.
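The chronological split can be sketched as below. Rounding the cut point to the nearest integer is an assumption; with 17 fortnightly snapshots it reproduces a split of 10 training and 7 test instances.

```python
def chronological_split(num_instances, train_fraction=0.6):
    """Split time indices 0..num_instances-1 into an early training part and a late test part."""
    cut = int(round(train_fraction * num_instances))
    return list(range(cut)), list(range(cut, num_instances))

train_idx, test_idx = chronological_split(17)
print(len(train_idx), len(test_idx))
```

Splitting by time rather than at random keeps every test snapshot strictly later than every training snapshot, so the model is never evaluated on a period that precedes its training data.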

5 Data Description

For the prediction of weight gain status, we used a real life data set, the Social Evolution (SE) dataset [21]. The data consists of proximity information of approximately 80 undergraduate MIT students sharing a dormitory, captured by digitally tracking their location, proximities, phone calls and messages through a pre-installed mobile phone application. An interaction network is formed out of the proximity data, where a node represents a participant and an edge \(i-j\) indicates that nodes i and j were within 10 m of each other. Further, we use the health assessment data of the participants, including features such as weight, height, diet and exercise frequency, among others.

The social network is generated by sampling the frequency of physical proximity among the nodes, given by the Bluetooth proximity data of mobile phones, wherein a new link is introduced between two nodes on the occurrence of a predefined number of proximity counts between them. A snapshot of the network is generated at monthly, fortnightly and weekly intervals to obtain temporally evolving social networks in each case.

In order to construct discrete temporal networks, we first divided the survey period into shorter time quanta (weekly, fortnightly, monthly) and aggregated the interactions falling in a particular time quantum. Further, we pruned the nodes that do not appear in every time quantum. We also pruned edges for which the number of interactions within a time quantum is below a certain threshold. The temporal reference frame of the interaction data is not the same as that of the health features: while the health surveys were performed at intervals of 1 or 2 months, Bluetooth proximity data were collected nearly continuously during the course of the survey. To make the two time frames comparable, we disaggregate the health feature sets using linear interpolation. For example, considering a monthly time quantum, say health surveys have been performed for months 1 and 4, while connectivity data is available for every month. In this case, the features in the two months are interpolated linearly to provide features for months 2 and 3. This allows us the freedom to sample interaction data at relatively frequent intervals, such as on a monthly, fortnightly or weekly basis, and integrate it with the corresponding feature data.
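The month 1 and month 4 example can be reproduced with `numpy.interp`. The weight values below are made up for illustration; only the interpolation step corresponds to the procedure described.

```python
import numpy as np

survey_months = np.array([1, 4])         # months in which the health survey was taken
survey_weight = np.array([60.0, 63.0])   # hypothetical weight (kg) of one participant
all_months = np.arange(1, 5)             # months with connectivity data: 1, 2, 3, 4

weight_by_month = np.interp(all_months, survey_months, survey_weight)
print(weight_by_month)
```

Months 2 and 3 receive evenly spaced values between the two survey readings. The same call with finer `all_months` values would give fortnightly or weekly features.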

It would be further interesting to study the prediction behavior on a family of generalized networks, instead of focusing on a particular instance of a real life graph. For that, we simulated a set of synthetic networks using Temporal Exponential Random Graph Model (TERGM) [22] that adopted the properties of the SE interaction network. Subsequently, the features of the nodes of the simulated temporal networks are mapped to that of the nodes of the original SE network. To achieve this, we use the PATH algorithm which is a fairly effective algorithm for graph matching [23].

6 Experimental Setup

We first carry out a parameter sensitivity study of our model, which also helps to decide upon the best parameters to be used for the evaluation. Subsequently, the performance of our proposed method is compared with the performance of the existing baseline methods, using the area under the receiver operating characteristic curve (AUROC) as the metric. An ROC curve is obtained by plotting the true positive rate against the false positive rate while varying the discriminating threshold for classification. The AUROC offers a simple yet insightful metric for quantifying the performance of classifiers and can be used to compare classification models [24].
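For intuition, the AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition; it is an illustrative implementation with made-up labels and scores, not the evaluation code used for the reported results.

```python
import numpy as np

def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUROC needs at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

score = auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(score)
```

Three of the four positive-negative pairs are ordered correctly in this toy case. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.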

During the experiment, multiple tests are run on the data sets with the learning rate ranging from 0.001 to 0.005 in steps of 0.0002. Further, we experiment with various temporal window sizes, that is, the number of historical variables seen.

For the comparative study, we use the personalized autoregression [10] and the socialized autoregression [10] models. While personalized autoregression provides a simple and basic benchmark for behavior prediction tasks based on the feature set only, socialized autoregression uses social influence measures.

7 Results and Analysis

7.1 Parameter Sensitivity

The various parameters of the Convolutional Regression (Conv-Reg) model as well as the temporal window size have been varied to trace their influence in the model outputs. They are as follows:

Table 3. Variation of AUROC of different methods with the frequency of splits

Monthly, Fortnightly and Weekly Splits. Longitudinal splits of the network are varied on a monthly, fortnightly and weekly basis. The AUROC scores obtained are shown in Table 3. It is observed that on decreasing the number of splits from weekly to fortnightly, the WIS scheme shows decreasing performance while the performance of the ANFS schemes (both I & II) improves. However, on further decreasing the number of splits to a monthly level, the performance of all the schemes decreases. This is possibly because the true periodicity of the data is 14 days.

Dropout Rate. Figure 2 shows the effect of the dropout rate on the AUROC obtained from the different influence schemes. The plot shows a clear increase in performance with an increase in the dropout rate. This observation is consistent with the premise that a high dropout rate can reduce the over-fitting brought on by limited data, by preventing the neurons from co-adapting.

Fig. 2. Smoothed plot of the change of mean AUROC for different influence models vs. change in dropout rate

Temporal Window. The temporal window size (t), or the number of historical variables seen, is varied from 0 to 2 and the performance of the various influence schemes is evaluated. The result is plotted in Fig. 3. As shown in the figure, the best result is obtained when the temporal window size is 1. This supports the first-order Markovian assumption of our model, namely that given the present behavioral state, the future behavioral state is independent of the past behavioral states. While exploring higher order behavioral states is an interesting idea, it remains a difficult task due to the very real problem of not having a large enough dataset, in terms of survey sampling frequency, to draw from.

Fig. 3. Variation in AUROC with change in temporal window size for different influence techniques

Causative Factors for Weight Gain. We analyzed the effects of the individual weight gain factors on the prediction accuracy. When considered individually, the factors inactivity, obesity/overweight and recent weight gain are observed to have higher predictive power than unhealthy diet. We also analyzed the combined effect of these three causative factors. As shown in the results, these three factors and their combination have higher predictive power than all four factors together. We can infer from the results that the unhealthy diet of an individual or her neighbor has less impact on the propagation of weight gain than the other factors, namely inactivity, obesity/overweight and recent weight gain.

Proximity Reading Counts for Forming a Connection. We varied the threshold weight used to prune edges while forming the interaction network from 10 to 100 counts. However, the predictive performance is observed to be invariant to this change.

Table 4. AUROC scores of the influence model-learning technique pairs on the SE data set

7.2 Comparative Performance Analysis

Having studied the effects of the parameters on the predictive results, we select the following parameters that give the best predictive performance: fortnightly temporal splits, with the samples for 10 fortnights being used for training and the remaining 7 fortnights being used for testing; a high dropout rate of 0.8; a temporal window of size 1; a combination of all four causative factors; and a low Bluetooth proximity count of 10, so that connections in the social network graph are formed very liberally.

The results of the comparative study are summarized in Table 4. We compare the performance of the proposed method with the personalized autoregression model with logistic (Log-AR) and linear regression (Lin-AR), socialized logistic (S-Log-AR) and linear (S-Lin) regression models with WIS, ANFS-I and ANFS-II influence models. As shown in the results, the proposed convolutional regression framework significantly out-performs the auto-regressive and the socialized auto-regressive models.

We further carry out the comparative performance analysis on the synthetic data set, which is a generalization of the SE data set using the TERGM model. We observe that the proposed method achieves an increase of 16%–20% in predictive accuracy.

7.3 Neural Network Architecture and Depth

The number of weights in the model architecture was kept minimal to both guarantee generalization and minimize overfitting. While the architecture used for most experiments is described in Sect. 4, additional experiments are also carried out to justify an admittedly shallow 5 layer neural network model. Specifically, we test models with increased number of convolution, pooling and fully connected layers.

With a relatively low amount of training data, it is observed, as evidenced by Table 2, that gains in descriptiveness provided by a deeper architecture are outweighed by the effects of overfitting on the training data. Although convolution layers add relatively few free parameters to the model, it is seen that an increase in the number of layers, whether convolution, fully connected or otherwise, results in a degradation of the AUROC score. The reduction in AUROC scores on deepening the architecture is replicated in experiments on all influence models. It can be inferred from the results that increasing the depth beyond that of the original model results in overfitting.

We also validate the claim that a smaller architecture underfits the data by trying a model with a fully connected layer removed. Its lower AUROC score compared with the original model supports our case against a model with less depth as well.

7.4 Combination of Influence Schemes

The predictions of weight gain with the different schemes are combined at two levels – feature level (for ANFS only) and model level. Table 1 shows the feature level combinations for the additive (ANFS-SUM) and the multiplicative (ANFS-PROD) schemes for the ANFS influence model. While ANFS-PROD in general performs better than ANFS-SUM, neither is able to surpass the performance of the individual ANFS influence schemes.

Similarly, in the case of the model level combination, we use a max (Preds-Max) and a mean (Preds-Mean) scheme for combining the predictions. Since a combination of weight gain factors has been seen to give the highest AUROC, we combine the scores of these models, as summarized in Table 5. However, although Preds-Max usually performs marginally better than Preds-Mean, neither is able to outperform the individual influence scheme models.

Table 5. Model level combination of influence schemes

8 Related Work

Although the propagation of information, opinion, or disease over social networks has a long history of research, only a fairly recent work [25] highlights that obesity could spread through human interactions over a period of time. That work demonstrated the propagation of obesity in temporal social network data collected over a period of 25 years and sparked further interest in the social network research community.

Human interaction data usually refers to data obtained by querying the population (e.g., friendship nominations). However, this can be noisy and prone to directed edges in a relationship graph due to unreciprocated nominations. In [21], the authors introduced the idea of pervasive capturing of human interactions by analyzing the pattern of daily proximity data recorded through sensors, instead of manually collected survey data. Social contact, it was deemed, could be inferred from continued close proximity between individuals. It was further demonstrated in [26] that proximity measured through mobile phones is a strong indicator for inferring friendship ties among a population. Unlike the previous works, which looked at obesity propagation over more than two decades at a low sampling frequency, the dataset in [26] was collected over a rather short period of 10 months at a higher sampling frequency, and the study focused on weight gain propagation, particularly that of being overweight. In [11], a regression framework considering various causative factors for weight gain was explored to model the propagation of weight gain. It was concluded that weight gain, similar to obesity, is influenced by neighbors in an interaction network.

In the literature, problems related to inferring health conditions from social networks have been modeled with Socialized and Personalized Logistic Regression Models, which form reasonable benchmarks for predicting behavior propagation over temporal social relationship networks [10, 27]. However, unlike these simple regression-based models, the present work uses neural networks with a larger number of layers of non-linear transformations to model the data.

In a recent related work [28], Social Restricted Boltzmann Machines (SRBM) have been used for modeling the propagation of exercise habits across a temporal social network. However, [28] developed a significantly more complex neural network connected to all historical variables of all users, which is not highly scalable and, like other deep networks, requires a very large amount of training data.

9 Conclusion and Future Work

Understanding the propagation of obesity and overweight, and identifying the underlying controlling factors, is an important current research area. In particular, predicting which individuals are going to be overweight or obese in the future, as overweight and obesity propagate over dynamic human interaction networks, would help healthcare providers take preventive and precautionary measures to contain the propagation and improve quality of life by modifying behavior patterns.

In this work, we address the problem from the social network analysis perspective. We developed a scalable supervised prediction model based on a convolutional regression framework that is particularly suitable for prediction on short time series data. While existing models such as the SRBM and the regression-based models mentioned previously are fully connected architectures that assign weights to all historical variables, we use the concept of shared weights to significantly restrict the free parameters of our model. We propose various schemes to model social influence for health behavior change. These schemes are used to create feature embeddings for our convolutional network and to extract similar features from the feature representation of the users in our dataset. An extensive comparative study on the Social Evolution data set shows that the proposed method outperforms comparable techniques.

Further, we study the contribution of the primary factors of overweight and obesity, such as unhealthy diets, recent weight gains and inactivity, to the prediction task. The results reveal an important observation: contrary to common belief, the unhealthy diet of an individual or of a neighbor has less impact on the propagation of weight gain than the other factors, namely inactivity, obesity/overweight and recent weight gain.

Finally, we generalized the Social Evolution data set using the Separable Temporal Exponential Random Graph Model and ran the experiments to verify the performance of the proposed methods. The results show an improvement of over \(16\%\) in the predictive accuracy achieved by the developed method.

As a future extension, it would be interesting to study the seasonality of interaction patterns and its effect on the propagation of obesity and overweight. As basic CNN and auto-regressive models are unable to capture seasonal variations, a modification of the CNN and its integration with auto-regressive models that support seasonality, such as the Holt-Winters model, could open up a future research avenue.