Elsevier

Social Networks

Volume 30, Issue 4, October 2008, Pages 297-308

Treatment of non-response in longitudinal network studies

https://doi.org/10.1016/j.socnet.2008.04.004

Abstract

The collection of longitudinal data on complete social networks often faces the problem of actor non-response. The resulting incomplete data pose a challenge to statistical analysis, as there typically is no natural way to treat the missing cases. This paper examines the problems caused by actors missing as nominators, but still occurring as nominees, in complete, directed networks measured in a panel design. In the framework of stochastic actor-driven models for network change (“SIENA models”), different methods to cope with such incomplete data are investigated. Data on a friendship network among female high school students are used to illustrate the procedures. Missing data problems related to early panel exit and late panel entry are not addressed.

Introduction

Data analysis in the social sciences is often hampered by non-response. In the analysis of social networks, non-response results in missing network information. This means that ties from one actor to another are not observed and/or information on actor attributes is not available. According to Burt, “missing data are doubly a curse to survey network analysis” (Burt, 1987, p. 63), compared to other types of analyses (see also Borgatti and Molina, 2003). First, the complexity of items in network surveys is more likely to generate missing data (e.g., Marsden, 2005), and second, network analysis is especially sensitive to missing data because of the dependence structure of the data. If a network tie, or worse, an actor is missing, there is limited capacity to describe the network context of the actors whose ties are missing and there is a lack of information on the context of neighboring actors (Robins et al., 2004).

The effects of non-response and missing data on the structural properties of networks have been investigated in several studies (Burt, 1987; Costenbader and Valente, 2003; Kossinets, 2006). The general conclusion is that missing data have a negative effect on network mapping (Borgatti and Molina, 2003) and on the estimation of structural network properties: the strength of relationships is underestimated (Burt, 1987), centrality and degree measures become unstable (Costenbader and Valente, 2003; Kossinets, 2006), and clustering coefficients are underestimated (Kossinets, 2006). Still, Costenbader and Valente (2003) find that measures based on indegrees are reasonably robust for small proportions of missing data when the observed incoming ties of non-respondents are used in the analysis. This latter result shows a unique property of social networks: non-participation by respondents does not necessarily mean that they are not included in the study (Borgatti and Molina, 2003). Respondents report ties to non-respondents, that is, the incoming ties of non-respondents are available.

Missing data treatment methods can use the information on non-respondents from the nominations of observed actors. Stork and Richards (1992) propose using these partially described ties between respondents and non-respondents to reconstruct the missing outgoing ties: substitute the missing ties by the value of the tie in the opposite direction. This imputation method is appropriate if ties tend to match across actors, for instance, in undirected networks. For directed networks, this (ad hoc) imputation method seems less suitable. Another imputation method is suggested by Burt (1987), who finds that missing relations are strongly associated with weak relations and therefore can be replaced with values indicating such weak relations.

More recent missing data methods are proposed by Robins et al. (2004), Gile and Handcock (2006), Handcock and Gile (2007), and Koskinen (2007). These methods are also based on all available data, including the incoming ties of non-respondents. The proposed methods are model-based treatment methods within the framework of exponential random graph models (ERGMs). Robins et al. (2004) model the ties from respondents to non-respondents separately from the fully described ties, which allows exploring the structural effects for the entire network. The model is especially helpful when the non-respondents systematically differ from the respondents with respect to ties. Gile and Handcock (2006), Handcock and Gile (2007), and Koskinen (2007) use Markov chain Monte Carlo methods to fit ERGMs to incomplete network data. This is a more traditional missing data approach based on the marginal distribution of the observed data (e.g., see Schafer and Graham, 2002), allowing for proper inferences for network properties for both respondents and non-respondents. As the methods repeatedly sample from the conditional distribution of the missing data, they can also be used to impute the data sets.

All these methods are designed for modeling single, incomplete observations of a network. Moreover, possible treatments are either simple ad hoc procedures (the imputation methods of Stork and Richards, 1992, and Burt, 1987) or are embedded within ERGMs (Robins et al., 2004; Gile and Handcock, 2006). For the case of longitudinal network data, studies on the effect of non-response or the effect of treatment procedures are lacking. In this paper we examine the effects of non-response and missing data techniques on longitudinal network data. The effects of missing data treatments are investigated within the framework of the actor-driven models for network evolution proposed by Snijders (2001, 2005), using simulations under a known evolution model. The missing data treatments used in the simulation study are the analysis of complete cases, two ad hoc imputation methods based on reconstruction (Stork and Richards, 1992) and preferential attachment (Barabasi and Albert, 1999), respectively, and a hybrid imputation procedure based on simulating networks with the actor-driven network evolution models (Snijders, 2005).

The paper is organized as follows. Section 2 addresses the problem of non-response in longitudinal network data, defining the missing data patterns that are considered in this study. In Section 3 the family of actor-driven models for network evolution of Snijders (2001, 2005) is briefly described. Section 4 presents the missing data treatments, whose performance (i.e., the effects of the treatments on modeling the data with actor-driven models) is investigated in a simulation study. The design of this study is presented in Section 5, and Section 6 reports the results in terms of convergence of the estimation procedure and the absolute and relative bias in the parameter estimates. The paper ends with a discussion of the results and some general recommendations.

Section snippets

Non-response in longitudinal network studies

In missing data research, two types of non-response are usually distinguished: unit non-response, where complete cases are missing, and item non-response, where the unit participated but data on particular items are missing. For social network data, unit non-response means that an actor does not participate in the study and therefore all his or her outgoing ties are unavailable for analysis. Item non-response means that only particular (outgoing) ties are unavailable for analysis.

In the case

Actor-driven models for network evolution

The prominent tool for modeling and analyzing longitudinal, complete network data is the family of stochastic, actor-driven models introduced by Snijders (1996, 2001, 2005). Estimation of these models is implemented in the SIENA software (shorthand for Simulation Investigation for Empirical Network Analysis; Snijders et al., 2007). The present study refers to this model family and software package, and the way parameter estimates are affected by missing data and missing

Missing data treatments

There are several ways to deal with missing data. Two general, popular approaches are likelihood-based estimation based on the available data and imputation (Schafer and Graham, 2002). The ERGM-based procedures proposed by Robins et al. (2004), Gile and Handcock (2006), Handcock and Gile (2007), and Koskinen (2007) are examples of the former group of treatments, although the latter three can also be used to produce imputed data sets. The reconstruction method suggested by Stork and Richards (1992)

Simulation study

In order to investigate the sensitivity of parameter estimates of the actor-driven models to the various types of missing data treatments, a simulation study is performed. The general pattern of the study is:

  1. generate complete data under a known evolution model,
  2. generate missing data by erasing a fraction of actors (i.e., all outgoing ties of the actors),
  3. treat the missing data using the procedures outlined in Section 4,
  4. re-estimate the evolution model on the data treated for missingness,
  5.
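
Step 2 of the procedure above, erasing all outgoing ties of a fraction of actors, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `erase_actors` is hypothetical, actors are drawn completely at random (the snippet does not say which missingness mechanism the study uses), and unobserved ties are coded as `None`.

```python
import random


# Hypothetical helper for illustration only; not the authors' implementation.
def erase_actors(adj, fraction, seed=0):
    """Return (incomplete network, sorted list of erased actors).

    Assumption: non-respondents are chosen completely at random.
    """
    n = len(adj)
    rng = random.Random(seed)  # seeded for reproducibility
    n_missing = round(fraction * n)
    missing = sorted(rng.sample(range(n), n_missing))
    incomplete = [row[:] for row in adj]  # copy so the input stays untouched
    for i in missing:
        for j in range(n):
            if i != j:
                incomplete[i][j] = None  # outgoing tie i -> j is unobserved
    return incomplete, missing


complete = [
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
incomplete, missing = erase_actors(complete, fraction=0.5, seed=1)
print(missing)     # indices of the two erased actors
print(incomplete)  # their rows are None off the diagonal
```

Only the rows of the erased actors are blanked. Their columns, that is, the nominations they receive from respondents, stay observed, which is the information the treatments rely on.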

Results

The effect of the missing data treatments on modeling the longitudinal network data was evaluated using three measures of performance: practicability (operationalized as number of converged estimation runs), absolute size of error (operationalized as median parameter bias), and relative size of error (operationalized as the relative position of the true score in the distribution of estimates). The use of robust measures (percentiles) instead of sensitive ones (like averages or standard
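
The two error measures named above can be illustrated with a small Python sketch. The exact operationalization is given in the full text; here it is assumed that median parameter bias is the median estimate minus the true value, and that relative position is the fraction of estimates falling below the true value. Function names and numbers are hypothetical.

```python
from statistics import median


# Hypothetical helpers for illustration only; the definitions are assumptions.
def median_bias(estimates, true_value):
    """Median of the estimates minus the true parameter value."""
    return median(estimates) - true_value


def relative_position(estimates, true_value):
    """Fraction of estimates below the true value (0.5 means centered)."""
    below = sum(1 for e in estimates if e < true_value)
    return below / len(estimates)


# Made-up estimates of one parameter from five converged runs.
estimates = [0.8, 1.1, 1.3, 0.9, 1.4]
print(median_bias(estimates, true_value=1.0))        # roughly 0.1
print(relative_position(estimates, true_value=1.0))  # 0.4
```

Both quantities depend on percentiles only, so a single extreme estimate from a poorly converged run does not dominate them.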

Discussion

Missing actors have a large effect on analyzing longitudinal network data. The simulations show that ignoring the missing data and restricting the analysis to completely observed cases leads to problems when using actor-driven network evolution models. These problems are two-fold. First, the reduced sample size and the loss of information lead to problems in fitting a model to the data (convergence problems). With large fractions of missing data it is hardly possible to find a fitting

References (28)

  • Gile, K., Handcock, M.S., 2006. Model-based assessment of the impact of missing data on inference for networks. CSSS...
  • Handcock, M.S., Gile, K., 2006. Modeling social networks with sampled or missing data. CSSS Working Paper No. 75....
  • Huisman, M., et al., 2003. Statistical analysis of longitudinal network data with changing composition. Sociological Methods and Research.
  • Koskinen, J., 2007. Fitting models to social networks with missing data. Paper presented at Sunbelt XXVII, the...

1 The author was funded by the Netherlands Organization for Scientific Research (NWO) under grant 401-01-550.
