Open Access. Published by De Gruyter, November 29, 2016. Licensed under CC BY-NC-ND 3.0.

Semi-Parametric Estimation and Inference for the Mean Outcome of the Single Time-Point Intervention in a Causally Connected Population

  • Oleg Sofrygin and Mark J. van der Laan

Abstract

We study the framework for semi-parametric estimation and statistical inference for the sample average treatment-specific mean effects in observational settings where data are collected on a single network of possibly dependent units (e.g., in the presence of interference or spillover). Despite recent advances, many current statistical methods rely on estimation techniques that assume a particular parametric model for the outcome, even though some of the important statistical assumptions required by these methods are often violated in observational network settings. In this work we rely on recent methodological advances in the field of targeted maximum likelihood estimation (TMLE) and describe an estimation approach that permits more realistic classes of data-generating models while providing valid inference in the context of observational network-dependent data. We start by assuming that the true data-generating distribution belongs to a large class of semi-parametric statistical models. We then impose some restrictions on the possible set of such distributions. For example, we assume that the dependence among the observed outcomes can be fully described by an observed network. We then show that under our modeling assumptions, our estimand can be described as a functional of a mixture of the unit-specific components of the observed data-generating distribution. With this key insight in mind, we show that the TMLE for possibly dependent units can be implemented as a standard iid-data algorithm, and we demonstrate the validity of our approach with a simulation study. Finally, we extend prior work towards estimation of novel causal parameters, such as the unit-specific indirect and direct treatment effects under interference and the effects of interventions that modify the structure of the network.

1 Introduction

1.1 Motivation

In this paper we are concerned with estimation and inference for the sample average treatment effect [1] in an observational setting that involves members of a single connected network. Valid statistical inference in such settings presents a number of significant challenges. For example, the frequently made assumption of independence among units is generally violated when data are collected on a population of connected units, since the network interactions will often cause the exposure of one unit to have an effect on the outcomes of other connected units. In general, statistical methods for estimation and inference in observational network data are faced with three key challenges that set such data apart from the classical statistical methods for independent observational data: (i) the outcome for each unit can be a function of the treatment assignments of other units that are connected to the unit through its network, an occurrence referred to as interference or spillover [2, 3]; (ii) the outcome of each unit can be a function of the baseline covariates of other units that are connected to the unit through its network, sometimes referred to as network-correlated outcomes [4]; and (iii) the observed exposure allocation for each unit can be a function of the baseline covariates of other units. As a result, the sample units are not independent, and, in fact, one only observes a single draw from the true data-generating distribution. Therefore, classical statistical methods that assume independence among the observed outcomes will often be overly optimistic and invalid for quantifying the variability of estimators in such data. In addition, many of the current estimation procedures for observational network data assume a particular parametric model, or a restrictive class of semi-parametric models, for the observed data-generating distribution, which makes these methods highly susceptible to bias from model misspecification [5, 6].

Targeted maximum likelihood (or minimum loss-based) estimation (TMLE) [7, 8] is a general framework for constructing asymptotically linear and efficient substitution estimators in large semi-parametric models, while providing asymptotically valid statistical inference. Recently, the TMLE framework has been extended to estimation of treatment effects in dependent observational data [9], where the dependence among units is described by the network of connections formed by these units (e.g., social or geographical networks). Our aim will be to provide an accurate reflection of the background knowledge available for a given scientific problem, while still being able to perform valid statistical estimation. Thus, we start by assuming a realistic semi-parametric statistical model for the generating distribution of observed network data, which places minimal restrictions on the set of possible data-generating distributions.

In this article we generalize the previously described TMLE from van der Laan [9] to allow estimation of novel causal parameters among units that might be dependent. In particular, we describe the TMLE for estimating the treatment effect under arbitrary unit-specific stochastic interventions on groups of friends, e.g., interventions that characterize the indirect and direct treatment effects under interference [10]. We also extend our framework towards estimation of parameters defined by interventions that statically or stochastically modify the initial network structure. In addition, we describe a new asymptotic variance estimator that improves upon the previously proposed estimator from van der Laan [9]. Finally, we validate our theoretical results with a simulation study that compares the TMLE to two other estimators. Our simulations demonstrate that consistent estimation and valid asymptotic inference for the sample average treatment effects of single time-point stochastic interventions is possible in this larger class of semi-parametric models, even in observational network data where the dependence between units is induced by the known network structure.

1.2 Brief review of relevant literature

The literature on networks and causal inference in network data is rapidly evolving. However, the existing statistical methods for performing estimation and inference for causal effects in networks are limited, and the literature on this subject has only recently started to develop [9, 10, 11, 12, 13]. Our review is not intended to be exhaustive; instead, we focus on the key aspects and challenges of statistical estimation of treatment effects in observational network data. Most of the recently proposed approaches can be categorized as relying either on the assumption of randomized exposures across units [4, 14, 15, 16, 17, 18, 19, 20, 21, 22], or on parametric modeling of the outcome as a particular function of the unit's network. Some of the parametric approaches applied in network settings include generalized linear models (GLMs) and generalized estimating equations (GEEs) [5, 6], methods which have important limitations [11, 23, 24, 25, 26]. For one, GLMs and similar modeling techniques require strong, simplifying modeling assumptions about the underlying data-generating process. Hence, model misspecification for GEEs and GLMs in network data settings is a major cause of concern. Perhaps more importantly, performing valid statistical inference with GLMs and other similar statistical techniques generally requires independence of the observational units, an assumption that is unlikely to hold due to the very nature of network data. It has also been previously described that application of such standard statistical procedures to dependent data will result in invalid and generally anti-conservative statistical inference [11, 23].

In addition, a few promising methodological approaches to estimation in network data have begun to emerge in recent years. For example, Aronow and Samii [16] proposed a Horvitz-Thompson estimator in a randomized study setting, defined the so-called "network exposure model", and derived a finite-sample estimator of its variance. However, such methods are of limited utility in observational settings. Other proposed approaches for identification and estimation of treatment effects in networks include stochastic actor-oriented models [27], and a linear Bayesian modeling approach that can accommodate network uncertainty [21]. Another recently proposed approach applied the semi-parametric framework of targeted maximum likelihood estimation to the observational network data setting [9], yielding valid asymptotic inference, while allowing for a much larger and more realistic class of data-generating models. We apply the latter approach in the sections that follow.

1.3 Overview of this article

Consider a study in which we observe a sample of $N$ units where any two units might be connected via a social network. For each unit we collect baseline covariates, a binary exposure, and a one-dimensional outcome of interest. We denote the sample by the random vector $O=(W,A,Y)\sim P_0$, where $W=(W_i)_{i=1}^N$ is a vector of baseline covariates across all units, $A=(A_i)_{i=1}^N$ is a vector of exposures, $Y=(Y_i)_{i=1}^N$ is a vector of outcomes, and $P_0$ belongs to a large semi-parametric model. We assume each $W_i$ has finite support, each $A_i$ is binary, and $Y_i$ is either binary (e.g., indicating survival beyond a specific time point, or the success of a particular intervention) or bounded (e.g., a count of the number of times an event of interest has occurred during the follow-up period, or a continuous measure of a biomarker level at the end of the study). For each unit $i$ in the sample, we also collect the information on other units in $\{1,\ldots,N\}\setminus\{i\}$ that are connected to (or influence) $i$. These units are referred to as "$i$'s friends", and this set is denoted by $F_i\subseteq\{1,\ldots,N\}$. We assume $F_i$ is recorded at baseline, along with the other baseline covariates, and is fixed. Additionally, we allow $|F_i|$, the number of friends of unit $i$, to vary in $i$, but assume that this number is bounded by some known global constant $K$ that does not depend on $N$. The vector $F=(F_i)_{i=1}^N$ is then referred to as the "network profile" of $O$. For example, in an experiment evaluating the effects of introducing a new service to an online social network, $F_i$ could denote the set of all online friends of $i$, whose exposure status may influence $i$'s outcome. Alternatively, in a study of the effects of early HIV treatment initiation, $F_i$ could be the set of all sexual partners of unit $i$. We allow for the following types of between-unit dependencies: (i) the unit-level exposures can depend on baseline data of the unit itself and of other units, and (ii) the unit-level outcomes can depend on baseline and exposure data of the unit itself and of other units. An important ingredient of our modeling approach is to assume that any dependence of each unit $i$ on other units is fully described by the known network. Specifically, we assume that the dependence of $i$'s treatment and outcome on other units is limited to the set of $i$'s friends. A second important ingredient is the assumption that these dependencies can be accurately described with some known summary measures, which map the data collected on the friends of each unit into a summary that has the same dimension for all units.
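To make this data structure concrete, the following minimal R sketch (our own illustration; the variable names, network-generating scheme, and distributions are hypothetical and are not taken from the paper or from the tmlenet package) stores the friend sets $F_i$ as an adjacency list with $|F_i|\le K$. Later sketches in this article build on these objects.

```r
# Hypothetical network data structure: friend sets F_i stored as an adjacency
# list, with the number of friends bounded by a known constant K.
set.seed(1)
N <- 500; K <- 5
Fi <- lapply(1:N, function(i) {
  nF <- sample(0:K, 1)                      # |F_i| varies in i but is <= K
  if (nF == 0) integer(0) else sample(setdiff(1:N, i), nF)
})
W <- rbinom(N, 1, 0.5)                      # baseline covariates W_i (binary here)
num_friends <- lengths(Fi)                  # |F_i|, bounded by K
```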

We now wish to estimate and perform valid inference for the sample average of the unit-specific mean outcomes, denoted $\psi_0$ and defined as $\frac{1}{N}\sum_{i=1}^N E\big[E_{g^*}\big[E(Y_i\mid A^*,W)\,\big|\,W\big]\big]$, where the $N$ exposures are assigned according to some user-specified stochastic intervention $g^*$. That is, $g^*$ is a fixed conditional density for drawing exposures $A^*$, given baseline covariates $W$. Under additional causal assumptions, this statistical quantity will be equal to the sample average of the expected counterfactual outcomes, defined as the expected outcomes which would have been obtained if the units in the sample had actually been treated according to the treatment regime specified by $g^*$ [9]. We note that the definition of the intervention $g^*$ is kept general, allowing for any static, dynamic, or stochastic single time-point intervention. We also note that the definition of the statistical parameter can be extended to multiplicative or additive sample average treatment effects [1, 28]. Additionally, this statistical parameter is defined with respect to a given network profile $F$ and given sample size $N$, and thus should be regarded as a type of data-adaptive statistical parameter [29], since its true value is allowed to change for different sample sizes and different network structures. Finally, we note that the statistical parameter defined in this manner has a generally meaningful statistical interpretation, even when the required causal assumptions do not hold, and our focus is only on the aspects of statistical estimation of such parameters in the context of the semi-parametric modeling framework.

Having introduced the key notation, we note that the above defined parameter $\psi_0$ depends on the joint distribution of the observed data only as a function of the mixture of the unit-specific distributions, and we use $\bar{P}$ to denote this mixture. In particular, we show that our parameter is equal to a mapping $\bar{\Psi}$ applied to $\bar{P}$. One of the advantages of the mapping $\bar{\Psi}$ is that it leads us to conclude that the estimation of $\bar{\Psi}(\bar{P}_0)$ should only be concerned with estimation of the relevant factors of the mixture $\bar{P}$, rather than the corresponding factors of the distribution $P$. In addition, the mapping $\bar{\Psi}$ allows us to interpret $\psi_0$ as a G-computation formula for the mean of iid outcomes evaluated under some fixed stochastic intervention $\bar{g}^*$ [30]. As we will show, this representation also implies that our statistical parameter can be consistently estimated by ignoring the dependence among units and treating them as if they were independent and identically distributed (iid). This observation suggests that a large class of iid-data estimators is applicable to estimation problems such as the one we describe in this article. Based on this key insight, the dependent-data TMLE from van der Laan [9] is then presented as a typical iid-data TMLE algorithm. We also apply the new mapping $\bar{\Psi}$ for performing statistical inference, presenting a new robust asymptotic variance estimator that remains conservative even when the outcome model $E(Y_i\mid A,W)$ is misspecified. We conduct a simulation study to provide a proof of concept for our framework and to assess the feasibility of unbiased estimation and inference in finite-sample observational network data. We also compare the performance of the TMLE to other statistical procedures using a newly developed R package, tmlenet [31].

The rest of the article is organized as follows: We start by formally describing the observed network data, defining the statistical model, and defining the statistical parameter of interest in Section 2. Next, in Section 3, we describe an alternative representation of our statistical parameter as a mapping from the mixture distribution $\bar{P}$, where $\bar{P}$ is derived as a function of the actual observed data distribution $P$. In Section 4, we describe how a typical iid TMLE algorithm can be applied to estimation in possibly dependent data, focusing on the sample average treatment effects under a single time-point stochastic intervention $g^*$. We then proceed by describing the asymptotic normality of this TMLE in Section 5 and discuss possible approaches to estimating its asymptotic variance in Section 6. Next, we describe a simulation study that examines the finite-sample performance of the proposed TMLE, and of the new variance estimator, in Section 7. We then describe in Section 8 how our framework generalizes to estimation of parameters indexed by arbitrary collections of stochastic interventions or by interventions on the network structure. We conclude with a discussion of the relative merits and limitations of our proposed approach in Section 9.

1.4 A note on notation

Throughout this article we use bold capital letters, such as $O$, to denote random vectors that include observations on all $N$ units, and bold lowercase letters, such as $o$, to denote their corresponding fixed values. For example, $A$ will denote the vector of $N$ exposures, i.e., $A=(A_1,\ldots,A_N)$. We will also use standard capital letters with a subscript to denote the unit-specific observations, i.e., $A_i$ will denote the exposure of unit $i$. Finally, we will use the over-bar symbol to denote mixture distributions across all $N$ units, as well as their corresponding random variables. For example, $\bar{P}_W$ will denote the mixture of the $N$ unit-specific distributions of the baseline covariates $W_i$, for $i=1,\ldots,N$, i.e., $\bar{P}_W = \frac{1}{N}\sum_{i=1}^N P_{i,0}$, and $\bar{W}$ will denote a random variable distributed according to $\bar{P}_W$. The only exception to this rule will be $\bar{Q}$, which will denote the conditional expectation of the unit-specific outcome $Y_i$, as well as the conditional expectation of the mixture-based outcome $\bar{Y}$, which happen to be equal under our statistical model.

2 Statistical model and parameter

Suppose $P_0^N$ is the true data-generating distribution for $N$ observed and connected units, with $O=(W,A,Y)\sim P_0^N$ denoting the random vector for these $N$ units and $O_i=(W_i,A_i,Y_i)$, for $i=1,\ldots,N$. The network profile $F$ is assumed recorded at baseline, i.e., $F\subset W$. We also assume all $Y_i$ are bounded random variables. Let $\mathcal{M}$ denote a statistical model containing $P_0^N$. Since $O$ represents a network of possibly dependent units, we observe only a single draw from $P_0^N$, and as a result, are unable to estimate $P_0^N$ from this single observation $O$. We now proceed by making a series of statistical assumptions, which will allow us to learn the true distribution of $O$ based on this single draw. In particular, we introduce these assumptions by making restrictions on the set of possible distributions that belong to $\mathcal{M}$. We will then define our statistical quantity of interest as a mapping $\Psi$ from $\mathcal{M}$ into the real line $\mathbb{R}$.

Following van der Laan [9], we make the following set of statistical assumptions for any $P_0^N\in\mathcal{M}$:

A.1

Conditional on $F$, each $W_i$ depends on at most $K$ other observations in $W=(W_1,\ldots,W_N)$; i.e., if $(W_j : j\in S_i)$ is the set of all observations dependent with $W_i$, then $\max_i |S_i| \le K$, and $K$ must not depend on $N$;[1]

A.2

$A=(A_1,\ldots,A_N)$ are independent, conditional on $W$;

A.3

$Y_1,\ldots,Y_N$ are independent, conditional on $(A,W)$.

These assumptions imply the following likelihood for $P^N\in\mathcal{M}$:

\[
p^N(O) = \prod_{i=1}^N p_{Y_i|A,W}(Y_i \mid A,W)\, \prod_{i=1}^N p_{A_i|W}(A_i \mid W)\, p_W(W).
\]

We also assume conditional independence implied by the known network structure:

A.4

Assume that the conditional distributions $P(Y_i\mid\cdot)$ only depend on $(A_j,W_j : j\in F_i\cup\{i\})$, and similarly, that $P(A_i\mid\cdot)$ only depends on $(W_j : j\in F_i\cup\{i\})$.

We now introduce the dimension-reducing assumptions for these conditional distributions. Specifically:

B.1

Each $P(Y_i\mid\cdot)$ is a function of some known summary measures $a_i^s\big((A_j,W_j): j\in F_i\cup\{i\}\big)$ and $w_i^s\big(W_j : j\in F_i\cup\{i\}\big)$. Each $P(A_i\mid\cdot)$ is a function of the summary measure $w_i^s\big(W_j : j\in F_i\cup\{i\}\big)$. We assume $w_i^s(\cdot)$ and $a_i^s(\cdot)$ are known functions that map into Euclidean sets of constant (in $i$) dimension that does not depend on $N$, where the $a_i^s$ map into some common space $\mathcal{A}^s$ and the $w_i^s$ map into some common space $\mathcal{W}^s$.

We will use the following shorthand notation for the above summary measures:

\[
W_i^s = w_i^s(W) = w_i^s(W_j : j\in F_i\cup\{i\}) \in \mathcal{W}^s, \qquad A_i^s = a_i^s(A,W) = a_i^s\big((A_j,W_j) : j\in F_i\cup\{i\}\big) \in \mathcal{A}^s.
\]

As an example, an investigator might be willing to assume that the outcomes $Y_i$ depend on $(A,W)$ only through the summary measures $(a_i^s(A), w_i^s(W))$, where $a_i^s(A)=(A_i,A_i^c)$ and $w_i^s(W)=(W_i,W_i^c)$, with $A_i^c$ some one-dimensional summary of the exposures of $i$'s friends and $W_i^c$ some one-dimensional summary of the baseline covariates of $i$'s friends, where $A_i^c$ is the same function for all $i$ and $W_i^c$ is also the same function for all $i$. If one is unwilling to make such strong dimension-reducing assumptions, one could instead assume $a_i^s(A)=(A_j : j\in F_i)$ and $w_i^s=(W_j : j\in F_i)$, without assuming a particular functional form for $a_i^s$ and $w_i^s$. By filling the empty spots in $a_i^s(\cdot)$ and $w_i^s(\cdot)$ with missing values, one ensures that all summaries $(a_i^s(\cdot), w_i^s(\cdot))$ are of constant dimension across $i$ and that the information on the number of friends of $i$ is also captured. In summary, we allow $A_i^s$ and $W_i^s$ to be arbitrary functions of the units' network, as long as their dimension is fixed, common in $i$, and does not depend on $N$. Applying these summary measures to the observed data, we obtain the following likelihood:

\[
p^N(O) = \prod_{i=1}^N p(Y_i \mid A_i^s, W_i^s)\, \prod_{i=1}^N p(A_i \mid W_i^s)\, p(W).
\]
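As an illustration of such summary measures, the sketch below (continuing our hypothetical running example; the particular choice of summaries is ours, not the paper's) takes $W_i^c$ and $A_i^c$ to be the sums of the friends' covariates and exposures, with $|F_i|$ appended so that the summaries retain the friend-count information:

```r
# Hypothetical summary measures: w_i^s = (W_i, W_i^c, |F_i|) and
# a_i^s = (A_i, A_i^c, |F_i|), with the friends' values summarized by their sum.
w_summary <- function(i, W) c(W_i = W[i], W_c = sum(W[Fi[[i]]]), nF = length(Fi[[i]]))
a_summary <- function(i, A) c(A_i = A[i], A_c = sum(A[Fi[[i]]]), nF = length(Fi[[i]]))
Ws <- t(sapply(1:N, w_summary, W = W))   # N x 3 matrix of covariate summaries W_i^s
```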

We are now ready to make the final set of restrictions on $\mathcal{M}$. Specifically:

C.1

Assume that all $Y_i$ are sampled from the same distribution $Q_Y$, with density $q_Y(Y_i\mid a^s,w^s)$, conditional on fixed values $(a^s,w^s)$ of the summary measures $(A_i^s,W_i^s)$, for $i=1,\ldots,N$. Similarly, assume that all $A_i$ are sampled from the same distribution, with density $g(A_i\mid w^s)$, conditional on some fixed value $W_i^s=w^s$ of the summary measure, for $i=1,\ldots,N$.

We also assume (without loss of generality) that the densities $g$ and $q_Y$ are well-defined with respect to some dominating measure. Using the previous example of the summary measures, i.e., $W_i^s=(W_i, W_j : j\in F_i)$ and $A_i^s=(A_i, A_j : j\in F_i)$, this assumption implies that two units $i$ and $j$ will be subject to the same conditional distributions for drawing their exposures and outcomes if $i$ and $j$ have the same number of friends, the same individual covariate and treatment values, and the same values for the covariates and treatments of their friends. This implies that it is possible to learn the common-in-$i$ densities $q_Y$ and $g$ from a single (but growing) draw $O$ from $P_0^N$ as $N\to\infty$, resulting in a well-defined statistical estimation problem. We denote the joint density of the network exposures $A$ given $W$ by $g(A\mid W)$, with the above assumptions implying the factorization $g(A\mid W)=\prod_{i=1}^N g(A_i\mid W_i^s)$. We denote the joint distribution of $W$ by $Q_W(W)$, making no additional assumptions of independence among the components of $W=(W_1,\ldots,W_N)$, and we assume $q_W$ is a well-defined density for $Q_W$, with respect to some dominating measure. This final set of assumptions defines our statistical model $\mathcal{M}$, where $\mathcal{M}$ describes the set of all possible distributions $P^N$ for the observed dependent data $O$.
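For intuition, the following sketch continues the running example by drawing data from a toy data-generating process satisfying these assumptions: all exposures share one common conditional density given $W_i^s$, and all outcomes share one common conditional density given $(A_i^s,W_i^s)$. The particular coefficients are made up purely for illustration.

```r
# Toy common-in-i conditional densities (hypothetical), consistent with A.1-C.1:
# the data depend on other units only through the summaries defined above.
g0 <- function(ws) plogis(-0.5 + 0.4 * ws["W_i"] + 0.2 * ws["W_c"])
Q0 <- function(as, ws) plogis(-1 + as["A_i"] + 0.3 * as["A_c"] + 0.5 * ws["W_i"])
A  <- rbinom(N, 1, apply(Ws, 1, g0))        # A_i ~ g_0(. | W_i^s), independent given W
As <- t(sapply(1:N, a_summary, A = A))      # exposure summaries A_i^s
Y  <- rbinom(N, 1, sapply(1:N, function(i) Q0(As[i, ], Ws[i, ])))
```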

We now introduce the notation $P=P_{Q,G}$, for $Q\equiv(Q_W,Q_Y)$, and we assume the distributions $Q_W$ and $Q_Y$ are unspecified beyond the above modeling conditions A.1, A.3, A.4, B.1 and C.1. We also note that the model $\mathcal{G}$ for the observed exposure mechanism may be restricted to incorporate real-world knowledge about the true conditional treatment assignment, for example, when the common-in-$i$ density $g(A_i\mid W_i^s)$ is known, such as in a randomized clinical trial. This defines the statistical parametrization of the data-generating distribution of $O$ in terms of the distributions $Q$ and $G$, and the corresponding statistical model is defined as $\mathcal{M}=\{P_{Q,G} : Q\in\mathcal{Q}, G\in\mathcal{G}\}$, where $\mathcal{Q}$ and $\mathcal{G}$ denote the parameter spaces for $Q$ and $G$, respectively. In particular, we denote by $Q_0$ the value of $Q$ evaluated at $P_0^N$. Applying this newly introduced notation results in the likelihood:

\[
p^N(O) = \prod_{i=1}^N q_Y(Y_i \mid A_i^s, W_i^s)\, \prod_{i=1}^N g(A_i \mid W_i^s)\, q_W(W). \tag{1}
\]

We define an intervention of interest by replacing the conditional distribution $G$ with a new user-supplied intervention $G^*$ that has a density $g^*$, which we assume is well-defined. Namely, $G^*$ is a multivariate conditional distribution that encodes how each intervened exposure, denoted $A_i^*$, is generated conditional on $W$. We note that static or dynamic interventions on $A$ correspond to degenerate choices of $g^*$ (e.g., Gill and Robins [33], Robins [32, 34, 35], Yu and van der Laan [36]), while non-degenerate choices of $g^*$ are often referred to as stochastic interventions (e.g., van der Laan [9], Dawid et al. [37], Robins and Richardson [39], Muñoz and van der Laan [38], Zheng and van der Laan [40]). We assume that $A$ and $A^*$ belong to the same common space $\mathcal{A}$, and we make no further restrictions on $G^*$. We also define $A_i^{s*} := a_i^s(A^*,W)$, where $A_i^{s*}$ denotes the random variable implied by the summary measure $a_i^s(\cdot)$ applied to the intervened exposure vector $A^*$, for $i=1,\ldots,N$. Finally, we define the post-intervention distribution $P_{Q,G^*}$ by replacing $G$ in $P_{Q,G}$ with the new user-supplied distribution $G^*$. We use $O^*=(W_i,A_i^*,Y_i^*)_{i=1}^N$ to denote the random variable generated under $P_{Q,G^*}$, and its likelihood is given by:

\[
p^N_{Q,G^*}(O^*) = \prod_{i=1}^N q_Y(Y_i^* \mid A_i^{s*}, W_i^s)\; g^*(A^* \mid W)\; q_W(W). \tag{2}
\]

The latter distribution $P_{Q,G^*}$ is referred to as the G-computation formula for the post-intervention distribution of $O$ under the stochastic intervention $G^*$ [30], and it is a parameter of $P^N$.

Our target statistical quantity $\psi_0$ is now defined as a function of this post-intervention distribution (2). Specifically, it is given by:

\[
\psi_0 = \Psi(P_0^N) = E_{q_0,g^*}\Big[\frac{1}{N}\sum_{i=1}^N Y_i^*\Big],
\]

which is the expectation of the sample average of the $N$ outcomes among dependent units $i=1,\ldots,N$, where the expectation is evaluated with respect to the post-intervention distribution $P_{Q_0,G^*}$. We view $\Psi(P_0^N)$ as a mapping from the statistical model $\mathcal{M}$ into $\mathbb{R}$, and we note that $\psi_0$ is defined conditionally on the observed network structure $F$ and is also indexed by $N$. We also define $\bar{Q}(A_i^s,W_i^s) = \int_y y\, q_Y(y\mid A_i^s,W_i^s)\, d\mu(y)$ as the conditional mean evaluated under the common-in-$i$ distribution $Q_Y$, and $\bar{Q}_0$ as $\bar{Q}$ evaluated at $P_0^N$. Note that our dimension-reduction assumptions imply that $E_{P_0^N}[Y_i\mid A,W] = \bar{Q}_0(A_i^s,W_i^s)$. We also note that our parameter $\psi_0$ depends on $P_0^N$ only through $\bar{Q}_0$ and $Q_{W,0}$, and with a slight abuse of notation we will interchangeably use $\Psi(P_0^N)$ and $\Psi(\bar{Q}_0,Q_{W,0})$. Thus, the parameter $\psi_0$ is indexed by $N$, $F$ and $G^*$ and can be written as:

\[
\psi_0 = \frac{1}{N}\sum_{i=1}^N \int_{a,w} \bar{Q}_0\big(a_i^s(a,w), w_i^s(w)\big)\, g^*(a\mid w)\, q_{W,0}(w)\, d\mu(a,w),
\]

with respect to some dominating measure $\mu(a,w)$.

One might also be interested in a target quantity defined as a contrast of two stochastic interventions. For example, one may define $\Psi_{G_1^*}(P_0^N)$ and $\Psi_{G_2^*}(P_0^N)$ as the above target parameter evaluated under stochastic interventions $G_1^*$ and $G_2^*$, respectively, and then define the target quantity as $\Psi_{G_1^*,G_2^*}(P_0^N) = \Psi_{G_1^*}(P_0^N) - \Psi_{G_2^*}(P_0^N)$. The average treatment effect over $N$ connected units is then a special case of $\Psi_{G_1^*,G_2^*}(P_0^N)$ for interventions $G_1^*, G_2^*$ defined as $g_1^*(1_N\mid w)=1$ and $g_2^*(0_N\mid w)=1$, for any $w\in\mathcal{W}$. We will focus on the estimation of the statistical parameter $\psi_0$ defined for one particular $G^*$, noting that all of our results naturally generalize to contrasts or any other quantities that can be expressed as Euclidean-valued functions of a collection $\{\Psi_{G^*}(P_0^N) : G^*\in\mathcal{G}^*\}$, for a finite set of stochastic interventions $\mathcal{G}^*$.

We note that by making additional untestable assumptions, one can interpret $\psi_0$ as a causal quantity that measures the sample average of the expected counterfactual outcomes in a network of $N$ connected units under intervention $G^*$, as was previously shown in van der Laan [9]. However, these additional causal assumptions put no further restrictions on the above described probability distribution $P_0^N$, so our statistical model $\mathcal{M}$ remains the same. Since $\mathcal{M}$ contains the true data distribution $P_0^N$, it follows that $\psi_0$ will always have a pure statistical interpretation as the feature $\Psi(P_0^N)$ of the data distribution $P_0^N$. For the estimation problem at hand, the causal model plays no further role: even when one does not believe any of the untestable causal assumptions, one might still argue that the statistical parameter $\psi_0$ represents an effect measure of interest controlling for all measured confounders. Finally, we note that assumption A.1 can be dropped entirely by defining the target parameter $\psi_0$ conditionally on the observed baseline covariates $W$, as shown in van der Laan [9].

3 Target parameter as a mapping applied to a mixture model

The above defined target parameter $\Psi(P_0^N)$ can be represented as an alternative (and equal) mapping $\bar{\Psi}(\bar{P}_0)$, where $\bar{P}_0$ is defined as a mixture of the $N$ unit-specific components of the joint data-generating distribution $P_0^N$. This leads us to another way of thinking about the estimation of our target parameter, suggesting that the problem of estimating $\psi_0$ should only be concerned with estimating the relevant components of the mixture $\bar{P}_0$. We first apply the summary measures $W_i^s=w_i^s(W)$ and $A_i^s=a_i^s(A,W)$ to the observed data $O$, mapping it into a dataset of $N$ dependent summary observations, denoted $O^s=(O_1^s,\ldots,O_N^s)$ and referred to as the "summary data". We assume each $O_i^s=(W_i^s,A_i^s,Y_i)$ is distributed according to $P_{i,0}^s$, where $P_{i,0}^s$ is implied by the joint distribution $P_0^N$ of $O$ and the $i$-specific summary measures $(w_i^s(\cdot), a_i^s(\cdot))$. We assume that each $P_{i,0}^s$ has a well-defined density $p_{i,0}^s$ with respect to some dominating measure. As before, the set of all possible distributions of $O$ is given by the statistical model $\mathcal{M}=\{P_{Q,G}: Q\in\mathcal{Q}, G\in\mathcal{G}\}$, and for a given $P^N\in\mathcal{M}$, we first define its implied mixture $\bar{P}$ and its relevant factors, and we then describe the new mapping $\bar{\Psi}(\bar{P})$ in Theorem 3.1. Our next goal is to present the efficient influence curve (EIC) for this mapping $\bar{\Psi}(\bar{P}_0)$, which we do in two steps in Theorems 3.2 and 3.3.

3.1 Mapping $\psi_0=\bar{\Psi}(\bar{P}_0)$ for the mixture distribution $\bar{P}_0$

Let $\mathcal{O}^s$ be the sigma-algebra for the union of the unit-specific supports of $O_i^s$, for $i=1,\ldots,N$. For a set $A\in\mathcal{O}^s$, define the mixture distribution $\bar{P}(A)$ as a finite mixture of the $N$ unit-specific summary distributions $P_i^s$ with constant weight $1/N$, i.e., $\bar{P} := \frac{1}{N}\sum_{i=1}^N P_i^s$. Let $\bar{O}^s$ denote the random variable drawn from such $\bar{P}$, and assume that $\bar{P}$ has a well-defined density $\bar{p} := \frac{1}{N}\sum_i p_i^s$. Note that $\bar{p}$ can be factorized as follows:

\[
\bar{p}(\bar{O}^s) = \bar{p}(\bar{Y},\bar{A}^s,\bar{W}^s) = \bar{q}_Y(\bar{Y}\mid \bar{A}^s,\bar{W}^s)\, \bar{g}(\bar{A}^s\mid \bar{W}^s)\, \bar{q}_W(\bar{W}^s),
\]

and we now describe the above factors of $\bar{p}$ in detail.

First, let $Q_{W_i^s}$ denote the marginal distribution of the $i$-specific baseline summary measure $W_i^s$, with density $q_{W_i^s}(W_i^s)$ defined as the marginal of the joint density $p_i^s(Y_i,A_i^s,W_i^s)$. The distribution $\bar{Q}_W$ can then be defined as a finite mixture of these $i$-specific marginal distributions $Q_{W_i^s}$, and the density of $\bar{Q}_W$ can be defined as follows:

\[
\bar{q}_W(w^s) := \frac{1}{N}\sum_{i=1}^N q_{W_i^s}(w^s).
\]

We let $\bar{W}^s$ denote a random variable drawn from the mixture distribution $\bar{Q}_W$, noting that $\bar{W}^s$ belongs to the same common space $\mathcal{W}^s$ as all $w_i^s(W)$, for $i=1,\ldots,N$. Similarly, we let $H_i$ denote the $i$-specific joint distribution of the summaries $(A_i^s,W_i^s)$, with its density $h_i(A_i^s,W_i^s)$ implied by $p_i^s(Y_i,A_i^s,W_i^s)$. We also let $H_i^*$ denote the joint distribution of the summaries $(A_i^{s*},W_i^s)$, where $A_i^{s*}$ is determined by the user-supplied stochastic intervention $G^*$ for $A^*$ given $W$ and the $i$-specific summary measure $a_i^s$, and we denote the density of $H_i^*$ by $h_i^*$. We also assume that these $i$-specific densities $h_i(a^s,w^s)$ and $h_i^*(a^s,w^s)$ are well-defined with respect to some common dominating measure $\mu_{a,w}$. We now define the mixture distribution $\bar{H}$ as a finite mixture of the $i$-specific $H_i$, with its corresponding mixture density defined as $\bar{h}(a^s,w^s) := \frac{1}{N}\sum_{i=1}^N h_i(a^s,w^s)$, and we let $(\bar{A}^s,\bar{W}^s)$ denote the random variables drawn jointly from $\bar{H}$. Next, we define an analogous mixture distribution $\bar{H}^*$ as a finite mixture of the $i$-specific distributions $H_i^*$, with its mixture density given by $\bar{h}^*(a^s,w^s) := \frac{1}{N}\sum_{i=1}^N h_i^*(a^s,w^s)$, and we let $(\bar{A}^{s*},\bar{W}^s)$ denote the random variables drawn jointly from $\bar{H}^*$. Finally, we note that these mixture densities $\bar{h}$ and $\bar{h}^*$ can be factorized as follows:

\[
\bar{h}(a^s,w^s) = \bar{g}(a^s\mid w^s)\, \bar{q}_W(w^s), \qquad \bar{h}^*(a^s,w^s) = \bar{g}^*(a^s\mid w^s)\, \bar{q}_W(w^s),
\]

where $\bar{g}$ is a factor in the above factorization of the likelihood of $\bar{O}^s$; namely, $\bar{g}$ is the density for the conditional distribution of $\bar{A}^s$ given $\bar{W}^s$, denoted $\bar{G}$; $\bar{g}^*$ is the density for the conditional distribution of $\bar{A}^{s*}$ given $\bar{W}^s$, denoted $\bar{G}^*$; and $\bar{q}_W$ is the previously defined marginal density for the mixture $\bar{Q}_W$. In a similar manner one can define the conditional distribution of $\bar{Y}$ given $(\bar{A}^s,\bar{W}^s)$, denoted $\bar{Q}_Y$, with its density denoted $\bar{q}_Y$, which completes the description of the three factors of $\bar{p}(\bar{O}^s)$. We also define a statistical model $\bar{\mathcal{M}}$ as the space of all possible distributions $\{\bar{Q}_Y,\bar{G},\bar{Q}_W\}$, and we note that each $\bar{P}\in\bar{\mathcal{M}}$ is implied by some $P^N\in\mathcal{M}$. We also note that when $(W,A)$ are discrete, one can obtain the following intuitive analytic expressions for the above defined densities $q_{W_i^s}$, $h_i$ and $h_i^*$:

\[
\begin{aligned}
q_{W_i^s}(w^s) &= \int_{w} I\big(w_i^s(w)=w^s\big)\, q_W(w)\, d\mu_w(w), \\
h_i(a^s,w^s) &= \int_{a,w} I\big(a_i^s(a,w)=a^s,\, w_i^s(w)=w^s\big)\, g(a\mid w)\, q_W(w)\, d\mu_{a,w}(a,w), \\
h_i^*(a^s,w^s) &= \int_{a,w} I\big(a_i^s(a,w)=a^s,\, w_i^s(w)=w^s\big)\, g^*(a\mid w)\, q_W(w)\, d\mu_{a,w}(a,w),
\end{aligned}
\]

where $\mu_w$ and $\mu_{a,w}$ are some dominating measures. The new mapping $\psi_0=\bar{\Psi}(\bar{P}_0)$ for our target parameter is now presented in the following theorem.

Theorem 3.1

Let $P^N\in\mathcal{M}$ and let $P_i^s$ denote the $i$-specific summary data distribution of $O_i^s=(W_i^s,A_i^s,Y_i)$. Let $\bar{P}\in\bar{\mathcal{M}}$ be the above defined finite mixture of these $N$ unit-specific distributions $P_i^s$, where $\bar{\mathcal{M}}$ is the above defined mixture model. Let $\bar{O}^s=(\bar{W}^s,\bar{A}^s,\bar{Y})\sim\bar{P}$ denote one sample drawn from $\bar{P}$. The likelihood of $\bar{O}^s$ is given by:

\[
\bar{p}(\bar{O}^s) = q_Y(\bar{Y}\mid \bar{A}^s,\bar{W}^s)\, \bar{g}(\bar{A}^s\mid \bar{W}^s)\, \bar{q}_W(\bar{W}^s),
\]

where $\bar{g}$ and $\bar{q}_W$ are the previously defined factors of $\bar{p}$, and $q_Y$ is the density of $Q_Y\in\mathcal{M}$ previously defined in Section 2, i.e., $q_Y$ is the common-in-$i$ conditional density of $Y_i$ given $(A_i^s,W_i^s)$. Due to the modeling assumptions on $\mathcal{M}$, $q_Y$ is also the conditional density of $\bar{Y}$ given $(\bar{A}^s,\bar{W}^s)$. It follows that $\Psi(P^N)\equiv\bar{\Psi}(\bar{P})$, where the new mapping $\bar{\Psi}(\bar{P})$ is given by:

\[
\begin{aligned}
\bar{\Psi}(\bar{Q},\bar{Q}_W,\bar{g}^*) &= E_{\bar{Q}_W} E_{\bar{g}^*}\big[\bar{Q}(\bar{A}^{s*},\bar{W}^s)\,\big|\,\bar{W}^s\big] = \int_{w^s\in\mathcal{W}^s}\sum_{a^s\in\mathcal{A}^s} \bar{Q}(a^s,w^s)\,\bar{g}^*(a^s\mid w^s)\, d\bar{Q}_W(w^s) \\
&= \frac{1}{N}\sum_{i=1}^N E_{Q_{W_i^s}} E_{\bar{g}^*}\big[\bar{Q}(\bar{A}^{s*},W_i^s)\,\big|\,W_i^s\big] = \frac{1}{N}\sum_{i=1}^N \int_{w_i^s}\sum_{a^s} \bar{Q}(a^s,w_i^s)\,\bar{g}^*(a^s\mid w_i^s)\, dQ_{W_i^s}(w_i^s),
\end{aligned}
\]

and we let $\bar{Q}(a^s,w^s) := E_{Q_Y}[\bar{Y}\mid \bar{A}^s=a^s, \bar{W}^s=w^s]$. With a slight abuse of notation we interchangeably write $\bar{\Psi}(\bar{Q},\bar{Q}_W,\bar{g}^*)$ and $\bar{\Psi}(\bar{P})$, to emphasize the fact that $\bar{\Psi}$ depends on $\bar{P}$ only through $\bar{Q}$, $\bar{Q}_W$ and $\bar{g}^*$.

Proof

First, we show that $q_Y$ is indeed the conditional density of $\bar{Y}$, given $(\bar{A}^s,\bar{W}^s)$, under $\bar{P}$. To see this, note:

\[
\bar{p}(w^s,a^s,y) = \frac{1}{N}\sum_i p_i^s(w^s,a^s,y) = \frac{1}{N}\sum_i p_i^s(y\mid w^s,a^s)\, h_i(w^s,a^s) = q_Y(y\mid w^s,a^s)\, \frac{1}{N}\sum_{i=1}^N h_i(w^s,a^s),
\]

where, by the assumptions in Section 2, the distribution $P(Y_i\mid W_i^s,A_i^s)$ is given by the common distribution $Q_Y$ with density $q_Y$, from which the above result follows as claimed. The equivalence $\Psi(P^N)\equiv\bar{\Psi}(\bar{P})$ then follows directly by applying the definitions of $\bar{g}^*$, $\bar{Q}_W$ and $\bar{Q}$.

The above theorem implies that an estimator of $\psi_0$ can be obtained from estimators of $\bar{Q}_0$ and $\bar{Q}_{W,0}$. It also gives us an alternative interpretation of our target parameter $\psi_0$. Namely, the mapping $\bar{\Psi}(\bar{P}_0)$ happens to be equal to the parameter given by the G-computation formula for the mean outcome $E\bar{Y}_{\bar{g}_0^*}$ under the stochastic intervention $\bar{g}_0^*$ and the observed data $(\bar{W}^s,\bar{A}^s,\bar{Y})\sim\bar{P}_0$. That is, we take the above defined conditional density $\bar{g}_0^* := \bar{h}^*(G^*,Q_{W,0})/\bar{q}_W(Q_{W,0})$ as if it were a known user-supplied stochastic intervention on $\bar{A}^s$ given $\bar{W}^s$; treating $\bar{g}_0^*$ as fixed, we then evaluate $E\bar{Y}_{\bar{g}_0^*}$ by first replacing the $\bar{g}(\bar{A}^s\mid\bar{W}^s)$ factor of $\bar{p}(\bar{O}^s)$ with $\bar{g}_0^*$, and then taking the expectation of $\bar{Y}$ under this modified post-intervention distribution. We also recognize that $\bar{g}_0^*$ will generally be unknown, since it depends on the true distribution of the data via $Q_{W,0}$. Nonetheless, expressing the dependent-data parameter as a function of the mixture $\bar{P}_0$ implies that the estimation of this parameter can be accomplished by simply treating the observed dependent units as if they were independent and identically distributed (iid) (see Lemma 4.1 in Section 4 for the specific case of estimating the $\bar{g}_0$ component of the mixture $\bar{p}$). Hence, whenever we are concerned with estimating any parameter of $\bar{P}_0$, such as $\psi_0$ above, we can ignore the dependence among the units $O_i^s$, $i=1,\ldots,N$, which immediately provides us with an iid-analogue estimator of $\psi_0$; in our case we will undertake the iid targeted maximum likelihood estimation (TMLE) approach, as described in Section 4. Among the class of iid-analogue estimators of $\psi_0$, we can choose an estimator which would be efficient for iid data, and in our case, we will show that such an analogous efficient iid TMLE will also be semi-parametrically efficient in our dependent-data model.

3.2 The efficient influence curve

The efficient influence curve (EIC), frequently referred to as the efficient score or canonical gradient, is a key ingredient in semi-parametric efficient estimation, because it defines the linear approximation of any efficient and regular asymptotically linear estimator, and therefore provides an asymptotic bound for the variance of all regular asymptotically linear estimators [41]. Furthermore, as discussed in Section 5 of van der Laan [9], even for dependent-data problems such as ours, the EIC still characterizes the limiting normal distribution of the MLE, thus establishing that if we want to construct an estimator that is asymptotically equivalent to the MLE, we need to study the EIC of our target parameter. Due to local asymptotic normality of the log-likelihood, as was argued in van der Vaart [42], the normal limiting distribution implied by the MLE is still the optimal limit distribution in the convolution theorem for efficient estimators. Our first result provides the EIC $\bar{D}^N$ for the data-adaptive parameter $\bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0}) := \bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0},\bar{g}_N^*)$ indexed by fixed $\bar{g}^*=\bar{g}_N^*$, where $\bar{g}_N^*$ is an estimator of the true density $\bar{g}_0^*$. We will refer to $\bar{D}^N$ as the iid-EIC, since it is a direct analogue of the iid-data EIC for the parameter $E\bar{Y}_{\bar{g}^*}$. We then present the EIC $\bar{D}$ for our actual parameter of interest $\bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0},\bar{g}_0^*)$, defined with respect to the true density $\bar{g}_0^*$.

EIC for the data-adaptive parameter indexed by the fixed stochastic intervention $\bar{g}_N^*$

We now replace $\bar{g}_0^*$ by a fixed density $\bar{g}^*$, set equal to some data-dependent estimator $\bar{g}_N^*$ of $\bar{g}_0^*$, which is then treated as fixed. This allows us to define the data-adaptive target parameter $\bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0})$, indexed by such fixed $\bar{g}^*$, as $E_{\bar{Q}_{W,0}} E_{\bar{g}^*=\bar{g}_N^*}\big[\bar{Q}_0(\bar{A}^{s*},\bar{W}^s)\mid\bar{W}^s\big]$. From iid results for parameters defined under fixed stochastic interventions, such as those described in Muñoz and van der Laan [38], it immediately follows that the efficient influence curve for the parameter $\bar{\Psi}_N(\bar{P}_0)$ at $\bar{P}_0\in\bar{\mathcal{M}}$ and $\bar{O}^s\sim\bar{P}_0$ is given by:

\[
\bar{D}^{IID}(\bar{P}_0)(\bar{O}^s) = \frac{\bar{g}^*}{\bar{g}_0}(\bar{A}^s\mid\bar{W}^s)\big(\bar{Y}-\bar{Q}_0(\bar{A}^s,\bar{W}^s)\big) + E_{\bar{g}^*=\bar{g}_N^*}\big[\bar{Q}_0(\bar{A}^{s*},\bar{W}^s)\mid\bar{W}^s\big] - \bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0}).
\]

In other words, we have obtained the efficient influence curve for the mean outcome of $\bar{Y}$ under the stochastic intervention $\bar{g}^*$, for one observation $\bar{O}^s\sim\bar{P}_0$. We will thus refer to $\bar{D}^{IID}(\bar{P}_0)(\bar{O}^s)$ as the iid-EIC, due to the just-described iid-data interpretation of the parameter $\bar{\Psi}_N(\bar{P})$. However, we do not get to observe $\bar{O}^s$; instead, our observed data is $O^s=(O_1^s,\ldots,O_N^s)$, where $O_i^s\sim P_{i,0}^s$. Nonetheless, it follows that the EIC for the actual data-adaptive parameter $\bar{\Psi}_N(\bar{P}_0)$ at $P_0^N\in\mathcal{M}$, for the observed data $O^s\sim P_0^N$, is given by the sum of these iid-EICs, evaluated at the $i$-specific observations $O_i^s$ and scaled by $1/N$; i.e., the EIC for the parameter $\bar{\Psi}_N(\bar{P}_0)$ is given by $\frac{1}{N}\sum_{i=1}^N \bar{D}^{IID}(\bar{P}_0)(O_i^s)$, and we also present this EIC in the following theorem.

Theorem 3.2

Suppose our parameter of interest is defined by the mapping $\bar{\Psi}_N(\bar{P})$, where $\bar{P}\in\bar{\mathcal{M}}$ and $\bar{g}^*$ is fixed. The efficient influence curve $\bar{D}^N(P^N)(O^s)$ for $\bar{\Psi}_N(\bar{P})$, evaluated at $P^N\in\mathcal{M}$ and one observation $O^s$ (consisting of $N$ dependent units), is given by

\[
\begin{aligned}
\bar{D}^N(P^N)(O^s) &= \frac{1}{N}\sum_{i=1}^N \Big\{\frac{\bar{g}^*}{\bar{g}}(A_i^s\mid W_i^s)\big(Y_i - \bar{Q}(A_i^s,W_i^s)\big) + E_{\bar{g}^*}\big[\bar{Q}(A_i^{s*},W_i^s)\mid W_i^s\big] - \Psi_i(\bar{Q},\bar{Q}_W)\Big\} \\
&= \frac{1}{N}\sum_{i=1}^N \Big\{ D_{Y_i}(\bar{Q},\bar{g}^*)(O_i^s) + D_{W_i^s}(\bar{Q},\bar{Q}_W)(W_i^s) \Big\},
\end{aligned}
\]

where

\[
D_{W_i^s}(\bar{Q},\bar{Q}_W)(W_i^s) = E_{\bar{g}^*}\big[\bar{Q}(A_i^{s*},W_i^s)\mid W_i^s\big] - \Psi_i(\bar{Q},\bar{Q}_W) = \sum_{a^s}\bar{Q}(a^s,W_i^s)\,\bar{g}^*(a^s\mid W_i^s) - \int_{w^s}\sum_{a^s}\bar{Q}(a^s,w^s)\,\bar{g}^*(a^s\mid w^s)\, dQ_{W_i^s}(w^s).
\]

Proof

See Appendix B of our technical report [43].

EIC for the parameter under the true $\bar{g}^*(G^*,Q_{W,0})$

We now consider our actual target parameter $\bar{\Psi}(\bar{P}_0) := \bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0},\bar{g}_0^*)$, obtained by replacing the fixed $\bar{g}^*:=\bar{g}_N^*$ in $\bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0})$ with the true density $\bar{g}_0^* := \bar{h}^*(G^*,Q_{W,0})/\bar{q}_W(Q_{W,0})$. As a result, the EIC for the parameter $\bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0},\bar{g}_0^*)$ will contain an additional contributing term $D^{\bar{g}^*}(P_0^N)(W)$, due to the difference $\bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0}) - \bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0},\bar{g}_0^*)$. This additional contribution is derived in Appendix C of our technical report [43]. The final EIC for our actual parameter $\bar{\Psi}(\bar{P})$ is given by $\bar{D}(P^N)(O) = \bar{D}^N(P_0^N)(O^s) + \bar{D}^{\bar{g}^*}(P_0^N)(W)$ and is provided in Theorem 3.3 below.

Theorem 3.3

Suppose our parameter is given by the mapping $\bar{\Psi}(\bar{P})$ defined in Section 3.1. The efficient influence curve for $\bar{\Psi}(\bar{P})$ is given by:

\[
\begin{aligned}
\bar{D}(P^N)(O) &= \bar{D}^N(P^N)(O^s) + \bar{D}^{\bar{g}^*}(P^N)(W) \\
&= \frac{1}{N}\sum_{i=1}^N \Big\{\frac{\bar{g}^*}{\bar{g}}(A_i^s\mid W_i^s)\big(Y_i - \bar{Q}(A_i^s,W_i^s)\big) + \sum_{a^s}\bar{Q}(a^s,W_i^s)\,\bar{g}^*(a^s\mid W_i^s) - \Psi_i(\bar{Q},\bar{Q}_W)\Big\} \\
&\quad + \frac{1}{N}\sum_{i=1}^N \Big\{\sum_{a^s}\bar{Q}(a^s,W_i^s)\, g_i^*(a^s\mid W) - \sum_{a^s}\bar{Q}(a^s,W_i^s)\,\bar{g}^*(a^s\mid W_i^s)\Big\},
\end{aligned}
\]

where

\[
g_i^*(a^s\mid W) = \int_a I\big(a_i^s(a,W)=a^s\big)\, g^*(a\mid W)\, d\mu_a(a),
\]
\[
\Psi_i(\bar{Q},\bar{Q}_W) = \int_{a^s,w^s} \bar{Q}(a^s,w^s)\, h_i^*(a^s,w^s)\, d\mu(a^s,w^s) = \int_{a,w} \bar{Q}\big(a_i^s(a,w),w_i^s(w)\big)\, g^*(a\mid w)\, q_W(w)\, d\mu(a,w),
\]

and $\bar{D}(P^N)$ above can be further simplified to:

\[
\begin{aligned}
\bar{D}(P^N)(O) &= \frac{1}{N}\sum_{i=1}^N \Big\{\frac{\bar{g}^*}{\bar{g}}(A_i^s\mid W_i^s)\big(Y_i - \bar{Q}(A_i^s,W_i^s)\big) + \sum_{a^s}\bar{Q}(a^s,W_i^s)\, g_i^*(a^s\mid W) - \Psi_i(\bar{Q},\bar{Q}_W)\Big\} \\
&= \frac{1}{N}\sum_{i=1}^N \Big\{ D_{Y_i}(\bar{Q},\bar{g}^*)(O_i^s) + D^{g_i^*}(\bar{Q},\bar{Q}_W)(W) \Big\}.
\end{aligned}
\]

Proof

See Appendix B of our technical report [43].

Suppose that $\bar{g}^*/\bar{g}$ is uniformly bounded on a set that contains $(W_i^s,A_i^s)$ with probability 1, for all $i$. Using an analysis similar to the one conducted in van der Laan [9], we can show that $\bar{D}$ above is a doubly robust estimating function for the parameter $\psi_0=\bar{\Psi}(\bar{P}_0)$, in the sense that

\[
P_0\bar{D}(\bar{\mathbf{Q}},\bar{g}_0,\psi_0) = P_0\bar{D}(\bar{\mathbf{Q}}_0,\bar{g},\psi_0) = P_0\bar{D}(\bar{\mathbf{Q}}_0,\bar{g}_0,\psi_0) = 0,
\]

where $P_0 f = \int f(o)\, dP_0^N(o)$ denotes the expectation of $f$ under the distribution $P_0^N$, and $\bar{\mathbf{Q}} = (\bar{Q},\bar{Q}_W)$. This implies that any estimator that solves this equation will be consistent if: (1) $\bar{Q}_{W,N}$ is consistent for $\bar{Q}_{W,0}$, and (2) at least one of the two estimators $\bar{Q}_N$ or $\bar{g}_N$ is consistent for $\bar{Q}_0$ or $\bar{g}_0$, respectively.

4 Targeted maximum likelihood estimation (TMLE)

We have found a new representation of our target parameter, $\Psi(P_0^N)=\bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0})$, which shows that it depends on $P_0^N$ only as a function of $\bar{Q}_Y$ and $\bar{Q}_W$. Demonstrating that our parameter can be written as a mapping $\bar{\Psi}(\bar{Q},\bar{Q}_W)$ is hence the first step in the estimation of $\psi_0$. It implies that the estimation of $\psi_0$ should only be concerned with estimating the relevant factors of the mixture $\bar{P}_0$, which will be accomplished by following the standard targeted maximum likelihood estimation (TMLE) template. For a description of the TMLE framework for iid data with static interventions we refer to van der Laan and Rose [7], van der Laan and Rubin [8], and Gruber and van der Laan [44]; for the TMLE for iid data with stochastic interventions we refer to Muñoz and van der Laan [38].

The TMLE for estimating $\psi_0$ will be described in terms of the estimators $\bar{Q}_N$, $\bar{g}_N$, $\bar{g}_N^*$ and $\bar{Q}_{W,N}$ of $\bar{Q}_0$, $\bar{g}_0$, $\bar{g}_0^*$ and $\bar{Q}_{W,0}$, respectively. Our next step is to create a targeted estimator $\bar{Q}_N^*$ of $\bar{Q}_0$ by updating the initial estimator $\bar{Q}_N$, defining the TMLE $\psi_N^*$ as the corresponding plug-in estimator for the mixture mapping $\bar{\Psi}(\bar{Q}_N^*,\bar{Q}_{W,N})$. We define the targeted update $\bar{Q}_N^*$ based on the loss function for $\bar{Q}_0$ and the least-favorable fluctuation submodel through $\bar{Q}_0$, in terms of $\bar{g}_0$ and $\bar{g}_0^*$. The model update $\bar{Q}_N^*$ is defined in such a way that its score generates the efficient influence curve $\bar{D}$ presented in Theorem 3.3. That is, the targeted estimator $\bar{Q}_N^*$ updates $\bar{Q}_N$ by: (1) using the estimated weights $\bar{g}_N^*/\bar{g}_N$; (2) using a parametric submodel $\{\bar{Q}_N(\varepsilon, \bar{g}_N^*/\bar{g}_N)\}$ through the initial estimator $\bar{Q}_N = \bar{Q}_N(0,\bar{g}_N^*/\bar{g}_N)$ at $\varepsilon=0$, where $\{\bar{Q}_N(\varepsilon, \bar{g}_N^*/\bar{g}_N)\}$ is referred to as the least-favorable submodel; (3) fitting $\varepsilon$ with the standard parametric MLE, with $\varepsilon_N$ denoting this fit; and, finally, (4) defining the targeted (updated) estimator as $\bar{Q}_N^* := \bar{Q}_N(\varepsilon_N, \bar{g}_N^*/\bar{g}_N)$. We note that this TMLE is actually the usual iid TMLE algorithm for estimating the quantity $EY_{\bar{g}^*}$ under fixed (data-adaptive) $\bar{g}^*$, treating the observations $O_i^s$, for $i=1,\ldots,N$, as if they were iid. Finally, this TMLE $\psi_N^*$ solves the empirical score equation given by the efficient influence curve $\bar{D}$ from Theorem 3.3, implying that $\psi_N^*$ also inherits the double robustness property of this efficient influence curve.

4.1 The estimator $\bar{Q}_{W,N}$ of $\bar{Q}_{W,0}$

We define an estimator $\bar{Q}_{W,N}$ of $\bar{Q}_{W,0}$ by first defining the empirical counterpart $Q_{W,N}$ of $Q_{W,0}$, which puts mass one on the observed $W=(W_1,\ldots,W_N)$; this implies that the empirical counterpart $Q_{W_i^s,N}$ of $Q_{W_i^s,0}$ puts mass one on its corresponding observed $W_i^s=w_i^s(W)$, for $i=1,\ldots,N$. Hence, for each $w^s\in\mathcal{W}^s$, the empirical counterpart $\bar{Q}_{W,N}(w^s)$ of $\bar{Q}_{W,0}(w^s)$ may be defined as follows:

\[
\bar{Q}_{W,N}(w^s) := \frac{1}{N}\sum_{i=1}^N I(W_i^s = w^s).
\]

4.2 The initial (non-targeted) estimator $\bar{Q}_N$ of $\bar{Q}_0$

We assumed there is a common model $\bar{Q}_0$ across all $i$, and that the $Y_i$ are conditionally independent given $(A_i^s,W_i^s)$, for all $i$. Consequently, the estimation of the common $\bar{Q}_N$ can proceed by using the pooled summary data $(W_i^s,A_i^s,Y_i)$, $i=1,\ldots,N$, as if the sample were iid across $i$, and one can rely on the usual parametric MLE or loss-based cross-validation for estimating $\bar{Q}_N$, as described in van der Laan [9]. Given that $Y_i$ can be continuous or discrete with known range $Y_i\in[a,b]$, for $i=1,\ldots,N$, the estimation of $\bar{Q}_0$ can be based on the following log-likelihood loss,

\[
L(\bar{Q})(Y\mid A^s,W^s) = -\sum_{i=1}^N \log\Big\{\bar{Q}(A_i^s,W_i^s)^{Y_i}\,\big(1-\bar{Q}(A_i^s,W_i^s)\big)^{1-Y_i}\Big\},
\]

or the squared error loss

\[
L(\bar{Q})(O^s) = \sum_{i=1}^N \big(Y_i - \bar{Q}(A_i^s,W_i^s)\big)^2.
\]

Thus, fitting $\bar{Q}_N$ for the common $\bar{Q}_0=E[Y_i\mid A_i^s,W_i^s]$ amounts to using the summary data structure $(W_i^s,A_i^s,Y_i)$, for $i=1,\ldots,N$. In other words, we use the entire sample of $N$ observations for predicting each $Y_i$. For example, for binary $Y_i$, $\bar{Q}_N$ can be estimated by fitting a single logistic regression model to all $N$ observations, with $Y_i$ as the outcome, $(W_i^s,A_i^s)$ as predictors, and possibly the number of friends, $|F_i|$, as an additional covariate, as illustrated below. After fitting $\bar{Q}_N$, one generates a vector of unit-specific predicted values, $(\bar{Q}_N(A_i^s,W_i^s))_{i=1}^N$, which are then used to build the updated (targeted) version $\bar{Q}_N^*$ of $\bar{Q}_N$.
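Continuing the running sketch, the pooled fit may look as follows (a minimal illustration only; a real analysis could instead use machine learning with loss-based cross-validation):

```r
# Pooled (treat-as-iid) initial fit of the common Qbar_0: a single logistic
# regression of Y_i on (W_i^s, A_i^s) across all N units.
pooled <- data.frame(Y = Y, A_i = As[, "A_i"], A_c = As[, "A_c"],
                     W_i = Ws[, "W_i"], W_c = Ws[, "W_c"], nF = num_friends)
Qbar_fit <- glm(Y ~ A_i + A_c + W_i + W_c + nF, family = binomial(), data = pooled)
QbarN <- predict(Qbar_fit, type = "response")   # predictions QbarN(A_i^s, W_i^s)
```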

4.3 Estimating $\bar{g}_0$ and $\bar{g}_0^*$

We now describe a direct approach to estimation of $\bar{g}_0$ that relies on Lemma 4.1 below. This lemma states that a consistent estimator $\bar{g}_N$ of $\bar{g}_0$ can be obtained by taking the pooled sample $(A_i^s,W_i^s)$, for $i=1,\ldots,N$, and using the usual iid maximum likelihood-based estimation, as if we were fitting a common-in-$i$ conditional density for $A_i^s$ given $W_i^s$, treating the $(A_i^s,W_i^s)$ as independent observations. For example, if each component of $A_i^s$ is binary, and $|A_i^s|=k$ for all $i$, the conditional distribution for $\bar{g}_0$ could be factorized as the product of $k$ binary conditional distributions. Each of these binary conditional distributions can be estimated with the usual logistic regression methods. We also refer to Section 7 for a running example that describes this direct estimation approach in detail. Suppose now that $g_0$ is known, as will be the case in a randomized clinical trial (RCT). We note that this aforementioned approach to estimating $\bar{g}_0$ can easily be adapted to incorporate the knowledge of the true $g_0$. That is, one could proceed by first simulating a very large number of observations $(A_j^s,W_j^s)_{j=1}^M$ from $(g_0,Q_{W,N})$, with $Q_{W,N}$ putting mass one on the observed $W$, and then fitting the maximum likelihood-based estimator of $\bar{g}_0$, as if fitting a common model for $A_i^s$ given $W_i^s$, based on this very large sample treated as iid.

As discussed in the previous section, $\bar{g}_0^* := \bar{h}^*(G^*,Q_{W,0})/\bar{q}_W(Q_{W,0})$ will generally be unknown and hence will also need to be estimated from the data, in particular since $\bar{g}_0^*$ depends on the true distribution of the data via $Q_{W,0}$. Therefore, we propose adapting the above approach for the estimation of $\bar{g}_0$ towards the estimation of $\bar{g}_0^*$, by simply replacing the known $g_0$ with the known $g^*$, as sketched below. Finally, even when $g_0$ is unknown, such as in an observational study on $N$ connected units, one could obtain a better estimator of $\bar{g}_0$ by utilizing the conditional independence assumptions for the observed exposures $A_i$, given $W_i^s$, for $i=1,\ldots,N$. As with the estimation of $\bar{Q}_N$, this allows us to use loss-based cross-validation and machine learning methods to obtain a good approximation $g_N(a\mid w^s)$ of the common-in-$i$ density $g_0(a\mid w^s)$, resulting in an estimator $g_N$ of the joint density $g_0$. We can then repeat the above described procedure for estimating $\bar{g}_0$ when $g_0$ is known, except using the data-adaptively estimated $g_N$ instead of $g_0$. In this manner, one can obtain sufficient approximations to the true $\bar{g}_0$ and $\bar{g}_0^*$, by fully utilizing the actual model knowledge for $g_0$ and the actual knowledge of $g^*$.
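Continuing the sketch, the following illustrates both pooled fits, assuming a hypothetical known $g^*$ that assigns each exposure independently with probability 0.3. For simplicity only the first (binary) coordinate of $A_i^s$ is modeled here; a full implementation (as in tmlenet) would factorize the entire conditional density of $A_i^s$ given $W_i^s$ into a product of binary regressions.

```r
# Pooled fit of (one factor of) gbar_0 from the observed summary data:
gN_fit <- glm(A_i ~ W_i + W_c + nF, family = binomial(), data = pooled)
gN_pred <- ifelse(pooled$A_i == 1, fitted(gN_fit), 1 - fitted(gN_fit))
# gbar*_0 with known g*: simulate exposure vectors from g*, map through the
# summaries, pool the simulated rows, and fit as if iid.
gstar <- function() rbinom(N, 1, 0.3)       # hypothetical known g*
sim <- do.call(rbind, lapply(1:200, function(j) {
  Astar <- gstar()
  data.frame(A_i = Astar, W_i = Ws[, "W_i"], W_c = Ws[, "W_c"], nF = num_friends)
}))
gstarN_fit <- glm(A_i ~ W_i + W_c + nF, family = binomial(), data = sim)
p1 <- predict(gstarN_fit, newdata = pooled, type = "response")
gstarN_pred <- ifelse(pooled$A_i == 1, p1, 1 - p1)
wts <- gstarN_pred / gN_pred                # unit-level weights gbar*_N / gbar_N
```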

The resulting model fits $\bar{g}_N$ and $\bar{g}_N^*$ are used to obtain the $N$ predicted ratios $\big(\bar{g}_N^*(A_i^s\mid W_i^s)/\bar{g}_N(A_i^s\mid W_i^s)\big)$, for $i=1,\ldots,N$. These predictions will be used as the unit-level weights for the TMLE update of the estimator $\bar{Q}_N$, as described in the following section.

Lemma 4.1

(Lemma 2 in van der Laan [9]). Let the density $\bar{g}_0$ be defined as $\bar{g}_0(\bar{a}^s\mid\bar{w}^s) := \bar{h}_0(\bar{a}^s,\bar{w}^s)/\bar{q}_{W,0}(\bar{w}^s)$, where $\bar{q}_{W,0}$ and $\bar{h}_0$ are as previously defined in Section 3.1. Let $\mathcal{H}_g$ be a set of functions that contains the true $\bar{g}_0$; then

\[
\bar{g}_0 = \arg\max_{\bar{g}\in\mathcal{H}_g} E_{P_0^N}\Big[\frac{1}{N}\sum_i \log \bar{g}(A_i^s\mid W_i^s)\Big].
\]

A consistent estimator $\bar{g}_N$ of $\bar{g}_0$ can be obtained by plugging in the empirical counterpart of $P_0^N$ above, resulting in the estimator:

\[
\bar{g}_N = \arg\max_{\bar{g}\in\mathcal{H}_g} \frac{1}{N}\sum_{i=1}^N \log \bar{g}(A_i^s\mid W_i^s).
\]

That is, $\bar{g}_N$ is the maximum likelihood estimator of $\bar{g}_0$ that uses the pooled sample $(A_i^s,W_i^s)$, for $i=1,\ldots,N$, treating the $N$ dependent units as iid.

Proof

See Appendix A of our technical report [43].

4.4 TMLE algorithm for N connected units

Having defined the estimators $\bar{Q}_N$, $\bar{g}_N$, $\bar{g}_N^*$ and $\bar{Q}_{W,N}$, the TMLE $\psi_N^*$ is obtained by first constructing the model update $\bar{Q}_N^*$ of $\bar{Q}_N$, as described in Step 1 below, and then evaluating $\psi_N^*$ as a substitution estimator for the mapping $\bar{\Psi}$, as described in Step 2 below.

  1. Define the following parametric submodel for $\bar{Q}_N$: $\operatorname{Logit}\bar{Q}_N(\varepsilon) = \varepsilon + \operatorname{Logit}\bar{Q}_N$, and define the following weighted log-likelihood loss function:

    \[
    L_w(\bar{Q}_N(\varepsilon))(O^s) = -\sum_{i=1}^N \log\Big\{\bar{Q}_N(\varepsilon)(A_i^s,W_i^s)^{Y_i}\,\big(1-\bar{Q}_N(\varepsilon)(A_i^s,W_i^s)\big)^{1-Y_i}\Big\}\, \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s\mid W_i^s).
    \]

    The model update $\bar{Q}_N^*$ is defined as $\bar{Q}_N^*(\varepsilon_N) = \operatorname{Expit}\big(\operatorname{Logit}\bar{Q}_N + \varepsilon_N\big)$, where $\varepsilon_N$ minimizes the above loss, i.e., $\varepsilon_N = \arg\min_\varepsilon L_w(\bar{Q}_N(\varepsilon))(O^s)$. That is, one can fit $\varepsilon_N$ by simply running an intercept-only weighted logistic regression using the pooled sample of $N$ observations $(W_i^s,A_i^s,Y_i)$, for $i=1,\ldots,N$, with outcome $Y_i$, intercept $\varepsilon$, offsets $\operatorname{Logit}\bar{Q}_N(A_i^s,W_i^s)$, predicted weights $\bar{g}_N^*(A_i^s\mid W_i^s)/\bar{g}_N(A_i^s\mid W_i^s)$, and no covariates. The fitted intercept is the maximum likelihood fit $\varepsilon_N$ for $\varepsilon$, yielding the model update $\bar{Q}_N^*$, which can be evaluated for any fixed $(a^s,w^s)$ by first computing the initial model prediction $\bar{Q}_N(a^s,w^s)$ and then evaluating the update $\bar{Q}_N^*(\varepsilon_N)$ (see the sketch following this list).

  2. The TMLE $\psi_N^* = \bar{\Psi}(\bar{Q}_N^*,\bar{Q}_{W,N})$ of $\psi_0$ is defined as the following substitution estimator:

    \[
    \psi_N^* = \frac{1}{N}\sum_{i=1}^N \int_{a^s} \bar{Q}_N^*(a^s,W_i^s)\; \bar{g}^*_{N,NPMLE}(a^s\mid W_i^s)\, d\mu(a^s),
    \]

    where $\bar{g}^*_{N,NPMLE}$ is an NPMLE substitution estimator of $\bar{g}_0^* := \bar{h}^*(G^*,Q_{W,0})/\bar{q}_W(Q_{W,0})$, obtained by plugging in the user-defined $G^*$ and the empirical counterpart $Q_{W,N}$ of $Q_{W,0}$, which puts mass one on the observed $W=(W_1,\ldots,W_N)$. Hence, the estimator $\bar{g}^*_{N,NPMLE}$ is defined as follows:

    \[
    \bar{g}^*_{N,NPMLE}(a^s\mid w^s) = \frac{\frac{1}{N}\sum_i h_i^*(G^*,Q_{W,N})(a^s,w^s)}{\frac{1}{N}\sum_i Q_{W_i^s,N}(w^s)},
    \]

    for each $(a^s,w^s)\in(\mathcal{A}^s,\mathcal{W}^s)$. The above TMLE $\psi_N^*$ might require the evaluation of $\bar{g}^*_{N,NPMLE}$ at every possible value $a^s$ of $a_i^s(a,W)$ in the support of $A^*$, and hence could be computationally challenging to implement in practice, especially for non-degenerate $g^*$ and multivariate $(a_i^s : i=1,\ldots,N)$. However, we also note that the above TMLE $\psi_N^*$ takes on the following algebraically equivalent form:

    \[
    \psi_N^* = \frac{1}{N}\sum_{i=1}^N \int_a \bar{Q}_N^*\big(a_i^s(a,W), w_i^s(W)\big)\, g^*(a\mid W)\, d\mu(a),
    \]

    which does not require computing $\bar{g}^*_{N,NPMLE}$. For non-degenerate $g^*$, the latter expression for $\psi_N^*$ can be closely approximated by sampling from $g^*$ and performing Monte Carlo integration. That is, we propose evaluating $\psi_N^*$ by iterating the following procedure for $j=1,\ldots,M$: (1) sample $N$ observations $A_j^* = (A_{j,1}^*,\ldots,A_{j,N}^*)$ from $g^*(a\mid W)$, conditionally on the observed $W=(W_1,\ldots,W_N)$; (2) apply the summary measure mappings, constructing the summary dataset $(A_{j,i}^{s*},W_i^s)$, for $i=1,\ldots,N$, where each $A_{j,i}^{s*} := a_i^s(A_j^*,W)$; and (3) evaluate the Monte Carlo approximation to $\psi_N^*$ for iteration $j$ as:

    \[
    \psi_{j,N}^* = \frac{1}{N}\sum_{i=1}^N \bar{Q}_N^*\big(A_{j,i}^{s*}, W_i^s\big).
    \]

    The Monte Carlo estimate of $\psi_N^*$ is then obtained by averaging $\psi_{j,N}^*$ across $j=1,\ldots,M$, where $M$ is chosen large enough to guarantee a small approximation error. Finally, we note that one could substantially reduce the computation time of this algorithm by simply re-using the summary datasets $(A_{j,i}^{s*},W_i^s)$ that were already constructed while performing the direct estimation of $\bar{g}_0^*$ from Section 4.3. A sketch of Steps 1 and 2 is given below.
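Continuing the running example, the two steps above can be sketched as follows: the fluctuation is the intercept-only weighted logistic regression of Step 1 (with the weights `wts` from Section 4.3), and the Monte Carlo loop implements Step 2 with $M=1000$ draws from the hypothetical $g^*$.

```r
# Step 1: intercept-only weighted logistic fluctuation with offset Logit(QbarN).
flucd <- data.frame(Y = Y, off = qlogis(QbarN), wts = wts)
eps_fit <- suppressWarnings(
  glm(Y ~ offset(off), family = binomial(), weights = wts, data = flucd))
epsN <- coef(eps_fit)[["(Intercept)"]]
QbarN_star <- function(as_mat) {             # targeted predictions Qbar*_N
  newd <- data.frame(A_i = as_mat[, "A_i"], A_c = as_mat[, "A_c"],
                     W_i = Ws[, "W_i"], W_c = Ws[, "W_c"], nF = num_friends)
  plogis(qlogis(predict(Qbar_fit, newdata = newd, type = "response")) + epsN)
}
# Step 2: Monte Carlo integration over draws A* ~ g*, given the observed W.
psi_j <- replicate(1000, {
  Astar <- gstar()                           # (1) draw A* from g*
  As_star <- t(sapply(1:N, a_summary, A = Astar))  # (2) summaries A*_{j,i}^s
  mean(QbarN_star(As_star))                  # (3) psi*_{j,N}
})
psiN <- mean(psi_j)                          # Monte Carlo estimate of the TMLE
```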

5 Asymptotic normality of TMLE

Having defined the TMLE $\psi_N^* = \bar{\Psi}(\bar{Q}_N^*,\bar{Q}_{W,N})$ for our parameter $\bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0})$, our goal now is to conduct inference. We start with an asymptotic analysis of the process $\big(\psi_N^* - \bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0})\big)$, where $\bar{\Psi}_N(\bar{Q}_0,\bar{Q}_{W,0})$ is the previously defined data-adaptive target parameter indexed by the fixed $\bar{g}_N^*$. We then show that our results can be easily extended to allow inference for our original parameter of interest $\bar{\Psi}(\bar{Q}_0,\bar{Q}_{W,0})$, defined with respect to the true $\bar{g}_0^*$.

As described in Appendix D of our technical report [43], the TMLE $\bar{\Psi}(\bar{Q}_N^*,\bar{Q}_{W,N})$ solves the following empirical score equation:

\[
\frac{1}{N}\sum_{i=1}^N \bar{D}^N(\bar{Q}_N^*,\bar{Q}_{W,N},\bar{g}_N^*)(O_i^s) = 0,
\]

where $\bar{D}^N(\bar{\mathbf{Q}},\bar{g}^*)$ is the EIC for the data-adaptive parameter $\bar{\Psi}_N(\bar{\mathbf{Q}}) := E\bar{Y}_{\bar{g}^*=\bar{g}_N^*}$ (Theorem 3.2). Using the identity for $\bar{D}^N(\bar{\mathbf{Q}},\bar{g}^*)$ shown in Appendix D of [43], we have that:

\[
\bar{\Psi}_N(\bar{\mathbf{Q}}) - \bar{\Psi}_N(\bar{\mathbf{Q}}_0) = -\bar{P}_0 \bar{D}^N(\bar{\mathbf{Q}},\bar{g}^*) + \bar{R}_2(\bar{\mathbf{Q}},\bar{\mathbf{Q}}_0,\bar{g},\bar{g}_0),
\]

where we use the notation $\bar{\mathbf{Q}} = (\bar{Q},\bar{Q}_W)$, and $\bar{R}_2$ is the second-order term provided in Appendix D of [43]. Since $\bar{P}_0$ is defined as the mixture $\frac{1}{N}\sum_i P_{0,i}$, and combined with the fact that our TMLE solves the above efficient score equation, we obtain:

\[
\bar{\Psi}_N(\bar{\mathbf{Q}}_N^*) - \bar{\Psi}_N(\bar{\mathbf{Q}}_0) = \frac{1}{N}\sum_i \Big( D_i^N(\bar{Q}_N^*,\bar{g}_N^*)(O_i^s) - P_{0,i} D_i^N(\bar{Q}_N^*,\bar{g}_N^*) \Big) + \bar{R}_{2,N}.
\]

Given fast enough rates of convergence for $\bar{g}_N$ and $\bar{Q}_N^*$, one can show that $\bar{R}_{2,N} = o_P(N^{-1/2})$, and by the empirical process and asymptotic equicontinuity analysis conducted in van der Laan [9], under the same conditions as in Theorem 2 of van der Laan [9], it follows that the empirical process applied to the estimated $\bar{D}^N(\bar{Q}_N^*,\bar{g}_N^*)$ is, up to $o_P(N^{-1/2})$, equal to the empirical process at its fixed limit, which we denote $\bar{D}_0^N$, and we have:

\[
\bar{\Psi}_N(\bar{\mathbf{Q}}_N^*) - \bar{\Psi}_N(\bar{\mathbf{Q}}_0) = \frac{1}{N}\sum_i \Big( D_{i,0}^N(O_i^s) - P_{0,i} D_{i,0}^N \Big) + o_P(N^{-1/2}).
\]

We now perform a similar derivation that allows us to conduct inference for the parameter $\bar{\Psi}(\bar{\mathbf{Q}}_0)$ defined with respect to the true $\bar{g}_0^* := \bar{h}^*(G^*,Q_{W,0})/\bar{q}_W(Q_{W,0})$. Specifically, we perform the following asymptotic expansion:

\[
\begin{aligned}
\bar{\Psi}_N(\bar{\mathbf{Q}}_N^*) - \bar{\Psi}(\bar{\mathbf{Q}}_0) &= \bar{\Psi}_N(\bar{\mathbf{Q}}_N^*) - \bar{\Psi}_N(\bar{\mathbf{Q}}_0) + \bar{\Psi}_N(\bar{\mathbf{Q}}_0) - \bar{\Psi}(\bar{\mathbf{Q}}_0) \\
&= \frac{1}{N}\sum_i \Big( D_{0,i}^N(O_i^s) - P_{0,i} D_{0,i}^N \Big) + \frac{1}{N}\sum_i \Big( D_{0,i}^{\bar{g}^*}(O_i^s) - P_{0,i} D_{0,i}^{\bar{g}^*} \Big) + o_P(N^{-1/2}) \\
&= \frac{1}{N}\sum_i \Big( D_{0,i}(O_i^s) - P_{0,i} D_{0,i} \Big) + o_P(N^{-1/2}),
\end{aligned}
\]

where $D_0^{\bar{g}^*} = \frac{1}{N}\sum_i D_{0,i}^{\bar{g}^*}$ was defined as the additional contribution to the EIC in Theorem 3.3, and $D_{0,i}$ is the $i$-specific zero-mean component of that EIC:

\[
D_{0,i} = \frac{\bar{g}_0^*}{\bar{g}_0}(A_i^s\mid W_i^s)\big(Y_i - \bar{Q}_0(A_i^s,W_i^s)\big) + \sum_{a^s}\bar{Q}_0(a^s,W_i^s)\, g_i^*(a^s\mid W) - \int_w \sum_{a^s} \bar{Q}_0\big(a^s, w_i^s(w)\big)\, g_i^*(a^s\mid w)\, dQ_{W,0}(w).
\]

We also note that the above expansion must hold for our TMLE $\bar{\Psi}(\bar{Q}_N^*,\bar{Q}_{W,N})$, since it solves the score equation given by:

\[
\frac{1}{N}\sum_{i=1}^N D_i^{\bar{g}^*}(\bar{Q}_N^*, \bar{g}^*_{N,NPMLE})(W) = 0.
\]

By applying an analysis analogous to the one conducted in van der Laan [9], we can show that the above process converges to a normal limiting distribution at the $\sqrt{N}$ rate, namely,

\[
\sqrt{N}\,\big(\bar{\Psi}_N(\bar{\mathbf{Q}}_N^*) - \bar{\Psi}(\bar{\mathbf{Q}}_0)\big) \xrightarrow{d} N(0,\sigma_0^2),
\]

with $\sigma_0^2$ given by the following limit:

\[
\sigma_0^2 = \lim_{N\to\infty} \frac{1}{N}\sum_{i,j} R(i,j)\, E\big[D_{0,i}(O_i^s)\, D_{0,j}(O_j^s)\big],
\]

for $(i,j)\in\{1,\ldots,N\}^2$, where $R(i,j)=1$ when $F_i\cap F_j\neq\emptyset$ and $R(i,j)=0$ otherwise, with $R(i,i)=1$ for all $i=1,\ldots,N$. We also refer to Theorem 2 in van der Laan [9] for the full list of assumptions required for asymptotic normality of $\sqrt{N}\big(\bar{\Psi}_N(\bar{\mathbf{Q}}_N^*) - \bar{\Psi}(\bar{\mathbf{Q}}_0)\big)$.

6 Inference

The previous section established that the TMLE $\psi_N^*$ is an asymptotically linear estimator under our assumed statistical model $\mathcal{M}$ from Section 2. This result relied in part on assumption A.1 (weak dependence of $W$). In particular, this assumption states that each $W_i$ can be dependent on at most $K$ other observations in $W=(W_1,\ldots,W_N)$, where $K$ is fixed. We also derived the expression for the asymptotic variance of the TMLE, given by $\sigma_0^2$. Our next goal is to conduct inference for the TMLE by estimating $\sigma_0^2$. However, we must note that while our assumed statistical model $\mathcal{M}$ allowed us to construct a consistent and asymptotically linear TMLE, the same model $\mathcal{M}$ will not allow us to estimate its variance; i.e., it results in a non-identifiable $\sigma_0^2$. To clarify, our model assumed no additional knowledge about the joint distribution of $W$ beyond its weak dependence. The weak dependence assumption alone provides no apparent method for obtaining a valid joint likelihood of $W$. Thus, we are unable to perform any type of re-sampling from the joint distribution of $W$. For example, our proposed estimator $Q_{W,N}$ from Section 4 puts mass one on the $N$ observed $W$, which leads to an estimate of $E_{g^*}[Y_i\mid W] - \Psi_{i,0}$ that is constant zero for all $i=1,\ldots,N$. As a result, the NPMLE $Q_{W,N}$ does not allow for valid estimation of the unit-specific components of the efficient influence curve (EIC) that contribute to $\sigma_0^2$. In particular, we cannot estimate the terms $P_0\big\{\big(E_{g^*}[Y_i\mid W] - \Psi_{i,0}\big)\times\big(E_{g^*}[Y_j\mid W] - \Psi_{j,0}\big)\big\}$, for any dependent $i$ and $j$. As a consequence, the asymptotic variance is non-identifiable, and valid estimation of $\sigma_0^2$ will generally require making additional modeling restrictions on $Q_W$, beyond the previously assumed weak dependence of $W$.

However, if our statistical model yields a valid joint likelihood of $W$ and allows us to fit its joint distribution, then the variance $\sigma_0^2$ can be estimated. In this case one could employ a parametric bootstrap that re-samples $W$ from the joint fit of its distribution. To summarize, as long as we can fit the likelihood for $Q_W$ (i.e., assume a model on $W$ that would allow us to do so), the TMLE variance $\sigma_0^2$ can be consistently estimated. Finally, if we are unwilling to make additional modeling restrictions on $Q_W$, then only upper-bound estimates of $\sigma_0^2$ may be obtained. While we propose a possible ad-hoc upper-bound estimate of $\sigma_0^2$ below, we leave its theoretical validation and a more detailed analysis of this topic for future research.

In what follows, we provide some examples of specific modeling restrictions on $W$ (e.g., assuming the $W_i$ are iid) that allow us to consistently estimate the variance of the TMLE. We also describe an ad-hoc approach which may provide an estimate of the upper bound of this variance when no additional modeling restrictions on $Q_W$ are made. Finally, we describe inference for the parameter that conditions on all observed $W$ and thus does not require any modeling assumptions about their joint distribution.

N.B. We note that our described TMLE is based on the EIC from Theorem 3.3. While this EIC was derived under no modeling restrictions on $Q_W$, our actual statistical model $\mathcal{M}$ from Section 2 had to assume that $Q_W$ satisfies the weak dependence assumption. As such, the EIC from Theorem 3.3 does not characterize the asymptotic efficiency bound within model $\mathcal{M}$. That is, by assuming some structure on $W$ we are implicitly changing the corresponding efficiency bound of our statistical model. Thus, our proposed TMLE is no longer an asymptotically efficient estimator in a model that assumes the weak dependence of $W$. However, this restriction was crucial for proving the asymptotic linearity of the TMLE and was thus unavoidable. We are also unaware of any current methods that would allow us to derive the EIC either for a model where only the weak dependence of $W$ is assumed or for some variation of this assumption.

Furthermore, putting even more modeling restrictions on $Q_W$ will further restrict the statistical model, implying that the previous TMLE will become even less efficient. However, these restrictions are only necessary to identify the variance, since the TMLE remains well-behaved in the larger model $\mathcal{M}$. That is, these additional restrictions have to do with model-based variance estimation.

6.1 Inference when assuming a restricted joint model for the baseline covariates

Consider the special case where $W$ are iid. A valid approach to conducting inference is to use the empirical analogue estimator of $\sigma_0^2$ based on the plug-in estimates $D_{N,i}$ of $D_{0,i}$. Evaluating each $E[D_{N,i}(O_i^s)\, D_{N,j}(O_j^s)]$ with respect to its empirical counterpart then results in the following estimator of $\sigma_0^2$:

$$\sigma_N^2 = \frac{1}{N}\sum_{i,j} R(i,j)\, D_{N,i}(O_i^s)\, D_{N,j}(O_j^s),$$

where

$$D_{N,i}(O_i^s) = D_i(\bar{Q}_N^*, \bar{g}_N, \bar{g}_N^*, \psi_N)(O_i^s) = \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s \mid W_i^s)\big(Y_i - \bar{Q}_N^*(A_i^s, W_i^s)\big) + \sum_a \bar{Q}_N^*\big(a_i^s(a, W), w_i^s(W)\big)\, g^*(a \mid W) - \Psi_i(\bar{Q}_N^*, Q_{W,N})$$

and

$$\Psi_i(\bar{Q}_N^*, Q_{W,N}) = \int_w \sum_a \bar{Q}_N^*\big(a_i^s(a,w), w_i^s(w)\big)\, g^*(a \mid w)\, dQ_{W,N}(w).$$

We refer to Appendix B for results demonstrating the consistency of $\sigma_N^2$ when assuming iid $W$. Note that each $\Psi_i(\bar{Q}_N^*, Q_{W,N})$, for $i = 1,\ldots,N$, can be approximated with Monte Carlo integration by sampling with replacement from the $N$ iid baseline covariates and then sampling $N$ exposures according to $g^*$. We also note that for the case of iid baseline covariates one can derive the actual EIC for this specific model, as presented in Section 6.2 of van der Laan [9]. However, the implementation of its corresponding plug-in asymptotic variance estimator is computationally overwhelming, and we hence propose using the above estimator $\sigma_N^2$ of $\sigma_0^2$. Furthermore, we know from the results for iid data that such EIC-based confidence intervals will generally provide correct coverage when $\bar{Q}_N$ and $\bar{g}_N$ are correctly specified, and will be conservative if only $\bar{g}_N$ is specified correctly. Analogously, we expect our estimator $\sigma_N^2$ to be conservative when the model for $\bar{Q}_0$ is misspecified, and we test the validity of this conjecture in a simulation study in Section 7. One can then construct a 95% confidence interval (CI) $\psi_N \pm 1.96\,\sigma_N/\sqrt{N}$, which will provide correct asymptotic coverage.
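
To make the computation concrete, the following is a minimal R sketch of $\sigma_N^2$ (not the tmlenet implementation); the friend sets `friends` and the plug-in EIC values `D_N` are mocked-up placeholders for quantities produced by an actual TMLE fit.

```r
## Minimal sketch of the dependent-data variance estimator sigma^2_N from
## Section 6.1. `friends` and `D_N` are hypothetical placeholders: in
## practice D_N[i] = D_N,i(O_i^s) would come from the fitted TMLE components.
set.seed(1)
N <- 500
friends <- lapply(1:N, function(i) sample((1:N)[-i], size = sample(0:2, 1)))
D_N <- rnorm(N)  # placeholder plug-in EIC values D_N,i(O_i^s)

## R(i,j) = 1 iff the friend sets of i and j (augmented with i and j
## themselves, so that R(i,i) = 1) intersect
R_ij <- function(i, j) {
  length(intersect(c(i, friends[[i]]), c(j, friends[[j]]))) > 0
}

sigma2_N <- 0
for (i in 1:N) {
  for (j in 1:N) {
    if (R_ij(i, j)) sigma2_N <- sigma2_N + D_N[i] * D_N[j]
  }
}
sigma2_N <- sigma2_N / N

## 95% CI for the TMLE: psi_N +/- 1.96 * sqrt(sigma2_N / N)
```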

An alternative approach that avoids assuming iid $W$ is to assume a statistical model that specifies a particular ordering of the observations $i = 1,\ldots,N$. This ordering, combined with the weak dependence of $W$, allows us to assume a particular statistical graph for the dependence among $(W_1,\ldots,W_N)$, thus defining a unique factorization of the joint likelihood of $W$. By putting additional modeling restrictions on these individual likelihood factors we can obtain a single fit of the joint distribution $Q_W$. This in turn allows us to re-sample $W$, so that we can obtain an estimate of each $E[D_{0,i}(O_i^s)\, D_{0,j}(O_j^s)]$ by utilizing the same plug-in approach as in the iid case described above. Note that given a consistent estimate of $E[D_{0,i}(O_i^s)\, D_{0,j}(O_j^s)]$, one can also apply the same techniques from Appendix B to demonstrate that the corresponding empirical analogue estimate $\sigma_N^2$ will also be consistent for $\sigma_0^2$. A sketch of this approach is given below.
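
As an illustration, the following hedged R sketch fits one simple (assumed) common logistic factor for $W_i$ given the covariates of earlier-indexed friends, and then re-samples $W$ from the fitted joint distribution, i.e., a parametric bootstrap; all model forms here are illustrative assumptions, not a prescription.

```r
## Hedged sketch: fit one common-in-i logistic factor q(W_i | W_j : j in F_i)
## under an assumed ordering (friends are restricted to earlier-indexed
## units), then re-sample W* from the fitted joint distribution.
set.seed(2)
N <- 500
friends <- lapply(1:N, function(i) {
  if (i == 1) return(integer(0))
  sample(1:(i - 1), size = min(i - 1, sample(0:2, 1)))
})
W <- rbinom(N, 1, 0.35)
sumWfr <- sapply(1:N, function(i) sum(W[friends[[i]]]))

fitW <- glm(W ~ sumWfr, family = binomial)  # common likelihood factor

## sequentially re-sample W* from the single fit of Q_W
W_star <- integer(N)
for (i in 1:N) {
  s <- sum(W_star[friends[[i]]])
  W_star[i] <- rbinom(1, 1, plogis(coef(fitW)[1] + coef(fitW)[2] * s))
}
## repeating this re-sampling yields parametric-bootstrap draws for
## estimating each E[D_0,i(O_i^s) D_0,j(O_j^s)]
```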

6.2 Ad-hoc upper bound for the variance

An alternative approach that avoids putting additional modeling restrictions on the joint distribution of $W$ is to consider various ad-hoc approaches for obtaining conservative estimates of the variance. As one example, consider a plug-in estimate for $\sigma_0^2$ that is based on the following estimate of $D_{0,i}$:

$$D_{N,i}^c(O_i^s) = \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s \mid W_i^s)\big(Y_i - \bar{Q}_N^*(A_i^s, W_i^s)\big) + \sum_a \bar{Q}_N^*\big(a_i^s(a, W), w_i^s(W)\big)\, g^*(a \mid W) - \bar{\Psi}_N,$$

where

$$\bar{\Psi}_N = \frac{1}{N}\sum_{i=1}^N \Psi_i(\bar{Q}_N^*, Q_{W,N}).$$

Note that the above $i$-specific estimates $D_{N,i}^c$ of $D_{0,i}$ are no longer guaranteed to have mean zero (i.e., they are not necessarily properly centered), resulting in the following conservative estimate of the variance:

$$\frac{1}{N}\sum_{i,j} R(i,j)\, D_{N,i}^c(O_i^s)\, D_{N,j}^c(O_j^s).$$

Our simulations suggest that this estimator can be quite accurate when the number of friends $|F_i|$ is relatively uniform across $i = 1,\ldots,N$, and that it becomes conservative for networks with skewed friend distributions (simulation results not shown). For more extreme cases, such as when the number of friends follows a power law distribution, the above plug-in variance estimator becomes overly conservative to the point of being non-informative. We conjecture that as the variability of the individual $\Psi_{i,0}$ increases, this estimator becomes more and more conservative. We leave a more detailed analysis of this estimator as a topic for future research. A sketch of the computation follows.
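
The following standalone R sketch illustrates the construction of the uncentered estimates $D_{N,i}^c$; all inputs are placeholders for quantities produced by the TMLE fit, and the pairwise sum over $R(i,j)$ is formed exactly as in the sketch of $\sigma_N^2$ above.

```r
## Sketch of the ad-hoc conservative EIC estimates D^c_N,i. All inputs are
## placeholders; the point is only that each unit is centered at the common
## Psi-bar_N rather than at its own Psi_i, so the D^c are not guaranteed to
## be mean-zero unit-by-unit.
set.seed(3)
N         <- 500
wts       <- rexp(N)           # placeholder for g*-bar_N / g-bar_N at (A_i^s, W_i^s)
Y         <- rbinom(N, 1, 0.5) # observed outcomes
Qbar_star <- runif(N)          # targeted predictions Qbar*_N(A_i^s, W_i^s)
gcomp_i   <- runif(N)          # Monte Carlo estimates of sum_a Qbar*_N(.) g*(a|W)

psi_bar_N <- mean(gcomp_i)     # Psi-bar_N
D_c <- wts * (Y - Qbar_star) + gcomp_i - psi_bar_N
## plug D_c into (1/N) sum_{i,j} R(i,j) D_c[i] D_c[j] as before to obtain
## the conservative variance estimate
```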

6.3 Inference when conditioning on the baseline covariates

Admittedly, the assumption of iid baseline covariates ($W$) might be too restrictive for many realistic network data generating scenarios. Similarly, the other suggested approaches also require making specific modeling assumptions that are not necessarily supported by the data. We now propose an alternative approach for conducting inference: giving up on the marginal parameter of interest and performing inference conditionally on the observed baseline covariates $W$. This approach has been previously described in van der Laan [9], Section 8, and it results in a TMLE which is identical to the one we present in this paper. Moreover, this TMLE achieves asymptotic normality with an identifiable asymptotic variance. The asymptotic variance estimator is given by

$$\sigma_N^2(W) = \frac{1}{N}\sum_{i=1}^N D_{N,Y_i}^2(O_i^s),$$

where

$$D_{N,Y_i}(O_i^s) = \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s \mid W_i^s)\big(Y_i - \bar{Q}_N^*(A_i^s, W_i^s)\big).$$

Note that this conditional TMLE doesn't require modeling the distribution of the baseline covariates and thus achieves asymptotic normality under a much weaker set of conditions on $W$ (see van der Laan [9], Section 8). Thus, conducting conditional inference in a model with weakly dependent baseline covariates is a powerful alternative, especially when one is willing to accept the conditional interpretation of the resulting inference.
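
For completeness, a short R sketch of $\sigma_N^2(W)$ with mocked-up inputs:

```r
## Sketch of the conditional-on-W variance estimate from Section 6.3; no
## model for Q_W is needed. Inputs are placeholders for fitted quantities.
set.seed(4)
N         <- 500
wts       <- rexp(N)           # g*-bar_N / g-bar_N at (A_i^s, W_i^s)
Y         <- rbinom(N, 1, 0.5)
Qbar_star <- runif(N)

D_Y      <- wts * (Y - Qbar_star)  # D_N,Y_i(O_i^s)
sigma2_W <- mean(D_Y^2)            # sigma^2_N(W)
```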

7 Simulation study

We performed a network simulation study evaluating the finite sample bias and variance of the TMLE presented in Section 4. We also evaluated the finite sample coverage of the confidence intervals described in Section 6. In addition to the TMLE, we also applied the Inverse Probability of Treatment Weighted (IPTW) estimator and the G-computation substitution estimator, both of which are defined below. All estimation was performed in the R language [45] using the stand-alone package tmlenet [31]. The results are reported for networks consisting of $N = 500$ and $N = 1{,}000$ observations. The estimation was repeated over 10,000 sampled datasets. Due to computing time limitations, each unit in the network was allowed to be connected to at most two other units (at most two friends in $F_i$, for each $i = 1,\ldots,N$). However, since in practice estimation would generally be performed only once on the actual observed data, one should be able to employ the tmlenet R package for estimation in more realistic network datasets where observed units may have much higher degrees of connectivity. The data generating distribution used in these simulations is described in more detail in Appendix A. Briefly, we first sampled $N$ iid baseline covariates, $W = (W_1,\ldots,W_N)$. For each unit $i$, we then generated $F_i$ by first sampling its size, $|F_i|$, uniformly from $\{0,1,2\}$, followed by uniform sampling without replacement of $F_i$ from the population of $N-1$ units (all units except $i$). The network-induced dependence among units was then simulated in the following manner. Each treatment $A_i$ was sampled as a Bernoulli random variable, with its probability of success depending on the baseline covariate values of units in $F_i \cup \{i\}$. Similarly, each outcome $Y_i$ was sampled as a Bernoulli random variable, with its probability of success depending on the baseline covariate values and treatments of units in $F_i \cup \{i\}$. The probability of success for each $A_i$ was defined as a logit-linear function of the 2-dimensional summary $(W_i, \sum_{j\in F_i} W_j)$, given as:

$$P_0(A_i = 1 \mid W_i^s) = \mathrm{expit}\Big(\alpha_0 + \alpha_1 W_i + \alpha_2 \sum_{j\in F_i} W_j\Big).$$

Similarly, the probability of success for each $Y_i$ was defined as a logit-linear function of the 4-dimensional summary $(W_i, \sum_{j\in F_i} W_j, A_i, \sum_{j\in F_i} A_j)$, given as:

$$\bar{Q}_0(A_i^s, W_i^s) = \mathrm{expit}\Big(\beta_0 + \beta_1 A_i + \beta_2 \sum_{j\in F_i} A_j + \beta_3 W_i + \beta_4 \sum_{j\in F_i} W_j\Big).$$

In contrast, the estimation of the common-in-$i$ $\bar{Q}_0$ and the mixture density $\bar{g}_0$ was based on non-parametrically defined summary measures, i.e., we let $W_i^s = (|F_i|, W_i, W_j : j\in F_i)$ and $A_i^s = (A_i, A_j : j\in F_i)$, such that, for all $i$, $|W_i^s| = 4$ and $|A_i^s| = 3$. Whenever unit $i$ had fewer than 2 friends ($|F_i| < 2$), the remainders of $W_i^s$ and $A_i^s$ were filled with zeros to ensure the same summary measure dimensionality across $i$. The common-in-$i$ model $\bar{Q}_N$ for $\bar{Q}_0$ was then estimated by fitting a logistic regression model to a pooled sample of $N$ units, using covariates $(A_i^s, W_i^s)$. The estimation of the conditional mixture density $\bar{g}_0(a^s \mid w^s)$ proceeded as follows. First, for any $(a^s, w^s) \in (\bar{\mathcal{A}}^s \times \bar{\mathcal{W}}^s)$, such that $a^s \in \{0,1\}^3$ and $a^s = (a^s(1), a^s(2), a^s(3))$, we factorized $P(\bar{A}^s = a^s \mid \bar{W}^s = w^s)$ as:

$$P(\bar{A}^s = a^s \mid \bar{W}^s = w^s) = P\big(\bar{A}^s(1) = a^s(1), \bar{A}^s(2) = a^s(2), \bar{A}^s(3) = a^s(3) \mid \bar{W}^s = w^s\big) = P\big(\bar{A}^s(1) = a^s(1) \mid \bar{W}^s = w^s\big) \times P\big(\bar{A}^s(2) = a^s(2) \mid \bar{A}^s(1) = a^s(1), \bar{W}^s = w^s\big) \times P\big(\bar{A}^s(3) = a^s(3) \mid \bar{A}^s(1) = a^s(1), \bar{A}^s(2) = a^s(2), \bar{W}^s = w^s\big).$$

We then fit three separate logistic regression models, each estimating one of the factors in the above factorization, as if we were fitting common-in-$i$ models using an iid sample of $N$ observations $(A_i^s, W_i^s)$. That is, the first factor $P(\bar{A}^s(1) = 1 \mid \bar{W}^s)$ was fit as if we were estimating a common-in-$i$ model $P(A_i^s(1) = 1 \mid W_i^s)$ for $N$ iid observations $(A_i, W_i^s)_{i=1}^N$ (note that $A_i^s(1) = A_i$). Similarly, the second factor was fit as if we were estimating a common-in-$i$ model for $P(A_i^s(2) = 1 \mid A_i, W_i^s)$, and so on. The resulting three fits were then combined in order to obtain the estimate $\bar{g}_N(a^s \mid w^s)$ of $\bar{g}_0(a^s \mid w^s)$. We estimated $\bar{g}_0^*$ in a similar way, except that we first sampled a large dataset of observations $(A_i^*, W_i)$ from $g^*$ and $Q_{W,N}$, for $i = 1,\ldots,mN$, then constructed the summary data $W_i^s = (W_i, W_j : j\in F_i)$, $A_i^{s*} = (A_i^*, A_j^* : j\in F_i)$, and finally estimated $\bar{g}_N^*$ by factorizing $P(\bar{A}^{s*} = a^s \mid \bar{W}^s = w^s)$ into three factors and fitting three logistic regressions to the pooled sample of $mN$ observations.
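
The following R sketch illustrates this factorized fit for a binary 3-dimensional summary $A^s$; the simulated data frame and the model formulas are illustrative assumptions rather than the exact specification used in our simulation.

```r
## Sketch of the factorized fit of g-bar_0(a^s | w^s) for binary 3-dimensional
## A^s, treating the N summaries as if iid. `dat` is a hypothetical data.frame
## with columns A1, A2, A3 (entries of A_i^s) and W1..W4 (entries of W_i^s).
set.seed(5)
N <- 1000
dat <- data.frame(W1 = rbinom(N, 1, .35), W2 = rbinom(N, 1, .35),
                  W3 = rbinom(N, 1, .35), W4 = sample(0:2, N, TRUE))
dat$A1 <- rbinom(N, 1, plogis(-1.2 + 1.5 * dat$W1))
dat$A2 <- rbinom(N, 1, plogis(-1.2 + 0.6 * dat$W2 + 0.3 * dat$A1))
dat$A3 <- rbinom(N, 1, plogis(-1.2 + 0.6 * dat$W3 + 0.3 * dat$A2))

## one logistic regression per factor of the factorization
fit1 <- glm(A1 ~ W1 + W2 + W3 + W4,           family = binomial, data = dat)
fit2 <- glm(A2 ~ A1 + W1 + W2 + W3 + W4,      family = binomial, data = dat)
fit3 <- glm(A3 ~ A1 + A2 + W1 + W2 + W3 + W4, family = binomial, data = dat)

## combine the three factors into g-bar_N(a^s | w^s) at the observed rows
p1 <- ifelse(dat$A1 == 1, predict(fit1, type = "response"),
                      1 - predict(fit1, type = "response"))
p2 <- ifelse(dat$A2 == 1, predict(fit2, type = "response"),
                      1 - predict(fit2, type = "response"))
p3 <- ifelse(dat$A3 == 1, predict(fit3, type = "response"),
                      1 - predict(fit3, type = "response"))
gbar_N <- p1 * p2 * p3
```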

The stochastic intervention $g^*(A \mid W)$ was defined as a common-in-$i$ intervention $g_p^*$ on each $A_i$, which assigned $A_i = 1$ with some constant probability $p$, i.e., $P(A_i^* = 1) = p$. Our target parameter was then defined as the sample-average of the $N$ outcomes under $g_p^*$, where we use $\psi_0(g_p^*)$ to denote the parameter's true value. In our simulations we then estimated a discrete dose-response curve $\{\psi_0(g_p^*)\}$ for $p \in \{0, 0.1, \ldots, 0.9, 1\}$. We also truncated the observation-specific weights $\bar{g}_N^*(a^s \mid w^s)/\bar{g}_N(a^s \mid w^s)$ when their values exceeded $10^5$. Finally, the confidence intervals for the TMLE were constructed based on the variance estimator $\sigma_N^2$ from Section 6. Lastly, we compared $\sigma_N^2$ with an alternative asymptotic variance estimator $\tilde{\sigma}_N^2$ presented in van der Laan [9], which requires the consistency of $\bar{Q}_N$ and $\bar{g}_N$ and is given by:

$$\tilde{\sigma}_N^2 = \frac{1}{N}\sum_{i=1}^N \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s, W_i^s)\big(Y_i - \bar{Q}_N^*(A_i^s, W_i^s)\big)^2 + \frac{1}{N}\sum_{i_1, i_2 = 1}^N f_{W,i_1} f_{W,i_2}\, I(F_{i_1} \cap F_{i_2} \neq \emptyset),$$

for

$$f_{W,i} = \sum_a \bar{Q}_N^*\big(a_i^s(a), w_i^s(W)\big)\, g_i^*(a \mid W) - \int_w \sum_a \bar{Q}_N^*\big(a_i^s(a), w_i^s(w)\big)\, g_i^*(a \mid w)\, dQ_{W,N}(w).$$

7.1 IPTW (Horvitz-Thompson) estimator

The IPTW estimator is based on the TMLE weights $\bar{g}_0^*/\bar{g}_0$ from Section 4.4 and is defined as the weighted average of the observed outcomes $Y_i$, weighted by $\bar{g}_N^*/\bar{g}_N$:

$$\psi_{IPTW,N} = \frac{1}{N}\sum_{i=1}^N Y_i\, \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s \mid W_i^s),$$

where $\bar{g}_N$ is an estimator of the conditional mixture $\bar{g}_0(G_0, Q_{W,0})$ defined in Section 3.1 and $\bar{g}_N^*$ is an estimator of $\bar{g}_0^*(G^*, Q_{W,0})$, also defined in Section 3.1. The estimators $\bar{g}_N$ and $\bar{g}_N^*$ are described in general in Section 4 and are also described above for the case of non-parametrically defined summary measures. We also conducted inference for $\psi_{IPTW,N}$ by relying on the same ideas described in Section 6. That is, we used the iid-data influence curve $IC(\bar{P}_0)$ of $\psi_{IPTW,N}$ in a model that assumes $\bar{g}_0^*$ and $\bar{g}_0$ are known, characterizing the asymptotic variance of $\psi_{IPTW,N}$ by the following limit:

$$\sigma_{IPTW,0}^2 = \lim_{N\to\infty} \frac{1}{N}\sum_{i,j} R(i,j)\, E_{P_0}\big[IC(\bar{P}_0)(O_i)\, IC(\bar{P}_0)(O_j)\big],$$

with $R(i,j) = 1$ when $F_i \cap F_j \neq \emptyset$, and $R(i,j) = 0$ otherwise. Replacing the unknown components of $\bar{P}_0$ in $IC(\bar{P}_0)$ with the corresponding estimates, we then obtained the following estimator $\sigma_{IPTW,N}^2$ of $\sigma_{IPTW,0}^2$:

$$\sigma_{IPTW,N}^2 = \frac{1}{N}\sum_{i,j} R(i,j)\Big(Y_i\, \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s \mid W_i^s) - \psi_{IPTW,i,N}\Big)\Big(Y_j\, \frac{\bar{g}_N^*}{\bar{g}_N}(A_j^s \mid W_j^s) - \psi_{IPTW,j,N}\Big),$$

for $IC(\bar{P}_0)(O_i^s) = Y_i\, (\bar{g}_0^*/\bar{g}_0)(A_i^s \mid W_i^s) - \psi_{0,i}$. We then constructed 95% CIs as $\psi_{IPTW,N} \pm 1.96\sqrt{\sigma_{IPTW,N}^2/N}$.
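
A minimal R sketch of $\psi_{IPTW,N}$ and its dependent-data variance estimate follows; the weights and the centering terms are mocked-up placeholders for fitted quantities.

```r
## Sketch of the IPTW estimator and its dependent-data variance. `wts` stands
## in for the fitted ratio g*-bar_N / g-bar_N (A_i^s | W_i^s); `psi_i` stands
## in for the unit-specific centering terms psi_IPTW,i,N.
set.seed(6)
N       <- 500
friends <- lapply(1:N, function(i) sample((1:N)[-i], size = sample(0:2, 1)))
Y       <- rbinom(N, 1, 0.5)
wts     <- rexp(N)
psi_i   <- rep(mean(Y * wts), N)  # illustrative choice of the centering terms

psi_IPTW <- mean(Y * wts)

IC <- Y * wts - psi_i             # plug-in influence-curve values
R_ij <- function(i, j)
  length(intersect(c(i, friends[[i]]), c(j, friends[[j]]))) > 0
sigma2_IPTW <- 0
for (i in 1:N) for (j in 1:N)
  if (R_ij(i, j)) sigma2_IPTW <- sigma2_IPTW + IC[i] * IC[j]
sigma2_IPTW <- sigma2_IPTW / N

ci <- psi_IPTW + c(-1.96, 1.96) * sqrt(sigma2_IPTW / N)  # 95% CI
```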

7.2 G-computation estimator

The G-computation substitution estimator $\psi_{GCOMP,N} = \Psi(\bar{Q}_N, Q_{W,N})$ for $\psi_0$ is based on the un-targeted model $\bar{Q}_N$ for the common-in-$i$ conditional expectation of $Y_i$, as a function of the summary data $(A_i^s, W_i^s)$. Given the stochastic intervention $g^*$, the G-computation estimator is obtained as:

$$\psi_{GCOMP,N} = \frac{1}{N}\sum_{i=1}^N \sum_a \bar{Q}_N\big(a_i^s(a, W), W_i^s\big)\, g^*(a \mid W),$$

where $Q_{W,N}$ is the NPMLE that puts mass 1 on the observed vector $W$. Evaluation of this estimator is equivalent to the Monte Carlo integration procedure described for the TMLE $\psi_N^*$ in Section 4.4, except that we use the initial estimator $\bar{Q}_N$ of $\bar{Q}_0$ instead of its targeted version $\bar{Q}_N^*$. The asymptotic variance of $\psi_{GCOMP,N}$ was not estimated and no CIs were constructed.
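
A hedged R sketch of this Monte Carlo evaluation under $g_p^*$ is given below; the fitted outcome model and its coefficients are illustrative assumptions.

```r
## Sketch of the G-computation evaluation under g*_p (each A_i set to 1 with
## probability p). `Qbar_fit` mimics an assumed fitted logistic model for the
## common-in-i outcome regression on (A_i, sum of friends' A, W_i, sum of
## friends' W); beta_hat is a hypothetical fitted coefficient vector.
set.seed(7)
N <- 500; p <- 0.3; n_mc <- 200
W <- rbinom(N, 1, 0.35)
friends <- lapply(1:N, function(i) sample((1:N)[-i], size = sample(0:2, 1)))
sumW <- sapply(1:N, function(i) sum(W[friends[[i]]]))

beta_hat <- c(-2.5, 0.5, 1.5, 1.5, 1.5)  # hypothetical fitted coefficients
Qbar_fit <- function(A, sumA) plogis(beta_hat[1] + beta_hat[2] * A +
                                     beta_hat[3] * sumA +
                                     beta_hat[4] * W + beta_hat[5] * sumW)

psi_mc <- replicate(n_mc, {
  A_star <- rbinom(N, 1, p)                                  # draw A* from g*_p
  sumA   <- sapply(1:N, function(i) sum(A_star[friends[[i]]]))
  mean(Qbar_fit(A_star, sumA))                               # average prediction
})
psi_GCOMP <- mean(psi_mc)  # Monte Carlo integral over a, averaged over units
```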

7.3 Results

Our simulations compared three different model specification scenarios for $\bar{Q}_0$ and $\bar{g}_0^*/\bar{g}_0$: “(a) $Q$ and $\bar{g}^*/\bar{g}$ correct” indicates that the models for both estimators, of $\bar{Q}_0$ and of $\bar{g}_0^*/\bar{g}_0$, were correctly specified; “(b) only $\bar{g}^*/\bar{g}$ correct” indicates that the model for the estimator of $\bar{Q}_0$ was misspecified, while the model for the estimator of $\bar{g}_0^*/\bar{g}_0$ was specified correctly; finally, “(c) only $Q$ correct” indicates that the model for the estimator of $\bar{Q}_0$ was specified correctly, while the model for the estimator of $\bar{g}_0^*/\bar{g}_0$ was misspecified. Figure 1 and Figure 2 present the simulation results for finite sample bias and empirical variance. Bias was plotted as the estimate minus the true parameter value ($\psi_N(g_p^*) - \psi_0(g_p^*)$), with the different stochastic interventions $g_p^*$ presented on the x-axis as “% Treated”. Overall, our simulation results suggest that the TMLE performs well in finite samples with dependent observations. We were able to demonstrate the double robustness property of the TMLE, which was unbiased in each of the three considered scenarios. Our results also indicate that the other two estimators are unbiased in scenario (a), but can perform poorly in the alternative scenarios (b) and (c). Overall, we found the IPTW estimator to be the most variable and also the most susceptible to near-positivity violations.

Figure 1: Empirical distributions for TMLE, IPTW and G-COMP, centered at the truth and estimated over 10,000 simulated data sets of size 500 (top row) and size 1,000 (bottom row) for scenario (a) - correctly specified $Q$ and $\bar{g}^*/\bar{g}$. Colored ribbons mark the 2.5th to 97.5th percentile ranges of the estimates. The centered IPTW estimates outside the range of $\pm 1$ were removed.

Figure 2: Empirical distributions for TMLE, IPTW and G-COMP, centered at the truth and estimated over 10,000 simulated data sets of size 500 (top row) and size 1,000 (bottom row) for scenarios: (b) - only $\bar{g}^*/\bar{g}$ correctly specified; and (c) - only $Q$ correctly specified. Colored ribbons mark the 2.5th to 97.5th percentile ranges of the estimates. The centered IPTW estimates outside the range of $\pm 1$ were removed.

The coverage results are presented in Figure 3, Figure 4 and Figure 5, where we plotted the 95% CI coverage for the various asymptotic variance estimators, along with the mean CI length. We first compared the TMLE coverage based on our proposed variance estimator $\sigma_N^2$ from Section 6 to the TMLE coverage based on the iid variance estimate $\sigma_{IID,N}^2$ that made no adjustments for correlated observations, i.e., $\sigma_{IID,N}^2$ is the EIC-based variance estimator that assumes the data are iid. Our results in Figure 3 indicate that $\sigma_{IID,N}^2$ tended to underestimate the variance of the TMLE, resulting in CIs that were too narrow for both sample sizes. We expect the coverage issues for $\sigma_{IID,N}^2$ to become even more pronounced as the between-unit dependence increases, as may be the case in more realistic network scenarios with units having much higher degrees of connectivity.

Figure 3: Comparing the 95% CI coverage (top row) and length (bottom row) for the TMLE using two alternative variance estimates: $\sigma_N^2$ - the variance estimate that correctly adjusts for the dependence between observations; $\sigma_{IID,N}^2$ - the iid variance estimate that ignores the dependence between observations. The estimates are obtained from 10,000 simulated data sets of size 500 (‘Sim N500’) and size 1,000 (‘Sim N1000’). Scenarios: (a) - correctly specified $Q$ and $\bar{g}^*/\bar{g}$; (b) - only $\bar{g}^*/\bar{g}$ correctly specified; and (c) - only $Q$ correctly specified.

In addition, the CIs based on our dependent-data variance estimate $\sigma_N^2$ became conservative when $\bar{Q}_N$ was misspecified. The latter result was expected based on the predictions from the semi-parametric efficiency theory for iid data. In Figure 4 we compared the coverage of the IPTW estimator with that of the TMLE. Finally, we compared the TMLE coverage based on our dependent-data variance estimate $\sigma_N^2$ to that based on the alternative asymptotic variance estimate $\tilde{\sigma}_N^2$ from van der Laan [9]. The simulation results of this comparison in Figure 5 show nearly identical coverage under a correctly specified $\bar{Q}_N$. However, when $\bar{Q}_N$ was misspecified, the two estimators behaved differently, with $\tilde{\sigma}_N^2$ showing slightly lower coverage over some sections of the estimated dose-response curve. We also note that near-positivity violations will generally increase the variability of our estimators. In particular, one would expect near-positivity violations to be more pronounced closer to the tail-ends of the discrete dose-response curve $\{\psi_0(g_p^*)\}$, namely, for values of $p$ close to 0 or 1. Indeed, this is demonstrated in our simulations, where we noted increasing variability of all estimators closer to the edges of the estimated dose-response curve, which also contributed to a small drop in coverage.

Figure 4: Comparing the 95% CI coverage (top row) and length (bottom row) for TMLE (variance estimate $\sigma_N^2$) and IPTW (variance estimate $\sigma_{IPTW,N}^2$). The estimates are obtained from 10,000 simulated data sets of size 500 (‘Sim N500’) and size 1,000 (‘Sim N1000’). Scenarios: (a) - correctly specified $Q$ and $\bar{g}^*/\bar{g}$; (b) - only $\bar{g}^*/\bar{g}$ correctly specified; and (c) - only $Q$ correctly specified.

Figure 5: Comparing the 95% CI coverage (top row) and length (bottom row) of the two TMLE variance estimates, $\sigma_N^2$ and $\tilde{\sigma}_N^2$, both of which adjust for the dependence between units, but where the latter variance estimate assumes a correctly specified $\bar{Q}_N$. The estimates are obtained from 10,000 simulated data sets of size 500 (‘Sim N500’) and size 1,000 (‘Sim N1000’). Scenarios: (a) - correctly specified $Q$ and $\bar{g}^*/\bar{g}$; (b) - only $\bar{g}^*/\bar{g}$ correctly specified; and (c) - only $Q$ correctly specified.

8 Intervening on groups of friends and intervening on the network

8.1 Estimation for an arbitrary collection of stochastic interventions

We now show that the estimation framework presented thus far can be easily adapted to the estimation of the sample-average treatment effects for an arbitrary collection of $i$-specific stochastic interventions $g_{F_i}^*$, where each $g_{F_i}^*$ may intervene on the treatments of $i$'s friends in addition to intervening on the treatment of $i$ itself. A collection of such interventions involving all units, $\{g_{F_i}^* : i = 1,\ldots,N\}$, generally cannot be described by a single intervention $g^*$ on $A = (A_1,\ldots,A_N)$ given $W = (W_1,\ldots,W_N)$. For example, consider the problem of estimating the direct average treatment effect in a network of $N$ connected individuals, where we define each $g_{F_i}^*$ by intervening on the unit-specific treatment $A_i$, setting it to a constant (zero or one), while leaving the distribution of $A_j$ for $i$'s friends $j \in F_i$ intact. That is, we assume the intervention for each $A_j$ is stochastically sampled from its observed distribution $G_0(A_j \mid W_j^s)$ or is instead deterministically set to its observed value $a_j$. This type of direct effect parameter has been previously explored in spillover studies, for example, in the study of the effects of deworming among rural Kenyan primary schools by Miguel and Kremer [46] and in its replication study by Davey et al. [47]. We are interested in estimation of the following target parameter,

$$\psi_0 = \Psi(P_0^N) = \frac{1}{N}\sum_{i=1}^N E_{q_0, g_{F_i}^*}\big[Y_i^{g_{F_i}^*}\big],$$

defined as the average of the expectations of $Y_i^{g_{F_i}^*}$, where each outcome $Y_i^{g_{F_i}^*}$ is generated under the $i$-specific post-intervention distribution that replaces the observed treatment allocation for $i$ and $j \in F_i$ with $g_{F_i}^*$, as just described. Clearly, the collection of such $i$-specific interventions across all $N$ units is incompatible with a single joint stochastic intervention $g^*$ on $A$ given $W$, since the intervention $g_{F_j}^*$ for $j \neq i$ and $j \in F_i$ requires setting $A_j$ to a constant one or zero, while the intervention $g_{F_i}^*$ requires that $A_j$ is randomly sampled from $g_0$ or is set to $a_j$. Nonetheless, this target parameter $\psi_0$ remains well-defined with respect to the collection $\{g_{F_i}^* : i = 1,\ldots,N\}$, and we may apply the same arguments as in Section 3.1, noting that $\psi_0$ can be equivalently written as:

$$\psi_0 = \frac{1}{N}\sum_{i=1}^N \int_{a,w} \bar{Q}_0\big(a_i^s(a,w), w_i^s(w)\big)\, g_{F_i}^*(a \mid w)\, q_{W,0}(w)\, d\mu(a,w) = \frac{1}{N}\sum_{i=1}^N \int_{a^s,w^s} \bar{Q}_0(a^s, w^s)\, h_{i,0}^*(a^s, w^s)\, d\mu(a^s, w^s) = \int_{a^s,w^s} \bar{Q}_0(a^s, w^s)\, \bar{h}_0^*(a^s, w^s)\, d\mu(a^s, w^s),$$

where $h_{i,0}^*(g_{F_i}^*, q_{W,0})$ is the density determined by $g_{F_i}^*(a \mid w)$, $q_{W,0}$ and the $i$-specific summary measures $a_i^s(a,w), w_i^s(w)$, and $\bar{h}_0^*$ is the mixture of the $h_{i,0}^*$, defined as $\bar{h}_0^*(a^s, w^s) := (1/N)\sum_{i=1}^N h_{i,0}^*(a^s, w^s)$. We also note that when $(W, A)$ are discrete, one obtains:

$$h_{i,0}^*(a^s, w^s) = \int_{a,w} I\big(a_i^s(a,w) = a^s,\, w_i^s(w) = w^s\big)\, g_{F_i}^*(a \mid w)\, q_{W,0}(w)\, d\mu_{a,w}(a,w).$$

Thus, this new target parameter $\Psi(P_0^N)$ can still be represented by an equivalent mixture mapping $\bar{\Psi}(\bar{P}_0)$ from Theorem 3.1 and hence the efficient influence curve of this new $\Psi(P_0^N)$ is given by the same $\bar{D}$ from Theorem 3.3. In the above, we also assumed that such $i$-specific densities $h_{i,0}^*(a^s, w^s)$ are well-defined with respect to some common dominating measure $\mu_{a,w}$, with $\bar{h}_0^*$ being factorized as $\bar{h}_0^*(a^s, w^s) = \bar{g}_0^*(a^s \mid w^s)\, \bar{q}_{W,0}(w^s)$, which provides the definition of $\bar{g}_0^*$. Hence, the TMLE framework from Section 4 can be directly applied to the estimation of these types of parameters, with the only modification that we now require each $i$-specific summary $A_i^{s*}$ to be sampled conditionally from its $i$-specific intervention $g_{F_i}^*$ on $A$ given $W$. In particular, we use Lemma 4.1 to obtain a reasonable approximation for $\bar{g}_0^*$ by modifying its direct estimator from Section 4.3 in the following manner. First, obtain a large simulated dataset of $i$-specific summaries $(A_i^{s*}, W_i^s)$, for $i = 1,\ldots,N$, where each $A_i^{s*}$ is derived by sampling $(A^{g_{F_i}^*}, W)$ from $(g_{F_i}^*, Q_{W,N})$ and then applying the summary measure mapping $a_i^s(A^{g_{F_i}^*}, W)$. Next, fit an estimator $\bar{g}_N^*$ of $\bar{g}_0^*$ by treating the simulated sample $(A_i^{s*}, W_i^s)$ as iid, applying the same maximum likelihood-based approach as earlier. Similarly, the Monte Carlo evaluation step for the TMLE $\psi_N^*$ from Section 4.4 is modified to take these $i$-specific interventions $g_{F_i}^*$ into account. That is, instead of sampling $(A^*, W)$ from $g^*$, each Monte Carlo iteration $j$ now consists of sampling $A_j^{g_{F_i}^*} = (A_{j,1}^{g_{F_i}^*},\ldots,A_{j,N}^{g_{F_i}^*})$ from $g_{F_i}^*(a \mid W)$, for each $i = 1,\ldots,N$, conditional on the observed $W$, with the resulting dataset of $N$ summary observations $(A_{j,i}^{s*}, W_i^s)$ constructed by applying the $i$-specific mappings $A_{j,i}^{s*} := a_i^s(A_j^{g_{F_i}^*}, W)$, for $i = 1,\ldots,N$.
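
To illustrate the modified Monte Carlo step, the following hedged R sketch draws one set of $i$-specific summaries under the direct-effect collection $\{g_{F_i}^*\}$ that sets each $A_i$ to 1 while drawing the friends' treatments from an estimate of the observed mechanism; the fitted mechanism `g0_hat` and the summary mapping are illustrative assumptions.

```r
## One Monte Carlo draw of the i-specific summaries (A_i^s)* under the
## collection {g*_Fi}: intervene on unit i's own treatment only, leaving
## friends' treatments drawn from (an estimate of) the observed mechanism.
## `g0_hat` and the summary mapping are hypothetical.
set.seed(8)
N <- 300
W <- rbinom(N, 1, 0.35)
friends <- lapply(1:N, function(i) sample((1:N)[-i], size = sample(0:2, 1)))
g0_hat <- function(j) plogis(-1.2 + 1.5 * W[j] + 0.6 * sum(W[friends[[j]]]))

draw_summary_i <- function(i) {
  A_star <- rbinom(N, 1, sapply(1:N, g0_hat))  # friends keep observed mechanism
  A_star[i] <- 1                               # g*_Fi intervenes on unit i only
  c(A = A_star[i], sumA_fr = sum(A_star[friends[[i]]]))
}
sims <- t(sapply(1:N, draw_summary_i))  # N rows: one i-specific summary draw each
## repeating this step many times yields the pooled sample used to fit g*-bar_N
```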

8.2 Estimation for interventions on the network structure

Overview. Observed network data

We note that our framework can also be applied to the estimation of the effect of intervening on the network structure. Suppose that we observe at baseline some initial network $F^0 = (F_1^0,\ldots,F_N^0)$ for the community of $N$ connected units and that we treat $F^0$ as fixed. As before, we assume that the maximum number of friends for each unit (i.e., the dimensionality of each $F_i^0$) is bounded by some constant $K$ that doesn't depend on $N$. We then collect data on the $N$ baseline covariates $W = (W_1,\ldots,W_N)$, followed by a random draw of another network profile $F = (F_1,\ldots,F_N)$ and the outcomes $Y = (Y_1,\ldots,Y_N)$, where each $F_i$ is now based on the initial network $F_i^0$. Thus, the observed data on the $N$ connected units is given by $O = (F^0, W, F, Y)$ and we assume the exposure for each unit $i$ is given by the set $F_i$. Since we are interested in interventions which modify the observed network profile $F$ (e.g., adding or removing some friends in each $F_i$), it is natural to allow $F_i$ to be random, but driven by $i$'s own covariates and the covariates of $i$'s friends from $F_i^0$. Thus, we assume that the $i$-specific conditional distribution $G_{F_i,0}$ of $F_i$ given $(F^0, W)$ only depends on the initial network offset $F_i^0$ and the baseline covariates $(W_i, W_j : j \in F_i^0)$. Furthermore, we assume its conditional density $g_{F_i,0}(F_i \mid F^0, W)$ is well-defined. We also assume that $Q_{Y,0}$, the common-in-$i$ conditional distribution of $Y_i$ given $W$, depends only on $(W_i, W_j : j \in F_i)$, i.e., on units $j$ from the newly drawn friend set $F_i$.

Network interventions and target parameter

We follow the framework outlined in Section 2 and define the intervention on the network profile $F$ as a user-supplied density $g^*(F \mid W)$ that replaces the observed conditional density $g_0(F \mid W)$, where we also assume that the initial network offset $F^0$ is included in $W$. Alternatively, we could also follow Section 8.1 and define our intervention as a collection of user-supplied $i$-specific densities $\{g_{F_i}^* : i = 1,\ldots,N\}$, where each $g_{F_i}^*(F_i^* \mid W)$ replaces the true $i$-specific density $g_{F_i,0}(F_i \mid W)$. As noted in Section 8.1, a collection of such $i$-specific stochastic interventions generally cannot be described by a single multivariate intervention $g^*$ on $F = (F_1,\ldots,F_N)$ given $W = (W_1,\ldots,W_N)$ and may result in an incompatible network intervention. Nonetheless, these are still well-defined interventions, and we note that the target parameter indexed by such $i$-specific interventions $g_{F_i}^*(F_i^* \mid W)$ is still well-defined. We also note that the types of interventions we consider will generally use the current network sets $F_i$ as inputs to produce the intervened network sets $F_i^*$. Therefore, we are concerned here with stochastic interventions which depend on the current sets $F_i$. Even if the intervention itself is a deterministic function of $F_i$ (e.g., always remove the first friend), it is still stochastic by the nature of its dependence on $F_i$.

As a motivating example, consider an intervention defined for $i = 1,\ldots,N$ that removes certain friends $j \in F_i$ of each unit $i$ when $\delta(W_j) \geq r$, where $r$ is some pre-defined cutoff value and $\delta(W_j)$ is some user-defined function mapping $W_j$ into $\mathbb{R}$ (e.g., $\delta(W_j)$ characterizes the baseline “risk-profile” of $j$). This intervention defines the post-intervention distribution that replaces $Q_{Y,0}(\cdot \mid W_j : j \in F_i)$, i.e., the observed conditional distribution of $Y_i$ given $W$, with a new distribution $Q_{Y,0}(\cdot \mid W_j : j \in F_i^*)$, where $F_i^*$ is the intervened friend set such that unit $k \in F_i^*$ only if $k \in F_i$ and $\delta(W_k) < r$. We now define our statistical parameter as the sample-average of the expected outcomes under the $i$-specific post-intervention distributions that replace each observed network allocation $g_{F_i,0}$ with such intervention densities $g_{F_i}^*$:

$$\Psi(P_0^N) = \frac{1}{N}\sum_{i=1}^N E_{q_0, g_{F_i}^*}\big[Y_i^{g_{F_i}^*}\big].$$
Statistical model and dimension reduction assumptions

We have described how the interventions on the network sets $F_i$ fit within our previously outlined framework from Section 2, where $F_i$ defines the exposure for each unit $i$. Our next step is concerned with dimensionality reduction, where we define the appropriate summary measures $(W_i^s, A_i^s)$, for $i = 1,\ldots,N$, and then model the conditional distribution of $Y_i$ as a fixed-dimensional function of $(W_i^s, A_i^s)$. In order to be able to intervene on $F_i$ we have to assume that the conditional distribution $Q_{Y,0}$ depends on all of $W$ through the $N$-dimensional set $(W_j I(j \in F_i) : j = 1,\ldots,N)$. That is, we have so far assumed that $Q_{Y,0}$ is a function of $(W_i, W_j : j \in F_i)$. Replacing $F_i$ with the intervened friend set $F_i^*$ implies that $Q_{Y,0}$ becomes a function of $(W_i, W_j : j \in F_i^*)$. In that sense, $Y_i$ depends on all of $W$, except that for most $j$ we have $I(j \in F_i) = 0$ and hence $W_j$ makes no real contribution to $Q_{Y,0}$ unless $j \in F_i$. In summary, we change the dependence of $Y_i$ on a particular $W_j$ in $W$ by changing the composition of the set $F_i$, and we only intervene on the way $Y_i$ may depend on a particular $W_j$ through the indicators $I(j \in F_i)$, for $j = 1,\ldots,N$. This also implies that the conditional distribution of the outcome $Y_i$ is now truly a function of the entire $N$-dimensional set $W = (W_1,\ldots,W_N)$, making estimation of $Q_{Y,0}$ particularly challenging. Thus, we first need to make additional simplifying assumptions which allow us to estimate $Q_{Y,0}$.

For convenience, we now assume that $i \in F_i^0$, $i \in F_i$ and $i \in F_i^*$, for all $i = 1,\ldots,N$ (i.e., $i$ is always connected to itself). Assume that $i$'s network draw $F_i$ is always a finite-dimensional augmentation of the initial network offset $F_i^0$. That is, the set of possible realizations of $F_i$ is restricted to be within some close proximity of $F_i^0$. We note that the network profile $F$ can be viewed as an $N \times N$ adjacency matrix of indicators, and our assumptions imply that rather than allowing $F$ to be any possible realization of an $N \times N$ adjacency matrix, we restrict $F$ to finite-dimensional perturbations of the matrix $F^0$, allowing $F$ to change only locally as a function of $F^0$ and $W$. For example, rather than allowing $F_i$ to draw any new friend $j \in \{1,\ldots,N\} \setminus \{i\}$, one may assume that $F_i$ is restricted to drawing a new friend $j$ only when $j$ is in the set $F_i^{0+} := \cup_{j \in F_i^0 \cup \{i\}} F_j^0$. In this case we are assuming that in the new network realization $F_i$ can only add friend $j$ if $i$ and $j$ had at least one friend in common at baseline (i.e., there was at most a 2nd degree of connectivity between $i$ and $j$). Such an assumption implies that $F_i$ is no longer of dimension $N$, but rather of a fixed dimension that depends only on $K$. The set of possible network interventions on $F_i$, namely, each intervened $F_i^*$, is now similarly restricted to realizations within the same finite-dimensional set $F_i^{0+}$.

Having defined $F_i$ as a realization within the finite-dimensional set $F_i^{0+}$, we define the unit's exposure $A_i$, for $i = 1,\ldots,N$, as the finite-dimensional set of indicators $(I(j \in F_i) : j \in F_i^{0+})$, namely, each $A_i$ is a binary vector of the common-in-$i$ dimension $K^+$, where each non-zero entry in $A_i$ denotes which units in $F_i^{0+}$ are actual friends of $i$. The $i$-specific network intervention $A_i^*$ can then be defined by directly intervening on the indicators $I(j \in F_i)$ in $A_i$. We define the baseline summary measure $W_i^s := w_i^s(W)$ as the finite-dimensional set $(W_j : j \in F_i^{0+})$, for $i = 1,\ldots,N$. Additionally, we define the exposure summary measure $a_i^s(A_i, W_i^s)$ that depends on $(A_i, W_i^s)$ only as a function of the set $(W_j I(j \in F_i) : j \in F_i^{0+})$, i.e.,

$$A_i^s := a_i^s(A_i, W_i^s) = a_i^s\big(W_j I(j \in F_i) : j \in F_i^{0+}\big),$$

for $i = 1,\ldots,N$, and we assume each $a_i^s(\cdot)$ maps into $\mathbb{R}^d$, for a common-in-$i$ dimension $d$. Next, we model the conditional probability of the outcome $Y_i$ as a common-in-$i$ function $Q_{Y,0}$ that depends on $(A, W)$ only as a function of the summary $A_i^s$, for $i = 1,\ldots,N$. Note that the crucial assumption that $a_i^s(\cdot)$ is a function of $(W_j I(j \in F_i) : j \in F_i^{0+})$ allows us to take full advantage of the dimensionality reduction due to $a_i^s(\cdot)$ while still defining target parameters that actually correspond to the effects of intervening on the observed network realization $F$. We may also assume that $Y_i$ depends on some common-in-$i$ summary $\tilde{w}^s(W_j : j \in F_i^{0+}) = \tilde{w}^s(W_i^s)$ that is unrelated to the new network draw $F_i$, and hence would not be affected by an intervention $F_i^*$ on $F_i$. Note that such modeling restrictions on $F_i$ now make it possible to estimate our parameter $\Psi(P_0^N)$.

With a slight abuse of notation, we use $g_{F_i,0}(A_i \mid W)$ to denote the conditional density of $A_i$ given $W$, and similarly, we use $g_{F_i}^*(A_i^* \mid W)$ to denote the $i$-specific stochastic intervention on $A_i$ implied by $F_i^*$. We also note that due to our modeling assumptions on the conditional distribution of $F_i$ given $(F^0, W)$, we have that $g_{F_i,0}(A_i \mid W) = g_{F_i,0}(A_i \mid W_i^s)$ and $g_{F_i}^*(A_i^* \mid W) = g_{F_i}^*(A_i^* \mid W_i^s)$, which also leads to the following representation of our target parameter:

$$\Psi(P_0^N) = \frac{1}{N}\sum_{i=1}^N E_{Q_{W_i^s,0}}\Big[E_{g_{F_i}^*}\big[\bar{Q}_0\big(a_i^s(A_i^*, W_i^s), \tilde{w}^s(W_i^s)\big) \,\big|\, W_i^s\big]\Big] = \frac{1}{N}\sum_{i=1}^N \int_{a_i, w_i^s} \bar{Q}_0\big(a_i^s(a_i, w_i^s), \tilde{w}^s(w_i^s)\big)\, g_{F_i}^*(a_i \mid w_i^s)\, q_{W_i^s,0}(w_i^s)\, d\mu(a_i, w_i^s) = \frac{1}{N}\sum_{i=1}^N \int_{a^s, w^s} \bar{Q}_0\big(a^s, \tilde{w}^s(w^s)\big)\, h_{i,0}^*(a^s, w^s)\, d\mu(a^s, w^s) = \int_{a^s, w^s} \bar{Q}_0\big(a^s, \tilde{w}^s(w^s)\big)\, \bar{h}_0^*(a^s, w^s)\, d\mu(a^s, w^s) = \int_{a^s, w^s} \bar{Q}_0\big(a^s, \tilde{w}^s(w^s)\big)\, \bar{g}_0^*(a^s \mid w^s)\, \bar{q}_{W,0}(w^s)\, d\mu(a^s, w^s) = E_{\bar{q}_{W,0}}\Big[E_{\bar{g}_0^*}\big[\bar{Q}_0\big(\bar{A}^{s*}, \tilde{w}^s(\bar{W}^s)\big) \,\big|\, \bar{W}^s\big]\Big].$$

This shows that the new target parameter $\Psi(P_0^N)$ can be represented by an equivalent mixture mapping $\bar{\Psi}(\bar{P}_0)$ from Theorem 3.1. Consequently, the efficient influence curve of this new $\Psi(P_0^N)$ is given by the same $\bar{D}$ from Theorem 3.3. It follows that the estimation procedure described in Section 4 remains unchanged when estimating this new parameter $\Psi(P_0^N)$. As before, we also assume that the $i$-specific densities $h_{i,0}^*(a^s, w^s)$ are well-defined with respect to some common dominating measure, and that $\bar{h}_0^*$ can be factorized as $\bar{h}_0^*(a^s, w^s) = \bar{g}_0^*(a^s \mid w^s)\, \bar{q}_W(w^s)$.

As we describe in the example below, given a particular context one might assume that the summary measures $A_i^s$ are of lower dimensionality than the identity mapping $(W_j I(j \in F_i) : j \in F_i^{0+})$. For instance, one may be able to define some low-dimensional summaries of $(I(j \in F_i) W_j : j \in F_i^{0+})$ which incorporate various features of unit $i$'s network and the covariates of $i$'s friends. One then has to assume that the conditional outcome model $Q_{Y,0}$ for each $Y_i$ depends only on such low-dimensional features. Furthermore, if $a_i^s(\cdot)$ and $\tilde{w}^s(\cdot)$ depend only on some subset of $(W_j : j \in F_i^{0+})$, then $W_i^s$ can also be redefined as a summary of lower dimension that only includes this subset of $(W_j : j \in F_i^{0+})$ or specific features of it. The direct estimation of the conditional mixtures $\bar{g}_0^*$ and $\bar{g}_0$ is then performed conditionally on this lower-dimensional summary $W_i^s$.

Example

Suppose that we gather some initial data on a network of sexual partners $F^0 = (F_1^0,\ldots,F_N^0)$ in a community-based observational study of HIV risk factors. For each unit $i$, we measure the baseline covariates $W_i$, which may include baseline HIV infection status ($H_i$) along with various risk factors. We also assume that after some period of time the network of sexual partners of each unit was measured again. This new network realization for unit $i$ is denoted as the set $F_i$ and it defines our exposure of interest. Suppose that several months later data were collected on the binary outcome $Y_i$, which indicates whether subject $i$ contracted HIV during the follow-up period. Our scientific question of interest is to determine the expected incidence of HIV under a hypothetical intervention on the network of sexual partners of each unit $i$. We note that our dimension reduction assumptions imply that the set of possible realizations of $i$'s partners in a new network draw $F_i$ is restricted to units who were already connected to $i$ at baseline through a common partner, the set we previously denoted $F_i^{0+}$. The exposure $A_i$ is defined as the vector of indicators $I(j \in F_i)$ based on the set of all possible partner realizations $j \in F_i^{0+}$. We now assume that $Q_{Y,0}$, the conditional probability of unit $i$ contracting HIV, depends on $i$'s baseline covariates ($W_i$) and the total number of $i$'s current partners ($nF_i := \sum_{j=1}^{K^+} A_i(j)$). We also assume that $Q_{Y,0}$ depends on the covariates of $i$'s current partners in $A_i$ only as a function of a lower-dimensional summary measure. For example, we suppose that the risk of contracting HIV for unit $i$ also depends on the proportion of $i$'s partners who had HIV at baseline ($pH_i := (1/nF_i)\sum_{j \in F_i} H_j$), as well as the proportion of $i$'s partners with a high-risk profile as determined by $\delta(W_j) > r$, i.e., the summary $pR_i = (1/nF_i)\sum_{j \in F_i} I(\delta(W_j) > r)$, where $\delta(\cdot)$ maps the covariates in $W_j$ into the real line. We now suppose that the exposure summary is defined as $A_i^s := (nF_i, pH_i, pR_i)$, and the baseline summary is defined as $W_i^s := (W_i, \delta(W_j) : j \in F_i^{0+})$. We also consider individual interventions $g_{F_i}^*$ on $F_i$ that replace the observed partners of $i$ in $F_i$ with another set of partners $F_i^*$. For example, we consider the $i$-specific intervention $g_{F_i}^*$ that decreases the total number of $i$'s partners by stochastically removing some $j \in F_i$, where the probability of removing a partner $j \in F_i$ can be a function of $j$'s baseline risk-profile $\delta(W_j)$. Such an intervention $g_{F_i}^*$ implies a new exposure set $A_i^*$ and an intervention-specific summary $A_i^{s*} := (nF_i^*, pH_i^*, pR_i^*)$, where $nF_i^* := \sum_{j=1}^{K^+} A_i^*(j)$, $pH_i^* := (1/nF_i^*)\sum_{j \in F_i^*} H_j$ and $pR_i^* := (1/nF_i^*)\sum_{j \in F_i^*} I(\delta(W_j) > r)$. One can now directly apply the TMLE framework from Section 4 to estimate the sample-average of the expected outcomes $Y_i$ under such network interventions $g_{F_i}^*$, for $i = 1,\ldots,N$. In particular, we first need to estimate the conditional mixtures $\bar{g}_0^*(A^{s*} \mid W^s)$ and $\bar{g}_0(A^s \mid W^s)$ using the direct estimation approach, which then allows us to evaluate the clever covariate based on the $N$ predictions $\bar{g}_N^*/\bar{g}_N(A_i^s \mid W_i^s)$. We then proceed to fit a common-in-$i$ initial model $\bar{Q}_N$ for the regression of $Y_i$ on $(A_i^s, W_i)$, followed by the TMLE update $\bar{Q}_N^*$ of $\bar{Q}_N$.
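
A short R sketch of this example's summary construction, together with one draw of the partner-removal intervention, is given below; all inputs and the removal probability are illustrative placeholders.

```r
## Sketch of the HIV example: compute A_i^s = (nF_i, pH_i, pR_i) from a
## current partner set F_i, then one draw of an intervention g*_Fi that
## stochastically removes high-risk partners. All inputs are illustrative.
set.seed(9)
N <- 100; r <- 0.5; i <- 1
H     <- rbinom(N, 1, 0.2)     # baseline HIV status H_j
delta <- runif(N)              # risk profile delta(W_j)
Fi    <- sample((1:N)[-i], 3)  # unit i's current partners

summary_Ais <- function(Fset) {
  nF <- length(Fset)
  c(nF = nF,
    pH = if (nF > 0) mean(H[Fset]) else 0,
    pR = if (nF > 0) mean(delta[Fset] > r) else 0)
}
A_s <- summary_Ais(Fi)

## intervention: remove each high-risk partner (delta > r) with probability 0.8
keep     <- runif(length(Fi)) > ifelse(delta[Fi] > r, 0.8, 0)
Fi_star  <- Fi[keep]
A_s_star <- summary_Ais(Fi_star)  # intervention-specific summary (nF*, pH*, pR*)
```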

9 Discussion

In this paper we describe a practical application of the TMLE framework to the estimation of sample-average treatment effects among possibly dependent units. Our first objective was to assume a realistic semi-parametric data generating model that reflects the types of between-unit dependence one may encounter in a real-life observational network study, for example, when the study units are connected via a social or geographical network. Our approach included a number of statistical assumptions, such as the assumption of a certain conditional independence of the outcomes and the assumption of fixed-dimension summary measures, which allowed us to perform estimation and inference under potential network dependence. Having defined our semi-parametric statistical model $\mathcal{M}$, we also defined our target of estimation $\Psi(P^N)$ as a mapping from the joint distribution $P^N$ of the $N$ connected units, for $P^N \in \mathcal{M}$. We then showed in Section 3.1 that this target parameter depends on the joint distribution of the data only as a function of the mixture distribution $\bar{P}$, where $\bar{P}$ was given as a mixture of the unit-specific components $P_i$ of $P^N$. Such a mixture mapping representation implied that our estimation problem was reduced to the problem of estimating the relevant factors of the mixture $\bar{P}$. We have argued that such dependent-data parameters can then be estimated by simply ignoring the dependence between connected observations (e.g., Lemma 4.1), suggesting that an entire class of iid data estimators, such as the iid TMLE algorithm we described in this paper, may be applied to estimation problems with potential network dependence. That is, we presented the dependent-data TMLE from van der Laan [9] as an iid data algorithm, with the unusual caveat that our stochastic intervention of interest $\bar{g}_0^*$ depends on the true distribution of the observed data. We also provided a simple estimator of the true asymptotic variance of the TMLE, which was shown to be consistent for the model with independent baseline covariates. This variance estimator takes the known network structure into account, making adjustments for the correlated EIC contributions of connected units. We also discussed the difficulty of estimating the variance in a model with weakly dependent baseline covariates and proposed a few ad-hoc approaches for doing so. Finally, we described an alternative approach to inference that conditions on the baseline covariates - an approach that results in a consistent variance estimate even for a model with weakly dependent baseline covariates.

We also assessed the validity of our proposed inferential framework with a finite sample network simulation study. In particular, the finite sample performance of our proposed TMLE was compared to that of the parametric G-computation estimator and the IPTW estimator. Lastly, we assessed the finite sample coverage of our estimated asymptotic confidence intervals, e.g., comparing our dependent-data inference to inference that completely ignores the dependence among units. We observed that the latter approach to inference resulted in coverage that was consistently below the nominal 95% level. We also expect this coverage to become increasingly worse as one moves towards more realistic network scenarios characterized by denser networks and higher levels of between-unit dependence. However, we leave this topic to be explored in future simulation studies.

We extended the dependent-data TMLE framework first described in van der Laan [9] towards the estimation of a much larger class of parameters, such as the direct effect under interference. We have also shown that our framework can be extended to define interventions on the network itself. In particular, in Section 8.2 we described how one can estimate the post-intervention outcomes for interventions that statically or stochastically modify the initial network structure $F^0$. Furthermore, we no longer require complete independence of the baseline covariates for conducting valid statistical inference with our TMLE. Finally, we believe our work provides an important proof of concept, demonstrating that estimation and valid statistical inference for dependent data collected from a single network are possible in this large class of semi-parametric models.

We note that the TMLE update $\bar{Q}_N^*$ presented in this paper differs slightly from that of van der Laan [9] in terms of its suggested parametric submodel fluctuation $\{\bar{Q}_N(\varepsilon) : \varepsilon\}$, with the latter TMLE update being based on the following parametric submodel: $\mathrm{Logit}\,\bar{Q}_N(\varepsilon) = \mathrm{Logit}\,\bar{Q}_N + \varepsilon\, \bar{g}_0^*/\bar{g}_0$. Both of these fluctuations result in TMLEs with equivalent asymptotic properties, as both updates solve the same empirical score equation. However, the two may differ in their finite sample properties. In particular, the TMLE we present here may be less sensitive to practical positivity violations, while providing similar bias reduction as the TMLE from van der Laan [9]. We also note that the TMLE update presented here may be less computationally intensive, since it only requires $N$ evaluations of the clever covariate $\bar{g}_N^*/\bar{g}_N(A_i^s \mid W_i^s)$, for $i = 1,\ldots,N$. The TMLE algorithm proposed in van der Laan [9] may require computing $\bar{g}_N^*/\bar{g}_N(a^s \mid W_i^s)$ for every $a^s$ in the support of $A_i^s$, for $i = 1,\ldots,N$ (i.e., all $a^s$ such that $g_i^s(a^s \mid W_i^s) > 0$), due to its specific parametric submodel update.

We now note a few possible directions for future research. First, additional simulation studies should explore the performance of our TMLE in more complex networks, such as networks generated from the preferential attachment model with a power-law node degree distribution [48] or networks generated under the small-world model [49]. Second, it would be of interest to study how our proposed framework may be applied to estimate the change in the observed sample-average outcome when an intervention is applied to another community with a different network structure and a different distribution of baseline covariates, a notion known as transportability [50, 51, 52]. Third, it is of scientific interest to explore how our estimation framework can be extended to real-world problems in which data on only a subsample of the full network are available. Moreover, the network structure on the observed units themselves is frequently not fully known, in which case it may be necessary to incorporate the uncertainty introduced by inferring the network structure from the observed data [53]. Finally, future work will investigate the estimation of causal parameters in longitudinal settings where the effect of a single time-point intervention can propagate over time through the network, as is typically the case when one describes contagion in social networks [54, 55].

Funding statement: This research was supported by NIH grant R01 AI074345-07.

Acknowledgments

The authors would like to thank Elizabeth (Betsy) Ogburn for helpful discussions.

A Simulation Study. Data generating distribution

We implemented a simulation with observed data consisting of $N$ dependent units $O = (F, W, A, Y)$, where $F = (F_1,\ldots,F_N)$ is a vector of the friend sets of each unit, $W = (W_1,\ldots,W_N)$ is a vector of baseline covariates, $A = (A_1,\ldots,A_N)$ is a vector of binary treatments and $Y = (Y_1,\ldots,Y_N)$ is a vector of binary outcomes. The data for each unit $i = 1,\ldots,N$ are generated as

$$W_i \sim \mathrm{Ber}(0.35)$$
$$|F_i| \sim \mathrm{U}(\{0,1,2\})$$
$$F_i \mid |F_i| \sim \mathrm{Sample}\big(|F_i|,\, \{1,\ldots,N\} \setminus \{i\}\big)$$
$$A_i \mid W, F_i \sim \mathrm{Ber}\big(\bar{g}_0(W_i, W_j : j \in F_i)\big)$$
$$Y_i \mid A, W, F_i \sim \mathrm{Ber}\big(\bar{Q}_0(A_i, W_i, A_j, W_j : j \in F_i)\big),$$

where $F_i \subset \{1,\ldots,N\} \setminus \{i\}$ is a set of unit indices of size $|F_i|$, sampled randomly without replacement, $A_i$ is generated conditionally on the entire vector of baseline covariates $W$, and $Y_i$ is generated conditionally on $W$ and all treatment assignments $A$.

We generate $A$ with $\bar{g}_0$, which depends on the unit's own and the unit's friends' baseline covariates,

$$\bar{g}_0(W_i, W_j : j \in F_i) = \mathrm{expit}\Big(-1.2 + 1.5\, W_i + \sum_{j \in F_i} 0.6\, W_j\Big),$$

with Qˉ0 defined as

$$\bar{Q}_0(A_i, W_i, A_j, W_j : j \in F_i) = \mathrm{expit}\Big(-2.5 + 1.5\, W_i + 0.5\, A_i + \sum_{j \in F_i} 1.5\, W_j + \sum_{j \in F_i} 1.5\, A_j\Big).$$
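
For reference, the following is a runnable R sketch of this data generating process, using the coefficients stated above (with the negative intercepts as reconstructed here):

```r
## Sketch of the simulation's data-generating process: N units, at most two
## friends each, with the treatment and outcome mechanisms given above.
set.seed(10)
N  <- 500
W  <- rbinom(N, 1, 0.35)               # W_i ~ Ber(0.35)
nF <- sample(0:2, N, replace = TRUE)   # |F_i| ~ U({0, 1, 2})
friends <- lapply(1:N, function(i) sample((1:N)[-i], size = nF[i]))

pA <- sapply(1:N, function(i)
  plogis(-1.2 + 1.5 * W[i] + 0.6 * sum(W[friends[[i]]])))
A  <- rbinom(N, 1, pA)                 # A_i ~ Ber(g-bar_0(W_i, W_Fi))

pY <- sapply(1:N, function(i)
  plogis(-2.5 + 1.5 * W[i] + 0.5 * A[i] +
         1.5 * sum(W[friends[[i]]]) + 1.5 * sum(A[friends[[i]]])))
Y  <- rbinom(N, 1, pY)                 # Y_i ~ Ber(Q-bar_0(A_i, W_i, A_Fi, W_Fi))
```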

B Consistency of the asymptotic variance estimator

Theorem B.1.

We make the following assumptions:

Uniform convergence: Assume that as $N \to \infty$ we have $\sup_{a^s,w^s} |\bar{Q}_N^*(a^s,w^s) - \bar{Q}^*(a^s,w^s)| \xrightarrow{P} 0$, $\sup_{a^s,w^s} |\bar{g}_N(a^s,w^s) - \bar{g}(a^s,w^s)| \xrightarrow{P} 0$, $\sup_{a^s,w^s} |\bar{g}_N^*(a^s,w^s) - \bar{g}^*(a^s,w^s)| \xrightarrow{P} 0$ and $\sup_{w \in \mathcal{W}} |\sum_{a^s} \bar{Q}_N^*(a^s, w_i^s(w))\, g_i^*(a^s \mid w) - \sum_{a^s} \bar{Q}^*(a^s, w_i^s(w))\, g_i^*(a^s \mid w)| \xrightarrow{P} 0$, for each $i = 1,\ldots,N$.

$L_2$-norm convergence of the $i$-specific target parameters: Assume that $d_{L_2}(\Psi_{i,N}, \Psi_i) \to 0$, for each $i = 1,\ldots,N$;

Positivity: Assume with probability 1 that $\sup_{(a^s,w^s) \in (\mathcal{A}^s,\mathcal{W}^s)} \frac{\bar{g}^*}{\bar{g}}(a^s,w^s) < \infty$. Similarly, assume that $\sup_{(a^s,w^s) \in (\mathcal{A}^s,\mathcal{W}^s)} \frac{\bar{g}_N^*}{\bar{g}_N}(a^s,w^s) < \infty$ with probability 1.

Boundedness of the asymptotic variance: Assume that the asymptotic variance of the TMLE, $\sigma^2$, converges to some $c < \infty$;

Boundedness of the outcomes: Assume that $\max_i |Y_i| < \infty$;

Universal bound on connectivity between units: Assume that $\sup_i |F_i| < K$, for all $i = 1,\ldots,N$, where $K < \infty$ is a universal constant that doesn't depend on $N$; and

Symmetry of friend relationships: Assume that for any unit $i$, if another unit $j$ is in $i$'s friend set $F_i$, i.e., $j \in F_i$, then it must follow that $i \in F_j$.

Then the following plug-in estimator

$$\sigma_N^2 := \frac{1}{N}\sum_{i=1}^N \sum_{j=1}^N R(i,j)\, D_{N,i}(O)\, D_{N,j}(O),$$

is consistent for

$$\sigma^2 := \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^N \sum_{j=1}^N R(i,j)\, E\big[D_i(O)\, D_j(O)\big],$$

where

$$D_{N,i}(O) = \frac{\bar{g}_N^*}{\bar{g}_N}(A_i^s \mid W_i^s)\big(Y_i - \bar{Q}_N^*(A_i^s, W_i^s)\big) + \sum_{a^s} \bar{Q}_N^*(a^s, W_i^s)\, g_i^*(a^s \mid W) - \Psi_{i,N}(\bar{Q}_N^*, Q_{W,N}),$$
$$D_i(O) = \frac{\bar{g}^*}{\bar{g}}(A_i^s \mid W_i^s)\big(Y_i - \bar{Q}^*(A_i^s, W_i^s)\big) + \sum_{a^s} \bar{Q}^*(a^s, W_i^s)\, g_i^*(a^s \mid W) - \Psi_i(\bar{Q}^*, Q_W),$$
$$\Psi_i(\bar{Q}^*, Q_W) = \int_w \sum_a \bar{Q}^*\big(a_i^s(a,w), w_i^s(w)\big)\, g^*(a \mid w)\, dQ_W(w),$$
$$\Psi_{N,i}(\bar{Q}_N^*, Q_{W,N}) = \int_w \sum_a \bar{Q}_N^*\big(a_i^s(a,w), w_i^s(w)\big)\, g^*(a \mid w)\, dQ_{W,N}(w)$$

and

$$\sum_{a^s} \bar{Q}_N^*(a^s, W_i^s)\, g_i^*(a^s \mid W) = \sum_a \bar{Q}_N^*\big(a_i^s(a, W), w_i^s(W)\big)\, g^*(a \mid W).$$

Proof

Part 1. Define

$$\sigma^2(N) := \frac{1}{N}\sum_{i=1}^N \sum_{j=1}^N R(i,j)\, D_i(O)\, D_j(O),$$

where $D_i := D_i(\bar{g}^*, \bar{g}, \bar{Q}^*, \Psi_i)$. We first prove that $\sigma_N^2 = \sigma^2(N) + o_P(1)$, or equivalently, that

$$\sigma_N^2 - \sigma^2(N) = \frac{1}{N}\sum_{i,j} R(i,j)\, D_{i,N} D_{j,N} - \frac{1}{N}\sum_{i,j} R(i,j)\, D_i D_j = \frac{1}{N}\sum_{i,j} R(i,j)\big[(D_{i,N} - D_i) D_j + D_{i,N}(D_{j,N} - D_j)\big] = o_P(1).$$

The above expression implies that our claim would follow if we were to show that

$$\frac{1}{N}\sum_{i,j} R(i,j)\, D_{i,N}(D_{j,N} - D_j) = o_P(1) \quad \text{and} \quad \frac{1}{N}\sum_{i,j} R(i,j)\, (D_{i,N} - D_i)\, D_j = o_P(1).$$

We already assumed that $\max_i |D_{i,N}| < M$ with probability 1, for some constant $M < \infty$. Thus we have:

$$\frac{1}{N}\sum_{i,j} R(i,j)\, D_{i,N}(D_{j,N} - D_j) \leq \Big(\max_{i \in \{1,\ldots,N\}} |D_{i,N}|\Big)\Big(\max_{j \in \{1,\ldots,N\}} |D_{j,N} - D_j|\Big)\, \frac{1}{N}\sum_{i=1}^N \sum_{j=1}^N R(i,j) \leq K^2 M \max_{j \in \{1,\ldots,N\}} |D_{j,N} - D_j|.$$

Define $e_N := \max_j |D_{j,N} - D_j|$. The sum above is bounded by $e_N K^2 M$, and showing that $e_N \xrightarrow{P} 0$ would prove that the first and second sums both converge to zero in probability. Note that

$$\max_{j \in \{1,\ldots,N\}} |D_{j,N}(O) - D_j(O)| \leq \max_{j \in \{1,\ldots,N\}} \sup_{o \in \mathcal{O}} |D_{j,N}(o) - D_j(o)|.$$

Thus, showing that $\sup_{o \in \mathcal{O}} |D_{j,N}(o) - D_j(o)| \xrightarrow{P} 0$, for any $j = 1,\ldots,N$, would prove our claim. This implies that we need to show that

$$\sup_{(y,a^s,w^s) \in (\mathcal{Y},\mathcal{A}^s,\mathcal{W}^s)} \Big|\frac{\bar{g}_N^*}{\bar{g}_N}(a^s,w^s)\big(y - \bar{Q}_N^*(a^s,w^s)\big) - \frac{\bar{g}^*}{\bar{g}}(a^s,w^s)\big(y - \bar{Q}^*(a^s,w^s)\big)\Big| \xrightarrow{P} 0,$$

and

$$\sup_{w \in \mathcal{W}} \Big|\Big(\sum_{a^s} \bar{Q}_N^*(a^s, w^s(w))\, g_i^*(a^s \mid w) - \Psi_{N,i}\Big) - \Big(\sum_{a^s} \bar{Q}^*(a^s, w^s(w))\, g_i^*(a^s \mid w) - \Psi_i\Big)\Big| \xrightarrow{P} 0,$$

where $(\mathcal{Y}, \mathcal{A}^s, \mathcal{W}^s)$ is the common support of $(Y_i, A_i^s, W_i^s)$ across all $i$. Note that

$$\sup_{(y,a^s,w^s)} \Big| y\Big(\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)} - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\Big) - \Big(\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)}\,\bar{Q}_N^*(a^s,w^s) - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\,\bar{Q}^*(a^s,w^s)\Big)\Big| \leq \sup_{y \in \mathcal{Y}} |y|\, \sup_{(a^s,w^s)} \Big|\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)} - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\Big| + \sup_{(a^s,w^s)} \Big|\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)}\,\bar{Q}_N^*(a^s,w^s) - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\,\bar{Q}^*(a^s,w^s)\Big|.$$

By assumption, $\sup_{y \in \mathcal{Y}} |y| = c_Y < \infty$. It now follows that

$$c_Y \sup_{(a^s,w^s)} \Big|\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)} - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\Big| = c_Y \sup_{(a^s,w^s)} \Big|\big(\bar{g}_N^*(a^s,w^s) - \bar{g}^*(a^s,w^s)\big)\frac{1}{\bar{g}(a^s,w^s)} + \bar{g}_N^*(a^s,w^s)\Big(\frac{1}{\bar{g}_N(a^s,w^s)} - \frac{1}{\bar{g}(a^s,w^s)}\Big)\Big| \xrightarrow{P} 0,$$

where we used that $\sup_{(a^s,w^s)} 1/\bar{g}(a^s,w^s) < \infty$ and $\sup_{(a^s,w^s)} 1/\bar{g}_N(a^s,w^s) < \infty$, along with the uniform convergence of $\bar{g}_N$ and $\bar{g}_N^*$. Similarly, it follows for the second term that

$$\sup_{(a^s,w^s)} \Big|\Big(\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)} - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\Big)\bar{Q}^*(a^s,w^s) + \frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)}\big(\bar{Q}_N^*(a^s,w^s) - \bar{Q}^*(a^s,w^s)\big)\Big| \leq \sup_{(a^s,w^s)} |\bar{Q}^*(a^s,w^s)|\,\Big|\frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)} - \frac{\bar{g}^*(a^s,w^s)}{\bar{g}(a^s,w^s)}\Big| + \sup_{(a^s,w^s)} \frac{\bar{g}_N^*(a^s,w^s)}{\bar{g}_N(a^s,w^s)}\big|\bar{Q}_N^*(a^s,w^s) - \bar{Q}^*(a^s,w^s)\big| \xrightarrow{P} 0,$$

where we used the uniform convergence of $\bar{Q}_N^*$, $\bar{g}_N$, and $\bar{g}_N^*$. We also note that

$$\sup_{w \in \mathcal{W}} \Big|\Big(\sum_{a^s} \bar{Q}_N^*(a^s, w^s(w))\, g_i^*(a^s \mid w) - \Psi_{N,i}\Big) - \Big(\sum_{a^s} \bar{Q}^*(a^s, w^s(w))\, g_i^*(a^s \mid w) - \Psi_i\Big)\Big| \xrightarrow{P} 0$$

directly follows, which finally proves that $e_N = \max_{j \in \{1,\ldots,N\}} |D_{j,N} - D_j| \xrightarrow{P} 0$ and hence that

$$\sigma_N^2 = \sigma^2(N) + o_P(1).$$

Part 2. To show consistency of $\sigma^2(N)$ we show that: (1) $E(\sigma^2(N)) \to \sigma^2$; and (2) $\mathrm{Var}(\sigma^2(N)) \to 0$, followed by an application of Chebyshev's inequality. Note that (1) follows trivially from the definition of $\sigma^2$. To show that the variance of the above plug-in estimator $\sigma^2(N)$ goes to zero, we use an argument similar to the one employed in the second half of the proof of Proposition 1 in [56]. Consider the variance of $\sigma^2(N)$:

$$\mathrm{Var}(\sigma^2(N)) = \frac{1}{N^2}\,\mathrm{Var}\Big(\sum_{i,j} R(i,j)\, D_i(O)\, D_j(O)\Big) = \frac{1}{N^2}\sum_{i,j,k,l} \mathrm{Cov}\big(R(i,j)\, D_i(O)\, D_j(O),\; R(k,l)\, D_k(O)\, D_l(O)\big) = \frac{1}{N^2}\sum_{i,j,k,l} R(i,j)\, R(k,l)\, \mathrm{Cov}\big(D_i(O)\, D_j(O),\; D_k(O)\, D_l(O)\big).$$

Note that the terms in the above sum are nonzero only when $R(i,j) = R(k,l) = 1$. Now consider the individual terms of this sum:

$$\xi_{ijkl} = \mathrm{Cov}\big(D_i(O)\, D_j(O),\; D_k(O)\, D_l(O)\big) = E\big[D_i(O)\, D_j(O)\, D_k(O)\, D_l(O)\big] - E\big[D_i(O)\, D_j(O)\big]\, E\big[D_k(O)\, D_l(O)\big].$$

Note that $E[D_i(O)] = 0$, for any $i = 1,\ldots,N$. Our task is to show that the number of nonzero terms $\xi_{ijkl}$ in the sum over $(i,j,k,l)$ is $O(N)$, where $\xi_{ijkl} \neq 0$ only when at least one of the following four conditions holds: (1) $R(i,k) = 1$; or (2) $R(j,k) = 1$; or (3) $R(i,l) = 1$; or (4) $R(j,l) = 1$. Note that we already assumed that the maximum number of friends ($\max_{i=1,\ldots,N} |F_i|$) is bounded by some universal constant $K < \infty$, where $K$ does not depend on $N$.

We proceed to sum over the indices $(i,j,k,l)$, counting the possible number of nonzero terms in each sum. The sum over $i$ has $O(N)$ nonzero terms. For each fixed $i$, the sum over $j = 1,\ldots,N$ can have $O(K^2)$ nonzero terms, since it is constrained by the condition $R(i,j) = 1$; note that for each fixed $i$ there are at most $K^2$ indices $j$ that may satisfy $R(i,j) = 1$. Next, fix $(i,j)$ and sum over $(k,l)$. These summands are nonzero only if condition (1) or (2) or (3) or (4) is satisfied and if $R(k,l) = 1$. First, consider the case where $R(i,k) = 1$ or $R(j,k) = 1$. The sum over $k$ then nets $O(2K^2)$ nonzero elements. However, the sum over $l$ is still constrained by $R(k,l) = 1$, implying that the sum over $k$ and $l$ is $O(2K^4)$, with the total sum over $(i,j,k,l)$ being $O(2K^6 N)$. One proceeds similarly for the cases with $R(i,l) = 1$ or $R(j,l) = 1$, summing first over $l$ for fixed $(i,j)$ and then over $k$, also resulting in $O(2K^6 N)$ terms. Thus, the total number of nonzero terms is $O(4K^6 N)$ and the claim follows.

C Notation index

$O = (W, A, Y)$: the observed data on $N$ units

$W = (W_1,\ldots,W_N)$: the observed baseline covariates on $N$ dependent units

$F = (F_1,\ldots,F_N)$: the observed network on $N$ units (“network profile”)

$A = (A_1,\ldots,A_N)$: the observed exposures on $N$ dependent units

$A^* = (A_1^*,\ldots,A_N^*)$: intervened exposures sampled under the conditional distribution $G^*$, given $W$

$Y = (Y_1,\ldots,Y_N)$: the observed outcomes on $N$ connected units

$W_i^s = w_i^s(W)$, $A_i^s = a_i^s(A, W)$: fixed-dimension summary measures, with dimensionality the same across all $i$

$O^s = (W^s, A^s, Y)$: observed summary data, where $W^s = (W_1^s,\ldots,W_N^s)$ and $A^s = (A_1^s,\ldots,A_N^s)$

$O^{s*} = (W^s, A^{s*}, Y^*)$: summary data sampled from the post-intervention distribution under the stochastic intervention $g^*$, with $A^{s*} = (A_1^{s*},\ldots,A_N^{s*})$ and $Y^* = (Y_1^*,\ldots,Y_N^*)$

$P_0^N$: true joint distribution of the observed data $O$

$p_0^N$: true joint density of the observed data $O$

$Q_{W,0}$: true joint distribution of the $N$ observed baseline covariates $W$

$q_{W,0}$: true joint density of the $N$ observed baseline covariates $W$

$G_0(A \mid W)$: joint conditional distribution of the observed exposures $A$, given the baseline covariates $W$

$g_0(A \mid W)$: joint conditional density of the observed exposures $A$, given the baseline covariates $W$

$G_0(A_i \mid W_i^s)$: common-in-$i$ conditional distribution of the observed exposure $A_i$, given the $i$-specific summary measure of the baseline covariates

$g_0(A_i \mid W_i^s)$: common-in-$i$ conditional density of the observed exposure $A_i$, given the $i$-specific summary measure of the baseline covariates

$G^*(A^* \mid W)$: user-specified distribution of the intervened exposure vector $A^* = (A_1^*,\ldots,A_N^*)$, conditional on the baseline covariates $W = (W_1,\ldots,W_N)$

$g^*(A^* \mid W)$: density of the user-specified stochastic intervention on the intervened exposure vector $A^* = (A_1^*,\ldots,A_N^*)$, conditional on all baseline covariates $W = (W_1,\ldots,W_N)$

$\bar{Q}_0(a^s, w^s)$: conditional expectation of the outcome, defined as $E_{P_0^N}[Y_i \mid A_i^s = a^s, W_i^s = w^s]$, assumed common across $i$ when conditioned on the same fixed summary measure values $(a^s, w^s)$

$H_{i,0}(A_i^s, W_i^s)$: $i$-specific summary measure distribution of $(A_i^s, W_i^s)$

$h_{i,0}(A_i^s, W_i^s)$: $i$-specific density of the distribution $H_{i,0}(A_i^s, W_i^s)$

$H_{i,0}^*(A_i^{s*}, W_i^s)$: $i$-specific summary measure distribution of $(A_i^{s*}, W_i^s)$

$h_{i,0}^*(A_i^{s*}, W_i^s)$: $i$-specific density of the distribution $H_{i,0}^*(A_i^{s*}, W_i^s)$

$\bar{h}_0(\bar{A}^s, \bar{W}^s)$: mixture density $(1/N)\sum_i h_{i,0}$ determined by $g_0$ and $Q_{W,0}$, with $(\bar{A}^s, \bar{W}^s)$ being a random variable sampled from $\bar{h}_0$

$\bar{h}_0^*(\bar{A}^{s*}, \bar{W}^s)$: mixture density $(1/N)\sum_i h_{i,0}^*$ determined by $g^*$ and $Q_{W,0}$, with $(\bar{A}^{s*}, \bar{W}^s)$ being a random variable sampled from $\bar{h}_0^*$

$\bar{g}_0(\bar{A}^s \mid \bar{W}^s)$: the conditional mixture density implied by the factorization $\bar{h}_0(\bar{A}^s, \bar{W}^s) = \bar{g}_0(\bar{A}^s \mid \bar{W}^s)\, \bar{Q}_{W^s,0}(\bar{W}^s)$

$\bar{g}_0^*(\bar{A}^{s*} \mid \bar{W}^s)$: the conditional mixture density implied by the factorization $\bar{h}_0^*(\bar{A}^{s*}, \bar{W}^s) = \bar{g}_0^*(\bar{A}^{s*} \mid \bar{W}^s)\, \bar{Q}_{W^s,0}(\bar{W}^s)$

References

1. Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes (in Polish), 1923. English translation by DM Dabrowska and TP Speed. Stat Sci 1990;5:465–480.

2. Hudgens MG, Halloran ME. Toward causal inference with interference. J Am Stat Assoc 2008;103:832–842. DOI: 10.1198/016214508000000292.

3. Sobel ME. What do randomized studies of housing mobility demonstrate? J Am Stat Assoc 2006;101:1398–1407. DOI: 10.1198/016214506000000636.

4. Basse GW, Airoldi EM. Optimal design of experiments in the presence of network-correlated outcomes. arXiv e-prints, 2015.

5. Christakis NA, Fowler JH. The spread of obesity in a large social network over 32 years. N Engl J Med 2007;357:370–379. DOI: 10.1056/NEJMsa066082.

6. Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Stat Med 2013;32:556–577. DOI: 10.1002/sim.5408.

7. van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. Springer Series in Statistics. New York: Springer-Verlag, 2011. DOI: 10.1007/978-1-4419-9782-1.

8. van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int J Biostat 2006;2. DOI: 10.2202/1557-4679.1043.

9. van der Laan MJ. Causal inference for a population of causally connected units. J Causal Inference 2014;2:1–62. DOI: 10.1515/jci-2013-0002.

10. Tchetgen EJ, VanderWeele TJ. On causal inference in the presence of interference. Stat Methods Med Res 2012;21:55–75. DOI: 10.1177/0962280210386779.

11. Ogburn EL, VanderWeele TJ. Vaccines, contagion, and social networks. arXiv e-prints, 2014.

12. VanderWeele TJ, An W. Social networks and causal inference. In: Handbook of causal analysis for social research. Springer, 2013:353–374. DOI: 10.1007/978-94-007-6094-3_17.

13. VanderWeele TJ, Tchetgen EJ, Halloran ME. Interference and sensitivity analysis. Stat Sci 2014;29:687–706. DOI: 10.1214/14-STS479.

14. Aral S, Walker D. Identifying social influence in networks using randomized experiments. IEEE Intell Syst 2011;26:91–96. DOI: 10.1109/MIS.2011.89.

15. Aral S, Walker D. Tie strength, embeddedness, and social influence: a large-scale networked experiment. Manage Sci 2014;60:1352–1370. DOI: 10.1287/mnsc.2014.1936.

16. Aronow PM, Samii C. Estimating average causal effects under interference between units. arXiv e-prints, 2013.

17. Bowers J, Fredrickson MM, Panagopoulos C. Reasoning about interference between units: a general framework. Polit Anal 2013;21:97–124. DOI: 10.1093/pan/mps038.

18. Choi DS. Estimation of monotone treatment effects in network experiments. arXiv e-prints, 2014.

19. Liu L, Hudgens MG. Large sample randomization inference of causal effects in the presence of interference. J Am Stat Assoc 2014;109:288–301. DOI: 10.1080/01621459.2013.844698.

20. Rosenbaum PR. Interference between units in randomized experiments. J Am Stat Assoc 2007;102:191–200. DOI: 10.1198/016214506000001112.

21. Toulis P, Kao E. Estimation of causal peer influence effects. In: Proceedings of the 30th International Conference on Machine Learning, 2013:1489–1497.

22. Walker D, Muchnik L. Design of randomized experiments in networks. Proc IEEE 2014;102:1940–1951. DOI: 10.1109/JPROC.2014.2363674.

23. Lyons R. The spread of evidence-poor medicine via flawed social-network analysis. Stat Polit Policy 2011;2:1–26. DOI: 10.2202/2151-7509.1024.

24. VanderWeele TJ. Sensitivity analysis for contagion effects in social networks. Sociol Methods Res 2011;40:240–255. DOI: 10.1177/0049124111404821.

25. VanderWeele TJ. Inference for influence over multiple degrees of separation on a social network. Stat Med 2013;32:591–596. DOI: 10.1002/sim.5653.

26. VanderWeele TJ, Ogburn EL, Tchetgen Tchetgen EJ. Why and when “flawed” social network analyses still yield valid tests of no contagion. Stat Polit Policy 2012;3. DOI: 10.1515/2151-7509.1050.

27. Steglich C, Snijders TA, Pearson M. Dynamic networks and behavior: separating selection from influence. Sociol Methodol 2010;40:329–393. DOI: 10.1111/j.1467-9531.2010.01225.x.

28. Balzer L, Petersen M, van der Laan M. Targeted estimation and inference for the sample average treatment effect. U.C. Berkeley Division of Biostatistics Working Paper Series, 2015.

29. van der Laan MJ, Hubbard AE, Pajouh SK. Statistical inference for data adaptive target parameters. U.C. Berkeley Division of Biostatistics Working Paper Series, 2013.

30. Robins JM. Addendum to “A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect”. Computers & Mathematics with Applications 1987;14:923–945. DOI: 10.1016/0898-1221(87)90238-0.

31. Sofrygin O, van der Laan MJ. Targeted maximum likelihood estimation for network data. R package version 0.1.0, 2015.

32. Gill RD, Robins JM. Causal inference for complex longitudinal data: the continuous case. Ann Stat 2001;29:1785–1811. DOI: 10.1214/aos/1015345962.

33. Robins JM. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. J Chronic Dis 1987;40(Suppl 2):139S–161S. DOI: 10.1016/S0021-9681(87)80018-8.

34. Robins JM. Causal inference from complex longitudinal data. In: Latent variable modeling and applications to causality. Springer, 1997:69–117. DOI: 10.1007/978-1-4612-1842-5_4.

35. Robins JM. [Choice as an alternative to control in observational studies]: Comment. Stat Sci 1999;14:281–293.

36. Yu Z, van der Laan MJ. Measuring treatment effects using semiparametric models. U.C. Berkeley Division of Biostatistics Working Paper Series, 2003.

37. Dawid AP, Didelez V. Identifying the consequences of dynamic treatment strategies: a decision-theoretic overview. Stat Surveys 2010;4:184–231. DOI: 10.1214/10-SS081.

38. Muñoz ID, van der Laan M. Population intervention causal effects based on stochastic interventions. Biometrics 2012;68:541–549. DOI: 10.1111/j.1541-0420.2011.01685.x.

39. Robins JM, Richardson T. Alternative graphical causal models and the identification of direct effects. In: Causality and psychopathology: finding the determinants of disorders and their cures, 2010:103–158. DOI: 10.1093/oso/9780199754649.003.0011.

40. Zheng W, van der Laan MJ. Causal mediation in a survival setting with time-dependent mediators. U.C. Berkeley Division of Biostatistics Working Paper Series, 2012.

41. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. New York: Springer-Verlag, 1993.

42. van der Vaart AW. Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press, 1998.

43. Sofrygin O, van der Laan MJ. Semi-parametric estimation and inference for the mean outcome of the single time-point intervention in a causally connected population. U.C. Berkeley Division of Biostatistics Working Paper Series, 2015. DOI: 10.1515/jci-2016-0003.

44. Gruber S, van der Laan MJ. Targeted maximum likelihood estimation: a gentle introduction. U.C. Berkeley Division of Biostatistics Working Paper Series, 2009.

45. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, 2015.

46. Miguel E, Kremer M. Worms: identifying impacts on education and health in the presence of treatment externalities. Econometrica 2004;72:159–217. DOI: 10.1111/j.1468-0262.2004.00481.x.

47. Davey C, Aiken AM, Hayes RJ, Hargreaves JR. Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a statistical replication of a cluster quasi-randomized stepped-wedge trial. Int J Epidemiol 2015. DOI: 10.1093/ije/dyv128.

48. Barabási AL, Albert R. Emergence of scaling in random networks. Science 1999;286:509–512. DOI: 10.1515/9781400841356.349.

49. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature 1998;393:440–442. DOI: 10.1515/9781400841356.301.

50. Bareinboim E, Pearl J. A general algorithm for deciding transportability of experimental results. J Causal Inference 2013;1:107–134. DOI: 10.1515/jci-2012-0004.

51. Pearl J, Bareinboim E. Transportability of causal and statistical relations: a formal approach. In: Proceedings of the 25th National Conference on Artificial Intelligence (AAAI), 2011:247–254. DOI: 10.1109/ICDMW.2011.169.

52. Pearl J, Bareinboim E. External validity: from do-calculus to transportability across populations. Stat Sci 2014;29:579–595. DOI: 10.21236/ADA563868.

53. Goyal R, De Gruttola V, Blitzstein J. Sampling networks from their posterior predictive distribution. Network Sci 2014;2:107–131. DOI: 10.1017/nws.2014.2.

54. Eckles D, Karrer B, Ugander J. Design and analysis of experiments in networks: reducing bias from interference. arXiv preprint, 2014.

55. Ugander J, Karrer B, Backstrom L, Kleinberg J. Graph cluster randomization: network exposure to multiple universes. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013:329–337. DOI: 10.1145/2487575.2487695.

56. Aronow PM, Crawford FW, Zubizarreta JR. Confidence intervals for means under constrained dependence. arXiv preprint arXiv:1602.00359, 2016.

