Diagnostic tools for random effects in the repeated measures growth curve model

https://doi.org/10.1016/S0167-9473(99)00049-3

Abstract

Growth curve models assuming a normal distribution are often used in repeated measurements applications because of the wide availability of software. In many standard situations, a polynomial in time is fitted to describe the mean profiles under different treatments. The dependence among responses from the same individuals is generally handled by a random effects model, although an auto-regressive structure can often be more appropriate. We consider both, in the context of missing observations. We present diagnostics for two major problems: (1) the forms of the mixing distribution in random effects models, and their influence on inferences about treatment effects, and (2) the randomness of missing observations. To demonstrate the utility of our techniques, we reanalyze data on percentage protein content in milk, often erroneously analyzed as illustrating a dropout phenomenon.

Introduction

The original growth curve model for repeated measurements over time, introduced by Elston and Grizzle (1962), and generalized by Potthoff and Roy (1964), has come to be widely used when the responses can be assumed to be approximately normally distributed. It was later popularized by Laird and Ware (1982). This model uses polynomials in time to describe mean profiles, with random coefficients to generate a correlation structure among the repeated observations on each individual. However, as Elston (1964) pointed out, such a covariance matrix depends crucially on the time origin used, making the model difficult to interpret and often unsuitable. For further discussion of the problems with such random coefficients models, see Lindsey (1993, pp. 85–97).
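To see why such a covariance matrix depends on the time origin, consider the simplest case, a random intercept and slope per individual (written here in our own notation, purely as an illustration):

y_it = β_0 + β_1 t + b_0i + b_1i t + ε_it,

with var(b_0i) = σ²_0, var(b_1i) = σ²_1, cov(b_0i, b_1i) = σ_01 and var(ε_it) = σ²_ε, so that

var(y_it) = σ²_0 + 2t σ_01 + t² σ²_1 + σ²_ε,    cov(y_is, y_it) = σ²_0 + (s + t) σ_01 + s t σ²_1.

Replacing t by t − t_0 changes every term involving t, so the implied variances and correlations, and hence the fit and interpretation of the model, depend on where the time origin is placed.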

A more appropriate approach to modelling the dependence of responses over time is to introduce some form of auto-regressive structure, leaving any random effects to handle inter-individual heterogeneity. Such models have been discussed by many authors; for a survey, see Lindsey (1993, Chapter 4). One of the most flexible techniques for fitting such models, combining random effects and auto-regression, is the Kalman filter. This allows observations to be unequally spaced in time and can, thus, handle randomly missing values; see, for example, Jones and Ackerson (1990), Jones and Boadi-Boateng (1991), and Jones (1993). We shall use Jones’ software in the analyses to follow.
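In continuous time, the first-order auto-regression gives residuals observed at times s and t a correlation of the form exp(−θ|t − s|), which remains meaningful when the observation times are unequally spaced or some responses are missing. As a purely illustrative sketch of such a model, fitted here with the nlme package rather than with Jones' CARMA software, and with a data frame milk and columns protein, week, diet and cow assumed for the data introduced in the next section:

library(nlme)
# Random intercept for inter-cow heterogeneity plus continuous-time AR(1)
# serial correlation; the quadratic time trend by diet is only an example.
fit <- lme(protein ~ diet * poly(week, 2),
           random = ~ 1 | cow,
           correlation = corCAR1(form = ~ week | cow),
           data = milk, na.action = na.omit)
summary(fit)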

One major drawback of this linear growth curve model is that the mean response is generally taken to vary as a polynomial over time. In most situations, this will be biologically unreasonable; some nonlinear function adapted to the specific situation will be more appropriate. Few authors have attempted to accommodate this situation with both random effects and auto-regressive components; see, however, Heitjan (1991a, 1991b) and Lambert (1996). We shall not consider diagnostics to detect this type of problem, although mis-specification in this part of the model will influence conclusions about the suitability of other components of a model.

Many aspects of a repeated measurements model need to be checked when analyzing such data. We can only cover a few here. We shall be particularly interested in the appropriateness of the random effects distribution(s) in describing the heterogeneity found in the data, including ways in which the specification of the time variable influences this. We shall also look at what information may be available about whether or not missing values can be assumed to be random, without making any attempt to model them. For the first problem, we shall look at what a fixed effects model can tell us about random effects. For the second, we shall consider the results of different approaches to fitting the auto-regression of the model, as well as logistic regression for the missingness. We shall study these in the context of residual analysis and individual profiles.
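For the missing values, one simple check (a sketch of the idea in the data frame assumed in the other sketches, not the exact procedure we use below) is to regress an indicator of a missing observation on the previous week's response and the diet by ordinary logistic regression; a clear dependence on the previous response would argue against treating the missingness as completely random.

# Indicator of a missing protein measurement and the previous week's value
milk <- milk[order(milk$cow, milk$week), ]
milk$missing <- as.integer(is.na(milk$protein))
milk$prev <- ave(milk$protein, milk$cow,
                 FUN = function(x) c(NA, x[-length(x)]))
# Logistic regression of missingness on the previous response and the diet
drop_fit <- glm(missing ~ prev + diet, family = binomial, data = milk)
summary(drop_fit)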

Section snippets

The milk data

To illustrate our procedures, we shall use data that have been discussed several times in the statistical literature (Verbyla and Cullis, 1990; Diggle, 1990; Diggle and Kenward, 1994; Diggle et al., 1994; Little, 1995). They concern an experiment on the effect of three different feeding strategies on the protein content of milk produced by cows over time. The percentage protein level was measured weekly on 79 cows for 19 consecutive weeks. The cows were randomly divided into three diet groups
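For concreteness, the code sketches in this paper assume these data held in a data frame in long format, one row per cow and week; the exact names are our own assumption rather than part of the original data description.

# Assumed layout of the 'milk' data frame (one row per cow and week):
#   cow     factor with 79 levels
#   diet    factor with 3 levels, the three feeding strategies
#   week    integer, 1 to 19
#   protein numeric percentage protein, NA where a measurement is missing
str(milk)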

Models

In the simplest cases, the growth model takes the N×T matrix of response values, Y (where N is the number of individuals and T is the number of time points), to have mean

E[Y] = XBZ,

where X is the N×C inter-subject design matrix (describing the C diets and cohorts in our example) for the N individuals, while B is a C×P location parameter matrix (where P is the number of time-varying covariates) and Z is a P×T matrix of covariates changing with the responses on a unit, most often simply a P−1
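As a worked instance of these dimensions (with the polynomial degree chosen purely for illustration): for the milk data, N = 79 cows and T = 19 weeks, and a quadratic trend in time gives P = 3, so that Z is the 3×19 matrix

Z = [ 1, 1, …, 1 ; t_1, t_2, …, t_19 ; t_1², t_2², …, t_19² ].

If X coded only the three diets (ignoring the calving cohorts), then C = 3 and B would be a 3×3 matrix containing one set of polynomial coefficients per diet.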

Preliminary analysis

These data have already been analyzed several times in the literature, so that preliminary analysis using a series of plots, especially individual profiles, need not be presented here. The profiles for the biological time origin were presented by Verbyla and Cullis (1990), Diggle (1990, p. 159), and Diggle et al. (1994, p. 54). We now proceed to fit what were called complete likelihood models above, using continuous AR (Jones, 1993).
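Plots of the individual profiles of the kind referred to above can be drawn, for instance, with the lattice package (using the data frame and column names assumed earlier):

library(lattice)
# One line per cow, in separate panels for the three diets
xyplot(protein ~ week | diet, groups = cow, data = milk,
       type = "l", xlab = "Week", ylab = "Protein (%)")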

However, we first look at the calving time as a response variable in

Diagnostics

In complex models, such as those for repeated measurements or where dispersion parameters (such as the variance) depend on the covariates, standard linear regression diagnostics such as residual analysis are often of limited use in detecting problems with a model. (For other examples, see Lindsey and Jones, 1997.) The best approach seems to be to try fitting a wider range of models in order to check the assumptions being made.
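One concrete version of this strategy, sketched here under the same assumptions as the earlier code and not as the exact set of models fitted below, is to hold the mean structure fixed and compare alternative dependence structures by AIC (Akaike, 1973):

library(nlme)
# Same mean model throughout; maximum likelihood so that the AICs are comparable
m_re   <- lme(protein ~ diet * poly(week, 2), random = ~ 1 | cow,
              data = milk, na.action = na.omit, method = "ML")
m_ar   <- gls(protein ~ diet * poly(week, 2),
              correlation = corCAR1(form = ~ week | cow),
              data = milk, na.action = na.omit, method = "ML")
m_both <- update(m_re, correlation = corCAR1(form = ~ week | cow))
AIC(m_re, m_ar, m_both)   # random effect only, AR only, and both combined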

Discussion

The standard diagnostic techniques, such as residual plots, that we presented at the beginning of our analysis showed us no basic anomalies with the models that we developed there. (We also tried variograms, but they proved to be of little use, perhaps because they are inappropriate when random slopes are present.) And yet, further study, especially by fitting fixed effects models, demonstrates that these models fit the data very poorly. There is so much variability among the individual

Acknowledgements

A version of CARMA, supplied as Fortran code to the second author in 1990 by Richard Jones, whom we thank, was used to produce the results included in this paper. This program was applied by means of a user-friendly front end constructed in R (Ihaka and Gentleman, 1996), a fast S-Plus clone freely available under the GNU licence, for which we thank Robert Gentleman and Ross Ihaka. It is available in the R public library called growth at www.luc.ac.be/~jlindsey/rcode.html.

Philippe

References (25)

  • Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csàki, F....
  • Diggle, P.J., 1990. Time Series. A Biostatistical Introduction. Oxford University Press,...
  • Diggle, P.J., Kenward, M.G., 1994. Informative drop-out in longitudinal data analysis. Appl. Statist.
  • Diggle, P.J., Liang, K.Y., Zeger, S.L., 1994. The Analysis of Longitudinal Data. Oxford University Press,...
  • Elston, R.C., 1964. On estimating time-response curves. Biometrics.
  • Elston, R.C., Grizzle, J.E., 1962. Estimation of time response curves and their confidence bands. Biometrics.
  • Francis, B., Green, M., Payne, C., 1993. Glim 4: The Statistical System for Generalized Linear Interactive Modelling....
  • Heitjan, D.F., 1991a. Generalized Norton–Simon models of tumour growth. Statist. Med.
  • Heitjan, D.F., 1991b. Nonlinear modeling of serial immunologic data: a case study. J. Amer. Statist. Assoc.
  • Hendry, D., 1995. Dynamic Econometrics. Oxford University Press,...
  • Hougaard, P., 1986. Survival models for heterogeneous populations derived from stable distributions. Biometrika 73,...
  • Ihaka, R., Gentleman, R., 1996. R: a language for data analysis and graphics. J. Comput. Graphics Statist.