Fitting semiparametric regressions for panel count survival data with an R package spef
Introduction
Panel count data arise when a recurrent event process is observed only at a finite number of observation times, which are often random. Only the number of events between consecutive observation times is observed, not the exact event times. Such data frequently occur in clinical studies when continuous monitoring of patients is infeasible or too costly. For example, in many long-term studies, patients who may experience multiple occurrences of the same event are checked only at the discrete times when they visit their physicians.
Analyzing panel count data is challenging because the observed information is incomplete: the exact event times are unobserved. A further complication is that the observation times and the censoring (follow-up) time may be associated with the underlying event process. For example, a patient with a lower risk may need less medical attention and visit the clinic less often; a patient with a higher risk may be more likely to drop out of the study at an early stage due to disease progression or death.
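As a rough illustration of this observation scheme, the following base R sketch turns a set of exact event times into the interval counts that would actually be recorded. The event times and visit times are hypothetical values chosen for the example, not taken from any study.

```r
# Hypothetical exact event times for one patient (unobserved in practice)
event_times <- c(0.8, 2.1, 2.7, 5.4, 9.3)

# Hypothetical clinic visit (observation) times
visit_times <- c(3, 6, 10)

# Assign each event to the interval (previous visit, current visit]
breaks <- c(0, visit_times)
interval <- cut(event_times, breaks = breaks, right = TRUE)

# Only the number of events per interval is retained as panel count data
panel_counts <- as.vector(table(interval))
panel <- data.frame(time = visit_times, count = panel_counts)
print(panel)
```

The resulting data frame keeps one row per visit, with the count of new events since the previous visit; the exact event times are discarded.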
The development of statistical methods that properly address the challenges of panel count data has attracted considerable attention. For nonparametric estimation of the event rate function, one can derive the estimator based on likelihood or pseudolikelihood [1], [2], [3]. When covariate effects are of main interest, semiparametric models such as proportional means or proportional rates models are often desired. Various fitting methods have been proposed in the literature under different assumptions about the event process, observation times and censoring time.
In spite of the active methodological development, software development for semiparametric panel count regression has not caught up. This is in contrast to methods for right censored data or recurrent event data, which are much more mature and have been implemented in major statistical software, such as procedures LIFEREG and PHREG in SAS, and package survival [4] in R. While developing a new estimating equation approach for semiparametric panel count regression, we have implemented computer programs for both existing methods and our own. All functions are wrapped into an R package spef (acronym for semiparametric estimating functions), which is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=spef.
The rest of the article is organized as follows. The general notation for panel count data is introduced through a well-known bladder tumor data set in Section 2. Semiparametric panel count regression models and various fitting methods are presented in Sections 3 and 4, respectively. Usage of the package is summarized in Section 5. An illustration with the bladder tumor data is given in Section 6. A brief summary concludes in Section 7.
Panel count data
The bladder tumor data of Byar [5] is frequently used for illustration purposes in modeling panel count data [6], [7], [8], [9]. In the original study, patients who had superficial bladder tumors were randomized into three treatment arms: placebo, pyridoxine pills, and thiotepa. At each visit during the follow-up, tumors were counted, measured, and then removed transurethrally, and the treatment was continued. The objective of the study was to determine if the treatment reduces the recurrence
Model
A marginal mean model is considered as the base model to which the various fitting procedures discussed below apply. The covariate effects enter the marginal mean of the event process multiplicatively through
E[N(t) | X] = Λ(t) exp(X⊤β), (1)
where N(t) is the number of events up to time t, X is a p × 1 covariate vector, β is a p × 1 vector of covariate coefficients, and Λ( · ) is a completely unspecified baseline mean function. Model (1) only characterizes the mean of the event process without fully specifying how the process evolves. As a rich class, it covers many
Methods
Some authors approach the model fitting with estimating equations suggested by Model (1), while others choose to work with the likelihood under an additional Poisson assumption for the event process. Because of the semiparametric nature of the model, one may choose to estimate the nonparametric component Λ( · ) or not. Under some regularity conditions, Λ( · ) can be estimated as a step function with jumps on a discrete time grid, say G = {0 = s₀ < s₁ < ⋯ < sₘ = τ}, constructed from the set of all observation times.
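To make the grid construction concrete, here is a minimal base R sketch that builds G from a pooled set of observation times and represents a baseline mean function as a step function with jumps on that grid. The observation times, the end of follow-up tau, and the equal jump sizes are all hypothetical; in practice the jump sizes would come from one of the fitting methods, which this sketch does not implement.

```r
# Hypothetical observation times for three subjects
obs_times <- list(c(2, 5, 9), c(3, 5), c(1, 9, 12))
tau <- 12  # assumed end of follow-up

# Grid G = {0 = s_0 < s_1 < ... < s_m = tau}: sorted unique observation times
grid <- sort(unique(c(0, unlist(obs_times), tau)))

# Hypothetical nonnegative jump sizes at s_1, ..., s_m (placeholders, not estimates)
jumps <- rep(0.25, length(grid) - 1)

# Right-continuous, nondecreasing step function with Lambda(0) = 0
Lambda <- stepfun(x = grid[-1], y = c(0, cumsum(jumps)))

# Evaluate the step function at a few time points
Lambda(c(0, 1, 4, 12))
```

Because the jumps are nonnegative, the cumulative sum is nondecreasing, as a mean function of a counting process must be.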
Panel count survival response
Similar to function Surv in package survival, function PanelSurv in package spef creates a panel count survival object with three slots including a data frame of input variables (psDF), a vector of unique observation times (timeGrid), and a matrix representation of the number of events between the observation times (panelMatrix). The required inputs of function PanelSurv are a variable indicating subject id, a variable for observation times, and a variable for the number of new events since
Illustration
We illustrate the model fitting with the bladder tumor data set. All illustrations below use the same model formula whose response is created from the PanelSurv function.
R> fml <- PanelSurv(id, time, count) ~ num + size + treatment
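The formula above assumes a long-format data frame with one row per subject per visit. The following base R sketch shows what such input might look like and how a subject-by-time matrix of counts can be derived from it. The values are made up, and the matrix layout is only an illustration of the idea; it is not claimed to match the internal representation used by PanelSurv.

```r
# Hypothetical long-format data: one row per subject per visit
toy <- data.frame(
  id        = c(1, 1, 1, 2, 2, 3),
  time      = c(2, 5, 9, 3, 5, 9),          # observation times
  count     = c(1, 0, 2, 0, 1, 3),          # new events since the previous visit
  num       = c(1, 1, 1, 2, 2, 1),          # baseline covariate, constant within subject
  size      = c(3, 3, 3, 1, 1, 2),          # baseline covariate, constant within subject
  treatment = c("placebo", "placebo", "placebo",
                "thiotepa", "thiotepa", "placebo")
)

# Unique observation times pooled across subjects
ids <- unique(toy$id)
time_grid <- sort(unique(toy$time))

# Subject-by-time matrix; NA marks times at which a subject was not observed
panel_matrix <- matrix(NA_real_, nrow = length(ids), ncol = length(time_grid),
                       dimnames = list(id = ids, time = time_grid))
panel_matrix[cbind(match(toy$id, ids), match(toy$time, time_grid))] <- toy$count
panel_matrix
```

Keeping unobserved cells as NA rather than 0 distinguishes "no visit at this time" from "a visit with no new events".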
We first use
Conclusion
We have implemented a collection of methods for semiparametric panel count regression in the R package spef. Some of them do not require the underlying event process to be Poisson, and allow informative observation and censoring schemes. Practitioners who handle this kind of data will benefit from having cutting-edge statistical methods available in the popular statistical computing environment R.
In future releases of spef, we may improve the computing speed by moving some calculations into C.
Acknowledgements
This research was partly supported by NSF. The authors would like to thank the editor and two referees for reviewing this paper.
References (22)
- et al. Estimation of the mean function of point processes based on panel count data. Statistica Sinica (1995)
- et al. Two estimators of the mean of a counting process with panel count data. The Annals of Statistics (2000)
- et al. Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika (2007)
- T. Therneau, original R port by T. Lumley. survival: Survival analysis, including penalised likelihood, 2010. R package...
- The veterans administration study of chemoprophylaxis for recurrent stage I bladder tumors: comparison of placebo, pyridoxine, and total thiotepa
- et al. Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society, Series B: Statistical Methodology (2000)
- A semiparametric pseudolikelihood estimation method for panel count data. Biometrika (2002)
- et al. Analysing panel count data with informative observation times. Biometrika (2006)
- X. Wang, S. Ma, J. Yan. Augmented estimating equations for semiparametric panel count regression with informative...
- ggplot2: Elegant Graphics for Data Analysis (2009)