A modified score function estimator for multinomial logistic regression in small samples

https://doi.org/10.1016/S0167-9473(01)00048-2

Abstract

Logistic regression modelling of mixed binary and continuous covariates is common in practice, but conventional estimation methods may not be feasible or appropriate for small samples. It is well known that the usual maximum likelihood estimates (MLEs) of the log-odds-ratio parameters are biased in finite samples, and there is a non-zero probability that an MLE is infinite, i.e., does not exist. In this paper, we extend the approach proposed by Firth (Biometrika 80 (1993) 27) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model, and consider general regression covariate types. The method is based on a suitable modification of the score function that removes first order bias. We apply the method in the analysis of two datasets: one is a study of disease prognosis and the other is a disease prevention trial. In a series of simulation studies in small samples, the modified-score estimates for binomial and trinomial logistic regressions had mean bias closer to zero and smaller mean squared error than other approaches. The modified-score estimates have properties that make them attractive for routine application in logistic regressions of binary and continuous covariates, including the advantage that they can be obtained in samples in which the MLEs are infinite.

Introduction

Methods for logistic regression modelling of nominal categorical responses based on the multinomial logistic likelihood are now generally available in standard statistical packages, and have been applied in the analysis of case-control studies with multiple case or multiple control groups, and in randomized trials and cross-sectional surveys with categorical responses. One of the concerns of investigators is the valid estimation of model parameters in the finite sample sizes encountered in practice. In finite samples, the usual maximum likelihood estimates (MLEs) of the log odds ratios are biased, and the bias increases as the ratio of the number of observations to the number of parameters (n-to-p ratio) decreases (Cordeiro and McCullagh, 1991; Bull et al., 1997). This is of particular concern when there are several response categories and multiple covariates because the number of parameters can become large.

We consider an alternative estimation method for small samples based on a modification of the score function that removes first order bias and is equivalent to penalizing the likelihood by Jeffreys’ prior (Firth, 1993). We extend the modified score function method to the multinomial logistic regression model with nominal response categories, and compare the modified estimates to the usual MLEs and to the MLEs corrected by an estimate of the asymptotic bias. As the sample size increases, the modified-score estimates become equivalent to the usual MLEs. As systematic small sample comparisons of this approach have not been reported previously for binomial or multinomial logistic regression models, we also present a Monte Carlo simulation study in which we compare the mean bias and mean squared error (MSE) of the modified estimates to the MLEs, and to the MLEs corrected by the estimated asymptotic bias. Considering the same series of logistic regression models studied previously (Bull et al., 1997), we find that the modified-score estimates are competitive with, and often superior to, the other approaches.
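To make the idea of a modified score function concrete, the following is a minimal sketch for the binary (two-category) special case only, not the multinomial extension developed in this paper. It assumes the standard form of Firth's modified score for binary logistic regression, X^T{y − π + h(1/2 − π)}, where h holds the diagonal elements of the weighted hat matrix. The function name `firth_logistic_binary`, the iteration settings, and the toy data are illustrative assumptions, and no step-halving safeguard is included.

```python
import numpy as np

def firth_logistic_binary(X, y, max_iter=100, tol=1e-8):
    """Solve Firth's modified score equations for binary logistic regression.

    Illustrative sketch only (hypothetical helper, binary case).
    X: (n, p) design matrix including a constant column; y: (n,) 0/1 responses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = pi * (1.0 - pi)                       # binomial variance weights
        info = X.T @ (w[:, None] * X)             # Fisher information X^T W X
        info_inv = np.linalg.inv(info)
        # Hat-matrix diagonals: h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: usual score plus the first-order bias adjustment
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data with complete separation: the usual MLE of the slope is infinite,
# but the modified-score estimate is finite.
X_demo = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y_demo = np.array([0, 0, 1, 1])
beta_demo = firth_logistic_binary(X_demo, y_demo)
```

The toy data are completely separated, which is the situation in which the abstract notes that the modified-score estimates can still be obtained.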

Methods for small-sample analysis

The small-sample properties of the logistic regression MLEs can be improved by the general approach of Cox and Snell (1968) which uses higher order terms in a Taylor series expansion of the log-likelihood to approximate the asymptotic bias and obtain bias-corrected MLEs (Anderson and Richardson, 1979; Schaefer, 1983; Copas, 1988; Cordeiro and McCullagh, 1991; Bull et al., 1997). When the magnitude of the linear predictor is small, Cordeiro and McCullagh (1991) showed that the effect of bias

Usual maximum likelihood estimation with bias correction

We consider a multicategory outcome y that is a multinomial variable with J+1 categories. For each category j (j=1,…,J), there is a regression function in which the log odds of response in category j, relative to category 0, is a linear function of regression parameters and a vector x of p covariates (including a constant): log{prob(y=j|x)/prob(y=0|x)}=βjTx. We let yi be a J×1 vector of indicators for the observed response category for observation i, with the corresponding J×1 vector of
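As an illustration of the baseline-category model defined by log{prob(y=j|x)/prob(y=0|x)}=βjTx, the sketch below converts a set of coefficient vectors into the J+1 category probabilities. The function name `multinomial_logit_probs`, the array layout (one row per non-baseline category), and the numeric values are assumptions made for this example only.

```python
import numpy as np

def multinomial_logit_probs(B, x):
    """Category probabilities under a baseline-category logit model.

    Hypothetical helper for illustration.
    B: (J, p) array whose rows are beta_1, ..., beta_J.
    x: length-p covariate vector (including the constant).
    Returns a length J+1 vector, with category 0 (the baseline) first.
    """
    eta = np.asarray(B, dtype=float) @ np.asarray(x, dtype=float)
    eta = np.concatenate(([0.0], eta))   # baseline linear predictor is zero
    eta = eta - eta.max()                # guard against overflow in exp
    expo = np.exp(eta)
    return expo / expo.sum()

# Trinomial example (J = 2) with a constant and one covariate
B_demo = np.array([[0.5, 1.0], [-0.2, 0.3]])
x_demo = np.array([1.0, 2.0])
p_demo = multinomial_logit_probs(B_demo, x_demo)
```

Taking `np.log(p_demo[j] / p_demo[0])` recovers the linear predictor for category j, which is the defining relation of the model.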

Applications of small-sample estimation

The first application is a study of clinical factors that relate to the presence or absence of nodal involvement in patients with prostate cancer; 20 of the 53 patients have nodal involvement (Brown, 1980; Cox and Snell, 1989). Table 1 presents the estimates for a logistic regression model with six covariates: four binary indicators (X-ray, stage, grade, interaction of stage and grade) and two continuous variables (acid, age). In this dataset, the ratio of the number of observations to the

Design

The Monte Carlo study evaluated the MPLEs with respect to mean bias and MSE and compared them to the usual MLEs and the BCMLEs in multiple logistic regressions that included both binary and continuous covariates. We also calculated the mean bias and MSE for the MPLEs in all datasets, including those in which one or more of the MLEs did not exist. As in our previous study, we conducted three series of simulations to investigate the effects of sample size (200,100,75,50), type of covariates
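The two performance measures named above can be sketched as follows for a set of simulated estimates. This is a generic illustration rather than the study's own code: the helper name `mean_bias_and_mse`, the convention of coding a non-existent estimate as a non-finite value, and the choice to drop such replicates are all assumptions made here.

```python
import numpy as np

def mean_bias_and_mse(estimates, true_beta):
    """Mean bias and mean squared error over Monte Carlo replicates.

    Hypothetical helper for illustration.
    estimates: (replicates, parameters) array; a replicate in which an
    estimate does not exist is assumed to be coded as inf or nan.
    true_beta: length-parameters vector of true values.
    Returns (mean bias, MSE, number of replicates used).
    """
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(true_beta, dtype=float)
    keep = np.all(np.isfinite(est), axis=1)   # replicates with finite estimates
    err = est[keep] - truth
    return err.mean(axis=0), (err ** 2).mean(axis=0), int(keep.sum())

# Three replicates of two parameters; the third has an infinite estimate
est_demo = np.array([[1.0, 2.0], [3.0, 2.0], [np.inf, 0.0]])
bias_demo, mse_demo, n_used = mean_bias_and_mse(est_demo, [2.0, 2.0])
```

An estimator that is finite in every dataset would use all replicates, whereas one that can fail to exist is summarized only over the replicates where it is finite, so the two summaries need not be based on the same datasets.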

Discussion

The applications and the Monte Carlo study demonstrate several advantages for the modified score function estimator. In the applications, it is apparent that the shrinkage effect of the modified score function operated more strongly when it was needed, i.e., when there was a large association and the ratio of observations to parameters was small, but was minimal for parameter estimates that did not require bias reduction. It was effectively applied in both very small samples and in large

Acknowledgements

During this study, S.B. Bull was a National Health Research Scholar of the National Health Research and Development Program of Health and Welfare Canada. This research was supported by the Natural Sciences and Engineering Research Council of Canada. Thanks to J.P. Lewinger for assistance in preparing the applications and for helpful discussions.

References (39)

  • M.A. Blajchman et al. For the Canadian post-transfusion hepatitis prevention study group. Post-transfusion hepatitis: impact of the non-A non-B hepatitis surrogate tests. Lancet (1995)
  • M. Zelen. Multinomial response models. Comput. Statist. Data Anal. (1991)
  • A. Albert et al. On the existence of maximum likelihood estimates in logistic regression models. Biometrika (1984)
  • A. Albert et al. Multivariate Interpretation of Clinical Laboratory Data. (1984)
  • J.M. Alho. On the computation of likelihood and score test based confidence intervals in generalized linear models. Statist. Med. (1992)
  • J.A. Anderson et al. Logistic discrimination and bias correction in maximum likelihood estimation. Technometrics (1979)
  • Aptech Systems Incorporated, 1990. The GAUSS System, Version 2.0, Kent,...
  • B.W. Brown. Prediction analyses for binary data.
  • S.B. Bull et al. Two-step jackknife bias reduction for logistic regression MLEs. Commun. Statist.—Simulation Comput. (1994)
  • S.B. Bull et al. Jackknife bias reduction for polychotomous logistic regression. Statist. Med. (1997)
  • J.B. Copas. Binary regression models for contaminated data, with discussion. J. Royal Statist. Soc. Ser. B (1988)
  • G.M. Cordeiro et al. Bias correction in generalized linear models. J. Royal Statist. Soc. Ser. B (1991)
  • D.R. Cox et al. A general definition of residuals. J. Royal Statist. Soc. Ser. B (1968)
  • D.R. Cox et al. The Analysis of Binary Data. (1989)
  • Cytel Software Corporation, 1992. LogXact-Turbo: A Software Package for Exact and Asymptotic Logistic Regression,...
  • D.E. Duffy et al. On the small sample properties of norm-restricted maximum likelihood estimators for logistic regression models. Commun. Statist.—Theory Meth. (1989)
  • V.T. Farewell. Jackknife estimation with structured data. Biometrika (1978)
  • D. Firth. Generalized linear models and Jeffreys priors: an iterative weighted least-squares approach.
  • D. Firth. Bias reduction, the Jeffreys prior and GLIM.
