
Bayesian variable selection in linear quantile mixed models for longitudinal data with application to macular degeneration

Abstract

This paper presents a Bayesian analysis of linear mixed models for quantile regression based on a Cholesky decomposition for the covariance matrix of random effects. We develop a Bayesian shrinkage approach to quantile mixed regression models using a Bayesian adaptive lasso and an extended Bayesian adaptive group lasso. We also consider variable selection procedures for both fixed and random effects in a linear quantile mixed model via the Bayesian adaptive lasso and extended Bayesian adaptive group lasso with spike and slab priors. To improve mixing of the Markov chains, a simple and efficient partially collapsed Gibbs sampling algorithm is developed for posterior inference. Simulation experiments and an application to the Age-Related Macular Degeneration Trial data are presented to demonstrate the proposed methods.

Introduction

Quantile regression (QR) for longitudinal or panel data models has received increasing attention in recent years. For example, [1] proposed a general method to estimate the QR coefficients using an L1-penalized likelihood approach. Later, [2] generalized the work of [1] to a more flexible model with endogenous variables. [3] offered an estimator of individual effects for panel data QR models that is widely used by practitioners because it is computationally simple. [4] studied quantile panel models with correlated random effects. [5] proposed a Stochastic Approximation of the EM (SAEM) algorithm to analyze linear quantile mixed models (LQMMs) via the asymmetric Laplace distribution. Other related papers include [6–9], among many others.

As in any regression problem, when there are many covariates in longitudinal or panel QR models, variable selection becomes necessary to avoid overfitting and multicollinearity. [10] presented penalized quantile regression models for random intercepts using Bayesian lasso priors and Bayesian adaptive lasso priors, respectively. [11] also considered a Bayesian lasso approach to jointly estimate a vector of covariate effects and a vector of random effects by introducing an l1 penalty. However, for the random effects, their method mainly focuses on penalizing the diagonal elements of the covariance matrix of random effects after a modified Cholesky decomposition. In addition, they neglect the nonnegativity constraint on the diagonal terms of that decomposition, which can cause non-uniqueness of the decomposition [12]. We therefore first propose a new Bayesian shrinkage approach to address these problems in LQMMs. In particular, we develop an extended Bayesian adaptive group lasso that accommodates the nonnegativity constraint on the diagonal terms while attempting to identify the non-zero random effects. Unfortunately, like [10] and [11], the shrinkage approach still requires ad hoc thresholding to perform variable selection, for the well-known reason that continuous shrinkage priors never set coefficients exactly to zero. An alternative for Bayesian variable selection is to use spike and slab priors. We therefore also develop a general variable selection technique, via the Bayesian adaptive lasso and the extended Bayesian adaptive group lasso with spike and slab priors, to identify the significant fixed and random effects simultaneously.

The paper is organized as follows. Section 2 presents the reparameterized LQMMs based on a Cholesky decomposition of the covariance matrix of random effects. Section 3 describes the structure of our hierarchical Bayesian LQMMs and discusses our prior specifications; we also develop a partially collapsed Gibbs (PCG) sampling algorithm for posterior inference. In Section 4, we develop a variable selection procedure for both fixed and random effects in a LQMM with spike and slab priors, together with an efficient PCG sampler. We illustrate the performance of our methods through simulation studies and a real data example in Sections 5 and 6; the results show that the proposed approaches perform very well. Section 7 concludes the paper.

The re-parameterized LQMMs

Consider n subjects, the ith containing ni observations, for j = 1, …, ni and i = 1, …, n, where yij is the jth response for subject i, is the jth row of a known ni × p matrix xi, and is the jth row of a known ni × q matrix zi. The canonical linear mixed model is as follows: (1) where β is a p × 1 vector of unknown population fixed parameters and αi is a q × 1 vector of unobservable random effects with covariance matrix D. In order to guarantee that D is positive semidefinite, we apply the Cholesky decomposition D = ΓΓ′, where Γ = (γst) is a q × q lower triangular matrix with nonnegative diagonal entries. Similar to [13], model (1) can then be re-expressed as follows: (2) Here the row vector multiplying γ in (2) has dimension 1 × q(q + 1)/2, γk consists of the non-zero lower triangular elements of the kth row of Γ, k = 1, …, q, Jq is a transformation matrix of size q2 × q(q + 1)/2 such that vec(Γ) = Jqγ, and Iq denotes the q × q identity matrix. For example, if q = 2 the transformation matrix J2 would have the following form With model (2), the linear conditional τth (0 < τ < 1) mixed quantile estimator can be calculated by (3) In a Bayesian setup, this is equivalent to assuming that the error terms εij in (2) follow the asymmetric Laplace distribution (ALD), whose probability density function is given by (4) where σ−1 and τ are the scale and skewness parameters, respectively. [14] first used the ALD to establish Bayesian QR in a linear model for independent data and proposed a random walk Metropolis-Hastings algorithm to draw samples. Since then, a large literature has extended and applied QR based on the ALD, e.g., [15–18]. However, direct sampling from the ALD is often inconvenient and does not generalize easily to more complex scenarios. We therefore employ the mixture representation of the ALD proposed by [19], which has also been used by many authors (e.g., [11, 20–22]), to facilitate Bayesian inference in LQMMs: (5) where vij ∼ exp(1/σ) has an exponential prior with mean 1/σ, ζij ∼ N(0, 1) has a standard normal prior and is independent of vij. Let v = (vij: i = 1, …, n; j = 1, …, ni) and ζ = (ζij: i = 1, …, n; j = 1, …, ni). Hence, based on the Cholesky decomposition and the mixture representation of the ALD, we obtain the following hierarchical model,
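As a concrete illustration of the transformation matrix Jq, the short Python sketch below (our own illustration, not code from the paper) constructs Jq for arbitrary q and verifies that vec(Γ) = Jqγ when γ stacks the lower-triangular elements of Γ row by row; the function name build_Jq and the numerical values are hypothetical.

```python
import numpy as np

def build_Jq(q):
    """Transformation matrix Jq (q^2 x q(q+1)/2) such that vec(Gamma) = Jq @ gamma,
    where gamma stacks the lower-triangular entries of Gamma row by row."""
    m = q * (q + 1) // 2
    Jq = np.zeros((q * q, m))
    col = 0
    for i in range(q):              # row index of Gamma
        for j in range(i + 1):      # column index j <= i (lower triangle)
            Jq[j * q + i, col] = 1.0   # vec() stacks columns: entry (i, j) sits at j*q + i
            col += 1
    return Jq

# Check for q = 2, where gamma = (gamma_11, gamma_21, gamma_22)'
q = 2
Gamma = np.array([[1.3, 0.0],
                  [0.4, 0.9]])
gamma = Gamma[np.tril_indices(q)]                  # row-wise lower-triangular elements
assert np.allclose(build_Jq(q) @ gamma, Gamma.flatten(order="F"))
print(build_Jq(q))
```

For q = 2 this reproduces a 4 × 3 matrix whose third row is all zeros, reflecting the structural zero above the diagonal of Γ.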

Shrinkage estimation of linear quantile mixed regression

We first propose a new Bayesian shrinkage approach for LQMMs using the Bayesian adaptive lasso and an extended Bayesian adaptive group lasso. A regularized τth quantile estimate of the fixed and random effect coefficients can be defined by (6) Motivated by [23], we place Laplace priors on the βs, where , Laplace priors on the γk, where , and suppose that the error terms εij come from the ALD (4). Then the posterior distribution of β and γ is proportional to (7) It is easily seen that maximizing the posterior distribution (7) is equivalent to minimizing the objective function (6) if we ignore the nonnegativity constraints γkk ≥ 0, k = 1, …, q. However, it is not obvious how to incorporate these constraints within the Bayesian penalized LQMM framework. Using truncated normal priors, [24] and [25] solved a similar problem for the selection of fixed effects, which have no group structure, in linear and generalized linear mixed models, respectively. We extend this idea to a Bayesian group lasso for the shrinkage of random effects with group structure. We specify the conditional prior distribution of γk as . To facilitate posterior inference, we use a mixture representation to yield where . Then γk can be expressed hierarchically as (8) for k = 1, …, q. Turning to the fixed effects, β can also be written hierarchically as follows (9) for s = 1, …, p. There are two main ways to estimate the tuning parameters in Bayesian regularization: a fully Bayesian method, which places a hyperprior on them, such as a conjugate Gamma distribution [23], and an empirical Bayes method, which estimates them by maximum marginal likelihood [26]. Here we take the fully Bayesian approach and place Gamma priors on the parameters , and σ to complete the Bayesian hierarchy. We develop a partially collapsed Gibbs (PCG) sampler, which integrates the full conditional distribution of β with respect to {bi}, to sample from the posterior distribution. PCG samplers were proposed by [27] to improve the convergence behavior of the Gibbs sampler and generalize the blocking and collapsing methods of [28]. The key idea of the PCG is to replace some of the full conditional distributions with marginal posterior distributions while preserving the target distribution. Next, we present the conditional distributions of the unknown parameters.
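Before turning to the individual conditionals, the following sketch spells out one plausible form of the objective in (6) implied by the priors above: the quantile check loss plus an adaptive lasso penalty on β and an adaptive group-lasso penalty on the rows γk of Γ. The design matrices X and F and the weight vectors lam1 and lam2 are placeholders for the corresponding quantities in (6), so this is an illustration rather than the paper's exact formula.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def penalized_objective(y, X, F, beta, gamma_rows, tau, lam1, lam2):
    """Check loss of the LQMM residuals plus an adaptive lasso penalty on beta
    and an adaptive group-lasso penalty on the rows gamma_k of Gamma."""
    resid = y - X @ beta - F @ np.concatenate(gamma_rows)
    fit = check_loss(resid, tau).sum()
    pen_beta = np.sum(np.asarray(lam1) * np.abs(beta))          # adaptive lasso, one weight per beta_s
    pen_gamma = sum(l * np.linalg.norm(g) for l, g in zip(lam2, gamma_rows))  # group penalty per gamma_k
    return fit + pen_beta + pen_gamma
```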

(1) Conditional distribution of β:

Let T = diag(ts, s = 1, …, p) and . The conditional distribution of β is then a multivariate normal distribution, where

(2) Conditional distribution of bi:

The conditional distribution of bi is a multivariate normal distribution with mean and variance where

(3) Conditional distribution of :

The full conditional distribution of follows an Inverse Gaussian(μ′, λ′) with p.d.f. (10) Here and , where .
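Draws from an Inverse Gaussian(μ′, λ′) full conditional such as (10) can be generated with the Michael–Schucany–Haas (1976) transformation; the generic sampler below takes μ′ and λ′ as inputs and is only a sketch of one standard way to perform this step, not code from the paper.

```python
import numpy as np

def rinvgauss(mu, lam, rng=None):
    """Draw one Inverse Gaussian(mu, lam) variate via the
    Michael-Schucany-Haas (1976) transformation."""
    rng = rng or np.random.default_rng()
    nu = rng.standard_normal()
    y = nu ** 2
    x = mu + (mu ** 2 * y) / (2 * lam) \
        - (mu / (2 * lam)) * np.sqrt(4 * mu * lam * y + mu ** 2 * y ** 2)
    # Accept x with probability mu / (mu + x); otherwise return mu^2 / x
    if rng.uniform() <= mu / (mu + x):
        return x
    return mu ** 2 / x
```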

(4) Conditional distributions of and :

The full conditional distributions of and are independent Gamma distributions,

(5) Conditional distributions of ts and ηk:

The full conditional distributions of ts and ηk are independent Inverse Gaussian,

(6) Conditional distribution of γ:

Denote , i = 1, …, n; j = 1, …, ni, and let Fijk be the covariate vector corresponding to γk, k = 1, …, q. The full conditional distribution of γk is then a truncated normal distribution, where and Fij(k) represent the covariate matrix corresponding to γ(k).
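In particular, the diagonal entries γkk are drawn from a normal distribution truncated to [0, ∞), which enforces the nonnegativity constraint. A minimal sketch using scipy, with an assumed conditional mean mu and standard deviation sd standing in for the expressions above, is:

```python
from scipy.stats import truncnorm

def draw_nonneg_gamma(mu, sd, rng=None):
    """Draw gamma_kk from N(mu, sd^2) truncated to [0, infinity),
    enforcing the nonnegativity constraint on the Cholesky diagonal."""
    a = (0.0 - mu) / sd        # standardized lower bound
    b = float("inf")           # no upper bound
    return truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng)
```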

(7) Conditional distribution of σ:

The full conditional distribution of σ is

Bayesian model selection in LQMMs

The shrinkage approach proposed in the previous section has potential for variable selection in LQMMs. However, the estimates are never exactly zero because of the continuous priors on β and γ. This section develops a general model selection method with spike and slab priors to identify the relevant effects. We first discuss the prior specification and propose the hierarchical model. Subsequently, we consider a partially collapsed Gibbs sampling algorithm for posterior computation and derive the relevant conditional distributions.

We assume the following hierarchical Bayesian lasso with independent spike and slab type priors for β: where δ0(⋅) denotes a point mass at 0, is the prior probability of excluding the sth fixed effect from the model, which is assigned a beta prior with parameters and , resulting in a noninformative uniform prior on (0, 1). Furthermore, we assume the following extended hierarchical Bayesian adaptive group lasso with independent spike and slab type priors for γ: where π0 is the prior probability of excluding the kth random effect from the model, which is also assigned a noninformative uniform prior on (0, 1). We now present the conditional distributions of the unknown parameters.
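To make the spike and slab structure concrete, the sketch below draws a single fixed-effect coefficient from a prior of this type: with probability pi_s the coefficient is set exactly to zero (the spike), and otherwise it is drawn from a Laplace slab written as a normal scale mixture, mirroring the adaptive lasso construction. The hyperparameter values are placeholders, not those used in the paper.

```python
import numpy as np

def draw_spike_slab_beta(pi_s, lam_s, rng=None):
    """Prior draw of beta_s under a spike-and-slab adaptive-lasso prior:
    beta_s = 0 with probability pi_s (the spike), otherwise
    beta_s ~ Laplace(0, 1/lam_s), written as a normal scale mixture (the slab)."""
    rng = rng or np.random.default_rng()
    if rng.uniform() < pi_s:                      # spike: exclude the s-th fixed effect
        return 0.0
    t_s = rng.exponential(2.0 / lam_s ** 2)       # mixing variance t_s, exponential with rate lam_s^2 / 2
    return rng.normal(0.0, np.sqrt(t_s))          # slab: beta_s | t_s ~ N(0, t_s)

# Example: inclusion probability drawn from its uniform Beta(1, 1) hyperprior
rng = np.random.default_rng(1)
print(draw_spike_slab_beta(rng.beta(1.0, 1.0), lam_s=2.0, rng=rng))
```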

(1) Conditional distribution of β:

Let , and . The conditional distribution of βs is then a spike and slab distribution, where

(2) Conditional distribution of bi:

The conditional distribution of bi is a multivariate normal distribution with mean and variance where

(3) Conditional distribution of :

The full conditional distribution of follows an Inverse Gaussian(μ′, λ′) with and , where .

(4) Conditional distributions of λ1s and λ2k:

The full conditional distributions of λ1s and λ2k are independent Gamma distributions,

(5) Conditional distributions of ts and ηk: where InverseGamma(a, b) denotes an Inverse Gamma distribution with shape parameter a and scale parameter b.

(6) Conditional distribution of and π0:

The full conditional distributions of and π0 are independent Beta distributions.

(7) Conditional distribution of γ:

Denote , , , and let Fijk be the covariate vector corresponding to γk, k = 1, …, q. The full conditional distribution of γk is then a spike and slab distribution, where Here, Φ(⋅) denotes the cumulative distribution function of the standard normal distribution, is the kth element of and is the kth diagonal element of .

(8) Conditional distribution of σ:

The full conditional distribution of σ is

Simulation studies

In this section we examine the performance of the proposed extended Bayesian adaptive group lasso approach (adGL) and the Bayesian adaptive group lasso with independent spike and slab type priors (adSpikeGL) through simulations. We also compare them with other methods: the PMQ method of [10], the BL and BAL methods of [11], and the ordinary quantile regression estimator (QR), which treats the observations as independent. The test data are generated from the following model: for i = 1, …, 50, j = 1, …, 5, where εij has τth quantile equal to 0 and the random effects αi ∼ N(0, D), with D given by (11) Similar formulations of D in mixed models have been used by [29] and [30]. We set and , where the xijk, k = 1, …, 8, are drawn independently from the uniform distribution on [−2, 2]. The fixed effects parameters β = (β1, …, β8)′ are set as follows:

  • Simulation 1: β = (3, 1.5, 0, 0, 2, 0, 0, 0)′ to illustrate a sparse case;
  • Simulation 2: β = (0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85)′ to illustrate a dense case;
  • Simulation 3: β = (5, 0, 0, 0, 0, 0, 0, 0)′ to illustrate a very sparse case.

For each specification of β, we generate data under three different quantiles, i.e., τ = 0.1, 0.3 and 0.5, and four different error distributions (normal, normal mixture, Laplace, and Laplace mixture), as follows:

  • (1) εij ∼ N(0, 9), i = 1, …, n, j = 1, …, ni;
  • (2) εij ∼ 0.1N(0, 1) + 0.9N(0, 5), i = 1, …, n, j = 1, …, ni;
  • (3) εij ∼ Laplace(0, 3), i = 1, …, n, j = 1, …, ni;
  • (4) ;

We consider three sample sizes: n = 30, ni = 5; n = 50, ni = 5; and n = 100, ni = 5. Therefore, we have a total of 36 simulation setups for each sample size. We simulate 100 data sets for each setup. Priors for the Bayesian methods are taken to be weakly informative: the hyperparameters in the Gamma priors for the tuning and scale parameters are all set to 0.1. We run our PCG sampler for 20,000 iterations following an initial burn-in of 10,000 cycles. The convergence of the proposed PCG sampler is monitored using the multivariate potential scale reduction factor (MPSRF) introduced by [31]. Figs 1–6 show that the MPSRF generally becomes stable and approaches 1 after about 10,000 iterations under the normal mixture error distribution, suggesting that the burn-in size is large enough to ensure convergence of the PCG sampler. The results for the other error distributions are similar and omitted.
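For reference, the sketch below generates one replicate of Simulation 1 under the normal mixture error. The covariance matrix D in (11) and the random-effect design are not reproduced here, so the choices below (random effects attached to the first three covariates and an assumed D) are illustrative assumptions only; the error is re-centered so that its τth quantile equals zero, as the design requires.

```python
import numpy as np

def simulate_one(n=50, ni=5, tau=0.5, seed=0):
    """One replicate of Simulation 1 under the normal mixture error
    0.1*N(0,1) + 0.9*N(0,5); the error is shifted so its tau-th quantile is 0.
    The random-effect design and D are illustrative assumptions, not the paper's."""
    rng = np.random.default_rng(seed)
    beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])   # Simulation 1 (sparse case)
    D = np.array([[1.0, 0.3, 0.3],                               # assumed covariance of alpha_i
                  [0.3, 1.0, 0.3],
                  [0.3, 0.3, 1.0]])

    def mixture(size):
        comp = rng.uniform(size=size) < 0.1
        return np.where(comp, rng.normal(0.0, 1.0, size), rng.normal(0.0, np.sqrt(5.0), size))

    shift = np.quantile(mixture(200_000), tau)    # Monte Carlo tau-quantile of the raw error
    y, X, Z, groups = [], [], [], []
    for i in range(n):
        xi = rng.uniform(-2.0, 2.0, size=(ni, 8))
        zi = xi[:, :3]                            # assumed: random effects on first 3 covariates
        alpha_i = rng.multivariate_normal(np.zeros(3), D)
        eps = mixture(ni) - shift                 # tau-th quantile of eps is (approximately) 0
        y.append(xi @ beta + zi @ alpha_i + eps)
        X.append(xi); Z.append(zi); groups.append(np.full(ni, i))
    return np.concatenate(y), np.vstack(X), np.vstack(Z), np.concatenate(groups)
```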

Fig 1. MPSRF for adGL in Simulation 1 under the normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.g001

Fig 2. MPSRF for adGL in Simulation 2 under the normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.g002

Fig 3. MPSRF for adGL in Simulation 3 under the normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.g003

Fig 4. MPSRF for adSpikeGL in Simulation 1 under the normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.g004

Fig 5. MPSRF for adSpikeGL in Simulation 2 under the normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.g005

Fig 6. MPSRF for adSpikeGL in Simulation 3 under the normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.g006

We use the posterior median estimates of β and D as our point estimators, denoted and , respectively, and we consider two error measures: the mean absolute deviation (MAD) and the root mean squared error (RMSE). To assess the parameter estimation, we also calculate the median of the mean square error for β () and the median of the quadratic loss error for D (). More specifically, where is the predicted value of yij, and the medians are taken over the 100 simulations. The MAD and RMSE results are summarized in Figs 7–12, while the median mean square errors and quadratic loss errors are listed in Tables 1–9. Because BL, BAL and QR neglect the covariance structure of the random effects, we only report the results of the other three methods for these comparisons.
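One natural implementation of the two prediction-error measures, under the assumption that MAD and RMSE are computed from the fitted responses, is the following sketch:

```python
import numpy as np

def mad(y, y_hat):
    """Mean absolute deviation between observed and fitted responses."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean squared error between observed and fitted responses."""
    return np.sqrt(np.mean((y - y_hat) ** 2))
```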

Fig 7. Boxplots summarizing the MAD for three sample sizes in Simulation 1.

The rows from top to bottom correspond to normal, normal mixture, Laplace, and Laplace mixture error distributions, respectively.

https://doi.org/10.1371/journal.pone.0241197.g007

Fig 8. Boxplots summarizing the RMSE for three sample sizes in Simulation 1.

The rows from top to bottom correspond to normal, normal mixture, Laplace, and Laplace mixture error distributions, respectively.

https://doi.org/10.1371/journal.pone.0241197.g008

Fig 9. Boxplots summarizing the MAD for three sample sizes in Simulation 2.

The rows from top to bottom correspond to normal, normal mixture, Laplace, and Laplace mixture error distributions, respectively.

https://doi.org/10.1371/journal.pone.0241197.g009

Fig 10. Boxplots summarizing the RMSE for three sample sizes in Simulation 2.

The rows from top to bottom correspond to normal, normal mixture, Laplace, and Laplace mixture error distributions, respectively.

https://doi.org/10.1371/journal.pone.0241197.g010

Fig 11. Boxplots summarizing the MAD for three sample sizes in Simulation 3.

The rows from top to bottom correspond to normal, normal mixture, Laplace, and Laplace mixture error distributions, respectively.

https://doi.org/10.1371/journal.pone.0241197.g011

Fig 12. Boxplots summarizing the RMSE for three sample sizes in Simulation 3.

The rows from top to bottom correspond to normal, normal mixture, Laplace, and Laplace mixture error distributions, respectively.

https://doi.org/10.1371/journal.pone.0241197.g012

Table 1. Median of mean square errors and quadratic loss errors at τ = 0.1 for Simulation 1.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t001

Table 2. Median of mean square errors and quadratic loss errors at τ = 0.3 for Simulation 1.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t002

Table 3. Median of mean square errors and quadratic loss errors at τ = 0.5 for Simulation 1.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t003

Table 4. Median of mean square errors and quadratic loss errors at τ = 0.1 for Simulation 2.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t004

Table 5. Median of mean square errors and quadratic loss errors at τ = 0.3 for Simulation 2.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t005

Table 6. Median of mean square errors and quadratic loss errors at τ = 0.5 for Simulation 2.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t006

Table 7. Median of mean square errors and quadratic loss errors at τ = 0.1 for Simulation 3.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t007

Table 8. Median of mean square errors and quadratic loss errors at τ = 0.3 for Simulation 3.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t008

Table 9. Median of mean square errors and quadratic loss errors at τ = 0.5 for Simulation 3.

are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t009

Several observations can be drawn from the results. First, Figs 7–12 show that the proposed approaches perform well in general; adGL generally outperforms the other methods in terms of MAD and RMSE, which indicates that explicitly penalizing the off-diagonal covariance components can improve the model selection procedure. Similar findings were obtained by [32]. The adSpikeGL also performs well, although not quite as well as adGL. Second, Tables 1–9 show that adSpikeGL performs best in terms of and . Taking n = 50 as an example, adSpikeGL has the smallest in 22 out of 36 simulation setups and the smallest in 24 out of 36 setups. Although PMQ performs well for τ = 0.1, its performance drops at τ = 0.3 and τ = 0.5, especially in the dense case (Simulation 2). Third, Tables 1–9 also show that our approaches work well in terms of MAD, RMSE and the two median error measures even when the true error distribution is not the ALD, indicating that the ALD in model (2) is merely a working likelihood whose maximization is equivalent to minimizing (3). Finally, Figs 7–12 and Tables 1–9 show that, in terms of the four criteria, adGL, adSpikeGL and PMQ all exhibit an overall decreasing error trend as the sample size increases, whereas BL, BAL and QR generally fail to estimate accurately, possibly because they almost completely ignore the random effects that describe the within-subject dependence, and their performance does not improve as the sample size grows.

For adSpikeGL, the median posterior probability model (MPPM) criterion [33] and the highest posterior probability model (HPPM) criterion are used for model selection. Tables 10, 12 and 14 (MPPM) and Tables 11, 13 and 15 (HPPM) summarize the average numbers of correct and wrong zero coefficients for all cases. On the whole, adSpikeGL yields promising results in most cases, even for the extreme quantile τ = 0.1, which is notoriously challenging. As the sample size grows from 30 to 100, the number of wrong zero coefficients goes to zero at all considered quantiles, and the number of correct zero coefficients also increases at the moderate and median quantiles. These results indicate the good performance of our method in variable selection.
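Given posterior draws of the inclusion indicators, the two criteria can be computed as below: MPPM keeps every effect whose marginal posterior inclusion probability exceeds 0.5 [33], while HPPM selects the configuration most frequently visited by the sampler. This is a minimal sketch that assumes the indicator draws are stored as a draws-by-effects 0/1 array.

```python
import numpy as np
from collections import Counter

def mppm(indicators):
    """Median probability model: keep effects with inclusion probability > 0.5."""
    return (indicators.mean(axis=0) > 0.5).astype(int)

def hppm(indicators):
    """Highest posterior probability model: most frequently visited configuration."""
    counts = Counter(map(tuple, indicators.astype(int)))
    return np.array(counts.most_common(1)[0][0])

# Example with synthetic indicator draws (3 effects, 1000 MCMC iterations)
rng = np.random.default_rng(0)
draws = rng.binomial(1, [0.9, 0.1, 0.6], size=(1000, 3))
print(mppm(draws), hppm(draws))
```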

Table 10. Average numbers of correct and wrong zero coefficients for fixed effects and random effects in terms of MPPM for Simulation 1.

Random effects are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t010

Table 11. Average numbers of correct and wrong zero coefficients for fixed effects and random effects in terms of HPPM for Simulation 1.

Random effects are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t011

Table 12. Average numbers of correct and wrong zero coefficients for fixed effects and random effects in terms of MPPM for Simulation 2.

Random effects are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t012

Table 13. Average numbers of correct and wrong zero coefficients for fixed effects and random effects in terms of HPPM for Simulation 2.

Random effects are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t013

Table 14. Average numbers of correct and wrong zero coefficients for fixed effects and random effects in terms of MPPM for Simulation 3.

Random effects are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t014

Table 15. Average numbers of correct and wrong zero coefficients for fixed effects and random effects in terms of HPPM for Simulation 3.

Random effects are in parentheses.

https://doi.org/10.1371/journal.pone.0241197.t015

Next, in order to evaluate the sensitivity of our methods, we consider the following four different priors for the tuning and scale parameters in Simulation 1: case (1) , case (2) , case (3) , case (4) . Table 16 shows that the differences among the four priors are quite small in general. Hence our methods are not sensitive to the choice of hyper-parameters for these priors.

Table 16. MAD, RMSE, and for different priors in Simulation 1, when normal mixture error distribution.

https://doi.org/10.1371/journal.pone.0241197.t016

Real data example

In this section, we use the Age-Related Macular Degeneration Trial (ARMD) data to assess the behavior of the proposed models at different quantiles. This data set consists of 4 variables and is available in the R package nlme. Visual acuity of 188 patients with ARMD was measured at 4 time points, resulting in 752 observations. We consider the relationship between visual acuity (VISUAL) and three explanatory variables: the value of visual acuity measured at baseline (VISUAL0), the time of the measurement (TIME), and the treatment indicator (TREAT). A detailed description is given in [34]. [34] analyzed the data set using a linear mixed model for the mean and found possible interactions between the time and treatment effects, so we consider the following LQMM (i = 1, …, 188, j = 1, …, 4):

Here we choose five different values of τ, i.e., 0.1, 0.3, 0.5, 0.7 and 0.9, to thoroughly describe the response distribution. We implement the adGL, adSpikeGL, PMQ, BL, BAL and QR models on this data set. Table 17 shows summaries of the MAD and the RMSE. The results in Table 17 suggest that PMQ provides smaller MAD and RMSE than the other methods at the upper quantiles (τ = 0.7 and τ = 0.9), whereas adGL and adSpikeGL perform better at the lower and median quantiles (τ = 0.1, 0.3 and 0.5). We also report the posterior median estimates and 95% credible intervals for the fixed effects and random effect variances in Tables 18 and 19, respectively. As in the simulations, adGL, adSpikeGL and PMQ tend to behave similarly, as do BAL, BL and QR, and no method except adSpikeGL shrinks any fixed or random effect exactly to zero.

The results of adSpikeGL in Tables 18 and 19 show that (i) for the fixed effects, VISUAL0, TIME and TREAT are all identified as effective predictors of visual acuity at all quantiles, except for TREAT at the extreme quantile (τ = 0.1), possibly because the treatment has no significant effect on patients with poor visual acuity; (ii) for the random effects, VISUAL0 and TREAT are deemed significant predictors of visual acuity at all quantiles, whereas TIME is insignificant at the lower and median quantiles; and (iii) it is worth noting that neither the fixed nor the random effect of the time-by-treatment interaction (TIME×TREAT) is selected by adSpikeGL at any quantile, suggesting that the effect of the treatment does not change over time regardless of the patient's vision.

Table 18. Posterior median and 95% credible intervals for the fixed effects of ARMD data.

https://doi.org/10.1371/journal.pone.0241197.t018

Table 19. Posterior median and 95% credible intervals for the random effect variances of ARMD data.

https://doi.org/10.1371/journal.pone.0241197.t019

Conclusion and discussion

In this work, we presented a novel shrinkage method for linear quantile mixed models for analysing and interpreting longitudinal data. We also considered variable selection based on a Bayesian adaptive lasso and an extended Bayesian adaptive group lasso with spike and slab priors. Unlike other approaches for LQMMs, which require ad hoc thresholding for model selection, adSpikeGL can directly identify the significant fixed and random effects simultaneously. Several criteria were adopted to evaluate the finite-sample performance of the proposed methods thoroughly through simulations and a real data example. The results reveal that the proposed methods are very competitive with existing methods. Moreover, as the sample size grows, adGL and adSpikeGL generally provide smaller MMAD and MRMSE across all scenarios, and adSpikeGL also performs well in variable selection in most cases. The only exception is at τ = 0.1, where the number of correct zero coefficients of adSpikeGL did not show an increasing trend with the sample size. This problem may be caused by a drawback of the ALD, which does not allow the skewness, kurtosis and tails of the distribution to vary freely. To overcome this problem, [35] recently introduced a generalized asymmetric Laplace distribution (GALD) for QR in linear regression. In the future, we would like to extend the GALD to linear mixed effects models. Moreover, we would also like to consider other Bayesian shrinkage priors, such as the double-Pareto prior [36], the horseshoe prior [37] and the normal-gamma prior [38], although lasso-type priors remain the most widely used shrinkage priors in the literature.

Acknowledgments

We thank the editor, the associate editor and two anonymous referees for their helpful comments which led to a considerable improvement of the original article.

References

  1. Koenker R. Quantile regression for longitudinal data. Journal of Multivariate Analysis. 2004;91:74–89.
  2. Harding M, Lamarche C. A quantile regression approach for estimating panel data models using instrumental variables. Economics Letters. 2009;104:133–135.
  3. Canay AI. A simple approach to quantile regression for panel data. Econometrics Journal. 2011;14:368–386.
  4. Graham B, Hahn J, Poirier A, Powell J. A quantile correlated random coefficients panel data model. Journal of Econometrics. 2018;206:305–335.
  5. Galarza C, Lachos V, Bandyopadhyay D. Quantile regression in linear mixed models: a stochastic approximation EM approach. Statistics and Its Interface. 2017;10:471–482. pmid:29104713
  6. Chernozhukov V, Fernandez Val I, Hahn J, Newey W. Average and quantile effects in nonseparable panel models. Econometrica. 2013;81:535–580.
  7. Harding M, Lamarche C. Estimating and testing a quantile regression model with interactive effects. Journal of Econometrics. 2014;178:101–113.
  8. Sriram K, Shi P, Ghosh P. A Bayesian quantile regression model for insurance company costs data. Journal of the Royal Statistical Society, Series A. 2016;179:177–202.
  9. Geraci M, Bottai M. Linear quantile mixed models. Statistics and Computing. 2014;24:461–479.
  10. Aghamohammadi A, Mohammadi S. Bayesian analysis of penalized quantile regression for longitudinal data. Statistical Papers. 2017;58:1035–1053.
  11. Alhamzawi R, Yu K. Bayesian Lasso mixed quantile regression. Journal of Statistical Computation and Simulation. 2014;84:868–880.
  12. Seber GAF. Linear Regression Analysis. New York: Wiley; 1977.
  13. Ibrahim JG, Zhu H, Garcia RR, Guo R. Fixed and random effects selection in mixed effects models. Biometrics. 2011;67:495–503. pmid:20662831
  14. Yu K, Moyeed RA. Bayesian quantile regression. Statistics and Probability Letters. 2001;54:437–447.
  15. Yu K, Zhang J. A three-parameter asymmetric Laplace distribution and its extension. Communications in Statistics – Theory and Methods. 2005;34:1867–1879.
  16. Geraci M, Bottai M. Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics. 2007;8:140–154. pmid:16636139
  17. Li Q, Xi R, Lin N. Bayesian regularized quantile regression. Bayesian Analysis. 2010;5:533–556.
  18. Alhamzawi R, Yu K. Bayesian adaptive LASSO quantile regression. Statistical Modelling. 2012;12:279–297.
  19. Kozumi H, Kobayashi G. Gibbs sampling methods for Bayesian quantile regression. Journal of Statistical Computation and Simulation. 2011;81:1565–1578.
  20. Ji YG, Lin N, Zhang BX. Model selection in binary and tobit quantile regression using the Gibbs sampler. Computational Statistics & Data Analysis. 2012;56:827–839.
  21. Alshaybawee T, Midi H, Alhamzawi R. Bayesian elastic net single index quantile regression. Journal of Applied Statistics. 2017;44:853–871.
  22. Kobayashi G. Bayesian endogenous tobit quantile regression. Bayesian Analysis. 2017;12:161–191.
  23. Li Q, Lin N. The Bayesian elastic net. Bayesian Analysis. 2010;5:151–170.
  24. Chen Z, Dunson D. Random effects selection in linear mixed models. Biometrics. 2003;59:762–769. pmid:14969453
  25. Kinney SK, Dunson D. Fixed and random effects selection in linear and logistic models. Biometrics. 2008;63:690–698.
  26. van der Pas S, Szabo B, van der Vaart A. Adaptive posterior contraction rates for the horseshoe. Electronic Journal of Statistics. 2017;11:3196–3225.
  27. van Dyk DA, Park T. Partially collapsed Gibbs samplers: theory and methods. Journal of the American Statistical Association. 2008;103:790–796.
  28. Liu JS, Wong WH, Kong A. Covariance structure of the Gibbs sampler, with applications to comparisons of estimators and augmentation schemes. Biometrika. 1994;81:27–40.
  29. Bondell HD, Krishna A, Ghosh SK. Joint variable selection for fixed and random effects in linear mixed effects models. Biometrics. 2010;66:1069–1077. pmid:20163404
  30. Peng H, Lu Y. Model selection in linear mixed effect models. Journal of Multivariate Analysis. 2012;109:109–129.
  31. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics. 1998;7:434–455.
  32. Williams JP, Lu Y. Covariance selection in the linear mixed effect model. JMLR: Workshop and Conference Proceedings. 2015;44:277–291.
  33. Barbieri MM, Berger JO. Optimal predictive model selection. The Annals of Statistics. 2004;32:870–897.
  34. Galecki A, Burzykowski T. Linear Mixed-Effects Models Using R. Springer; 2013.
  35. Yan Y, Kottas A. A new family of error distributions for Bayesian quantile regression. Working paper. 2017.
  36. Armagan T, Dunson D, Lee J. Generalized double Pareto shrinkage. Statistica Sinica. 2013;23:119–143. pmid:24478567
  37. Carvalho CM, Polson NG, Scott JG. The horseshoe estimator for sparse signals. Biometrika. 2010;97:465–480.
  38. Griffin JE, Brown PJ. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis. 2010;5:171–188.