
Using mixtures in seemingly unrelated linear regression models with non-normal errors

Statistics and Computing

Abstract

Seemingly unrelated linear regression models are introduced in which the distribution of the errors is a finite mixture of Gaussian distributions. Identifiability conditions are provided. The score vector and the Hessian matrix are derived. Parameter estimation is performed by maximum likelihood, and an Expectation–Maximisation algorithm is developed. The usefulness of the proposed methods, and a numerical evaluation of their properties, are illustrated through the analysis of simulated and real datasets.
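As a concrete illustration of the kind of EM iteration the abstract describes, the following is a minimal sketch for a single linear regression whose error term follows a two-component Gaussian mixture. This is a deliberate simplification of the paper's multivariate seemingly unrelated regression setting; all variable names, starting values, and simulated data are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = X @ beta + e, where e follows a two-component
# Gaussian mixture. X has no intercept column: the component means
# absorb the overall error level (echoing the identifiability issue
# the paper addresses).
n = 500
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 2.0])
comp = rng.random(n) < 0.7
e = np.where(comp, rng.normal(-0.6, 0.5, n), rng.normal(1.4, 1.5, n))
y = X @ beta_true + e

K = 2
pmix = np.full(K, 1.0 / K)          # mixing weights (pi_k)
lam = np.array([-1.0, 1.0])         # component means (lambda_k)
sig2 = np.array([1.0, 1.0])         # component variances
beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting value

def comp_dens(r):
    """n x K matrix of pmix_k * N(r_i; lam_k, sig2_k)."""
    return (pmix * np.exp(-0.5 * (r[:, None] - lam) ** 2 / sig2)
            / np.sqrt(2 * np.pi * sig2))

for _ in range(200):
    r = y - X @ beta
    d = comp_dens(r)
    alpha = d / d.sum(axis=1, keepdims=True)   # E-step: posterior probs
    nk = alpha.sum(axis=0)
    pmix = nk / n                              # M-step: weighted updates
    lam = (alpha * r[:, None]).sum(axis=0) / nk
    sig2 = (alpha * (r[:, None] - lam) ** 2).sum(axis=0) / nk
    # Weighted least-squares update for beta given the posteriors
    w = (alpha / sig2).sum(axis=1)
    s = (alpha * (lam / sig2)).sum(axis=1)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y - s))

loglik = np.log(comp_dens(y - X @ beta).sum(axis=1)).sum()
```

The regression update is a generalized-least-squares step with observation weights built from the posterior component probabilities, which is the structural role the E-step quantities \(\alpha_{ki}\) play in the paper's estimating equations.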


References

  • Ando, T., Zellner, A.: Hierarchical Bayesian analysis of the seemingly unrelated regression and simultaneous equations models using a combination of direct Monte Carlo and importance sampling techniques. Bayesian Anal. 5, 65–96 (2010)

  • Baird, I.G., Quastel, N.: Dolphin-safe tuna from California to Thailand: localisms in environmental certification of global commodity networks. Ann. Assoc. Am. Geogr. 101, 337–355 (2011)

  • Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)

  • Bartolucci, F., Scaccia, L.: The use of mixtures for dealing with non-normal regression errors. Comput. Stat. Data Anal. 48, 821–834 (2005)

  • Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated classification likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22, 719–725 (2000)

  • Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41, 561–575 (2003)

  • Boldea, O., Magnus, J.R.: Maximum likelihood estimation of the multivariate normal mixture model. J. Am. Stat. Assoc. 104, 1539–1549 (2009)

  • Chatterjee, S., Laudato, M., Lynch, L.A.: Genetic algorithms and their statistical applications: an introduction. Comput. Stat. Data Anal. 22, 633–651 (1996)

  • Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognit. 28, 781–793 (1995)

  • Chevalier, J.A., Kashyap, A.K., Rossi, P.E.: Why don’t prices rise during periods of peak demand? Evidence from scanner data. Am. Econ. Rev. 93, 15–37 (2003)

  • Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994)

  • Cutler, A., Windham, M.P.: Information-based validity functionals for mixture analysis. In: Bozdogan, H. (ed.) Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, pp. 149–170. Kluwer Academic, Dordrecht (1994)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood for incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–22 (1977)

  • Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002)

  • Fraley, C., Raftery, A.E., Murphy, T.B., Scrucca, L.: mclust version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical report no. 597, Department of Statistics, University of Washington (2012)

  • Fraser, D.A.S., Rekkas, M., Wong, A.: Highly accurate likelihood analysis for the seemingly unrelated regression problem. J. Econom. 127, 17–33 (2005)

  • Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)

  • Galimberti, G., Soffritti, G.: A multivariate linear regression analysis using finite mixtures of \(t\) distributions. Comput. Stat. Data Anal. 71, 138–150 (2014)

  • Henningsen, A., Hamann, J.D.: systemfit: a package for estimating systems of simultaneous equations in R. J. Stat. Softw. 23(4), 1–40 (2007)

  • Keribin, C.: Consistent estimation of the order of mixture models. Sankhyā 62, 49–66 (2000)

  • Kmenta, J., Gilbert, R.: Small sample properties of alternative estimators of seemingly unrelated regressions. J. Am. Stat. Assoc. 63, 1180–1200 (1968)

  • Kowalski, J., Mendoza-Blanco, J.R., Tu, X.M., Gleser, L.J.: On the difference in inference and prediction between the joint and independent \(t\)-error models for seemingly unrelated regressions. Commun. Stat. Theory 28, 2119–2140 (1999)

  • Kurata, H.: On the efficiencies of several generalized least squares estimators in a seemingly unrelated regression model and a heteroscedastic model. J. Multivar. Anal. 70, 86–94 (1999)

  • Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the \(t\) distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)

  • Magnus, J.R.: Maximum likelihood estimation of the GLS model with unknown parameters in the disturbance covariance matrix. J. Econom. 7, 281–312 (1978)

  • Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester (1988)

  • Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection in model-based clustering: a general variable role modeling. Comput. Stat. Data Anal. 53, 3872–3882 (2009a)

  • Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection for clustering with Gaussian mixture models. Biometrics 65, 707–709 (2009b)

  • McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Chichester (2008)

  • McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)

  • McLachlan, G.J., Peel, D., Bean, R.W.: Modelling high-dimensional data by mixtures of factor analyzers. Comput. Stat. Data Anal. 41, 379–388 (2003)

  • McNicholas, P.D., Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18, 285–296 (2008)

  • Melnykov, V., Melnykov, I.: Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput. Stat. Data Anal. 56, 1381–1395 (2012)

  • Ng, V.M.: Robust Bayesian inference for seemingly unrelated regressions with elliptical errors. J. Multivar. Anal. 83, 409–414 (2002)

  • Oberhofer, W., Kmenta, J.: A general procedure for obtaining maximum likelihood estimates in generalized regression models. Econometrica 42, 579–590 (1974)

  • Park, T.: Equivalence of maximum likelihood estimation and iterative two-stage estimation for seemingly unrelated regression models. Commun. Stat. Theory 22, 2285–2296 (1993)

  • Percy, D.F.: Predictions for seemingly unrelated regression. J. R. Stat. Soc. Ser. B 54, 243–252 (1992)

  • Ray, S., Lindsay, B.G.: Model selection in high dimensions: a quadratic-risk-based approach. J. R. Stat. Soc. Ser. B 70, 95–118 (2008)

  • R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. URL http://www.R-project.org/ (2014)

  • Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26, 195–239 (1984)

  • Rilstone, P., Veall, M.: Using bootstrapped confidence intervals for improved inferences with seemingly unrelated regression equations. Econom. Theory 12, 569–580 (1996)

  • Rocke, D.: Bootstrap Bartlett adjustment in seemingly unrelated regression. J. Am. Stat. Assoc. 84, 598–601 (1989)

  • Rossi, P.E.: bayesm: Bayesian inference for marketing/micro-econometrics. R package version 2.2-5. URL http://CRAN.R-project.org/package=bayesm (2012)

  • Schott, J.R.: Matrix Analysis for Statistics, 2nd edn. Wiley, New York (2005)

  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)

  • Soffritti, G., Galimberti, G.: Multivariate linear regression with non-normal errors: a solution based on mixture models. Stat. Comput. 21, 523–536 (2011)

  • Srivastava, V.K., Giles, D.E.A.: Seemingly Unrelated Regression Equations Models. Marcel Dekker, New York (1987)

  • Srivastava, V.K., Maekawa, K.: Efficiency properties of feasible generalized least squares estimators in SURE models under non-normal disturbances. J. Econom. 66, 99–121 (1995)

  • Zellner, A.: An efficient method of estimating seemingly unrelated regression equations and tests for aggregation bias. J. Am. Stat. Assoc. 57, 348–368 (1962)

  • Zellner, A.: Estimators for seemingly unrelated regression equations: some exact finite sample results. J. Am. Stat. Assoc. 58, 977–992 (1963)

  • Zellner, A.: An Introduction to Bayesian Inference in Econometrics. Wiley, New York (1971)

  • Zellner, A., Ando, T.: A direct Monte Carlo approach for Bayesian analysis of the seemingly unrelated regression model. J. Econom. 159, 33–45 (2010a)

  • Zellner, A., Ando, T.: Bayesian and non-Bayesian analysis of the seemingly unrelated regression model with Student \(t\) errors, and its application for forecasting. Int. J. Forecast. 26, 413–434 (2010b)


Corresponding author

Correspondence to Giuliano Galimberti.

Electronic supplementary material

Supplementary material 1 (pdf 180 KB)

Appendices

Appendix 1: Proof of Theorem 2

The proof is based on the computation of the first-order differential of \(l\left( \varvec{\theta }\right) \). The model log-likelihood in Eq. (11) can be expressed as \(l\left( \varvec{\theta }\right) =\sum _{i=1}^I \ln \left( \sum _{k=1}^K f_{ki}\right) \). Thus, the first differential of \(l\left( \varvec{\theta }\right) \) is

$$\begin{aligned} \mathrm {d}l\left( \varvec{\theta }\right) = \sum _{i=1}^I \mathrm {d}\ln \left( \sum _{k=1}^K f_{ki}\right) = \sum _{i=1}^I \left( \sum _{k=1}^K \alpha _{ki}\mathrm {d}\ln f_{ki}\right) . \end{aligned}$$
(24)

Up to an additive constant, \(\ln f_{ki}\) is equal to

$$\begin{aligned}&\ln \pi _k -\frac{1}{2}\ln \det \left( \varvec{\varSigma }_k\right) \\&\quad -\frac{1}{2}\mathrm {tr}\left[ \varvec{\varSigma }_k^{-1}\left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}_{i}^{\prime }\varvec{\beta } \right) ^{\prime }\right] , \end{aligned}$$

and

$$\begin{aligned} \mathrm {d}\ln f_{ki} = \mathrm {d} \ln \pi _k+\mathrm {d}_{ki1}+\mathrm {d}_{ki2}+\mathrm {d}_{ki3}, \end{aligned}$$
(25)

where

$$\begin{aligned}&\mathrm {d}_{ki1} = -\frac{1}{2}\mathrm {d}\left( \ln \det \left( \varvec{\varSigma }_k\right) \right) , \end{aligned}$$
(26)
$$\begin{aligned}&\mathrm {d}_{ki2} = -\frac{1}{2} \mathrm {tr}\left[ \mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\right] , \nonumber \\ \end{aligned}$$
(27)
$$\begin{aligned}&\mathrm {d}_{ki3} = -\frac{1}{2} \mathrm {tr}\left[ \varvec{\varSigma }_k^{-1}\mathrm {d}\left( \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\right) \right] .\nonumber \\ \end{aligned}$$
(28)

The four terms in Eq. (25) can be re-expressed as follows:

$$\begin{aligned}&\mathrm {d}\ln \pi _k = \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k, \end{aligned}$$
(29)
$$\begin{aligned}&\mathrm {d}_{ki1} = -\frac{1}{2}\mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\right] , \end{aligned}$$
(30)
$$\begin{aligned}&\mathrm {d}_{ki2} = \frac{1}{2} \mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \mathbf {b}_{ki}\mathbf {b}^{\prime }_{ki} \right] , \end{aligned}$$
(31)
$$\begin{aligned}&\mathrm {d}_{ki3} = \left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\mathbf {b}_{ki} + \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {b}_{ki}, \end{aligned}$$
(32)

where Eqs. (30)–(32) are obtained by exploiting some results from matrix derivatives (Magnus and Neudecker 1988, pp. 182–183; Schott 2005, pp. 292–293, 361). Since the sum of \(\mathrm {d}_{ki1}\) and \(\mathrm {d}_{ki2}\) can be written as

$$\begin{aligned} \mathrm {d}_{ki1}+\mathrm {d}_{ki2} = -\frac{1}{2}\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) ^{\prime }\mathbf {G}^{\prime }\mathrm {vec}\left( \mathbf {B}_{ki}\right) , \end{aligned}$$
(33)

(see Schott 2005, pp. 293, 313, 356, 374), inserting Eqs. (29), (32) and (33) into Eq. (25) leads to

$$\begin{aligned} \mathrm {d}\ln f_{ki}= & {} \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {b}_{ki} + \left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\mathbf {b}_{ki}\nonumber \\&-\frac{1}{2}\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) ^{\prime }\mathbf {G}^{\prime }\mathrm {vec}\left( \mathbf {B}_{ki}\right) \nonumber \\= & {} \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {b}_{ki} + \left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {c}_{ki}. \end{aligned}$$
(34)

Using Eqs. (24) and (34), \(\mathrm {d}l\left( \varvec{\theta }\right) \) can be expressed as

$$\begin{aligned} \mathrm {d}l\left( \varvec{\theta }\right)= & {} \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\sum _{i=1}^I \sum _{k=1}^K \alpha _{ki}\mathbf {a}_k + \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\sum _{i=1}^I \mathbf {X}_{i}\sum _{k=1}^K \alpha _{ki}\mathbf {b}_{ki}\nonumber \\&+\sum _{k=1}^K \left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime } \sum _{i=1}^I \alpha _{ki}\mathbf {c}_{ki}, \end{aligned}$$
(35)

thus proving the theorem.
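The matrix-derivative identity behind Eq. (30), \(\mathrm {d}\ln \det \left( \varvec{\varSigma }\right) =\mathrm {tr}\left[ \varvec{\varSigma }^{-1}\mathrm {d}\varvec{\varSigma }\right] \), can be spot-checked numerically. The following sketch (assuming NumPy; the test matrices are arbitrary) compares a central finite difference along a symmetric perturbation direction with the closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random symmetric positive-definite Sigma and a symmetric direction dS
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
D = rng.normal(size=(4, 4))
dS = (D + D.T) / 2
eps = 1e-6

# Directional derivative of ln det(Sigma) by central differences
fd = (np.linalg.slogdet(Sigma + eps * dS)[1]
      - np.linalg.slogdet(Sigma - eps * dS)[1]) / (2 * eps)
# Closed form: tr(Sigma^{-1} dS)
closed = np.trace(np.linalg.solve(Sigma, dS))
```

The two values agree to finite-difference accuracy, confirming the identity used in the first-differential computation.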

Appendix 2: Proof of Theorem 3

The proof is based on the computation of the second-order differential of \(l\left( \varvec{\theta }\right) \):

$$\begin{aligned} \mathrm {d}^2l\left( \varvec{\theta }\right) = \sum _{i=1}^I \mathrm {d}^2\ln \left( \sum _{k=1}^K f_{ki}\right) , \end{aligned}$$
(36)

where

$$\begin{aligned} \mathrm {d}^2\ln \left( \sum _{k=1}^K f_{ki}\right)= & {} \sum _{k=1}^K \alpha _{ki}\mathrm {d}^2\ln f_{ki}+\sum _{k=1}^K \alpha _{ki}\left( \mathrm {d}\ln f_{ki}\right) ^2\nonumber \\&-\left( \sum _{k=1}^K \alpha _{ki}\mathrm {d}\ln f_{ki}\right) ^2 \end{aligned}$$
(37)

(see Boldea and Magnus 2009, Appendix).
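The decomposition in Eq. (37) can be verified numerically in the scalar case: for \(g(t)=\ln \sum _k f_k(t)\), one has \(g''=\sum _k a_k (\ln f_k)''+\sum _k a_k \left[ (\ln f_k)'\right] ^2-\left[ \sum _k a_k (\ln f_k)'\right] ^2\) with \(a_k=f_k/\sum _j f_j\). The following sketch (assuming NumPy; weights and means are arbitrary) checks this against a finite-difference second derivative for a weighted sum of Gaussian kernels.

```python
import numpy as np

# Component means and weights of f_k(t) = w_k * exp(-0.5 (t - m_k)^2)
m = np.array([-1.0, 0.5, 2.0])
w = np.array([0.2, 0.5, 0.3])

def f(t):
    return w * np.exp(-0.5 * (t - m) ** 2)

t0, eps = 0.7, 1e-5
g = lambda t: np.log(f(t).sum())
# Finite-difference second derivative of g at t0
g2_fd = (g(t0 + eps) - 2 * g(t0) + g(t0 - eps)) / eps ** 2

# Identity of Eq. (37) in scalar form
a = f(t0) / f(t0).sum()     # the alpha weights
dlogf = -(t0 - m)           # (ln f_k)'
d2logf = -np.ones_like(m)   # (ln f_k)''
g2_formula = ((a * d2logf).sum() + (a * dlogf ** 2).sum()
              - (a * dlogf).sum() ** 2)
```

The two quantities agree to finite-difference accuracy.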

Since \(\left( \mathrm {d}\ln f_{ki}\right) ^2=\left( \mathrm {d}\ln f_{ki}\right) \left( \mathrm {d}\ln f_{ki}\right) ^{\prime }\), using Eq. (34) it follows that

$$\begin{aligned} \left( \mathrm {d}\ln f_{ki}\right) ^2= & {} \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k\mathbf {a}^{\prime }_k\mathrm {d}\varvec{\pi } + \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k \mathbf {b}^{\prime }_{ki}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&+ \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k \mathbf {c}^{\prime }_{ki}\mathrm {d}\varvec{\theta }_k \nonumber \\&+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {b}_{ki}\mathbf {a}^{\prime }_k\mathrm {d}\varvec{\pi } + \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {b}_{ki}\mathbf {b}^{\prime }_{ki}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&+\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {b}_{ki}\mathbf {c}^{\prime }_{ki}\mathrm {d}\varvec{\theta }_k \nonumber \\&+ \left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {c}_{ki}\mathbf {a}^{\prime }_k\mathrm {d}\varvec{\pi } + \left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {c}_{ki}\mathbf {b}^{\prime }_{ki}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&+ \left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {c}_{ki}\mathbf {c}^{\prime }_{ki}\mathrm {d}\varvec{\theta }_k. \end{aligned}$$
(38)

Similarly,

$$\begin{aligned} \left( \sum _{k=1}^K \alpha _{ki} \mathrm {d} \ln f_{ki}\right) ^2= & {} \left( \sum _{k=1}^K \alpha _{ki}\mathrm {d} \ln f_{ki}\right) \left( \sum _{k=1}^K\alpha _{ki}\mathrm {d}\ln f_{ki}\right) ^{\prime }\nonumber \\= & {} \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\bar{\mathbf {a}}_i\bar{\mathbf {a}}^{\prime }_i\mathrm {d}\varvec{\pi } + \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\bar{\mathbf {a}}_i \bar{\mathbf {b}}^{\prime }_{i}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&+\left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\bar{\mathbf {a}}_i \sum _{k=1}^K \alpha _{ki}\mathbf {c}^{\prime }_{ki}\mathrm {d}\varvec{\theta }_k \nonumber \\&+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\bar{\mathbf {b}}_{i}\bar{\mathbf {a}}^{\prime }_i\mathrm {d}\varvec{\pi }+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\bar{\mathbf {b}}_{i}\bar{\mathbf {b}}^{\prime }_{i}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\bar{\mathbf {b}}_{i}\sum _{k=1}^K\alpha _{ki}\mathbf {c}^{\prime }_{ki}\mathrm {d}\varvec{\theta }_k\nonumber \\&+\left[ \sum _{k=1}^K\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\mathbf {c}_{ki}\right] \bar{\mathbf {a}}^{\prime }_i\mathrm {d}\varvec{\pi }\nonumber \\&+ \left[ \sum _{k=1}^K\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\mathbf {c}_{ki}\right] \bar{\mathbf {b}}^{\prime }_{i} \mathbf {X}_{i}^{\prime }\mathrm {d}\varvec{\beta } \nonumber \\&+ \sum _{k=1}^K\sum _{h=1}^K\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\alpha _{hi}\mathbf {c}_{ki}\mathbf {c}^{\prime }_{hi} \mathrm {d}\varvec{\theta }_h. \end{aligned}$$
(39)

Furthermore,

$$\begin{aligned} \mathrm {d}^2\ln f_{ki}= & {} -\left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k\mathbf {a}^{\prime }_k\mathrm {d}\varvec{\pi } -\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&-\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {F}^{\prime }_{ki}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {F}_{ki}\mathrm {d}\varvec{\theta }_k -\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {C}_{ki}\mathrm {d}\varvec{\theta }_k \end{aligned}$$
(40)

(see Appendix 3). From Eqs. (37), (38), (39) and (40), by grouping together the common factors, it follows that

$$\begin{aligned} \mathrm {d}^2\ln \left( \sum _{k=1}^K f_{ki}\right)= & {} -\left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\bar{\mathbf {a}}_i\bar{\mathbf {a}}^{\prime }_i\mathrm {d}\varvec{\pi }\nonumber \\&+ \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\left[ \left( \sum _{k=1}^K\alpha _{ki}\mathbf {a}_k \mathbf {b}^{\prime }_{ki}\right) -\bar{\mathbf {a}}_i \bar{\mathbf {b}}^{\prime }_{i}\right] \mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta } \nonumber \\&+ \left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\left[ \sum _{k=1}^K\alpha _{ki}\left( \mathbf {a}_{k}-\bar{\mathbf {a}}_i\right) \mathbf {c}^{\prime }_{ki}\mathrm {d}\varvec{\theta }_k\right] \nonumber \\&+ \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left[ \left( \sum _{k=1}^K\alpha _{ki}\mathbf {b}_{ki} \mathbf {a}^{\prime }_{k}\right) -\bar{\mathbf {b}}_{i}\bar{\mathbf {a}}^{\prime }_i\right] \mathrm {d}\varvec{\pi } \nonumber \\&- \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left[ \bar{\mathbf {B}}_{i}+ \bar{\mathbf {b}}_{i}\bar{\mathbf {b}}^{\prime }_{i}\right] \mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta } \nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left\{ \sum _{k=1}^K\alpha _{ki} \left[ \mathbf {F}_{ki}-\left( \mathbf {b}_{ki}-\bar{\mathbf {b}}_{i}\right) \mathbf {c}^{\prime }_{ki}\right] \mathrm {d}\varvec{\theta }_k\right\} \nonumber \\&+ \left[ \sum _{k=1}^K\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\mathbf {c}_{ki} \left( \mathbf {a}^{\prime }_k-\bar{\mathbf {a}}^{\prime }_i\right) \right] \mathrm {d}\varvec{\pi }\nonumber \\&- \left\{ \sum _{k=1}^K\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\left[ \mathbf {F}^{\prime }_{ki} -\mathbf {c}_{ki}\left( \mathbf {b}^{\prime }_{ki}-\bar{\mathbf {b}}^{\prime }_{i}\right) \right] \right\} \mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&- \sum _{k=1}^K\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\left[ \mathbf {C}_{ki}-\mathbf {c}_{ki}\mathbf {c}_{ki}^{\prime }\right] \mathrm {d}\varvec{\theta }_k\nonumber \\&-\sum _{k=1}^K\sum _{h=1}^K\left[ \left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\alpha _{ki}\alpha _{hi}\mathbf {c}_{ki}\mathbf {c}_{hi}^{\prime }\mathrm {d}\varvec{\theta }_h\right] . \end{aligned}$$
(41)

Inserting Eq. (41) in Eq. (36) completes the proof.

Appendix 3: Second-order differential of \(\ln f_{ki}\)

Using Eq. (25), the second-order differential of \(\ln f_{ki}\) can be expressed as

$$\begin{aligned} \mathrm {d}^2\ln f_{ki}=\mathrm {d}^2\ln \pi _k+\mathrm {d}\left( \mathrm {d}_{ki1}\right) +\mathrm {d}\left( \mathrm {d}_{ki2}\right) +\mathrm {d}\left( \mathrm {d}_{ki3}\right) . \end{aligned}$$
(42)

From Eq. (29) it follows that

$$\begin{aligned} \mathrm {d}^2\ln \pi _k=-\left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k\mathbf {a}^{\prime }_k\mathrm {d}\varvec{\pi }. \end{aligned}$$
(43)

The second term in Eq. (42) is equal to

$$\begin{aligned} \mathrm {d}\left( \mathrm {d}_{ki1}\right)= & {} -\frac{1}{2}\mathrm {tr}\left[ \mathrm {d}\varvec{\varSigma }_k\left( \mathrm {d}\varvec{\varSigma }_k^{-1}\right) \right] \nonumber \\= & {} \frac{1}{2}\mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\right] . \end{aligned}$$
(44)

The third term of \(\mathrm {d}^2\ln f_{ki}\) is

$$\begin{aligned} \mathrm {d}\left( \mathrm {d}_{ki2}\right)= & {} \frac{1}{2}\mathrm {tr}\left[ \mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) \left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \right. \\&\left. \times \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\right] \\&+\frac{1}{2}\mathrm {tr}\left[ \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \right. \\&\left. \times \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\right] \\&+\frac{1}{2}\mathrm {tr}\left[ \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\mathrm {d}\left( \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \right. \right. \\&\left. \left. \times \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\right) \right] . \end{aligned}$$

By exploiting some properties of the trace of a square matrix (see, e.g., Schott 2005), \(\mathrm {d}\left( \mathrm {d}_{ki2}\right) \) can also be expressed as

$$\begin{aligned} \mathrm {d}\left( \mathrm {d}_{ki2}\right)&= \mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \right. \\&\quad \times \left. \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\varvec{\varSigma }_k^{-1}\right] \\&\quad +\frac{1}{2}\mathrm {tr}\left[ \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\mathrm {d}\left( \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \right. \right. \\&\quad \left. \left. \times \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\right) \right] , \end{aligned}$$

and using two theorems about the vec and trace operators (Schott 2005, Theorems 8.9 and 8.12) it follows that

$$\begin{aligned} \mathrm {d}\left( \mathrm {d}_{ki2}\right)= & {} \mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \right. \nonumber \\&\times \left. \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) ^{\prime }\varvec{\varSigma }_k^{-1}\right] \nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) . \end{aligned}$$
(45)

From Eqs. (44) and (45) it follows that

$$\begin{aligned} \mathrm {d}\left( \mathrm {d}_{ki1}\right) +\mathrm {d}\left( \mathrm {d}_{ki2}\right)= & {} \frac{1}{2}\mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\right] \nonumber \\&-\,\mathrm {tr}\left[ \left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \mathbf {b}_{ki}\mathbf {b}^{\prime }_{ki}\right] \nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\= & {} \frac{1}{2}\mathrm {tr}\left\{ \left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \left[ \varvec{\varSigma }_k^{-1}\right. \right. \nonumber \\&\left. \left. +\varvec{\varSigma }_k^{-1}-\varvec{\varSigma }_k^{-1}-2\mathbf {b}_{ki}\mathbf {b}^{\prime }_{ki}\right] \right\} \nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\= & {} -\frac{1}{2}\mathrm {vec}\left( \left( \mathrm {d}\varvec{\varSigma }_k\right) ^{\prime }\right) ^{\prime }\nonumber \\&\times \left[ \left( \varvec{\varSigma }_k^{-1}-2\mathbf {B}_{ki}\right) ^{\prime }\otimes \varvec{\varSigma }_k^{-1}\right] \mathrm {vec}\left( \mathrm {d}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\left( \mathrm {vec}\varvec{\varSigma }_k\right) \nonumber \\= & {} -\frac{1}{2}\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) ^{\prime }\mathbf {G}^{\prime }\left[ \left( \varvec{\varSigma }_k^{-1}-2\mathbf {B}_{ki}\right) \otimes \varvec{\varSigma }_k^{-1}\right] \nonumber \\&\times \mathbf {G}\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathbf {G}\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) \nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\left( \mathbf {b}^{\prime }_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathbf {G}\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) ,\nonumber \\ \end{aligned}$$
(46)

where the third and fourth equalities are obtained using some properties of the vec operator (see Schott 2005, p. 294).

From Eq. (32) it is possible to write

$$\begin{aligned} \mathrm {d}\left( \mathrm {d}_{ki3}\right)= & {} \left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\mathrm {d}\mathbf {b}_{ki} + \left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathrm {d}\mathbf {b}_{ki}\nonumber \\= & {} -\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\varvec{\varSigma }_k^{-1}\mathrm {d}\left( \varvec{\varSigma }_k\right) \mathbf {b}_{ki}\nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\varvec{\varSigma }_k^{-1}\mathrm {d}\varvec{\lambda }_k-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathrm {d}\left( \varvec{\varSigma }_k\right) \mathbf {b}_{ki}-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathrm {d}\varvec{\lambda }_k\nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\= & {} -\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) ^{\prime }\mathbf {G}^{\prime }\left( \mathbf {b}_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathrm {d}\varvec{\lambda }_k -\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\varvec{\varSigma }_k^{-1}\mathrm {d}\varvec{\lambda }_k\nonumber \\&-\left( \mathrm {d}\varvec{\lambda }_k\right) ^{\prime }\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta } -\mathrm {d}\left( \mathrm {v}\varvec{\varSigma }_k\right) ^{\prime }\mathbf {G}^{\prime }\left( \mathbf {b}_{ki}\otimes \varvec{\varSigma }_k^{-1}\right) \mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathrm {d}\varvec{\lambda }_k-\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }, \end{aligned}$$
(47)

where the second equality is obtained using the following expression for \(\mathrm {d}\mathbf {b}_{ki}\), and the third equality results from the same theorems about the vec and trace operators employed above:

$$\begin{aligned} \mathrm {d}\mathbf {b}_{ki}= & {} \mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) \left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) +\varvec{\varSigma }_k^{-1}\mathrm {d}\left( \mathbf {y}_i-\varvec{\lambda }_k-\mathbf {X}^{\prime }_{i}\varvec{\beta } \right) \\ \nonumber= & {} -\varvec{\varSigma }_k^{-1}\mathrm {d}\left( \varvec{\varSigma }_k\right) \mathbf {b}_{ki}-\varvec{\varSigma }_k^{-1}\mathrm {d}\varvec{\lambda }_k-\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }. \end{aligned}$$

Inserting Eqs. (43), (46) and (47) in Eq. (42) and using the definitions of \(\varvec{\theta }_k\), \(\mathbf {F}_{ki}\) and \(\mathbf {C}_{ki}\) introduced in Sect. 2.3 results in the following expression for \(\mathrm {d}^2\ln f_{ki}\):

$$\begin{aligned} \mathrm {d}^2\ln f_{ki}= & {} -\left( \mathrm {d}\varvec{\pi }\right) ^{\prime }\mathbf {a}_k\mathbf {a}^{\prime }_k\mathrm {d}\varvec{\pi } -\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\varvec{\varSigma }_k^{-1}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta }\nonumber \\&-\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {F}^{\prime }_{ki}\mathbf {X}^{\prime }_{i}\mathrm {d}\varvec{\beta } -\left( \mathrm {d}\varvec{\beta }\right) ^{\prime }\mathbf {X}_{i}\mathbf {F}_{ki}\mathrm {d}\varvec{\theta }_k\\ \nonumber&-\left( \mathrm {d}\varvec{\theta }_k\right) ^{\prime }\mathbf {C}_{ki}\mathrm {d}\varvec{\theta }_k. \end{aligned}$$
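The identity \(\mathrm {d}\left( \varvec{\varSigma }_k^{-1}\right) =-\varvec{\varSigma }_k^{-1}\left( \mathrm {d}\varvec{\varSigma }_k\right) \varvec{\varSigma }_k^{-1}\), used repeatedly in this appendix (e.g. in obtaining Eqs. (44) and (45) and the expression for \(\mathrm {d}\mathbf {b}_{ki}\)), can also be checked numerically. The following sketch (assuming NumPy; the test matrices are arbitrary) compares a finite-difference derivative of the matrix inverse with the closed form.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random symmetric positive-definite Sigma and a symmetric direction dS
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)
D = rng.normal(size=(3, 3))
dS = (D + D.T) / 2
eps = 1e-6
inv = np.linalg.inv

# Directional derivative of Sigma^{-1} by central differences
fd = (inv(Sigma + eps * dS) - inv(Sigma - eps * dS)) / (2 * eps)
# Closed form: -Sigma^{-1} dS Sigma^{-1}
closed = -inv(Sigma) @ dS @ inv(Sigma)
```

Entrywise agreement to finite-difference accuracy confirms the differential of the inverse used above.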

Cite this article

Galimberti, G., Scardovi, E. & Soffritti, G. Using mixtures in seemingly unrelated linear regression models with non-normal errors. Stat Comput 26, 1025–1038 (2016). https://doi.org/10.1007/s11222-015-9587-0
