
Handling dependent samples in meta-analytic structural equation models: A Wishart-based approach

  • Original Manuscript
Behavior Research Methods

Abstract

We present an approach to meta-analytic structural equation models that relies on hierarchical modeling of sample covariance matrices under the assumption that the matrices are Wishart. The approach handles the commonplace fixed- and random-effects meta-analytic SEMs, and solves the problem of dependent covariance matrices where more than one covariance matrix is obtained from a single study or study author. The ability of the approach to adequately recover parameters is examined via a simulation study. The approach is implemented in the bayesianmasem R package and a demonstration shows applications of the model.


Notes

  1. Although this is the convention with TSSEM, we believe that random-effects models should be preferred by default as it is unlikely that different studies sample the exact same population. Moreover, the classification of a fit index as ‘small’ is not straightforward (e.g. McNeish et al., 2018; Savalei, 2012; Ximénez et al., 2022).

  2. Although path and regression models are the most common MASEMs, these models do not involve latent variables.

  3. \(\varepsilon \) is often close in value to the RMSEA obtained from a multi-group SEM where parameters are constrained equal across groups.

  4. As m gets larger, the random-effects deviations implied by the inverse-Wishart distribution are increasingly well approximated by a zero-mean multivariate normal vector (Wu & Browne, 2015), but with a more constrained covariance matrix than the unstructured covariance matrix estimated by TSSEM.

  5. bayesianmasem does not handle the problem of missing data.

  6. The difference in relative bias for loadings and error variances occurs because variances are on the squared loading scale. Alternatively stated, the relative bias of loadings was the same as the relative bias of error standard deviations.

  7. Following from the mean of an inverse-Wishart distribution, the relative bias of model-implied covariance matrix elements in the random-effects model is: \(\left[ \frac{m_1}{m_1 - p - 1} \big / \frac{m}{m - p - 1} \right] - 1\), where \(m = \varepsilon ^{-2} + p - 1\) and \(m_1 = \varepsilon _1^{-2} + p - 1\).

  8. The degrees of freedom are based on the number of replications, 1000.

Author information

Corresponding author

Correspondence to James Ohisei Uanhoro.

Ethics declarations

Open Practices Statement

All code for the simulation studies and data analysis is available at https://osf.io/yd5q4/.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Additional simulation results

Appendix B: Simulation-based calibration – Digman (1997) application

The data generation process (DGP) for the SBC study was based on the Digman (1997) example. The exact DGP was:

$$\begin{aligned} \begin{aligned} \textbf{S}_{ij}&{\sim } \text {GB}_p^{\text {II}}\left( \frac{n_i^*}{2},\frac{m_1}{2},\frac{m_1}{n_i^*}\varvec{\Psi }_{j[i]}, \textbf{0}_{p\times p}\right) \ \text {for } i \in \{1,\dots ,14\}\\&m_2\varvec{\Psi }_j {\sim } \mathcal {W}\left( \varvec{\Omega }(\varvec{\theta }), m_2\right) \ \text {for } j \in \{1,\dots ,7\}\\&\varvec{\Omega }(\varvec{\theta }) = \varvec{\Lambda }\varvec{\Phi }\varvec{\Lambda }^\prime + \varvec{\Delta },\ \varvec{\Phi }= \begin{bmatrix}1 &{} \\ \phi _{2,1} &{} 1\end{bmatrix} \end{aligned} \end{aligned}$$
(B1)
Fig. 8 Draws from prior distributions

Fig. 9 Histogram of SBC ranks – Digman (1997) application. m_ln_int_wi \(= m_1\) and m_ln_int_be \(= m_2\); phi = interfactor correlation; res_var = residual variances. The expectation is that the histogram counts are contained in the 95% simultaneous confidence bands

Fig. 10 Empirical CDF check – Digman (1997) application. m_ln_int_wi \(= m_1\) and m_ln_int_be \(= m_2\); phi = interfactor correlation; res_var = residual variances. The expectation is that the sample ECDF is contained within the 95% simultaneous confidence bands about the theoretical CDF

Fig. 11 Additional SBC evaluation metrics – Digman (1997) application. m_ln_int_wi \(= m_1\) and m_ln_int_be \(= m_2\); phi = interfactor correlation; res_var = residual variances. The number in parentheses on the y-axis is the count of parameters. The vertical dashed lines are 95% confidence limits based on hypothesis tests; no estimate exceeded the limits, suggesting adequate calibration for all parameters

Priors were chosen such that the generated data would produce valid covariance matrices (e.g. Merkle et al., 2021; Uanhoro, 2023). Loadings and residual standard deviations had median values of 0.8 and 0.6, respectively. The priors for \(m_1\) and \(m_2\) were chosen such that the median value of \(\rho \) would be roughly 0.25; more precisely, \(\exp (-6) / (\exp (-5) + \exp (-6)) \approx 0.27\).

$$\begin{aligned} \begin{aligned} \varvec{\lambda }\sim&\mathcal {N}^+(0.8, \sigma _\lambda ),\ \sigma _\lambda \sim \mathcal {N}^+(0, 0.5),\ \sqrt{\text {diag}(\varvec{\Delta })} \sim \mathcal {N}^+(0.6, 0.25),\\&\ln (\begin{bmatrix}m_1 \\ m_2 \end{bmatrix} - p + 1) \sim \mathcal {N}^+\left( \begin{bmatrix}5 \\ 6 \end{bmatrix}, 0.5\right) ,\ \frac{\phi _{2,1} + 1}{2} \sim \text {Beta}(5, 5) \end{aligned} \end{aligned}$$
(B2)

The distribution of parameters based on Eq. B2 is shown in Fig. 8.
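The prior draws can be approximated with a short script. This is a minimal sketch, not the code used for the study: it assumes \(\mathcal {N}^+(\mu , \sigma )\) denotes a normal distribution with standard deviation \(\sigma \) truncated to positive values, that \(m_2 = \varepsilon _2^{-2} + p - 1\) by analogy with \(m_1\), and that \(\rho = \varepsilon _2^2 / (\varepsilon _1^2 + \varepsilon _2^2)\). The function names are hypothetical, and only one loading and one residual standard deviation are drawn per replicate.

```python
import math
import random
from statistics import median


def positive_normal(rng, mean, sd):
    """Draw from a normal truncated to positive values, by rejection."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0:
            return x


def draw_prior(rng):
    """One joint draw from the priors in Eq. B2 (scalar version)."""
    sigma_lambda = positive_normal(rng, 0.0, 0.5)
    # ln(m_k - p + 1) equals ln(eps_k^-2) under the assumed mapping.
    ln_within = positive_normal(rng, 5.0, 0.5)
    ln_between = positive_normal(rng, 6.0, 0.5)
    eps1_sq = math.exp(-ln_within)
    eps2_sq = math.exp(-ln_between)
    return {
        "loading": positive_normal(rng, 0.8, sigma_lambda),
        "resid_sd": positive_normal(rng, 0.6, 0.25),
        # (phi + 1) / 2 ~ Beta(5, 5), so phi lies in (-1, 1).
        "phi": 2.0 * rng.betavariate(5, 5) - 1.0,
        "rho": eps2_sq / (eps1_sq + eps2_sq),
    }


rng = random.Random(1)
draws = [draw_prior(rng) for _ in range(20000)]
sampled_median_rho = median(d["rho"] for d in draws)
analytic_median_rho = math.exp(-6) / (math.exp(-5) + math.exp(-6))
print(round(sampled_median_rho, 3), round(analytic_median_rho, 3))
```

Under these assumptions the sampled median of \(\rho \) should sit close to the analytic value, because \(\rho \) is a monotone function of the difference between the two log-scale parameters and that difference is nearly symmetric about 1.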

For the SBC study, each model was estimated using a single chain. We requested 5000 iterations: the first 1000 were discarded as warmup, and the remaining 4000 were thinned by retaining every second iteration to reduce autocorrelation between posterior samples. Thus, 2000 posterior samples were retained per parameter. Finally, we repeated this process 1000 times.

Evaluation of SBC results was based on graphical summaries recommended by Säilynoja et al. (2022). We report the evaluations in Figs. 9 and 10 – these figures were produced using the SBC package in R (Kim et al., 2023).

Our expectation is that the ranks for each parameter are uniformly distributed. When this is true, the histogram counts will usually remain within the 95% simultaneous confidence bands. This expectation is met for all parameters with very few exceptions (see Fig. 9), which suggests adequate calibration of all parameters.

The evaluation via histogram is sensitive to the number of bins. Hence, we also assessed the empirical cumulative distribution function (ECDF) of the ranks. Specifically, we assessed the difference between the ECDF and the theoretical CDF of a uniform variable. When these differences are contained within the 95% simultaneous bands, parameters are adequately calibrated. This expectation is met for all parameters (see Fig. 10).
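The rank logic behind these checks can be illustrated on a toy problem. The sketch below runs SBC for a conjugate normal model where exact posterior draws are available; it is not the Digman (1997) model and does not use the SBC R package. The model, function names, and replication counts are assumptions chosen for speed, and the summary is the maximum ECDF deviation from the discrete uniform CDF rather than the simultaneous bands of Säilynoja et al. (2022).

```python
import random


def toy_sbc_ranks(n_reps, n_draws, n_obs, seed):
    """SBC ranks for y_i ~ N(mu, 1) with prior mu ~ N(0, 1)."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_reps):
        mu = rng.gauss(0.0, 1.0)                        # draw from the prior
        y = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # simulate data
        post_var = 1.0 / (1.0 + n_obs)                  # conjugate posterior
        post_mean = post_var * sum(y)
        post_sd = post_var ** 0.5
        # Rank = number of posterior draws below the true value (0..n_draws).
        rank = sum(rng.gauss(post_mean, post_sd) < mu for _ in range(n_draws))
        ranks.append(rank)
    return ranks


def max_ecdf_deviation(ranks, n_draws):
    """Largest gap between the rank ECDF and the discrete uniform CDF."""
    counts = [0] * (n_draws + 1)
    for r in ranks:
        counts[r] += 1
    n, cumulative, worst = len(ranks), 0, 0.0
    for r, c in enumerate(counts):
        cumulative += c
        worst = max(worst, abs(cumulative / n - (r + 1) / (n_draws + 1)))
    return worst


ranks = toy_sbc_ranks(n_reps=1000, n_draws=200, n_obs=10, seed=7)
deviation = max_ecdf_deviation(ranks, n_draws=200)
print(round(deviation, 3))
```

Because the posterior draws here are exact, the ranks should be close to uniform and the deviation small; a miscalibrated sampler would typically produce a visibly larger gap.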

We also repeated the testing-based SBC evaluation procedures in Uanhoro (2023). The SBC ranks are first transformed to rankits: \(q_i = (r_i + 0.5)(L + 1)^{-1}\), where \(r_i\) are the ranks and \(L = 2000\) is the number of retained posterior samples. The standard normal quantile function was then applied to the rankits. If the ranks are approximately uniform, the result should be an approximately standard normal variable. The bias of the mean (difference from 0 based on the one-sample \(t_{999}\) test), the bias of the variance (difference from 1 based on the one-sample \(\chi ^2_{999}\) test), and a \(\chi ^2_{1000}\) test of standard normality (Cook et al., 2006) were then used to assess the standard normality expectation (see Note 8). As shown in Fig. 11, no parameter resulted in a statistically significant test, suggesting adequate calibration for all parameters.
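The three test statistics can be sketched as follows. This is an illustrative reading of the procedure described above, not the author's code: the function name is hypothetical, the ranks are simulated as exactly uniform, and only the statistics are computed (reference distributions are noted in comments, and p-values are omitted).

```python
import math
import random
from statistics import NormalDist, mean, variance


def rankit_statistics(ranks, n_draws):
    """Rankit-transform SBC ranks and return the three test statistics."""
    std_normal = NormalDist()
    # q_i = (r_i + 0.5) / (L + 1), then z_i = standard normal quantile of q_i.
    z = [std_normal.inv_cdf((r + 0.5) / (n_draws + 1)) for r in ranks]
    n = len(z)
    z_bar = mean(z)
    s2 = variance(z)  # sample variance with n - 1 in the denominator
    return {
        # Compare with a t distribution on n - 1 degrees of freedom.
        "t_mean": z_bar / math.sqrt(s2 / n),
        # Compare with chi-square on n - 1 degrees of freedom (variance = 1).
        "chisq_variance": (n - 1) * s2,
        # Compare with chi-square on n degrees of freedom (Cook et al., 2006).
        "chisq_normality": sum(v * v for v in z),
    }


rng = random.Random(11)
uniform_ranks = [rng.randint(0, 2000) for _ in range(1000)]
stats = rankit_statistics(uniform_ranks, n_draws=2000)
print({name: round(value, 2) for name, value in stats.items()})
```

With 1000 replications, well-calibrated ranks should give a t statistic near 0 and chi-square statistics near 999 and 1000, respectively.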

Appendix C: Simulation study of dependent correlation matrices

As mentioned in the Discussion section, we repeated the simulation study in the paper, but transformed the sample covariance matrices to correlation matrices prior to data analysis. Following from the expectation of an inverse-Wishart distribution, our model when applied to these data should return the following structured covariance matrix: \(\varvec{\Omega }(\varvec{\theta })(m_1 - p - 1)m_1^{-1}\), where \(m_1 = \varepsilon _1^{-2} + p - 1\) and \(\varepsilon _1 = \varepsilon \sqrt{(1-\rho )}\). Hence, the bias and empirical coverage rate evaluations are adjusted to reflect this. We excluded conditions where \(\rho = .75\) and \(\varepsilon = 0.05\) because analysis runtimes for the three affected conditions \((c \in \{5, 15, 25\})\) were prohibitively long. Finally, we ran only 300 replications, down from 1000 replications in the original study.
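The adjustment can be worked through numerically. The sketch below computes the scaling factor \((m_1 - p - 1)m_1^{-1}\) from this appendix and the relative bias expression from Note 7. It assumes \(p = 10\) observed variables (inferred from the 10 loadings in the figure captions) and assumes that \(m\) in Note 7 is defined analogously to \(m_1\), that is \(m = \varepsilon ^{-2} + p - 1\). The function names and the input values are illustrative only.

```python
import math


def inverse_wishart_df(eps, p):
    """Degrees of freedom implied by a dispersion value: eps^-2 + p - 1."""
    return eps ** -2 + p - 1


def correlation_scaling(eps, rho, p):
    """Factor (m1 - p - 1) / m1 that multiplies Omega(theta) in Appendix C."""
    eps1 = eps * math.sqrt(1.0 - rho)  # within-cluster dispersion
    m1 = inverse_wishart_df(eps1, p)
    return (m1 - p - 1) / m1


def note7_relative_bias(eps, rho, p):
    """Relative bias from Note 7: [m1/(m1-p-1)] / [m/(m-p-1)] - 1."""
    m = inverse_wishart_df(eps, p)
    m1 = inverse_wishart_df(eps * math.sqrt(1.0 - rho), p)
    return (m1 / (m1 - p - 1)) / (m / (m - p - 1)) - 1.0


# Illustrative inputs: eps = 0.05, rho = 0.75, p = 10 (assumed).
scaling = correlation_scaling(0.05, 0.75, 10)
rel_bias = note7_relative_bias(0.05, 0.75, 10)
print(round(scaling, 4), round(rel_bias, 4))
```

For these inputs \(m_1 = 1609\), so the scaling factor is \(1598/1609\), slightly below 1. The factor approaches 1 as \(\varepsilon _1\) shrinks, which is why the adjustment matters most when dispersion is large.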

Fig. 12 Relative bias of mean of posterior distribution for correlation simulation study. load. = 10 loading estimates, r = inter-factor correlation, ev. = 10 error variance estimates

Fig. 13 Relative bias of standard deviation of posterior distribution for correlation simulation study. load. = 10 loading estimates, r = inter-factor correlation, ev. = 10 error variance estimates

Fig. 14 Empirical coverage rate of the 90% credible interval for correlation simulation study. load. = 10 loading estimates, r = inter-factor correlation, ev. = 10 error variance estimates

Fig. 15 Recovery of dispersion parameters for correlation simulation study. Within = \(\varepsilon _1\), Between = \(\varepsilon _2\), % between = \(\rho \)

Results are reported in Figs. 12, 13, and 14. Most parameters had acceptable levels of bias (absolute relative bias below 10%), apart from \(\varepsilon \), which was sometimes downwardly biased (especially when \(\varepsilon = 0.05\)). We believe this downward bias occurs because converting a covariance matrix to a correlation matrix eliminates important variation, given the data generation process. Posterior standard deviations were often upwardly biased, especially for loading parameters. This suggests overly conservative inference, and resulted in higher than nominal coverage rates, especially for loading parameters. Coverage for \(\varepsilon \) was always low given the downward parameter bias, and there were also instances of under-coverage for loading parameters when high values of \(\varepsilon \) were combined with a larger number of clusters. This under-coverage likely occurs despite the inflated posterior standard deviations because some parameter bias remains while posterior standard deviations shrink at larger sample sizes. Finally, as with the original simulation study, there were problems estimating the dispersion parameters (see Fig. 15).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Uanhoro, J.O. Handling dependent samples in meta-analytic structural equation models: A Wishart-based approach. Behav Res (2024). https://doi.org/10.3758/s13428-024-02340-4

  • DOI: https://doi.org/10.3758/s13428-024-02340-4