Abstract
The inference of genetic regulatory networks (GRNs) reveals how genes interact with each other. A few genes can regulate many genes as targets to control cell functions. We present new methods based on the order-1 vector autoregression (VAR1) for inferring GRNs from gene expression time series. The methods use the automatic relevance determination (ARD) to incorporate the regulatory hub structure into the estimation of VAR1 in a Bayesian framework. Several sparse approximation schemes are applied to the estimated regression weights or VAR1 model to generate the sparse weighted adjacency matrices representing the inferred GRNs. We apply the proposed and several widespread reference methods to infer GRNs with up to 100 genes using simulated, DREAM4 in silico and experimental E. coli gene expression time series. We show that the proposed methods are efficient on simulated hub GRNs and scale-free GRNs using short time series simulated by VAR1s and outperform reference methods on small-scale DREAM4 in silico GRNs and E. coli GRNs. They can utilize the known major regulatory hubs to improve the performance on larger DREAM4 in silico GRNs and E. coli GRNs. The impact of nonlinear time series data on the performance of proposed methods is discussed.
-
Author contribution: The author has accepted responsibility for the entire content of this submitted manuscript and approved submission.
-
Research funding: None declared.
-
Conflict of interest statement: The author declare no conflicts of interest regarding this article.
A.1 Estimation of MR by the ARD
We first present the conditional posterior probability density function and marginal log-likelihood function of parameters. From the expressions in Eqs. (2) and (3),
By completing the squares of terms in the exponent of exponential form in Eq. (A1), the conditional posterior over W is a matrix Gaussian distribution with the probability density function
where
The partial derivatives of
The partial derivatives of ln π(α), ln π(β) with respect to parameters are expressed as
where π(α), π(β) are in Eq. (4).
The update rules of α, β of iterative process for solving the maximization problem in Eq. (6) are obtained by setting the partial derivatives of Λ(α, β) with respect to parameters in α, β equal 0, respectively.
Initial values: |
δ′ ← −∞ |
(α, β) ← (α 0, β 0) |
|
δ ” ←Λ(α, β) |
Iterative updates: |
while δ ” − δ′ > δ do |
δ′ ← δ″ |
|
|
|
δ″ ←Λ(α, β) |
end while |
return
|
A.2 Estimation of GEN
The minimization problem in Eq. (10) is equivalent to
where
where ρ > 0 is the penalty constant and
Initial value: |
|
Iterative updates: |
while
|
|
|
V′ ←V′ + η ⋅ (W − V) |
end while |
return
|
Here, η > 0 is the step size, S
d
(⋅) is entry-wise soft thresholding function with the threshold d > 0. The soft thresholding function of
A.3 Numerical implementations of GRN inference
To generate the candidate solutions for (α, β, W), the tuning parameter space Γ contains the 256 combinations of s
α
= 2.5, 5, 7.5, 10 × 1
p×1, r
α
= 10−3, 10−2, 10−1, 1 × 1
p×1, s
β
= 0.5, 1, 1.5, 2, r
β
= 10−3, 10−2, 10−1, 1. The initial value and stopping tolerance of Algorithm I are
References
Aalto, A., Viitasaari, L., Ilmonen, P., Mombaerts, L., and Goncalves, J. (2020). Gene regulatory network inference from sparsely sampled noisy data. Nat. Commun. 11: 3493. https://doi.org/10.1038/s41467-020-17217-1.Search in Google Scholar PubMed PubMed Central
Albert, R. (2005). Scale-free networks in cell biology. J. Cell Sci. 118: 4947–4957. https://doi.org/10.1242/jcs.02714.Search in Google Scholar PubMed
Alexa, A., Rahnenfuhrer, J., and Lengauer, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600–1607. https://doi.org/10.1093/bioinformatics/btl140.Search in Google Scholar PubMed
Bansal, M., Belcastro, V., Ambesi-Impiombato, A., and di Bernardo, D. (2007). How to infer gene networks from expression profiles. Mol. Syst. Biol. 3: 78. https://doi.org/10.1038/msb4100120.Search in Google Scholar PubMed PubMed Central
Barabasi, A.-L. and Oltvai, Z.N. (2004). Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5: 101–113. https://doi.org/10.1038/nrg1272.Search in Google Scholar PubMed
Bock, M., Ogishima, S., Tanaka, H., Kramer, S., and Kaderali, L. (2012). Hub-centered gene network reconstruction using automatic relevance determination. PLoS One 7: e35077. https://doi.org/10.1371/journal.pone.0035077.Search in Google Scholar PubMed PubMed Central
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3: 1–122. https://doi.org/10.1561/2200000016.Search in Google Scholar
Butte, A.J. and Kohane, I.S. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput. 418–429. https://doi.org/10.1142/9789814447331_0040.Search in Google Scholar PubMed
Charbonnier, C., Chiquet, J., and Ambroise, C. (2010). Weighted-LASSO for structured network inference from time course data. Stat. Appl. Genet. Mol. Biol. 9: 15. https://doi.org/10.2202/1544-6115.1519.Search in Google Scholar PubMed
Che, D., Guo, S., Jiang, Q., and Chen, L. (2020). PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinf. 21: 308. https://doi.org/10.1186/s12859-020-03639-7.Search in Google Scholar PubMed PubMed Central
Chen, G., Larsen, P., Almasri, E., and Dai, Y. (2008). Rank-based edge reconstruction for scale-free genetic regulatory networks. BMC Bioinf. 9: 75. https://doi.org/10.1186/1471-2105-9-75.Search in Google Scholar
Cornish, A.J. and Markowetz, F. (2014). SANTA: quantifying the functional content of molecular networks. PLoS Comput. Biol. 10: e1003808. https://doi.org/10.1371/journal.pcbi.1003808.Search in Google Scholar
Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. Int. J. Complex Syst.: 1695.Search in Google Scholar
D’haeseleer, P., Wen, X., Fuhrman, S., and Somogyi, R. (1998). Mining the gene expression matrix: Inferring gene relationships from large scale gene expression data. Boston, MA: Springer.10.1007/978-1-4615-5345-8_22Search in Google Scholar
Edgar, R., Domrachev, M., and Lash, A.E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30: 207–210. https://doi.org/10.1093/nar/30.1.207.Search in Google Scholar
Emmert-Streib, F., Dehmer, M., and Haibe-Kains, B. (2014). Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front. Cell Dev. Biol. 2: 38. https://doi.org/10.3389/fcell.2014.00038.Search in Google Scholar
Freeman, L.C. (1979). Centrality in social networks conceptual clarification. Soc. Network. 1: 215–239.10.1016/0378-8733(78)90021-7Search in Google Scholar
Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Comput. Biol. Med. 7: 601–620. https://doi.org/10.1089/106652700750050961.Search in Google Scholar PubMed
Fujita, A., Sato, J.R., Garay-Malpartida, H.M., Yamaguchi, R., Miyano, S., Sogayar, M.C., and Ferreira, C.E. (2007). Modeling gene expression regulatory networks with the sparse vector autoregressive model. BMC Syst. Biol. 1: 39. https://doi.org/10.1186/1752-0509-1-39.Search in Google Scholar PubMed PubMed Central
Gama-Castro, S., Salgado, H., Santos-Zavaleta, A., Ledezma-Tejeida, D., Muniz-Rascado, L., Garcia-Sotelo, J.S., Alquicira-Hernandez, K., Martinez-Flores, I., Pannier, L., Castro-Mondragon, J.A., et al.. (2016). RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res. 44: D133–D143. https://doi.org/10.1093/nar/gkv1156.Search in Google Scholar PubMed PubMed Central
Granger, C.W.J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37: 424–438. https://doi.org/10.2307/1912791.Search in Google Scholar
Hecker, M., Lambeck, S., Toepfer, S., van Someren, E., and Guthke, R. (2009). Gene regulatory network inference: data integration in dynamic models-a review. BioSystems 96: 86–103. https://doi.org/10.1016/j.biosystems.2008.12.004.Search in Google Scholar PubMed
Hube, W., Carey, V.J., Gentleman, R., and Anders, S. (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12: 115–121. https://doi.org/10.1038/nmeth.3252.Search in Google Scholar PubMed PubMed Central
Huynh-Thu, V.A. and Geurts, P. (2018). dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data. Sci. Rep. 8: 3384. https://doi.org/10.1038/s41598-018-21715-0.Search in Google Scholar PubMed PubMed Central
Huynh-Thu, V.A., Irrthum, A., Wehenkel, L., and Geurts, P. (2010). Inferring regulatory networks from expression data using tree-based methods. PLoS One 5: e12776. https://doi.org/10.1371/journal.pone.0012776.Search in Google Scholar PubMed PubMed Central
Kaderali, L., Zander, T., Faigle, U., Wolf, J., Schultze, J.L., and Schrader, R. (2006). CASPAR: a hierarchical Bayesian approach to predict survival times in cancer from gene expression data. Bioinformatics 22: 1495–1502. https://doi.org/10.1093/bioinformatics/btl103.Search in Google Scholar PubMed
Kramer, N., Schafer, J., and Boulesteix, A.L. (2009). Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinf. 10: 384. https://doi.org/10.1186/1471-2105-10-384.Search in Google Scholar PubMed PubMed Central
Langfelder, P. and Horvath, S. (2007). Eigengene networks for studying the relationships between co-expression modules. BMC Syst. Biol. 1: 54. https://doi.org/10.1186/1752-0509-1-54.Search in Google Scholar PubMed PubMed Central
Langfelder, P. and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 9: 559. https://doi.org/10.1186/1471-2105-9-559.Search in Google Scholar PubMed PubMed Central
Lo, K., Raftery, A.E., Dombek, K.M., Zhu, J., Schadt, E.E., Bumgarner, R.E., and Yeung, K.Y. (2012). Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst. Biol. 6: 101. https://doi.org/10.1186/1752-0509-6-101.Search in Google Scholar PubMed PubMed Central
Marbach, D., Schaffter, T., Mattiussi, C., and Floreano, D. (2009). Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J. Comput. Biol. 16: 229–239. https://doi.org/10.1089/cmb.2008.09tt.Search in Google Scholar PubMed
Margolin, A.A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Dalla Favera, R., and Califano, A. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinf. 7: S7. https://doi.org/10.1186/1471-2105-7-S1-S7.Search in Google Scholar PubMed PubMed Central
Meyer, P.E., Kontos, K., Lafitte, F., and Bontempi, G. (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP J. Bioinf. Syst. Biol. 79879. https://doi.org/10.1155/2007/79879.Search in Google Scholar PubMed PubMed Central
Meyer, P.E., Lafitte, F., and Bontemp, G. (2008). MINET: an open source R/Bioconductor package for mutual information based network inference. BMC Bioinf. 9: 461. https://doi.org/10.1186/1471-2105-9-461.Search in Google Scholar PubMed PubMed Central
Michailidis, G. and d’Alche-Buc, F. (2013). Autoregressive models for gene regulatory network inference: sparsity, stability and causality issues. Math. Biosci. 246: 326–334. https://doi.org/10.1016/j.mbs.2013.10.003.Search in Google Scholar PubMed
Nagarajan, R., Scutari, M., and Lebre, S. (2013). Bayesian networks in R with applications in systems biology. New York: Springer.10.1007/978-1-4614-6446-4Search in Google Scholar
Nepomuceno-Chamorro, I.A., Aguilar-Ruiz, J.S., and Riquelme, J.C. (2010). Inferring gene regression networks with model trees. BMC Bioinf. 11: 517. https://doi.org/10.1186/1471-2105-11-517.Search in Google Scholar PubMed PubMed Central
Nonaka, G., Blankschien, M., Herman, C., Gross, C.A., and Rhodius, V.A. (2006). Regulon and promoter analysis of the E. coli heat-shock factor, sigma32, reveals a multifaceted cellular response to heat stress. Genes Dev. 20: 1776–1789. https://doi.org/10.1101/gad.1428206.Search in Google Scholar PubMed PubMed Central
Omony, J. (2014). Biological Network Inference: a review of methods and assessment of tools and techniques. Annu. Res. Rev. Biol. 4: 577–601. https://doi.org/10.9734/arrb/2014/5718.Search in Google Scholar
Penfold, C.A. and Wild, D.L. (2011). How to infer gene networks from expression profiles, revisited. Interface Focus 1: 857–870. https://doi.org/10.1098/rsfs.2011.0053.Search in Google Scholar PubMed PubMed Central
Prim, R.C. (1957). Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36: 1389–1401. https://doi.org/10.1002/j.1538-7305.1957.tb01515.x.Search in Google Scholar
R Core Team (2021). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.Search in Google Scholar
Schaefer, J., Opgen-Rhein, R., and Strimmer, K. (2015). GeneNet: modeling and inferring gene networks. R package version 1.2.13.Search in Google Scholar
Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 4: 32. https://doi.org/10.2202/1544-6115.1175.Search in Google Scholar PubMed
Scutari, M. (2010). Learning Bayesian networks with the bnlearn R package. J. Stat. Software 35: 1–22. https://doi.org/10.18637/jss.v035.i03.Search in Google Scholar
Shannon, P. (2021). DREAM4: synthetic expression data for gene regulatory network inference from the 2009 DREAM4 challenge. R package version 1.30.0.Search in Google Scholar
Shimamura, T., Imoto, S., Yamaguchi, R., Fujita, A., Nagasaki, M., and Miyano, S. (2009). Recursive regularization for inferring gene networks from time-course gene expression profiles. BMC Syst. Biol. 3: 41.https://doi.org/10.1186/1752-0509-3-41.Search in Google Scholar PubMed PubMed Central
Sulaimanov, N., Kumar, S., Burdet, F., Ibberson, M., Pagni, M., and Koeppl, H. (2019). Inferring gene expression networks with hubs using a degree weighted Lasso approach. Bioinformatics 35: 987–994. https://doi.org/10.1093/bioinformatics/bty716.Search in Google Scholar PubMed
Tan, K.M., London, P., Mohan, K., Lee, S.I., Fazel, M., and Witten, D. (2014). Learning graphical models with hubs. J. Mach. Learn. Res. 15: 3297–3331.Search in Google Scholar
Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1: 211–244.Search in Google Scholar
Tsai, M.J., Wang, J.R., Ho, S.J., Shu, L.S., Huang, W.L., and Ho, S.Y. (2020). GREMA: modelling of emulated gene regulatory networks with confidence levels based on evolutionary intelligence to cope with the underdetermined problem. Bioinformatics 36: 3833–3840. https://doi.org/10.1093/bioinformatics/btaa267.Search in Google Scholar PubMed
Werhli, A.V. and Husmeier, D. (2007). Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat. Appl. Genet. Mol. Biol. 6: 15. https://doi.org/10.2202/1544-6115.1282.Search in Google Scholar PubMed
Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4: 17. https://doi.org/10.2202/1544-6115.1128.Search in Google Scholar PubMed
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B Stat. Methodol. 67: 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x.Search in Google Scholar
© 2021 Walter de Gruyter GmbH, Berlin/Boston