Abstract
The availability of large structured datasets has prompted the need for efficient data analysis and modeling techniques. In systems biology, data-driven modeling approaches create models of complex cellular systems without making assumptions about the underlying mechanisms. In this chapter, we discuss eigenvalue-based approaches, which identify important characteristics (information) of big datasets through decomposition and dimensionality reduction. We address singular value decomposition (SVD), principal component analysis (PCA), and partial least squares regression (PLSR) approaches for data-driven modeling. In multi-linear systems (those that share characteristics such as time points and measurements), tensor decomposition becomes particularly important for understanding higher-order datasets. We therefore also discuss how to scale these methods up to tensor decomposition, using an example dealing with host-cell responses to viral infection.
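As a rough illustration of what "eigenvalue-based" dimensionality reduction means here, the sketch below extracts the first principal component of a small data matrix. It is not taken from the chapter: the function name `first_principal_component` and the toy `data` values are hypothetical, and plain power iteration on the sample covariance matrix stands in for the SVD routines a numerical library would normally provide.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading eigenpair of the sample covariance matrix via power iteration."""
    n, p = len(rows), len(rows[0])

    # Mean-center each column (one column per measured variable).
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration: repeated multiplication by the covariance matrix
    # rotates the vector toward the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient: the eigenvalue equals the variance along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue, eigenvalue / total_variance


# Hypothetical toy data: 4 observations of 3 variables, where the second
# variable is roughly twice the first and the third is nearly constant.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.0, 0.9],
    [4.0, 7.9, 1.1],
    [5.0, 10.1, 1.0],
]

v, eigenvalue, fraction = first_principal_component(data)
print("loadings:", [round(x, 3) for x in v])
print("fraction of variance explained:", round(fraction, 4))
```

Because two of the three toy variables are strongly correlated, a single component captures almost all of the variance, which is the sense in which such decompositions reduce dimensionality. For real datasets, an SVD of the centered data matrix would usually be preferred for numerical stability.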
Conflict of Interest
The authors declare that they have no conflict of interest.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Shah, M., Chitforoushzadeh, Z., Janes, K.A. (2016). Statistical Data Analysis and Modeling. In: Geris, L., Gomez-Cabrero, D. (eds) Uncertainty in Biology. Studies in Mechanobiology, Tissue Engineering and Biomaterials, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-21296-8_6
DOI: https://doi.org/10.1007/978-3-319-21296-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21295-1
Online ISBN: 978-3-319-21296-8
eBook Packages: Engineering, Engineering (R0)