Abstract
High-throughput biological data analysis has attracted considerable interest over the last decade, driven by technologies that automatically generate large-scale datasets by performing millions of analytical tests daily. Here we present a new network-based approach to analyzing a high-throughput phenomic dataset collected on maize inbreds and hybrids by an automated phenotyping facility. Our dataset consists of 1600 biological samples from 600 different genotypes (200 inbred and 400 hybrid lines). On each sample, 141 phenotypic traits were observed for 33 days. We apply a graph-theoretic approach to address two important problems: (i) discovering meaningful patterns in the dataset and (ii) predicting hybrid performance in terms of biomass from automatically collected phenotypic traits. We propose a modelling framework in which the prediction problem is transformed into finding the shortest path in a correlation-based network. Preliminary results show small but encouraging correlations between predicted and observed biomass. Extensions of the algorithm and applications of the modelling framework to other types of biological data are discussed.
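To make the idea of a shortest path in a correlation-based network concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the method used in the chapter: the abstract does not specify how the network is built or how a path is turned into a biomass prediction. Here we assume that nodes are traits, that an edge joins two traits whose absolute Pearson correlation reaches a hypothetical `threshold`, and that the edge weight is `1 - |r|`, so strongly correlated traits are "close". The trait names and values are invented toy data.

```python
import heapq
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def build_correlation_network(traits, threshold=0.5):
    """Build an undirected weighted graph from a dict {trait name: values}.

    Assumption for this sketch: an edge exists when |r| >= threshold and
    its weight is 1 - |r| (clamped at 0 to guard against rounding error).
    """
    names = list(traits)
    graph = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(traits[a], traits[b])
            if abs(r) >= threshold:
                weight = max(0.0, 1.0 - abs(r))
                graph[a][b] = weight
                graph[b][a] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (path, distance) or (None, inf) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


if __name__ == "__main__":
    # Invented toy measurements: one value per sample for each trait.
    toy_traits = {
        "height": [10.0, 12.0, 15.0, 19.0, 22.0],
        "leaf_area": [3.0, 3.9, 5.2, 6.1, 7.4],
        "biomass": [1.1, 1.3, 1.9, 2.2, 2.9],
    }
    network = build_correlation_network(toy_traits, threshold=0.5)
    print(shortest_path(network, "height", "biomass"))
```

The weighting `1 - |r|` is only one possible choice; any monotone decreasing function of the correlation strength would give a network in which short paths follow chains of strongly associated traits.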
Alberto Castellini and Christian Edlich-Muth contributed equally to this work.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Castellini, A., Edlich-Muth, C., Muraya, M., Klukas, C., Altmann, T., Selbig, J. (2015). Towards a Graph-Theoretic Approach to Hybrid Performance Prediction from Large-Scale Phenotypic Data. In: Lones, M., Tyrrell, A., Smith, S., Fogel, G. (eds) Information Processing in Cells and Tissues. IPCAT 2015. Lecture Notes in Computer Science, vol 9303. Springer, Cham. https://doi.org/10.1007/978-3-319-23108-2_15
DOI: https://doi.org/10.1007/978-3-319-23108-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23107-5
Online ISBN: 978-3-319-23108-2
eBook Packages: Computer Science, Computer Science (R0)