Abstract
The identification of coexpressed genes from microarray data is a challenging problem in bioinformatics and computational biology. The objective of this study is to obtain knowledge about the most important genes and gene clusters related to the production outputs of real-world time-series microarray data in industrial microbiology. Each sample in the microarray experiment is complemented by measurements of the corresponding production and growth values. A novel aspect of this research is that the relation between coexpression patterns and the measured outputs is used to guide the biological interpretation of the results. Shape-based clustering models are developed from the pattern of gene expression values over time, further incorporating knowledge about the correlation between the change in gene expression level and the output value. Experiments are performed on time-series microarray data from bacteria, and the results are analyzed from a biological perspective. The results confirm the existence of relationships between the output variables and gene expression. Moreover, the shape-based clustering methods show promising results, as they can guide metabolic engineering actions by identifying potential targets.
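The abstract does not specify how expression shapes or output relations are computed. The Python sketch below illustrates one plausible reading under stated assumptions: a gene's "shape" is taken to be the sequence of up (U), down (D), or steady (S) moves between consecutive time points, using a hypothetical tolerance `tol`, and the link to a production output is taken to be the Pearson correlation between the first differences of expression and of the output. All function names and the toy data are hypothetical; this is not the authors' published algorithm.

```python
import numpy as np
from collections import defaultdict


def shape_profile(series, tol=0.1):
    """Encode a time series as 'U'/'D'/'S' moves between consecutive points.
    tol is a hypothetical threshold below which a change counts as steady."""
    diffs = np.diff(np.asarray(series, dtype=float))
    return "".join("U" if d > tol else "D" if d < -tol else "S" for d in diffs)


def shape_clusters(expr, tol=0.1):
    """Group genes whose time series share exactly the same shape profile."""
    clusters = defaultdict(list)
    for gene, series in expr.items():
        clusters[shape_profile(series, tol)].append(gene)
    return dict(clusters)


def change_output_correlation(series, output):
    """Pearson correlation between changes in expression and changes in output.
    Returns 0.0 when either difference series is constant (undefined correlation)."""
    de = np.diff(np.asarray(series, dtype=float))
    do = np.diff(np.asarray(output, dtype=float))
    if np.std(de) == 0 or np.std(do) == 0:
        return 0.0
    return float(np.corrcoef(de, do)[0, 1])


def rank_clusters(expr, output, tol=0.1):
    """Rank shape clusters by the absolute mean change-output correlation of their genes."""
    scored = []
    for shape, genes in shape_clusters(expr, tol).items():
        r = float(np.mean([change_output_correlation(expr[g], output) for g in genes]))
        scored.append((shape, genes, r))
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)


# Hypothetical toy data: four time points per gene and one output series.
expr = {
    "g1": [1.0, 2.0, 3.0, 2.5],
    "g2": [0.5, 1.0, 1.8, 1.0],
    "g3": [3.0, 2.0, 1.0, 1.05],
}
output = [0.0, 1.0, 2.0, 1.5]

print(shape_clusters(expr))
for shape, genes, r in rank_clusters(expr, output):
    print(shape, genes, round(r, 3))
```

Here genes g1 and g2 fall in the same "UUD" cluster, while g3 forms its own cluster whose changes are perfectly anti-correlated with the output changes. Correlating differences rather than raw levels is one way to reflect "the change in the gene expression level"; other choices (lagged or rank correlations) would be equally reasonable assumptions.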
Notes
The microarray data set obtained and used in this paper is available on request for academic purposes.
Acknowledgments
This research has been supported by the Spanish Ministry of Science and Innovation under Project TIN2014-56967-R and by the Junta de Castilla y León under Project BIO/BU01/15.
Ethics declarations
Conflict of interest
C. Chira, J. Sedano, J. R. Villar, M. Camara and C. Prieto declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by A. Herrero.
About this article
Cite this article
Chira, C., Sedano, J., Villar, J.R. et al. Gene clustering for time-series microarray with production outputs. Soft Comput 20, 4301–4312 (2016). https://doi.org/10.1007/s00500-016-2299-3