
Comparing early and late data fusion methods for gene expression prediction

  • Focus
Soft Computing

Abstract

The most basic molecular mechanism that enables a living cell to adapt dynamically to variation in its intracellular and extracellular environment is its ability to regulate the expression of many of its genes. At the biomolecular level, this ability is mainly due to interactions between transcription factors and regulatory motifs located in the core promoter regions. A crucial question investigated in recently published works is whether, and to what extent, the transcription patterns of large sets of genes can be predicted using only information encoded in the promoter regions. Although encouraging results have been obtained in experiments on predicting gene expression patterns, the assumption that all the signals required for the regulation of gene expression are contained in the gene promoter regions is an oversimplification, as shown by recent findings demonstrating that many regulatory levels are involved in the fine modulation of gene transcription. In this contribution, we investigate the potential improvement in gene expression prediction performance achievable with early and late data integration methods, in order to provide a complete overview of the capabilities of data fusion approaches in a problem that can be counted among the most difficult in modern bioinformatics.
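The difference between the two integration strategies named in the abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: the two data views (`promoter` and `histone`), their toy values, and the nearest-centroid classifier are hypothetical stand-ins and are not taken from the article. Early fusion concatenates the feature vectors of all views and trains a single classifier; late fusion trains one classifier per view and combines their output scores, here by simple averaging.

```python
import math

def fit_centroids(X, y):
    """Toy 'training' step: one mean feature vector per class label."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def class_scores(model, x):
    """Softmax over negative squared distances to each class centroid."""
    neg = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) for c, mu in model.items()}
    top = max(neg.values())
    exp = {c: math.exp(v - top) for c, v in neg.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}

def predict_early(model, views_x):
    """Early fusion: concatenate the views of one sample, score once."""
    x = [v for view in views_x for v in view]
    scores = class_scores(model, x)
    return max(scores, key=scores.get)

def predict_late(models, views_x):
    """Late fusion: score each view with its own model, average the scores."""
    per_view = [class_scores(m, x) for m, x in zip(models, views_x)]
    avg = {c: sum(s[c] for s in per_view) / len(per_view) for c in per_view[0]}
    return max(avg, key=avg.get)

# Hypothetical toy data: two views of the same four genes, two expression classes.
promoter = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]  # assumed motif features
histone = [[0.1], [0.0], [0.9], [1.0]]                        # assumed chromatin feature
labels = [0, 0, 1, 1]

# Early fusion: a single model trained on concatenated feature vectors.
early_model = fit_centroids([p + h for p, h in zip(promoter, histone)], labels)

# Late fusion: one model per view, combined only at prediction time.
late_models = [fit_centroids(promoter, labels), fit_centroids(histone, labels)]

query = ([0.9, 1.0], [0.8])  # one new gene described by both views
print(predict_early(early_model, query), predict_late(late_models, query))
```

Averaging is only one possible combination rule for late fusion; weighted voting or a second-level classifier trained on the per-view scores would fit the same structure.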



Acknowledgments

The author gratefully acknowledges partial support by the PASCAL2 Network of Excellence under EC grant no. 216886. This publication only reflects the author's views. The author would also like to expressly thank Giorgio Valentini for reviewing early versions of the manuscript.

Author information


Corresponding author

Correspondence to Matteo Re.


About this article

Cite this article

Re, M. Comparing early and late data fusion methods for gene expression prediction. Soft Comput 15, 1497–1504 (2011). https://doi.org/10.1007/s00500-010-0599-6


  • DOI: https://doi.org/10.1007/s00500-010-0599-6
