MiCoViTo: a tool for gene-centric comparison and visualization of yeast transcriptome states

1 Laboratoire de Génétique Moléculaire CNRS UMR8541, Ecole Normale Supérieure, Paris, 75230 Cedex 05, France
2 Equipe de Bioinformatique Génomique et Moléculaire INSERM E346, Université Paris 7, Paris, 75231 Cedex 05, France
3 Present address: Lipper Center for Computational Genetics and Department of Genetics, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
BMC Bioinformatics 2004, 5:20 doi:10.1186/1471-2105-5-20
The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/5/20
© 2004 Lelandais et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Keywords: microarray, functional categories, Saccharomyces cerevisiae, clustering

Abstract

Background
Information obtained by DNA microarray technology gives a rough snapshot of the transcriptome state, i.e., the expression level of all the genes expressed in a cell population at any given time. One of the challenging questions raised by the tremendous amount of microarray data is to identify groups of co-regulated genes and to understand their role in cell functions.

Results
MiCoViTo (Microarray Comparison Visualization Tool) is a set of biologists' tools for exploring, comparing and visualizing changes in the yeast transcriptome by a gene-centric approach. A relational database includes data linked to genome expression and graphical output makes it easy to visualize clusters of co-expressed genes in the context of available biological information. To this aim, upload of personal data is possible and microarray data from fifty publications dedicated to S. cerevisiae are provided on-line. A web interface guides the biologist during the usage of this tool and is freely accessible at http://www.transcriptome.ens.fr/micovito/.

Conclusions
MiCoViTo offers an easy-to-read picture of local transcriptional changes connected to current biological knowledge. This should help biologists to mine yeast microarray data and better understand the underlying biology. We plan to add functional annotations from other organisms. That would allow inter-species comparison of transcriptomes via orthology tables.

Background
Although the genome is mostly invariant in each cell of an organism, genes can have different expression patterns related to environmental conditions or developmental programs.
For a given genome, different transcriptome states can be observed, depending on complex networks regulating adaptation and homeostasis. Fundamental questions about the topology of networks such as the protein interactome [1], the metabolome [2] or transcriptional regulation networks [3,4] have been previously addressed. Interfaces like KEGG [5], Biocyc [6] or GenMapp [7] help biologists to put these results into a cellular context by mapping expression states onto metabolic network representations. Moreover, different tools like Cytoscape [8] or Osprey [9] are available to visualize and edit networks. To understand the biology of the studied systems better, the trend is clearly towards the aggregation of multiple sources of biological information.

This article focuses on the analysis of transcriptome states. Numerous microarray gene expression datasets are now available, giving the opportunity to get a picture of the transcriptome state in various cellular conditions. Various methods already exist to mine compendia of transcriptome states and to try to understand their biological meaning. One of the most successful is based on the assumption that genes having similar expression profiles across a set of conditions are likely to be involved in the same biological process [10]. Many approaches, including multivariate analysis, hierarchical clustering and self-organizing maps (SOM), have been used to find patterns of expression and to create biologically relevant clusters (see [11] for review). By analysing whole microarray datasets, these "global" clustering approaches have already led to the formulation of interesting testable predictions [12]. However, it has been previously emphasized that classical clustering methods can be inefficient on large numbers of biologically unrelated datasets [13]. Indeed, in response to environmental changes, gene expression is modified only in a fraction of the transcriptome, and the signal is hence diluted over the whole dataset.
Few methods have been proposed to find these regulatory sub-signatures [13,14], and ultimately the most reliable way to go deeper into the data and capture interesting trends is to be an expert in the field. That is why tools allowing biologists to mine microarray results, in order to find such expression modifications in a sub-set of genes related to their area of expertise, are highly desirable. Such "gene-centric" clustering analysis, which distinguishes differentially-expressed genes in specific parts of the transcriptome (centred around a "seed gene"), should overcome some drawbacks of global analysis approaches. The ideal tool would be gene-centric and would provide understandable outputs mapping biological knowledge onto results. We present here MiCoViTo, a user-friendly tool for identifying and visualizing groups of genes having similar expression in two sets of microarray experiments representing two distinct transcriptome states.

Implementation

Principle
A given transcriptome state can be represented as a network where genes are joined pairwise by a weighted link proportional to a similarity measure between their corresponding expression profiles. The basic idea is therefore to compare the immediate transcriptome neighbourhood of a given gene (referred to hereafter as the seed gene) in two sets of microarray experiments describing two distinct transcriptome states. By neighbourhood, we mean genes whose expression profiles are closely related to that of the seed according to a chosen metric. By relaxing the distance criterion step by step, several neighbourhood levels, corresponding to larger and larger parts of the transcriptome state, can be defined. Thus, for each set of microarray experiments to be compared, each gene is assigned to the neighbourhood level reflecting the distance between its expression profile and the expression profile of the seed gene.
In a second step, all neighbourhood level intersections are computed and arranged in a table (Figure 1, A). In this table, the upper left groups contain genes whose expression is similar to the seed profile in both sets of experiments, while the upper right and the lower left groups include genes that have very close expression profiles in only one set of experiments. Only the upper left part of the table is of interest, as we do not want to consider genes that are co-expressed with the seed gene in neither of the two transcriptome states. Therefore, the lower right part of the table is not analysed and is coloured in grey (Figure 1, A).
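The two steps above (assigning each gene to a neighbourhood level, then crossing the levels of the two states) can be illustrated with a minimal Python sketch. It is not MiCoViTo's code: the function names, the threshold values and the gene labels are hypothetical, distances to the seed are assumed to be already computed, and the choice to discard only the cell where a gene lies outside every level in both states is an assumption about the table layout.

```python
from bisect import bisect_left


def neighbourhood_level(distance, thresholds):
    """Index of the tightest threshold that `distance` satisfies.

    `thresholds` must be sorted in increasing order. A return value equal
    to len(thresholds) means the gene lies outside every neighbourhood level.
    """
    return bisect_left(thresholds, distance)


def intersection_table(dist_a, dist_b, thresholds):
    """Cross the neighbourhood levels of two transcriptome states.

    dist_a, dist_b: dicts mapping gene name -> distance to the seed profile.
    Returns a square table of gene sets; rows are levels in state A,
    columns are levels in state B, with index 0 closest to the seed.
    """
    outside = len(thresholds)
    size = outside + 1  # extra row/column for genes outside all levels
    table = [[set() for _ in range(size)] for _ in range(size)]
    for gene in dist_a.keys() & dist_b.keys():
        row = neighbourhood_level(dist_a[gene], thresholds)
        col = neighbourhood_level(dist_b[gene], thresholds)
        if row == outside and col == outside:
            continue  # co-expressed with the seed in neither state
        table[row][col].add(gene)
    return table


if __name__ == "__main__":
    # Hypothetical distances to the seed in two transcriptome states.
    state_a = {"g1": 0.05, "g2": 0.2, "g3": 0.9, "g4": 0.05}
    state_b = {"g1": 0.08, "g2": 0.95, "g3": 0.9, "g4": 0.25}
    for row in intersection_table(state_a, state_b, [0.1, 0.3]):
        print([sorted(cell) for cell in row])
```

In this toy run, `g1` falls in the upper left cell (close to the seed in both states), `g2` is close to the seed only in state A, and `g3` is dropped because it is far from the seed in both.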
Note that MiCoViTo has been designed so that different types of comparisons can be performed, according to the biological question at hand. The neighbourhood comparison of one seed in two transcriptome states is described above (Figure 1). Using the same principle, more elaborate comparisons can be performed, such as the neighbourhood comparison of two distinct seeds in the same transcriptome state (Figure 2, A).
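The principle relies on a chosen metric between expression profiles, and the Technical information section names three: Pearson distance, squared Pearson distance and Euclidean distance. The sketch below shows one common way to define them in Python. The exact formulas used by MiCoViTo are not given in the text, so defining the Pearson distance as 1 - r and the squared Pearson distance as 1 - r² is an assumption, and all function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """Assumed definition: 1 - r (0 = same shape, 2 = opposite shape)."""
    return 1.0 - pearson_r(x, y)


def squared_pearson_distance(x, y):
    """Assumed definition: 1 - r**2 (correlated and anti-correlated are both close)."""
    return 1.0 - pearson_r(x, y) ** 2


def euclidean_distance(x, y):
    """Straight-line distance between two profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distances_to_seed(profiles, seed, metric):
    """Distance from the seed profile to every other gene's profile.

    profiles: dict mapping gene name -> list of expression values.
    """
    seed_profile = profiles[seed]
    return {
        gene: metric(seed_profile, profile)
        for gene, profile in profiles.items()
        if gene != seed
    }
```

Under these assumed definitions, the squared Pearson distance places genes that are anti-correlated with the seed in its neighbourhood as well, whereas the Euclidean distance is sensitive to the amplitude of the profiles and not only to their shape.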
Visualization and mining of isolated clusters
In order to capture the biological meaning of gene clusters generated by MiCoViTo, it is necessary as a first step to visualize them in the context of current biological knowledge. Later options for studying promising clusters are to map lists of genes onto metabolic pathways, to look for co-regulation in another expression dataset or to look for common cis-regulatory elements in promoters. Using a relational database system storing yeast MIPS functional information and external links, MiCoViTo provides access to all this information. For each neighbourhood intersection constructed as detailed above, a pie chart representing the gene distribution in one of the MIPS catalogues [15] (functional classification, protein class, EC number, PROSITE motifs, mutant phenotype and complexes catalogue) is displayed (Figure 1, B). When the gene listing for one cluster is requested, additional information is available, including direct links to the individual gene description pages of SGD [16] and MIPS [15]. Furthermore, full gene lists can be posted directly to other online tools like KEGG for metabolism mapping [5], RSA tools for the discovery of cis-regulatory motifs [17], SGD for Gene Ontology term mapping [16] or the yMGV database for expression mining [18].

Datasets available
Microarray data standards are now available [19] and public repositories have been constructed [20,21]. The community effort will ensure that more and more clean and formatted datasets become available for this kind of study. Here we use data from yMGV [18], the largest available yeast expression database. More than fifty previously-published microarray datasets are provided, giving a fair coverage of possible yeast transcriptome states (a listing is available on the web site). Users can also upload files onto the site to compare their results with published data, or use MiCoViTo to compare two personal datasets.
Technical information
MiCoViTo is composed of three parts: a set of programs for microarray data preprocessing and for comparing expression profiles, a web interface and a relational database. All the software used to power MiCoViTo is freely distributed under an open source licence. Data pre-processing options such as pre-scaling or pre-centering expression profiles have been written in Perl, while distance computation routines (Pearson distance, squared Pearson distance and Euclidean distance) have been written in C in order to minimize computation time. The interface has been written in PHP. MIPS and SGD information is stored in a PostgreSQL relational database. Microarray data are stored in flat files. All graphical outputs are dynamically generated using the JpGraph PHP library [22].

Results
As an illustration, we used MiCoViTo to compare genes co-expressed during the cell cycle (Cho et al. dataset [23]) for two pairs of cyclins: CLN1/CLN2 and CLN1/CLB2 (Figure 2, B1). Results depend mainly on the proximity of the seed expression profiles. Indeed, two genes close to one another, like CLN1 and CLN2 (both implicated in the G1 phase), lead to the identification of numerous correlated genes also implicated in the G1 phase (Figure 2, B2), like PCL1 [24], MCD1 [25] or RNR1 [26]. But if CLN1 is compared to a gene implicated in another phase of the cycle, like CLB2 (G2/M transition), no correlation at all is observed (Figure 2, B3). Moreover, the upper right and the lower left groups contain genes implicated in the G2/M transition (like CLB2) and in the G1 phase (like CLN1), respectively.

Discussion
MiCoViTo aims to help biologists achieve an intuitive understanding of their own data using an approach based on the comparison of sets of microarray experiments. The user-friendly web interface is designed to be accessible to those with no particular technical skill.
The major drawback of visualization tools based on biologists' knowledge and intuition is that they do not give computer-readable output and cannot be applied systematically to find interesting expression patterns. In particular, seed choice is a crucial point of our approach because it precisely defines which part of the transcriptome to focus on. The gene-centric approach presented in this paper empowers biologists to focus on a particular sub-part of the transcriptome. In order to assist the seed selection process, online lists of candidates are proposed for each set of microarray experiments. Those candidates are ranked according to their neighbourhood density (density is the number of genes co-expressed with the seed according to a given expression distance threshold). Genes with a large neighbourhood density are those whose nearest neighbourhoods contain many genes, which means that their expression profiles are similar to those of many other genes. Such information may be used as a starting point when no prior information about the transcriptome state topology is available.

As initiated in a recent paper [27], one possible future direction is to use this approach to compare transcriptomes from different organisms captured in the same state. This will be possible if we can find functional annotation in the same format for two organisms and if we are able to define pairs of seeds having inter-organism correspondence. An annotation project like GO [28] addresses the first question. To define relevant inter-organism seeds, two ways appear realistic. The first one is to assume that sequence-based conservation implies function conservation. This is not always the case, but one can expect MiCoViTo results to be incoherent if seeds are not functionally related. Alternatively, one can start from pairs of genes previously described as orthologous. S. pombe could provide a good benchmark system to test this approach, since GO annotations are available and microarray datasets describing states previously studied in S. cerevisiae are publicly available. Moreover, well-characterised S. cerevisiae/S. pombe orthologous genes have been listed by the Sanger Center (Valerie Wood, personal communication).

Conclusions
MiCoViTo allows users to compare and visualize different transcriptome states in a gene-centric way. The generated clusters of genes are mapped onto existing biological knowledge to gain a higher level view of the ongoing transcriptional changes. Upload of personal data onto the site is possible but not mandatory, since a compendium of more than fifty yeast microarray datasets is available. At present, this tool is restricted to S. cerevisiae, but a natural future direction will be to incorporate data originating from different species as well as orthology tables to allow comparison of seeds from different organisms.

Availability and requirements
MiCoViTo is available at http://www.transcriptome.ens.fr/micovito/. The source code and database schema are freely distributed to academic users upon request to the authors. A more detailed description of MiCoViTo, including a step by step tutorial, can be found online.

List of abbreviations
MiCoViTo: Microarray Comparison Visualization Tool; DTT: dithiothreitol; MIPS: Munich Information center for Protein Sequences; RSA tools: Regulatory Sequence Analysis Tools; SGD: Saccharomyces Genome Database; TF: transcription factor; yMGV: yeast Microarray Global Viewer.

Authors' contributions
GL conceived and implemented MiCoViTo and drafted the manuscript. PM suggested the first version of the visualization interface, contributed to discussions and drafted the manuscript. PV and CJ are GL's Ph.D. advisors; they both contributed to discussions. SV provided help with algorithms and coordinated the project. All authors read and approved the final manuscript.
Acknowledgements
The authors thank Stéphane Le Crom, Frédéric Devaux and Serge Hazout for helpful discussions and Boris Barbour for English correction. The MiCoViTo project was funded by the Programme Bioinformatique Inter-EPST-CNRS 2003. GL is supported by a MENRT, PM is supported by the French Therapeutical Research Association (AFRT) and the PhRMA foundation Center of Excellence in Integration of Genomics and Informatics (CEIGI), and SV is supported by Hoechst Marion Roussel – Aventis grant number FRHMR2/9908.
Figure 1.
Figure 2.