INTRODUCTION

It is generally held that the discovery of the structure of DNA and the understanding of its basic functions in the cell, as established in the second half of the 20th century, have been the central achievement in biology of the last five decades: this knowledge has taught us the basic principles of inheritance. Only recently have we started to realize that the increasing knowledge of DNA has much more fundamental consequences for understanding biology. The sequencing of an increasing number of complete genomes and the comparative evaluation of the information in genomic DNA sequences have opened new and unexpected aspects, which influence all fields of biology.

Probably the two most important of these aspects emerging from the genomic sequencing data concern the evolution of genomes and their functional organization. Recent sequencing data also reveal that our picture of the genome changes considerably when high-accuracy sequencing data are compared with the conclusions drawn from earlier shotgun sequencing approaches 1, 2. The availability of databases of DNA sequences represented at the RNA level - such as EST and cDNA databases - and the comparison of these data with the sequences of the complete genome have likewise yielded unexpected views on the extent of transcription of the genome. Together with the discovery of RNAi, this opens revolutionary views on eukaryotic genomes - their evolution as well as their function. What are these new views and what are their consequences for future research in biology?

The analysis of genomic sequences has also stimulated the development of new techniques, which will allow the functional evaluation of the genome. Considerable parts of what was previously laboratory work have shifted to the computer. Comparisons of nucleic acid sequences, which could earlier only be done by nucleic acid hybridization experiments, can now be carried out at much larger scales, with better resolution and with exactly determined precision, by sequence searches and by alignments of the nucleic acid sequences available in databases. As a consequence, new concepts for handling biological data are required. At present, however, such concepts are barely in their initial phases of development. Future research in biology will be based substantially on the tools of bioinformatics, but the current state of bioinformatics has not yet prepared us sufficiently for this.

On the experimental side, new laboratory techniques concern three fields of biology in particular. The investigation of the transcriptional aspects of genomes - summarized under the term transcriptome - has, through the development of microarray techniques for nucleic acid sequences, made it possible to compare gene expression at the RNA level on a large scale. Current approaches are directed towards recording gene activities and their changing patterns at all cellular and developmental levels. These approaches are complemented by studies of RNA expression by in situ hybridization. Such studies are performed systematically, for example, on Drosophila embryos by collecting the expression patterns of all available transcripts using whole-mount in situ hybridization. The data are deposited in databases and become freely available to researchers.

A comparable approach is taken at the protein level. Proteomics tries to collect expression patterns of proteins in a way similar to the recording of transcripts by microarrays. The techniques of proteomics are still less elaborate than those of RNA investigations, but new developments like protein microarrays will soon enhance the power of this field. Even though this approach does not appear very different from that taken towards the transcriptome, proteomics faces a much higher complexity of its targets: proteins exist in many modifications (for example in phosphorylated, acetylated or methylated form, among many others) and in multiple splicing variants. It is becoming evident that it will be necessary to register and functionally evaluate millions of protein molecules in their cellular context and in their relationships to one another. The development of bioinformatic tools for handling such information and for using it in multiple biological contexts will be one of the most intriguing and demanding tasks in future biology.

A third field, which has been developing rapidly in recent years, comprises the diverse microscopy techniques such as confocal microscopy, laser scanning microscopy and various techniques in electron microscopy. An example is the atomic force microscope (Binnig et al. 1986 3). The potential to localize macromolecules with very high sensitivity and precision in the cell or even in subcellular compartments, and to analyze their localization and dynamics in the cell in three dimensions - which can, for example, be achieved by photobleaching - has provided new insights into cellular processes and will be essential for research in cellular and developmental biology in the future. Added to these are new genetic techniques which allow molecules, marked for example with a fluorescent label (like Green Fluorescent Protein, GFP, and its variants), to be introduced and expressed in a controlled fashion, and which allow single genes to be manipulated with increasing precision and efficiency. This makes it possible to establish the precise time of expression, the subcellular localization and the developmental pattern of distinct molecules, even if they are present in small amounts in single cells. Such techniques will be essential for integrating our knowledge and transferring it to the cellular or tissue level.

In the subsequent part of this review I shall concentrate on two aspects of the recent developments in molecular biology: the changing views on the evolution of the eukaryotic genome, based on genome research, and the consequences of RNAi mechanisms for understanding regulatory mechanisms at the genome level.

GENOMES: A NEW WORLD

To Charles Darwin, probably hardly anything in modern biology would have been more exciting than our new insights into the evolution of genomes. Though biologists (except the few politically indoctrinated cranks like those who promoted Lysenkoism) have not, during the last century, questioned the theory of descent as the basis for understanding biological evolution, the analysis of DNA sequencing data has yielded many unexpected facts in additional support of Darwin's theory. These facts can be summarized very simply by the statement that probably most - if not all - of the major genetic traits evolved very early on an evolutionary scale and have been maintained throughout the history of all phyla. This holds true not only for fundamental cellular mechanisms such as DNA replication and repair, cell cycle control and the basic mechanisms of mitosis and meiosis but also, for example, for major principles of the early embryonic development of organisms. That such principles are conserved has been known for years from the study of genes involved in early Drosophila development and segmentation 4, 5. The amazing maintenance of genetic tools in evolution reaches, however, much further: the development of entire organs is also based on very old principles, as was first demonstrated for the development of the eye 6. While classical comparative anatomy taught us that nature achieved the ingenious invention of the eye several times independently in analogous ways (in insects, cephalopods, mammals), this dogma has now been proven wrong. The principal genes required for the construction of an eye are the same and were designed early in evolution, even though the details of their involvement in the construction of eyes might differ between phyla.
But even more surprising: the key gene for eye development recovered from the mouse genome (Aniridia) can be used to create an insect eye in ectopic positions in Drosophila, or even to replace the respective Drosophila gene (eyeless) 7. Similar conclusions on the evolutionary conservation of developmental pathways have been drawn for other gene products, for example heart myosin, and for key genes in development like engrailed 8, and have recently been confirmed on a more universal level by general expression studies of groups of related genes 9.

One might argue that the evolutionary conservation of cellular processes such as cell cycle regulation, chromosome separation or DNA repair is not so unexpected, as they define the basis of the function of a cell. But that the development of morphologically widely different organs has a similarly ancient basis is probably one of the most intriguing findings in modern biology. Such conclusions will have substantial consequences in practical and theoretical terms: the investigation of simple model organisms can supply us with a basis of knowledge which can be applied with little or no change to mammals or even humans 10, where experiments are complicated, time consuming, expensive or not even possible 11. There are also consequences for our judgment of the probability that diseases, for example those induced by viruses, can be transferred between organisms: an example is provided by the recent coronavirus problems (“SARS”) 12. Not least, we may be led to raise ethical questions and questions of medical safety regarding the use of primates for experimentation, the implications of gene transfer with retroviruses as vectors, or organ transplantation between mammals.

GENOME COMPLEXITY

One of the puzzling features in the comparison of eukaryotic genomes has always been their extremely variable size 1. How is this compatible with the idea that the principal genetic systems are very similar throughout eukaryotes? How can we understand the increasing complexity of organisms if we assume a high degree of genetic stability in terms of equipment with similar genes?

First ideas regarding the maintenance of the principal genetic information were obtained long ago, after it was recognized that eukaryotes possess a considerable part of their DNA sequences in multiple copies, the so-called repetitive DNA fraction (review 13). Initially it was assumed that repetitive DNA is found mainly in a portion of the genome called heterochromatin, which is considered genetically relatively inert (for discussion see 14). But soon it was recognized that repetitive DNA is found all over the genome and includes genes as well. This, together with the analysis of the globin genes of vertebrates, soon led to the proposal that gene duplications play an important role in evolution 15. The analysis of complete genome sequences has substantiated this assumption and has shown that considerable proportions of the genome can be considered duplicated DNA sequences. The rate of gene duplication in the human genome is, according to recent estimates, 0.5-1% per 1 million years 16. Consequently, the equivalent of the whole genome would be duplicated in approximately 100-200 million years, a period that is short in evolutionary terms. Obviously, not all duplications will be fixed in the genome, but they nevertheless provide ample possibilities for selection. Duplicated genes might remain identical in their products or they might diverge, giving rise to proteins of different functions. Classic examples of the latter kind are the globin genes, which diverged and are expressed in distinct developmental patterns (see 17, 18) and, more recently, the Hox genes. The evolutionary history of the Hox genes is particularly well investigated and allows their duplication patterns within vertebrates to be identified in detail 19. Gene duplications and the possible mechanisms for divergence were discussed for the eye crystallin genes by Piatigorsky and Wistow 20. Examples of genes retained in an identical or very similar sequence are the genes coding for histone proteins or for ribosomal RNAs.
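The arithmetic behind the duplication time quoted above can be sketched in a few lines. This is only an illustration under strong simplifying assumptions that are not taken from the cited estimate: the rate is treated as constant, every duplication is counted as retained, and "duplicating the genome" is read as accumulating duplications equal in number to the existing genes. The function name is hypothetical.

```python
def myr_to_duplicate_genome(rate_percent_per_myr: float) -> float:
    """Million years needed until cumulative duplications equal one full gene set.

    Assumes (for illustration only) a constant rate and no loss of duplicates.
    """
    if rate_percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    fraction_per_myr = rate_percent_per_myr / 100.0
    return 1.0 / fraction_per_myr


if __name__ == "__main__":
    # Lower and upper bound of the estimated rate (percent of genes per million years).
    for rate in (0.5, 1.0):
        print(f"{rate}% per Myr -> about {myr_to_duplicate_genome(rate):.0f} Myr")
```

Under these assumptions the lower bound of the rate corresponds to 200 million years and the upper bound to 100 million years. Since most duplicates are not fixed, the real time scale for a net doubling would be considerably longer.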

The detailed analysis of genes derived from duplications has revealed remarkable features 19, 21: the divergence of duplicated genes occurs much more rapidly in regulatory regions than within the protein-coding part of the gene. This may have different consequences, depending on the kind of mutation: if the regulatory sequence becomes nonfunctional, the gene is functionally eliminated and will degenerate by mutation. A modified regulatory region may, however, also lead to a change in the gene expression pattern and initiate the evolution of a new, specific task for the gene product. This, in turn, may be accompanied or followed by divergence of the coding DNA sequence as well 22. The networks formed by regulatory sequences appear to be of major importance for the evolutionary fate of a duplicated gene. The existence of such regulatory networks was proposed very early by Britten and Davidson 23.

An additional mechanism contributes to the duplication of genomes or parts of them: in many organisms, polyploidization has occurred 24. Polyploidization often appears to be accompanied by a rapid evolution of the duplicated parts of the genome 25. Some of the polyploid genes become inactive through silencing effects (see below) while others diverge into new functions, comparable to the situation described for single-gene duplication 26, 27, 28. Polyploidization events can hence explain the surprisingly large differences in the genome sizes of closely related organisms 29. That ploidization steps have occurred can, however, often be recognized only by a detailed analysis of the complete genome sequence.

These two genomic mechanisms - gene duplication and ploidization - seem to account for much of the evolution of genomes. There are other possibilities affecting gene evolution which one might consider as means of “fine adjustment”. The fragmentation of eukaryotic genes into exons and introns appears to be important not only because it offers a wide range of regulatory possibilities - for example by placing enhancers and silencers in the introns - but also because it provides the opportunity to create a variety of gene products from different combinations of exons through differential splicing. Thus, in principle, one might envisage this situation as a possibility to combine different 'genes' into one transcription unit 30. Although our evidence in this regard is scarce, this situation may be functionally important since it guarantees that the amino acid sequences encoded by the shared exons remain identical in the different products. Hence, interactions with other molecules may remain guaranteed for a particular part of a protein even if other exons are added and confer different functions.

The exon-intron structure of genes has also been considered in the context of recombination between the exons of different genes, which may create new genes with new functions, for example by combining a DNA-binding domain encoded in one exon with a protein-interacting domain encoded in a different exon. The evidence for such exon 're-shuffling' events is limited; they may have been of limited significance in the evolution of the genome but have certainly contributed to it 31, 32, 33.

The question arises how long-distance recombination of exons might occur 34. One possibility is gene conversion, in which restricted nucleotide similarities induce regional pairing and subsequent recombination of DNA sequences in the context of repair processes after single- or double-strand breaks in DNA. Another - probably more frequent - mechanism is the transposition of DNA induced by mobile elements. The evidence for the importance of mobile DNA elements for genome structure, and hence for evolution, has increased substantially during the past years 32, 35, 36, 37, and even functional interactions between different transposons have been demonstrated 38. Moreover, the insertion of transposons can modify the expression patterns of genes 39.

Another possibility of introducing novel functions into a genome is still rather speculative. While it is clear that lateral gene transfer plays a role in prokaryotes, it is still unclear whether it can be considered relevant for eukaryotic genome evolution. Some cases of lateral gene transfer have been documented 40, and the analysis of the complete genome sequence of Ciona has provided some additional indirect evidence that such events may have taken place 41. The mechanisms of such lateral transfer of DNA sequences are more or less open to speculation 42. But clearly, the transfer of genomic sequences integrated into (retro-)viral genomes seems a realistic possibility to explain lateral gene transfer. The argument of the host specificity of viruses has become rather questionable now that it seems evident that relatively minor changes in viral genomes may make it possible to infect new hosts. This has, for example, been assumed to be the case for the infection of humans by a coronavirus originating from animals 12, leading to the SARS epidemics. Other viruses, for example hemorrhagic fever viruses, may also evolve in similar ways and change their hosts after relatively insignificant mutations.

The examples given above emphasize the major mechanisms of genome evolution as we see them today. It is sensible to ask whether there are other ways to generate completely new genes. The answer is difficult to give, as one cannot exclude that a gene considered 'new' is in fact a strongly diverged duplicated gene. More likely than the accidental de novo evolution of a gene is that new genes evolve from the combined action of various mechanisms. For example, a protein-coding DNA sequence might be transposed into a genome region which is not functional. This could lead to the inclusion of nonfunctional DNA in a protein sequence if an open reading frame exists. The resulting protein might take over novel functions in the respective organism. An example of genes which might be derived from such a process are the antifreeze glycoprotein (AFGP) genes of Dissostichus mawsoni (an Antarctic teleost fish) 43. In general, however, the frequency of such events would have to be considered low, and such events are not expected to contribute substantially to the evolution of genomes.

Genomes develop not only by the creation of new genes; one also has to assume that genes can be lost. Otherwise the genome would continuously increase in size. A comparison of the genes expressed in the brain of Drosophila, the brain of the honey bee (Apis) and the human brain has shown that, of the 3000 conserved genes found expressed in human and Apis brains, 100 are not present in the Drosophila genome 44. In other words, the Drosophila genome seems to have lost these 100 genes. If one accepts a linear extrapolation of this number to the entire genome, one would have to conclude that the Drosophila genome has lost some 500 genes in total, which were present in ancestors and which are still maintained in other phylogenetic groups up to humans. One might infer that genomes in one phylogenetic group can discard sets of genes which remain conserved in other phylogenetic groups. The simultaneous inactivation of the genes of a regulatory gene network - as discussed above in the context of the evolution of duplicated genes - would allow such processes. We can expect to obtain deeper insight into such processes from comparisons of further complete genome sequences. This, however, still does not reveal any clues to the molecular mechanisms which might be responsible for discarding genes or even sets of genes from a genome.
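The linear extrapolation mentioned above can be made explicit. The sketch below scales the observed fraction of lost genes (100 of 3000) to a whole gene set. The total gene numbers are assumptions introduced here for illustration and are not given in the text: about 13,600 is a rough figure for annotated Drosophila protein-coding genes, and 15,000 is simply the total at which the extrapolation gives exactly 500. The function name is hypothetical.

```python
def extrapolate_lost_genes(lost_in_sample: int, sample_size: int, total_genes: int) -> float:
    """Scale the fraction of genes missing from a sample to the whole gene set.

    Assumes (for illustration only) that the sampled genes are representative
    of the genome as a whole.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return total_genes * lost_in_sample / sample_size


if __name__ == "__main__":
    # Both totals are assumed values, not figures from the cited study.
    for total in (13_600, 15_000):
        estimate = extrapolate_lost_genes(100, 3000, total)
        print(f"total genes {total}: about {estimate:.0f} genes lost")
```

With either assumed total the estimate lands in the range of 450 to 500 genes, consistent with the "some 500" stated above. The result depends entirely on how representative brain-expressed conserved genes are of the genome as a whole.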

REGULATION OF THE GENOME SIZE BY RNAi?

A key to answering the question of how genes or sets of genes can be eliminated from a genome may be found in the recent development of our knowledge of the transcriptome. Two important new insights into transcription have emerged over the past few years, which appear to be related to one another. First, the analysis of EST and cDNA databases assembling transcripts from the mouse genome shows that at least 60% of the mouse genome is represented in transcripts 45. It has long been known that a considerable proportion of newly synthesized RNA never leaves the nucleus and is rapidly degraded. However, the extent of transcription of the genome, even allowing for the fact that these transcripts are accumulated from many different cell types, is unexpectedly high. So far we do not have a clue as to the biological significance of this.

A second, most intriguing new piece of information on transcription is the discovery that a mechanism called RNA interference (RNAi) plays a general role, not only in the posttranscriptional regulation of RNA levels but also in chromatin assembly processes 46, 47, 48. RNAi involves the activity of small RNA molecules derived from larger transcripts by controlled degradation (siRNAs: small interfering RNAs) as well as small RNA molecules specifically synthesized to control mRNA levels (miRNAs: micro-RNAs) (review: 49). Both types of RNA are molecules between 19 and 25 nucleotides in length, but they differ in their origin:

- miRNA is synthesized from non-protein-coding DNA and is processed from transcripts containing inverted repeats. The double-stranded RNA formed by foldback is processed by an RNase III-like enzyme, highly conserved through evolution from yeast to humans and higher plants, called Dicer in animals or Dicer-like in plants. The nuclease cleaves within the double-stranded RNA (dsRNA) region and releases dsRNAs of 22-25 nucleotide pairs 50. These molecules can interact with the 3'-UTRs of transcripts and inhibit translation.

- siRNA is cleaved from larger RNA molecules, a process which also involves Dicer. The processing products associate into nucleoprotein complexes, including the RNA-induced silencing complex (RISC), which then bind to mRNA and cause its degradation. This process leads to a fine tuning of mRNA levels. siRNA is, however, formed not only from mRNAs but also from non-protein-coding transcripts such as centromeric DNA transcripts and, in particular, from the LTRs of transposons. In these cases siRNA becomes a constituent of chromatin. It is essential for chromatin packaging and - related to this - for gene silencing processes, as has recently been shown 51 for the centromeric chromosome regions of Schizosaccharomyces.

These experiments imply that siRNA is able to control the assembly of chromatin. The molecular mechanism is still unknown. One could assume that it is based on the recognition of the centromeric DNA sequences corresponding to the siRNA sequence, as siRNA generally acts sequence-specifically. Over the past years it has been established that transcriptionally inactive chromosomal regions, such as telomere regions, and silenced genes, such as the mating type locus, have a closely similar protein composition (summarized in 52). Specific DNA signals are required to identify chromosome regions to be packaged in a way that ensures gene inactivation. An obvious possibility is that siRNA molecules identify, by homology, those chromosomal sites where packaging into inactive chromatin is required. These findings also shed new light on the character of heterochromatin as a functional part of the genome.

In general, the RNAi mechanism has apparently developed because of the need to identify nucleic acid sequences for regulatory purposes. Initially it probably served as a defense mechanism against viral infections 53. Such tasks may be closely related to another activity ascribed to the multiple functions of RNAi: the guided excision of genomic DNA sequences in ciliates. Ciliates are highly specialized single-celled eukaryotes, distinguished by two nuclei - a generative diploid micronucleus and a vegetative, polyploid macronucleus. The peculiarity of the macronucleus is that only a minor part of the micronuclear genome is ploidized while the major part of the micronuclear genome is discarded 54. Yao et al. 53 have shown that the excision of DNA sequences from the genome is directed by small RNA molecules, synthesized in the micronucleus during conjugation and early macronuclear development. The exact mechanism is still unknown, but one can obviously expect a close relationship between this mechanism and the DNA packaging events described earlier.

The observations of Schramke and Allshire 51 and Yao et al. 53 open a way to understand what has been a puzzle to geneticists for a long time: how can a genome remove DNA in a directed way, limited to specified DNA sequences? The most commonly known case is that of satellite DNAs. The high degree of variability in satellite DNA sequences between closely related species has for a long time been a question of interest without an answer: how can satellite DNA sequences be distinguished simultaneously, quickly removed from a genome and replaced by a totally different sequence type (see, for example: 55)? The observations on ciliates, indicating that small RNAs are involved in sequence-directed DNA excisions, now open a way to answer this question. Not only might other cases of chromatin elimination, like those in Ascaris and in crustaceans (see 52), make use of related mechanisms; the same mechanisms could also be applied to the simultaneous removal of blocks of repetitive DNA sequences, especially from centromeres, in germ cells. The genome is widely transcribed during the meiotic prophase. This makes suitable transcripts available, which could be used to excise blocks of satellite DNAs or other repeated DNA sequences (see 56, 57). Such transcripts might also be important for the tight packaging of the DNA with basic proteins at the end of the primary spermatocyte stage as a means of inactivating the genome, similar to what Schramke and Allshire described for centromeric DNA. Under special conditions these transcripts might induce or facilitate DNA excision events comparable to those occurring during macronucleus development in ciliates.

CONCLUSIONS

The developments in genome research during the past years have contributed fundamentally new insights into the structure and evolution of eukaryotic genomes. Even though we still do not have clear concepts of the detailed mechanisms controlling genome size, we can at least imagine how genome sizes can increase rapidly in evolution: gene duplication events and, in particular, ploidization are the prominent ways of increasing the amount of genomic DNA. This demands mechanisms for the controlled reduction of genome size. RNAi mechanisms may provide a means of decreasing genome size by the sequence-specific removal of DNA sequences from the genome.

One of the most intriguing new aspects of genome research is the unexpected conservation of genetic pathways and, in particular, of regulatory key genes throughout evolution. This has consequences at many different levels. Basically it implies that organisms are closer in their genetic properties than was expected earlier. Under such conditions the transgression of barriers between species appears easier, and species are - contrary to more conventional views - not distinguished by very fundamental genetic differences. Minor genetic changes might determine their sexual isolation from one another. The consequences of this might become important on an applied level, especially with regard to medical aspects. For example, we do not know how endogenous viruses or transposable genetic elements behave when organs are transplanted. Can viruses or transposons be activated in a foreign environment? If so, can they invade other cells of the host, with all the possible consequences of inserting at more or less random genomic positions? One might even see the potential for lateral gene transfer in such cases, as mobilized viruses or transposons might carry donor DNA sequences into germ cells of the host. The recognition that animal viruses can easily invade humans, as appears to be the case for Severe Acute Respiratory Syndrome (SARS), calls for increased caution in food production and animal breeding.

Consequences must also be drawn for bioethics: we can no longer ignore that primates are genetically very similar. This will have implications for the use of primates as experimental subjects. On the other hand, in the early days of gene technology non-biologists made a strong point that species are strictly separated in nature and that it is unethical to transgress natural barriers by transformation experiments. This argument is obviously not convincing, as the separation of species might be caused by minor genetic differences. Even if the separation of species should be more substantial, it can be argued that the major genetic processes are similar enough to cast doubt on such arguments against gene transfer.

Considering how fast our views on genome evolution have changed during the past few years, we may still expect new and unforeseen insights in the coming years, when more data for comparative studies become available.