Introduction

The mitochondrial genome (mtDNA) of Metazoa is regarded as the marker of choice for the reconstruction of phylogenetic relationships at several taxonomic levels, from populations to phyla, and has been widely used for the resolution of taxonomic controversies. Indeed, the small size of the molecule, its abundance in animal tissues, the strict orthology of encoded genes, the presence of genes/regions evolving at different rates, its uniparental inheritance and the absence (or very low level) of recombination (Elson and Lightowlers, 2006) make this molecule a reliable and easy-to-use phylogenetic marker. Phylogenies have been reconstructed from the sequence of single genes, including the popular cox1 gene, from the entire genic complement, or from gene arrangement. Indeed, mitochondrial gene order has been very effective in investigating deep-level phylogenetic relationships as it is characterized by a large number of states and generally low rates of changes (Boore et al., 1998; Lavrov and Lang, 2005).

Besides its use as a phylogenetic marker, the mtDNA represents a ‘full’ genome and the availability of complete sequences for several phyla provides a unique, largely unexplored, opportunity to decode the mechanisms underlying genomic evolution in a phylogenetic framework. Several structural genomic features (for example genome size, gene content, gene order, compositional features, nucleotide substitution rate, repeated sequences, non-coding sequences, secondary structure of the encoded RNA) can be systematically and quite easily investigated in the small mitochondrial genome. These features allow both the description of evolutionary trends in phylogenetically distant organisms and the identification of differences in functional constraints that might account for structural differences. In fact, differences in structural genomic features generally reflect differences in functional and evolutionary constraints. At present, only the rate and pattern of nucleotide substitutions have been widely analysed in mtDNA: in vertebrates, a correlation has been demonstrated between the asymmetrical process of genome replication and parameters such as mutation pattern, substitution rate and compositional asymmetry, given that all these parameters are proportional to the time spent by the H strand in single-strand status, and vary with respect to genomic position (Reyes et al., 1998; Bielawski and Gold, 2002; Faith and Pollock, 2003; Raina et al., 2005; Broughton and Reneau, 2006). The evolutionary and functional meaning of other mt genome-level features awaits more detailed and systematic analyses, although some general hypotheses can be put forward.
For instance, gene content is certainly affected by the ability to exchange genetic material between the mitochondrial and nuclear compartments, the permeability and/or the presence of specific carriers on the mitochondrial membranes, gene dispensability and the difference in multimeric structure of the respiratory chain complexes between organisms. The secondary structure and size of transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) are related to peculiarities of the mitochondrial translational apparatus, as demonstrated in nematodes by the concomitant unusual structure of mt tRNAs, rRNAs and elongation factors (Okimoto and Wolstenholme, 1990; Okimoto et al., 1994; Sakurai et al., 2001, 2006; Ohtsuki et al., 2002). The number, size and location of non-coding regions are mostly related to the presence of replication and transcription regulatory signals, although the overlap between regulatory signals and coding sequences cannot be excluded in the compact metazoan mtDNA (Valverde et al., 1994; Peleg et al., 2004). Finally, gene arrangement is affected by the transcription mechanism, for example by the need to co-regulate the expression of some genes or the required stoichiometry of the gene products. Gene order changes can also originate from processes strictly related to replication, such as tandem duplications of genomic segments due to slipped-strand mispairing or imprecise termination of replication (Boore, 2000). Although mtDNA rearrangements can have profound functional implications on gene expression and genome replication, they have mainly been used to extract phylogenetic information, rather than to extrapolate data on the transcription or replication mode: one of the few exceptions is the prediction of the functionality of the duplicated control region (CR) in snake mtDNAs based on the analysis of the variability of compositional skew along the genome (Jiang et al., 2007).

In this review we will first provide a general overview of traditional and newly defined genomic features (that is, gene content, genome size, genome architecture (AR) and gene strand asymmetry (GSA)) inferred from the analysis of a carefully revised data set of more than 1000 complete or almost complete metazoan mtDNAs. Then, we will focus on the comparison of mtDNAs from species belonging to the same genus. Indeed, as emerges from data reported in several recent publications (Fukami and Knowlton, 2005; Mueller and Boore, 2005; Iannelli et al., 2007a), differences observed in congeneric species can provide a good view of the evolutionary trend of the corresponding higher taxonomic groups, and the intra-genus comparison can be a quick and easy strategy to investigate the mtDNA evolutionary trend in previously unexplored taxonomic groups. We hope that this synthesis will stimulate further comprehensive analyses of these and other mitochondrial genomic features in a phylogenetic framework.

The metazoan mtDNA data set

Despite the development of several specialized mitochondrial databases such as the NCBI Organelle Genome Resource, GOBASE, OGRe, AMiGA, MetaAMiGA and Mitome (Wolfsberg et al., 2001; Jameson et al., 2003; Feijao et al., 2006; O’Brien et al., 2006; Lee et al., 2007), the identification and retrieval of ‘all’ available complete mtDNAs is far from straightforward. Although these databases collect complete mtDNA sequences, and sometimes allow the retrieval and analysis of basic genomic data (such as gene order and base composition), most are updated rather infrequently and, most crucially, they either do not attempt, or fail, to detect and correct the numerous errors present in gene annotations. In fact, only the manually curated entries of the OGRe database and the NCBI Organelle Genome Resource commonly report and fix mtDNA mis-annotations, although many entries remain uncorrected even in these resources. Most annotation errors consist of: incorrect denomination of a gene; apparent changes in gene position due to incorrect gene boundaries; and apparent lack of a gene due to partial mtDNA sequencing, or lack of annotation. Apart from these expected mis-annotations, there are other more subtle errors that can be identified only through bibliographic analyses. For example, the first published complete mtDNA of Branchiostoma lanceolatum (Y16474; Spruyt et al., 1998) has been demonstrated to be derived from Branchiostoma floridae (Nohara et al., 2005b) but most mt specialized databases still associate the mtDNA of B. lanceolatum with the Y16474 entry, rather than with the correct AB194383 sequence subsequently published (Nohara et al., 2005b).
Similarly, most databases report only one of the two highly different male and female mtDNA isoforms transmitted with doubly uniparental inheritance (DUI) in bivalves (Breton et al., 2007), an inconsistency due to the elimination of sequence redundancy (that is additional mtDNAs belonging to the same species) performed without checking for significant differences between the putatively duplicated entries. The identification of the full set of complete mtDNA entries is also complicated by the fact that partial mtDNAs are sometimes labelled as complete sequences (implying apparent gene loss), whereas some complete sequences have an ambiguous description line, very different from the standard one, and can be identified only through specific interrogation of a primary database (see Lingula anatina mtDNA, AB178773).

One consequence of inconsistent annotations and/or descriptions of mtDNA entries is that the automated retrieval of the full data set of metazoan mtDNAs is quite problematic, and the same is true for statistics and comparative analyses of basic mitochondrial features such as genome size, gene content and taxonomic distribution of the available genomes. More intense efforts, including genome re-annotation, are required to carry out the analysis of more complex features, such as gene order, genome structure including the position of the CR, tRNA structures and more sophisticated compositional features.

The metazoan mtDNA data set analysed in this review has been obtained by examination of GenEmbl and three specialized mitochondrial databases: NCBI Organelle Genome Resource, OGRe and GOBASE (Ojala et al., 1981; Wolfsberg et al., 2001; Jameson et al., 2003). The final list, updated in March 2008, consists of 1265 sequences, including 59 partial or almost-complete mtDNAs. Some of the analysed partial sequences were erroneously described as complete genomes in the original database, whereas other partial mtDNAs have been included in the data set as they are the only representatives of a phylum, or as members of a congeneric pair. We identified mis-annotations in number, order and orientation of mt genes, in completeness of the mtDNA, or in species definition through different methods, such as reference consultation, software developed ad hoc to compare the feature tables of entries, and even sequence similarity analyses in the most enigmatic cases. Thus, 313 entries (24.8% of the entire data set) with mis-annotations in at least one feature (several entries have multiple mis-annotations) have been corrected. The list of analysed mtDNAs and corrected annotations is available on request.
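The kind of feature-table comparison mentioned above can be illustrated with a minimal sketch. This is not the software used for the data set, which is not described here; it is a hypothetical example that assumes each entry has already been reduced to a list of (gene, strand) pairs in genome order, that both lists start from the same gene, and that a trusted reference annotation for a close relative is available. The function name `compare_annotations` and the toy gene lists are illustrative only.

```python
from collections import Counter


def compare_annotations(entry, reference):
    """Flag differences in gene number, orientation and order.

    `entry` and `reference` are lists of (gene, strand) tuples in genome
    order, with strand given as '+' or '-'. Both lists are assumed to start
    from the same gene (no rotation of the circular map is attempted).
    """
    issues = []
    entry_counts = Counter(gene for gene, _ in entry)
    ref_counts = Counter(gene for gene, _ in reference)

    # 1. Gene number: missing, extra or duplicated genes.
    for gene in sorted(set(entry_counts) | set(ref_counts)):
        if entry_counts[gene] != ref_counts[gene]:
            issues.append(
                f"{gene}: {entry_counts[gene]} copies vs "
                f"{ref_counts[gene]} in reference"
            )

    # 2. Orientation: only checked for genes present once in both lists.
    ref_strand = dict(reference)
    for gene, strand in entry:
        single_copy = entry_counts[gene] == 1 and ref_counts[gene] == 1
        if single_copy and ref_strand[gene] != strand:
            issues.append(f"{gene}: strand {strand} vs {ref_strand[gene]} in reference")

    # 3. Order: compare the relative order of the genes shared by both lists.
    entry_order = [gene for gene, _ in entry if gene in ref_counts]
    ref_order = [gene for gene, _ in reference if gene in entry_counts]
    if entry_order != ref_order:
        issues.append("gene order differs from reference")

    return issues


# Toy data (hypothetical, not taken from any database entry).
reference = [("cox1", "+"), ("cox2", "+"), ("atp8", "+"), ("atp6", "+"), ("nad6", "-")]
entry = [("cox1", "+"), ("cox2", "+"), ("atp6", "+"), ("nad6", "+")]

for issue in compare_annotations(entry, reference):
    print(issue)
```

A flagged difference is only a candidate mis-annotation: as noted in the text, it still has to be checked against the literature or by sequence similarity before the entry is corrected.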

Statistics on sequenced mtDNAs

The distribution of the completely sequenced mtDNAs between the major metazoan groups is reported in Table 1. As expected, the most represented group is the sub-phylum Vertebrata followed by the phylum Arthropoda (70% and 13% of the total data set, respectively), and the most abundant species are neopterygian fishes and eutherian mammals (31% and 16% of the total data set, respectively). A total of 14 phyla are represented by only 1 or 2 sequences, and 5 of them include only partial mtDNAs (Myzostomida, Phoronida, Nemertea, Pogonophora and Sipuncula; Table 1), indicating that many gaps in the taxonomic sample of metazoan mtDNAs need to be filled.

Table 1 Taxonomic distribution, gene content, distinct genome architecture and gene strand asymmetry of the available metazoan mtDNAs

The size of the 1206 completely sequenced mtDNAs ranges from 32 115 bp of Placopecten magellanicus (Mollusca Bivalvia) to 11 423 bp of Paraspadella gotoi (Chaetognatha) but one of two circular mt molecules of the rotifer Brachionus plicatilis is slightly smaller (11 153 bp; Suga et al., 2008). As shown in Figure 1, the size of the mt genome varies remarkably between the major metazoan groups, and can be highly variable even within the same group (see standard deviation bars). Among the most sampled metazoans (hash mark in Figure 1), Chordata, Echinodermata, Arthropoda and Platyhelminthes are characterized by low standard deviation and stable mtDNA size. On the contrary, Mollusca, enoplean Nematoda and Porifera exhibit a strong heterogeneity in genome size (high standard deviation in Figure 1). By way of contrast, the large standard deviation observed in Brachiopoda and Chelicerata is due to the presence of a single mtDNA with a very large size compared to the almost constant size of remaining mtDNAs (the L. anatina mtDNA is 28.8 kb long (Endo et al., 2005) against a mean value of 14.6±0.8 kb for three other brachiopods; the Metaseiulus occidentalis mtDNA is 24.9 kb long (Jeyaprakash and Hoy, 2007) against a mean of 14.8±0.8 kb for 27 other chelicerates).

Figure 1 Mean length (kb±standard deviation bars) of complete mtDNAs calculated for the major metazoan groups. Hash mark indicates groups including more than 10 complete mtDNAs. The size of both circular mt chromosomes of Brachionus plicatilis (Suga et al., 2008) has been used to calculate the mean value in Rotifera.

Variation in the length of the CR, where repeated sequences are often present, accounts for much of the variability in mtDNA size. However, the largest differences in genome size are due to segmental duplication followed by several rearrangements, with the consequent presence of duplicated genes or pseudogenes and an increased length of non-coding sequences.

Gene content

As shown in Table 1, the most frequent mt gene content consists of 37 genes. However, the gene number varies from 14 genes in two Chaetognatha species to 53 in Metaseiulus occidentalis (Arthropoda, Chelicerata; Helfenbein et al., 2004; Papillon et al., 2004; Faure and Casanova, 2006; Jeyaprakash and Hoy, 2007). This variability in gene content rarely concerns protein-coding and rRNA genes (see below), and is mostly due to tRNA genes (Table 1).

It is commonly reported that the mtDNA encodes a full set of tRNAs able to decode the modified mt genetic code and thus to perform mt protein synthesis. This full set of mt tRNAs can vary from one taxon to another, but it is significantly smaller than the nuclear one, as a consequence of the relaxed wobble rules in codon–anticodon decoding, and of the usage of a single tRNA-Met(CAU) for both translation initiation and elongation. Thus, the expected set of mt tRNAs ranges from 24 genes for taxa with one deviation from the universal genetic code (UGA encoding Trp rather than Stop) to 22–23 genes for taxa with four deviations from the universal genetic code (modified meaning of UGA, AUA and AGR codons). However, taxa using the same mt genetic code can also encode a radically different number of tRNA genes. For example, Demospongiae (Porifera) and Cnidaria use the same modified genetic code (one codon deviation) but they encode a different number of tRNAs. In Demospongiae the number of tRNAs is extremely variable (from 2 to 27 genes), as a consequence of frequent losses or duplications of tRNA genes; most species also exhibit distinct initiator and elongator tRNA-Met. At the other extreme, cnidarian mtDNAs encode only one or two tRNAs (trnM and trnW) and it is believed that most mt tRNAs are imported from the cytoplasm (Beagley et al., 1995). Similarly, the mtDNA of Chaetognatha encodes 1 tRNA rather than the 22 found in most other invertebrates that employ the same genetic code (Table 1). Furthermore, the number of tRNAs is constant in some higher level taxonomic groups, and highly variable in others. For example, 1.5% of the analysed vertebrate mtDNAs differ from the standard set of 22 tRNA genes whereas the figure is 15.6% for arthropod mtDNAs. In vertebrates this variation is almost entirely due to gene acquisition whereas both acquisition and loss are observed in arthropods (data not shown).
The number of tRNA genes varies greatly within Bivalvia (Mollusca), Enoplea (Nematoda) and Porifera, where it is difficult or impossible to define a ‘standard’ tRNA gene content (see the high standard deviation of tRNA gene number in Table 1). As a cautionary note, we observe that the sporadic absence of a single tRNA gene can be caused by errors in tRNA annotations. For example, trnM was originally reported as lost in the mtDNA of the chaetognath Spadella cephaloptera (Papillon et al., 2004) and identified only in a subsequent study (Faure and Casanova, 2006), although it is still not annotated in the Spadella mtDNA entry (NC_006386).

Thus, we can conclude that tRNAs are the gene category with the highest ‘dispensability’ in the mitochondrial genome: their number depends not only on the mt genetic code but also on the evolutionary history of the taxa and on unknown factors probably related to the mtDNA's ability to acquire/lose genetic material and to the capacity of the organelle to import and use nuclear-encoded tRNA molecules.

An interesting example of loss/acquisition of tRNA genes is represented by the case of tRNA-Met: two distinct genes for an initiator and an elongator tRNA-Met, both with CAU anticodon, have been identified only in the phylum Placozoa, the basal metazoan lineage (Dellaporta et al., 2006; Signorovitch et al., 2007) and in Porifera Demospongiae. However, the existence of an elongator and an initiator tRNA-Met has also been hypothesized in tunicates and some bivalves, where an unusual tRNA-Met(UAU) has been recognized as an elongator tRNA-Met additional to the common tRNA-Met(CAU) (Hoffmann et al., 1992; Beagley et al., 1999; Yokobori et al., 1999, 2003, 2005; Gissi et al., 2004). This peculiar phylogenetic distribution suggests that the elongator and initiator tRNAs-Met were lost early in metazoan diversification, and were re-acquired independently in the two distant lineages of bivalve molluscs and tunicates. Our revised mtDNA annotations highlighted that the unusual tRNA-Met(UAU) is encoded by all available mtDNAs of tunicates and by six bivalve species, whereas the common tRNA-Met(CAU) gene is duplicated in five additional bivalve species (based on our annotation, trnM(UAU) is present in Mizuhopecten yessoensis, P. magellanicus, both M and F types of three Mytilus species, and the M type of Venerupis philippinarum; trnM(CAU) is duplicated in Acanthocardia tuberculata, two Crassostrea species, Hiatella arctica, and the F type of V. philippinarum). Thus, it should be further investigated whether the sporadic distribution of duplicated tRNA-Met(CAU) in Bivalvia is correlated with the absence of the additional tRNA-Met(UAU), as it is possible that the mt translational system of bivalves has acquired distinct elongator and initiator tRNAs-Met by duplication of an ancestral tRNA-Met gene or editing of a duplicated gene.

With respect to the other mt-encoded genes, the common content consists of 2 rRNA genes (rrnS and rrnL) and 13 protein-coding genes for subunits of the respiratory chain complexes (nad1–nad6; nad4L; cox1–cox3; cob; atp6; atp8). The two rRNA genes are always mitochondrially encoded, and their duplication is very rare, having been observed in only four species (rrnS is duplicated in the bivalve Crassostrea gigas and in distinct haplotypes of the pillbug nematode Thaumamermis cosgrovei (Milbury and Gaffney, 2005; Tang and Hyman, 2007); rrnL is duplicated in the chigger mite Leptotrombidium pallidum (Shao et al., 2005b); both rrnS and rrnL are duplicated in the large mt haplotype of the nematode Strelkovimermis spiculatus). Loss or acquisition of protein-coding genes is also rather infrequent (Table 2), and the actual loss of genes sometimes remains ambiguous due to uncertainty in gene annotation or to the availability of incomplete mtDNA sequences (see the absence of nad3 and nad6 in the mite Metaseiulus occidentalis, Jeyaprakash and Hoy, 2007, and the absence of atp8 and nad6 in two Hexactinellida sponges, Haen et al., 2007, respectively). Few cases of sporadic gene loss have been reported in vertebrates (Table 2), and only the loss of nad6 in the neopterygian Chionodraco rastrospinosus has been described as a feature common to all Antarctic fishes of the suborder Notothenioidei, due to the real loss of the protein function (Papetti et al., 2007). Notably, genes encoding subunits of the adenosine triphosphate (ATP) synthase complex show a significant tendency to be lost or acquired by the metazoan mtDNA (Table 2). Indeed, atp9 is encoded by most Porifera mtDNAs (see below) and atp6 is absent in all Chaetognatha. atp8 is absent or highly modified in several phylogenetically distant groups, that is in Chaetognatha, Rotifera, most Mollusca Bivalvia (13 of the 19 sampled mtDNAs), Nematoda and Platyhelminthes.
Indeed, putative atp8 genes have been annotated in one rotifer (Steinauer et al., 2005) and in the nematode Trichinella spiralis (Lavrov and Brown, 2001a), but we note that none of these putative proteins contains the MPQL amino acid signature conserved at the N-terminus of other metazoan ATP8 (Gissi et al., 2004). It should be stressed that the ATP8 protein is characterized by short and variable length, and by a higher conservation of the secondary structure compared to the primary sequence (Papakonstantinou et al., 1996a, 1996b; Gray, 1999), and that both these features have hampered the annotation of atp8 in several cases. Indeed, among bivalves atp8 has been annotated only in Hiatella (Heteroconchia; Dreyer and Steiner, 2006) and Lampsilis (Palaeoheterodonta; Serb and Lydeard, 2003) but we have identified this gene also in the M and F types of both Venerupis and Inversidens species. Similarly, atp8 was not annotated in the first published tunicate mtDNA of Halocynthia roretzi (Yokobori et al., 1999), although a very short form of ATP8 is actually mitochondrially encoded in all tunicates (Gissi and Pesole, 2003; Yokobori et al., 2005; Iannelli et al., 2007a). In spite of these annotation difficulties, atp8 could be considered a truly ‘dispensable’ mitochondrially encoded gene, as it is absent in several distant metazoan lineages. It needs to be further investigated whether this loss is related to the transfer of the gene to the nuclear genome, or to the actual loss of the ATP8 protein due to differences in the ATP synthase structure between taxa.

Table 2 Taxa with a non-standard content of mitochondrial protein-coding genes with respect to the usual presence of 13 genes encoding for subunits of the respiratory chain

Extra protein-coding genes are present only in the mtDNA of non-bilaterian metazoans, that is Cnidaria and Porifera (Table 2). Although some of these genes are open reading frames (ORFs) with no similarity to known proteins, it has been observed that the additional proteins of Cnidaria are mostly involved in DNA interaction, whereas those of Porifera have metabolic roles (Flot and Tillier, 2007) and are also usually encoded by the mtDNA of closest relatives of metazoans. In fact, among the additional proteins, the atp9 and tatC genes identified in Porifera are also encoded by the mtDNA of protists and unicellular relatives of metazoans (Yen et al., 2002; Burger et al., 2003; Dellaporta et al., 2006), indicating that this is one of the several primitive non-metazoan features retained by Porifera (Haen et al., 2007). The sporadic presence/absence of distinct additional genes in non-bilaterian species indicates a general propensity of these mtDNAs to acquire or lose genetic material, sometimes through the activity of mobile elements. Indeed, Hexacorallia possess one or two introns of group I, probably still active as shown by the tendency of the nad5 intron to gain genes from the rest of the genome (Medina et al., 2006; Brugler and France, 2007; Sinniger et al., 2007), whereas in the sponge Amphimedon queenslandica the atp9 gene has been transferred to the nuclear genome, probably through a transposon-mediated translocation, as the functional nuclear atp9 gene is flanked by inverted repeats (Erpenbeck et al., 2007).

Finally, group I introns have been identified only in basal metazoans belonging to Cnidaria and Porifera, whereas, surprisingly, a group II intron has been found in the annelid Nephtys sp., probably as a result of a recent horizontal gene transfer from a bacterial or viral vector (Valles et al., 2008).

Genome architecture

The mtDNA AR is defined as the order of the entire set of functional mt-encoded genes, including duplicated and unusual mt genes. Therefore, this feature takes into account both gene content and gene order, and a difference in mtDNA AR means a difference in gene content and/or gene order. We have estimated the variability in mtDNA AR for each major taxonomic group by the ‘AR rate’, given by (N_AR − 1)/(N_mtDNA − 1) and expressed as a percentage, where N_AR and N_mtDNA are the number of different ARs and the number of completely sequenced mtDNAs of that taxon, respectively (Table 1). Thus, the AR rate ranges between 0 (no AR variability) and 100 (all mtDNAs have a different AR).
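As a worked illustration of the AR rate, the following minimal sketch computes it for a toy group of genomes. It assumes that each genome has already been reduced to a tuple of gene names in genome order, all starting from the same gene and read on the same strand, so that identical architectures compare equal; the multiplication by 100 follows the 0–100 range stated in the text. The function name `ar_rate` and the toy gene orders are illustrative assumptions, not data from Table 1.

```python
def ar_rate(architectures):
    """Return the AR rate, 100 * (N_AR - 1) / (N_mtDNA - 1).

    `architectures` is a list with one gene-order tuple per completely
    sequenced mtDNA of the group. Genomes are assumed to be pre-normalized
    (same starting gene, same reading direction).
    """
    n_mtdna = len(architectures)
    if n_mtdna < 2:
        raise ValueError("the AR rate is undefined for fewer than two genomes")
    # Number of distinct architectures observed in the group.
    n_ar = len({tuple(arch) for arch in architectures})
    return 100.0 * (n_ar - 1) / (n_mtdna - 1)


# Toy group of five genomes showing three distinct architectures:
# the standard order, one with atp8 translocated, one lacking atp8.
genomes = [
    ("cox1", "cox2", "atp8", "atp6"),
    ("cox1", "cox2", "atp8", "atp6"),
    ("cox1", "atp8", "cox2", "atp6"),
    ("cox1", "cox2", "atp6"),
    ("cox1", "cox2", "atp8", "atp6"),
]

print(ar_rate(genomes))  # (3 - 1) / (5 - 1) * 100 = 50.0
```

Subtracting 1 from both counts makes a group in which all genomes share one architecture score exactly 0, whereas a group in which every genome is different scores exactly 100, whatever the sample size.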

As shown in Table 1, AR rate values (All genes) higher than 70, indicative of a strong variability in genome AR, are observed in nine phyla (Hemichordata, Annelida, Brachiopoda, Chaetognatha, Bryozoa, Entoprocta, Rotifera, Mollusca and Porifera) as well as in subgroups belonging to phyla with good conservation of mtDNA AR: Tunicata within Chordata, Myriapoda within Arthropoda, and Enoplea within Nematoda. Although some of these groups are represented by a few sequences (Table 1), the AR variability seems to be real, as in many cases the AR differences have been found in the most closely related species within that taxonomic group (data not shown).

Vertebrates are the sub-phylum with the lowest variability in mtDNA AR compared to other groups of similar taxonomic rank (an AR rate of 4.9 against a range of 35.7–100 for other sub-phyla/classes), suggesting that the almost frozen mtDNA gene order and content ascribed to vertebrates is an exception in metazoan evolution, rather than a rule. Moreover, the frequent changes in genome AR found in vertebrate lepidosaurians and amphibians (Table 1) suggest that the mtDNA AR within vertebrates is not as invariant as previously believed, and that the observed stability could be the consequence, to some extent, of the uneven sampling among vertebrates.

Excluding tRNAs, often described as highly ‘mobile’ mitochondrial genes, the number of different genome ARs decreases only slightly in taxa showing the highest AR variability (see AR-tRNA and Δ% of Tunicata, Hemichordata, Brachiopoda, Chaetognatha, Bryozoa, Rotifera, Mollusca, enoplean Nematoda and Porifera in Table 1), with the exception of the phylum Annelida, where the extensive rearrangements appear exclusively associated with the translocation of tRNA genes. However, the exclusion of tRNA genes leads to a large decrease in AR rate (Δ% ≥ 60) in Vertebrata, Arthropoda and Platyhelminthes, all taxa with moderate variability of genome AR (see AR-tRNA and Δ% in Table 1). These data underline that changes in the number and location of tRNA genes are the main cause of differences in mtDNA AR only in cases of moderate AR variability, and suggest that different rearrangement mechanisms, characterized by a differential involvement of tRNA genes, could occur in taxa with unstable versus stable mtDNA AR. Alternatively, a ‘saturation effect’ in gene order changes can also account for the apparent lack of strong tRNA mobility in taxa with the highest AR rate (see also the paragraph on mtDNA variability in congeneric species).

In general, the observation of high variability in mtDNA AR in several phylogenetically distant groups indicates that acceleration of the rate of genomic rearrangements has occurred independently several times in the evolutionary history of metazoans, and suggests that the causes of this increased rate could be investigated through the analysis of physiological and metabolic features common to distant taxa sharing the same evolutionary pattern.

Gene strand asymmetry

The mtDNA of most metazoans is characterized by an asymmetric distribution of the genes between the two strands, allowing the identification of a major and a minor strand depending on the number of encoded genes. Here, the GSA has been quantified as the absolute value of the difference in gene number between the two strands, divided by the total number of genes. The resulting GSA value ranges from 0 to 1, with values close to zero indicating an almost equal number of genes encoded by the two strands, and values higher than 0.5 corresponding to the presence of at least 75% of the total genes on the major strand.
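The GSA definition above translates directly into a few lines of code. The sketch below assumes that each genome has been reduced to one strand label per gene ('+' or '-'); the function name `gene_strand_asymmetry` and the 28/9 split used as an example are illustrative assumptions.

```python
def gene_strand_asymmetry(strands):
    """Return GSA = |n_plus - n_minus| / n_total for one genome.

    `strands` holds one '+' or '-' label per annotated gene.
    """
    strands = list(strands)
    if not strands:
        raise ValueError("at least one gene is required")
    n_plus = strands.count("+")
    n_minus = strands.count("-")
    if n_plus + n_minus != len(strands):
        raise ValueError("strand labels must be '+' or '-'")
    return abs(n_plus - n_minus) / len(strands)


# Illustrative genome with 37 genes, 28 on one strand and 9 on the other.
example = ["+"] * 28 + ["-"] * 9
print(round(gene_strand_asymmetry(example), 3))  # |28 - 9| / 37 = 0.514

# All genes on a single strand give the most extreme value.
print(gene_strand_asymmetry(["+"] * 37))  # 1.0
```

If p is the fraction of genes on the major strand, the definition reduces to GSA = 2p − 1, which is why a GSA above 0.5 corresponds to more than 75% of the genes on the major strand, and a GSA of at most 0.1 to 50–55% of the genes on each strand.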

Quantitative data on the GSA point to a large variability of this parameter, ranging from almost 0 to 1, in Arthropoda, Mollusca, enoplean Nematoda and Porifera (Table 1), indicating that frequent exchanges of genes between the two mt strands have occurred during the evolution of these genomes. On the contrary, the GSA is almost invariant and very close to 0.5 in all vertebrates and cephalochordates, in accordance with the observation of low variability in genome AR (see AR rate in Table 1) and rare gene inversions in these two groups (see below).

Mitochondrial genomes showing a symmetric gene distribution (GSA ≤ 0.1, thus 50–55% of the total genes on each strand) are very rare. Indeed, this feature has been observed only in 17 phylogenetically distant species belonging to four phyla (a few echinoderms, cephalopods, scaphopods, crustaceans, hexapods, nematodes and sponges; Table 1). On the contrary, about 10% (122 sequences) of the analysed complete mtDNAs are characterized by the most extreme GSA, that is by all genes encoded on the same strand and therefore transcribed in the same direction. This peculiar feature is shared by all, or almost all, mtDNAs of five phyla (Annelida, Brachiopoda, Platyhelminthes, Cnidaria and Porifera) and three sub-phyla/classes (Tunicata, Bivalvia and Chromadorea; Table 3). These mtDNAs can also be regarded as containing only one cluster of co-oriented genes, and it is notable that the few exceptions to the ‘one-cluster structure’ found in Porifera, Cnidaria and Bivalvia consist of mtDNAs with two clusters of co-oriented genes, one for each strand (Table 3). Similarly, the mtDNAs of non-bivalve molluscs are characterized by a structure with one, two or four exact or defective clusters of co-oriented genes (Table 3), where defective clusters are defined as co-oriented gene clusters interrupted by the presence of one or two adjacent genes located on the opposite strand. This feature suggests that inversions of large mitochondrial regions have occurred frequently during the evolution of some taxonomic groups, such as Mollusca, Porifera and Cnidaria.
Finally, it is intriguing that five of the eight previously described taxa with only one coding strand are also characterized by a strong variability of genome AR (see Tunicata, Annelida, Brachiopoda, Bivalvia and Porifera in Tables 1 and 3), suggesting a possible correlation between these two features and that the same mechanism could be responsible for both the presence of a single cluster of co-oriented genes and a strong variability in gene number/order.
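The notion of clusters of co-oriented genes can be made concrete with a short sketch that counts the exact clusters in a circular genome. It assumes one strand label ('+' or '-') per gene, listed in genome order, and it does not attempt to recognize the ‘defective’ clusters defined above; the function name `count_cooriented_clusters` and the toy strand strings are illustrative assumptions.

```python
def count_cooriented_clusters(strands):
    """Count maximal runs of co-oriented genes on a circular genome.

    `strands` holds one '+' or '-' label per gene, in genome order.
    Because the molecule is circular, the first gene is compared with
    the last one, so a run spanning the origin is counted once.
    """
    strands = list(strands)
    if not strands:
        raise ValueError("at least one gene is required")
    # strands[i - 1] wraps around to the last gene when i == 0.
    changes = sum(
        1 for i in range(len(strands)) if strands[i] != strands[i - 1]
    )
    # No strand switch at all means a single cluster covering the genome.
    return max(changes, 1)


print(count_cooriented_clusters("++++++"))  # 1: all genes on one strand
print(count_cooriented_clusters("+++---"))  # 2: one cluster per strand
print(count_cooriented_clusters("+--++-"))  # 4: strand switches four times
print(count_cooriented_clusters("+----+"))  # 2: the '+' run spans the origin
```

On a circular molecule the number of strand switches is always even, so the exact cluster counts are 1, 2, 4 and so on; a genome in which every gene is encoded by the same strand (GSA of 1) is the one-cluster case.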

Table 3 Taxa with mtDNA containing up to four clusters of co-oriented genes

It should be stressed that the distribution of genes between the two strands is directly related to the mechanism of mtDNA transcription. In principle, only one strand should be transcribed in genomes with a single-coding strand, and the existence of one or multiple transcription units per strand could be predicted based on the distribution of the co-oriented genes as clustered or scattered along a given strand, respectively. However, the data on mt transcription in mammals demonstrate that we can also expect the transcription of the entire minor or non-coding strand, against the principle of the highest cellular economy, probably in response to the requirement of gene regulation by antisense transcripts.

Benefits of congeneric comparisons

The best way to carry out evolutionary analyses is to compare a given feature in organisms divergent enough to show significant differences, but sufficiently similar to exclude saturation of the accumulated differences. Congeneric species, that is species belonging to the same genus, can be regarded as a special case of closely related organisms for which the phylogenetic distance is based on traditional morphological characters and is the lowest that can be measured between clearly distinguishable organisms. Of course, different congeneric pairs are not expected to have the same divergence time, nor the same evolutionary rate and pattern, but their evolutionary divergence may be considered as minimized and standardized within a given phylum on the basis of classical taxonomic criteria.

The current availability of a large number of congeneric mtDNAs is partly the consequence of fortuitous coincidences related to the interest of several authors in the same subject (Yokobori et al., 2004; Cook, 2005; Akasaki et al., 2006), whereas in other cases the mtDNA has been intentionally sequenced and compared in congeneric species, subspecies, strains or members of a species complex with the aim of reconstructing phylogenetic relationships (Minegishi et al., 2005; Nishibori et al., 2005; Goios et al., 2007), studying peculiar evolutionary dynamics (Mueller and Boore, 2005), developing reliable molecular markers for species identification and monitoring (Guo et al., 2006; Nakao et al., 2007; Park et al., 2007a), or solving controversies on taxonomic status (Fukami and Knowlton, 2005; Iannelli et al., 2007b). Beyond being the result of exotic interest, these studies have expanded our knowledge of genomic evolution and can be applied in fields as different as human health (Le et al., 2002; Nakao et al., 2007; Park et al., 2007a), aquaculture activities (Milbury and Gaffney, 2005; Guo et al., 2006, 2007), genetic improvement of commercially relevant species (Nishibori et al., 2005) and monitoring of natural populations or endangered species (Milbury and Gaffney, 2005; Nishibori et al., 2005). For instance, the recent evolutionary history of mice has been reconstructed through the analysis of wild-type and inbred Mus strains with a well-documented history, revealing a faster substitution rate and a higher accumulation of non-synonymous substitutions in inbred mice compared to wild types (Goios et al., 2007); the mitochondrial phylogeny of chicken breeds and wild-type species of the genus Gallus has revealed that several inter-species hybridizations have contributed to the establishment of contemporary domesticated chicken, also providing useful data for a genetic improvement of chicken breeds (Nishibori et al., 2005).
As regards genomic evolution, the comparisons of Plethodontidae salamanders have further elucidated the mechanisms of gene duplication and modification of gene order, given that these amphibians are characterized by unusually frequent mtDNA rearrangements compared to other vertebrates (Mueller and Boore, 2005). Similarly, the high incidence of tandem duplications in several lineages of parthenogenic lizards, such as the geckos Heteronotia binoei, has been studied in detail to investigate the origin and evolution of large duplicated mt segments (Fujita et al., 2007).

The capacity of mtDNA to discriminate between closely related species, and thus its potential as a ‘DNA barcode’, has stimulated the sequencing of the entire mtDNA in several species, in order to identify the mt region most useful as a species-specific marker. Thus, the mtDNA has been entirely sequenced for the two human intestinal parasites Diphyllobothrium latum and Diphyllobothrium nihonkaiense (Platyhelminthes), sibling species morphologically almost indistinguishable at the larval stage, with the aim of identifying reliable molecular markers for diagnostic purposes (Nakao et al., 2007; Park et al., 2007a). Similarly, the problematic morphological identification of two monogeneans belonging to the genus Gyrodactylus, both parasites of commercially relevant fishes, has led to the sequencing of their entire mt genome (Huyse et al., 2007; Plaisance et al., 2007). The mtDNA has also been used for monitoring the progeny of inter-species crosses of carp, providing evidence of mtDNA paternal leakage and recombination in triploid crucian carp (Guo et al., 2006), and of strictly maternal inheritance in allotetraploid carp (Guo et al., 2007).

Recently, the comparative mitogenomics approach has been applied to clarify the taxonomic status of some species, as in the case of members of the coral species complex Montastraea annularis (Cnidaria; Fukami and Knowlton, 2005), coral morphospecies of the genus Pocillopora (Cnidaria; Flot and Tillier, 2007), distinct genotypes of the cestode Echinococcus granulosus (Platyhelminthes; Le et al., 2002), and putative cryptic species of the cosmopolitan ascidian Ciona intestinalis (Tunicata; Iannelli et al., 2007b). These studies demonstrate that the capacity of mtDNA to solve taxonomic controversies at the species level depends on the overall mt evolutionary dynamics, that is on the existence of a fast rate in the nucleotide substitutions and/or in the changes of genome AR. Thus, the low variability of cnidarian mtDNA, characterized by slow substitution rate and rare genomic rearrangements (Lavrov et al., 2005; Medina et al., 2006; Wang and Lavrov, 2007), has made the mtDNA useless for species definition in both Montastraea and Pocillopora genera (Fukami and Knowlton, 2005; Flot and Tillier, 2007). In fact, the gene order is identical within both genera, whereas the sequence divergence ranges from 0 to 0.12% between six Montastraea annularis individuals (Fukami and Knowlton, 2005) and is 0.18% between two Pocillopora species (Flot and Tillier, 2007). These differences are too small to permit a clear discrimination between species. Only the hypervariable mtDNA regions (atp6 and the putative CR) of needle corals belonging to the Seriatopora genus appear to have enough genetic variability to be useful for species phylogeny and population genetic studies (Chen et al., 2008). On the contrary, the fast evolutionary dynamics of Platyhelminthes and Tunicata make the mtDNA of these taxa a reliable molecular marker for species discrimination. Indeed, the fast substitution rate of Platyhelminthes mtDNA has allowed the recognition of different genotypes of the parasite E. granulosus as distinct species (Le et al., 2002): these ‘genotypes’ were defined based on adult morphology, host specificity, life cycle and molecular features (Thompson and McManus, 2002), but their taxonomic status is highly debated. The comparative analysis of the complete mtDNAs of G4 (horse–dog strain) and G1 (sheep–dog strain) genotypes of E. granulosus revealed that the G4 and G1 genotypes are almost as distant from each other as each one is from the congeneric species Echinococcus multilocularis; thus, the sequence divergence is compatible with an inter-species comparison and suggests that G1 and G4 merit the status of separate species (Le et al., 2002). The most extreme evolutionary dynamics has been found in tunicate mtDNA, characterized by high substitution rate and extensive gene rearrangements (Yokobori et al., 1999, 2005; Gissi et al., 2004). These peculiarities have enabled the unequivocal demonstration of the existence of two cryptic species in the cosmopolitan ascidian C. intestinalis: individuals of the two putative species show strong mtDNA differences, incompatible with intra-species variability, at the level of gene order, base composition, non-coding regions and sequence divergence (Iannelli et al., 2007b).
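To make the notion of percent sequence divergence concrete, the short Python sketch below computes an uncorrected pairwise divergence (p-distance) between two aligned sequences. This is only one simple estimator, chosen here as an assumption for illustration; the cited studies may have used different or corrected distance measures, and the function name is hypothetical.

```python
def percent_divergence(seq_a, seq_b):
    """Uncorrected percent divergence between two aligned nucleotide sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch, not a rule taken from the cited studies).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared
```

With this measure, a value such as 0.18% corresponds to fewer than two differences per thousand compared sites, which helps explain why so low a divergence gives little power to separate species.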

In conclusion, the mtDNA appears a powerful marker to discriminate between closely related taxa, provided that the taxa demonstrate a relatively rapid evolutionary dynamics.

mtDNA variability in congeneric species

In this review, we have systematically collected and detailed the mtDNA AR of the available congeneric metazoan species, as comparisons at low evolutionary distance facilitate the detection of genomic differences commonly obscured over long evolutionary distances by multiple or reverse changes. Moreover, the variability observed in closely related organisms can be regarded as an indication of the genomic plasticity present at higher taxonomic levels.

As reported in Table 4, 428 mtDNA sequences can be included in intra-genus comparative analyses, and at least two species/subspecies have been sequenced for 144 metazoan genera. The taxonomic distribution of these genera reflects the uneven mtDNA sampling already discussed: most genera belong to Vertebrata and there are no congeneric species for Annelida, Myriapoda and any poorly sampled phyla (Table 1). The most widely sampled genera include mice (genus Mus), for which there are 20 sampled species/strains, and freshwater eels of the genus Anguilla (Neopterygii), for which all 15 extant species and 3 subspecies have been sampled (Supplementary Material Table S1). Intra-genus differences in genome AR are observed in all analysed genera of Tunicata, Bivalvia, enoplean Nematoda and Porifera, whereas a constancy of genome AR is observed in almost all vertebrates and in all comparisons of the phyla Cnidaria and Echinodermata (compare ‘No Genera’ to ‘NGdiff-AR’ columns in Table 4).

Table 4 Metazoan mtDNAs of congeneric species, with number of genera showing differences in genomic architecture (NGdiffAR)

The type and extent of congeneric differences are summarized in Table 5, which also shows some relevant comparisons between different mtDNA isoforms belonging to the same species, such as the M and F mtDNA types of bivalves and distinct haplotypes of some enoplean nematodes. In 10 of the 19 intra-genus comparisons, the tRNA genes specifically contribute to increasing the differences in genome AR, as the number of AR decreases when tRNAs are excluded from the data set (compare AR-All and AR-tRNA in Table 5). Thus, we have analysed separately the effects of differences in gene content and in gene order on genome AR. In particular, differences in gene content involve mostly tRNA genes and rarely concern rRNA genes (Table 5). Similarly, the gene order differences, estimated by the normalized breakpoint distance (BDn), involve almost exclusively tRNA genes, as the exclusion of tRNAs decreases the BDn or sets it to zero in most comparisons (Table 5). Strikingly, tunicate and enoplean comparisons (that is Phallusia and Romanomermis) exhibit the most extensive differences in gene order (highest values of BDn-All) and no BDn decrease when tRNAs are excluded from the data set, suggesting saturation of gene order rearrangements. In general, data on congeneric comparisons confirm the widespread belief of high tRNA mobility in metazoan mtDNA, and highlight frequent losses/acquisitions of tRNA genes.
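The effect of excluding tRNA genes on a breakpoint-based measure can be illustrated with a minimal Python sketch. It assumes circular genomes, genes represented as hypothetical (name, strand) pairs with strand +1 or -1, each shared gene present once per genome, and a normalization by the number of shared genes; the exact BDn formula used for Table 5 may differ, so this is an approximation under stated assumptions rather than the method of the review.

```python
def _flip(gene):
    """Return the same gene read on the opposite strand."""
    name, strand = gene
    return (name, -strand)

def _adjacencies(order):
    """Set of gene adjacencies of a circular signed gene order, in both readings."""
    adj = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        adj.add((a, b))
        adj.add((_flip(b), _flip(a)))  # same adjacency read from the other strand
    return adj

def normalized_breakpoint_distance(order_a, order_b, exclude=lambda name: False):
    """Fraction of adjacencies of genome A that are not conserved in genome B.

    Only genes shared by both genomes and not rejected by `exclude` are compared.
    """
    shared = ({n for n, _ in order_a if not exclude(n)}
              & {n for n, _ in order_b if not exclude(n)})
    a = [g for g in order_a if g[0] in shared]
    b = [g for g in order_b if g[0] in shared]
    if not a:
        raise ValueError("no shared genes to compare")
    adj_b = _adjacencies(b)
    breakpoints = sum(1 for i in range(len(a))
                      if (a[i], a[(i + 1) % len(a)]) not in adj_b)
    return breakpoints / len(a)

# Toy example: the only difference is the translocation of a single tRNA gene.
genome_1 = [("cox1", 1), ("trnW", 1), ("cox2", 1), ("atp6", 1), ("trnQ", 1)]
genome_2 = [("cox1", 1), ("cox2", 1), ("atp6", 1), ("trnW", 1), ("trnQ", 1)]
is_trna = lambda name: name.startswith("trn")
bd_all = normalized_breakpoint_distance(genome_1, genome_2)
bd_no_trna = normalized_breakpoint_distance(genome_1, genome_2, exclude=is_trna)
```

In this toy case the distance is non-zero when all genes are considered and drops to zero once tRNA genes are excluded, which is the pattern described above for most congeneric comparisons.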

Table 5 Congeneric species showing differences in genome architecture, gene order (breakpoint distance) and gene content

It can also be noticed that in several genera there is a concurrent variation of gene content and gene order (Table 3): indeed, 8 of 19 genera show differences in both gene content and gene order. This observation is consistent with a genome rearrangement mechanism based on the duplication of large mtDNA segments and the subsequent degeneration/loss of most additional sequences, with the functionally surviving genes mostly encoding tRNAs. In accordance with this hypothesis, most genera with changes in genome AR are also characterized by variation of genome size (Table 5).

A discontinuous pattern of gene rearrangement in Chordata

Gene order rearrangements at intra-genus level are very frequent in ascidians (Tunicata), moderate in lancelets (Cephalochordata) and rare in vertebrates (Tables 4 and 5). Similarly, the intensity of these rearrangements is again higher in protochordates (that is Cephalochordata and Tunicata) than in vertebrates.

As shown in Table 5, congeneric comparisons of ascidians show extensive gene order rearrangements ranging from the translocation of one or six genes in the Ciona genus (depending on the species pair; Yokobori et al., 2003; Gissi et al., 2004; Iannelli et al., 2007b), to the translocation of half of the genes in the Phallusia genus, which shows a BDn close to the random value (Iannelli et al., 2007a). Moreover, additional intra-genus differences concern the length and number of non-coding regions, and the presence/absence of short duplicated sequences including non-coding regions or small portions of RNA genes (Gissi et al., 2004; Iannelli et al., 2007a).

In the sub-phylum Cephalochordata, the mtDNA has been analysed in all three existing genera, Branchiostoma, Epigonichthys and Asymmetron: an intra-genus rearrangement was initially ascribed to the genus Epigonichthys (Nohara et al., 2005a) and subsequently to the genus Asymmetron (Kon et al., 2007), as the taxonomic status of Epigonichthys lucayanus has been revised from E. lucayanus species to Asymmetron lucayanus species complex (Nishikawa, 2004; Kon et al., 2006). The rearrangement within the genus Asymmetron is fairly simple, consisting of the translocation of trnN, and the transposition plus inversion of the trnQ gene (Kon et al., 2007). However, this rearrangement is rather peculiar as it includes a gene inversion. Indeed, gene inversions are observed in Chordata only within Cephalochordata, given that the Asymmetron genus shows the inversion of a large mt region, including four genes, with respect to other lancelet genera (Kon et al., 2007). These data suggest the occurrence of mtDNA recombination in Cephalochordata, as gene inversions can be easily explained by a recombination mechanism.

Among vertebrates, only salamanders of the family Plethodontidae (Amphibia) exhibit gene rearrangements at the intra-genus level (Table 5) and represent an interesting group where it is possible to observe the mechanism of duplication-random gene loss, commonly hypothesized for mitochondrial gene order rearrangements ‘at work’ (Moritz and Brown, 1986; Boore, 2000; Mueller and Boore, 2005). In these animals a modified gene order is often associated with the presence of pseudogenes and/or additional gene copies, which could be vestiges of ancestrally duplicated regions (Mueller and Boore, 2005). The detailed analyses of the combined pattern of sequence remnants and variable gene order in different plethodontid genera have suggested the occurrence of a single tandem duplication in Batrachoseps attenuatus, two independent duplications in Plethodon elongatus and one or two independent tandem duplications in the Aneides genus (Mueller and Boore, 2005). Moreover, the observed duplications are compatible with events of imprecise initiation–termination of DNA replication, slipped-strand mispairing or intra-molecular recombination, that is with mechanisms commonly hypothesized to explain the occurrence of sequence duplications (Mueller and Boore, 2005). Other peculiar cases of intra-genus variability in vertebrates can be observed in some lizards and geckos (Lepidosauria; Macey et al., 2004; Fujita et al., 2007). In amphisbaenian reptiles of the genus Bipes, the gene order is identical between three congeneric species, but in Bipes biporus there is a tandem duplication of the gene cluster trnT-trnP followed by pseudogene formation so that the final gene order is equal to the ancestral one (Macey et al., 2004). Surprisingly, the pattern of pseudogene formation differs between different populations of Bipes biporus, thus different rearrangements can be identified when functional and non-functional genes are taken into account (Macey et al., 2004). 
Several complete mtDNAs of hybrid parthenogenic geckos of the H. binoei species complex contain tandem duplications ranging in size from 5.7 to 9.4 kb, and including both the CR and all gene categories (tRNA, rRNA and proteins; Fujita et al., 2007). The small direct repeats located at the duplication endpoints have suggested a slipped-strand mispairing during DNA replication as the mechanism generating these duplications, moreover the relatively common indels in the duplicated segments point to deletions as the mechanism disrupting gene function (Fujita et al., 2007).

The pattern and frequency of gene order rearrangement observed in congeneric comparisons of Chordata (Table 5) exemplify well the data derived from the analysis of the entire data set of 865 chordate mtDNAs (Table 1). Indeed, the intra-genus comparisons confirm the extreme variability in gene order observed in tunicates, which show different gene orders from one species to the other, and no gene block shared by all available species (Iannelli et al., 2007a). On the contrary, cephalochordates are characterized by moderate rearrangements, including gene inversions, and by a gene order that can be easily traced back to that of vertebrates (Nohara et al., 2005a; Kon et al., 2007). Finally, the gene order of vertebrates is not as invariant as previously reported, and the mtDNA AR appears remarkably dynamic especially in Amphibia and Lepidosauria (see respective AR rate in Table 1), as also indicated by the few cases of vertebrate congeneric comparisons described here. In fact, the dogma of a frozen gene order in vertebrates has been broken by the identification of several deviations from the ‘typical’ gene order in marsupials, birds, crocodiles, reptiles, amphibians, bony fishes and lampreys (Boore, 1999; Pereira, 2000; Inoue et al., 2003). Many of these alternative gene arrangements involve genes close to the CR or surrounding the L-strand replication origin (Boore, 1999), and are sometimes associated with duplication of the CR, leading to the hypothesis that errors in mtDNA replication have given rise to these rearrangements. The observation of these ‘hot spots’ strongly suggests the need to analyse the evolution of vertebrate gene order including data on the location of the replication origins and their possible duplications/losses, in order to better discriminate between alternative mechanisms of gene order rearrangements. Although an updated review of the vertebrate mtDNA organization would be noteworthy and timely, it is outside the scope of this review.

Arthropoda: loss and acquisition of tRNA genes and control regions

Among 17 arthropod genera, congeneric differences in mtDNA AR have been found in three sub-phyla, and are less frequent in Hexapoda compared to Crustacea and Chelicerata (Tables 4 and 5).

In Hexapoda, differences in gene content have been found in luminous beetles of the genus Rhagophtalmus (Coleoptera), where a cluster of three tRNA genes is present in Rhagophtalmus lufengensis and absent in the congeneric beetle R. ohbai (Li et al., 2007). Although the authors suggest further investigations, both mtDNA sequences are complete, and unusual tRNA structures that could confuse tRNA identification have not been observed in coleopteran mtDNAs, suggesting that this is a real mtDNA difference.

In Crustacea, the observed intra-genus rearrangements are always related to tRNA genes: indeed, the gene order differences between two Tigriopus species are due to the translocation of the single trnW gene (Machida et al., 2002; Burton et al., 2007), whereas there are three almost identical copies of trnC repeated in tandem in Pollicipes polymerus (Lavrov et al., 2004) against only one trnC gene in Pollicipes mitella (Lim and Hwang, 2006; this paper has several gene mis-annotations).

Congeneric differences are much more extensive in Chelicerata, at least within Acari, and differences in genome size, gene content and gene arrangement have been found in chigger mites of the genus Leptotrombidium (Table 5). In particular, L. pallidum differs from two congeneric species for the presence of a rrnS pseudogene, two identical and oppositely oriented rrnL genes, and four nearly identical extra non-coding regions. However, the arrangement of the shared mt genes is almost identical within the genus, except for the translocation of trnQ (Shao et al., 2005b, 2006a). These peculiarities, combined with a gene arrangement drastically different from the ancestral arthropod gene order, have led to the hypothesis that the mtDNA AR of L. pallidum arose by a mechanism of ‘duplication-random gene loss’ followed by inter-genome non-homologous recombination (Shao et al., 2006a). Interestingly, clues of mtDNA recombination have been found even in other congeneric comparisons of Acari. For instance, four prostriate ticks of the genus Ixodes share the same gene arrangement but differ in the number of CRs: non-Australasian Ixodes have a single CR, whereas Australasian Ixodes have two CRs undergoing concerted evolution, probably by the recombination mechanism of gene conversion (Shao et al., 2005a).

Within Chelicerata, differences in gene content have also been reported for scorpions belonging to the genus Mesobuthus, as a trnD gene has been identified and annotated in Mesobuthus gibbosus (Jones et al., 2007) but not in Mesobuthus martensii (Choi et al., 2007). However, our comparative analyses demonstrate that trnD is absent in both Mesobuthus species, and that the reported difference is only apparent and related to the atypical cloverleaf structure of scorpion tRNA genes, which have D- and T-replacement loops and short anticodon stems (Davila et al., 2005; Choi et al., 2007; Jones et al., 2007). In fact, we noted that the predicted trnD of M. gibbosus is quite bizarre: it has only three stems, a few paired nucleotides, and it is fully embedded inside (about 220 bp from the beginning of) the rrnS gene, whose 5′ end is well conserved in scorpions and other arthropods (data not shown). Moreover, trnD is lost also in the other sequenced scorpion mtDNA (Centruroides limpidus; Davila et al., 2005). Therefore, our re-annotation of scorpion mtDNAs strongly suggests that trnD is absent in all available scorpion species, and the gene content and genome AR is conserved in the Mesobuthus genus. Thus, the only intra-genus difference between Mesobuthus scorpions concerns genome size, and is due to the presence of repeated sequences in the CR.

In summary, the mtDNA AR variability observed in Arthropoda at the intra-genus level is mostly related to translocations, losses or acquisitions of tRNA genes. This observation reflects the general evolutionary trend of arthropod mtDNA, where the most common rearrangements are translocations of tRNA genes (see decrease of Δ% in Table 1), and involve mostly genes adjacent to the CR or surrounding a tRNA gene cluster (trnA-trnR-trnS(AGN)-trnE-trnF) considered as the replication origin of the mtDNA minor strand (Boore, 1999).

The extraordinary variability of mollusc Bivalvia

Bivalve molluscs show frequent and extensive mtDNA variability at intra-genus level comparable to that observed in tunicates (Tables 4 and 5). In fact, two oyster species (Crassostrea genus) show broad differences in gene content and gene order (with relocation of most tRNA genes), together with other unusual mt features such as the presence of two trnM(CAU) genes and the split of rrnL into two fragments in both species, and the presence of a cob translational frameshift only in Crassostrea virginica (Milbury and Gaffney, 2005). These features are also associated with a high mt sequence divergence, comparable only to that observed between congeneric ascidians of the Ciona genus (Milbury and Gaffney, 2005).

Similar peculiarities have been also found in bivalves possessing two distinct mtDNA types, that is a female (F) and a male (M) type differentially transmitted to the progeny through DUI (Breton et al., 2007). In DUI species, the extent of genome rearrangement between the two gender-specific mtDNAs is quite variable, and, based on our mtDNA re-annotations, it ranges from the translocation of several genes in Inversidens japanensis (no gene inversion observed in our analysis), to differences in gene content in V. philippinarum and in some isoforms of the Mytilus genus (Table 5; Mizi et al., 2005; Breton et al., 2006; Zbawicka et al., 2007). The gene order is identical in the three analysed species of Mytilus, also described as members of the same species complex (Gossling, 1992), and duplication of the CR and some adjacent genes has been sporadically found only in some M and ‘recently masculinized’ mtDNA types of Mytilus (Mizi et al., 2005; Breton et al., 2006). In addition to these features, species with DUI show an accelerated nucleotide substitution rate compared to other animals, that is also much pronounced in the F than in M type (reviewed in Breton et al., 2007).

The differences here summarized for congeneric comparisons of Bivalvia are in accordance with the extraordinary variability previously described in this taxon, characterized by strong differences in genome AR, loss of atp8, presence of additional trnM genes, and broad variability in tRNA gene number (Tables 1 and 2). Moreover, the mt genomes of Bivalvia show a strong size variability (Figure 1).

Two distinct evolutionary trends in Nematoda

Within the phylum Nematoda, a surprising intra-genus variability in genome AR has been found in the class Enoplea (Table 5).

The unpublished mtDNAs of three Romanomermis species, enoplean parasites, reveal many gene duplications and extensive rearrangements, in accordance with the previous identification in Romanomermis culicivorax of a 3-kb-long repeat encompassing protein-coding genes and organized in tandem copies with several deletions and inversions (Azevedo and Hyman, 1993). An even more atypical AR variability has been found between distinct individuals of two enoplean obligate parasites, T. cosgrovei and S. spiculatus. In T. cosgrovei the mtDNAs of different individuals range in size from 19 to 34 kb, and greatly differ in the so-called hypervariable region of the genome (Tang and Hyman, 2007). In fact, the hypervariable segment is characterized by duplicated regions arranged in direct or inverted orientation, and contains both functional and non-functional genes organized in a shuffled gene order. The remaining portion of the genome, defined as ‘constant’, has an identical gene order in all analysed T. cosgrovei individuals (Tang and Hyman, 2007). In S. spiculatus, the unpublished small and large mtDNA haplotypes differ at least in the duplication of both rRNA genes in the large haplotype.

Compared to Enoplea, the mtDNA AR is quite stable in nematodes belonging to the Chromadorea class. In fact, we have observed no gene order differences in two chromadorean congeneric species pairs, that is between two members of the putative Necator americanus species complex (Hu et al., 2002, 2003a), and between Caenorhabditis elegans and Caenorhabditis briggsae (the mtDNA of C. briggsae is a mitochondrial scaffold derived from the nuclear genome project; see Supplementary Material Table S1). Our comparative analyses have identified only the presence of additional non-coding regions (NC) in the mtDNA of C. briggsae compared to C. elegans, in accordance with the observed variability in number and size of non-coding regions found in nematodes. As a consequence, there is a quite large difference in genome size between the two Caenorhabditis species (mean±s.d.: 14 107±443 bp).

In summary, congeneric comparisons in Nematoda suggest that gene order rearrangements and size variations are remarkably higher in Enoplea than in Chromadorea, an observation also confirmed by the analysis of the whole data set of nematodes (Table 1). The mtDNA of Nematoda is characterized by evolutionary dynamics strongly different from that of other metazoans, as it shows a fast nucleotide substitution rate (Lavrov and Lang, 2005), a very high AT content, and unusual characteristics related to the mitochondrial translational machinery, such as unique initiation codons (Wolstenholme, 1992; Hu et al., 2003b), unconventional cloverleaf tRNA structures with a TV-replacement loop (Okimoto and Wolstenholme, 1990) and small rRNAs whose reduced size is probably correlated with the unconventional tRNA structure (Okimoto et al., 1994). In addition, intra-genome recombination, a process likely absent in most metazoan mtDNAs, has been experimentally demonstrated in the enoplean Meloidogyne javanica (Lunt and Hyman, 1997), and is suggested to be the mechanism responsible for the intricate mtDNA AR found in the enoplean Romanomermis and Thaumamermis species (Azevedo and Hyman, 1993; Tang and Hyman, 2007). Recombination is also the simplest and most plausible mechanism to explain mitochondrial gene inversions, that is the frequent exchange of coding strand observed in nematodes. Indeed, all Chromadorea nematodes are characterized by a single-coding strand (GSA=1 in Table 1), whereas in Enoplea the mt genes are encoded by both strands following a distribution pattern highly different from one species to the other (so the GSA of Enoplea ranges from 0 to 0.80). Thus, it can be hypothesized that gene inversions and recombination events have frequently occurred during the evolution of nematode mtDNAs.

Platyhelminthes: gene translocations with no inversions

Among the five intra-genus comparisons within Platyhelminthes (Table 4), only the genus Schistosoma shows gene order rearrangements (Table 5). In the six available Schistosoma species, the following features have been observed: the translocation of several tRNA and a few protein genes, the variability of genome size due to the presence of repeated sequences in the main non-coding region (Despres et al., 1993; Littlewood et al., 2006), and the conservation of gene transcriptional polarity (that is no gene inversions; Littlewood et al., 2006). This intra-genus variability perfectly reflects the general evolutionary dynamics of the phylum Platyhelminthes, characterized by variable genome AR due to frequent translocation of tRNA genes (see Δ% in Table 1), presence of a single-coding strand in all species (Tables 1 and 3) and loss of atp8 (Table 2). These characteristics have to be considered in the context of other peculiarities distinguishing the mtDNA of Platyhelminthes from other metazoans, such as the use of a unique genetic code (Bessho et al., 1992; Telford et al., 2000) and a very fast nucleotide substitution rate (Lavrov and Lang, 2005).

Conclusions

The phylogenetic tree of Metazoa in Figure 2 shows the mitochondrial genomic features analysed here in an evolutionary context, and also depicts taxa with extremely fast or slow mt nucleotide substitution rates with long and short branches, respectively (see legend of Figure 2 for references).

Figure 2

Structural genomic features of metazoan mtDNA drawn into a phylogenetic context. Detailed explanation of coloured boxes: ‘high architecture (AR) variability’, AR rate calculated on all genes 70 (Table 1); ‘AR variability due to tRNAs’, percentage difference in AR variability 60 (see Δ% in Table 1); ‘max gene strand asymmetry (GSA) variability’, GSA ranging from 0 to almost 1 in most mtDNA of the metazoan group (see GSA in Table 1); ‘one coding strand’, presence of one coding strand (GSA=1) in all, or almost all, mtDNAs of a taxa (Table 3); ‘co-oriented gene clusters’, genome AR with two or four clusters of co-oriented genes in most mtDNA of the metazoan group (Table 3). Low-sampled groups (only 1–2 mtDNAs) are represented by dashed lines. Asterisk indicates partial mtDNA sequences. Names in bold and underlined indicate the availability of congeneric species showing differences in mtDNA AR (Table 5). Long/short branches indicate fast/slow mt nucleotide substitution rates according to the following references: Tunicata (Yokobori et al., 1999, 2005); Platyhelminthes (Lavrov and Lang, 2005); Rotifera Acanthocephala (Steinauer et al., 2005); Chaetognatha (Papillon et al., 2004); Onychophora (Podsiadlowski et al., 2008); Nematoda (Lavrov and Lang, 2005); Bryozoa (Waeschenbach et al., 2006); Chelicerata (Park et al., 2007b); Bivalvia (Knudsen et al., 2006); Cnidaria (Medina et al., 2006) and Porifera (Lavrov and Lang, 2005). The metazoan phylogeny has been obtained from the ‘Tree of life web project’ (http://tolweb.org/tree/phylogeny.html), with the following exceptions: the deuterostome phylogeny was modified according to Zeng and Swalla (2005) and Bourlat et al. (2006); the chordate phylogeny is represented as unresolved, based on the controversy between traditional and molecular data (Delsuc et al., 2006); Echiura and Pogonophora are represented as distinct phyla, although they are often included in Annelida. 
Myzostomida are not represented, as commonly included in Annelida. Abbreviations: B, Bilateria; D, Deuterostomia; E, Ecdysozoa; L, Lophotrochozoa.

Although the current view of metazoan phylogeny is only partially resolved and the available mtDNA sequences are extremely limited for many taxa (dashed lines in Figure 2), it is clear that the overall mtDNA plasticity in Metazoa is higher than previously thought. The gene category mainly responsible for this plasticity is that of the tRNA genes, whose number is highly variable both among and within taxa (Table 1). This suggests that metazoan mitochondria are a quite flexible system with respect to both their ability to import tRNAs from the cytoplasm and the tolerance of the mtDNA towards the acquisition or loss of functional tRNA genes. In contrast, the number of protein-coding genes is stable in metazoans: only a few additional proteins have been found in non-bilaterians (green boxes in Figure 2), and the atp8 gene has been lost in five evolutionarily distant taxa (grey boxes in Figure 2). The presence of additional proteins is considered a primitive mtDNA character, shared also by non-metazoans, whereas the loss of atp8 (as well as the presence of putative and highly modified isoforms of atp8 in some rotifers and nematodes) is observed only in fast-evolving taxa. Thus, we can hypothesize that atp8 is a ‘dispensable’ gene in the mtDNA, and that its loss has occurred several times during evolution, usually concomitant with an increase in the mt substitution rate. The genome AR is highly variable in several taxa scattered along the phylogenetic tree (yellow boxes in Figure 2), and most of these taxa do not show a fast nucleotide substitution rate, suggesting that the rate of genome rearrangement is independent of the substitution rate. Moreover, AR variability is mainly due to variation in the number/position of tRNA genes (orange boxes in Figure 2) only in taxa with a moderate/low AR rate (absence of yellow boxes in Figure 2); Annelida is the only taxon in which orange and yellow boxes occur together in Figure 2. Similar data also emerge from the congeneric comparisons: in the most highly rearranged Phallusia and Romanomermis species, the BDn does not decrease when the tRNA genes are excluded (Table 5). This situation suggests either that different rearrangement mechanisms, characterized by a differential involvement of the tRNA genes, operate in taxa with a variable versus a stable mtDNA AR, or that a ‘saturation effect’ in gene order changes has obscured the involvement of tRNA genes in the highly rearranged mtDNAs.

The asymmetric distribution of the genes between the two strands is a feature common to almost all metazoans (Table 1). The most extreme GSA, characterized by all genes encoded on the same strand, has been observed in a wide taxonomic range including Tunicata, chromadorean Nematoda, Lophotrochozoa, Rotifera, Platyhelminthes and non-bilaterians (red boxes in Figure 2), leading to the hypothesis that this gene organization was present in the metazoan ancestor. It is also noteworthy that the exceptions to the rule of ‘one coding-strand’ found in non-bilaterians and lophotrochozoans (especially molluscs) consist mostly of mtDNAs with a few perfect or defective clusters of co-oriented genes (violet boxes in Figure 2). This further suggests that the ancestral gene organization was initially modified by inversion of large mtDNA segments, and then by strand translocation of single genes (the interrupting genes in Table 3). The observation of the maximum variability in GSA (GSA ranging from 0 to about 1; light blue in Figure 2) in two large ecdysozoan phyla is also in accordance with the theory of an ancestral ‘one coding-strand’ mtDNA structure that was slowly modified during metazoan diversification.
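To make the range of GSA values quoted above concrete, the following minimal Python sketch computes a strand-asymmetry score from a list of gene orientations. It assumes that GSA is the absolute difference between the numbers of genes on the two strands divided by the total number of genes. This formula is consistent with the 0 to 1 range and with the ‘all genes on one strand’ extreme described here, but it is an assumption of this sketch and is not stated in this section. The function name and the '+'/'-' strand labels are hypothetical.

```python
def gene_strand_asymmetry(strands):
    """Return a strand-asymmetry score between 0 and 1.

    `strands` is a sequence of '+' or '-' labels, one per gene.
    Assumed formula (not taken from the text):
        |n_plus - n_minus| / (n_plus + n_minus)
    """
    strands = list(strands)
    if not strands:
        raise ValueError("at least one gene is required")
    n_plus = sum(1 for s in strands if s == "+")
    n_minus = sum(1 for s in strands if s == "-")
    if n_plus + n_minus != len(strands):
        raise ValueError("strand labels must be '+' or '-'")
    return abs(n_plus - n_minus) / len(strands)


# Illustrative layouts (hypothetical gene counts, not data from Table 1):
one_strand = ["+"] * 37              # every gene on the same strand
split_28_9 = ["+"] * 28 + ["-"] * 9  # genes split unevenly between strands
balanced = ["+"] * 18 + ["-"] * 18   # genes split evenly between strands
```

Under this assumed definition, `one_strand` scores 1 (the ‘one coding-strand’ extreme), `balanced` scores 0, and `split_28_9` scores 19/37, about 0.51.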

The data summarized here highlight that the mtDNA evolutionary trend has changed dramatically in vertebrates compared with other metazoans. Indeed, an almost invariant gene content, mtDNA AR and GSA characterize the sub-phylum Vertebrata, and a general trend towards stabilization of these structural features is also present in the remaining deuterostomes (see the few coloured boxes marking deuterostome branches in Figure 2, compared with the abundance of boxes observed in other taxa). The most notable exception to the stability of deuterostome mtDNA concerns Tunicata, which exhibit a variability in mtDNA AR and GSA similar to that of some lophotrochozoans and non-bilaterians. This similarity could suggest that the peculiar evolutionary trend of tunicate mtDNA is a primitive, rather than a derived, mitochondrial trait of Chordata.

The differences in gene order and/or gene content found in comparisons of congeneric species have always confirmed the genome plasticity observed at higher taxonomic levels (Table 5 and Figure 2). These comparisons confirm that differences in mtDNA AR are generally due to variation in the number and location of tRNA genes, whereas in the fast-evolving tunicates and enoplean species the observed differences remain high even when tRNA genes are excluded from the analyses (compare BDn-All to BDn-tRNA in Table 5).
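The contrast between a distance computed on all genes and one computed after excluding tRNA genes can be illustrated with a small Python sketch of a normalized breakpoint distance between two circular, signed gene orders. Several choices here are assumptions made for illustration and are not taken from this section: breakpoints are counted as adjacencies of the first genome that are absent from the second (on either strand), the count is divided by the number of shared genes, duplicated genes are assumed absent, and tRNA genes are recognized by a hypothetical 'trn' name prefix. The gene orders are toy data, not the genomes of Table 5.

```python
def _flip(gene):
    """Switch the strand of a signed gene name ('-' prefix = opposite strand)."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def _adjacencies(order):
    """Signed adjacencies of a circular gene order, readable from either strand."""
    adj = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        adj.add((a, b))
        adj.add((_flip(b), _flip(a)))  # same adjacency read on the other strand
    return adj


def normalized_breakpoint_distance(order_a, order_b, exclude=lambda name: False):
    """Breakpoints between two circular gene orders, divided by shared gene number.

    `exclude` is a predicate on unsigned gene names; excluded genes are removed
    from both orders before comparison. Assumes no duplicated genes.
    """
    name = lambda g: g.lstrip("-")
    a = [g for g in order_a if not exclude(name(g))]
    b = [g for g in order_b if not exclude(name(g))]
    shared = {name(g) for g in a} & {name(g) for g in b}
    if not shared:
        raise ValueError("the two gene orders share no genes")
    a = [g for g in a if name(g) in shared]
    b = [g for g in b if name(g) in shared]
    adj_b = _adjacencies(b)
    n = len(a)
    breakpoints = sum(1 for i in range(n) if (a[i], a[(i + 1) % n]) not in adj_b)
    return breakpoints / n


# Toy gene orders differing only by the position of one tRNA gene.
genome_1 = ["cox1", "trnL", "cox2", "trnK", "atp8", "atp6"]
genome_2 = ["cox1", "cox2", "trnK", "trnL", "atp8", "atp6"]

bd_all = normalized_breakpoint_distance(genome_1, genome_2)
bd_no_trna = normalized_breakpoint_distance(
    genome_1, genome_2, exclude=lambda name: name.startswith("trn")
)
```

In this toy case `bd_all` is 0.5 and `bd_no_trna` is 0.0, the pattern expected when rearrangements involve only tRNA genes. A distance that stays high after tRNA exclusion would instead correspond to the situation described for the tunicate and enoplean species.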

Congeneric comparisons have often revealed more subtle genomic differences that are not easily identifiable in distant species: some studies have allowed the detailed reconstruction of evolutionary processes such as the pattern of gene duplication and pseudogene formation (in Plethodontidae salamanders and Bipes reptiles; Macey et al., 2004; Mueller and Boore, 2005), and the origin of new non-coding or regulatory regions (that is, in tunicates and Ixodes ticks; Shao et al., 2005a; Iannelli et al., 2007a, 2007b). Thus, congeneric comparisons are very promising for the study of non-coding regions, the identification of regulatory sequences and the investigation of rearrangement mechanisms. Their advantages for evolutionary genomic studies, as well as for the development of diagnostic tests for species identification and monitoring, make the analysis of mtDNA in congeneric species a valuable approach, especially for phyla such as Ctenophora, Tardigrada, Kinorhyncha, Gastrotricha and Gnathostomulida, for which complete mtDNA sequences are still absent.