Copyright © 2005 Elsevier Inc. All rights reserved.
Murine segmental duplications are hot spots for chromosome and gene evolution
Lluís Armengol, Tomàs Marquès-Bonet, Joseph Cheung, Razi Khaja, Juan R. González, Stephen W. Scherer, Arcadi Navarro and Xavier Estivill
Received 12 June 2005;
Abstract
Mouse and rat genomic sequences permit us to obtain a global view of evolutionary rearrangements that have occurred between the two species and to define hallmarks that might underlie these events. We present a comparative study of the sequence assemblies of mouse and rat genomes and report an enrichment of rodent-specific segmental duplications in regions where synteny is not preserved. We show that segmental duplications present higher rates of molecular evolution and that genes in rearranged regions have evolved faster than those located elsewhere. Previous studies have shown that synteny breakpoints between the mouse and the human genomes are enriched in human segmental duplications, suggesting a causative connection between such structures and evolutionary rearrangements. Our work provides further evidence to support the role of segmental duplications in chromosomal rearrangements in the evolution of the architecture of mammalian chromosomes and in the speciation processes that separate the mouse and the rat.
Keywords: Chromosomal evolution; Mouse genome; Rat genome; Segmental duplications; Synteny breakpoints; Molecular evolution; Evolutionary breakpoints
Article Outline
- Results
- Identification of synteny blocks
- Segmental duplications correlate with regions of BOS
- Repeat and GC composition of synteny breakpoint regions
- Gaps in synteny breakpoints
- Genes at regions of BOS
- Genomic distribution of rates of molecular evolution
- Discussion
- Materials and methods
- Identification of synteny blocks
- Segmental duplications in breakpoint regions
- Gaps in the assemblies
- Simulation studies
- Genes, ontology, and repeat content
- Evolutionary rates
- Acknowledgements
- Appendix A. Supplementary data
- References
Although the random-breakage model [1] and [2] has largely been accepted as the paradigm for chromosomal evolution, data from the study of newly available genomic sequences and the possibility of performing multispecies comparisons of genomes question this theory. For instance, clustering of evolutionary breakpoints that result in a large number of small syntenic blocks in certain genomic regions is a major argument in favor of the fragile-breakage model [3], [4], [5], [6] and [7]. The fragile-breakage theory states that evolutionary breakpoints would not be randomly distributed throughout the genomes but would accumulate into relatively short fragile regions [3]. Nevertheless, some authors have proven that the available sequence data do not support a model in which only a discrete collection of hot spots is responsible for the rearrangement breakpoints [8]. The fragile-breakage theory is also supported by the observed recurrence of human chromosomal rearrangements that are the cause of several disorders [9] and [10] and the existence of fragile sites in the genomes [11], [12] and [13]. So far, the nature and composition of such fragile sites in mammals, as well as the relationship between evolutionary and disorder-causing breakpoints, remain unclear although several studies have attempted to identify sequence elements involved in such rearrangements [12], [14], [15] and [16].
Previous studies have shown that regions where evolutionary chromosomal rearrangements have occurred (also called breaks of synteny and abbreviated BOS) between mouse and human are significantly enriched in primate-specific segmental duplications (SDs) [17] and [18]. Although SDs might not necessarily be the cause of such evolutionary rearrangements, it is tempting to speculate about a putative role for these low-copy repeat sequences in the evolution and plasticity of genomes, in much the same manner in which they trigger rearrangements in genomic disorders [10], [19] and [20]. Indeed, data from studies in Drosophila show that repetitive elements have generated rearrangements separating different species [21]. In addition to the presence of low-copy repeats, other repeat sequences have been found to be present in regions surrounding evolutionary breakpoints [14], [22] and [23]. An unusual composition of repeats in regions where evolutionary rearrangements have taken place might provide clues to a better understanding of the molecular mechanisms by which these events occur as well as point to putatively responsible sequences.
Chromosomal rearrangements are also thought to have a role in speciation, acting as genetic barriers to gene flow and thus increasing the time of divergence of genes linked to them. Previous studies have reported an association between rates of chromosomal rearrangement and genic evolution [24] and [25]. The issue, however, is far from settled since contradictory evidence has also been reported [26] and [27] and therefore, alternative hypotheses must be examined [28]. For example, genes within segmental duplications present higher rates of sequence and gene-expression divergence than single-copy genes [29] and [30] which, given their association with rearrangements [17] and [18], might help to explain the association between chromosomal rearrangements and higher evolutionary rates.
Current drafts of genomic sequences from mouse [31] and rat [32] are an invaluable resource for a detailed sequence-level study seeking to unravel the features involved in evolutionary chromosomal breakpoints between these two closely related species. We present here a comparative study of the sequences of these two organisms in which we identify synteny blocks caused by large-scale rearrangements, study the nature and composition of regions where synteny is not preserved, and analyze the genomic distribution of evolutionary rates.
Results
Identification of synteny blocks
For the identification of synteny blocks between the mouse and the rat genomes we used the publicly available alignments between mouse and rat genomic sequences obtained from UCSC Genome Bioinformatics Group (http://www.genome.ucsc.edu).
We started from a set of over 1.2 million genomic sequence anchors that were connected to give a total of 4117 synteny segments of length >25 kb. These segments were further grouped into 102 synteny blocks with a length of over 250 kb shared by the two species (see Materials and methods) and with an average size of 23.9 Mb in the mouse and 25.6 Mb in the rat genome (see Supplementary Table 1).
The random-breakage model of chromosome evolution [1] and [33] predicts that the length of syntenic segments approximates an exponential distribution with density function $f(x) = \frac{1}{L} e^{-x/L}$, where L is the average length of all syntenic segments. In concordance with previous synteny analyses using older assemblies of the mouse and rat genomes [4] and [5], the lengths of the synteny segments we obtained from our study were not in agreement with the distribution predicted by the random-breakage model, even when we centered the study on large synteny segments (Fig. 1). We observed an enrichment of small segments (<5 Mb, $p = 6.57 \times 10^{-6}$), which would support the fragile-breakage model [3], and an increased frequency of some long fragments (Fig. 1).
Fig. 1. Distribution of mouse–rat synteny block lengths. Frequency histogram of the lengths of the 102 synteny blocks observed in our analysis, overlaid with the distribution of fragment lengths expected under random breakage. The observed data fit poorly the curve predicted by the density function describing the random-breakage model of chromosomal evolution, mainly because of an enrichment of small fragments (<5 Mb), together with an enrichment of some larger segments.
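The expectation under the random-breakage model can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the exponential density with mean L described above and plugs in the 102 blocks and the mean mouse block length of 23.9 Mb reported in this section as L and N. The function name is hypothetical, and this is not the authors' fitting procedure.

```python
import math


def expected_fraction_below(x_mb: float, mean_len_mb: float) -> float:
    """Exponential CDF: P(length < x) = 1 - exp(-x / L)."""
    return 1.0 - math.exp(-x_mb / mean_len_mb)


# Illustrative inputs taken from the text (assumed to be the relevant N and L).
N_BLOCKS = 102
MEAN_LEN_MB = 23.9

frac_small = expected_fraction_below(5.0, MEAN_LEN_MB)
expected_small_blocks = N_BLOCKS * frac_small

print(f"expected fraction of blocks < 5 Mb: {frac_small:.3f}")
print(f"expected number of blocks < 5 Mb: {expected_small_blocks:.1f}")
```

Under these assumptions, an observed count of blocks shorter than 5 Mb that clearly exceeds the printed expectation would correspond to the enrichment of small segments described above.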
Following $N_b = N_{sb} - N_c$ (where $N_b$ is the number of breakpoints, $N_{sb}$ is the number of synteny blocks, and $N_c$ is the number of chromosomes) [34], 82 evolutionary breakpoints were identified in the mouse genome and 81 in the rat genome. Two synteny blocks in the second genome flank each breakpoint in the first, so we distinguish between multichromosomal (when the synteny blocks in the second genome correspond to different chromosomes) and unichromosomal breakpoints. In both genomes, unichromosomal breakpoints occur more than twice as often as multichromosomal breakpoints (data not shown). The lengths of synteny breakpoints range from hundreds of base pairs to millions of base pairs and were found to span around 4–5% of each genome (see Supplementary Table 1).
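As a quick check of the breakpoint counts, the relation above can be evaluated directly. This minimal sketch assumes that the chromosome counts are 20 for the mouse (19 autosomes plus X) and 21 for the rat (20 autosomes plus X), with the Y chromosome not considered; these counts are an assumption here, chosen because they reproduce the numbers reported in the text.

```python
def count_breakpoints(n_synteny_blocks: int, n_chromosomes: int) -> int:
    """Breakpoints = synteny blocks - chromosomes."""
    return n_synteny_blocks - n_chromosomes


N_SYNTENY_BLOCKS = 102  # reported in the text
MOUSE_CHROMOSOMES = 20  # assumption: 19 autosomes + X
RAT_CHROMOSOMES = 21    # assumption: 20 autosomes + X

mouse_breakpoints = count_breakpoints(N_SYNTENY_BLOCKS, MOUSE_CHROMOSOMES)
rat_breakpoints = count_breakpoints(N_SYNTENY_BLOCKS, RAT_CHROMOSOMES)
print(mouse_breakpoints, rat_breakpoints)
```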
Segmental duplications correlate with regions of BOS
We previously identified all large and recent rodent-specific SDs (>90% sequence identity, >5 kb in length) corresponding to mm5 mouse and rn3 rat UCSC assemblies as described in [35]. Data are publicly available at http://www.projects.tcag.ca/xenodup. To obtain a visual overview of the synteny segments, the BOS, and the regions containing SDs, we drew dot plots of the shared synteny blocks between the two genomes and superimposed coordinates of SDs of each genome. We observed that duplicons were present in a large number of the regions where the synteny was lost between the two species (Fig. 2). Using coordinates of both SDs and synteny blocks, we performed a more detailed analysis.
Fig. 2. Segmental duplications correlate with mouse/rat breaks of synteny. Dot-plot representations of alignments between mouse chromosome 14 and the rat genome. Direct and reverse alignments appear as red and blue lines, respectively. On the x axis, information on the corresponding rat chromosomes is depicted according to the color code in the legend. Positions of SDs in the mouse genome are represented as bluish rectangles in the x axis. In this image, the correlation between synteny breaks and SDs in the mouse chromosome is observed. A complete set of dot plots can be obtained on demand. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
By simply counting, we found an average of 13 SDs per megabase in syntenic regions of the mouse and rat genomes and, in contrast, we counted 27 SDs on average per megabase in regions occupied by synteny breakpoints (Table 1). Due to the known clustering of SDs in relatively short chromosomal regions in the two genomes [35] and [36] and to avoid bias produced by this fact, we decided to simplify our approach and verify the presence or absence of SDs in these regions. We identified SDs in 49 (60%) of 82 breakpoints in conserved synteny in the mouse genome and in 35 (43%) of 81 in the rat genome (Table 2). No SDs were present in breakpoints from mouse chromosomes 12, 15, and 19 or rat chromosome 6.
Segmental duplications in block and in BOS regions in mouse and rat
Synteny break regions correspond to genomic regions where the synteny criteria are not met.
a Megabases.
b Segmental duplications.
c Mus musculus.
d Rattus norvegicus.
Segmental duplications in breaks of synteny (BOS) and breakpoints in mouse and rat
| Chromosome | BOS with SD/total (a) | p value (b) | BP with SD/total (c) | p value (b) |
|---|---|---|---|---|
| Mus musculus | | | | |
| 1 | 5/5 | 0.005 | 4/10 | 0.003 |
| 2 | 2/2 | 0.003 | 3/4 | 0.006 |
| 3 | 1/3 | 0.023 | 2/6 | 0.015 |
| 5 | 7/8 | 0.003 | 6/16 | 0.001 |
| 8 | 4/4 | 0.018 | 4/8 | 0.015 |
| 10 | 7/15 | <0.001 | 9/30 | 0.009 |
| 11 | 1/1 | 0.021 | 1/2 | 0.007 |
| 12 | 0/7 | 0.999 | 0/14 | 0.999 |
| 13 | 2/5 | 0.010 | 3/10 | 0.030 |
| 14 | 2/2 | 0.007 | 3/4 | 0.001 |
| 15 | 0/1 | 0.999 | 2/2 | 0.022 |
| 16 | 2/3 | <0.001 | 1/6 | 0.100 |
| 17 | 10/13 | <0.001 | 11/26 | 0.018 |
| 18 | 2/4 | 0.003 | 2/8 | 0.006 |
| 19 | 0/2 | 0.999 | 1/4 | 0.049 |
| X | 4/7 | 0.240 | 3/14 | 0.378 |
| Total | 49/82 | | 55/164 | |
| Rattus norvegicus | | | | |
| 1 | 4/11 | 0.036 | 9/22 | 0.047 |
| 2 | 3/7 | 0.003 | 7/14 | 0.023 |
| 4 | 2/3 | 0.010 | 3/6 | 0.013 |
| 5 | 1/1 | 0.022 | 1/2 | 0.013 |
| 6 | 0/10 | 0.999 | 8/20 | 0.003 |
| 7 | 2/5 | 0.028 | 2/10 | 0.034 |
| 9 | 2/5 | 0.033 | 4/10 | 0.023 |
| 10 | 2/3 | 0.023 | 2/6 | 0.013 |
| 11 | 1/2 | 0.007 | 2/4 | 0.005 |
| 12 | 3/3 | 0.015 | 3/6 | 0.021 |
| 13 | 1/2 | 0.030 | 1/4 | 0.015 |
| 14 | 1/2 | 0.096 | 1/4 | 0.043 |
| 15 | 1/1 | 0.028 | 1/2 | 0.038 |
| 16 | 1/2 | 0.045 | 1/4 | 0.041 |
| 17 | 1/5 | 0.216 | 3/10 | 0.199 |
| 18 | 1/2 | 0.018 | 1/4 | 0.019 |
| 19 | 1/1 | 0.016 | 1/2 | 0.042 |
| 20 | 3/9 | 0.003 | 5/18 | 0.022 |
| X | 5/7 | <0.001 | 3/14 | 0.197 |
| Total | 35/81 | | 58/162 | |
Only chromosomes that have synteny breaks are shown.
a Number of BOS containing segmental duplications (SDs)/total number of BOS regions.
b Permutation p value.
c Number of breakpoint regions containing segmental duplications/total number of breakpoints.
To measure the significance of this association and exclude the possibility that our results were incidental, we performed a computer simulation in which the positions of the synteny breakpoints were randomly assigned while their sizes and the positions of the SDs were kept constant. We then evaluated the presence of SDs in the randomly located breaks of synteny. Comparing these results with the observed data, we found that the presence of SDs in the BOS regions is significantly higher than expected under a random distribution of evolutionary breakpoints for those mouse chromosomes in which SDs were found, except for chromosome X, and for all rat chromosomes, except chromosomes 14 and 17 (Table 2). We therefore conclude that the association of SDs with the synteny breakpoints is not due to chance.
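The simulation described above can be sketched as a simple permutation test. This is a minimal illustration under stated assumptions, not the authors' implementation: coordinates are toy values for a single hypothetical chromosome, randomly placed regions are allowed to overlap one another, and all function names are invented for the example.

```python
import random


def overlaps_any(start, end, intervals):
    """True if the half-open region [start, end) overlaps any interval."""
    return any(s < end and start < e for s, e in intervals)


def count_with_sd(regions, sds):
    """Number of regions that contain or overlap at least one SD."""
    return sum(overlaps_any(s, e, sds) for s, e in regions)


def permutation_p_value(chrom_len, bos_regions, sds, n_perm=1000, seed=0):
    """Relocate BOS regions at random, keeping sizes and SD positions fixed."""
    rng = random.Random(seed)
    observed = count_with_sd(bos_regions, sds)
    sizes = [e - s for s, e in bos_regions]
    at_least_as_many = 0
    for _ in range(n_perm):
        shuffled = []
        for size in sizes:
            start = rng.randint(0, chrom_len - size)
            shuffled.append((start, start + size))
        if count_with_sd(shuffled, sds) >= observed:
            at_least_as_many += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return observed, (at_least_as_many + 1) / (n_perm + 1)


# Toy data (hypothetical coordinates in bp on a 1 Mb chromosome).
CHROM_LEN = 1_000_000
SDS = [(100_000, 110_000), (500_000, 505_000)]
BOS = [(95_000, 115_000), (498_000, 506_000), (800_000, 810_000)]

observed, p_value = permutation_p_value(CHROM_LEN, SDS and BOS, SDS)
print(f"BOS regions with SDs: {observed}/{len(BOS)}, permutation p = {p_value:.4f}")
```

Keeping region sizes fixed matters because larger regions are more likely to overlap an SD by chance; only the positions are randomized.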
Two breakpoints flank each synteny block, except those that contain the telomeres. To refine our study, we looked for SDs in the 50 kb flanking these breakpoints. We found SDs in 55 of 164 regions explored in the mouse genome and in 58 of 162 in the rat genome, which corresponds to approximately 35% of regions flanking breakpoints containing SDs in both genomes (Table 2). The number of breakpoint-flanking regions containing SDs was found to be significantly higher than expected for all mouse chromosomes except 12, 16, and X, compared to a random distribution of synteny breakpoints. The same observation was made for rat chromosomes, with the exception of chromosomes 17 and X. Interestingly, mouse chromosomes 15 and 19, and rat chromosome 6, which did not contain SDs within the synteny breakpoint regions (see above and Table 2), were found to contain more SDs than expected in the breakpoint-flanking regions.
Repeat and GC composition of synteny breakpoint regions
We analyzed the GC and repeat content in breakpoint regions to verify whether there are sequence features that could facilitate rearrangements in the rodent lineages.
As a first approach, we generated graphics containing chromosomal representations of synteny blocks together with density plots of GC content and repeats, looking for a consistent pattern that could correlate these elements (data not shown). By visual inspection, we did not observe any consistent pattern of increased repeat or GC content in the breakpoint regions or within the 50 kb surrounding them (see above). Total repeat content in breakpoint regions ranged between 37 and 71% in the mouse and between 31 and 59% in the rat. Since the amount of different types of repeats varies among chromosomes, instead of comparing with the genome average we compared the observed amounts with those expected under a random distribution of synteny breakpoints (Supplementary Table 2). A few breakpoint regions showed higher overall repeat content than in the simulations, attributable to different types of repeats in different chromosomes. Nevertheless, this increase in repeat content was not observed in the majority of chromosomes, nor was it specific to one type of repeat. Furthermore, we could not identify any pattern shared by a majority of the break of synteny regions. Finally, no abnormal GC composition was observed for the breakpoint regions in any chromosome (Supplementary Table 2).
Gaps in synteny breakpoints
The generation of the rat and, especially, the mouse genome sequences relied heavily on shotgun sequencing. This methodology is known to be prone to misassemblies caused by the presence of repeat sequences [36] and [37]. SDs are also known to lead to misassemblies and gaps in the sequences [18], [38], [39] and [40], and the inability to map them unambiguously to an orthologous position might also lead to synteny gaps. To rule out the possibility that sequence gaps were confounding our analyses, we used restrictive synteny criteria (see Materials and methods) and tested whether gaps were present in synteny breakpoints more often than expected if evolutionary breakpoints were randomly distributed along the chromosomes. Owing to the large number of gaps present in both genomes, most synteny breakpoints were found to contain gaps. For all chromosomes except the mouse X chromosome, the presence of sequence gaps was not significantly higher than expected under a random distribution of breakpoints (Supplementary Table 3).
Genes at regions of BOS
In the mouse, 16,725 genes were found to be located in syntenic fragments (2442.47 Mb in size) and 654 in nonsyntenic regions (66.60 Mb). This means 6.8 genes/Mb in syntenic regions and 8.6 genes/Mb in BOS regions. For the rat genome, we found that 6573 genes were in syntenic regions (2613.30 Mb in size), while 203 were located in breakpoint regions (86.57 Mb). Overall, 1.59 rat genes/Mb were found in syntenic regions and 1.87 genes/Mb in nonsyntenic regions. Although the RefSeq gene sets are still incomplete (especially for the rat) and may not reflect the total number of genes, with the available data we conclude that syntenic and nonsyntenic regions have similar gene densities.
We used the Gene Ontology Tree Machine (http://www.genereg.ornl.gov/gotm/) to compare the functional profile of genes located in break of synteny regions with the genome average. We found enrichment of genes corresponding to different GO categories, including genes related to pheromone biology and sensory organ development (Supplementary Table 4). Interestingly, these types of genes are known to be implicated in biological adaptation and speciation processes.
Genomic distribution of rates of molecular evolution
We first examined the possibility of different evolutionary rates in different chromosomes and found that they are clearly heterogeneous (Kruskal–Wallis, p < 0.001). The potential causes of these differences are multiple. First, as previously shown in other species [41] and [42], the X chromosome presents lower divergence than the average for autosomes (Table 3, dS = 0.1581 vs 0.1981, permutation test, p < 0.001). We, therefore, removed sex chromosomes from subsequent analyses. A second potential cause of chromosomal heterogeneity is linked to telomeres, which have also been shown to be associated with factors affecting evolutionary rates such as either higher or lower recombination rates or higher GC content [25]. In the current dataset, genes located in telomeres (within 3 Mb of any end of the chromosome) showed lower synonymous divergence than genes elsewhere in the genome and higher GC content (Table 3, dS = 0.1841 vs 0.1991, p < 0.001; GC 45.94 vs 46.40, p < 0.05). Therefore, these genes were excluded from further analysis, producing a dataset of 12,139 genes with average evolutionary rates of dN = 0.0331, dS = 0.1991, and dN/dS = 0.1690.
Evolutionary rates of genes in relation to SDs and evolutionary rearrangements
| | N (a) | dN mean | dN SE | dS mean | dS SE | dN/dS mean | dN/dS SE |
|---|---|---|---|---|---|---|---|
| Genes within SDs (b) | 322 | 0.0578 | 0.0032 | 0.2120 | 0.0036 | 0.2622 | 0.0138 |
| Genes not located in SDs | 11,817 | 0.0324 | 0.0003 | 0.1988 | 0.0006 | 0.1665 | 0.0017 |
| p value (c) | | <0.001 | | <0.001 | | <0.001 | |
| Genes in no-blocks | 256 | 0.0295 | 0.0020 | 0.2046 | 0.0042 | 0.1444 | 0.0009 |
| Genes in synteny blocks | 11,364 | 0.0324 | 0.0003 | 0.1986 | 0.0006 | 0.167 | 0.0017 |
| p value (c) | | 0.174 | | 0.134 | | 0.05 | |
| Inside inversions | 2,138 | 0.0343 | 0.0008 | 0.2122 | 0.0014 | 0.1669 | 0.0040 |
| Outside inversions | 9,226 | 0.0318 | 0.0004 | 0.1956 | 0.0007 | 0.1658 | 0.0019 |
| p value (c) | | 0.002 | | <0.001 | | 0.795 | |
| <2.5 Mb from translocation breakpoint | 506 | 0.0329 | 0.0014 | 0.2054 | 0.0030 | 0.1602 | 0.0064 |
| >2.5 Mb from any breakpoint | 10,203 | 0.0325 | 0.0003 | 0.1977 | 0.0006 | 0.1682 | 0.0018 |
| p value (c) | | 0.805 | | 0.011 | | 0.337 | |
| <2.5 Mb from inversion breakpoint | 546 | 0.0310 | 0.0012 | 0.2046 | 0.0025 | 0.1551 | 0.0062 |
| >2.5 Mb from any breakpoint | 10,203 | 0.0325 | 0.0003 | 0.1977 | 0.0006 | 0.1682 | 0.0018 |
| p value (c) | | 0.316 | | 0.014 | | 0.100 | |
Averages of evolutionary rates for different categories of rearrangements are shown.
a N, number of genes.
b SDs, segmental duplications.
c Permutation p value comparing the averages for each category of genes.
To test whether the reported acceleration in rates of evolution in SDs in other species [29], [30], [43] and [44] can also be detected between rat and mouse, we compared evolutionary rates of genes involved in SDs with genes that are not in SDs, regardless of their chromosomal position. Genes in SDs present significantly higher synonymous and nonsynonymous rates of substitution than single-copy genes. Interestingly, they also present higher rates of protein evolution, as indicated by their significantly higher dN/dS ratio (Table 3).
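The permutation p values in Table 3 compare the average rates of two categories of genes. A minimal sketch of such a test is given below, assuming a two-sided comparison of means with random relabeling of genes; the function name and the toy dS values are hypothetical and do not come from the study.

```python
import random
from statistics import mean


def permutation_test_means(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign genes to the two categories, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Toy dS values (hypothetical): genes in SDs vs single-copy genes.
ds_in_sds = [0.21, 0.22, 0.23, 0.24, 0.25, 0.26]
ds_single_copy = [0.15, 0.16, 0.17, 0.18, 0.19, 0.20]

p_value = permutation_test_means(ds_in_sds, ds_single_copy)
print(f"permutation p = {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the rate distributions, which is convenient for quantities such as dN/dS that are typically skewed.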
To test for the effects of rearrangements on rates of evolution we excluded all genes involved in SDs and compared all genes in regions of break of synteny to all genes in syntenic regions. Genes located in no-block regions (regions where synteny between mouse and rat cannot be reconstructed) were not found to evolve faster than genes in syntenic regions (Table 3). In fact, the dN/dS ratio is marginally significant but in the opposite direction. We decided to remove these genes, for which synteny could not be defined, from further analysis, producing a final dataset of 11,364 genes. With this curated dataset, we compared genes located inside inversions with those located outside inversions. We found that genes within inversions present significantly higher synonymous and nonsynonymous rates of substitution (Table 3). In addition to the regions within or outside inversions, it is also possible to study genes in regions surrounding any BOS corresponding to inversions and translocations. We compared genes within 2.5 Mb of the breakpoints of such rearrangements with genes located in colinear regions (zones beyond 2.5 Mb from any breakpoint) and found that genes in these regions present a statistically significant increase in dS (Table 3).
Discussion
Two decades ago, Nadeau and Taylor proposed the random-breakage model of chromosomal evolution based on statistical arguments and the synteny data between human and mouse available at the time [1]. With the availability of genome sequence data for several mammalian genomes, more detailed analyses can now be performed to examine chromosome evolution and dynamics at the DNA sequence level. Given the resolution of our study, the inability to fit the lengths of the observed synteny segments to those expected under the random-breakage model suggests that this theory may not be the most appropriate to describe the occurrence of the evolutionary breakpoints. This observation is in agreement with previous reports on synteny using sequences from different organisms and older assembly versions of the mouse genome [4] and [5]. We observed an enrichment of small syntenic segments (<5 Mb, $p = 6.57 \times 10^{-6}$) and some long syntenic segments. Bearing in mind the observation of a significant enrichment of SDs in regions that coincide with synteny breakpoints, one could speculate about a connection between small synteny regions and the clustering of SDs in several regions of these two genomes [36] and [38]. The short synteny segments identified in our study could be attributable to the clustering of breakpoints in relatively short fragile regions, as proposed by the fragile-breakage model, while the long ones are likely to be attributable to the short time of divergence since the mouse/rat common ancestor. Clustering of SDs in discrete genomic regions would lead to a number of synteny blocks undetectable at the resolution of this study. The higher number of intrachromosomal SDs in both genomes could also be an explanation for a higher occurrence of evolutionary inversions, which resulted in the twofold higher number of intrachromosomal evolutionary breakpoints observed.
Since we focused on synteny segments longer than 250 kb, our study did not have the potential to detect all synteny breakpoints between the mouse and the rat. Using the current mouse and rat genome assemblies, there are several factors that could interfere with the identification of the exact positions of synteny block boundaries: (i) the existence of unfinished regions (sequence gaps) in both genome sequences, (ii) the presence of SDs creating gaps and confounding the correct genome assembly [18] and [36], and (iii) the presence of large clusters of masked repeats. To rule out the possibility that local assembly errors interfere with our analysis, we used relatively conservative criteria to define synteny segments (see Materials and methods). The possibility that misassemblies and gaps were confounding our results (due to the presence of SDs in breakpoint regions) was excluded, since these regions do not differ significantly in the presence of sequence gaps from random regions chosen from both genomes.
Different types of repeat sequences are thought to play a role in chromosomal rearrangements in mammalian genomes [14], [22], [45] and [46], as well as in other eukaryotic organisms [47], [48] and [49]. In our study, some break of synteny regions were found to be significantly different from the rest of the genome regarding repeat content although no differences were found regarding GC content or gene density.
Comparisons between human and mouse revealed that primate-specific SDs are significantly enriched in regions where evolutionary chromosomal breakpoints occur [17] and [18]. The presence of SDs has also been shown, by different methods, in BOS between human and other great apes [7], [50],









