Article

Abundance and Diversification of Repetitive Elements in Decapoda Genomes

by Christelle Rutz 1, Lena Bonassin 1,2,3, Arnaud Kress 1, Caterina Francesconi 2,3, Ljudevit Luka Boštjančić 1,2,3, Dorine Merlat 1, Kathrin Theissinger 2,† and Odile Lecompte 1,*,†

1 Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Rue Eugène Boeckel 1, 67000 Strasbourg, France
2 LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberg Biodiversity and Climate Research Centre, Georg-Voigt-Str. 14-16, 60325 Frankfurt am Main, Germany
3 Department of Molecular Ecology, Institute for Environmental Sciences, Rhineland-Palatinate Technical University Kaiserslautern Landau, Fortstr. 7, 76829 Landau, Germany
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Genes 2023, 14(8), 1627; https://doi.org/10.3390/genes14081627
Submission received: 7 July 2023 / Revised: 5 August 2023 / Accepted: 12 August 2023 / Published: 15 August 2023
(This article belongs to the Section Animal Genetics and Genomics)

Abstract

Repetitive elements are a major component of genomes due to their ability to propagate within them. Characterization of Metazoan repetitive profiles is improving; however, current pipelines fail to identify a significant proportion of divergent repeats in non-model organisms. The Decapoda order, for which repeat content analyses are largely lacking, is characterized by extremely variable genome sizes that suggest a substantial contribution of repetitive elements. Here, we developed a new standardized pipeline to annotate repetitive elements in non-model organisms, which we applied to twenty Decapoda and six other Crustacea genomes. Using this new tool, we identified 10% more repetitive elements than standard pipelines. Repetitive elements were more abundant in Decapoda species than in other Crustacea, with a very large number of highly repeated satellite DNA families. Moreover, we demonstrated a strong correlation between assembly size and transposable element load, as well as different repeat dynamics in Dendrobranchiata and Reptantia. The patterns of repetitive elements largely reflect the phylogenetic relationships of Decapoda and the distinct evolutionary trajectories within Crustacea. In summary, our results highlight the impact of repetitive elements on genome evolution in Decapoda and the value of our novel annotation pipeline, which will provide a baseline for future comparative analyses.

1. Introduction

With over 15,000 living species, Decapoda represents a diverse order of Crustacea that includes lobsters, crayfish, crabs, prawns, and shrimps [1]. They are a crucial component of marine and freshwater ecosystems [2,3]. The Decapoda order originated around 455 million years ago, in the Late Ordovician, and is divided into two suborders: the Dendrobranchiata (commonly known as prawns) and the Pleocyemata. The latter encompasses Caridea (swimming shrimps) and a crawling/walking group called Reptantia that consists of Achelata (spiny lobsters), Astacidea (true lobsters and crayfish), Anomura (hermit crabs), and Brachyura (short-tailed crabs) [4].
Decapoda are characterized by highly variable genome sizes. According to the Animal Genome Size Database (https://www.genomesize.com, accessed on 17 May 2022), genome size estimates in the Dendrobranchiata suborder range from 2.3 Gb for Penaeus duorarum to 5.1 Gb for Aristaeomorpha foliacea. In Pleocyemata, particularly in the Caridea infraorder, genome size variation is even more striking, with estimates ranging from 3.2 Gb for Antecaridina sp. to 40 Gb for Sclerocrangon ferox. Freshwater crayfish (Astacidea infraorder) also display substantial genome size variation, ranging from 2 to 6 Gb in the Cambaridae and Parastacidae families. Recent genome size estimates for the noble crayfish Astacus astacus and the narrow-clawed crayfish Pontastacus leptodactylus, both representatives of the Astacidae family, reach 17 Gb (K. Theissinger, unpublished results) and 18.7 Gb [5], respectively. Decapoda also display high variation in chromosome number. Most Dendrobranchiata have a diploid number of 2n = 88 (reviewed in [6,7]), whereas this number can reach 2n = 376 in Pleocyemata, as in the Astacidea Pacifastacus leniusculus [8,9].
Variations in genome sizes are usually attributed to the presence of repetitive elements (REs), which can represent the major part of the genome in some eukaryotic species [10]. A high proportion of REs can greatly complicate genome sequencing and can lead to fragmented and incomplete assemblies [11,12,13]. This may explain the notorious difficulties encountered in the sequencing of large Decapoda genomes, with only eight assemblies available at the chromosome level. To date, the relationship between the genome size and repeat content, and the impact of REs on genome evolution, remain poorly studied in Crustacea.
REs can play diverse roles (reviewed in [14]). They can affect transcription and gene regulation at the transcriptional and post-transcriptional levels. Through their ability to act as signals to locate and process information stored in coding sequences, they can influence damage repair, DNA restructuring, chromatin and nuclear organization, and cell division. REs can be classified into two types: tandem repeats (satellite DNA, satDNA) and transposable elements (TEs), also known as interspersed repeats [15].
SatDNAs consist of tandemly repeated nucleotide motifs, called repeat units (monomers) [16]. A genome usually contains several satDNA families, of which only one or a few predominate [17,18,19,20]. SatDNAs can have specific roles in gene and genome regulation, such as chromosome organization, pairing, and segregation, formation of the centromere locus [21,22], epigenetic regulation of heterochromatin establishment, and modulation of gene expression in response to stress [23,24]. In Crustacea, some satDNA transcripts can have an impact on the inter-molt stage [25]. Despite their importance, the distribution patterns, percentage, and copy number of satDNAs have not yet been fully explored in Crustacea.
Transposable elements (TEs) are mobile elements known to participate in DNA replication and cause gene rearrangements that can confer new functional properties [26,27,28,29]. Recombination between homologous copies of related TEs at distant genomic positions can cause deletions, duplications, and inversions. When inserted into genes or coding regions, TEs can alter gene expression and may have deleterious effects, such as diseases, or neutral effects on the host [28,30,31,32]. Organisms living in challenging environmental conditions can have more TEs in their genome, which increases genome plasticity in response to stress factors [33]. TEs are divided into two classes based on their replication mechanism: Class I elements transpose via an RNA intermediate (retrotransposons), whereas Class II elements transpose via DNA (DNA transposons) [34,35,36,37]. In Class I, LTR retrotransposons and Penelope-like elements are characterized by long terminal repeats (LTRs). DIRS elements are bounded by direct or inverted repeats. Finally, LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements) are retrotransposons that lack terminal repeats but carry a polyA tail at the 3’ end. Unlike LINEs, SINEs evolved from non-coding RNA genes and are non-autonomous. Class II can be divided into two subclasses: subclass 1 includes TIR and Crypton elements, while subclass 2 includes Helitrons and Mavericks. Apart from SINEs, most TEs encode the proteins required for their autonomous transposition. However, accumulated mutations can produce incomplete copies of TEs that no longer encode transposition enzymes. Identifying these truncated copies is a particular challenge for automated annotation pipelines.
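The two-class hierarchy described above maps naturally onto a lookup table. The sketch below is illustrative only: the keys reuse the order names from this paragraph, not the exact labels emitted by RepeatMasker or RepBase (which differ, e.g. TIR superfamilies appear under "DNA/..."), so the table and the `classify` helper are hypothetical and would need adapting to a real annotation source.

```python
from typing import Dict, Tuple

# Hypothetical lookup mirroring the classification summarized in the text.
# Keys are the order names used in the prose, NOT exact RepeatMasker/RepBase labels.
TE_ORDERS: Dict[str, Tuple[str, str]] = {
    "LTR": ("Class I", "retrotransposon"),
    "Penelope": ("Class I", "retrotransposon"),
    "DIRS": ("Class I", "retrotransposon"),
    "LINE": ("Class I", "retrotransposon"),
    "SINE": ("Class I", "retrotransposon"),
    "TIR": ("Class II", "subclass 1"),
    "Crypton": ("Class II", "subclass 1"),
    "Helitron": ("Class II", "subclass 2"),
    "Maverick": ("Class II", "subclass 2"),
}


def classify(annotation: str) -> Tuple[str, str]:
    """Map an 'Order/Superfamily' label (e.g. 'LINE/RTE') to (class, group).

    Labels whose order is missing from the table are reported as unclassified.
    """
    order = annotation.split("/", 1)[0]
    return TE_ORDERS.get(order, ("Unclassified", "unknown"))


print(classify("LINE/RTE"))   # ('Class I', 'retrotransposon')
print(classify("Helitron"))   # ('Class II', 'subclass 2')
print(classify("Unknown"))    # ('Unclassified', 'unknown')
```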
Several pipelines are currently available for RE annotation. The most commonly used tools are RepeatModeler2 [38] and RepeatMasker [39], but a wide variety of additional tools have been developed, such as RECON [40], RepeatScout [41], LTRharvest/LTR_retriever [42], REPET [43], and RepeatExplorer [44] (based on paired-end reads). The availability of multiple tools highlights the lack of a standardized protocol, which makes it impossible to directly compare RE composition between genomes based solely on the literature. Moreover, current RE annotation pipelines fail to identify a significant proportion of divergent repeats in non-model organisms. To address these limitations, we designed a standardized protocol for RE annotation that encompasses both TEs and satDNAs. We used this pipeline to establish the RE landscape of twenty Decapoda and six other Crustacea, enabling an objective comparison of Decapoda repeatomes in terms of abundance, composition, and evolutionary dynamics. Our standardized approach allowed us to assess the contribution of REs to the evolution of the enigmatic Decapoda genomes. Furthermore, we explored the possibility of using REs as reliable phylogenetic markers for Decapoda. Lastly, this study provides a new library of REs from Decapoda genomes that extends the existing databases and can be used in future analyses.

2. Materials and Methods

2.1. Genomic Datasets

Available assemblies of Decapoda species were downloaded from NCBI GenBank and RefSeq (last accessed 16 February 2022). Contig and scaffold N50 values are useful for estimating genome contiguity: the N50 is the length of the shortest contig or scaffold such that contigs or scaffolds of that length or longer together cover 50% of the assembly. However, Decapoda genomes present variable N50 values (Table S1). The BUSCO completeness score, which can be independent of genome contiguity, was therefore also determined for each genome to assess assembly completeness (Table 1) [45]. Only the 20 genomes with a BUSCO completeness score of at least 25% were selected. Given the low number and fragmented state of available Decapoda genomes, we chose a lower BUSCO threshold than is usually applied in order to retain at least one genome for every infraorder with available assemblies. To place the Decapoda RE landscape in a broader crustacean context, we added 6 non-Decapoda crustaceans (Table 1). This allowed us to test whether Decapoda species follow a different or similar trend compared with six other Crustacea in terms of the proportion of individual repeat families, the presence/absence of RE families, and their evolutionary trajectories.
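The N50 definition and the BUSCO-based selection above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the species names and scores in the example are hypothetical, and only the 25% threshold comes from the text.

```python
from typing import Dict, List, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length of the shortest contig/scaffold such that all sequences at
    least that long together cover >= 50% of the total assembly length."""
    if not lengths:
        raise ValueError("assembly has no sequences")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty input")


def select_by_busco(completeness: Dict[str, float], threshold: float = 25.0) -> List[str]:
    """Keep assemblies whose BUSCO completeness (%) is at least `threshold`."""
    return sorted(name for name, score in completeness.items() if score >= threshold)


print(n50([100, 80, 60, 40, 20]))  # 80, since 100 + 80 = 180 >= 150 (half of 300)
print(select_by_busco({"species_A": 91.2, "species_B": 18.5}))  # ['species_A']
```

Because N50 depends only on the length distribution, two assemblies can share an N50 while differing greatly in gene content, which is why a completeness measure such as BUSCO is used alongside it.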

2.2. Identification and Annotation of Repetitive Elements

2.2.1. Identification of Satellite DNA Families

For each species, a set of Illumina paired-end reads was randomly chosen from the SRA database (Table 1). Reads that mapped to the mitochondrial genome were discarded, and the remaining reads were subsampled so that their total length represented 1.6% of the estimated genome size. Genome size estimates were retrieved for all genomes except Chionoecetes opilio (Table 1). For this species, all short paired-end reads corresponding to the assembly were downloaded, and the genome size was estimated using KmerGenie version 1.7051 [65]. The read sets were then analysed with the TAREAN pipeline, Galaxy version 2.3.8.1 [66] (reads trimmed to 100 bp, default parameters), to compile a species-specific library of satellite elements.
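The subsampling target above is simple arithmetic: 1.6% of the genome size, divided by the bases contributed per read pair. The sketch below assumes both mates are 100 bp (the trimming length mentioned for TAREAN) and that pairs are drawn uniformly without replacement; both are assumptions, and the function names are hypothetical.

```python
import math
import random
from fractions import Fraction
from typing import List


def pairs_needed(genome_size_bp: int, read_len: int = 100,
                 fraction: Fraction = Fraction(16, 1000)) -> int:
    """Number of read pairs whose total length reaches `fraction` of the
    genome size (1.6% by default), assuming both mates are `read_len` bp."""
    target_bp = genome_size_bp * fraction          # exact rational arithmetic
    return math.ceil(target_bp / (2 * read_len))   # each pair adds 2 * read_len bases


def sample_pair_indices(n_available: int, n_wanted: int, seed: int = 0) -> List[int]:
    """Randomly pick read-pair indices without replacement, so mates stay paired."""
    if n_wanted > n_available:
        raise ValueError("not enough read pairs for the requested coverage")
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_available), n_wanted))


# Hypothetical 4 Gb genome: 0.016 * 4e9 = 64 Mb, i.e. 320,000 pairs of 2 x 100 bp.
print(pairs_needed(4_000_000_000))  # 320000
```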

2.2.2. Construction of a Common Library of Repetitive Elements

De novo identification of repetitive elements in each genome was performed using RepeatModeler2 version 2.0.1 [38] with the LTRStruct option and otherwise default parameters. The LTRStruct option runs an LTR structural discovery pipeline, based on LTRharvest and LTR_retriever, that improves the identification of LTR elements.
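A RepeatModeler2 run as described above typically involves two programs: `BuildDatabase`, which prepares the genome for searching, and `RepeatModeler` with `-LTRStruct`. The sketch below only assembles argument lists; thread options are omitted, the file and database names are hypothetical, and the exact flags should be checked against the installed RepeatModeler2 version. The consensus library is usually written as `<database>-families.fa`.

```python
import subprocess
from typing import List


def repeatmodeler_commands(genome_fasta: str, db_name: str) -> List[List[str]]:
    """Build (but do not run) the commands of a RepeatModeler2 de novo run:
    BuildDatabase creates the search database, and RepeatModeler with
    -LTRStruct adds the structural LTR search (LTRharvest/LTR_retriever)."""
    return [
        ["BuildDatabase", "-name", db_name, genome_fasta],
        ["RepeatModeler", "-database", db_name, "-LTRStruct"],
    ]


def run(commands: List[List[str]]) -> None:
    """Execute commands in order; requires RepeatModeler2 on the PATH."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in repeatmodeler_commands("genome.fa", "species_db"):
    print(" ".join(cmd))
```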
All species-specific libraries of repetitive elements identified with RepeatModeler2 were renamed according to the RepBase version 26.05 [67] nomenclature, with the repeat family, a unique number for the family to distinguish the different sequences of the repeat, the 3-letter species name, the repeat class and family, and finally the complete species name. Similar renaming was applied to species-specific libraries of high-confidence satellites identified by the TAREAN pipeline, with the addition of a ‘tarean’ tag after the unique number.
All species-specific libraries of high-confidence satellites and repeats identified by the TAREAN pipeline and RepeatModeler2 were combined with the Arthropoda-specific subset of RepBase 26.05 to form a single library (Figure 1). This library was then split into two sub-libraries: the first contains the known TEs, and the second contains unknown TEs, satellites, and simple repeats.
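Splitting the merged library into the two sub-libraries amounts to routing each FASTA record by its classification label. The sketch below assumes RepeatModeler-style headers of the form `name#Class/Family` and treats the set of "non-TE" top-level classes as a hypothetical choice; it is an illustration of the idea, not the authors' script.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

# Top-level classes sent to the second sub-library (assumed RepeatMasker-style labels).
NON_TE_CLASSES = {"Unknown", "Satellite", "Simple_repeat", "Low_complexity"}


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Minimal FASTA reader; multi-line sequences are concatenated."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_library(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (known TEs, unknown TEs + satellites + simple repeats)."""
    known: List[Record] = []
    other: List[Record] = []
    for header, seq in records:
        name = header.split()[0]                                # e.g. "rnd-1_family-7#LINE/RTE"
        label = name.split("#", 1)[1] if "#" in name else "Unknown"
        top_class = label.split("/", 1)[0]
        (other if top_class in NON_TE_CLASSES else known).append((header, seq))
    return known, other
```

Records without a `#` label are conservatively sent to the second library, so that only confidently classified TEs are used in the first masking round.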

2.2.3. Identification of Repetitive Elements

To annotate repetitive elements in the 26 crustacean genomes, we used RepeatMasker version 4.1.2-p1 [39] in a two-step approach (Figure 1). First, RepeatMasker was run with the library of known TEs and the options -a -gccalc -excln -s -nolow to identify and mask TEs in the genomic sequences. A second RepeatMasker run (options -a -gccalc -excln -s) was then performed on the masked genomes with the second library to identify unclassified TEs, satellite DNA, and simple repeats. Finally, the ProcessRepeats and buildSummary tools of RepeatMasker were used to combine all results and produce a detailed summary of annotations.
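The two-step masking can be sketched as two RepeatMasker invocations that differ in library and in `-nolow`, the second taking the first run's masked output as input. The sketch assumes RepeatMasker's default output location (the masked genome `<genome>.masked` written next to the input); file names are hypothetical, and the ProcessRepeats/buildSummary step is not shown.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Options reported in the text for both runs; -nolow is added to the first run only.
SHARED_OPTS = ["-a", "-gccalc", "-excln", "-s"]


def two_step_commands(genome: str, known_lib: str, other_lib: str) -> Tuple[List[str], List[str]]:
    """Build the two RepeatMasker invocations (not executed here).

    Assumes the masked genome of step 1 is `<genome>.masked` next to the input.
    """
    genome_path = Path(genome)
    step1 = ["RepeatMasker", "-lib", known_lib, *SHARED_OPTS, "-nolow", str(genome_path)]
    masked = genome_path.with_name(genome_path.name + ".masked")
    step2 = ["RepeatMasker", "-lib", other_lib, *SHARED_OPTS, str(masked)]
    return step1, step2


def run_both(genome: str, known_lib: str, other_lib: str) -> None:
    for cmd in two_step_commands(genome, known_lib, other_lib):
        subprocess.run(cmd, check=True)  # requires RepeatMasker on the PATH
```

Masking known TEs first prevents the broader second library (unknown TEs, satellites, simple repeats) from claiming regions that already have a confident TE annotation.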

2.2.4. Statistical Analysis

To test for correlations between genome size or assembly size and the load (number of copies) or percentage of repeats or TEs, we used a linear regression model (lm method in the R package ggplot2) and the Spearman rank correlation test with α = 0.005. A dendrogram was produced with hclust from pairwise Euclidean distances between repeat profiles (the pattern of presence and absence of repetitive elements), and the heatmap was plotted using Orange3 [68]. The sequence divergence distribution was calculated as Kimura distances (which account separately for transitions and transversions) using the RepeatMasker tools “calcDivergenceFromAlign.pl” and “createRepeatLandscape.pl”.
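For reference, the Kimura two-parameter distance between two aligned sequences is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitions and transversions. The sketch below implements this plain formula. The RepeatMasker script reportedly applies an additional CpG-site adjustment, which is omitted here, so values will not match its output exactly.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Gaps and ambiguous bases are skipped; P and Q are the proportions of
    transitions and transversions among the compared sites.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```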

3. Results and Discussion

3.1. Construction of Repetitive Elements Reference

To obtain a comprehensive view of REs in Decapoda and reduce the number of elements classified as “unknown”, we developed a standardized protocol to annotate TEs and satDNAs at the genomic level (see Methods and Figure 1). This pipeline combines the consensus sequences of the Arthropoda section of the RepBase database with de novo identification of REs in all species using RepeatModeler2 and the TAREAN pipeline, in order to generate an extensive library of consensus sequences. The TAREAN pipeline was used specifically to identify satDNAs. Because of their structure and high sequence homogeneity, satDNAs are extremely difficult to assemble and are often excluded from assemblies [12]. We therefore searched for satDNAs in raw Illumina paired-end reads using the TAREAN pipeline to construct the “Satellite libraries”. With the TAREAN pipeline, we retrieved between 0 and 43 satDNA families annotated as “high confidence”, whereas RepeatModeler2 identified only 0 to 4 satDNA families (Table 2).
Using our newly developed pipeline, we identified between 3643 and 11,431 families of REs in the different assemblies, including between 7.25% and 33.37% of “unknown” sequences (Table 2). Unknown elements are repetitive sequences that could not be further classified. The lowest percentage of unknown elements is observed in Dendrobranchiata species. This might be explained by the presence of the annotated TEs of the Dendrobranchiata Penaeus vannamei in RepBase, allowing a better identification in closely related species.
All detected REs were renamed according to the RepBase nomenclature. The RE classification by Wicker et al. (2007) [35] is widely used, but new TEs have been characterized since its establishment in 2007, resulting in conflicts between TE databases. Kojima (2019) [37] improved the classification of the RepBase database [40], but TE annotations can still differ between RepBase, the RepeatModeler2 database, and DFAM, for example because of differences in capitalization or multiple names for the same element. Repeat names were therefore corrected manually when needed to obtain a consistent annotation.
All libraries generated by RepeatModeler2, the TAREAN pipeline, and RepBase were merged into a single library. This extensive database contains a total of 71,601 sequences, including those from RepBase, of which 31,579 are known TEs. With this new merged library, we considerably extended the number of annotated families compared with the RepBase database of Arthropoda REs. Indeed, RepBase provides consensus sequences of 13,906 repetitive elements in Arthropoda, including 109 satDNAs. These elements are distributed across 218 Arthropoda species and Eukaryota or Metazoa common ancestors. However, only sixteen Crustacea and six Decapoda species are represented, with 1419 and 328 sequences, respectively. Moreover, most Decapoda sequences (320) are from a single species, P. vannamei, as repeats from other species have not been submitted to RepBase. This illustrates how poorly Decapoda REs are represented in established databases. Our work also extended the number of known satDNA families in Decapoda species, with 405 consensus sequences compared with the 109 present in RepBase. The new REs identified in this study are provided in the Supplementary Materials (Figure S1). Well-categorized REs have also been submitted to RepBase.

3.2. Annotation of Repetitive Elements in Decapoda Genomes

Using our new extensive database, we performed two rounds of annotation with RepeatMasker. In the first round, we used only known TEs to improve their characterization and reduce the proportion of unknown TEs; in the second, we used all remaining REs. We identified between 6805 and 31,798 consensus RE sequences in the different assemblies (Table 2), an increase of approximately 16,500 families on average in Decapoda compared with previous annotations, and of 6500 for the other Crustacea. Moreover, for most species, our standardized protocol successfully classified REs that were previously unclassified (unclassified REs now range from 4.40% to 24.15%). This is a considerable improvement over the results obtained with the widely used species-specific databases.
Taking into account all satDNA families annotated in the genomes with the merged library, we annotated between 11 and 109 different families per genome (compared with 10 to 40 using the species-specific strategy, Table 2). The Astacidea and Anomura infraorders have the highest numbers of satDNA families, ranging from 92 to 109, except for H. americanus and B. latro. These two species have numbers of satDNA families closer to those of the other Decapoda species, with 61 and 59 satDNA consensus sequences, respectively. The large number of satDNA families detected in Astacidea and Anomura is in agreement with the 258 families detected in the crayfish Pontastacus leptodactylus [5]. The diversification of satDNA families in Astacidea and Anomura is remarkable compared with other species. For example, Drosophila species generally have fewer than ten different families in their genomes, and humans have nine [20,69]. However, large numbers of satDNA families have already been found in Arthropoda, such as Triatoma infestans (42 families, genome size 1.4 Gb) [70], Locusta migratoria (62 families, genome size 6 Gb) [18], and the morabine grasshoppers (129 families, genome size 5 Gb) [71], as well as in other animals such as the fish Megaleporinus microcephalus (164 families, assembly size 1.2 Gb) [72]. Note that our results may still underestimate the real number of satDNA families because of the fragmentation of available assemblies (Table S1). Indeed, some satDNA families identified by the TAREAN pipeline in Illumina reads were not retrieved in the genome assembly, most likely because the reads containing them were not included in the final assembly. Nevertheless, the number of satDNA families remains consistent within each infraorder.
Interestingly, the number of RE families is correlated with both estimated genome size and assembly size (Table 1), with Spearman rank correlations of ρ = 0.83 (p-value = 8.925 × 10⁻⁸) and ρ = 0.92 (p-value = 1.146 × 10⁻⁶), respectively. The same correlation is observed for satDNA families, with ρ = 0.84 (p-value = 6.875 × 10⁻⁸) and ρ = 0.90 (p-value = 3.83 × 10⁻¹⁰), respectively. This result highlights the extensive diversification of RE families in larger genomes.
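Spearman's ρ, as reported above, is the Pearson correlation computed on ranks, with tied values sharing the mean of their ranks. The sketch below computes ρ only; the p-values quoted in the text require a separate test (e.g. a t-approximation or permutation) that is not shown. Inputs would be, for instance, per-genome RE family counts and assembly sizes; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with >= 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the computation, ρ captures any monotonic relationship (e.g. family number rising steadily with genome size) without assuming linearity, which complements the lm fits.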
The strategy used in this study increases the knowledge of REs in Decapoda species and provides an extended library that can be used in future studies (Figure S1). However, a large number of unknown REs remain in some of the annotated genomes. Manual curation of the library would be necessary but was beyond the scope of this study. Moreover, because of the high abundance of REs, genome assemblies are often fragmented, which prevents exhaustive annotation of TEs that may be absent from the assemblies or split across two contigs. In a study of more than 600 insect species, Sproul et al. (2022) showed the influence of sequencing technology on repeat detection: long-read assemblies contained 36% more repeats than short-read assemblies, with a particularly strong impact on LTR detection [73]. This is because assemblies based on long reads are often more contiguous [74,75]. In our case, most genomes were assembled using long reads or a combination of long and short reads, and the short-read assemblies do not stand out in terms of repeat content or diversification (Table S1).

3.3. Proportion of Repetitive Elements in Decapoda Genomes

RE proportions vary both between and within the phylogenetic clades of the analysed species. In most of the studied genomes, REs make up more than 40% of the genome. The exceptions include two Decapoda species, C. quadricarinatus, which has the lowest contig N50, and C. multidentata, which has the lowest BUSCO score, with 38.73% and 39.02% repeat content, respectively (Table 1, Figure 2, and Table S1). The non-Decapoda H. azteca, one of the genomes assembled with short reads only, also presents fewer REs, with 26.12% (Figure 2 and Table S1). Given the fragmented state of these genomes, these percentages may underestimate their true RE proportions. Decapoda species have an average of 59.7% REs in their genomes, whereas the non-Decapoda Crustacea analysed in this study exhibit a lower average of 46.4%. However, A. vulgare stands out among the non-Decapoda species, with a remarkably high percentage of repeats (76.26%). If A. vulgare is excluded, the average RE proportion in non-Decapoda falls to 40.4%, and the difference from Decapoda is significant (Wilcoxon p-value = 0.0074). Within Decapoda, Anomura presents an especially high percentage of REs, with an average of 73.6%. Indeed, the Anomura species P. platypus has the highest proportion of REs among the studied species, with 78.89% (Figure 2). In contrast, the genome with the lowest percentage of repeats was the non-Decapoda H. azteca, with 26.12%. Thus, both the overall RE proportions and the content of RE categories were highly variable among phylogenetic clades.
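The reduced non-Decapoda average quoted above can be checked from the group mean over the six non-Decapoda genomes. Because the published means are rounded, this reconstruction is only approximate:

```latex
\bar{x}_{\text{non-Decapoda}\,\setminus\,\textit{A. vulgare}}
  = \frac{6 \times 46.4 - 76.26}{6 - 1}
  = \frac{278.4 - 76.26}{5}
  = \frac{202.14}{5}
  \approx 40.4\%
```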
The content of RE categories also varied within suborders. Among Decapoda, Dendrobranchiata genomes contained about half as many LINEs as Pleocyemata genomes, in which LINEs reached up to 35.3% in the Astacidea C. destructor (Figure 2). Dendrobranchiata, like the non-Decapoda A. vulgare, were characterized by a high proportion of DNA transposons, between 13% and 18%. The Anomura infraorder has the highest percentage of LTRs, with more than 16%, and the Achelata P. ornatus has the lowest, with 3.24%. SINE elements were rare in all genomes, ranging from 0.02% in H. azteca to 2.54% in P. trituberculatus. DIRS elements contribute less than 1% of the repeat content in almost all genomes, the main exception being M. nipponense, in which DIRS represent 8.84%. This species also has the highest proportion of Penelope elements, with 5.18%, and Astacidea is the infraorder with the second highest proportion of Penelope elements, with a mean of 2.3%. Unclassified elements were less frequent in the Dendrobranchiata suborder (around 3.5%), probably because REs of this suborder are better characterized in RepBase, whose Decapoda annotations derive almost exclusively from P. vannamei. Conversely, species more distantly related to P. vannamei present higher proportions of unclassified elements, such as E. affinis with 24.15%. This variability suggests that the different suborders of the studied crustaceans each have specific major REs in their genomes.
According to the RE analyses included in Decapoda genome assembly publications, the proportion of REs varies from 8% to 82% [48,49,50,51,53,54,55,76,77,78,79,80,81,82,83,84]. Tan et al. (2020) annotated the repeatomes of eight decapod species and estimated repetitive content between 27% and 50%, with LINEs predominating in most genomes except for P. vannamei, which had more DNA transposons [54]. Compared with these studies, we annotated approximately 10% more repeats with our pipeline. For the P. virginalis genome, 8.8% of repetitive elements were retrieved in the assembly Pvir0.4 (GenBank accession: GCA_002838885.1) and 27.52% in the study of Tan et al. (2020) [54], whereas we annotated 57.87% of repetitive elements in the newer assembly DKFZ_Pvir_1.0 (GenBank accession: GCA_020271785.1) used in this study [51,55]. For P. clarkii, Xu et al. (2021) annotated 82.42% of repeats in the assembly, whereas we observed only 71.26% (Figure 2). For the P. platypus genome, our overall results are similar to those of Tang et al. (2021) (Figure 2) [56]; however, the percentages of LINEs and LTRs each increased by almost 10%, while unknown TEs were reduced to 17%. The percentage of REs in E. sinensis was estimated at 40.5% and 61.42% in two different studies [54,85], while we determined that repetitive elements represent 58.93% of the genome (Figure 2). Taken together, these results show that our method generally yields an equal or greater proportion of REs, with a better characterization.
The Decapoda species studied here all presented high proportions of REs, ranging from 58% to 79% (Figure 2), in the upper range of what is generally observed in Arthropoda. Indeed, comparative studies of arthropods (mainly insects) report highly variable proportions of TEs, ranging from 1% to 80% [73,86,87]. We expect even higher proportions of REs with the forthcoming sequencing of giant genomes in Decapoda or other Crustacea. Recently, the assembly of the Antarctic krill (belonging to a sister order of Decapoda) showed that 92% of its genome consists of REs, 78% of them being TEs, indicating that Arthropoda can have an extremely high proportion of REs [88]. In terms of TE landscape, Decapoda present only a few SINE elements, as do all Arthropoda (Figure 2). Previous studies in Dendrobranchiata species reported that the most abundant groups of repeats, disregarding simple sequence repeats, were DNA transposons or LINEs, with different results depending on the bioinformatic tools used [73,86,87]. Here, we showed that DNA transposons were the major subclass in all Dendrobranchiata species, followed by LINEs (Figure 2). This is similar to most insect species, in which DNA transposons are generally the major TE group [73,86,87]. Interestingly, our results revealed a different situation in the studied Pleocyemata species, in which LINE and LTR elements are more abundant (Figure 2). This resembles the distinct TE composition of some insect orders: LTRs are more abundant in Diptera species, and Odonata and Orthoptera species are richer in LINE elements [73,86]. The change in the major type of REs between suborders suggests different strategies of genome stability maintenance and RE regulation between suborders. Sproul et al. (2022) demonstrated that LINE-rich lineages present many REs associated with protein-coding genes [73].
Such associations may have consequences for phenotype evolution. The presence of a TE near a gene can lead to methylation changes. Indeed, it has already been shown that LINEs can serve as amplifiers for silencing away from the X-chromosome inactivation center, and that LINEs and SINEs can contribute to gene imprinting [34,89]. The movement of a LINE, or other TE, to a new genomic locus can thus affect nearby gene expression, ultimately reshaping gene expression networks and influencing genome evolution.

3.4. Correlation between Genome Size and Repetitive Elements

The 20 Decapoda species analysed in the present study differ greatly in estimated genome size (1.6 Gb to 8.5 Gb). These differences were also evident, although less pronounced, in assembly sizes (1 Gb to 4.8 Gb). This variability raised the question of the contribution of REs to the size of their host genomes. After masking each genome, we calculated the load (number of copies) and the percentage of all REs and of TEs only, and tested for correlations between these values and both assembly size and estimated genome size. Assembly size was positively correlated with both the load (ρ = 0.87, p-value = 1.864 × 10⁻⁶) and the percentage of TEs (ρ = 0.6, p-value = 1.48 × 10⁻³) (Figure 3A,B). Estimated genome size (Table 2) was positively correlated with the load of TEs (ρ = 0.62, p-value = 7.114 × 10⁻⁴), but not significantly correlated with the percentage of TEs (ρ = 0.47, p-value = 1.421 × 10⁻²) (Figure 3C,D). Although the number of satDNA families was correlated with both assembly size and estimated genome size, including satDNA elements reduced the significance of the correlation between the load of REs and genome or assembly size (Figure S1). The correlations between the percentage of REs and both assembly and estimated genome size were not significant at α = 0.005 (Figure S1).
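Load and percentage, as defined above, can both be derived from RepeatMasker annotations. The sketch below assumes the standard `.out` column layout (query sequence, begin, end and class/family in the 5th, 6th, 7th and 11th fields) and makes a simplifying assumption that each annotation line is one copy; fragments of a single insertion are not re-joined. The `keep` filter illustrates how "TEs only" could be selected; its use is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Hit = Tuple[str, int, int, str]  # (sequence id, start, end, "Class/Family"), 1-based inclusive


def parse_out_line(line: str) -> Optional[Hit]:
    """Parse one data line of a RepeatMasker .out table, or return None for
    header/blank lines (assumes the standard column layout)."""
    fields = line.split()
    if len(fields) < 11 or not fields[0].isdigit():
        return None
    return fields[4], int(fields[5]), int(fields[6]), fields[10]


def load_and_percentage(hits: Iterable[Hit], assembly_size: int,
                        keep: Callable[[str], bool] = lambda cls: True) -> Tuple[int, float]:
    """Return (load, % of assembly covered) for hits whose class passes `keep`.

    Load counts annotation lines as copies; the percentage merges overlapping
    or adjacent intervals so that no base is counted twice.
    """
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    load = 0
    for seq_id, start, end, cls in hits:
        if keep(cls):
            by_seq[seq_id].append((start, end))
            load += 1
    covered_bp = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:        # overlapping or adjacent: extend block
                cur_end = max(cur_end, end)
            else:                           # gap: close the current block
                covered_bp += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered_bp += cur_end - cur_start + 1
    return load, 100.0 * covered_bp / assembly_size
```

Fragmentation illustrates the point made in the text: splitting one element into several annotated pieces raises the load without changing the covered fraction.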
We demonstrate, for the first time in Decapoda species, a strong correlation between assembly size and the load (number of copies) of TEs. This strong positive correlation reveals the impact of the number of TEs on assembly size, with larger genomes associated with a higher presence of TEs. The percentage of TEs or REs is more often analysed than the load. In our study, the percentage of TEs was less significantly correlated with genome or assembly size than the load of TEs, and the percentage of REs was not correlated with genome size. Consistent with our results, Petersen et al. (2019) [86] found a positive correlation between the percentage of TEs and assembly size in arthropods, but they also found a positive correlation between the percentage of TEs and estimated genome size, which we did not observe. Moreover, Sproul et al. (2022) [73] found a positive correlation between the proportion of REs and assembly size in insects, which was not confirmed in our study. These differences are likely due to the difficulty of assembling REs in large genomes such as those of Decapoda [73,86]. REs can be excluded from an assembly even if they are present in the genome, so they are expected to correlate more strongly with assembly size than with estimated genome size. REs can also be fragmented and only partially included in the assembly, which increases the load of REs but not their percentage. This could explain the higher correlation coefficient observed for the load of REs in Decapoda genomes and highlights the value of studying both the percentage and the load of REs in fragmented assemblies. Fragmentation particularly affects satDNAs, which form long tandem arrays: the assembler cannot determine how many repeat units are present unless the array is entirely spanned by a long read.
These difficulties in assembling satDNAs are particularly pronounced in highly fragmented assemblies, such as those used in this study, and could explain why the correlations weaken or lose significance when satDNAs are included. An improvement in genome contiguity could therefore affect inferences about the correlation between REs and genome size. However, removing genomes with a BUSCO score below 50% does not change the conclusions on correlations between repeats and genome size.
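The correlation tests and the BUSCO-based sensitivity check described above can be sketched as follows. The text reports ρ without naming the coefficient; this sketch assumes Spearman's rank correlation (Pearson correlation of average ranks) and estimates a two-sided p-value by Monte Carlo permutation, which is a stand-in and need not match the test used in the study. The record keys (`"busco"` and the variable names passed in) are hypothetical placeholders for the columns of a per-genome table, and α = 0.005 is taken from the text.

```python
import random
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p(x, y, n_perm=10_000, seed=1):
    """Two-sided Monte Carlo permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def correlate(records, x_key, y_key, alpha=0.005, min_busco=None):
    """Return (rho, p, significant) for two per-genome variables.

    records: one dict per genome. If min_busco is given, genomes whose
    (hypothetical) "busco" value is below it are dropped first.
    """
    if min_busco is not None:
        records = [r for r in records if r["busco"] >= min_busco]
    x = [r[x_key] for r in records]
    y = [r[y_key] for r in records]
    rho = spearman_rho(x, y)
    p = permutation_p(x, y)
    return rho, p, p < alpha
```

Running `correlate` twice, once with `min_busco=None` and once with `min_busco=50`, mirrors the check described above: the conclusion is robust if the significance call does not change between the two runs.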

3.5. Occurrence Frequency of satDNA Families

In Crustacea, and particularly in Decapoda, we annotated a large number of different satDNA families (Table 2) and evaluated the occurrence of each family in each genome (Figure 4). In each genome, the majority of satDNA families were detected one to nine times. Depending on the genome, between one and thirty-four families appeared 10 to 99 times. With nine out of the ninety-seven satDNA families repeated more than 1000 times, P. clarkii was the species with the highest number of highly repeated satDNA families. In contrast, five genomes had no highly repeated satDNA families (more than 99 occurrences). Thus, although Decapoda have extremely large numbers of satDNA families (Table 2), only a few are predominant in each genome (Figure 4); both the Decapoda and non-Decapoda species studied here follow this pattern, which has been reported in several other studies [18,19,20]. The Decapoda infraorders Astacidea and Anomura had the largest estimated genome and assembly sizes (Table 1) and presented the largest numbers of highly repeated satDNA families (Figure 4). They also tended to have the highest total number of families (Table 2). This suggests that satDNA is a key factor in explaining the huge variation in genome size observed in Decapoda.
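The occurrence summary above amounts to counting the copies of each satDNA family in a genome and binning the counts. A minimal sketch, assuming the family name of every satDNA hit is already available as a list; the 1–9, 10–99 and "more than 99" thresholds follow the text, while splitting the upper range into 100–999 and ≥1000 is an assumption made here for illustration.

```python
from collections import Counter

# Occurrence classes with inclusive bounds; the 100-999 / >=1000 split is an assumption.
BINS = [(1, 9, "1-9"), (10, 99, "10-99"), (100, 999, "100-999"), (1000, None, ">=1000")]


def occurrence_class(n):
    """Label of the occurrence class containing a copy count n >= 1."""
    for low, high, label in BINS:
        if n >= low and (high is None or n <= high):
            return label
    raise ValueError(f"occurrence count must be >= 1, got {n}")


def count_occurrences(hit_families):
    """hit_families: the family name of every satDNA hit in one genome."""
    return Counter(hit_families)


def summarise(family_counts, highly_repeated_min=100):
    """Families per occurrence class, plus the sorted list of highly repeated families."""
    per_class = Counter(occurrence_class(n) for n in family_counts.values())
    summary = {label: per_class.get(label, 0) for _, _, label in BINS}
    highly = sorted(f for f, n in family_counts.items() if n >= highly_repeated_min)
    return summary, highly


if __name__ == "__main__":
    toy_hits = ["sat1"] * 3 + ["sat2"] * 12 + ["sat3"] * 150 + ["sat4"] * 2500 + ["sat5"]
    print(summarise(count_occurrences(toy_hits)))
```

Applied per genome, the per-class counts correspond to the cells of a figure like Figure 4, and the second return value lists the highly repeated families that dominate each genome.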

3.6. Diversity of Repetitive Elements

To investigate the diversity of REs, we determined the number of copies (the load) of each RE superfamily identified in each genome (Figure 5). With 67 TE superfamilies present in at least one species, the majority of known RE superfamilies were found in the investigated genomes, as seen in insects [86], and they appear highly conserved across all the genomes (Figure 5). Among the studied Decapoda genomes, repeat superfamilies showed a clear pattern of high and low presence, with only a few distinct variations between species within each repeat suborder.
The load of each RE superfamily was then used as a profile for each genome, and these RE profiles were clustered to construct a dendrogram (Figure 5). This dendrogram mainly followed the currently known species phylogeny [4], with two exceptions: A. vulgare, whose RE proportions and composition were more similar to those of Decapoda (Figure 2), and two Anomura species, which were grouped with the Caridea. The genome of A. vulgare (1.6 Gb) was larger than those of the other Crustacea analysed in this study (238 Mb–1 Gb) and had the highest percentage of repeats among the studied non-Decapoda crustaceans (Figure 2). This may explain why A. vulgare clustered with Decapoda species rather than with the other crustaceans (Figure 5). Nevertheless, apart from A. vulgare, we observed a clear differentiation between Decapoda species and the other Crustacea, which have fewer REs and a distinct RE composition. Similarly, Dendrobranchiata were clearly distinguished from the Pleocyemata infraorders, the two groups differing in the presence of LINE ingi and SINE MIR elements. Within Pleocyemata, Caridea were also separated from the Reptantia species, in agreement with the established phylogeny [4]. Many studies, including Petersen et al. (2019) [86], Sproul et al. (2022) [73], and Wu and Lu (2019) [87], based their RE analyses on already published phylogenetic trees. In our study, clustering the repetitive profile of each genome yielded a phylogenetic signal consistent with the major classification (Figure 5) [1]. Indeed, REs have recently been used as evidence for phylogenetic tree construction in plants, with RE abundance resolving species relationships in a similar manner to DNA sequences from plastid and nuclear ribosomal regions [90,91]. This can be explained by the high conservation and synteny of some REs within species [92,93,94]. This approach could therefore be used in the future to determine the phylogeny of non-model species using low-coverage, low-cost sequencing.
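Clustering per-genome RE profiles into a dendrogram can be sketched in a few lines. The text does not state the transformation, distance or linkage used for Figure 5, so this sketch assumes log10(load + 1) profiles over a fixed list of superfamilies, Euclidean distance and average linkage (UPGMA). These are common choices among several, not necessarily the study's. The genome names and loads in the usage example are toy values, not data from the study.

```python
import math
from itertools import combinations


def log_profile(loads, superfamilies):
    """Vector of log10(load + 1), one entry per superfamily in a fixed order."""
    return [math.log10(loads.get(sf, 0) + 1) for sf in superfamilies]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (UPGMA).

    profiles: {genome_name: vector}. Returns the dendrogram as nested tuples.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = euclidean(profiles[a], profiles[b])

    clusters = [(name, [name]) for name in names]  # (subtree, member genomes)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            # Average linkage: mean distance over all cross-cluster genome pairs.
            d = sum(dist[(a, b)] for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    superfamilies = ["Gypsy", "MIR", "ingi", "Mariner"]
    toy_loads = {  # hypothetical copy numbers per superfamily
        "g1": {"Gypsy": 50000, "MIR": 2000, "ingi": 900},
        "g2": {"Gypsy": 42000, "MIR": 1500, "ingi": 700},
        "g3": {"Gypsy": 9000, "Mariner": 30000},
        "g4": {"Gypsy": 11000, "Mariner": 25000},
    }
    profiles = {g: log_profile(l, superfamilies) for g, l in toy_loads.items()}
    print(average_linkage(profiles))
```

Log-transforming the loads keeps a few very abundant superfamilies from dominating the distance, which is one reason such a transformation is often applied to copy-number profiles.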

3.7. Sequence Divergence Distribution of Transposable Elements

To analyse the sequence divergence distribution and approximate the age and intensity of duplication events, we calculated the Kimura two-parameter (Kimura 2P) distance between each annotated TE copy and the consensus sequence of its TE family (Figure 6). The resulting distribution shows the genomic coverage of TE copies as a function of their percentage of divergence from the family consensus. A peak indicates that a large group of TE copies shares the same divergence from the consensus sequence and suggests a major expansion event of these elements. The lower the Kimura 2P distance at which a peak is located, i.e., the lower the percentage of divergence, the more recent the event. At a high Kimura 2P distance, a wide peak can indicate that TE copies have diverged extensively through genetic drift or other processes, suggesting an ancient expansion event.
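The Kimura 2P distance separates transitions (A↔G, C↔T; proportion P of compared sites) from transversions (proportion Q) and corrects for multiple substitutions at the same site: d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The sketch below computes it for one TE copy already aligned to its family consensus, skipping gaps and ambiguous bases. It does not apply the CpG-aware adjustments that some repeat-landscape tools use, so its values need not match those behind Figure 6 exactly.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(copy_seq, consensus_seq):
    """Kimura 2P distance between two aligned sequences of equal length.

    Gaps ('-') and ambiguous bases are skipped. Returns math.inf when the
    logarithm arguments are non-positive (substitutions saturated).
    """
    if len(copy_seq) != len(consensus_seq):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(copy_seq.upper(), consensus_seq.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # One transition in ten compared sites; multiply by 100 for a percentage.
    print(100 * kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"))
```

Because of the correction term, the distance is slightly larger than the raw mismatch proportion (about 11.2% rather than 10% in the example), and the gap widens as divergence increases.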
In Dendrobranchiata, sequence divergence landscapes were similar for the five species (Figure 6). We observed two peaks that were very similar across species. The first one, between 10% and 15% of divergence, presented a large number of LTRs and a smaller increase in LINE elements. The LTR peak was particularly high in P. japonicus and P. indicus. At the same divergence, we observed an increasing amount of DNA transposons in P. monodon. Earlier, an increase in DNA transposons and LTR elements around 25% of divergence was shared by all species. This suggests that all the Dendrobranchiata shared the same ancient evolutionary events. Together with the two Procambarus species, P. monodon was one of the few analysed Decapoda genomes showing a recent peak of SINE elements. We would therefore expect a higher proportion of SINEs in P. monodon than in the other genomes. However, SINE elements were only slightly more abundant in this genome, owing to a higher presence of SINE MIR elements (Figure 2 and Figure 5). Interestingly, the repeat content showed that DNA transposons are the most widespread repeats in this suborder (Figure 2). However, the expansion of DNA transposons was older and more spread out over time (Figure 6). In contrast, the landscape and diversity of repeats showed a higher peak of LTR elements in this suborder than in the other species, with Gypsy being the most abundant (Figure 5 and Figure 6). There were almost no sequences with low divergence. This near absence of recent peaks in Dendrobranchiata suggests low TE activity in these genomes in recent times (Figure 6).
The two Caridea species presented different sequence divergence landscapes (Figure 6). In C. multidentata, there was a recent peak of unknown elements between 5% and 10% of divergence, which could be caused by the expansion of one or several families of unknown TEs. We also observed that, starting from high divergence, the fraction of the genome increased as the Kimura 2P distance decreased, up to the event at 5% to 10% of divergence. After this event, the number of TEs with very low divergence decreased, with almost no TEs at 0% of divergence. This suggests that, despite the peak of recently active unknown elements, TEs are no longer active in this species. In M. nipponense, we observed two recent peaks, at 1–4% and at 10% of Kimura 2P divergence, corresponding to LINE, Penelope, and LTR elements for the first and to DIRS elements for the second. We also observed an expansion of integrated viruses between 5% and 25% of divergence. This is in accordance with the diversity of repeats (Figure 5), M. nipponense being the Decapoda genome with the highest amount of integrated viruses. The presence of sequences with little divergence from the consensus sequences suggests that TEs are active in this genome (Figure 6).
Within Astacidea, H. americanus has a different TE landscape from the other four species of the infraorder (Figure 6). Indeed, its genome has a high peak of unknown elements at 15% divergence. Interestingly, we observed an ancient event involving integrated viruses at 40% to 45% of Kimura 2P distance. H. americanus was the only Decapoda genome studied here presenting this characteristic. Integrated viruses could not be seen in the proportion of repeats because of their low abundance in the genomes and were included in the category "other REs" (Figure 2). The integrated viral sequences in H. americanus correspond to the white spot syndrome virus (WSSV) [95], suggesting that H. americanus encountered this virus a long time ago and that these sequences were then propagated (Figure 6). Since WSSV is a worldwide threat to shrimps and potentially to many crustacean species, this finding in a resistant species (i.e., H. americanus) could be important for future inferences on susceptibility or resistance to WSSV [96,97]. In the H. americanus genome, there was a clear increase in LINE, LTR, and DNA transposon coverage at low divergence, which leads us to conclude that TEs are still active in this genome. TEs are also active in the two Procambarus species, which have similar landscapes, with several element types at low divergence, especially LINEs. We also observed an increase in Penelope and SINE elements at low divergence in both species. In P. clarkii, there was also a small peak of unknown elements at 10% of divergence. In contrast to those of C. quadricarinatus, TEs seem to be active in C. destructor, with an increase in LINEs at low divergence. The expansion of LINEs in C. quadricarinatus was instead more ancient, at 6% to 10% of divergence.
In Brachyura, all genomes seemed to have active TEs, but the TE landscapes differed across the genomes of this infraorder (Figure 6). In P. trituberculatus, LINEs with no divergence from consensus sequences were three times more abundant than LINEs at 1% of divergence, suggesting that these LINEs are in a very active phase in this genome. Penelope elements were also more abundant at 0% of divergence. The C. sapidus genome showed an almost steady increase in TE coverage with decreasing divergence for all elements. However, we observed an increasing number of LTRs with no divergence and a decreasing number of LINEs and DNA transposons. The genome of E. sinensis was the only Brachyura genome presenting two peaks. The older one, at 15% of Kimura 2P distance, was caused by unknown elements. The more recent event involved LINE, LTR, and unknown elements at divergences between 0% and 7%. Among the Brachyura, C. opilio had the least active TEs. We observed a large peak between 0% and 20% of divergence, where LINEs and LTRs increased. The proportion of DNA transposons also increased over this range, but at a lower coverage.
Concerning the last two infraorders, in Achelata, the P. ornatus genome has an intermediate-age peak of LTRs at 15% of divergence (Figure 6). There was also a recent, high peak around 4–8% of divergence caused by the expansion of LINE elements, with LINEs that are 6% divergent representing 2% of the genome. This suggests that LINEs were highly transcriptionally active in this genome until recently but are now inactive. The high presence of LINE elements was also visible in the proportion of repeats in the genome (Figure 2). In Anomura, the infraorder with the highest percentage of LTRs within Decapoda (Figure 2), B. latro and the Paralithodes species had very different landscapes. The B. latro genome seemed to have inactive TEs, with two peaks of LTRs and LINEs at 3% and 15% of Kimura 2P distance (Figure 6). In contrast, the Paralithodes species had highly active LINEs and LTRs, with 6.8% and 3.6% of LINE elements without divergence from consensus sequences in P. platypus and P. camtschaticus, respectively. Finally, in the other crustaceans, unknown elements predominated, which makes the divergence distributions of their TEs difficult to interpret (Figure S2).
As with the proportion and diversity of repeats, the sequence divergence distributions clearly differentiated Dendrobranchiata from Pleocyemata species (Figure 6). Indeed, Dendrobranchiata have more transcriptionally inactive TEs than most Pleocyemata, and almost all the Pleocyemata species studied here have at least one active TE type. The expansion of a particular RE subfamily increases genome plasticity and can indicate periods of rapid evolutionary change [14,33]. This suggests that Pleocyemata genomes have evolved rapidly on a recent timescale. Genomes with recent accumulations of repeats contain highly similar copies, often of long repeat types (mostly LTRs and LINEs). Such long repetitive regions are more difficult to assemble, which makes repeat resolution during assembly even more problematic [98]. A large number of the genomes studied here presented recent accumulations of long REs, which may partly explain the fragmentation of their assemblies. Moreover, species with larger genome sizes tend to have more transcriptionally active TEs, as well as more REs.
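The activity statements in this section rest on reading divergence landscapes: genome coverage per TE class in narrow divergence bins, with a peak marking an expansion and coverage in the lowest bin serving as a rough proxy for recent activity (the 0% criterion used in the Conclusions). A minimal sketch of that bookkeeping follows. It assumes each annotated copy has already been reduced to a class label, an aligned length in bp and a Kimura 2P divergence in percent; the 1% bin width and the 50% cap are illustrative choices, not parameters taken from the study.

```python
from collections import defaultdict


def landscape(copies, genome_length, bin_width=1.0, max_div=50.0):
    """Genome coverage (%) per TE class in divergence bins of `bin_width` percent.

    copies: iterable of (te_class, aligned_bp, divergence_percent) tuples.
    Divergences at or above max_div are pooled into the last bin.
    """
    n_bins = int(max_div // bin_width)
    table = defaultdict(lambda: [0.0] * n_bins)
    for te_class, bp, div in copies:
        b = min(int(div // bin_width), n_bins - 1)
        table[te_class][b] += 100.0 * bp / genome_length
    return dict(table)


def peak_bin(row):
    """Index of the bin with the highest coverage (the first one on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def recent_coverage(table, te_class, n_lowest=1):
    """Coverage (%) in the lowest divergence bins: a rough proxy for recent activity."""
    return sum(table.get(te_class, [0.0])[:n_lowest])


if __name__ == "__main__":
    toy = [("LINE", 200, 0.3), ("LINE", 300, 6.2), ("LTR", 50, 15.0)]  # hypothetical copies
    t = landscape(toy, genome_length=10_000)
    print(peak_bin(t["LINE"]), recent_coverage(t, "LINE"))
```

In the toy example, the LINE peak falls in the 6–7% bin while 2% of the genome consists of LINEs in the 0–1% bin. Distinguishing an older peak from ongoing activity in this way is how the landscapes above are read.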

4. Conclusions

In this study, we annotated repetitive elements in twenty Decapoda and six other Crustacea publicly available genome assemblies using a new pipeline for the annotation of repetitive elements. We showed that repetitive elements constitute a large fraction of Decapoda genomes, with a highly variable RE content both between and within Decapoda infraorders. Additionally, our analysis indicates that in Decapoda, both the load of repetitive elements and the number of RE families are correlated with the assembly size of the genome. Moreover, larger genomes tend to have more active TEs (a high proportion of sequences at 0% divergence from their consensus), supporting the impact of REs on genome size expansion. We also demonstrated that, although the age distribution of TE superfamilies shows intra- and inter-lineage variation, the clustered RE profiles reflect the phylogeny of the major groups analysed in this study. Compared to non-Decapoda Crustacea, Decapoda have a higher proportion and number of REs in their genomes. Moreover, the pattern of RE families present in Decapoda is well conserved across species. With our protocol, we showed that combining the repeat libraries of all species provides an excellent tool to analyse the content and diversification of repetitive elements, with on average 8% more categorized elements. The new consensus sequences can improve the annotation of TEs in other Crustacea or Arthropoda species by increasing the number of consensus sequences available for homology searches. We suggest using this two-step pipeline for all repeatome studies on non-model organisms, which are often underrepresented in public databases. Our pipeline provides a baseline for future genomic analyses, producing standardized and reproducible results that will allow for more rigorous and complete comparative analyses of repeats in non-model organisms.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14081627/s1, File S1: crustaceans_RE_library.fa; Table S1: Assembly metrics; Figure S1: Correlation between genome size and REs; Figure S2: Sequence divergence distribution of TEs.

Author Contributions

Conceptualization, C.R., L.B., C.F., L.L.B., K.T. and O.L.; methodology, C.R., D.M., L.L.B., L.B., C.F. and O.L.; software, C.R. and A.K.; visualization, C.R.; writing—original draft preparation, C.R., K.T. and O.L.; writing—review and editing, C.R., C.F., L.B., L.L.B., K.T. and O.L.; supervision, K.T. and O.L.; project administration, K.T. and O.L.; funding acquisition, K.T. and O.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was produced within the framework of the GEODE project, an international collaborative research project co-funded by the Agence Nationale de la Recherche and the Deutsche Forschungsgemeinschaft (ANR-21-CE02-0028; DFG TH 1807/7-1). This work was supported by the French Ministry of Higher Education and Research and the doctoral school of Life Science of the University of Strasbourg.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

In this study, we generated a library of repetitive elements in crustacean species. Fully categorized elements were submitted to Repbase. The library of new repetitive elements found during this study is also provided in the Supplementary Materials.

Acknowledgments

We thank the platform of Bioinformatics and Genomics BiGEst-ICube for bioinformatics support. We are also very grateful to Julie Thompson for her critical reading of the manuscript and her valuable suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. De Grave, S.; Pentcheff, N.D.; Ahyong, S.T.; Chan, T.-Y.; Crandall, K.A.; Dworschak, P.C.; Felder, D.L.; Feldmann, R.M.; Fransen, C.H.; Goulding, L.Y.; et al. A Classification of Living and Fossil Genera of Decapod Crustaceans. Raffles Bull. Zool. 2009, 21, 1–109.
2. Reynolds, J.; Souty-Grosset, C.; Richardson, A. Ecological Roles of Crayfish in Freshwater and Terrestrial Habitats. Freshw. Crayfish 2013, 19, 197–218.
3. Souty-Grosset, C.; Holdich, D.D.M.; Noël, P.Y.; Reynolds, J.; Haffner, P. Atlas of Crayfish in Europe; Muséum national d’Histoire naturelle: Paris, France, 2006; Volume 187.
4. Wolfe, J.M.; Breinholt, J.W.; Crandall, K.A.; Lemmon, A.R.; Lemmon, E.M.; Timm, L.E.; Siddall, M.E.; Bracken-Grissom, H.D. A Phylogenomic Framework, Evolutionary Timeline and Genomic Resources for Comparative Studies of Decapod Crustaceans. Proc. R. Soc. B Biol. Sci. 2019, 286, 20190079.
5. Boštjančić, L.L.; Bonassin, L.; Anušić, L.; Lovrenčić, L.; Besendorfer, V.; Maguire, I.; Grandjean, F.; Austin, C.M.; Greve, C.; Hamadou, A.B.; et al. The Pontastacus Leptodactylus (Astacidae) Repeatome Provides Insight into Genome Evolution and Reveals Remarkable Diversity of Satellite DNA. Front. Genet. 2021, 11, 611745.
6. Lécher, P.; Defaye, D.; Noel, P. Chromosomes and Nuclear DNA of Crustacea. Invertebr. Reprod. Dev. 1995, 27, 85–114.
7. González-Tizón, A.M.; Rojo, V.; Menini, E.; Torrecilla, Z.; Martínez-Lage, A. Karyological Analysis of the Shrimp Palaemon Serratus (Decapoda: Palaemonidae). J. Crustac. Biol. 2013, 33, 843–848.
8. Niiyama, H. On the Unprecedentedly Large Number of Chromosomes of the Crayfish, Astacus Trowbridgii Stimpson. Annot. Zool. Japon. 1962, 35, 229–233.
9. Crandall, K.A.; De Grave, S. An Updated Classification of the Freshwater Crayfishes (Decapoda: Astacidea) of the World, with a Complete Species List. J. Crustac. Biol. 2017, 37, 615–653.
10. Gregory, T.R. Chapter 1—Genome Size Evolution in Animals. In The Evolution of the Genome; Gregory, T.R., Ed.; Academic Press: Burlington, NJ, USA, 2005; pp. 3–87.
11. Tørresen, O.K.; Star, B.; Mier, P.; Andrade-Navarro, M.A.; Bateman, A.; Jarnot, P.; Gruca, A.; Grynberg, M.; Kajava, A.V.; Promponas, V.J.; et al. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res. 2019, 47, 10994–11006.
12. Treangen, T.J.; Salzberg, S.L. Repetitive DNA and next-generation sequencing: Computational challenges and solutions. Nat. Rev. Genet. 2012, 13, 36–46.
13. Pop, M. Genome assembly reborn: Recent computational challenges. Brief. Bioinform. 2009, 10, 354–366.
14. Shapiro, J.A.; von Sternberg, R. Why repetitive DNA is essential to genome function. Biol. Rev. 2005, 80, 227–250.
15. Jurka, J.; Kapitonov, V.V.; Kohany, O.; Jurka, M.V. Repetitive Sequences in Complex Genomes: Structure and Evolution. Annu. Rev. Genom. Hum. Genet. 2007, 8, 241–259.
16. Garrido-Ramos, M.A. Satellite DNA: An Evolving Topic. Genes 2017, 8, 230.
17. Macas, J.; Neumann, P.; Navrátilová, A. Repetitive DNA in the pea (Pisum sativum L.) genome: Comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genom. 2007, 8, 427.
18. Ruiz-Ruano, F.J.; López-León, M.D.; Cabrero, J.; Camacho, J.P.M. High-throughput analysis of the satellitome illuminates satellite DNA evolution. Sci. Rep. 2016, 6, 28333.
19. Mravinac, B.; Plohl, M.; Ugarković, Ð. Preservation and High Sequence Conservation of Satellite DNAs Suggest Functional Constraints. J. Mol. Evol. 2005, 61, 542–550.
20. Miga, K.H. Completing the human genome: The progress and challenge of satellite DNA assembly. Chromosome Res. 2015, 23, 421–426.
21. Plohl, M.; Luchetti, A.; Meštrović, N.; Mantovani, B. Satellite DNAs between selfishness and functionality: Structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin. Gene 2008, 409, 72–82.
22. Plohl, M.; Meštrović, N.; Mravinac, B. Satellite DNA Evolution. Repetitive DNA 2012, 7, 126–152.
23. Pezer, Ž.; Brajković, J.; Feliciello, I.; Ugarković, Đ. Satellite DNA-Mediated Effects on Genome Regulation. Genome Dyn. 2012, 7, 153–169.
24. Biscotti, M.A.; Canapa, A.; Forconi, M.; Olmo, E.; Barucca, M. Transcription of tandemly repetitive DNA: Functional roles. Chromosome Res. 2015, 23, 463–477.
25. Wang, S.Y.; Biesiot, P.M.; Skinner, D.M. Toward an Understanding of Satellite DNA Function in Crustacea. Integr. Comp. Biol. 1999, 39, 471–486.
26. Bourque, G.; Burns, K.H.; Gehring, M.; Gorbunova, V.; Seluanov, A.; Hammell, M.; Imbeault, M.; Izsvák, Z.; Levin, H.L.; Macfarlan, T.S.; et al. Ten things you should know about transposable elements. Genome Biol. 2018, 19, 199.
27. Bennetzen, J.L.; Wang, H. The Contributions of Transposable Elements to the Structure, Function, and Evolution of Plant Genomes. Annu. Rev. Plant Biol. 2014, 65, 505–530.
28. Deininger, P.L.; Moran, J.V.; Batzer, M.A.; Kazazian, H.H. Mobile elements and mammalian genome evolution. Curr. Opin. Genet. Dev. 2003, 13, 651–658.
29. Craig, N.L.; Lambowitz, A.; Craigie, R.; Gellert, M. Mobile DNA II; ASM Press: Washington, DC, USA, 2002.
30. Kim, Y.-J.; Lee, J.; Han, K. Transposable Elements: No More “Junk DNA”. Genom. Inform. 2012, 10, 226–233.
31. Barrón, M.G.; Fiston-Lavier, A.-S.; Petrov, D.A.; González, J. Population Genomics of Transposable Elements in Drosophila. Annu. Rev. Genet. 2014, 48, 561–581.
32. Burns, K.H.; Boeke, J.D. Human Transposon Tectonics. Cell 2012, 149, 740–752.
33. Lanciano, S.; Mirouze, M. Transposable elements: All mobile, all different, some stress responsive, some adaptive? Curr. Opin. Genet. Dev. 2018, 49, 106–114.
34. Slotkin, R.K.; Martienssen, R. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 2007, 8, 272–285.
35. Wicker, T.; Sabot, F.; Hua-Van, A.; Bennetzen, J.L.; Capy, P.; Chalhoub, B.; Flavell, A.; Leroy, P.; Morgante, M.; Panaud, O.; et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 2007, 8, 973–982.
36. Di Stefano, L. All Quiet on the TE Front? The Role of Chromatin in Transposable Element Silencing. Cells 2022, 11, 2501.
37. Kojima, K.K. Structural and sequence diversity of eukaryotic transposable elements. Genes Genet. Syst. 2019, 94, 233–252.
38. Flynn, J.M.; Hubley, R.; Goubert, C.; Rosen, J.; Clark, A.G.; Feschotte, C.; Smit, A.F. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 2020, 117, 9451–9457.
39. Smit, A.F.A.; Hubley, R.; Green, P. RepeatMasker Open-4.0 2013–2015. Available online: http://www.repeatmasker.org (accessed on 12 May 2021).
40. Bao, Z.; Eddy, S.R. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Res. 2002, 12, 1269–1276.
41. Price, A.L.; Jones, N.C.; Pevzner, P.A. De novo identification of repeat families in large genomes. Bioinformatics 2005, 21 (Suppl. S1), i351–i358.
42. Ou, S.; Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018, 176, 1410–1422.
43. Flutre, T.; Duprat, E.; Feuillet, C.; Quesneville, H. Considering Transposable Element Diversification in De Novo Annotation Approaches. PLoS ONE 2011, 6, e16526.
44. Novák, P.; Neumann, P.; Pech, J.; Steinhaisl, J.; Macas, J. RepeatExplorer: A Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 2013, 29, 792–793.
45. Holt, C.; Campbell, M.; Keays, D.A.; Edelman, N.; Kapusta, A.; Maclary, E.; Domyan, E.T.; Suh, A.; Warren, W.C.; Yandell, M.; et al. Improved Genome Assembly and Annotation for the Rock Pigeon (Columba livia). G3 Genes Genomes Genet. 2018, 8, 1391–1398.
46. Meng, X.; Fu, Q.; Luan, S.; Luo, K.; Sui, J.; Kong, J. Genome Survey and High-Resolution Genetic Map Provide Valuable Genetic Resources for Fenneropenaeus Chinensis. Sci. Rep. 2021, 11, 7533.
47. Swathi, A.; Shekhar, M.S.; Katneni, V.K.; Vijayan, K.K. Genome Size Estimation of Brackishwater Fishes and Penaeid Shrimps by Flow Cytometry. Mol. Biol. Rep. 2018, 45, 951–960.
48. Kawato, S.; Nishitsuji, K.; Arimoto, A.; Hisata, K.; Kawamitsu, M.; Nozaki, R.; Kondo, H.; Shinzato, C.; Ohira, T.; Satoh, N.; et al. Genome and Transcriptome Assemblies of the Kuruma Shrimp, Marsupenaeus Japonicus. G3 Genes Genomes Genet. 2021, 11, jkab268.
49. Jin, S.; Bian, C.; Jiang, S.; Han, K.; Xiong, Y.; Zhang, W.; Shi, C.; Qiao, H.; Gao, Z.; Li, R.; et al. A Chromosome-Level Genome Assembly of the Oriental River Prawn, Macrobrachium Nipponense. GigaScience 2021, 10, giaa160.
50. Veldsman, W.P.; Ma, K.Y.; Hui, J.H.L.; Chan, T.F.; Baeza, J.A.; Qin, J.; Chu, K.H. Comparative Genomics of the Coconut Crab and Other Decapod Crustaceans: Exploring the Molecular Basis of Terrestrial Adaptation. BMC Genom. 2021, 22, 313.
51. Gutekunst, J.; Andriantsoa, R.; Falckenhayn, C.; Hanna, K.; Stein, W.; Rasamy, J.; Lyko, F. Clonal Genome Evolution and Rapid Invasive Spread of the Marbled Crayfish. Nat. Ecol. Evol. 2018, 2, 567–573.
52. Shi, L.; Yi, S.; Li, Y. Genome Survey Sequencing of Red Swamp Crayfish Procambarus Clarkii. Mol. Biol. Rep. 2018, 45, 799–806.
53. Austin, C.M.; Croft, L.J.; Grandjean, F.; Gan, H.M. The NGS Magic Pudding: A Nanopore-Led Long-Read Genome Assembly for the Commercial Australian Freshwater Crayfish, Cherax Destructor. Front. Genet. 2022, 12, 695763.
54. Tan, M.H.; Gan, H.M.; Lee, Y.P.; Grandjean, F.; Croft, L.J.; Austin, C.M. A Giant Genome for a Giant Crayfish (Cherax Quadricarinatus) With Insights Into Cox1 Pseudogenes in Decapod Genomes. Front. Genet. 2020, 11, 201.
55. Polinski, J.M.; Zimin, A.V.; Clark, K.F.; Kohn, A.B.; Sadowski, N.; Timp, W.; Ptitsyn, A.; Khanna, P.; Romanova, D.Y.; Williams, P.; et al. The American Lobster Genome Reveals Insights on Longevity, Neural, and Immune Adaptations. Sci. Adv. 2021, 7, eabe8290.
56. Tang, B.; Wang, Z.; Liu, Q.; Wang, Z.; Ren, Y.; Guo, H.; Qi, T.; Li, Y.; Zhang, H.; Jiang, S.; et al. Chromosome-level Genome Assembly of Paralithodes platypus Provides Insights into Evolution and Adaptation of King Crabs. Mol. Ecol. Resour. 2021, 21, 511–525.
57. Liu, L.; Cui, Z.; Song, C.; Liu, Y.; Hui, M.; Wang, C. Flow Cytometric Analysis of DNA Content for Four Commercially Important Crabs in China. Acta Oceanol. Sin. 2016, 35, 7–11.
58. Jimenez, A.G.; Kinsey, S.T.; Dillaman, R.M.; Kapraun, D.F. Nuclear DNA Content Variation Associated with Muscle Fiber Hypertrophic Growth in Decapod Crustaceans. Genome 2010, 53, 161–171.
59. Kim, J.-H.; Kim, H.; Kim, H.; Chan, B.; Kang, S.; Kim, W. Draft Genome Assembly of a Fouling Barnacle, Amphibalanus Amphitrite (Darwin, 1854): The First Reference Genome for Thecostraca. Front. Ecol. Evol. 2019, 7, 465.
60. Chebbi, M.A.; Becking, T.; Moumen, B.; Giraud, I.; Gilbert, C.; Peccoud, J.; Cordaux, R. The Genome of Armadillidium vulgare (Crustacea, Isopoda) Provides Insights into Sex Chromosome Evolution in the Context of Cytoplasmic Sex Determination. Mol. Biol. Evol. 2019, 36, 727–741.
61. Routtu, J.; Hall, M.D.; Albere, B.; Beisel, C.; Bergeron, R.D.; Chaturvedi, A.; Choi, J.-H.; Colbourne, J.; De Meester, L.; Stephens, M.T.; et al. An SNP-Based Second-Generation Genetic Map of Daphnia magna and Its Application to QTL Analysis of Phenotypic Traits. BMC Genom. 2014, 15, 1033.
62. Tran Van, P.; Anselmetti, Y.; Bast, J.; Dumas, Z.; Galtier, N.; Jaron, K.S.; Martens, K.; Parker, D.J.; Robinson-Rechavi, M.; Schwander, T.; et al. First Annotated Draft Genomes of Nonmarine Ostracods (Ostracoda, Crustacea) with Different Reproductive Modes. G3 Genes Genomes Genet. 2021, 11, jkab043.
63. Rasch, E.; Lee, C.; Wyngaard, G. DNA-Feulgen Cytophotometric Determination of Genome Size for the Freshwater-Invading Copepod Eurytemora Affinis. Genome 2004, 47, 559–564.
64. Poynton, H.; Hasenbein, S.; Benoit, J.; Sepulveda, M.; Poelchau, M.; Hughes, D.; Murali, S.; Chen, S.; Glastad, K.; Goodisman, M.; et al. The Toxicogenome of Hyalella azteca: A Model for Sediment Ecotoxicology and Evolutionary Toxicology. Environ. Sci. Technol. 2018, 52, 6009–6022.
65. Chikhi, R.; Medvedev, P. Informed and Automated K-Mer Size Selection for Genome Assembly. Bioinformatics 2014, 30, 31–37.
66. Novák, P.; Ávila Robledillo, L.; Koblížková, A.; Vrbová, I.; Neumann, P.; Macas, J. TAREAN: A Computational Tool for Identification and Characterization of Satellite DNA from Unassembled Short Reads. Nucleic Acids Res. 2017, 45, e111.
67. Bao, W.; Kojima, K.K.; Kohany, O. Repbase Update, a Database of Repetitive Elements in Eukaryotic Genomes. Mob. DNA 2015, 6, 11.
68. Demšar, J.; Curk, T.; Erjavec, A.; Gorup, Č.; Hočevar, T.; Milutinovič, M.; Možina, M.; Polajnar, M.; Toplak, M.; Starič, A.; et al. Orange: Data mining toolbox in python. J. Mach. Learn. Res. 2013, 14, 2349–2353.
69. Silva, B.S.M.L.; Picorelli, A.C.R.; Kuhn, G.C.S. In Silico Identification and Characterization of Satellite DNAs in 23 Drosophila Species from the Montium Group. Genes 2023, 14, 300.
70. Pita, S.; Panzera, F.; Mora, P.; Vela, J.; Cuadrado, Á.; Sánchez, A.; Palomeque, T.; Lorite, P. Comparative Repeatome Analysis on Triatoma Infestans Andean and Non-Andean Lineages, Main Vector of Chagas Disease. PLoS ONE 2017, 12, e0181635.
71. Palacios-Gimenez, O.M.; Koelman, J.; Palmada-Flores, M.; Bradford, T.M.; Jones, K.K.; Cooper, S.J.B.; Kawakami, T.; Suh, A. Comparative Analysis of Morabine Grasshopper Genomes Reveals Highly Abundant Transposable Elements and Rapidly Proliferating Satellite DNA Repeats. BMC Biol. 2020, 18, 199.
72. Utsunomia, R.; Silva, D.M.Z.d.A.; Ruiz-Ruano, F.J.; Goes, C.A.G.; Melo, S.; Ramos, L.P.; Oliveira, C.; Porto-Foresti, F.; Foresti, F.; Hashimoto, D.T. Satellitome Landscape Analysis of Megaleporinus Macrocephalus (Teleostei, Anostomidae) Reveals Intense Accumulation of Satellite Sequences on the Heteromorphic Sex Chromosome. Sci. Rep. 2019, 9, 5856.
  73. Sproul, J.S.; Hotaling, S.; Heckenhauer, J.; Powell, A.; Larracuente, A.M.; Kelley, J.L.; Pauls, S.U.; Frandsen, P.B. Repetitive Elements in the Era of Biodiversity Genomics: Insights from 600+ Insect Genomes. bioRxiv 2022. [Google Scholar]
  74. Logsdon, G.A.; Vollger, M.R.; Eichler, E.E. Long-Read Human Genome Sequencing and Its Applications. Nat. Rev. Genet. 2020, 21, 597–614. [Google Scholar] [CrossRef]
  75. Paajanen, P.; Kettleborough, G.; López-Girona, E.; Giolai, M.; Heavens, D.; Baker, D.; Lister, A.; Cugliandolo, F.; Wilde, G.; Hein, I.; et al. A Critical Comparison of Technologies for a Plant Genome Sequencing Project. GigaScience 2019, 8, giy163. [Google Scholar] [CrossRef] [Green Version]
  76. Xu, Z.; Gao, T.; Xu, Y.; Li, X.; Li, J.; Lin, H.; Yan, W.; Pan, J.; Tang, J. A chromosome-level reference genome of red swamp crayfish Procambarus clarkii provides insights into the gene families regarding growth or development in crustaceans. Genomics 2021, 113, 3274–3284. [Google Scholar] [CrossRef]
  77. Wang, Q.; Ren, X.; Liu, P.; Li, J.; Lv, J.; Wang, J.; Zhang, H.; Wei, W.; Zhou, Y.; He, Y.; et al. Improved genome assembly of Chinese shrimp (Fenneropenaeus chinensis) suggests adaptation to the environment during evolution and domestication. Mol. Ecol. Res. 2022, 334–344. [Google Scholar] [CrossRef] [PubMed]
  78. Katneni, V.K.; Shekhar, M.S.; Jangam, A.K.; Krishnan, K.; Prabhudas, S.K.; Kaikkolante, N.; Baghel, D.S.; Koyadan, V.K.; Jena, J.; Mohapatra, T. A Superior Contiguous Whole Genome Assembly for Shrimp (Penaeus indicus). Front. Mar. Sci. 2022, 8, 808354. [Google Scholar] [CrossRef]
  79. Uengwetwanit, T.; Pootakham, W.; Nookaew, I.; Sonthirod, C.; Angthong, P.; Sittikankaew, K.; Rungrassamee, W.; Arayamethakorn, S.; Wongsurawat, T.; Jenjaroenpun, P.; et al. A chromosome-level assembly of the black tiger shrimp (Penaeus monodon) genome facilitates the identification of growth-associated genes. Mol. Ecol. Resour. 2021, 21, 1620–1640. [Google Scholar] [CrossRef]
  80. Yuan, J.; Zhang, X.; Li, F.; Xiang, J. Genome Sequencing and Assembly Strategies and a Comparative Analysis of the Genomic Characteristics in Penaeid Shrimp Species. Front. Genet. 2021, 12, 658619. [Google Scholar] [CrossRef]
  81. Zhang, X.; Yuan, J.; Sun, Y.; Li, S.; Gao, Y.; Yu, Y.; Liu, C.; Wang, Q.; Lv, X.; Zhang, X.; et al. Penaeid shrimp genome provides insights into benthic adaptation and frequent molting. Nat. Commun. 2019, 10, 356. [Google Scholar] [CrossRef] [Green Version]
  82. Liu, M.; Ge, S.; Bhandari, S.; Fan, C.; Jiao, Y.; Gai, C.; Wang, Y.; Liu, H. Genome characterization and comparative analysis among three swimming crab species. Front. Mar. Sci. 2022, 9, 895119. [Google Scholar] [CrossRef]
  83. Tang, B.; Zhang, D.; Li, H.; Jiang, S.; Zhang, H.; Xuan, F.; Ge, B.; Wang, Z.; Liu, Y.; Sha, Z.; et al. Chromosome-level genome assembly reveals the unique genome evolution of the swimming crab (Portunus trituberculatus). GigaScience 2020, 9, giz161. [Google Scholar] [CrossRef] [PubMed]
  84. Bachvaroff, T.R.; McDonald, R.C.; Plough, L.V.; Chung, J.S. Chromosome-level genome assembly of the blue crab, Callinectes sapidus. G3 Genes Genomes Genet. 2021, 11, jkab212. [Google Scholar] [CrossRef]
  85. Tang, B.; Wang, Z.; Liu, Q.; Zhang, H.; Jiang, S.; Li, X.; Wang, Z.; Sun, Y.; Sha, Z.; Jiang, H.; et al. High-Quality Genome Assembly of Eriocheir japonica sinensis Reveals Its Unique Genome Evolution. Front. Genet. 2020, 10, 1340. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  86. Petersen, M.; Armisén, D.; Gibbs, R.A.; Hering, L.; Khila, A.; Mayer, G.; Richards, S.; Niehuis, O.; Misof, B. Diversity and Evolution of the Transposable Element Repertoire in Arthropods with Particular Reference to Insects. BMC Ecol. Evol. 2019, 19, 11. [Google Scholar] [CrossRef] [Green Version]
  87. Wu, C.; Lu, J. Diversification of Transposable Elements in Arthropods and Its Impact on Genome Evolution. Genes 2019, 10, 338. [Google Scholar] [CrossRef] [Green Version]
  88. Shao, C.; Sun, S.; Liu, K.; Wang, J.; Li, S.; Liu, Q.; Deagle, B.E.; Seim, I.; Biscontin, A.; Wang, Q.; et al. The Enormous Repetitive Antarctic Krill Genome Reveals Environmental Adaptations and Population Insights. Cell 2023, 186, 1279–1294.e19. [Google Scholar] [CrossRef] [PubMed]
  89. Lyon, M.F. Do LINEs Have a Role in X-Chromosome Inactivation? J. Biomed. Biotechnol. 2006, 2006, 59746. [Google Scholar] [CrossRef]
  90. Dodsworth, S.; Chase, M.W.; Kelly, L.J.; Leitch, I.J.; Macas, J.; Novák, P.; Piednoël, M.; Weiss-Schneeweiss, H.; Leitch, A.R. Genomic Repeat Abundances Contain Phylogenetic Signal. Syst. Biol. 2015, 64, 112–126. [Google Scholar] [CrossRef] [Green Version]
  91. Dodsworth, S.; Jang, T.-S.; Struebig, M.; Chase, M.W.; Weiss-Schneeweiss, H.; Leitch, A.R. Genome-Wide Repeat Dynamics Reflect Phylogenetic Distance in Closely Related Allotetraploid Nicotiana (Solanaceae). Plant Syst. Evol. 2017, 303, 1013–1020. [Google Scholar] [CrossRef]
  92. Zhu, L.; Swergold, G.D.; Seldin, M.F. Examination of Sequence Homology between Human Chromosome 20 and the Mouse Genome: Intense Conservation of Many Genomic Elements. Hum. Genet. 2003, 113, 60–70. [Google Scholar] [CrossRef] [PubMed]
  93. Silva, J.C.; Shabalina, S.A.; Harris, D.G.; Spouge, J.L.; Kondrashovi, A.S. Conserved Fragments of Transposable Elements in Intergenic Regions: Evidence for Widespread Recruitment of MIR- and L2-Derived Sequences within the Mouse and Human Genomes. Genet Res 2003, 82, 1–18. [Google Scholar] [CrossRef]
  94. Vitales, D.; Garcia, S.; Dodsworth, S. Reconstructing Phylogenetic Relationships Based on Repeat Sequence Similarities. Mol. Phylogenet. Evol. 2020, 147, 106766. [Google Scholar] [CrossRef]
  95. Bao, W.; Tang, K.F.J.; Alcivar-Warren, A. The Complete Genome of an Endogenous Nimavirus (Nimav-1_LVa) From the Pacific Whiteleg Shrimp Penaeus (Litopenaeus) vannamei. Genes 2020, 11, 94. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  96. Cawthorn, R.J. Diseases of American Lobsters (Homarus Americanus): A Review. J. Invertebr. Pathol. 2011, 106, 71–78. [Google Scholar] [CrossRef] [PubMed]
  97. Clark, K.F.; Greenwood, S.J.; Acorn, A.R.; Byrne, P.J. Molecular Immune Response of the American Lobster (Homarus Americanus) to the White Spot Syndrome Virus. J. Invertebr. Pathol. 2013, 114, 298–308. [Google Scholar] [CrossRef] [PubMed]
  98. Sotero-Caio, C.G.; Platt, R.N., II; Suh, A.; Ray, D.A. Evolution and Diversity of Transposable Elements in Vertebrate Genomes. Genome Biol. Evol. 2017, 9, 161–177. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Standardized annotation protocol for repetitive elements developed in this study.
Figure 2. Proportion and content of repetitive elements in genomes: percentage of each genome occupied by each class of repetitive elements. De, Dendrobranchiata; Ca, Caridea; Ac, Achelata; As, Astacidea; An, Anomura; Br, Brachyura; Oc, other Crustacea.
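The genome percentages in Figure 2 come down to counting how many base pairs each repeat class covers. A minimal Python sketch of that calculation, assuming repeat hits are available as hypothetical (class, contig, start, end) tuples with half-open coordinates, for example parsed from a RepeatMasker-style annotation; overlapping hits of the same class are counted once, while a base covered by two different classes counts toward both. This illustrates the arithmetic and is not the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit format (assumption): (repeat_class, contig, start, end), half-open [start, end).
Hit = Tuple[str, str, int, int]


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Total bases covered by half-open intervals, counting overlaps only once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_genome_by_class(hits: Iterable[Hit], genome_length: int) -> Dict[str, float]:
    """Percentage of the genome covered by each repeat class."""
    per_class_contig: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for rep_class, contig, start, end in hits:
        per_class_contig[(rep_class, contig)].append((start, end))
    covered: Dict[str, int] = defaultdict(int)
    for (rep_class, _contig), intervals in per_class_contig.items():
        covered[rep_class] += merged_length(intervals)
    return {c: 100.0 * bp / genome_length for c, bp in covered.items()}


if __name__ == "__main__":
    toy_hits = [
        ("LINE", "ctg1", 0, 100),
        ("LINE", "ctg1", 50, 150),  # overlaps the previous LINE hit
        ("DNA", "ctg1", 200, 260),
        ("LINE", "ctg2", 0, 40),
    ]
    print(percent_genome_by_class(toy_hits, genome_length=1_000))
    # {'LINE': 19.0, 'DNA': 6.0}
```

Merging per class and contig keeps overlapping fragments of the same element from inflating the percentage, which matters for fragmented, nested repeats.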
Figure 3. Correlation between genome size and TEs. Correlation plots between assembly or estimated genome size and the load (number of copies) or percentage of TEs. Orders and suborders are indicated by different colours. (A) Correlation between assembly size and the load of TEs; Spearman rank correlation test: ρ = 0.87, p-value = 1.864 × 10⁻⁶. (B) Correlation between assembly size and the percentage of TEs; Spearman rank correlation test: ρ = 0.6, p-value = 1.48 × 10⁻³. (C) Correlation between estimated genome size and the load of TEs; Spearman rank correlation test: ρ = 0.62, p-value = 7.114 × 10⁻⁴. (D) Correlation between estimated genome size and the percentage of TEs; Spearman rank correlation test: ρ = 0.47, p-value = 1.421 × 10⁻².
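Each panel in Figure 3 reports Spearman's ρ, which is Pearson's correlation computed on ranks rather than raw values, so it captures monotonic rather than strictly linear trends. A small, dependency-free Python sketch of the statistic, assuming tied values receive the average of the ranks they span (the usual convention); the p-values in the caption additionally require a significance test, for which scipy.stats.spearmanr is one standard option. The example numbers are hypothetical and are not the study's measurements.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical assembly sizes (Mb) and TE copy numbers, for illustration only.
    assembly_mb = [150, 400, 1000, 1500, 2400, 3700]
    te_copies = [40_000, 90_000, 300_000, 280_000, 900_000, 1_500_000]
    print(round(spearman_rho(assembly_mb, te_copies), 3))  # 0.943
```

Only one pair of neighbouring values is out of order in the toy data, so ρ stays close to 1 even though the relationship is far from linear.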
Figure 4. Distribution of satDNA families according to their number of occurrences in each genome. Low-frequency families (fewer than 10 occurrences) are shown in dark green, and highly abundant families (more than 1000 occurrences) in red. The number indicated for each species is its estimated genome size. De, Dendrobranchiata; Ca, Caridea; Ac, Achelata; As, Astacidea; An, Anomura; Br, Brachyura; Oc, other Crustacea.
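Figure 4 groups satDNA families by how often they occur in each genome, with explicit cut-offs for the two extreme classes. A Python sketch of that binning, assuming per-family occurrence counts are already available as a dictionary; only the "fewer than 10" and "more than 1000" thresholds come from the caption, and the single "intermediate" class covering everything in between is a hypothetical simplification (the figure may use finer bins).

```python
from collections import Counter
from typing import Dict

LOW_FREQUENCY_BELOW = 10     # "fewer than 10 occurrences" (from the caption)
HIGH_ABUNDANCE_ABOVE = 1000  # "more than 1000 occurrences" (from the caption)


def frequency_class(occurrences: int) -> str:
    """Assign a satDNA family to a frequency class from its occurrence count."""
    if occurrences < LOW_FREQUENCY_BELOW:
        return "low"
    if occurrences > HIGH_ABUNDANCE_ABOVE:
        return "high"
    return "intermediate"  # hypothetical catch-all for 10 to 1000 occurrences


def count_families_by_class(family_occurrences: Dict[str, int]) -> Counter:
    """Number of satDNA families falling into each frequency class in one genome."""
    return Counter(frequency_class(n) for n in family_occurrences.values())


if __name__ == "__main__":
    # Hypothetical family names and occurrence counts for one genome.
    genome = {"satA": 3, "satB": 250, "satC": 12_000, "satD": 1_000}
    print(count_families_by_class(genome))
```

Both comparisons are strict, so a family with exactly 10 or exactly 1000 occurrences lands in the intermediate class, matching the wording "fewer than" and "more than".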
Figure 5. Diversity of repetitive elements. For each genome, the log2-transformed load of each repetitive element family is shown on a colour scale from 0 (blue) to 21 (red). Grey indicates a raw load of 0, for which the log2 transformation is undefined. The dendrogram was obtained by clustering the genomes according to their repeat profiles.
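Because the colour scale in Figure 5 runs over log2(load), its end points translate directly into copy numbers: 0 corresponds to a single copy and 21 to 2^21 = 2,097,152 copies, while a load of 0 has no logarithm and is therefore drawn in grey. A Python sketch of that transformation, assuming loads are integer copy counts per family; the clamping to the 0 to 21 display range and the function names are illustrative choices, not taken from the authors' code.

```python
import math
from typing import List, Optional, Sequence

SCALE_MAX = 21.0  # upper end of the colour scale in the caption (log2 units)


def log2_loads(loads: Sequence[int]) -> List[Optional[float]]:
    """log2 of each family's copy number; None marks a raw load of 0 (drawn grey)."""
    return [math.log2(n) if n > 0 else None for n in loads]


def colour_position(value: Optional[float]) -> Optional[float]:
    """Position on the blue (0.0) to red (1.0) scale, or None for grey cells."""
    if value is None:
        return None
    # Clamp so values outside the displayed range map to the scale ends.
    return min(max(value / SCALE_MAX, 0.0), 1.0)


if __name__ == "__main__":
    loads = [0, 1, 1024, 2 ** 21]  # hypothetical copy numbers for four families
    logs = log2_loads(loads)
    print(logs)  # [None, 0.0, 10.0, 21.0]
    print([colour_position(v) for v in logs])
```

Keeping zeros as a separate missing value, rather than substituting a small pseudocount, mirrors the caption's distinction between grey cells and the blue end of the scale.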
Figure 6. Distribution of TE sequence divergence based on the Kimura two-parameter (K2P) distance, reflecting the history of TE accumulation. The x-axis shows the percentage of sequence divergence (Kimura substitution level); the y-axis shows the percentage of the genome occupied by each TE type, on a scale that differs between genomes according to the percentage occupied. TE types are indicated by colour.
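The x-axis of Figure 6 uses the Kimura two-parameter (K2P) distance, which corrects the raw mismatch rate for multiple substitutions while treating transitions (A↔G, C↔T) and transversions separately. With P the proportion of transitions and Q the proportion of transversions among compared sites, K = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). A Python sketch for one aligned pair (for example, a TE copy against its consensus), assuming gaps and ambiguous bases are simply skipped; it returns a fraction (multiply by 100 for the percentage shown on the axis) and omits any CpG-site adjustment that repeat-landscape tools may apply.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("divergence too high for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    consensus = "ACGTACGTACGTACGTACGT"
    te_copy = "GCGTACGTACTTACGTAC-T"  # hypothetical copy: 1 transition, 1 transversion, 1 gap
    print(f"K2P divergence: {100 * kimura_2p(consensus, te_copy):.2f}%")
```

Old insertions have accumulated more substitutions relative to their consensus, so they sit further right on the x-axis; peaks at low divergence point to recent TE activity.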
Table 1. Genomic datasets used in this study.
| Suborder/Infraorder | Species | Assembly Accession ID | Assembly Size (Mb) | BUSCO Completeness (%) | Paired-End Illumina Reads SRA Accession ID | Estimated Genome Size (Mb) | Estimated Genome Size Reference |
|---|---|---|---|---|---|---|---|
| Dendrobranchiata | Penaeus chinensis | GCF_019202785.1 | 1466 | 90.7 | SRR13452153 | 2660 | [46] |
| | Penaeus indicus | GCA_018983055.1 | 1936 | 88.5 | SRR12969543 | 2810 | [47] |
| | Penaeus japonicus | GCF_017312705.1 | 1705 | 96.6 | DRR278744 | 2170 | [47] |
| | Penaeus monodon | GCF_015228065.1 | 2394 | 83.9 | SRR11278066 | 2200 | [47] |
| | Penaeus vannamei | GCF_003789085.1 | 1664 | 84.8 | SRR13661692 | 2270 | [47] |
| Caridea | Caridina multidentata | GCA_002091895.1 | 1949 | 25.2 | DRR054559 | 3230 | [48] |
| | Macrobrachium nipponense | GCA_015104395.1 | 1985 | 41 | SRR9026393 | 4600 | [49] |
| Achelata | Panulirus ornatus | GCA_018397875.1 | 1926 | 70 | SRR13822589 | 3230 | [50] |
| Astacidea | Procambarus virginalis | GCA_020271785.1 | 3701 | 67 | SRR12901906 | 3500 | [51] |
| | Procambarus clarkii | GCF_020424385.1 | 2735 | 94.3 | SRR14457195 | 8500 | [52] |
| | Cherax destructor | GCA_009830355.1 | 3337 | 81.7 | SRR10467055 | 4500 | [53] |
| | Cherax quadricarinatus | GCA_009761615.1 | 3237 | 69.9 | SRR10484712 | 5000 | [54] |
| | Homarus americanus | GCF_018991925.1 | 2292 | 93 | SRR12699166 | 7700 | [55] |
| Anomura | Paralithodes camtschaticus | GCA_018397895.1 | 3810 | 44.2 | SRR13805857 | 7290 | [50] |
| | Paralithodes platypus | GCA_013283005.1 | 4805 | 71.7 | SRR11457495 | 490 | [56] |
| | Birgus latro | GCA_018397915.1 | 2959 | 57.7 | SRR13816158 | 6220 | [50] |
| Brachyura | Chionoecetes opilio | GCA_016584305.1 | 2003 | 91 | SRR11278230 | 1655 | |
| | Eriocheir sinensis | GCA_013436485.1 | 1272 | 92.6 | SRR11971329 | 2230 | [57] |
| | Portunus trituberculatus | GCF_017591435.1 | 1005 | 93.5 | SRR9964028 | 2250 | [57] |
| | Callinectes sapidus | GCA_020233015.1 | 998 | 90.4 | SRR15834103 | 2290 | [58] |
| Other Crustacea | Amphibalanus amphitrite (Cirripedia) | GCA_019059575.1 | 808 | 93.9 | SRR9595623 | 481 | [59] |
| | Armadillidium vulgare (Isopoda) | GCA_004104545.1 | 1725 | 84.5 | SRR8156178 | 1660 | [60] |
| | Daphnia magna (Phyllopoda) | GCA_020631705.2 | 161 | 98.6 | SRR15012074 | 238 | [61] |
| | Darwinula stevensoni (Podocopida) | GCA_905338385.1 | 382 | 90.3 | SRR8695251 | 437 | [62] |
| | Eurytemora affinis (Copepoda) | GCA_000591075.2 | 389 | 91 | SRR2452640 | 616 | [63] |
| | Hyalella azteca (Amphipoda) | GCA_000764305.4 | 551 | 93.8 | SRR1556043 | 1050 | [64] |
Table 2. Number of repetitive element (RE) families identified and annotated using species-specific libraries or a merged library from all species. RMo, RepeatModeler2; Tp, TAREAN pipeline.
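Ab initio: satDNA families identified ab initio. Species-specific: families annotated using the species-specific RMo library and Repbase for each species. Merged: families annotated using the merged RMo and Tp libraries of all species and Repbase.

| Suborder/Infraorder | Species | Ab initio satDNA: RMo | Ab initio satDNA: Tp | Species-specific: all RE families | Species-specific: unknown (%) | Species-specific: satDNA only | Merged: all RE families | Merged: unknown (%) | Merged: satDNA only |
|---|---|---|---|---|---|---|---|---|---|
| Dendrobranchiata | P. chinensis | 1 | 7 | 7547 | 12.38 | 24 | 22,702 | 3.44 | 56 |
| | P. indicus | 1 | 2 | 8252 | 7.72 | 30 | 24,237 | 3.40 | 57 |
| | P. japonicus | 3 | 5 | 7693 | 7.25 | 29 | 22,611 | 3.61 | 59 |
| | P. monodon | 0 | 4 | 8647 | 9.28 | 28 | 25,183 | 3.57 | 57 |
| | P. vannamei | 0 | 3 | 7621 | 8.85 | 30 | 23,240 | 3.49 | 55 |
| Caridea | C. multidentata | 1 | 6 | 11,104 | 11.93 | 38 | 28,065 | 11 | 74 |
| | M. nipponense | 2 | 0 | 10,455 | 19.68 | 38 | 26,021 | 13.42 | 57 |
| Achelata | P. ornatus | 1 | 6 | 8850 | 21.13 | 35 | 25,995 | 8.12 | 60 |
| Astacidea | P. virginalis | 1 | 31 | 9213 | 28.26 | 33 | 26,483 | 9.95 | 96 |
| | P. clarkii | 2 | 39 | 8838 | 22.52 | 34 | 26,051 | 13.67 | 97 |
| | C. destructor | 4 | 24 | 10,391 | 14.10 | 40 | 29,970 | 6.88 | 92 |
| | C. quadricarinatus | 1 | 43 | 10,411 | 14.33 | 35 | 26,966 | 4.99 | 96 |
| | H. americanus | 1 | 2 | 9557 | 24.16 | 35 | 27,873 | 17.29 | 61 |
| Anomura | P. camtschaticus | 2 | 19 | 11,431 | 24.95 | 33 | 30,169 | 14.36 | 95 |
| | P. platypus | 0 | 36 | 11,332 | 32.76 | 34 | 31,798 | 13.27 | 109 |
| | B. latro | 1 | 2 | 11,053 | 25.48 | 37 | 31,207 | 16.30 | 59 |
| Brachyura | C. opilio | 0 | 0 | 10,400 | 22.89 | 29 | 26,561 | 12.26 | 52 |
| | E. sinensis | 1 | 0 | 8486 | 20.74 | 29 | 23,937 | 11.82 | 49 |
| | P. trituberculatus | 0 | 0 | 7399 | 12.28 | 20 | 21,070 | 6.42 | 39 |
| | C. sapidus | 0 | 2 | 6911 | 13.68 | 18 | 19,041 | 8.68 | 31 |
| Other Crustacea | A. amphitrite (Cirripedia) | 1 | 1 | 6717 | 27.06 | 14 | 11,969 | 14.90 | 22 |
| | A. vulgare (Isopoda) | 0 | 13 | 9431 | 17.40 | 27 | 19,098 | 11.91 | 47 |
| | D. magna (Phyllopoda) | 2 | 3 | 3643 | 17.90 | 10 | 6805 | 14.63 | 11 |
| | D. stevensoni (Podocopida) | 1 | 2 | 9762 | 25.59 | 22 | 17,339 | 23.89 | 38 |
| | E. affinis (Copepoda) | 1 | 8 | 6069 | 33.37 | 32 | 13,334 | 24.15 | 46 |
| | H. azteca (Amphipoda) | 1 | 10 | 6851 | 16.21 | 28 | 14,424 | 13.69 | 46 |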
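Reading the two "unknown" columns of Table 2 side by side needs one extra step: the merged library also annotates many more families, so a lower percentage does not necessarily mean fewer unclassified families. A short Python check on the C. opilio row, assuming the percentage of unknown is the share of annotated families that remain unclassified (the table header does not state this explicitly, so treat it as an assumption):

```python
def unknown_families(total_families: int, percent_unknown: float) -> float:
    """Approximate number of unclassified families implied by a total and a percentage."""
    return total_families * percent_unknown / 100.0


# C. opilio row of Table 2 (species-specific vs. merged library).
species_specific_total, species_specific_pct = 10_400, 22.89
merged_total, merged_pct = 26_561, 12.26

ss_unknown = unknown_families(species_specific_total, species_specific_pct)
merged_unknown = unknown_families(merged_total, merged_pct)
fold_increase = merged_total / species_specific_total

print(f"species-specific: ~{ss_unknown:.0f} unknown families")  # ~2381
print(f"merged: ~{merged_unknown:.0f} unknown families")        # ~3256
print(f"total families grew {fold_increase:.2f}-fold")          # 2.55
```

Under that assumption, the unknown fraction roughly halves while the absolute number of unknown families still rises by roughly 875, because the total number of annotated families grows about 2.5-fold.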

MDPI and ACS Style

Rutz, C.; Bonassin, L.; Kress, A.; Francesconi, C.; Boštjančić, L.L.; Merlat, D.; Theissinger, K.; Lecompte, O. Abundance and Diversification of Repetitive Elements in Decapoda Genomes. Genes 2023, 14, 1627. https://doi.org/10.3390/genes14081627

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
