Introduction

Over the last decades, molecular techniques have provided important innovation in the field of invasion biology (Rius et al. 2015), especially in the last few years with the use of genomic approaches (Bernardi et al. 2016; Barbosa 2021; Stuart et al. 2021). Particularly, population genetics and phylogenetics are increasingly used to investigate some key aspects of biological invasions, such as the dynamic patterns of genetic diversity and evolutionary processes in real time (Dlugosch and Parker 2008; Blackwell et al. 2020; Popovic and Bernatchez 2021).

Introductions may result from the transport of few individuals to a new location where they thrive, in part due to being released from predation, competition and parasitic load pressures (Baker and Stebbins 1965; Sax et al. 2007). For example, the recent explosion of lionfishes in the Caribbean is a spectacular illustration of the devastating effects of biological invasions (Betancur et al. 2011). While apparently straightforward, the genetics of bioinvasions are counterintuitive (Dlugosch and Parker 2008), as shown by empirical studies that oppose general predictions (Chiesa et al. 2019). Theoretically, founding effects should be revealed by a decrease in genetic diversity resulting from bottlenecked populations. Nevertheless, despite their lowered evolutionary potential, and due to a combination of ecological and biological characteristics, these species can still become successful in their new environment (e.g. Azzurro et al. 2006; Xue et al. 2018). This is known, in invasion biology, as the ‘genetic paradox’ (Roman and Darling 2007).

In other cases, and especially in aquatic systems (e.g. Holland 2001; Xue et al. 2018), invasive populations may maintain a high genetic diversity with no trace of bottleneck. This situation seems to be very common in the Mediterranean Sea (e.g. Bernardi et al. 2010), one of the most invaded region of the world, with recent estimates of the number of introduced species close to 700 established metazoans (Zenetos and Galanidi 2020).

In 1869, the opening of the Suez Canal, started a process of invasion from the Red Sea into the Mediterranean (termed Lessepsian migration, after the Canal engineer Ferdinand de Lesseps), resulting in a variety of ecological impacts on the receiving ecosystem (Por 1978; Golani 2010; Katsanevakis et al. 2016). Whilst species moving in the opposite direction, the so-called ‘anti-Lessepsian migrants’, are very rare (Tortonese 1984; Bos et al. 2020). Lessepsian fishes now comprise more than 100 species (Zenetos et al. 2012; Azzurro et al. 2015), and they are increasingly the focus of genetic studies (Chiesa et al. 2019). In general, the specific dynamics of biological invasions are poorly known, yet, in this case, the situation presents some definite advantages for scientists. The date of the opening of the invasion route is known, and the geographical source of the invaders is also known, at least in general terms, as the Red Sea region. Thus, theoretical predictions seem fairly simple. Some individuals from the Red Sea would enter the Mediterranean via the Suez Canal and would later expand into the wide-open ecological niches. This situation would predict a genetic bottleneck due to an invading subsample of the original populations, followed by a fast range expansion, a pattern that is consistent with other documented invasions (Sax et al. 2005). However, in contrast to this prediction, Red Sea and invasive Mediterranean populations studied so far displayed a high genetic similarity and no evidence of genetic bottleneck (Golani and Ritte 1999; Bucciarelli et al. 2002; Hassan et al. 2003; Hassan and Bonhomme 2005; Azzurro et al. 2006; Bariche and Bernardi 2009), with notable exceptions such as the bluespotted cornetfish, Fistularia commersonii (Golani et al. 2007; Tenggardjaja et al. 2014; Jackson et al. 2015; Bernardi et al. 2016). In general, data suggest that colonization occurred by a large number of individuals or by multiple colonization events (Chiesa et al. 2019).

Possible explanations for the genetic patterns of Lessepsian fishes are that introductions are only successful when a large number of individuals enter the Mediterranean at once, (or that multiple introduction events occur in rapid succession). A further hypothesis is that individuals undergo preadaptation (genetic changes prior to introduction), and when selected individuals enter the new environment, these changes become adaptive and the invasion successful (Blackburn et al. 2017; Stuart et al. 2020; Barbosa 2021). Key components missing to our understanding of the Lessepsian bioinvasions are the temporal and spatial dimensions of the earliest phase of invasion. So far, only two studies assessed the temporal component: blue-chin parrotfish, Scarus ghobban, individuals were collected in Lebanon very soon after first being detected, and so were dusky spinefoot rabbitfish Siganus luridus, as soon as they reached the island of Linosa, Italy. Yet, in both cases, no evidence of bottleneck was observed (Azzurro et al. 2006; Bariche and Bernardi 2009).

In this study, we focused our attention on the spatial component of the question, by looking at the region closest to the Suez Canal, in order to determine if any evidence of population structure or selection was present at the boundary between the Red Sea and the Mediterranean. We used two species of reef fishes: a rabbitfish, the marbled spinefoot Siganus rivulatus, and a filefish, the reticulated leatherjacket, Stephanolepis diaspros, to investigate the potential link between population structure and natural selection on invasion success using thousands of Single Nucleotide Polymorphisms (SNPs) obtained by RAD sequencing.

The goal of this study was, therefore, to sample these two reef fishes with different life histories in the vicinity of the Suez Canal, to assess the spatial component of Lessepsian bioinvasions at the entry gate of the Mediterranean, using thousands of molecular markers to assess the relative role of drift and selection in their invasion success.

Materials and methods

Siganus rivulatus and Stephanolepis diaspros

The two focal species have similar invasion histories, being among the first Lessepsian immigrants to having been observed in the Mediterranean in 1927 (Steinitz 1927), and to establish permanent populations in the Levant. In the last decades, these two species have significantly expanded their distribution westwards. The marbled spinefoot had reached the Adriatic coast by the year 2000 (Pallaoro and Dulc 2004), and the Strait of Sicily by 2017 (Stamouli et al. 2017). The reticulated leatherjacket had reached Tunisian, Maltese, and Italian waters by 2015, (Deidun et al. 2015; Tiralongo et al. 2019; Katsanevakis et al. 2020). The two species have, however, different life histories. Both species produce benthic adhesive eggs (Lam 1974; Popper et al. 1979), however, after hatching, pelagic larval duration differs. On the Lebanese coast, spawning occurs in June in S. rivulatus (Bariche et al. 2004), which may indicate a pelagic larval life duration of approximately 30 days, and thus fits the usual time of larval life recorded in this family (Bryan and Madraisau 1977; Popper et al. 1979). In contrast, pelagic larval duration is restricted to only 17–20 days in Stephanolepis (Rogers et al. 2001). The rabbitfish periodically spawn very large egg masses (average female carries over 200,000 oocytes), and display a ‘group-synchronous’ strategy of ovarian development (Bariche et al. 2009), while Stephanolepis have parental egg care and spawn comparatively fewer eggs (average female carries approximately 50,000 oocytes) (Mancera-Rodríguez and Castro-Hernández 2015).

Sampling Samples of Siganus rivulatus and Stephanolepis diaspros were collected by local fishers, and with beach seines and hand nets in June and July 2010. Samples were collected in the vicinity of the Suez Canal, within 50 km of the Canal in the Red Sea, and within 300 km of the Canal in the Mediterranean. A total of 38 and 40 samples of S. rivulatus and S. diaspros, respectively, were collected in Egypt and Israel, with an additional nine samples of S. diaspros being collected in Tunisia to evaluate longer range spatial effects (Fig. 1, Table 1). Fin clips were immediately placed in 95% ethanol and kept at ambient temperature until reaching the laboratory where they were stored at − 20 °C.

Fig. 1
figure 1

Map of the study region showing sampling locations used in this study: HAI: Haifa, Israel; PSD: Port Said, Egypt; ISM: Ismailia, Egypt, SUE: Suez, Egypt; ATT: Attaka, Egypt; SOK: Sokhna, Egypt; ALE: Alexandria, Egypt; DJE: Djerba, Tunisia. Blue dots indicate sampling locations for Siganus rivulatus sampling locations, red dots indicate sampling locations for Stephanolepis diaspros, purple dots indicate sampling locations for both species

Table 1 Sampling locations and number of individuals for Siganus rivulatus and Stephanolepis diaspros

DNA extraction—Library preparation DNA was extracted from fin clips using DNeasy Blood & Tissue kits (Qiagen) according to the manufacturer's protocol. We constructed RAD libraries using a variation of the original protocol with the restriction enzyme SbfI (Miller et al. 2007, 2012; Baird et al. 2008; Longo and Bernardi 2015). Initial genomic DNA amount for each individual was 400 ng. Libraries were physically sheared on a Covaris S2 sonicator with an intensity of 5, duty cycle of 10%, cycles/burst of 200 and a cycle time of 30 s. We carried out the final PCR amplification step in 50 μl reaction volumes with 10 amplification cycles. Ampure XP beads (Agencourt) were used for each purification step and size selection. The library was sequenced in a single lane on an Illumina HiSeq 4000 at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley (USA). Then, we applied Perl scripts to trim the raw reads to 92 base pairs (bp) on the 3' end, quality filtered and demultiplex them according to the 6 bp unique barcodes. Reads with Phred scores of < 33 were discarded. The barcodes and restriction site residues (6 bp) were removed from the 5' end, and this resulted in a final sequence length of 80 bp.

RAD analysis The software program STACKS version 2.2 (Catchen et al. 2011, 2013) was used to identify orthologous sequences among individuals. In order to optimize the STACKS protocol, we followed published guidelines (Rochette and Catchen 2017). Briefly, we first identified 14 individuals with the highest sequencing coverage and created a catalog of loci based on these samples. (using the ustacks and cstacks components of STACKS). In addition, we optimized the values of the parameters M and n (M and n values were kept identical to each other), by varying them from 1 to 9 as recommended. We found that the optimal value for M and n was 3 for Siganus rivulatus, and 4 for Stephanolepis diaspros. We then included the remaining samples using ustacks.

Analysis of population structure Population structure was determined using all samples for Siganus rivulatus and two separate datasets for Stephanolepis diaspros. The first dataset only included samples adjacent to the Suez Canal, as for better comparison with S. rivulatus. A second dataset included also samples from Djerba, Tunisia, to assess large spatial scales in that species. Using the population component in STACKS, we further filtered the dataset by retaining loci which aligned in ≥ 80% of individuals (r command, − r 0.8) in every population. In order to remove paralogs, we used the minor alleles function of STACKS following recommendations (–min_mav 0.05). We then generated genepop files using the populations component in STACKS with the write_single_snp option (a single SNP is kept for each locus), which were converted afterwards using the program PGDSpider V.2.1.0.3 (Lischer and Excoffier 2012).

Population structure was analyzed using common population genetic metrics estimated by GENODIVE (Fst, F’st, and heterozygosities), and STACKS (Fst), as well as using STRUCTURE, PCA, and DAPC approaches. Structure files from the STACKS populations output were analyzed using a Bayesian approach in STRUCTURE version 2.3.4 (Pritchard et al. 2000). Ten replicate runs were performed for a range of K from one to seven, with 10,000 iterations as the burn-in parameter and 100,000 iterations under the admixture model. The highest likelihood for K was estimated according to the Evanno method (Evanno et al. 2005) implemented in Structure Harvester (Earl and VonHoldt 2012). We performed a Principal Component Analysis (PCA) when analyzing two populations, and when more than two populations were analyzed, we used a Discriminant Analysis of Principal Components (DAPC), (Jombart et al. 2010; Miller et al. 2020), which combines the benefits of discriminant and principal component analyses, and is particularly useful to study differences between clusters (i.e., sites or populations) as it utilizes a multivariate approach to explore the entire variation in the data while minimizing that within clusters. These analyses were performed using the ADEGENET package in R (Jombart 2008; R CoreTeam 2016) using the vcf file produced by the STACKS populations output used as an input file. The algorithm find.clusters identified the plausible number of clusters by comparing Bayesian Information Criterion (BIC) values, and the cross-validation tool xvalDapc determined the number of principal components that were retained.

Outlier loci analyses We examined genetic structure in neutral and outlier loci, separately, to investigate the possibility that different factors have shaped population divergence in unique patterns. Outlier loci were identified using PCADAPT implemented in R (Privé et al. 2020). We used the STACKS populations output to determine Fst (Phi st) values and their distributions to separate loci under directional (Loci with large Fst, i.e. Fst > 0.1), and stabilizing selection. Loci identifiers were then placed in “whitefiles”, which were used to run population scripts again. Although working with outliers might incorporate a series of shortfalls (Bierne et al. 2011, 2013; Lotterhos and Whitlock 2015), they are commonly used to show evidence of the diverging effects of selection between populations (Gaither et al. 2015; Longo and Bernardi 2015; Bernardi et al. 2016; Stockwell et al. 2016).

Functional analyses

All outlier loci were compared to GenBank entries with BLAST, where E-values of 0.001 and below were kept and recorded (probability of obtaining the same result by chance < 0.001). When protein coding matching sequences were found, they were classified using KEGG assignments (Ogata et al. 1999; Kanehisa et al. 2008).

Results

Loci and polymorphism statistics

Genomic DNA was sequenced from a total of 78 samples (38 Siganus rivulatus and 40 Stephanolepis diaspros) in an Illumina lane producing approximately 100 million reads. After passing all filters in the population scripts, STACKS identified a total of 31,249 and 17,831 loci for S. rivulatus and S. diaspros, respectively (Table 2). When grouping individuals in only two populations (Red Sea and Mediterranean), STACKS identified 27,836 and 12,344 loci, respectively (Table 2).

Table 2 Diversity characteristics of Siganus rivulatus and Stephanolepis diaspros datasets

Observed heterozygosities were similar in the two species, 0.187 and 0.196 for S. rivulatus and S. diaspros, respectively (Table 2). A decrease in observed heterozygosity in invading Mediterranean populations was not observed in either species. Indeed, those values were very similar in Red Sea and Mediterranean populations for both species: 0.161–0.187 for Red Sea and Mediterranean populations of S. rivulatus, 0.185 and 0.179 for Red Sea and Mediterranean populations of S. diaspros.

Population structure

All loci

Classical measures of population structure showed that structure was very weak between Red Sea and Mediterranean populations of S. rivulatus: Fst = 0.015, Fst = 0.005, F’st = 0.006 (Table 2). The locus with the highest Fst had a value of 0.506. In contrast, S. diaspros showed higher values, suggesting population structure between Red Sea and Mediterranean populations: Fst = 0.042, Fst = 0.058, F’st = 0.069 (Table 2). The locus with the highest Fst showed a higher value of 0.687.

A principal component analysis (PCA) mirrored the results described above. Red Sea and Mediterranean populations did not separate for S. rivulatus, while they showed structure in S. diaspros (Fig. 2). A STRUCTURE analysis, again, was consistent with both PCA and population structure metrics, showing no structure in S. rivulatus and distinct patterns between Red Sea and Mediterranean populations of S. diaspros (Fig. 3).

Fig. 2
figure 2

Discriminant analysis of principal components (DAPC) cluster plots of Mediterranean and Red Sea populations (p = 2) of Siganus rivulatus and Stephanolepis diaspros based on 27,836 and 12,344 loci, respectively (upper panel). Lower panel shows a smilar analysis based on outlier loci (15 and 22 loci, respectively). Plots were created in R using the adegenet package

Fig. 3
figure 3

STRUCTURE plots of Siganus rivulatus and Stephanolepis diaspros individuals from the Red Sea and the Mediterranean based on neutral and outlier SNPs. STRUCTURE plots (K = 2) with Bayesian assignment of individuals (vertical bars) based on presumed neutral loci (upper panel) and outlier loci (loci suspected to be under selection (lower panel)

When considering each population independently (as opposed to separating individuals in only Red Sea and Mediterranean populations), results were consistent with what was presented above (Table 2, Supplementary Fig. 1).

Outlier loci and genes under selection

A combination of PCADAPT and STACKS identified loci under potential directional selection (outlier loci). The number of outlier loci was lower in S. rivulatus than in S. diaspros: 15 versus 22 for the Red Sea and Mediterranean comparison, 74 versus 201 when considering all populations (Table 2). As expected, values of Fst were much higher when considering outlier loci, and were again lower in S. rivulatus than in S. diaspros (0.154 and 0.22, respectively, Table 2). Patterns observed with PCAs and STRUCTURE were consistent with the patterns based on all loci as described above, but the separation of populations was even more pronounced in S. diaspros, while still not evident in S. rivulatus (Figs. 2 and 3).

Outlier loci were compared to GenBank sequences. Protein coding genes were identified for six and five loci for S. rivulatus and S. diaspros, respectively (Table 3). The high proportion of protein coding genes within the outlier loci (40% and 22.7% in S. rivulatus and S. diaspros) is consistent with the assumption that outlier loci are indicative of selection. Three genes were identified as ionic transporters (Table 3), which may play an important role in osmoregulation in fishes.

Table 3 Genes under putative selection (outlier loci), for Siganus rivulatus and Stephanolepis diaspros

Additional population

Considering the structure obtained in S. diaspros, we sampled an additional population further away from the Suez Canal in the Mediterranean (Djerba, Tunisia). The addition of this population gave us insight into the pattern of population structure in that species. Patterns of population structure described above were even more pronounced with the addition of this population, with higher levels of population structure (Fst = 0.069, F’st = 0.081, Table 2), more divergent populations as analyzed by PCA and DAPC (Supplementary Fig. 1), and more outlier loci (332 loci, Table 2). We analyzed the composition of the protein coding genes associated with the outlier loci of this analysis, and found that the largest family of genes under selection were related with Environmental Processing Information, comprising 26% of all the loci under selection (Fig. 4), suggesting potential for local adaptation to a new environment.

Fig. 4
figure 4

KEGG assignments of outlier loci that match protein coding genes using the KEGG’s orthology and link annotation tool (KOALA). Pie depicts the relative percentage of each category

Discussion

Successful biological invasions generally share common attributes, such as large numbers of invading individuals that carry high genetic diversity (Blackburn et al. 2015), genomic regions under selection (Bernardi et al. 2016; Stuart et al. 2020; Barbosa 2021), or hybridization strategies (Blackwell et al. 2020; Valencia-Montoya et al. 2020; Popovic and Bernatchez 2021). In the case of Lessepsian bionvasions, most fish populations show an absence of genetic bottleneck, and a lack of population structure between Red Sea and Mediterranean populations (Bernardi et al. 2010; Chiesa et al. 2019). Thus far, however, little information has been available about one important aspect of the invasion, namely the genetics of populations around their entry gate, ie. in the immediate vicinity of the Suez Canal. The goal of this study was to fill this knowledge gap by looking, as a case study, at the genomics of the rabbitfish Siganus rivulatus and the filefish Stephanolepis diaspros.

Importantly, the focal region of this study is very limited spatially, with the Canal being a little less than 200 km in length, thus apparently not conducive to strong genetic structure in marine organisms (White et al. 2010; Selkoe and Toonen 2011; Pinsky et al. 2017). This expectation (no genetic differentiation) is consistent with some of our results. Indeed, the PCA and STRUCTURE analyses, as well as the classic metrics for population structure and genetic diversity, all indicated that the rabbitfish, S. rivulatus, showed little genetic structure between Red Sea and Mediterranean populations. In contrast, and surprisingly, the filefish, S. diaspros, showed strong population structure, even at such small spatial scale. This population structure mirrored the pattern seen at larger spatial scales when we added a further population from Djerba, Tunisia. This pattern is consistent with the prediction that strong population structure at larger scales is a good proxy for population structure in invading populations (Gaither et al. 2013; Jackson et al. 2015). Life-history traits are known to influence connectivity of marine fish populations (Chopelet et al. 2009). In this regard, the monogamous habit of filefishes and their complex mating system (Rim and Bradai 2011) may explain the genetic patterns that we observed. Yet, the strong population structure across the Suez Canal that was observed in S. diaspros is worth exploring in light of the main determinants, dispersal mechanisms v.s. selection, that possibly affect the genetic structure of these invaders.

Dispersal mechanisms

Crucial gaps in knowledge on dispersal mechanisms still hinder a thorough understanding of patterns of genetic diversity of Lessepsian fishes, but some hypotheses may be proposed. In marine fish populations, where the pelagic larval stage may last weeks to months, dispersal capability is often equated to gene flow and genetic divergence (Slatkin 1987; Selkoe and Toonen 2011; Medina et al. 2018). Due to the difficulties of tracking the fate of propagules, no direct observation exists on the pattern of dispersal through the 200 km long Suez Canal. Oceanographic models for the Mediterranean provide realistic pathways for potential dispersal of fish propagules (Bouzaiene et al. 2020). Within the Suez Canal, water movement models show that, while some recent southward migration may be possible, the main water transport is northward, from the Red Sea into the Mediterranean, as expected from the min observed biological invasions (Biton 2020). While it is assumed that most invasions occur via passive larval transport, parasitological evidence supports that crossing is achieved, at least in part, by adult individuals that actively swim from the Red Sea into the Mediterranean (Diamant 2010). In this respect, the limited swimming ability of adult filefishes, and their territoriality (Kawase and Nakazono 1996), could explain the observed genetic differentiation between Red Sea and Mediterranean populations, resulting from a reduced number of propagules entering the Mediterranean. The idea that dispersal characteristics may have influenced the difference in population structure in S. rivulatus and S. diaspros is also consistent with previous studies that have shown some evidence of structure in Stephanolepis filefishes (Yoon et al. 2012), and a lack of structure in Mediterranean rabbitfishes (Bonhomme et al. 2003; Hassan et al. 2003; Hassan and Bonhomme 2005; Azzurro et al. 2006). It is important to note, however, that other species of Pacific rabbitfishes, do show some, albeit low, levels of population structure in Indonesia and in Guam (Priest et al. 2012; Bramandito et al. 2018).

Natural selection and preadaptations

The environmental conditions encountered by Lessepsian migrants during their transition through the Suez Canal are very diverse: the Red Sea tends to be warmer and saltier than the Mediterranean, and although the salinity of the bitter lakes is variable, it is generally still much saltier than either Red Sea or Mediterranean (El-Bassat 2008). Our results reinforce the idea that the ability of invaders to cope with osmotic changes, in addition to simple dispersal capabilities, is an important factor.

Dispersal (and its genetic consequence, gene flow) counter the potential for local adaptation. However, when selection is strong, it may overcome the homogenizing effects of gene flow, to the point of creating population structure and even occasionally resulting in speciation (Nosil 2008; Savolainen et al. 2013; Arick et al. 2020). Considering the strong environmental gradient encountered by Lessepsian migrants in the Suez Canal, genes under selection involved with osmoregulation have been found in the bluespotted cornetfish, a very successful Lessepsian migrant (Bernardi et al. 2016). Therefore, we looked for evidence of selection in the face of gene flow in the two focal species of this study. In this study, both focal species showed loci under directional selection, and for both species, genes under selection included functions that are potentially involved with osmoregulation. Interestingly, selection was uncovered in the face of low (S. diaspros) and high (S. rivulatus) gene flow, indicating that selection must be very strong to be detectable.

Conclusion

With the advent of genomic approaches, the role of genes under selection has been pushed to the forefront of the field. In general, invading individuals have been present in their new environment for too short a time for mutations to accumulate, and selection to act upon them. A more likely scenario is that variability is present in the original population, and either a large number of individuals invades, and only locally adapted individuals survive, or alternatively, only preadapted individuals are capable of thriving in their new environment. In order to distinguish those scenarios, one must look at the very early phase of invasion. Results obtained in this study suggest that individuals showing signatures of selection are present in the vicinity of Suez Canal, suggesting that at least some Lessepsian invaders show preadapation characteristics right out of the gate.