Introduction

The rapid shift in the burden of arboviral diseases is concerning, particularly in Africa, where most people are impoverished and health services are preoccupied with the malaria burden. In Sudan, in addition to substantiating the shift from a high burden of malaria to arboviral diseases, multiple Federal Ministry of Health reports from various parts of the country indicate a massive increase in, and geographic expansion of, arboviral disease cases1,2.

Dengue, yellow fever, chikungunya and other arboviral diseases are endemic in Sudan2. These arboviruses are transmitted to humans via the bites of infected mosquitoes of the genus Aedes, subgenus Stegomyia, particularly Aedes aegypti, which plays a major role in transmitting arboviral diseases. Entomological surveillance has revealed that Ae. aegypti is the predominant mosquito in all arbovirus-endemic areas of the country3,4,5,6.

Wide ecological research has been conducted on the population structure and dynamics of insects and this can indicate future population trends. Predicting outbreaks requires an understanding of the relationship between population structure and change in response to anticipated environment changes. Hence, population genetics research may reveal important details about a species' dispersion and population dynamics7.

Microsatellites are among the most powerful of the many molecular markers developed in recent years for studying population structure and population genetics8. They are highly variable genetic markers that have frequently been used in population genetic investigations at the intraspecific level because of their high polymorphism, co-dominant inheritance, ease and reliability of allele scoring, high abundance, and the highly variable nature of their loci in the genome. Therefore, microsatellites are widely used as common markers in insect studies9,10,11.

Several studies on the genetic structure of Ae. aegypti have been conducted using microsatellite markers in different parts of the world, such as the Pacific region12, China13, United States of America14, Philippines15, Sri Lanka16, Black Sea17, Kenya18, Sudan3 and several others. A global study examined the genetic variation at 12 microsatellite loci in 79 Ae. aegypti populations from 30 countries across six continents to infer historical and modern invasion patterns. Its findings confirmed the genetic divergence of the two subspecies Ae. aegypti formosus and Ae. aegypti aegypti19.

The control of mosquito-borne diseases has primarily been accomplished by vector control, most typically by killing the vectors with various biocides. However, control programs based on this technique have been universally considered ineffective due to the rise in resistance20,21. Estimating the genetic composition of the Ae. aegypti population and gene flow would help researchers better understand dengue epidemiology. Molecular genotyping of the mosquito vector using these markers has revealed new information on the vectors especially microevolution, exposing the populations’ gene flow pattern, which may be interpreted as the vector's ability to disperse22,23.

Eastern Sudan witnessed the greatest chikungunya epidemic in Africa to date in 2018/19, affecting roughly 500,000 people, with the Aedes aegypti vector being the most prominent vector in the outbreak areas24. The Sudanese Ministry of Health recorded 3326 cases of dengue fever in 8 Sudanese States on November 23, 2022 with the disease claiming the lives of 23 individuals. The largest dengue fever outbreak to hit Sudan in almost a decade is currently occurring, with Red Sea State and North and South Kordofan being particularly heavily struck25.

Aedes aegypti, which is thought to have originated in Africa, is known to have two subspecies or variants that differ in terms of behaviour, transmitting capacity, and distribution26,27. In Sudan, both Aedes aegypti subspecies are common and have a wide range of distribution. However, the Aedes aegypti aegypti (Aaa) subspecies seems to be more prevalent in the east of the country while the Aedes aegypti formosus (Aaf) subspecies is the most common form of the vector in western Sudan1,3. Only a few studies have reported the subspecies genetic differences in Sudan, and they indicated that the genetic structure of the two subspecies were clearly different from one another1,3,28.

The study of disease vectors' genetic structure and variability sheds crucial light on their biology, behaviour, genetic assimilation, and ability to transfer diseases. A previous study on the distinct populations of Aedes aegypti subspecies in different areas of Sudan using CO1 mitochondrial marker observed that the two subspecies were phylogenetically structured into two clusters1. However, another study using the ND4 mitochondrial gene, indicated gene flow among the populations of the Aedes aegypti subspecies, suggesting that they are not entirely genetically isolated28.

Hence, understanding the genetic characteristics of the two forms of Aedes aegypti is required for a better understanding of the biology, behaviour, genetic mixing, and disease-transmission potential of the vectors. In this research, Aedes aegypti subspecies populations were sampled from eight sites in Sudan to examine the genetic structure and diversity of the populations using seven microsatellite markers.

Results

Mosquito identification and genetic variability

The identification results showed that the Aedes aegypti mosquitoes from the western and southern parts of the country (Darfur and Kordofan), namely Nyala (N), Al Fasher (F), Al Junaynah (J), and Kadugli (D), were Aaf. Samples from each of the four towns located in the eastern and central parts of the country, namely Port Sudan (P), Tokar (T), Kassala (K), and Barakat/Gezira (G), were morphologically identified as the Aaa subspecies (Fig. 1).

Figure 1

Map showing the eight study sites of Aedes aegypti subspecies in Sudan. Aaa, Aedes aegypti aegypti. Aaf, Aedes aegypti formosus.

At the seven microsatellite loci, 202 Aedes aegypti mosquitoes from eight different sites were genotyped. Not all the loci were successfully amplified in all the examined locations, and the number of genotyped individuals per locus varied from 5 to 31 (Table 1). There was no evidence of scoring errors due to large-allele dropout or stuttering, and no evidence of null alleles at any locus. All seven loci were polymorphic, albeit to varying degrees, with the number of alleles per locus ranging from 14 at locus A10 (AR = 3.4) to 37 at locus B19 (AR = 3.8) (Table 2). The average number of alleles across the seven loci in the populations ranged from 8.7 ± 4.27 in Fasher to 14.3 ± 6.18 in Kassala, with an average of 12.3 ± 2.16 alleles per locus (Table 1).

Table 1 Mean and total number of alleles at seven loci in eight populations.
Table 2 Summary statistics of number of individuals genotyped (N), allelic range and richness (AR), number of alleles (NA) and genetic diversity at each locus and each population of Ae. aegypti.

Private alleles (restricted to a single population) were observed at all loci except A10 locus and accounted for 46 of the 183 alleles (25.1%) recorded across all loci at all sites, while G11 recorded the highest number of private alleles. The greatest number of private alleles was observed at Kadugli and Nyala with 7 private alleles in both, followed by Junaynah and Gezira with 8 private alleles (Table 2). All microsatellite loci of the Ae. aegypti populations were found to be polymorphic with the average number of alleles per locus ranging from 9.25 (G11) to 20.38 (B07) (Table 2).
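To make the definition of a private allele concrete, the short Python sketch below flags alleles that occur in only one population at a single locus. The function name `private_alleles`, the site labels used as keys and the allele sizes are illustrative assumptions, not data or code from this study.

```python
from collections import defaultdict


def private_alleles(genotypes):
    """Return {population: set of alleles found only in that population}.

    `genotypes` maps a population name to a list of diploid genotypes at
    one locus, each genotype being a pair of allele sizes in base pairs.
    """
    seen_in = defaultdict(set)  # allele -> populations carrying it
    for pop, individuals in genotypes.items():
        for allele_1, allele_2 in individuals:
            seen_in[allele_1].add(pop)
            seen_in[allele_2].add(pop)

    private = {pop: set() for pop in genotypes}
    for allele, pops in seen_in.items():
        if len(pops) == 1:  # restricted to a single population
            private[next(iter(pops))].add(allele)
    return private


# Hypothetical allele sizes (bp) at one locus; not data from this study.
toy = {
    "Kassala": [(150, 152), (152, 158)],
    "Nyala": [(150, 154), (154, 154)],
    "Fasher": [(150, 152), (160, 152)],
}
result = private_alleles(toy)
print(result)
```

Summing the sizes of these sets over loci would give a per-population count of private alleles comparable in kind to the counts reported in Table 2.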

The number of alleles (NA), allelic range, allelic richness (AR), and gene diversity (Gd) were used to evaluate genetic diversity, and the results showed variations across loci and sites (Table 2). Although allelic richness showed variation among different sites and loci, the average AR seemed to be consistent, ranging from 3.08 in M313 to 3.79 in B07. Generally, all the sites showed a relatively high gene diversity, ranging from 0.812 to 0.915. Barakat/Gezira had the highest average gene diversity (0.949) between sites (Fig. 2).
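Allelic richness corrected by rarefaction can be illustrated with the standard rarefaction formula, in which the probability of sampling each allele at least once in a subsample of g gene copies is summed over alleles. The sketch below is a minimal Python version under that assumption; the function name `allelic_richness` and the allele counts are hypothetical, and the implementation in FSTAT may differ in detail.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    `allele_counts` holds the number of copies of each allele observed
    at one locus in one population (2 copies per diploid individual).
    """
    total = sum(allele_counts)
    if g > total:
        raise ValueError("subsample size exceeds the number of gene copies")
    denom = comb(total, g)
    # 1 - P(allele absent from the subsample), summed over alleles
    return sum(1 - comb(total - count, g) / denom for count in allele_counts)


# Hypothetical counts for three alleles among 10 gene copies.
counts = [6, 3, 1]
ar_4 = allelic_richness(counts, 4)    # rarefied to 4 gene copies
ar_10 = allelic_richness(counts, 10)  # no rarefaction: all 3 alleles
print(round(ar_4, 3), ar_10)
```

Because the subsample size is shared by all populations, richness values obtained this way can be compared across sites with unequal numbers of genotyped individuals.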

Figure 2

Allele frequency distribution for eight populations of Ae. aegypti across seven microsatellite loci.

Hardy–Weinberg equilibrium (HWE), linkage disequilibrium (LD), and FIS among the eight populations of Aedes aegypti

All loci showed significant deviations from HWE in one or more populations, except locus M201, which conformed to HWE. Overall, 14 out of 56 tests (25%) departed significantly from equilibrium after Benjamini–Hochberg multiple testing correction (Table 3). The Port Sudan, Kadugli and Junaynah populations showed no significant deviation from HWE, which suggests that those populations conformed to HWE (Table 3). Significant linkage disequilibrium was found in 39 of the 168 pairwise comparisons between individual loci at each site (23.2% of tests performed) (Table 4).

Table 3 Summary statistics of microsatellite data of eight populations of Ae. aegypti from Sudan.
Table 4 Linkage disequilibrium between pairs of microsatellite loci of Ae. aegypti populations.
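The test counts quoted for HWE and linkage disequilibrium follow directly from the study design of seven loci and eight sites. The brief Python check below reproduces them; the variable names are illustrative only.

```python
from math import comb

loci, sites = 7, 8

hwe_tests = loci * sites           # one HWE test per locus per site
locus_pairs = comb(loci, 2)        # unordered pairs of loci
ld_tests = locus_pairs * sites     # one LD test per locus pair per site

hwe_pct = 100 * 14 / hwe_tests     # 14 significant HWE tests
ld_pct = 100 * 39 / ld_tests       # 39 significant LD tests

print(hwe_tests, locus_pairs, ld_tests)
print(round(hwe_pct, 1), round(ld_pct, 1))
```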

The inbreeding coefficient (FIS) over all loci demonstrated that the majority of populations revealed an excess of observed heterozygotes (many negative values). A high inbreeding rate was observed within these populations (FIS average ranged from 0.021 to 0.179) (Table 3).
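The sign convention of FIS can be illustrated with its simplest form, FIS = 1 − HO/HE, where negative values correspond to an excess of heterozygotes and positive values to a deficit. The Python sketch below uses this ratio form as an assumption for illustration; it is not the estimator implemented in FSTAT, and the function name `f_is` and the genotypes are hypothetical.

```python
from collections import Counter


def f_is(genotypes):
    """Inbreeding coefficient at one locus as 1 - HO/HE (simple ratio form)."""
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    n_copies = sum(allele_counts.values())
    # Expected heterozygosity under Hardy-Weinberg proportions
    h_exp = 1 - sum((count / n_copies) ** 2 for count in allele_counts.values())
    # Observed proportion of heterozygous individuals
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return 1 - h_obs / h_exp


# Hypothetical genotypes with two alleles at equal frequency.
all_heterozygous = [(1, 2)] * 4
all_homozygous = [(1, 1), (2, 2), (1, 1), (2, 2)]

fis_excess = f_is(all_heterozygous)  # heterozygote excess
fis_deficit = f_is(all_homozygous)   # heterozygote deficit
print(fis_excess, fis_deficit)
```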

Genetic diversity, LD, HWE and FIS for the two subspecies of Ae. aegypti

In the Aaa subspecies, the p-value was significant across the 7 loci, indicating deviation from HWE, while linkage disequilibrium was identified in 10 out of 21 pairs (47.6%), with an average inbreeding coefficient (FIS) of −0.077 and a low to moderate FST value (0.023). In the Aaf subspecies, HWE tests demonstrated departure at all loci except A10 and M201, with linkage disequilibrium noted in 7 out of 21 pairs (33.3%), FST = 0.019 and an average FIS of −0.086 (Table 5). The Wilcoxon signed-rank test and mode-shift test revealed no evidence of a recent population bottleneck in any population. All loci fit the two-phase model (TPM) under mutation-drift equilibrium, with a normal L-shaped allele frequency distribution, since the one-tailed probability for heterozygosity excess was close to 1 in all populations (non-significant, p > 0.05) (Table 3).

Table 5 Summary statistics of microsatellite data in the two subspecies of Ae. aegypti from Sudan.

Molecular variation and differentiation in Ae. aegypti populations

Hierarchical AMOVA was initially performed on the two groups of Aedes aegypti: Ae. aegypti aegypti were from Port Sudan, Tokar, Kassala and Gezira populations while Ae. aegypti formosus were from Kadugli, Nyala, Fasher and Junaynah populations. The variance components in this comparison revealed a high percentage within populations (96.02%) compared with variation among groups (2.23%) (Table 6). Both FCT (diversity between groups) estimate (FCT = 0.0224), and FSC (diversity among populations within a group) value (FSC = 0.018) were significant (p < 0.05).

Table 6 Hierarchical analysis of molecular variance (AMOVA) of the allele frequencies of seven microsatellite loci in two subspecies (groups) of eight Ae. aegypti populations.
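The reported fixation indices can be cross-checked against the AMOVA variance percentages using the usual definitions of FCT, FSC and FST. The Python sketch below assumes that the three variance components sum to 100%, so that the component among populations within groups is obtained by subtraction; that derived value is an assumption and is not stated in the text.

```python
# Percentages of total variance reported in the text
within_pops = 96.02
among_groups = 2.23

# Assumption: the three components sum to 100 %
among_pops_within_groups = 100.0 - within_pops - among_groups

total = among_groups + among_pops_within_groups + within_pops

f_ct = among_groups / total
f_sc = among_pops_within_groups / (among_pops_within_groups + within_pops)
f_st = (among_groups + among_pops_within_groups) / total

print(f"FCT = {f_ct:.4f}, FSC = {f_sc:.4f}, FST = {f_st:.4f}")
```

Under this assumption, the values agree with the reported FCT (0.0224) and FSC (0.018) to within rounding, and the implied FST of about 0.040 appears to match the overall FST of 0.03981 given in the text.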

The isolation by distance test between all population pairs (Mantel test) was highly significant (p = 0.003) with a moderate relationship (correlation coefficient r = 0.391). Thus, the correlation between the geographical and genetic distance matrices suggested that landscape features may have some influence on genetic differentiation (Fig. 3).
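The logic of a Mantel test can be sketched in a few lines: the correlation between the two distance matrices is compared with the correlations obtained after randomly relabelling the sites in one matrix. The Python example below is a minimal one-sided version written for illustration; the function names, the number of permutations, the transect positions and the FST values are all hypothetical, and this is not the software used in this study.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    n = len(geo)
    geo_vec = upper_triangle(geo)
    r_obs = pearson(geo_vec, upper_triangle(gen))
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel sites in the genetic matrix only
        shuffled = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(geo_vec, upper_triangle(shuffled)) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical sites along a transect (km) and made-up pairwise FST values.
positions = [0, 100, 300, 600, 1000]
n_sites = len(positions)
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.0 if i == j else 0.00004 * geo[i][j] + 0.001 * ((i + j) % 3)
        for j in range(n_sites)] for i in range(n_sites)]

r, p = mantel(geo, gen)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting whole rows and columns together, rather than individual matrix entries, preserves the dependence among pairwise distances that share a site, which is why the test is preferred over an ordinary correlation test for distance matrices.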

Figure 3

Unrooted neighbour-joining tree based on DA genetic distance at seven nuclear microsatellites of Ae. aegypti from eight sites in Sudan, numbers at the nodes are percentage bootstrap support from 1000 replicates. The scale bar represents 5% sequence divergence.

The migration network generated using divMigrate and based on Nm estimates revealed strong gene flow among the Port Sudan, Kassala and Tokar Aaa populations, which are located in eastern Sudan, as well as between the Nyala and Fasher populations (Aaf populations). The relative migration values were highest between the Tokar and Kassala populations and between the Fasher and Nyala populations (Fig. 4). STRUCTURE analysis was then performed and, according to Wright's criteria, the overall FST = 0.03981 revealed moderate genetic differentiation. Generally, FST values among Aedes aegypti samples across the eight study sites were low (0.00–0.045) (Table 7).

Figure 4

Bayesian clustering analysis generated through STRUCTURE and STRUCTURE HARVESTER based on seven microsatellite loci of eight Ae. aegypti populations (reduce analysis of ‘pure’ Group 1) to determine the exact value of K. (a) Results of assignment tests for numbers of clusters K = 2 indicated along the x-axis. (b) Mean (± SD) log posterior probabilities. (c) Estimate of ΔK for each value of K (putative number of populations). Each vertical line represents one individual, and y coordinates denote each individual's percentage assignment to each of the genetic clusters, represented by a different colour. Numbers from 1–8 are the study sites: 1 Port Sudan, 2 Tokar, 3 Kassala, 4 Barakat/Gezira, 5 Kadugli, 6 Nyala, 7 Fasher and 8 Junaynah.

Table 7 Population pairwise FST of the allele frequencies of seven microsatellite loci in two subspecies (groups) of eight populations of Aedes aegypti from Sudan.

Genetic structure of Ae. aegypti

The unrooted neighbor-joining (NJ) phylogram revealed two segregated groups (two main clusters), splitting the localities (Fig. 5). The first group included three populations of Aaa, namely Port Sudan, Tokar and Kassala, while the second group contained all populations of Aaf, namely Kadugli, Nyala, Fasher and Junaynah, in addition to the Aaa population of Gezira, which clustered in group 2 (Fig. 6).

Figure 5

Relationship between pairwise estimates of genetic distance (FST) and geographical distance (km) for Ae. aegypti microsatellite data. Trendline shows the general pattern of increasing genetic distance with greater geographic distance (IBD).

Figure 6

Three-dimensional factorial correspondence analysis (FCA) showing multivariate relationships among eight Ae. aegypti populations based on seven microsatellite loci variation. Population 1 Port Sudan, 2 Tokar, 3 Kassala, 4 Barakat/Gezira, 5 Kadugli, 6 Nyala, 7 Fasher, 8 Junaynah.

The Factorial Correspondence Analysis (FCA) plots suggested that Aaa populations from Port Sudan, Tokar and Kassala clustered into one group (which is consistent with the NJ dendrogram tree) while Aaf populations of Fasher and Nyala grouped together and Gezira, Kadugli and Junaynah were also revealed to be one group (Fig. 7).

Figure 7

Migration network using divMigrate and based on Nm estimates. Each node represents a population. More gene flow between populations is indicated by the nodes' closeness, and the relative migration values are indicated by the arrows' strong colours. Code for the population names: AED: population from Kadugli, AEJ: population from Junaynah. AEG: population from Barakat/Gezira, AEN: population from Nyala, AEF: population from Fasher, AEP: population from Port Sudan, AEK: population from Kassala, AET: population from Tokar.

The Bayesian cluster analysis using STRUCTURE revealed that when K = 2, the estimate of the Delta K and likelihood of the data (LnP(D)) was largest, implying two genetically separate groups (Fig. 7). After completing the first run of Structure + Structure Harvester, group 1 (red cluster) consisted of Ae. aegypti aegypti populations east of the Nile River, including Port Sudan, Tokar, Kassala, and Gezira, and group 2 (blue cluster) consisted of Ae. aegypti formosus populations Kadugli, Nyala, Fasher, and Junaynah suggesting that there are two main populations (basically, the two subspecies). A similar pattern of population clustering was further substantiated by the DAPC analysis (Fig. 8), where all the populations were seen overlapping, except Junaynah Aaf population. However, Kadugli, Nyala and Fasher (Aaf populations) appeared to be somewhat distant from the other Aaa populations (Fig. 8).

Figure 8

Discriminant Analysis of Principal Components (DAPC) for eight Aedes aegypti populations from Sudan using a microsatellite dataset. The graph depicts individuals as dots and groups as inertia ellipses. In the inset, a bar plot of discriminant analysis eigenvalues (DA eigenvalues) is shown. The number of bars represents the number of discriminant functions preserved in the analysis, and the eigenvalues represent the ratio of variance between groups to variation within groups for each discriminant function. 1: Port Sudan; 2: Tokar; 3: Kassala; 4: Barakat/Gezira; 5: Kadugli, 6: Nyala; 7: Fasher; 8: Junaynah.

Discussion

The threat of emerging and re-emerging arboviral infections is increasing rapidly around the world, notably in Africa29. In Sudan, arboviral illnesses have become a major public health concern. Yellow fever, dengue fever, and chikungunya epidemics have caused substantial mortality and morbidity in different parts of the country during the last two decades, mainly in Port Sudan and Kassala in the east and Darfur in the west2,5,30,31. Aedes aegypti has been reported in Sudan since 1903 and was described for the first time in Khartoum by Balfour2. It plays a critical role in the spread of the viruses that cause these diseases1,32.

On the African continent, two Ae. aegypti subspecies/forms known as Ae. aegypti aegypti (Aaa) and Ae. aegypti formosus (Aaf) exist and these subspecies have differences in their distribution, behaviour, breeding sites, and virus transmission capacity1,33. In previous studies conducted by1,28, the distribution and genetic diversity of Aedes aegypti subspecies across the Sahelian belt in Sudan using the cytochrome oxidase and NADH dehydrogenase subunit 4 (ND4) mitochondrial gene markers were described. In this study, we used microsatellite markers to investigate the genetic structure and differentiation of populations of the Ae. aegypti subspecies/forms in Sudan.

Overall, the genetic diversity of Ae. aegypti estimated in this study was relatively high (NA = 14–37; AR = 2.8–3.5; Gd = 0.818–0.915; HO = 0.878–0.982; HE = 0.816–0.911) compared to other population structure studies of Ae. aegypti using microsatellites. The wide allelic range in this study reflects the highly polymorphic nature of the selected microsatellite markers. A recent study3 of Aedes aegypti populations in Sudan reported a lower allelic range (7–21) than that reported here. However, other studies16,34 showed allelic ranges closer to ours, ranging from 15 to 32 and from 5 to 36 alleles, respectively. The allelic richness in the study of Ref.3 ranged between 4.16 and 8.67, which was higher than the richness in our study (AR = 2.8–3.5), which in turn was higher than that of Ref.34 (1.629–2.945).

Generally, the populations of Aaa possessed slightly higher FST values (FST = 0.023) compared to Aaf populations (FST = 0.019), which is consistent with the ND4 mtDNA dataset28. However, the CO1 mtDNA genetic diversity revealed contradictory results, with no difference between the Aaa and Aaf subspecies1. Significant deviations from HWE were revealed in 14 out of 56 tests (25%), and departures from HWE were seen in both subspecies. In the Aaa subspecies, all loci departed from HWE, while in the Aaf subspecies there was a departure at all loci except A10 and M201. A similar pattern was observed in a study in Senegal that detected HWE in 16 out of 56 possible tests and from those, one significant deviation was detected in the Aaf samples and five in Aaa35. A worldwide study discovered that HWE occurred in 42 of 300 populations of Ae. aegypti subspecies/forms22.

The inbreeding factor revealed a higher average in Aaa (FIS average = 0.086) compared to Aaf populations (FIS average = 0.077). The average allelic richness showed similarity between the two subspecies populations, which agreed with a study from Gabon and Kenya18. The limited linkage disequilibrium was not consistently observed for any locus pair, thus suggesting that linkage disequilibrium was not the result of physical linkage (co-segregation of alleles at loci on the same chromosome). Instead, significant results could most likely be explained by localised demographic effects.

The isolation by distance analysis revealed a highly significant, moderate correlation (r = 0.391, p = 0.003) between genetic distance at the microsatellite loci and geographical distance across all Ae. aegypti populations in this study. This was concordant with Ref.1, which used a CO1 mtDNA dataset and found a significant moderate relationship (r = 0.586, p = 0.005). Another study in Sudan3 also found a correlation between genetic variation and the geographical distance between study sites in the east and west of Sudan (R2 = 0.4272, p = 0.01), which strongly supports our finding that the isolation of the subspecies was most probably by distance.

The unrooted neighbor-joining tree clustered the Port Sudan, Tokar and Kassala populations (Aaa) in one group. Among the remaining populations, the Fasher Aaf population stood alone, Gezira (Aaa) and Kadugli (Aaf) clustered together, and Junaynah and Nyala (Aaf) clustered together. Thus, with the exception of Gezira, which clustered with the Aaf group, the tree separated the two subspecies, a result similar to that of Ref.1. The genetic structuring of the two subspecies of Aedes aegypti also agrees with the recent study of Ref.3, which indicated the presence of two genetically distinct subspecies of Ae. aegypti.

The three-dimensional factorial correspondence analysis (FCA) plot demonstrated genetic grouping among sites: Port Sudan, Tokar and Kassala (Aaa) grouped together, Fasher and Nyala (Aaf) clustered together, and Gezira, Kadugli and Junaynah constituted the third group. It is worth noting that members of groups 2 and 3 were geographically related (located in the western and central parts of the country), while members of group 1 (located in the eastern part of the country) were more diverged (high bootstrap support of 95). These results might indicate recent historical gene flow, which could be linked to the geographical distances between the different groups; for example, Port Sudan in group 1 is located only 141 km from Tokar in the same group, which in turn is located 1713 km from Junaynah in group 3. The significant isolation by distance relationship observed in this study (r = 0.391, p = 0.003) might explain the limited gene flow between the subspecies populations, and this is supported by the migration network, which indicated high gene flow between geographically close populations.

AMOVA results of microsatellite genes indicated a high percentage of variance components within populations (96.02%) compared with variation among groups (2.23%). AMOVA results showed higher variation percentages among the two subspecies groups in both mitochondrial genes CO1 and ND4 (39.22% and 26.64% respectively)1,28 with high genetic variations within populations (53.53%) and less among groups. Another study3 indicated that the majority of genetic variation in Aedes aegypti populations from Sudan was among individuals and within regions, with just 5% of the total variation related to variations between groups, which was consistent with this study.

Interestingly, the Bayesian model-based clustering was largely congruent in partitioning the populations into two genetic groups (best structure K = 2) and clearly indicated that the populations of the two subspecies/forms were structured in two groups. A study conducted in Kenya and Gabon reached comparable conclusions, with a first split of all the samples into two clusters (K = 2). However, the STRUCTURE results indicate that the two forms, although distinct, are not totally separated, and this split roughly represents the strong genetic differentiation between Aaf and Aaa suggested in previous studies17,19. In this study, the two genetically distinct groupings might be attributed to the historically documented isolation stated by36, and they match the geographic dispersion reported in1.

Although gene flow between the subspecies populations appears low, the migration network revealed that gene flow within Aaa and Aaf populations seemed to follow their geographical location rather than their form/subspecies. There was migration and moderate gene flow between Kadugli (Aaf) and Gezira (Aaa), whereas strong gene flow was found among the Aaa populations (Port Sudan, Kassala and Tokar). These findings agree with the study of1, which indicated that limited gene flow is mostly attributable to geographical distance, and that different ecological environments restrict the flight range of Aedes mosquitoes and gene flow between the two subspecies populations.

The genetic structure of Ae. aegypti subspecies using the microsatellite markers revealed that the populations of the two subspecies were separated as two groups, especially populations of Aaf which clustered together while using different clustering methods (NJ tree, FCA plotting and STRUCTURE). Despite the fact that mitochondrial genetic variations1,28 revealed low gene flow and high genetic diversity between the two subspecies populations in Sudan, it is difficult to say whether this variation reflects a true difference between the two subspecies or the geographical distances that limited gene flow.

Conversely, a recent study in Gabon and Kenya found little genetic isolation between forest and domestic Ae. aegypti, implying that there may be extensive gene flow between them, while phylogenetic relationships revealed a clear separation between the two sites18. It is likely that gene flow between the two subspecies began recently, with the invasion of Aaf into human habitats where Aaa already existed. Powell and Tabachnick determined from genetic data that there was complete isolation and absence of gene flow between the two subspecies around 400–550 years ago26.

In this investigation, microsatellite-based estimates of genetic structure indicated that the two groups were genetically diverse and distinct. Overall, these findings help us better understand the forms of Ae. aegypti in East Africa, where data are scarce. The reality is that various populations have vastly varied vector competencies due to phenotypic variations. The sensitivity of Ae. aegypti aegypti to disease transmission may be connected to insect population migration and/or possible intermingling of individuals from different locations. As a result, population genetic studies require determining the genetics of these populations and investigating the genetic variations linked to vector abilities37.

Our research explored the genetic structure, gene flow and diversity of the two subspecies of Aedes aegypti vector populations in Sudan across different regions. These data can be utilized to track the effectiveness of control measures, changes in gene flow patterns, and new introductions. The vectorial capacity of Ae. aegypti populations and subspecies to spread arboviruses varies greatly33,37,38.

Bearing in mind that the two subspecies differ in their behaviour and potential to transmit disease, their distribution and existence in each arboviral outbreak area in Sudan should be considered when developing any vector control intervention1,26,39. Our findings will be essential to the control program's success if the nation adopts innovative vector control strategies. According to40, a genetic alteration that depends on enduring genetic variation in populations must be specific to the intended population.

Future studies with larger sample sizes and more study sites are recommended, covering vector behaviour, vector competence, breeding habitats, genetic variation and structure, and viral transmission by Ae. aegypti subspecies, in order to improve the surveillance system for the Ae. aegypti vector.

Lastly, migrations and mobility caused by humans may promote the long-distance spread of vectors, resulting in the admixture of populations adapted to urban and forest environments, which may have consequences for the management and transmission of disease. The government must designate effective preventive and control measures, increase environmental governance in the areas inhabited by both subspecies in accordance with their vectorial potential and gene flow, and implement mosquito control measures.

Methods

Mosquito sample collection and identification

Samples of Ae. aegypti larvae and pupae were collected (January 2014–April 2017) from both indoor and outdoor breeding habitats at eight study sites (Port Sudan, Tokar, Kassala, Fasher, Nyala, Gezira, Kadugli, and Junaynah) described in1. The study sites were selected according to past reports of dengue and other arbovirus cases and Aedes aegypti vector records. Mosquito aquatic stages were then transferred to the insectarium at the National Public Health Laboratory (NPHL) in Khartoum, Sudan, where the samples were sorted, classified, transferred to trays with water and larval food1 and reared to adults at optimum temperature (25 ± 2 °C) and relative humidity (80–90%) with a 12:12 (L:D) photoperiod.

Using appropriate taxonomic keys41, the larvae were identified morphologically to species level. After adult emergence, Ae. aegypti females were identified to subspecies according to the morphological taxonomic key42. The identified female mosquitoes (Aaa and Aaf) were individually preserved in labelled microfuge tubes with 70% isopropanol and then stored in a freezer at −20 °C. The preserved samples were transferred to the Universiti Sains Malaysia (USM) for the molecular work.

Genomic DNA extraction

Aedes aegypti samples from each study site (a minimum of 10 individuals per site) were used for extraction. Prior to extraction, the mosquito samples were washed twice using ethanol and distilled water and dried. Using the DNeasy Blood and Tissue Extraction Kit (Qiagen, Germany), genomic DNA was extracted from single female mosquitoes following the manufacturer's instructions with minor adjustments (the incubation at 65 °C was extended to overnight to improve cell lysis). After extraction, genomic DNA was eluted in nuclease-free water and stored in a freezer at −20 °C. DNA integrity was assessed and visualized by 0.8% (w/v) agarose gel electrophoresis in 0.5X TBE buffer, and the quantity was further assessed using a Q3000 UV spectrophotometer (Quawell). Species identification was confirmed as explained by43.

Microsatellite DNA molecular technique

Seven microsatellite markers (A10, B07, H08, G11, M313, M201, and B19) designed by44 were selected. These are single-copy microsatellite sequences identified from enriched plasmid libraries and selected cosmid subclones, and they have proved quite useful in evaluating the population genetics of Ae. aegypti in a number of populations44.

Singleplex PCRs were performed using a BioRad MyCycler™ Thermal Cycler (BioRad Laboratories, Inc.). According to the reaction mixture guideline of the supplier (Promega Company, USA), each 50 μl reaction volume contained 10 μl of 5X Green Buffer GoTaq (Promega), 3 μl of 25 mM MgCl2, 1 μl of 25 mM dNTP, 1 μl of each primer, 0.25 μl of Taq polymerase, 2 μl (> 50 ng) of template DNA and 31.75 μl of double-distilled water. Fluorescently labelled primers (two dyes) were used to allow subsequent fragment analysis, as shown in Table 8.
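As a quick consistency check on the reaction mixture, the Python sketch below sums the listed component volumes and derives the final concentrations contributed by the added stocks. It assumes that "each primer" refers to one forward and one reverse primer, and the final concentrations ignore any MgCl2 already present in the buffer; both points are assumptions rather than details given in the text.

```python
# Component volumes per reaction in microlitres, as listed in the text.
# Assumption: "each primer" means one forward and one reverse primer.
components_ul = {
    "5X Green GoTaq buffer": 10.0,
    "25 mM MgCl2": 3.0,
    "25 mM dNTP": 1.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Taq polymerase": 0.25,
    "template DNA": 2.0,
    "double-distilled water": 31.75,
}

total_ul = sum(components_ul.values())

# Final concentration = stock concentration * stock volume / total volume
final_buffer_x = 5 * components_ul["5X Green GoTaq buffer"] / total_ul
final_mgcl2_mm = 25 * components_ul["25 mM MgCl2"] / total_ul  # added stock only
final_dntp_mm = 25 * components_ul["25 mM dNTP"] / total_ul

print(total_ul, final_buffer_x, final_mgcl2_mm, final_dntp_mm)
```

With two primers, the listed volumes add up exactly to the stated 50 μl reaction volume.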

Table 8 Characteristics of seven polymorphic microsatellite loci Ae. aegypti.

For the marker primers A10, B07, H08 and G11, the PCR cycling conditions were an initial denaturation at 94 °C for 5 min, followed by 30 cycles of denaturation at 94 °C for 1 min, annealing at 60 °C for 1 min and extension at 72 °C for 2 min, with a final extension at 72 °C for 10 min. For the marker primers B19, M313 and M201, the cycling conditions were an initial denaturation at 94 °C for 5 min, followed by 39 cycles of denaturation at 94 °C for 20 s, annealing at 55 °C for 20 s and extension at 72 °C for 30 s, with a final extension at 72 °C for 10 min. Primers, annealing temperatures, and their sequences are presented in Table 8.
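As a quick illustration, the two cycling programs can be written down as data and their nominal hold times compared. This is a sketch only: the function name is hypothetical, and ramp times between temperatures (which depend on the thermal cycler) are ignored.

```python
def program_seconds(initial_s, cycles, step_seconds, final_s):
    """Nominal hold time of a PCR program, ignoring temperature ramping."""
    return initial_s + cycles * sum(step_seconds) + final_s


# A10, B07, H08, G11: 94 C 5 min; 30 x (94 C 1 min, 60 C 1 min, 72 C 2 min); 72 C 10 min
program_60c = program_seconds(5 * 60, 30, [60, 60, 120], 10 * 60)

# B19, M313, M201: 94 C 5 min; 39 x (94 C 20 s, 55 C 20 s, 72 C 30 s); 72 C 10 min
program_55c = program_seconds(5 * 60, 39, [20, 20, 30], 10 * 60)

if __name__ == "__main__":
    print(f"60 C program: {program_60c / 60:.1f} min")
    print(f"55 C program: {program_55c / 60:.1f} min")
```

Excluding ramping, the 60 °C program holds for 135 min and the 55 °C program for 60.5 min.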

The PCR products were resolved by electrophoresis on 2% agarose gels and visualized under ultraviolet light using a GelDoc-It® TS 310 UV documentation system (Ultraviolet Products Ltd., Cambridge, UK). Samples with clear bands were sent to NHK Bioscience Solutions Sdn. Bhd. for fragment analysis on an Applied Biosystems 3730XL DNA Analyzer.

Standard genetic procedures and variability

The peak size of each microsatellite allele fragment was identified, analysed, and scored using Peak Scanner v1.0 (Applied Biosystems) with an internal size standard (GS500LIZ). Whenever PCR irregularities were encountered, samples were rescored and, where possible, amplification was repeated. Allele peaks in the electropherogram were scored according to Ref.45. MicroChecker v2.2.346 was used to identify and rectify data irregularities, including typographic errors and scoring errors caused by large allele dropout or by stutter peaks resulting from low DNA quality, and to detect and correct for microsatellite null alleles.

CONVERT v1.3147 and PGDSpider v2.1.0.348 were used to generate summary statistics for the microsatellite data (allele frequencies and private alleles for each locus in each population) and to convert the raw data into the input formats required by the various software packages47,48.

HWE, LD and FIS estimations

Genetic variation was assessed in Arlequin v3.5.2.249 as the mean number of alleles (NA) per locus and population and the allele size range. FSTAT v2.9.3.250 was used to compare diversity among sites using allelic richness (AR), with the rarefaction method applied to correct for differences in sample size. Observed (HO) and expected (HE) heterozygosity per locus and population, as well as mean heterozygosity across all loci, were estimated in Arlequin v3.5.2.249. Using the same program, deviation from Hardy–Weinberg Equilibrium (HWE) was assessed by exact testing with 10,000 Markov chain steps and 5000 dememorization steps. To test for significant association between alleles at pairs of loci, the likelihood ratio test of linkage disequilibrium (LD) based on the Expectation–Maximization (EM) algorithm51 was performed on all pairwise locus comparisons for all sites in Arlequin v3.5.2.249 with 10,000 permutations. An exact test with 10,000 permutations was also performed to detect statistically significant deviations from linkage equilibrium, followed by the false discovery rate (FDR) adjustment52 at the 9% significance level. The inbreeding coefficient (FIS) was estimated in FSTAT v2.9.3.250; its values range from −1 (heterozygote excess, no inbreeding) to +1 (complete inbreeding, no heterozygotes).
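To make the relationship between HO, HE and FIS concrete, the sketch below computes them for a single locus from diploid genotypes. It is an illustration under stated assumptions, not the estimators implemented in Arlequin or FSTAT: HE is taken as Nei's unbiased gene diversity, FIS as 1 − HO/HE, and the genotype data and function name are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (HO, HE, FIS) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    HE uses the small-sample correction 2n / (2n - 1); this is an assumption
    of this sketch, not necessarily the estimator used by the cited software.
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n
    sum_p_squared = sum((c / copies) ** 2 for c in counts.values())
    he = (copies / (copies - 1)) * (1.0 - sum_p_squared)
    # Inbreeding coefficient: heterozygote deficit relative to expectation.
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for four individuals at one locus.
    example = [(150, 150), (150, 152), (152, 152), (150, 152)]
    print(locus_stats(example))
```

A positive FIS indicates fewer heterozygotes than expected under Hardy–Weinberg proportions, and a negative value indicates an excess.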

To detect recent reductions in effective population size, BOTTLENECK v1.2.0253,54 was used to perform the Wilcoxon signed-rank test and the mode-shift test (distortion of the typical L-shaped distribution). Wilcoxon's test was run under the two-phased mutation model (TPM)53,55, with the proportion of the stepwise mutation model (SMM) in the TPM set to 95% (that is, 95% single stepwise mutations and 5% infinite allele mutations) and the variance set to 12. A total of 5000 simulation iterations were conducted, as suggested by Ref.54, with statistical significance determined using 1000 simulations.

Genetic structure and variations

An unrooted Neighbour-Joining tree was constructed in POPTREE256 from Nei's genetic distance (DA)57 to visualize the relationships among sites58, with 1000 bootstrap replications used to determine the confidence level of each node. Pairwise genetic divergence between populations was estimated in Arlequin v3.5.2.249 using FST, the proportion of the total genetic variance (the T subscript) contained in a subpopulation (the S subscript). FST ranges from 0 to 1, and a high value suggests that populations differ substantially from one another. Statistical significance was based on 10,000 permutations. Hierarchical Analyses of Molecular Variance (AMOVA) were performed in Arlequin v3.5.2.249 with 1000 random permutations to evaluate the relative contribution of variance among populations, among individuals within populations, and within individuals.
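For readers unfamiliar with the DA distance used for the tree, the sketch below computes it from allele frequencies as DA = 1 − (1/r) Σ_j Σ_i √(x_ij · y_ij), where r is the number of loci and x_ij, y_ij are the frequencies of allele i at locus j in the two populations. The data layout and function name are assumptions made for illustration; this is not POPTREE2 code.

```python
from math import sqrt


def nei_da(freqs_x, freqs_y):
    """Nei's DA distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    shared = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        # Only alleles present in both populations contribute to the sum.
        for allele in locus_x.keys() & locus_y.keys():
            shared += sqrt(locus_x[allele] * locus_y[allele])
    return 1.0 - shared / len(freqs_x)


if __name__ == "__main__":
    # Hypothetical frequencies at two loci for two populations.
    pop_a = [{150: 0.5, 152: 0.5}, {200: 1.0}]
    pop_b = [{150: 0.5, 152: 0.5}, {204: 1.0}]
    print(nei_da(pop_a, pop_b))
```

DA is 0 when the two populations have identical allele frequencies at every locus and 1 when they share no alleles at all.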

The Mantel correlation coefficient (r) between the matrices of genetic distance (FST) and geographic distance was calculated in Arlequin v3.5.2.249 with 10,000 random permutations to test whether genetic relationships among sampling areas conformed to a pattern of isolation by distance (IBD). Microsoft Excel was used to plot the isolation by distance charts (geographic distance in km).
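The logic of a Mantel permutation test can be sketched as follows. This is a simplified one-sided version written for illustration, not Arlequin's implementation: the function names, the seed handling and the example matrices are hypothetical.

```python
import random
from math import sqrt


def _pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(genetic, geographic, permutations=10000, seed=0):
    """Return (r, one-sided p) for two symmetric distance matrices."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vector = _upper_triangle(geographic, identity)
    r_observed = _pearson(_upper_triangle(genetic, identity), geo_vector)
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of one matrix only
        if _pearson(_upper_triangle(genetic, order), geo_vector) >= r_observed:
            at_least_as_large += 1
    # Count the observed arrangement itself among the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_observed, p_value


if __name__ == "__main__":
    # Hypothetical sites along a line at 0, 1, 3 and 7 km.
    positions = [0.0, 1.0, 3.0, 7.0]
    geo = [[abs(a - b) for b in positions] for a in positions]
    gen = [[0.01 * d for d in row] for row in geo]  # perfectly distance-driven
    print(mantel(gen, geo, permutations=999))
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test would be inappropriate here.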

Factorial Correspondence Analysis (FCA) was carried out as a complementary approach to univariate tests such as FST, since multilocus population genetic data are multivariate in nature59. It was employed to assess population subdivision based on pairwise genetic distances among 202 individuals from eight Ae. aegypti populations.

GENETIX version 4.0560 was used to perform FCA based on the genotypic data obtained for individuals from the populations. Correction for multiple testing for HWE, LD, FIS and Wilcoxon's test was performed using the FDR approach described in Benjamini & Hochberg (1995) at the 95% confidence level. Additionally, a clustering analysis was performed using Discriminant Analysis of Principal Components (DAPC) as implemented in adegenet (Jombart, 2008). Furthermore, divMigrate (https://popgen.shinyapps.io/divMigrate-online/) was used to construct a network representing the relative rate and direction of migration among populations61, with Nm as the measure of genetic distance. The significance of the Nm values was determined by performing 1000 bootstraps with α = 0.05.
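The Benjamini & Hochberg step-up procedure referred to above can be sketched in a few lines. The function name and the example p-values are hypothetical; `alpha=0.05` is assumed to correspond to the 95% confidence level stated in the text.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values from four tests.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

Unlike a Bonferroni correction, this controls the expected proportion of false positives among the rejected tests rather than the probability of any single false positive.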

To test whether the hierarchical variation could be attributed to differences between groups, populations were grouped according to subspecies (as identified in Ref.28) and according to the K = 2 clustering from STRUCTURE. Three hierarchical levels of variation were tested for each run: among groups within total (FCT), among populations within groups (FSC) and among populations within total (FST).

Following that, two distinct clustering methods were employed to identify groups of genetically related individuals and sampling locations, as well as to assess their spatial distribution. First, individuals were assigned to clusters using a Bayesian model-based clustering approach performed in STRUCTURE v2.3.362. The Bayesian clustering methodology employed in Structure 2.3.4 offered a comparative assessment of population structure62,63. Fifteen independent runs were performed for each value of K from 1 to 8, each with 10,000 Markov Chain Monte-Carlo (MCMC) repetitions after a burn‐in period of 1000 iterations. The admixture model with correlated allele frequencies was used, together with a uniform prior for α (initial value of 1 and maximum of 10.0); λ was set at 1.0. The number of clusters (K) was determined using the web software STRUCTURE Harvester v0.6.94, as reported by Ref.64, by plotting the average estimated LnP(D) (Ln probability of the data) and applying the ΔK technique of Ref.64. For the selected value of K, we assessed the membership coefficients per individual per cluster (Q), setting the assignment threshold to Q > 0.80.
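The choice of K from replicate runs can be illustrated with a small sketch. It assumes that the K-selection technique referred to above is the ΔK statistic, computed here as the absolute second-order difference of the mean LnP(D) divided by the standard deviation of LnP(D) across runs at that K. The function name and the LnP(D) values are hypothetical, and this is not the Structure Harvester source code.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of LnP(D) values, one per replicate run.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the smallest and largest K
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])
        if spread > 0:
            result[k] = second_difference / spread
    return result


if __name__ == "__main__":
    # Hypothetical LnP(D) values from three runs per K.
    runs = {
        1: [-5000.0, -5002.0, -4998.0],
        2: [-4500.0, -4501.0, -4499.0],
        3: [-4480.0, -4490.0, -4470.0],
        4: [-4475.0, -4495.0, -4455.0],
    }
    scores = delta_k(runs)
    print(scores, "best K:", max(scores, key=scores.get))
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at K = 1 or at the largest K tested, which is one reason the mean LnP(D) plot is usually inspected alongside it.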

Conclusion

While understanding the genetic diversity and composition of disease vector populations is crucial for managing them, this information is frequently inadequate. The analyses resolved the populations into two groups, which were suggested to be two genetically different groups (subspecies). Geographical distance and genetic variation showed a moderate to strong significant association. Subspecies populations appear to migrate and exchange genes according to their geographic proximity. In Sudan and other African nations, further research is required to understand the ecological factors that influence the distribution and transmission capacity of the two subspecies in relation to the spread of dengue, chikungunya, yellow fever, and other arboviruses, and to create effective viral control initiatives.