Research Article

Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes

[version 1; peer review: 2 approved, 1 approved with reservations]
PUBLISHED 27 Feb 2013

Abstract

Background: Vibrio cholerae, the etiologic agent of cholera, is indigenous to aquatic environments. The V. cholerae genome consists of two chromosomes; the smaller of these harbors a large gene capture and excision system of ~120 kbp called the superintegron (SI). The flexible nature of the SI, which results from gene cassette capture, deletion and rearrangement, is thought to make it a hotspot of V. cholerae diversity, but beyond this basic structure it is not clear whether the SI has a core genome and, if so, how it is structured. The aim of this study was to explore the core genome structure and the differences in gene content among strains of V. cholerae.
Methods: From the complete genomes of eight representative strains (seven V. cholerae and one Vibrio mimicus), we recovered the SI sequences based on the locations of the structural gene IntI4 and the V. cholerae repeats. Pangenome analysis, including cluster analysis of functional genes, pangenome profile analysis, genetic variation analysis of functional genes, strain evolution analysis and function enrichment analysis of gene clusters, was performed using a pangenome analysis pipeline together with R scripts, SplitsTree4 and genoPlotR.
Results and conclusions: Here, we reveal the genetic architecture of the V. cholerae SI. It contains eight core genes when V. mimicus is included and 21 core genes when only V. cholerae strains are considered; many of them are present in several copies. The V. cholerae SI has an open pangenome, which means that V. cholerae may be able to import new gene cassettes into the SI. The set of dispensable SI genes is influenced by the niche and type species. The core genes are distributed along the SI, apparently without a position effect.

Keywords

Vibrio cholerae, superintegron, core genes, unique genes

Introduction

Vibrio cholerae is a diverse, environmental, Gram-negative bacterial species that can be pathogenic and cause cholera, a severe diarrheal disease that occurs most frequently in epidemic form1,2. The V. cholerae genome consists of two chromosomes. The larger chromosome, of 2.96 Mbp, encodes most essential genes. The smaller chromosome, of 1.07 Mbp, contains few essential genes and the superintegron (SI), a large gene capture and excision system of ~120 kbp2 (Figure 1). The SI is characterized by a site-specific integrase gene (IntI4) closely associated with a cognate recombination site, attI, and a promoter, Pc, followed by a large array of gene cassettes. Within the SI, the gene cassettes generally consist of a promoterless open reading frame (ORF) flanked by two recombination sites termed V. cholerae repeats (VCRs)3. Cassettes can be excised from any position in the array through VCR × VCR recombination mediated by the integrase. The resulting circular intermediate can then be integrated by the integrase, preferentially through attI × VCR recombination, which brings the cassette under the control of Pc4,5. Since gene cassettes are usually promoterless, only the first few cassettes are expressed from Pc, and the rest of the array can be seen as a reservoir of standing genetic variation5.

Figure 1. Schematic organization of the Vibrio cholerae genome and the superintegron (SI).

The functional platform of the SI consists of an integrase gene, a cassette promoter (Pc), and a primary recombination site (attI). The system maintains an array of several cassettes, which generally consist of a promoterless ORF flanked by two recombination sites termed VCR (V. cholerae repeats).
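The excision and integration cycle described above can be illustrated with a toy model. This is a minimal sketch, not a biological simulation: the class name, the cassette labels and the assumption that Pc reaches exactly two cassettes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Superintegron:
    """Toy cassette array; index 0 is the cassette closest to attI and Pc."""
    cassettes: List[str] = field(default_factory=list)
    expressed_window: int = 2  # assumption: number of cassettes reached by Pc

    def excise(self, position: int) -> str:
        """Mimic VCR x VCR recombination: remove one cassette from the array."""
        return self.cassettes.pop(position)

    def integrate(self, cassette: str) -> None:
        """Mimic attI x VCR recombination: insert the cassette next to attI."""
        self.cassettes.insert(0, cassette)

    def expressed(self) -> List[str]:
        """Cassettes assumed to be transcribed from Pc."""
        return self.cassettes[: self.expressed_window]


si = Superintegron(["cassA", "cassB", "cassC", "cassD"])
silent = si.excise(3)   # cassD leaves the distal end of the array
si.integrate(silent)    # and re-enters at attI, under the control of Pc
print(si.cassettes)     # ['cassD', 'cassA', 'cassB', 'cassC']
print(si.expressed())   # ['cassD', 'cassA']
```

In this sketch a previously silent cassette becomes expressed only because it moved to the attI-proximal end, which is the sense in which the rest of the array acts as a reservoir.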

The functions of most SI genes are unknown; however, a few genes have been characterized, and it has been suggested that they are involved in adaptive functions, as in the case of toxin-antitoxin (TA) loci. TA loci consist of two genes in an operon encoding a ‘toxin’ and an ‘antitoxin’. Expression of the toxin reduces cell growth and prevents colony formation, thus exerting a bacteriostatic rather than bactericidal effect; cell viability can be rescued by later overproduction of the cognate antitoxin6.

The pangenome describes the complete repertoire of genes in a bacterial species. It comprises the "core genome", containing genes present in all strains; a "dispensable genome", containing genes present in two or more (but not all) strains; and "unique genes", specific to single strains7. A previous phylogeographic analysis of V. cholerae strains and its sister species Vibrio metecus8 showed that, in contrast to the core genome, the SI displays strong geographical differentiation: cassettes from the V. cholerae group cluster with those of V. metecus from the same place rather than with cassettes from geographically distinct V. cholerae. This suggested that SI structure is influenced by geographic boundaries and by responses to environmental conditions. The flexible nature of the SI, which results from gene cassette capture, deletion and rearrangement, is thought to make it a hotspot of V. cholerae diversity, but beyond this basic structure it is not clear whether the SI has a core genome and, if so, how it is structured. The aim of this work was to explore the core genome structure and the differential gene content among strains of V. cholerae.
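The three pangenome categories defined above can be expressed as a simple rule on the number of strains that carry a gene cluster. The following is a minimal sketch; the function name, cluster names and strain labels are hypothetical and do not come from the study.

```python
from typing import Dict, Set


def classify_clusters(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each cluster as core, dispensable or unique.

    presence maps a cluster name to the set of strains that carry it.
    """
    labels = {}
    for cluster, strains in presence.items():
        if not strains:
            raise ValueError(f"cluster {cluster!r} is not present in any strain")
        if len(strains) == n_strains:
            labels[cluster] = "core"          # present in all strains
        elif len(strains) == 1:
            labels[cluster] = "unique"        # specific to a single strain
        else:
            labels[cluster] = "dispensable"   # two or more strains, but not all
    return labels


# Hypothetical example with three strains.
example = {
    "cluster_a": {"strain1", "strain2", "strain3"},
    "cluster_b": {"strain1", "strain3"},
    "cluster_c": {"strain2"},
}
labels = classify_clusters(example, n_strains=3)
print(labels)
```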

Methodology

Based on the complete genomes of seven V. cholerae strains and one V. mimicus strain, taken as representatives (Table 1), we searched for repeats longer than 10 nucleotides and used one VCR sequence (AAC AAA CGC CTC AAG AGG GAC TGT CAA CGC GTG GCG TTT CCA GTC CCA TTG AGC CGT GGT GGT TTC GGT TGT TGT GTT TGA GTT TAG TGT TAT GCG TTG TCA GCC CCT TAG GCG GGC G) to search for sequences with more than 45% nucleotide identity. The SI sequences were recovered using the locations of the structural gene IntI4 and of the VCRs identified with the UGENE software9. Cluster analysis of functional genes was performed using the pangenome analysis pipeline10, which searches for homologs or orthologs among multiple genomes using the MultiParanoid (MP) method (based on a 90% nucleotide identity threshold). For each pair of genes in the same cluster, the local matched region is no less than 25% of the longer protein-coding sequence and the global matched region is no less than 50% of the longer protein-coding sequence. The minimum score and the E-value used in BLAST were 50 and 1e-8, respectively10. The gene content was converted to a presence/absence (1/0) matrix, and the core, dispensable and unique genes were then identified with in-house R scripts. The phylogenetic tree based on gene content and the split network for gene content were constructed with SplitsTree411 using the GeneContentDistance method12. The comparison of SI structure among the seven V. cholerae strains and their sister species, V. mimicus, was performed using genoPlotR13.
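The pairwise clustering thresholds quoted above can be written as a single predicate. This is only a sketch of how the stated cut-offs combine, not the code of the pangenome analysis pipeline: the record type, its field names and the reading of "local" and "global" matched regions as lengths in the same units as the sequence lengths are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseHit:
    """Hypothetical summary of one BLAST comparison between two genes."""
    len_a: int             # length of the first coding sequence
    len_b: int             # length of the second coding sequence
    local_match_len: int   # length of the local matched region
    global_match_len: int  # length of the global matched region
    score: float           # BLAST score
    evalue: float          # BLAST E-value


def passes_thresholds(hit: PairwiseHit,
                      min_local: float = 0.25,
                      min_global: float = 0.50,
                      min_score: float = 50.0,
                      max_evalue: float = 1e-8) -> bool:
    """Apply the coverage, score and E-value cut-offs stated in the text."""
    longer = max(hit.len_a, hit.len_b)
    return (hit.local_match_len >= min_local * longer
            and hit.global_match_len >= min_global * longer
            and hit.score >= min_score
            and hit.evalue <= max_evalue)


# Hypothetical hits: the first satisfies every cut-off, the second fails
# the global coverage requirement (85 < 0.50 * 300).
good = PairwiseHit(300, 280, 120, 200, 180.0, 1e-40)
short = PairwiseHit(300, 90, 80, 85, 120.0, 1e-20)
print(passes_thresholds(good), passes_thresholds(short))  # True False
```

Measuring coverage against the longer of the two sequences prevents a short fragment from being clustered with a much longer gene on the strength of a high-scoring partial match.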

Table 1. Superintegron regions extracted from V. cholerae and V. mimicus genomes.

| Organism | Serogroup/Biotype | Geographical origin | Source of isolation | Year of isolation | Start* | End* | Size (bp) | G+C (%) | ORFs | Locus IntI4 | Accession in NCBI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| V. cholerae N16961 | O1 El Tor | Bangladesh | Clinical | 1975 | 309750 | 435418 | 125669 | 42.20 | 166 | VCA0291 | NC_002506 |
| V. cholerae 2010EL1786 | O1 El Tor | Haiti | Clinical | 2010 | 36195 | 135658 | 99464 | 42.08 | 138 | Vch1786_II0037 | NC_016446 |
| V. cholerae MJ-1236 | O1 El Tor | Matlab, Bangladesh | Clinical | 1994 | 931735 | 1050596 | 118862 | 41.46 | 135 | VCD_000984 | NC_012667 |
| V. cholerae O395 | O1 Classical | India | Clinical | 1965 | 799827 | 916350 | 116524 | 41.35 | 175 | VCO395_0938 | NC_009456 |
| V. cholerae LMA3984 | O1 | Para, Brazil | Environmental | 2007 | 294428 | 332847 | 38420 | 42.70 | 47 | VCLMA_B0259 | NC_017269 |
| V. cholerae M66-2 | O1 | Indonesia | Clinical | 1937 | 310949 | 409433 | 98485 | 42.15 | 133 | VCM66_A0290 | NC_012580 |
| V. cholerae IEC224 | O1 | Para, Brazil | Clinical | 1990s | 309717 | 435237 | 125521 | 42.21 | 167 | O3Y_14823 | NC_016945 |
| V. mimicus MB-451 | ND | Matlab, Bangladesh | Clinical | ND | 744870 | 872905 | 128036 | 41.39 | 115 | VII_000636 | NZ_ADAF01000002 |

*Nucleotide position on the chromosome. ND, not determined.
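As a consistency check on Table 1, the reported size of each SI region can be recomputed from its start and end coordinates. The sketch below assumes that both coordinates are inclusive nucleotide positions, so that size = end - start + 1; the variable and function names are illustrative.

```python
# (strain, start, end, reported size in bp), as listed in Table 1
SI_REGIONS = [
    ("N16961", 309750, 435418, 125669),
    ("2010EL1786", 36195, 135658, 99464),
    ("MJ-1236", 931735, 1050596, 118862),
    ("O395", 799827, 916350, 116524),
    ("LMA3984", 294428, 332847, 38420),
    ("M66-2", 310949, 409433, 98485),
    ("IEC224", 309717, 435237, 125521),
    ("MB-451", 744870, 872905, 128036),
]


def span_length(start: int, end: int) -> int:
    """Length of a region given inclusive start and end positions."""
    return end - start + 1


# Strains whose reported size disagrees with the coordinates.
mismatches = [name for name, start, end, size in SI_REGIONS
              if span_length(start, end) != size]
print(mismatches)  # [] (every reported size matches its coordinates)
```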

Results and discussion

SI regions were extracted from the seven V. cholerae genomes and the one V. mimicus genome (Table 1). The 1285 genes recovered were clustered, and a total of 408 clusters were detected (Figure 2A; Table S1). The pangenome of the SI of the Vibrio strains evaluated therefore comprised 408 genes, of which eight are core genes, 196 are distributed (dispensable) genes and 204 are unique genes. Six of the eight core genes are present in more than one copy (Table 2). The pangenome profile analysis shows that the number of core genome clusters is almost constant by the time the number of SIs considered reaches nine, while the pangenome is still increasing (Figure 2A). We infer that the V. cholerae SI has an open pangenome, which means that V. cholerae may have the ability to import new SI gene cassettes, which affects its plasticity and diversity. On the other hand, the set of SIs used in this study, from clinical and environmental lineages, is apparently representative of this species, because it allowed us to establish that the core genome is close to complete.
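A pangenome profile of the kind discussed above tracks how the pangenome grows and the core genome shrinks as genomes are added. The following is a minimal sketch that averages over every possible order of addition; it is not the profile method of the pipeline used in the study, and the genome and cluster names are hypothetical. Enumerating all orders is feasible for eight genomes (40320 permutations) but not for large collections.

```python
from itertools import permutations
from statistics import mean
from typing import Dict, List, Set, Tuple


def pan_core_curve(genomes: Dict[str, Set[str]]) -> List[Tuple[int, float, float]]:
    """Return (k, mean pangenome size, mean core size) for k = 1..n genomes."""
    names = list(genomes)
    n = len(names)
    pan_sizes: List[List[int]] = [[] for _ in range(n)]
    core_sizes: List[List[int]] = [[] for _ in range(n)]
    for order in permutations(names):
        pan: Set[str] = set()
        core: Set[str] = set()
        for i, name in enumerate(order):
            pan |= genomes[name]                       # union grows
            core = set(genomes[name]) if i == 0 else core & genomes[name]
            pan_sizes[i].append(len(pan))
            core_sizes[i].append(len(core))            # intersection shrinks
    return [(i + 1, mean(pan_sizes[i]), mean(core_sizes[i])) for i in range(n)]


# Hypothetical toy data: three genomes sharing a single cluster.
toy = {
    "gA": {"c1", "c2", "c3"},
    "gB": {"c1", "c2", "c4"},
    "gC": {"c1", "c5"},
}
curve = pan_core_curve(toy)
for k, pan_size, core_size in curve:
    print(k, pan_size, core_size)
```

A pangenome is usually called open when the pangenome curve keeps rising as genomes are added, while the core curve flattens.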

Figure 2.

(A) Pangenome plot of the SI region considering seven V. cholerae and one V. mimicus genomes. 1285 total genes, 408 pangenome clusters and eight core clusters were identified. (B) Word clouds of cluster function enrichment comparison according to clusters of orthologous groups (COG) for whole and core clusters identified are shown at the top and bottom, respectively. Clusters that are not assigned in the COG classification were excluded from the figure.

Table 2. Core genes of the V. cholerae SI.

The table shows the clusters, conservation level between genomes, the functional categories, gene description and the corresponding locus tag in the reference N16961 genome.

| ClusterID | Conservation level | COG* | Description | Locus_tag in N16961 |
|---|---|---|---|---|
| 1 | 8 | - | hypothetical protein | VCA0407, VCA0353, VCA0336, VCA0297, VCA0302 |
| 2 | 8 | COG0456R | acetyltransferase | VCA0470 |
| 3 | 8 | - | lipoprotein | VCA0425, VCA0414 |
| 4 | 8 | - | hypothetical protein | VCA0381, VCA0435, VCA0357, VCA0306 |
| 5 | 8 | - | hypothetical protein | VCA0434, VCA0411 |
| 7 | 8 | COG4974L | site-specific recombinase IntI4 | VCA0291 |
| 8 | 8 | - | relB protein | VCA0349, VCA0504 |
| 9 | 8 | COG1670J | acetyltransferase | VCA0505, VCA0436, VCA0417, VCA0316 |
| 24 | 7 | COG0110R | acetyltransferase | VCA0473 |
| 25 | 7 | COG3668R | plasmid stabilization element ParE | VCA0359 |
| 27 | 7 | COG2944K | virulence gene repressor RsaL | VCA0469 |
| 31 | 7 | - | hypothetical protein | VCA0497 |
| 32 | 7 | COG1694R | mazG-related protein | VCA0485 |
| 33 | 7 | - | cytotoxic translational repressor of toxin-antitoxin stability system | VCA0468 |
| 34 | 7 | COG0346E | glyoxalase/bleomycin resistance protein | VCA0506, VCA0347 |
| 35 | 7 | - | hypothetical protein | VCA0486 |
| 37 | 7 | COG2161D | antitoxin of toxin-antitoxin stability system | VCA0477 |
| 40 | 7 | COG0456R | GCN5-related N-acetyltransferase | VCA0382 |
| 41 | 7 | COG1943L | IS1004 transposase | VCA0493 |
| 43 | 7 | COG3668R | plasmid stabilization system protein | VCA0489 |
| 44 | 7 | COG3636K | hypothetical protein | VCA0498 |

*COG: Cluster of Orthologous Groups; "-" depicts no COG assignation.
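The COG column of Table 2 can be tallied by functional category. The sketch below assumes that the trailing letter of each entry (for example the R in COG0456R) is the one-letter COG functional category code; the dictionary simply transcribes the table, and the names used are illustrative.

```python
from collections import Counter
from typing import Dict

# COG column of Table 2, keyed by ClusterID ("-" means no COG assignment).
CORE_COGS: Dict[int, str] = {
    1: "-", 2: "COG0456R", 3: "-", 4: "-", 5: "-", 7: "COG4974L",
    8: "-", 9: "COG1670J", 24: "COG0110R", 25: "COG3668R",
    27: "COG2944K", 31: "-", 32: "COG1694R", 33: "-", 34: "COG0346E",
    35: "-", 37: "COG2161D", 40: "COG0456R", 41: "COG1943L",
    43: "COG3668R", 44: "COG3636K",
}


def tally_categories(cogs: Dict[int, str]) -> Counter:
    """Count clusters per category letter, skipping unassigned clusters."""
    return Counter(entry[-1] for entry in cogs.values() if entry != "-")


counts = tally_categories(CORE_COGS)
unassigned = sum(1 for entry in CORE_COGS.values() if entry == "-")
print(counts.most_common())
print("unassigned:", unassigned)
```

Under this reading, 13 of the 21 clusters carry a COG assignment and eight do not.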

Function enrichment analysis of gene clusters was performed according to the gene annotation descriptions (File S1) supplied to the pangenome analysis pipeline10. Of the 408 clusters, 329 were left unclassified by the function enrichment analysis. Following the Cluster of Orthologous Groups (COG) categorization, the characterized clusters were rich in the following categories: translation, ribosomal structure and biogenesis; transcription; replication, recombination and repair; cell cycle control, cell division, chromosome partitioning; defense mechanisms; cell wall/membrane/envelope biogenesis; posttranslational modification, protein turnover, chaperones; amino acid transport and metabolism; nucleotide transport and metabolism; lipid transport and metabolism; and secondary metabolites biosynthesis, transport and catabolism (Figure 2B).

In the SI, random excisions occur throughout the cassette array to form nonreplicative circular intermediates containing one or several cassettes; integration events preferentially occur at the attI site5 and are subject to selection. SI core genes might therefore be expected to be arranged and to stay together; however, we found that the core genes are distributed along the SI (Figure 3), apparently without any position effect.

We identified 204 unique genes: 94 belonging to V. mimicus MB-451, nine to LMA3984, 45 to O395, nine to 2010EL1786, 14 to MJ-1236, seven to IEC224, 20 to M66-2 and six to N16961 (Figure 3; Table S1). Considering only the V. cholerae SIs, there are 21 core genes, most of them present in many copies and rich in the following categories: transcription; replication, recombination and repair; and translation, ribosomal structure and biogenesis.

Figure 3. Superintegron (SI) structure and comparison of seven strains of V. cholerae and a strain of V. mimicus.

The core, dispensable and unique genes are indicated by red, cream and blue arrows, respectively. Vertical blocks between sequences indicate regions with more than 1 kb of shared similarity shaded according to BLASTn. A phylogenetic tree based on gene content of the SI is shown on the left.

Pandey and Gerdes14 identified 13 TA loci within the SI of the N16961 strain. Here we identified six TA genes among the core SI genes (Table 2), of which the relB genes (VCA0349 and VCA0504) were present in the SIs of all strains, including V. mimicus. The parE (VCA0359), relB (VCA0477) and relE (VCA0489) genes were present in all V. cholerae SIs. Moreover, we also identified two higBA loci (VCA0469 and VCA0468), which encode mRNA-cleaving enzymes and can stabilize plasmids6 as well as SI genes. The previous authors14 also identified the higBA-1 TA loci (VCA0392 and VCA0391); in our results, these two TA loci are present in all clinical V. cholerae strains (Table S1). These results suggest that V. cholerae TA loci function as essential stress response elements that help cells survive6 and that they also act to stabilize the massive arrays of SI cassettes, as reported previously15.

A previous study suggested that SI structure is influenced by geographic boundaries in response to environmental conditions8. Here, we found that the clinical V. cholerae and V. mimicus strains evaluated were not grouped together by the analyses performed. Therefore, the ability of V. cholerae to cause disease must be explained by other virulence factors found outside the SI region.

There are 199 clusters with indel or mutation events (Table S2). Based on the non-synonymous/synonymous substitution (dN/dS) ratio, we found that 30 clusters were under positive selection (dN/dS > 1). These variable clusters could also be used as markers to distinguish strains. Based on pangenome profiles and single nucleotide polymorphism (SNP) information, gene content and phylogenetic trees were constructed (Figure 4). The SNP information from the SI was useful for separating V. cholerae from V. mimicus but lacked the resolution to distinguish between the different lineages of V. cholerae. Using gene content information (Figure 4), however, a good resolution was reached that was coherent with the evolution of the species and the environmental or clinical nature of the strains. These results indicate that the evolution of V. cholerae into different lineages is reflected in the diversity of the SI, which would also be influenced by horizontal gene transfer in this region, as proposed elsewhere8,16,17.

Figure 4.

Top: Phylogenetic trees for the V. cholerae SI based on SNPs constructed by the Maximum Likelihood (A), Neighbor-Joining (B) and UPGMA (C) methods. The numbers indicate the bootstrap values. Bottom: Split network for gene content based on the 408 genes in seven V. cholerae and one V. mimicus genomes. The network was constructed with SplitsTree4 using the GeneContentDistance method12.
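To give an idea of how a distance can be derived from gene content alone, the sketch below computes one simple shared-gene distance: one minus the number of shared clusters divided by the size of the smaller genome. This formula is an assumption chosen for illustration and is not necessarily what the GeneContentDistance method in SplitsTree4 computes; the strain and gene names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def gene_content_distance(a: Set[str], b: Set[str]) -> float:
    """1 - (shared clusters / clusters in the smaller genome)."""
    if not a or not b:
        raise ValueError("both gene sets must be non-empty")
    shared = len(a & b)
    return 1.0 - shared / min(len(a), len(b))


def distance_table(genomes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances for every unordered pair of genomes."""
    return {(x, y): gene_content_distance(genomes[x], genomes[y])
            for x, y in combinations(genomes, 2)}


# Hypothetical gene content for three strains.
toy_content = {
    "s1": {"g1", "g2", "g3", "g4"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g1", "g5"},
}
dists = distance_table(toy_content)
for pair, d in dists.items():
    print(pair, d)
```

Dividing by the smaller genome means that a strain whose gene set is a subset of another's is at distance zero from it, so differences in SI size alone do not separate strains under this particular formula.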

Conclusions

In this study, we have revealed the genetic architecture of the V. cholerae SI, which contains eight core genes, many of them present in several copies. The V. cholerae SI has an open pangenome, which means that V. cholerae may have the ability to import new gene cassettes into the SI. The set of dispensable SI gene cassettes is influenced by the niche and type species. The core genes are distributed along the SI, apparently without a position effect.

how to cite this article
Marin MA and Vicente ACP. Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes [version 1; peer review: 2 approved, 1 approved with reservations] F1000Research 2013, 2:63 (https://doi.org/10.12688/f1000research.2-63.v1)
Open Peer Review

Reviewer Report 14 Apr 2014
Nur Hasan, Maryland Pathogen Research Institute, University of Maryland, College Park, MD, USA 
Approved
In this manuscript Marin and Vicente investigated the genomic diversity of V. cholerae Super Integron (SI) with the aim to identify a set of orthologous genes that are conserved and unique among and in between V. cholerae and V. mimicus ...
Hasan N. Reviewer Report For: Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2013, 2:63 (https://doi.org/10.5256/f1000research.1158.r4479)
Reviewer Report 14 Mar 2014
Yan Boucher, Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada 
Approved with Reservations
The article, although it has a very narrow focus, investigates an interesting question about integron regions in V. cholerae that scientists have so far mostly applied to whole genome sequences: what is their pan-genome and core genome? The scope of ...
Boucher Y. Reviewer Report For: Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2013, 2:63 (https://doi.org/10.5256/f1000research.1158.r3764)
Reviewer Report 09 May 2013
Thandavarayan Ramamurthy, National Institute of Cholera and Enteric Diseases, Kolkata, India 
Approved
In this manuscript the authors have explored the pattern of the superintegrons (SIs) in V. cholerae using the published DNA sequences of relevant strains. This study has shown the dynamic nature of V. cholerae O1 and genetic relatedness of SIs ...
Ramamurthy T. Reviewer Report For: Architecture of the superintegron in Vibrio cholerae: identification of core and unique genes [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2013, 2:63 (https://doi.org/10.5256/f1000research.1158.r937)