Identification and characterization of simple sequence repeats in the genomes of Shigella species☆
Introduction
Gram-negative, facultative anaerobes of the genus Shigella are the principal cause of bacillary dysentery. Shigella organisms invade and replicate in the cells lining the colon and rectum and cause mucosal ulceration; the disease is characterized by lower abdominal cramps and tenesmus, with abundant blood and pus in the stool (Sansonetti, 2001). In China, more than a million cases of shigellosis have been reported annually (Mei et al., 1989). The genus is divided into four serogroups with multiple serotypes: A (Shigella dysenteriae, 12 serotypes); B (Shigella flexneri, 6 serotypes); C (Shigella boydii, 18 serotypes) and D (Shigella sonnei, 1 serotype) (Hale, 1991). Rapid detection of Shigella strains therefore has important clinical and epidemiological significance.
Simple sequence repeats (SSRs), or microsatellites, are DNA regions in which a unit of one or more bases is tandemly repeated up to dozens of times. SSRs have been extensively studied in eukaryotic genomes and are now well-established targets for pedigree analysis (Jeffreys et al., 1986). A subset of SSRs, the trinucleotide repeats (TNRs), is responsible for many genetic disorders through abnormal expansion of polymorphic repeat tracts (Sasaki et al., 1996). In prokaryotes, enterobacterial repetitive intergenic consensus (ERIC) sequences have been used in phylogenetic and taxonomic analysis as well as in determining the species composition of mixed bifidobacterial cultures (Ventura et al., 2003), and repetitive extragenic palindromic (REP) sequences have been applied to detecting vaginal colonization by Lactobacillus crispatus (Antonio and Hillier, 2003). Furthermore, SSRs are informative markers for the identification of pathogenic bacteria, and may serve as indicators of the adaptation of pathogens to in vivo and ex vivo environments (Keim et al., 2000; van Belkum et al., 1997b).
The occurrence of SSRs in a genome sequence gives a snapshot of the repeats accumulated in vivo and reflects the basal level of SSR dynamics in that genome. Moreover, variability in the number of repeat units at a given genomic site, i.e. the sequence heterogeneity among individual strains, can be used to assess intra-species diversity.
In this study we screened the entire genome sequence of S. flexneri 2a for the presence and composition of SSRs. We then used locus-specific PCR primers to analyze the allelic polymorphisms of 17 SSR loci from different serotypes of Shigella. This revealed striking differences between genomic regions, and different degrees of variability in four Shigella serogroups. Interestingly, one hypervariable coding-region SSR locus, C5-1, can affect either translation or tertiary structure of the gene product.
Computerized search for SSRs in S. flexneri genome
The whole genome sequence of S. flexneri 2a (GenBank accession number AE005674) was analyzed for potential SSRs with Tandem Repeats Finder (Benson, 1999, http://tandem.biomath.mssm.edu/trf.html). Perl scripts were developed to analyze statistically the relationship between the SSRs and the ORFs annotated in GenBank (ftp://ftp.chgb.org.cn/pub/). The minimal repeat number was empirically set to three in order to avoid stochastic occurrences.
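The screening criteria above (unit length of 1–9 bp, at least three tandem copies, and a comparison against ORF positions) can be illustrated with a minimal Python sketch. This is not Tandem Repeats Finder and not the authors' Perl scripts: it detects only exact (perfect) repeats, and the names `SSR`, `find_ssrs`, `in_coding_region` and the ORF interval format are hypothetical choices made for illustration.

```python
from typing import Iterator, List, NamedTuple, Tuple


class SSR(NamedTuple):
    start: int   # 0-based start position of the repeat tract
    unit: str    # repeated motif
    copies: int  # number of tandem copies


def _is_primitive(unit: str) -> bool:
    """Return False if the motif is itself a repeat of a shorter motif (e.g. 'AA', 'ATAT')."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 9,
              min_copies: int = 3) -> Iterator[SSR]:
    """Yield exact tandem repeats with unit length min_unit..max_unit.

    Assumption: only perfect repeats are reported; the default min_copies of 3
    mirrors the minimal repeat number stated in the text.
    """
    seq = seq.upper()
    n = len(seq)
    for k in range(min_unit, max_unit + 1):
        i = 0
        while i + k * min_copies <= n:
            unit = seq[i:i + k]
            # Skip ambiguous bases and motifs already covered by a shorter unit.
            if set(unit) - set("ACGT") or not _is_primitive(unit):
                i += 1
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                yield SSR(i, unit, copies)
                i += copies * k  # continue scanning after the tract
            else:
                i += 1


def in_coding_region(ssr: SSR, orfs: List[Tuple[int, int]]) -> bool:
    """True if the tract lies entirely inside one ORF.

    Assumption: ORFs are given as 0-based, half-open (start, end) intervals.
    """
    end = ssr.start + len(ssr.unit) * ssr.copies
    return any(s <= ssr.start and end <= e for s, e in orfs)
```

For example, `list(find_ssrs("GGCAGCAGCAGTT"))` reports a single trinucleotide tract, `GCA` repeated three times starting at position 1. Scanning each unit length separately and discarding non-primitive motifs keeps a homopolymer run such as `AAAAAA` from also being counted as a dinucleotide or trinucleotide repeat.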
Shigella strains, growth conditions and DNA purification
The S. flexneri 2a strain Sf301 was originally
Screening SSRs in S. flexneri 2a genome
Using a computer-based screen of the newly determined genome sequence of Sf301 (Jin et al., 2002), we found large numbers of SSRs scattered throughout the genome (Table 1). The total number of SSRs with a unit length of 1–9 bp was 227,867 in the chromosome and 12,004 in the plasmid, comprising 17.5% of the chromosome and 19.8% of the plasmid. There are no SSRs consisting of seven or eight nucleotides in the chromosome, and no plasmid-borne SSRs consisting of a unit length that
Genomic distribution of SSRs
Among the SSRs in the S. flexneri genome, mono- and dinucleotide SSRs are the most common. In the chromosome, the proportion of these repeats in coding regions decreases as the repeat number increases, whereas in the virulence plasmid no such pattern is observed (Table 1). Trinucleotide SSRs are predominant in the coding regions of both the chromosome and the plasmid regardless of the number of repeats. Hexanucleotide SSRs are present only in the chromosome, predominantly in the coding regions. In some
Acknowledgements
We thank Moqing Liu, Hong Liu and Fan Yang for their technical assistance, and Peadar O'Gaora and Kathy Smollett for their critical reading of the manuscript. This work is supported by the Chinese Academy of Sciences (grant no. KSCX2-2-07), and the National High Technology Development Program of China (grant no. 2002AA231031).
References (24)
- et al. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr. Biol. (1994)
- et al. DNA fingerprinting of Lactobacillus crispatus strain CTV-05 by repetitive element sequence-based PCR analysis in a pilot study of vaginal colonization. J. Clin. Microbiol. (2003)
- Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. (1999)
- et al. Microsatellite polymorphism in the promoter sequence of the elongation factor 3 gene of Candida albicans as the basis for a typing system. J. Clin. Microbiol. (1997)
- et al. Characterization of the hca cluster encoding the dioxygenolytic pathway for initial catabolism of 3-phenylpropionic acid in Escherichia coli K-12. J. Bacteriol. (1998)
- et al. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis (1997)
- Genetic basis of virulence in Shigella species. Microbiol. Rev. (1991)
- et al. Molecular switches—the ON and OFF of bacterial phase variation. Mol. Microbiol. (1999)
- et al. DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc. Natl. Acad. Sci. U. S. A. (1996)
- et al. DNA fingerprints and analysis of multiple markers in human pedigrees. Am. J. Hum. Genet. (1986)
- Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res.
- Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J. Bacteriol.
- ☆
All of the nucleotide sequences reported in this study have been submitted to the GenBank data library under accession numbers AY282807–AY282921.
- 1
Present address: 1 Bungtown Road, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.