Elsevier

Gene

Volume 322, 11 December 2003, Pages 85-92

Identification and characterization of simple sequence repeats in the genomes of Shigella species

https://doi.org/10.1016/j.gene.2003.09.017

Abstract

A variety of simple sequence repeats (SSRs) have been identified in the genome of Shigella flexneri serotype 2a (strain Sf301), an enteric pathogen that causes bacillary dysentery in humans. The distribution of SSRs, with unit lengths ranging from 1 to 9 nucleotides, was biased among different regions of the genome. The tri-, tetra- and hexanucleotide SSRs prevailed in the coding regions, while the mono- and dinucleotide SSRs were more common in the noncoding regions. Many intergenic SSRs are less than 30 bp away from the downstream open reading frames (ORFs), suggesting a potential role in transcriptional regulation. To study SSR polymorphism, we compared 17 coding-region SSRs from strain Sf301 with the corresponding sequences from 23 other strains of four Shigella species. Five chromosomal loci were found to be polymorphic, of which those from S. flexneri strains were the most variable. Particularly interesting is the C5-1 locus in the coding sequence of the hcaD gene, which encodes a subunit of ferredoxin reductase. Depending on the number of copies of the unit sequence (CGCAG) inserted, the Shigella hcaD genes can encode truncated products, due to premature stop codons or frame shifts, or products with extended core alpha helices that lead to radical alterations in the predicted tertiary structure. Hence, SSRs may serve as genotyping markers for epidemiological investigations, and may offer insights into the evolutionary adaptation of the pathogens.

Introduction

Gram-negative, facultative anaerobes of the genus Shigella are the principal cause of bacillary dysentery. Shigella organisms invade and replicate in cells lining the colon and rectum and cause mucosal ulceration. The disease is characterized by lower abdominal cramps and tenesmus, with abundant blood and pus in the stool of patients (Sansonetti, 2001). In China, more than a million cases of shigellosis have been reported annually (Mei et al., 1989). The genus is divided into four serogroups with multiple serotypes: A (Shigella dysenteriae, 12 serotypes); B (Shigella flexneri, 6 serotypes); C (Shigella boydii, 18 serotypes) and D (Shigella sonnei, 1 serotype) (Hale, 1991). Rapid detection of Shigella strains has important clinical and epidemiological significance.

Simple sequence repeats (SSRs), or microsatellites, are DNA regions in which a motif of one or more bases is tandemly repeated up to dozens of times. SSRs have been extensively studied in eukaryotic genomes and are now well-established targets for pedigree analysis (Jeffreys et al., 1986). A subset of SSRs, the trinucleotide repeats (TNRs), is responsible for many genetic disorders through abnormal expansion of polymorphic repeats (Sasaki et al., 1996). In prokaryotes, enterobacterial repetitive intergenic consensus (ERIC) sequences have been used in phylogenetic and taxonomic analysis as well as in determining the species composition of mixed bifidobacterial cultures (Ventura et al., 2003), and repetitive extragenic palindromic (REP) sequences have been applied to detecting vaginal colonization by Lactobacillus crispatus (Antonio and Hillier, 2003). Furthermore, SSRs are informative markers for the identification of pathogenic bacteria, and may serve as indicators of the adaptation of pathogens to in vivo and ex vivo environments (Keim et al., 2000; van Belkum et al., 1997b).

Occurrences of SSRs in a genome sequence give a snapshot of the repeats accumulated in vivo and reflect the basal level of SSR dynamics in that genome. Moreover, variability in the number of repeat units at a given genomic site, i.e. sequence heterogeneity among individual strains, can be used to assess intra-species diversity.

In this study we screened the entire genome sequence of S. flexneri 2a for the presence and composition of SSRs. We then used locus-specific PCR primers to analyze the allelic polymorphisms of 17 SSR loci from different serotypes of Shigella. This revealed striking differences between genomic regions, and different degrees of variability among the four Shigella serogroups. Interestingly, one hypervariable coding-region SSR locus, C5-1, can affect either the translation or the tertiary structure of the gene product.
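The effect of a coding-region SSR on the reading frame comes down to simple arithmetic: because the C5-1 unit (CGCAG) is 5 nucleotides long, a change in copy number keeps the reading frame only when the total length change is a multiple of 3. The sketch below illustrates this; the function name and the repeat numbers in the usage note are hypothetical and are not taken from the study.

```python
UNIT = "CGCAG"  # repeat unit reported for the C5-1 locus in hcaD


def frame_effect(ref_units, alt_units, unit=UNIT):
    """Classify a copy-number change of a coding-region SSR.

    ref_units / alt_units are hypothetical repeat numbers in a reference
    allele and a variant allele. Returns (length change in nt, label).
    """
    delta_nt = (alt_units - ref_units) * len(unit)
    # A length change that is a multiple of 3 preserves the reading frame.
    label = "in-frame" if delta_nt % 3 == 0 else "frameshift"
    return delta_nt, label
```

For example, `frame_effect(3, 4)` gives a 5-nt change and a frameshift, whereas `frame_effect(3, 6)` gives a 15-nt change that stays in frame and would add five codons. An in-frame change can still introduce a stop codon depending on the sequence context, which this sketch does not check.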

Computerized search for SSRs in S. flexneri genome

The whole genome sequence of S. flexneri 2a (GenBank accession number AE005674) was analyzed for potential SSRs with Tandem Repeats Finder (Benson, 1999; http://tandem.biomath.mssm.edu/trf.html). Perl scripts were developed to analyze statistically the relationship between the SSRs and the ORFs annotated in GenBank (ftp://ftp.chgb.org.cn/pub/). The minimal repeat number was empirically set to three in order to exclude stochastic occurrences.
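As an illustration of the search criteria described above (unit length of 1 to 9 nucleotides, at least three repeats), the following is a minimal Python sketch that finds perfect tandem repeats with a regular expression. It is not Tandem Repeats Finder and not the authors' Perl scripts: it detects only exact repeats, and the function name and parameters are assumptions made for this example.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=9, min_repeats=3):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    Hypothetical helper: exact-match search only, unlike the
    probabilistic model used by Tandem Repeats Finder.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of unit_len bases followed by at least min_repeats - 1 copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1)
        )
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "AA").
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits
```

Called on a short made-up sequence such as `"TTGACGCAGCGCAGCGCAGTTAAAAAGT"`, the function reports a mononucleotide A run of five and three copies of CGCAG, with 0-based start positions. Overlapping coordinates from such a scan could then be compared with ORF annotations to separate coding from noncoding SSRs.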

Shigella strains, growth conditions and DNA purification

The S. flexneri 2a strain Sf301 was originally

Screening SSRs in S. flexneri 2a genome

Using a computer-based screen of the newly determined genome sequence of Sf301 (Jin et al., 2002), we found large numbers of SSRs scattered throughout the genome (Table 1). SSRs with unit lengths of 1–9 bp total 227,867 in the chromosome and 12,004 in the plasmid, comprising 17.5% of the chromosome and 19.8% of the plasmid. There are no SSRs consisting of seven or eight nucleotides in the chromosome, and no plasmid-borne SSRs consisting of a unit length that

Genomic distribution of SSRs

Among the SSRs in the S. flexneri genome, the mono- and dinucleotide SSRs are the most common. In the chromosome, the proportion of these repeats found in coding regions decreases as the repeat number increases, whereas in the virulence plasmid no such pattern applies (Table 1). The trinucleotide SSRs are predominant in coding regions of both the chromosome and the plasmid regardless of the number of repeats. The hexanucleotide SSRs are present only in the chromosome and predominantly in the coding regions. In some

Acknowledgements

We thank Moqing Liu, Hong Liu and Fan Yang for their technical assistance, and Peadar O'Gaora and Kathy Smollett for their critical reading of the manuscript. This work is supported by the Chinese Academy of Sciences (grant no. KSCX2-2-07), and the National High Technology Development Program of China (grant no. 2002AA231031).

References (24)

  • E.R. Moxon et al. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr. Biol. (1994)
  • M.A.D. Antonio et al. DNA fingerprinting of Lactobacillus crispatus strain CTV-05 by repetitive element sequence-based PCR analysis in a pilot study of vaginal colonization. J. Clin. Microbiol. (2003)
  • G. Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. (1999)
  • S. Bretagne et al. Microsatellite polymorphism in the promoter sequence of the elongation factor 3 gene of Candida albicans as the basis for a typing system. J. Clin. Microbiol. (1997)
  • E. Diaz et al. Characterization of the hca cluster encoding the dioxygenolytic pathway for initial catabolism of 3-phenylpropionic acid in Escherichia coli K-12. J. Bacteriol. (1998)
  • N. Guex et al. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis (1997)
  • T.L. Hale. Genetic basis of virulence in Shigella species. Microbiol. Rev. (1991)
  • R. Henderson et al. Molecular switches: the ON and OFF of bacterial phase variation. Mol. Microbiol. (1999)
  • D. Hood et al. DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc. Natl. Acad. Sci. U. S. A. (1996)
  • A.J. Jeffreys et al. DNA fingerprints and analysis of multiple markers in human pedigrees. Am. J. Hum. Genet. (1986)
  • Q. Jin. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. (2002)
  • P. Keim et al. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J. Bacteriol. (2000)