Reference isolates
Representatives of all S. sonnei subclades, including 120 reference isolates from a global genotyping framework,14 were included in the cgMLST tree in this study to contextualise the UK outbreak.
Isolates and data from the United Kingdom (UKHSA)
All S. sonnei isolates with an assigned SNP address in England (97%, n=2,895 of 2,982 submitted) received by the United Kingdom national reference laboratory (Gastrointestinal Bacteria Reference Unit) between 01/01/2016 and 30/12/2021 were included in this study. Data on patient age were available for all isolates, data on patient sex for 99% (n=2,875/2,895), and explicitly declared information regarding the presence or absence of travel for approximately half (n=1,487/2,895, 51%). In total, 429 isolates were from patients who reported recent travel to Asia, 289 to Africa, 73 to Europe, 121 to North America, 6 to Oceania, 38 to South America, and 73 to an unknown location. There were 458 isolates from patients where ‘no travel’ was explicitly indicated and 1,408 isolates for which recorded information indicating either an absence or presence of recent travel was missing. As individual patient data on sexual behaviour were not available, possible MSM-associated isolates were classified by proxy as those deriving from men aged 16 years or older without a history of recent international travel, in line with a previous validation of this approach in this setting.26
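The proxy rule above can be written as a small predicate. The following is a minimal sketch, not the authors' code: the function name, argument names, and the `treat_missing_travel_as_no_travel` switch are hypothetical, and the text does not state how isolates with missing travel information were handled, so that behaviour is left as an explicit choice.

```python
from typing import Optional

MIN_AGE_YEARS = 16  # age threshold stated in the text


def is_possible_msm(
    sex: str,
    age_years: int,
    recent_travel: Optional[bool],
    treat_missing_travel_as_no_travel: bool = True,
) -> bool:
    """Proxy classification: man aged >= 16 without recent international travel.

    recent_travel is True/False when explicitly recorded and None when missing.
    The handling of missing travel data is NOT specified in the text; the
    treat_missing_travel_as_no_travel switch is an assumption of this sketch.
    """
    if sex.strip().lower() != "male":
        return False
    if age_years < MIN_AGE_YEARS:
        return False
    if recent_travel is None:
        return treat_missing_travel_as_no_travel
    return not recent_travel
```

Running the classification under both settings of the switch is one way to check how sensitive downstream proportions are to the missing travel data.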
All isolates were sequenced (Illumina) according to previously described protocols.27 Isolates were assigned hierarchical genomic subtypes, called SNP addresses, according to in-house subtyping methods.28 The SNP address is a linkage-clustering system in which the cluster numbers assigned at the 250, 100, 50, 25, 10, 5, and 0 SNP thresholds are joined by periods in order of decreasing threshold. The SNP address of relevance here is 1.1.1.1.377. This lineage became XDR and the outbreak was investigated at the level of the 10-SNP threshold, so the 1.1.1.1.377 SNP linkage cluster is referred to here as t10.377 for consistency with the initial report.25 Among the 2,895 S. sonnei isolates in this study, 483 belonged to SNP linkage cluster t10.377. Additionally, eight plasmid sequences from UK isolates (see accessions in Supplementary Table 1), generated from hybrid assemblies using Oxford Nanopore Technologies sequencing as described previously,29 were used in plasmid comparisons.
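To make the SNP address notation concrete, the sketch below splits an address into its per-threshold cluster numbers and derives a label such as t10.377. It is an illustration only: the function names are hypothetical, and the seven-field address in the usage note is invented for the example.

```python
# SNP thresholds in the order the fields appear in an address (from the text).
SNP_THRESHOLDS = (250, 100, 50, 25, 10, 5, 0)


def parse_snp_address(address: str) -> dict:
    """Map each SNP threshold to its cluster number.

    Truncated addresses (for example the five-field 1.1.1.1.377) are allowed
    and simply cover the coarsest thresholds.
    """
    fields = [int(field) for field in address.split(".")]
    if len(fields) > len(SNP_THRESHOLDS):
        raise ValueError(f"too many fields in SNP address: {address!r}")
    return dict(zip(SNP_THRESHOLDS, fields))


def cluster_label(address: str, threshold: int = 10) -> str:
    """Return a label such as 't10.377' for the cluster at one threshold."""
    clusters = parse_snp_address(address)
    if threshold not in clusters:
        raise KeyError(f"address {address!r} has no field for t{threshold}")
    return f"t{threshold}.{clusters[threshold]}"


def shares_cluster(address_a: str, address_b: str, threshold: int = 10) -> bool:
    """True if two addresses agree on every field down to the given threshold."""
    depth = SNP_THRESHOLDS.index(threshold) + 1
    fields_a = address_a.split(".")[:depth]
    fields_b = address_b.split(".")[:depth]
    return len(fields_a) == depth and fields_a == fields_b
```

For example, `cluster_label("1.1.1.1.377")` gives `t10.377`, and a hypothetical full address such as `1.1.1.1.377.402.515` shares the 10-SNP cluster with it under `shares_cluster`.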
To coordinate the exploration of international links to the UK outbreak, we employed a recently described global genotyping framework.14 This indicated that the UK-based SNP linkage cluster t10.377 approximated the international 3.6.1.1.2 genotype of S. sonnei (96% concordance statistic, Supplementary Table 1). The human-readable nomenclature for this genotype is CipR.MSM5 due to its known ciprofloxacin resistance and previous description in MSM as ‘Clade 5’.6
International outbreak associated isolates
CipR.MSM5 S. sonnei genotype isolates, with or without blaCTX-M-27, from public health surveillance organisations in other countries were integrated into the analysis (based on data volume and sharing arrangements). A brief overview for each country and aggregate data on patient age and gender, which were consistent with isolates deriving from MSM-associated transmission, are provided here.
Belgium: All S. sonnei confirmed as CipR.MSM5 isolated at Sciensano’s reference laboratory between January 2017 and March 2022 and with sequence data were included. All (n=112) isolates came from patients aged up to 84 years who were mostly male (n=101/112, 90%). Sequence data was obtained using the following protocol: isolates were cultured overnight in BHI broth (BD) at 37°C. DNA was extracted using an MgC Bacterial DNA Kit™ with a 60 μL elution volume (Atrida, Amersfoort, The Netherlands), following the manufacturer’s instructions. Sequencing libraries were constructed using the Illumina Nextera XT DNA sample preparation kit and sequenced on an Illumina MiSeq instrument with a 250 bp paired‐end protocol (MiSeq v3 chemistry), according to the manufacturer’s instructions (Illumina, San Diego, CA, USA).
France: All CipR.MSM5 isolates that contained blaCTX-M-27 (n=101) isolated at the Enteric Bacterial Pathogens Unit at the Institut Pasteur between 01/09/2020 and 15/02/2022 were included in this study. The isolates came from patients aged 13 to 68 years and the majority were male (n=98/101, 97%). These were sequenced using the following protocol: genomic DNA was extracted with the MagNA Pure DNA isolation kit (Roche Molecular Systems, Indianapolis, IN, USA), in accordance with the manufacturer’s instructions. Whole-genome sequencing was performed at the Mutualized Platform for Microbiology (P2M) at Institut Pasteur, Paris. The libraries were prepared with the Nextera XT kit (Illumina, San Diego, CA, USA) and sequencing was performed with the NextSeq 500 system.
The sequence of a single plasmid (p202008564-6) from isolate 202008564 was obtained. Genomic DNA was prepared as follows: the isolates were cultured overnight at 37°C on alkaline nutrient agar (20 g casein meat peptone E2 from Organotechnie; 5 g sodium chloride from Sigma; 15 g Bacto agar from Difco; distilled water to 1 L; adjusted to pH 8.4; autoclaved at 121°C for 15 min). A few isolated colonies from the overnight culture were used to inoculate 20 mL of Brain-Heart-Infusion (BHI) broth and were cultured at 37°C with shaking (200 rpm; Thermo Scientific MaxQ 6800) until a final OD600 of 0.8 was reached. Bacterial cells were harvested by centrifugation and the DNA extraction was performed by two different methods. For isolate 202008118, we followed the protocol described by von Mentzer et al.,30 except that MaXtract High Density columns (Qiagen) were used (instead of phase lock tubes) and DNA was resuspended in molecular biology grade water (instead of 10 mM Tris pH 8.0). We used a Genomic-tip 100/G column (Qiagen) according to the manufacturer’s protocol. The library was prepared according to the instructions of the Native barcoding genomic DNA (with EXP-NBD104, EXP-NBD114, and SQK-LSK109) procedure provided by Oxford Nanopore Technologies. Sequencing was then performed on a MinION Mk1C apparatus (Oxford Nanopore Technologies). The genomic sequences of the isolates were assembled from long and short reads with a hybrid approach using Unicycler (v 0.4.8).31 A polishing step was performed with Pilon (v 1.23) to generate a high-quality plasmid sequence,32 p202008564-6 (GenBank accession no. OP038290).
United States of America: Data for all publicly available CipR.MSM5 isolates that contained a gene in the blaCTX-M family (n=31), with associated age and gender data, were kindly provided by PulseNet, Centers for Disease Control and Prevention. Aggregate metadata showed that these isolates came from patients aged between 20 and 79 years who were predominantly male (n=30/31, 97%).
Australia: From the state of Victoria, CipR.MSM5 blaCTX-M-27 isolates (n=3) from routine health surveillance (Microbiological Diagnostic Unit, Public Health Laboratory, Doherty Institute) were analysed, two of which have been described previously (AUSMDU00040532 and AUSMDU00040564).22 All three were from male patients aged between 16 and 60 years. From the state of New South Wales, all CipR.MSM5 isolates (n=232) collected during routine surveillance between October 2017 and November 2020 were included. Isolates were from patients aged between 19 and 83 years (mean = 40 years) who were mainly male (n=224/232, 97%). Genomic DNA was extracted from pure cultures using a QIAGEN DNeasy Blood and Tissue Mini Kit (QIAGEN). Sequencing libraries were prepared using the Nextera XT DNA Library Prep Kit (Illumina) and sequenced on a NextSeq 500 at the Microbial Genomics Reference Laboratory, Westmead Hospital. DNA for long-read sequencing was extracted from isolates growing on Horse Blood agar (Thermofisher) using the DNeasy® UltraClean® Microbial Kit (Qiagen) with mechanical lysis reduced to 2 min. DNA quality and quantity were assessed on a Nano-300 Micro-Spectrophotometer (Allsheng) and a Qubit™ 2 Fluorometer (Life Technologies), while integrity was assessed by 0.6% (w/v) agarose gel electrophoresis. Libraries were prepared using the SQK-RBK004 Rapid barcoding kit (Oxford Nanopore Technologies) according to the manufacturer’s instructions, loaded into an R9.4.1 flow cell, and sequenced on a MinION Mk1B (Oxford Nanopore Technologies). Sequencing was performed for up to 24 h and base-calling was performed post sequencing using Guppy (v 3.4.5+f1fbfb). Hybrid assembly was performed using Unicycler (v 0.4.8).31 Assemblies were further polished with Pilon (v 1.23),32 using BAM files generated by Minimap2 (v 2.17-r941),33 until no further changes could be made. Assemblies were automatically annotated using Prokka (v 1.14.6),34 run in metagenome mode with genetic code 11.
Bioinformatic Analyses
Clustering dendrogram of UK isolates
A dendrogram was constructed for the UK isolates available on Enterobase (n=2,820/2,895) and the global context representatives (n=120) using the NINJA NJ tree algorithm on the cgMLST hierarchical clustering results from the algorithm implemented in Enterobase.35,36 The tree was visualised using the Interactive Tree of Life (iTOL, v 5).37
Phylogenetic reconstruction of the international outbreak
To explore international relationships, we constructed a tree of CipR.MSM5 international isolates (n=475) alongside UK isolates belonging to the UK outbreak cluster (n=468, t10.377, see methods).25 Illumina paired-end reads for the isolates from the United Kingdom were downloaded from the Sequence Read Archive (SRA) using fastq-dump (SRA toolkit v 2.11.0)38 with default settings, except that the ‘--split-3’ parameter was used to automatically separate paired-end read data into two separate FASTQ files. Reads were trimmed using Trimmomatic (v 0.39)39 with the parameters LEADING:20, TRAILING:20, SLIDINGWINDOW:4:20, and MINLEN:40, and quality checked with MultiQC (v 1.12).40 Burrows-Wheeler Aligner (BWA v 0.7.17) was used with default parameters to align the S. sonnei reads to Shigella sonnei 53G as the reference genome (NCBI accession number: GCA_000283715.1_ASM28371v1).41 PICARD (v 2.27.2)42 was used with default parameters to mark and remove artificial duplicates from the alignments of the S. sonnei isolates. SAMTOOLS (v 1.11) was used to index the files.43 BCFTOOLS (v 1.9)44 was used to call and filter variants, using default parameters with ‘-c --ploidy 1 -O -o’ for variant calling and ‘--SnpGap 15 --IndelGap 15’ for filtering.45 A chromosomal pseudogenome was created for all isolates, and prophages and plasmids were masked using sed (v 4.2.2).46 Gubbins (v 3.2.1)47 was then used to remove remaining regions of recombination using default parameters except ‘-c 30, -f 60’. Multiple sequence alignments were then used to 1) infer a phylogenetic tree using IQ-TREE (v 2.2.0.3)48 with the default parameters except ‘-ntmax 25 -bb 1000 -m GTR+I+R+ASC’, and 2) determine population clusters using Rhier Bayesian Analysis of Genetic Population Structure (RhierBAPS v 1.01).49,50
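The masking step can be illustrated with a short sketch. The study performed this step with sed; the Python function below is a swapped-in illustration, not the authors' command, and the function name and coordinates are hypothetical. Replacing masked positions with N rather than deleting them keeps every pseudogenome the same length, so alignment columns stay in register across isolates.

```python
def mask_regions(sequence: str, regions, mask_char: str = "N") -> str:
    """Replace each region of a pseudogenome with a masking character.

    regions is an iterable of (start, end) pairs using 1-based, inclusive
    coordinates on the reference genome. Overlapping regions are allowed.
    """
    bases = list(sequence)
    for start, end in regions:
        if start < 1 or end > len(bases) or start > end:
            raise ValueError(f"invalid region: ({start}, {end})")
        # Overwrite in place so the sequence length never changes.
        bases[start - 1:end] = mask_char * (end - start + 1)
    return "".join(bases)
```

Called on a toy sequence, `mask_regions("ACGTACGTAC", [(3, 5)])` returns a string of the same length with positions 3 to 5 replaced by N.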
Genome assembly and identifying genes of interest
Draft genomes were assembled using Unicycler (v 0.5.0),31 using the ‘forward paired’, ‘reverse paired’, ‘forward unpaired’, and ‘reverse unpaired’ trimmed and quality-controlled FASTQ files. Draft genome assemblies were checked for quality using QUAST (v 5.02).51 Default parameters were used except for ‘min_fasta_length 200, --vcf’. Draft genomes were then interrogated for AMR genes using NCBI AMRFinderPlus (v 3.10.24)52 with the organism set as Escherichia and the ‘--plus’ parameter to obtain information on stress genes. The SonneiTyping script (v 20210201)53 embedded in Mykrobe (v 0.11.0)54 was used for genotyping and retrieval of QRDR mutations in gyrA and parC to infer ciprofloxacin resistance.55
Statistical testing
Standard proportions were calculated, and statistical support for phylogenetic analyses was evaluated using chi-square testing. Plasmid fitness experiments were evaluated by two-sample t-test.
Plasmid analysis
The eight UK IncFII plasmids were re-orientated to start with the repA gene using Circlator fixstart (v1.5.5),56 followed by annotation using Prokka (v1.14.6).34 Plasmids (GBK files) were compared (>95% nucleotide similarity) and visualised using Clinker (v0.0.21).57 For a broader search, the sequence of p893816 was queried against the COBS index58 of 661,405 curated draft genomes59 with a k-mer similarity cut-off of 80%. Genomes that were high quality and had a species ID of 0.8 or higher (n=1,505) were included in the distribution figure. Sequence types that are represented 10 or more times are named, while rare sequence types are collapsed into the category of “other” within the respective species. A BRIG (v0.95)60 plot was constructed using default settings with p893816 (GenBank accession MW396858) as the comparator plasmid.
Antimicrobial Susceptibility Testing
To confirm resistances inferred from sequence data, 14 UK isolates with representative genotypic AMR profiles underwent antimicrobial susceptibility testing using Liofilchem® gradient MIC Test Strips (MTS) according to the manufacturer’s instructions to determine the minimum inhibitory concentrations (MICs). The following antimicrobials and MIC ranges (all in µg/mL) were used: ciprofloxacin (0.002 – 32), ertapenem (0.002 – 32), mecillinam (0.016 – 256), azithromycin (0.016 – 256), ceftriaxone (0.002 – 32), trimethoprim-sulfamethoxazole (0.002 – 32), gentamicin (0.016 – 256). Results were interpreted as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST).61
Plasmid fitness costs
The p893816-like plasmid from S. sonnei clinical isolate 1538717 (Nanopore SRA, SRR18254055) was conjugally transferred into E. coli MG1655 carrying a chromosomally encoded and constitutively expressed GFP, and the resulting strain was used in the relative fitness assay. E. coli MG1655 carrying pKSR100 or pAPR100 were from isolates obtained from the UKHSA, as previously described.19 E. coli MG1655 and derivatives that each carry a different plasmid were inoculated into M9 minimal media supplemented with thiamine (10 µg/mL) and grown overnight with shaking at 220 rpm at 37°C. These pre-cultures were adjusted to OD600 0.05 in a total volume of 1 mL, and 150 µL aliquots were inoculated into individual wells of a 96-well plate (Greiner Bio One, UK). The 96-well plate was then incubated with shaking at 37°C in a Synergy H1 multi-mode plate reader (BioTek Instruments), taking optical density measurements at 600 nm every 15 min. Data were analysed using the R package Growthcurver, and the Area Under the Curve (AUC) output was used to calculate the relative fitness of each strain carrying the plasmid compared with the plasmid-free strain.62 Each experiment consisted of three technical replicates and was repeated three times.
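The AUC-based comparison can be sketched as follows. The text does not give the formula for relative fitness; this sketch assumes the common convention of dividing the AUC of the plasmid-carrying strain by the AUC of the plasmid-free strain. The study used the R package Growthcurver; the Python functions below are a hypothetical re-expression using a plain trapezoidal AUC, with invented names and toy values.

```python
def trapezoid_auc(times_min, od600):
    """Empirical area under a growth curve by the trapezoidal rule."""
    if len(times_min) != len(od600) or len(times_min) < 2:
        raise ValueError("need matching time and OD series with >= 2 points")
    area = 0.0
    for i in range(1, len(times_min)):
        width = times_min[i] - times_min[i - 1]
        area += width * (od600[i] + od600[i - 1]) / 2.0
    return area


def relative_fitness(auc_plasmid: float, auc_plasmid_free: float) -> float:
    """Assumed definition: AUC ratio of plasmid-carrying to plasmid-free strain.

    A value below 1 would indicate a fitness cost under this convention.
    """
    if auc_plasmid_free <= 0:
        raise ValueError("plasmid-free AUC must be positive")
    return auc_plasmid / auc_plasmid_free


# Toy readings at the 15 min interval described in the text (values invented).
times = [0, 15, 30]
plasmid_free_od = [0.0, 1.0, 2.0]
plasmid_od = [0.0, 0.5, 1.0]
fitness = relative_fitness(
    trapezoid_auc(times, plasmid_od),
    trapezoid_auc(times, plasmid_free_od),
)
```

In practice the ratio would be computed per replicate and the replicate values then compared between strains, for instance with the two-sample t-test named under Statistical testing.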