NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

A peer-reviewed article of this Preprint also exists.

A newer version of this Preprint is available.

Supplemental Information

Figure S1: Schematic overview of errors affecting metabarcoding data and clustering / denoising strategies to reduce them

Overview of the metabarcoding process, with key biases potentially affecting sequence accuracy (shown in red). In the bulk sample (A), several species with different biomass (indicated by circle size) and distinct haplotypes (indicated by colour) are present. After tissue homogenization and DNA extraction, the COI marker is amplified using PCR (B), which can not only skew sequence abundance but also fail to amplify taxa due to primer bias (Elbrecht & Leese, 2015); underrepresented or rare taxa can also be missed because of insufficient sequencing depth (Elbrecht, Peinert & Leese, 2017). During HTS (C), many false sequence variants are generated by sequencing errors (Schirmer et al., 2015), chimera formation (Edgar et al., 2011) and mixing of multiplexed samples (Esling, Lejzerowicz & Pawlowski, 2015; Schnell, Bohmann & Gilbert, 2015). The impact of these errors is usually reduced by strict quality filtering and by clustering similar sequences into operational taxonomic units (OTUs). Normally, only the most abundant sequence in an OTU is considered and used to identify the respective species (D), which in turn means that information on genetic diversity is lost (Callahan, McMurdie & Holmes, 2017). Recently, alternative denoising strategies have been developed to remove sequences affected by errors from a dataset and retain the actual haplotype sequences present in a sample (Eren et al., 2015; Edgar & Flyvbjerg, 2015; Callahan et al., 2016; Amir et al., 2017). Figure based on Figure S1 in Callahan et al. (2016).

DOI: 10.7287/peerj.preprints.3269v2/supp-1

Figure S2: Overview of the haplotyping strategy used here and its implementation in the JAMP R package

Detailed bioinformatic processing of metabarcoding data to extract haplotype sequences using the JAMP R package. A) Metabarcoding raw data are processed and quality filtered. These steps are integrated in JAMP, but most other standard metabarcoding pipelines could be used as well. B) The processed and quality-filtered samples from step A would usually be clustered into operational taxonomic units (OTUs), but here they are additionally filtered (retaining only reads of the expected amplicon length and discarding reads of low abundance) and then denoised. C) When denoising with usearch unoise3, strictness is controlled by the alpha value (a lower alpha leaves less noise but discards more true haplotypes). D) The denoised reads (= haplotypes) are clustered into OTUs by similarity, and the abundance of each haplotype in each sample is exported as a table. E) The haplotype table is additionally filtered using different thresholds to reduce the presence of low-abundance OTUs and haplotypes and to increase data reliability. F) The final filtered haplotype table can be used for phylogeographic and population genetic analyses.
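The role of alpha in step C can be illustrated with a minimal sketch. It assumes the abundance-skew rule given in the UNOISE documentation, beta(d) = 1 / 2^(alpha * d + 1), where d is the number of differences to a more abundant sequence; this rule is not stated in the caption. The function names and read counts below are hypothetical and are not part of JAMP or usearch.

```python
def max_error_skew(distance, alpha=2.0):
    """Largest abundance ratio at which a variant is still treated as noise.

    Assumes the UNOISE rule beta(d) = 1 / 2 ** (alpha * d + 1).
    """
    return 1.0 / 2 ** (alpha * distance + 1)


def is_probable_error(abundance, centroid_abundance, distance, alpha=2.0):
    """True if a variant is rare enough, relative to a similar abundant sequence, to count as noise."""
    skew = abundance / centroid_abundance
    return skew <= max_error_skew(distance, alpha)


# Hypothetical variant: 50 reads, one mismatch away from a haplotype with 1000 reads.
for alpha in (2.0, 5.0):
    verdict = is_probable_error(50, 1000, distance=1, alpha=alpha)
    print(f"alpha={alpha}: treated as noise = {verdict}")
```

Under these assumptions the same variant is discarded at alpha = 2 but retained at alpha = 5, which matches the caption's statement that lower alpha values denoise more strictly at the cost of losing true haplotypes.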

DOI: 10.7287/peerj.preprints.3269v2/supp-2

Figure S3: Effect of different quality filtering (max ee) on reads of the single-species mock sample

Effect of different expected error filtering thresholds on haplotype recovery (no denoising applied). All filtered reads are mapped against the expected haplotypes (black circles). Not all reads are shared between both replicates (indicated by A or B instead of a circle). The 15 expected haplotypes are shown in black, while unexpected ones are highlighted in gray or blue. Error bars show the standard deviation of relative read abundance between both replicates, for the respective haplotype.
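For readers unfamiliar with expected error filtering, the following sketch shows the general idea, assuming the standard definition in which a read's expected error count is the sum of its per-base error probabilities, P = 10^(-Q/10), derived from Phred quality scores. The function names and example reads are hypothetical and do not reproduce the JAMP or usearch implementation.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(phred_scores, max_ee):
    """Keep a read only if its expected error count does not exceed max_ee."""
    return expected_errors(phred_scores) <= max_ee


# Hypothetical reads of ten bases each, given as Phred quality scores.
high_quality = [30] * 10  # error probability 0.001 per base
low_quality = [10] * 10   # error probability 0.1 per base

for name, read in (("high_quality", high_quality), ("low_quality", low_quality)):
    print(name, "kept at max ee 0.5:", passes_max_ee(read, max_ee=0.5))
```

A stricter (lower) max ee threshold removes more reads, including some that are actually error-free, which is the trade-off the figure explores.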

DOI: 10.7287/peerj.preprints.3269v2/supp-3

Figure S4: Effect of different alpha values in read denoising of the single-species mock sample

Haplotype recovery in the single-species mock sample when using different alpha values with Unoise3 (as integrated in the JAMP package). Not all reads are shared between both replicates (indicated by A or B instead of a circle). The 15 expected haplotypes are shown in black, while unexpected ones are highlighted in gray or blue. Error bars show the standard deviation of relative read abundance between both replicates, for the respective haplotype.

DOI: 10.7287/peerj.preprints.3269v2/supp-4

Figure S5: Bar plots of haplotype distribution within each OTU

Bar plots showing the haplotype composition of all 199 OTUs obtained with the BF2+BR2 primer combination. The OTU number is indicated above each bar, with the four taxa shown in Figure 2 being highlighted. Haplotypes are shown in different colours, with white bars indicating the proportion of sites where the respective OTU was not detected. Most OTUs were only present at a few sample sites.

DOI: 10.7287/peerj.preprints.3269v2/supp-5

Figure S6: Detailed plots of four taxa from the denoised multi-species monitoring samples, showing haplotype maps & networks, similarity between replicates and sequence alignments

Detailed haplotype maps, networks and sequence alignments for all four primer combinations and replicates of selected taxa. a) Haplotype maps for both replicates for each of the four primer combinations. For A. aquaticus, only the 10 most common haplotypes are shown in different colours (remaining ones in white). For each primer combination, the haplotypes in the map and network have the same corresponding colours. b) Haplotype networks for each primer pair. Each cross line represents one base pair difference between the respective haplotypes. Haplotypes present in just one replicate are indicated by A or B next to the network node. Dashed lines around a circle indicate novel haplotypes that were not available in the BOLD reference database. c) Quantification of similarity between both replicates, obtained by plotting the abundances of individual haplotypes at each sampling point against each other. The red line indicates the best fit (with significance and adjusted R² value given in each plot). d) Sequence alignment of all haplotypes, with mismatching nucleotides between sequences highlighted (green = T, red = A, yellow = G and blue = C). See the following pages for example plots. Page 2: Taeniopteryx nebulosa; Page 3: Hydropsyche pellucidula; Page 4: Oulimnius tuberculatus; Page 5: Asellus aquaticus.
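The replicate comparison in panel c can be sketched as follows, assuming the best fit is an ordinary least-squares line with one predictor (the caption does not specify the fitting method). The function name and the abundance values are hypothetical.

```python
def replicate_fit(x, y):
    """Least-squares line y = intercept + slope * x, with R^2 and adjusted R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 for one predictor: penalises R^2 by the degrees of freedom used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2


# Hypothetical relative abundances of one haplotype at four sites, replicates A and B.
replicate_a = [0.0, 1.0, 2.0, 3.0]
replicate_b = [0.0, 1.0, 1.0, 2.0]
slope, intercept, r2, adj_r2 = replicate_fit(replicate_a, replicate_b)
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.2f} adjusted R2={adj_r2:.2f}")
```

An adjusted R² close to 1 would indicate that haplotype abundances are highly reproducible between the two replicates.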

DOI: 10.7287/peerj.preprints.3269v2/supp-6

Table S1: Finland haplotype table (for all 4 different primer combinations)

DOI: 10.7287/peerj.preprints.3269v2/supp-7

Table S1: Finland haplotype table (for all four different primer combinations)

DOI: 10.7287/peerj.preprints.3269v2/supp-8

Manuscript work file for providing feedback

Please use track changes / comment functions. Thank you for your feedback.

DOI: 10.7287/peerj.preprints.3269v2/supp-9

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Vasco Elbrecht conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.

Ecaterina Edith Vamos conceived and designed the experiments, performed the experiments, analyzed the data, authored or reviewed drafts of the paper, approved the final draft.

Dirk Steinke conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

Florian Leese conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

Data Deposition

The following information was supplied regarding data availability:

Unprocessed raw sequence data are available from previous studies in the NCBI Sequence Read Archive (SRA). Single-species mock sample: SRR5295658 and SRR5295659 (Vamos, Elbrecht & Leese, 2017); monitoring samples: SRR4112287 (Elbrecht et al., 2017). The JAMP R package is available on GitHub (github.com/VascoElbrecht/JAMP), and the R scripts used (Script S1) and full haplotype tables (Table S1) are available as supporting information.

Funding

This study is part of the European Cooperation in Science and Technology (COST) Action DNAqua-Net (CA15219). D.S. and V.E. were supported by the Canada First Research Excellence Fund for the Food from Thought initiative. E.E.V. was supported by a grant from the Bodnarescu Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

