iScience
Volume 25, Issue 6, 17 June 2022, 104461

Article
Genetic polyploid phasing from low-depth progeny samples

https://doi.org/10.1016/j.isci.2022.104461
Under a Creative Commons license
open access

Highlights

  • Allows phasing of autopolyploid species through genetic information of progenies

  • High number of low-depth progeny samples yields significant markers for phasing

  • Informative variant types (simplex-nulliplex) phasable with high confidence

  • Continuity not limited by read connectivity, but rather by the recombination rate
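To illustrate why simplex-nulliplex variants are informative, the following minimal sketch computes the expected progeny dosage distribution for a cross of two autopolyploid parents. It assumes a tetraploid, random sampling of half of each parent's haplotypes into a gamete, and no double reduction; the function names `gamete_dosage_dist` and `progeny_dosage_dist` are hypothetical and not part of the published software.

```python
from fractions import Fraction
from math import comb


def gamete_dosage_dist(parent_dosage, ploidy=4):
    """Probability that a gamete carries k copies of the alternative allele.

    Assumption: a gamete draws ploidy/2 of the parent's haplotypes uniformly
    at random (hypergeometric sampling, no double reduction).
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    dist = {}
    for k in range(half + 1):
        ways = comb(parent_dosage, k) * comb(ploidy - parent_dosage, half - k)
        if ways > 0:
            dist[k] = Fraction(ways, total)
    return dist


def progeny_dosage_dist(dosage_p1, dosage_p2, ploidy=4):
    """Progeny dosage distribution: sum of two independent gamete dosages."""
    g1 = gamete_dosage_dist(dosage_p1, ploidy)
    g2 = gamete_dosage_dist(dosage_p2, ploidy)
    dist = {}
    for a, pa in g1.items():
        for b, pb in g2.items():
            dist[a + b] = dist.get(a + b, 0) + pa * pb
    return dist


# Simplex (dosage 1) crossed with nulliplex (dosage 0).
print(progeny_dosage_dist(1, 0))
```

Under these assumptions, a simplex-nulliplex variant is present in half of the progeny and absent in the other half, so each progeny sample directly reports whether it inherited the single haplotype carrying the allele.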

Summary

An important challenge in genome assembly is haplotype phasing, that is, reconstructing the different haplotype sequences of an individual genome. Phasing becomes considerably more difficult with increasing ploidy, which makes polyploid phasing a notoriously hard computational problem. We present a novel genetic phasing method for plant breeding that aims to phase two deep-sequenced parental samples with the help of a large number of progeny samples sequenced at low depth. The key ideas underlying our approach are to (i) integrate the individually weak Mendelian progeny signals with a Bayesian log-likelihood model, (ii) cluster alleles according to their likelihood of co-occurrence, and (iii) assign them to haplotypes via an interval scheduling approach. We show on two deep-sequenced parental and 193 low-depth progeny potato samples that our approach computes high-quality sparse phasings and that it scales to whole genomes.
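The idea of integrating individually weak progeny signals can be sketched as a summed log-likelihood ratio. The example below is an illustration only and is not the model implemented in WhatsHap: it assumes a tetraploid parent, two simplex-nulliplex alleles, no recombination between the two sites, no double reduction, and a fixed per-read error rate. All names (`PLOIDY`, `ERR`, `read_likelihood`, `coupling_score`) are hypothetical. Each progeny sample contributes the log ratio of two hypotheses: the two alleles lie on the same parental haplotype (coupling) or on different ones (repulsion).

```python
from math import comb, log

PLOIDY = 4   # assumed tetraploid
ERR = 0.01   # assumed per-read allele error rate

# Joint prior over progeny dosages (site A, site B) for two simplex alleles
# of one parent, with 2 of its 4 haplotypes drawn at random per gamete.
COUPLING = {(1, 1): 1 / 2, (0, 0): 1 / 2}
REPULSION = {(1, 1): 1 / 6, (1, 0): 1 / 3, (0, 1): 1 / 3, (0, 0): 1 / 6}


def read_likelihood(alt, depth, dosage):
    """Binomial likelihood of `alt` alternative reads out of `depth` reads."""
    p = dosage / PLOIDY
    p = p * (1 - ERR) + (1 - p) * ERR  # fold in sequencing errors
    return comb(depth, alt) * p ** alt * (1 - p) ** (depth - alt)


def sample_likelihood(obs_a, obs_b, prior):
    """Marginalise one progeny sample over its unknown dosages."""
    return sum(
        weight * read_likelihood(*obs_a, da) * read_likelihood(*obs_b, db)
        for (da, db), weight in prior.items()
    )


def coupling_score(progeny):
    """Sum of per-sample log-likelihood ratios; > 0 favours coupling."""
    return sum(
        log(sample_likelihood(a, b, COUPLING))
        - log(sample_likelihood(a, b, REPULSION))
        for a, b in progeny
    )


# Each sample: ((alt reads, depth) at site A, (alt reads, depth) at site B).
together = [((2, 4), (1, 3)), ((0, 5), (0, 4))]
apart = [((2, 4), (0, 5)), ((0, 4), (1, 3))]
print(coupling_score(together), coupling_score(apart))
```

A single low-depth sample shifts the score only slightly, but the contributions add up over many progeny samples, which is why a large progeny panel can compensate for low per-sample depth. Scores of this kind could then serve as pairwise similarities for a clustering step.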

Subject areas

Bioinformatics
Genomics
Sequence analysis

Data and code availability

  • There are three separate sequencing data sets: short-read data for the two parental samples, short-read data for the progeny samples, and HiFi sequencing data for the parental samples. All three data sets have been uploaded to the NCBI Sequence Read Archive (SRA). Accession numbers are listed in the key resources table.

  • All algorithms are implemented as part of the WhatsHap phasing suite (Patterson et al., 2015). A snapshot of all original code as it was when the experiments were run has been deposited at Zenodo and is publicly available as of the date of publication. Instructions on how to run this code are included. Existing software that is either used directly by the phasing algorithm or contributed significantly to the data processing comprises Hifiasm, Snakemake, and PuLP. References to these tools are listed in the key resources table.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request. Except for read mapping and initial variant calling, all experiments were run through a Snakemake pipeline, which has been deposited at Zenodo. To enable access to the input VCF files for the novel phasing algorithm, we uploaded these processed intermediate files under a separate DOI on Zenodo, which is listed in the key resources table.


9 These authors contributed equally

10 Lead contact