Genetic polyploid phasing from low-depth progeny samples

doi:10.1016/j.isci.2022.104461

iScience

Volume 25, Issue 6, 17 June 2022, 104461

https://doi.org/10.1016/j.isci.2022.104461

Under a Creative Commons license

open access

Highlights

• Allows phasing of autopolyploid species through genetic information of progenies
• High number of low-depth progeny samples yields significant markers for phasing
• Informative variant types (simplex-nulliplex) phasable with high confidence
• Continuity not limited by read connectivity, but rather by the recombination rate

Summary

An important challenge in genome assembly is haplotype phasing, that is, reconstructing the different haplotype sequences of an individual genome. Phasing becomes considerably more difficult with increasing ploidy, which makes polyploid phasing a notoriously hard computational problem. We present a novel genetic phasing method for plant breeding that aims to phase two deep-sequenced parental samples with the help of a large number of progeny samples sequenced at low depth. The key ideas underlying our approach are to (i) integrate the individually weak Mendelian progeny signals with a Bayesian log-likelihood model, (ii) cluster alleles according to their likelihood of co-occurrence, and (iii) assign them to haplotypes via an interval scheduling approach. We show on two deep-sequenced parental and 193 low-depth progeny potato samples that our approach computes high-quality sparse phasings and that it scales to whole genomes.
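
As a rough illustration of idea (i), the sketch below shows how weak per-progeny observations for two simplex-nulliplex variants could be accumulated into a log-likelihood ratio for "same parental haplotype" versus "different haplotypes". It is not the WhatsHap implementation. It assumes a tetraploid parent, gametes drawn uniformly as two of the four haplotypes, no recombination between the two variants, and no double reduction. The function names and the noise parameters `detect` (chance of seeing an allele that is present at low depth) and `err` (chance of a spurious observation) are hypothetical.

```python
import math
from itertools import combinations


def gamete_pair_probs(ploidy: int, same_haplotype: bool) -> dict:
    """Joint probability that alleles a and b are (present, present), etc.
    in one gamete. Assumption: a gamete is a uniformly chosen set of
    ploidy/2 parental haplotypes (no recombination, no double reduction)."""
    gametes = list(combinations(range(ploidy), ploidy // 2))
    hap_a = 0
    hap_b = 0 if same_haplotype else 1
    counts = {(x, y): 0 for x in (False, True) for y in (False, True)}
    for gamete in gametes:
        counts[(hap_a in gamete, hap_b in gamete)] += 1
    return {key: n / len(gametes) for key, n in counts.items()}


def obs_prob(observed: bool, present: bool, detect: float, err: float) -> float:
    """Probability of the observation given true allele presence
    (hypothetical two-parameter noise model for low-depth data)."""
    p_seen = detect if present else err
    return p_seen if observed else 1.0 - p_seen


def log_likelihood_ratio(observations, ploidy=4, detect=0.6, err=0.01) -> float:
    """Sum over progeny of log P(obs | same haplotype) - log P(obs | different).
    `observations` is a list of (saw_allele_a, saw_allele_b) booleans."""
    same = gamete_pair_probs(ploidy, same_haplotype=True)
    diff = gamete_pair_probs(ploidy, same_haplotype=False)

    def marginal(model, obs_a, obs_b):
        # Marginalize over the unknown true presence of both alleles.
        return sum(
            prob * obs_prob(obs_a, a, detect, err) * obs_prob(obs_b, b, detect, err)
            for (a, b), prob in model.items()
        )

    return sum(
        math.log(marginal(same, a, b)) - math.log(marginal(diff, a, b))
        for a, b in observations
    )


# Progeny in which both alleles tend to appear together support "same haplotype".
linked = [(True, True)] * 10 + [(False, False)] * 10
# Progeny in which the alleles appear in exclusive patterns support "different".
unlinked = [(True, False)] * 10 + [(False, True)] * 10
llr_linked = log_likelihood_ratio(linked)
llr_unlinked = log_likelihood_ratio(unlinked)
```

Under these assumptions each progeny contributes only a small amount of evidence, but the contributions add up across many samples. A positive ratio favors placing both alleles on the same haplotype, and scores like this could serve as pairwise input to a clustering step such as idea (ii).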

Subject areas

Bioinformatics

Genomics

Sequence analysis

Data and code availability

• There are three separate sequencing data sets: short-read data for the two parental samples, short-read data for the progeny samples, and HiFi sequencing data for the parental samples. All three data sets have been uploaded to the NCBI Sequence Read Archive (SRA). Accession numbers are listed in the key resources table.
• All algorithms are implemented as part of the WhatsHap phasing suite (Patterson et al., 2015). The state of all original code at the time the experiments were run has been deposited at Zenodo and is publicly available as of the date of publication. Instructions on how to run this code are included. Existing software that is either used directly by the phasing algorithm or contributed significantly to the data processing comprises Hifiasm, Snakemake, and PuLP. References to these tools are listed in the key resources table.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request. Except for read mapping and initial variant calling, all experiments were run through a Snakemake pipeline, which has been deposited at Zenodo. To enable access to the input VCF files for the novel phasing algorithm, we uploaded these processed intermediate files separately under their own DOI on Zenodo, which is listed in the key resources table.

⁹: These authors contributed equally

¹⁰: Lead contact