Abstract
Advances in sequencing technologies and assembly methods enable the routine production of high-quality genome assemblies that characterize complex regions. However, challenges remain in efficiently interpreting variation at different scales, from small tandem repeats to megabase-scale rearrangements, across many human genomes. We present a pangenome research toolkit (PGR-TK) that enables analysis of complex pangenome variation at multiple scales. We develop a graph decomposition method for interpreting such variation. A survey of 395 challenging and medically important genes in the pangenome provides quantitative insights into the repetitiveness and diversity that could affect the accuracy of variant calls. We apply the graph decomposition method to the Y-chromosome genes DAZ1, DAZ2, DAZ3, and DAZ4, whose structural variants have been linked to male infertility, and to the X-chromosome genes OPN1LW and OPN1MW, which are linked to eye disorders. These analyses highlight the power of PGR-TK and pangenomics to resolve variation in regions of the genome that were previously too complex to analyze across many haplotypes.
Competing Interest Statement
CSC is an employee and shareholder of Sema4, OpCo, Inc. FJS obtains research support from Illumina, PacBio and Oxford Nanopore.
Footnotes
+ Authors are listed in alphabetical order except the corresponding author.
Fixed typos and revised the introduction and discussion. Made minor changes to the figures.
https://giab-data.s3.amazonaws.com/PGR-TK-Files/pgr-tk-HGRP-y1-evaluation-set-v0.tar
https://giab-data.s3.amazonaws.com/PGR-TK-Files/CMRG_output_dir_v0.3.3.tar