Supporting data for "Assessment of human diploid genome assembly with 10x Linked-Reads data"

Dataset type: Genomic
Data released on November 01, 2019

Zhang L; Zhou X; Weng Z; Sidow A (2019): Supporting data for "Assessment of human diploid genome assembly with 10x Linked-Reads data" GigaScience Database. https://doi.org/10.5524/100668

DOI: 10.5524/100668

Producing cost-effective haplotype-resolved personal genomes remains challenging. 10x Linked-Read sequencing, with its high base quality and long-range information, has been demonstrated to facilitate de novo assembly of human genomes and variant detection. In this study, we investigate in depth how the parameter space of 10x library preparation and sequencing affects assembly quality, on the basis of both simulated and real libraries.
We prepared and sequenced eight 10x libraries with a diverse set of parameters from standard cell lines NA12878 and NA24385 and performed whole genome assembly on the data. We also developed the simulator LRTK-SIM to follow the workflow of 10x data generation and produce realistic simulated Linked-Read data sets. We found that assembly quality could be improved by increasing the total sequencing coverage (C) and keeping physical coverage of DNA fragments (CF) or read coverage per fragment (CR) within broad ranges. The optimal physical coverage was between 332X and 823X and assembly quality worsened if it increased to greater than 1,000X for a given C. Long DNA fragments could significantly extend phase blocks, but decreased contig contiguity. The optimal length-weighted fragment length (Wμ_FL) was around 50–150 kb. When broadly optimal parameters were used for library preparation and sequencing, ca. 80% of the genome was assembled in a diploid state.
The Linked-Read libraries we generated and the parameter space we identified provide theoretical considerations and practical guidelines for personal genome assemblies based on 10x Linked-Read sequencing.
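
To make the coverage ranges in the abstract concrete, the sketch below derives the physical coverage from a planned total coverage and read coverage per fragment, then compares it with the reported 332X to 823X optimum. It assumes the decomposition C = CF × CR, which the abstract implies ("for a given C") but does not state. The class and function names are hypothetical and are not part of LRTK-SIM.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract for physical coverage (CF).
CF_OPTIMAL_LOW = 332.0
CF_OPTIMAL_HIGH = 823.0
CF_DEGRADED_ABOVE = 1000.0


@dataclass(frozen=True)
class LibraryDesign:
    """Hypothetical container for a planned 10x library (illustration only)."""

    total_coverage: float      # C: total sequencing coverage
    coverage_per_fragment: float  # CR: read coverage per fragment

    def __post_init__(self) -> None:
        if self.total_coverage <= 0 or self.coverage_per_fragment <= 0:
            raise ValueError("coverages must be positive")

    @property
    def physical_coverage(self) -> float:
        # Assumption (not stated in the abstract): C = CF * CR,
        # so CF = C / CR.
        return self.total_coverage / self.coverage_per_fragment


def classify_physical_coverage(cf: float) -> str:
    """Place a CF value relative to the ranges reported in the abstract."""
    if CF_OPTIMAL_LOW <= cf <= CF_OPTIMAL_HIGH:
        return "within reported optimum"
    if cf > CF_DEGRADED_ABOVE:
        return "above 1,000X (assembly quality reported to worsen)"
    return "outside reported optimum"


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the eight libraries.
    design = LibraryDesign(total_coverage=56.0, coverage_per_fragment=0.125)
    cf = design.physical_coverage
    print(f"CF = {cf:.0f}X: {classify_physical_coverage(cf)}")
```

Values between 823X and 1,000X fall into "outside reported optimum" here because the abstract does not say how quality behaves in that interval.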

Additional details

Read the peer-reviewed publication(s):

  • Zhang, L., Zhou, X., Weng, Z., & Sidow, A. (2019). Assessment of human diploid genome assembly with 10x Linked-Reads data. GigaScience, 8(11). https://doi.org/10.1093/gigascience/giz141 (PubMed:31769805)

Additional information:

http://mendel.stanford.edu/supplementarydata/zhang_SN2_2019

Github links:

https://github.com/zhanglu295/Evaluate_diploid_assembly

https://github.com/zhanglu295/LRTK-SIM

Accessions (data included in GigaDB):

BioProject: PRJNA527321

| Sample ID | Common Name | Scientific Name | Sample Attributes | Taxonomic ID | Genbank Name |
|---|---|---|---|---|---|
| NA12878 | Human | Homo sapiens | Alternative accession-BioProject:PRJNA527321; Alternative accession-BioSample:SAMN11149732; Alternative accession-SRA Sample:SRS4498416; ... | 9606 | human |
| NA24385 | Human | Homo sapiens | Description:Human standard cell line NA24385; Alternative accession-BioProject:PRJNA527321; Alternative accession-BioSample:SAMN11149733; ... | 9606 | human |


| Description | Data Type | File Format | Size | Release Date | File Attributes |
|---|---|---|---|---|---|
| Archival copy of the GitHub repository https://github.com/zhanglu295/Evaluate_diploid_assembly downloaded 25-Oct-2019. This is the description of how to evaluate and compare diploid assemblies under different parameters. This repository is licensed under the MIT license. Please refer to the GitHub repo for most recent updates. | GitHub archive | archive | 12.93 kB | 2019-10-29 | license: MIT; MD5 checksum: 6b70d973a6e85dace22aa2ef60a459cc |
| Archival copy of the GitHub repository https://github.com/zhanglu295/LRTK-SIM downloaded 25-Oct-2019. A program to simulate linked reads sequencing from 10X Chromium System. This repository is licensed under the MIT license. Please refer to the GitHub repo for most recent updates. | GitHub archive | archive | 83.05 MB | 2019-10-29 | license: MIT; MD5 checksum: 814aceb0d7c662f3bbfd7d41ee3f49e4 |
| Diploid assembly generated by Supernova2. R10 library. megabubbles output | Sequence assembly | FASTA | 1.82 GB | 2019-10-29 | MD5 checksum: 27091c8cb82cc8e50d0e96b8450fe2a2 |
| Diploid assembly generated by Supernova2. R10 library. pseudohap output | Sequence assembly | FASTA | 1.73 GB | 2019-10-29 | MD5 checksum: 221588d5dde4fff6f8018ad0d6f2222b |
| Diploid assembly generated by Supernova2. R11 library. megabubbles output | Sequence assembly | FASTA | 1.79 GB | 2019-10-29 | MD5 checksum: 5a415f9ca6d60ff61103efe8398ee525 |
| Diploid assembly generated by Supernova2. R11 library. pseudohap output | Sequence assembly | FASTA | 1.70 GB | 2019-10-29 | MD5 checksum: 69974e5950ba1d33c43ae4ed982b5f16 |
| Diploid assembly generated by Supernova2. R6 library. megabubbles output | Sequence assembly | FASTA | 1.62 GB | 2019-10-29 | MD5 checksum: 7245a4aa8b8fdd72712c5bb18811c72c |
| Diploid assembly generated by Supernova2. R6 library. pseudohap output | Sequence assembly | FASTA | 1.69 GB | 2019-10-29 | MD5 checksum: 8394eb2addc3cb591e41d503dcf814ed |
| Diploid assembly generated by Supernova2. R7 library. megabubbles output | Sequence assembly | FASTA | 1.78 GB | 2019-10-29 | MD5 checksum: 5f0970260b5442f8f15dcf542a5bb86e |
| Diploid assembly generated by Supernova2. R7 library. pseudohap output | Sequence assembly | FASTA | 1.70 GB | 2019-10-29 | MD5 checksum: 13c820b18af4640d44bcb4f5b8fcaa6e |
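
Each file in the listing carries an MD5 checksum, so a download can be checked before use. The following is a minimal sketch using only the Python standard library. The local file name in the usage comment is hypothetical, since the actual file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use flat for multi-gigabyte FASTA files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with the checksum published in the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (the file name is a placeholder, not the name used by GigaDB):
#   ok = verify_download("R10_megabubbles.fasta",
#                        "27091c8cb82cc8e50d0e96b8450fe2a2")
#   print("checksum OK" if ok else "checksum mismatch, re-download the file")
```

A mismatch usually indicates an incomplete or corrupted transfer rather than a problem with the published data.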

Funding body: National Institute of Standards and Technology

Date Action
November 1, 2019 Dataset publish
November 14, 2019 Manuscript Link added : 10.1093/gigascience/giz141
October 14, 2022 Manuscript Link updated : 10.1093/gigascience/giz141