Supporting data for "Assessment of human diploid genome assembly with 10x Linked-Reads data"
Dataset type: Genomic
Data released on November 01, 2019
Zhang L; Zhou X; Weng Z; Sidow A (2019): Supporting data for "Assessment of human diploid genome assembly with 10x Linked-Reads data". GigaScience Database. https://doi.org/10.5524/100668
Producing cost-effective haplotype-resolved personal genomes remains challenging. 10x Linked-Read sequencing, with its high base quality and long-range information, has been demonstrated to facilitate de novo assembly of human genomes and variant detection. In this study, we investigate in depth how the parameter space of 10x library preparation and sequencing affects assembly quality, on the basis of both simulated and real libraries.
We prepared and sequenced eight 10x libraries with a diverse set of parameters from standard cell lines NA12878 and NA24385 and performed whole genome assembly on the data. We also developed the simulator LRTK-SIM to follow the workflow of 10x data generation and produce realistic simulated Linked-Read data sets. We found that assembly quality could be improved by increasing the total sequencing coverage (C) and keeping physical coverage of DNA fragments (CF) or read coverage per fragment (CR) within broad ranges. The optimal physical coverage was between 332X and 823X, and assembly quality worsened if it increased to greater than 1,000X for a given C. Long DNA fragments could significantly extend phase blocks, but decreased contig contiguity. The optimal length-weighted fragment length (Wμ_FL) was around 50–150 kb. When broadly optimal parameters were used for library preparation and sequencing, ca. 80% of the genome was assembled in a diploid state.
The Linked-Read libraries we generated and the parameter space we identified provide theoretical considerations and practical guidelines for personal genome assemblies based on 10x Linked-Read sequencing.
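The coverage parameters named in the summary above can be related with a small calculation. The sketch below assumes that total coverage decomposes as C = CF × CR, which follows from the definitions given (fragments covering the genome CF times, each fragment covered by reads CR times) but is not stated on this page. The function names and the example value of C are hypothetical; only the 332X to 823X range comes from the text.

```python
def read_coverage_per_fragment(total_coverage: float, physical_coverage: float) -> float:
    """Return CR given C and CF, ASSUMING C = CF * CR (not stated on this page)."""
    if physical_coverage <= 0:
        raise ValueError("physical coverage (CF) must be positive")
    return total_coverage / physical_coverage


def in_reported_optimal_range(physical_coverage: float,
                              low: float = 332.0,
                              high: float = 823.0) -> bool:
    """Check CF against the 332X-823X range reported in the summary."""
    return low <= physical_coverage <= high


# Hypothetical total coverage, chosen only for illustration.
example_total_coverage = 56.0
for cf in (332.0, 823.0, 1000.0):
    cr = read_coverage_per_fragment(example_total_coverage, cf)
    status = "within" if in_reported_optimal_range(cf) else "outside"
    print(f"CF={cf:.0f}X -> CR={cr:.3f}X ({status} the reported optimal range)")
```

For a fixed C, raising CF necessarily lowers CR under this assumption, which is one way to read the observation that assembly quality worsened once physical coverage exceeded 1,000X.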
Additional details
Read the peer-reviewed publication(s):
- Zhang, L., Zhou, X., Weng, Z., & Sidow, A. (2019). Assessment of human diploid genome assembly with 10x Linked-Reads data. GigaScience, 8(11). https://doi.org/10.1093/gigascience/giz141 (PubMed:31769805)
Additional information:
http://mendel.stanford.edu/supplementarydata/zhang_SN2_2019
GitHub links:
https://github.com/zhanglu295/Evaluate_diploid_assembly
https://github.com/zhanglu295/LRTK-SIM
Accessions (data included in GigaDB):
BioProject: PRJNA527321
Sample ID | Common Name | Scientific Name | Sample Attributes | Taxonomic ID | Genbank Name |
---|---|---|---|---|---|
NA12878 | Human | Homo sapiens | Alternative accession-BioProject:PRJNA527321 Alt accBioSample:SAMN11149732 Alternative accession-SRA Sample:SRS4498416 ... | 9606 | human |
NA24385 | Human | Homo sapiens | Description:Human standard cell line NA24385 Alternative accession-BioProject:PRJNA527321 Alt accBioSample:SAMN11149733 ... | 9606 | human |
Description | Data Type | File Format | Size | Release Date | File Attributes |
---|---|---|---|---|---|
Archival copy of the GitHub repository https://github.com/zhanglu295/Evaluate_diploid_assembly downloaded 25-Oct-2019. Describes how to evaluate and compare diploid assemblies generated with different parameters. This repository is licensed under the MIT license. Please refer to the GitHub repo for the most recent updates. | GitHub archive | archive | 12.93 kB | 2019-10-29 | license: MIT MD5 checksum: 6b70d973a6e85dace22aa2ef60a459cc |
Archival copy of the GitHub repository https://github.com/zhanglu295/LRTK-SIM downloaded 25-Oct-2019. A program to simulate linked-read sequencing from the 10X Chromium System. This repository is licensed under the MIT license. Please refer to the GitHub repo for the most recent updates. | GitHub archive | archive | 83.05 MB | 2019-10-29 | license: MIT MD5 checksum: 814aceb0d7c662f3bbfd7d41ee3f49e4 |
Diploid assembly generated by Supernova2. R10 library. megabubbles output | Sequence assembly | FASTA | 1.82 GB | 2019-10-29 | MD5 checksum: 27091c8cb82cc8e50d0e96b8450fe2a2 |
Diploid assembly generated by Supernova2. R10 library. pseudohap output | Sequence assembly | FASTA | 1.73 GB | 2019-10-29 | MD5 checksum: 221588d5dde4fff6f8018ad0d6f2222b |
Diploid assembly generated by Supernova2. R11 library. megabubbles output | Sequence assembly | FASTA | 1.79 GB | 2019-10-29 | MD5 checksum: 5a415f9ca6d60ff61103efe8398ee525 |
Diploid assembly generated by Supernova2. R11 library. pseudohap output | Sequence assembly | FASTA | 1.70 GB | 2019-10-29 | MD5 checksum: 69974e5950ba1d33c43ae4ed982b5f16 |
Diploid assembly generated by Supernova2. R6 library. megabubbles output | Sequence assembly | FASTA | 1.62 GB | 2019-10-29 | MD5 checksum: 7245a4aa8b8fdd72712c5bb18811c72c |
Diploid assembly generated by Supernova2. R6 library. pseudohap output | Sequence assembly | FASTA | 1.69 GB | 2019-10-29 | MD5 checksum: 8394eb2addc3cb591e41d503dcf814ed |
Diploid assembly generated by Supernova2. R7 library. megabubbles output | Sequence assembly | FASTA | 1.78 GB | 2019-10-29 | MD5 checksum: 5f0970260b5442f8f15dcf542a5bb86e |
Diploid assembly generated by Supernova2. R7 library. pseudohap output | Sequence assembly | FASTA | 1.70 GB | 2019-10-29 | MD5 checksum: 13c820b18af4640d44bcb4f5b8fcaa6e |
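
Each file above is listed with an MD5 checksum, so a download can be checked before use. The following is a minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and because file names are not shown in this listing, the path in the usage note is a placeholder rather than a real file name.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte FASTA files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Compare a file's MD5 with the value listed in the table
    (case and surrounding whitespace are ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum(Path("downloaded_assembly.fasta"), "27091c8cb82cc8e50d0e96b8450fe2a2")` would check a local copy (placeholder path) against the checksum listed for the R10 megabubbles output.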
Funding body | Awardee | Award ID | Comments |
---|---|---|---|
National Institute of Standards and Technology | | | |
Date | Action |
---|---|
November 1, 2019 | Dataset published |
November 14, 2019 | Manuscript link added: 10.1093/gigascience/giz141 |
October 14, 2022 | Manuscript link updated: 10.1093/gigascience/giz141 |