Supporting data for "Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C"
Dataset type: Genomic, Epigenomic
Data released on February 18, 2020
Field MA; Rosen BD; Dudchenko O; Chan EKF; Minoche AE; Edwards RJ; Barton K; Lyons RJ; Enosi Tuipulotu D; Hayes VM; Omer A; Colaric Z; Keilwagen J; Skvortsova K; Bogdanovic O; Smith MA; Aiden EL; Smith TPL; Zammit RA (2020): Supporting data for "Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C" GigaScience Database. https://doi.org/10.5524/100712
The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as protection, disability assistance and search-and-rescue. Yet GSDs are well known to be afflicted with a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life and fully trained animals are not able to continue their duties. Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) utilising a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is approximately 80 times as contiguous as the current canid reference genome, CanFam v3.1 (20.9 Mb vs 0.267 Mb contig N50), and contains far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. Benchmarking Universal Single-Copy Orthologs (BUSCO) analyses of the genome assembly show that 93.0% of the conserved single-copy genes are complete in the GSD assembly compared to 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to about 99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals there are most likely seven copies of the gene, indicative of a duplication of four ancestral copies and the disruption of one copy. The GSD genome assembly and annotation were produced with major improvements in completeness, continuity and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.
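
The contig N50 values quoted above summarise contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the pipeline used for this dataset; the function name `n50` and the toy lengths are illustrative assumptions, and only the two N50 figures (20.9 Mb and 0.267 Mb) come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Illustrative helper, not part of the dataset's own tooling.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not real assembly data).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)  # 70: the two longest contigs cover 150 of 290

# Fold improvement implied by the contig N50 values quoted in the abstract (Mb).
fold = 20.9 / 0.267  # about 78, which the abstract rounds to "approximately 80"
```

Working in doubled running totals (`2 * running >= total`) avoids fractional comparisons when the total length is odd.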
Additional details
Read the peer-reviewed publication(s):
- Field, M. A., Rosen, B. D., Dudchenko, O., Chan, E. K. F., Minoche, A. E., Edwards, R. J., Barton, K., Lyons, R. J., Tuipulotu, D. E., Hayes, V. M., D. Omer, A., Colaric, Z., Keilwagen, J., Skvortsova, K., Bogdanovic, O., Smith, M. A., Aiden, E. L., Smith, T. P. L., Zammit, R. A., & Ballard, J. W. O. (2020). Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. GigaScience, 9(4). https://doi.org/10.1093/gigascience/giaa027 (PubMed:32236524)
Related datasets:
doi:10.5524/100712 IsCitedBy doi:10.5524/102356
Additional information:
https://www.dnazoo.org/assemblies/Canis_lupus_familiaris_German_Shepherd
Accessions (data included in GigaDB):
BioProject: PRJNA560310
BioProject: PRJNA512907
GEO: GSE136348
Sample ID | Common Name | Scientific Name | Sample Attributes | Taxonomic ID | Genbank Name |
---|---|---|---|---|---|
SAMN12574852 | dogs | Canis lupus familiaris | Geographic location (longitude):151.2099 Geographic location (latitude):-33.865143 Life stage:adult ... | 9615 | dog |
GSM4047136 | dogs | Canis lupus familiaris | Geographic location (longitude):151.2099 Geographic location (latitude):-33.865143 Description:DNA methylome of blood extracted from ... ... | 9615 | dog |
SAMN13072002 | dogs | Canis lupus familiaris | Geographic location (latitude):40.7128 Geographic location (longitude):-74.006 Description:Chromatin for in situ Hi-C preparation... ... | 9615 | dog |
Description | Data Type | File Format | Size | Release Date | File Attributes |
---|---|---|---|---|---|
Readme | | TEXT | 4.27 kB | 2020-02-17 | |
Genome assembly fasta | Sequence assembly | FASTA | 2.44 GB | 2020-02-14 | MD5 checksum: c25db10f2e18bd1448acb94d904faa1c |
Coding gene annotations | Annotation | GFF | 79.48 MB | 2020-02-14 | MD5 checksum: 6830516348137c6f53b4a7250ccc8dc4 |
Coding gene nucleotides fasta | Coding sequence | FASTA | 127.07 MB | 2020-02-14 | MD5 checksum: b67dc61f20530d59d39fe605469b9284 |
VCF file generated mummer show-snps | Sequence variants | VCF | 204.29 MB | 2020-02-14 | MD5 checksum: a5b55542bbaecf115b76d68e8062167a |
BUSCO summary for assembly | Other | TEXT | 964 B | 2020-02-14 | MD5 checksum: c1eabc846651976ebb4982a9d3c89f3c |
BUSCO summary for current dog reference genome CanFam3.1 | Other | TEXT | 757 B | 2020-02-14 | MD5 checksum: 228822e3ad32852b48e3fff1b21b2e42 |
Tabular data with additional sample details | Tabular data | TEXT | 346 B | 2020-02-14 | MD5 checksum: 347ab475d21633a0557dff05979c4c11 |
Complete mitochondrial sequence | Sequence assembly | FASTA | 16.85 kB | 2020-02-14 | MD5 checksum: a951cef9315f4a6476815bb58324ab12 |
Final combined genome assembly fasta | Sequence assembly | FASTA | 2.41 GB | 2020-02-14 | MD5 checksum: ab5ca3da02c3867315c8e70b6a8b5765 |
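
The MD5 checksums listed with the files can be used to confirm that a download completed intact. The sketch below shows one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are illustrative assumptions, and the file names are not preserved on this page, so the local path in the usage note is a placeholder to be replaced with wherever the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use low for multi-gigabyte FASTA files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 equals the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum("assembly.fasta", "c25db10f2e18bd1448acb94d904faa1c")` would check a locally saved copy of the genome assembly FASTA against the checksum in the table, where `assembly.fasta` is a hypothetical local file name.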
Funding body | Awardee | Award ID | Comments |
---|---|---|---|
NHMRC | Matt A Field | APP1139756 | CJ Martin Fellowship |
NSF | Erez Lieberman Aiden | PHY1427654 | Physics Frontiers Center Award |
Welch Foundation | Erez Lieberman Aiden | Q-1866 | |
USDA Agriculture and Food Research Initiative Grant | Erez Lieberman Aiden | 2017-05741 | |
NIH | Erez Lieberman Aiden | U01HL130010 | 4D Nucleome Grant |
NIH | Erez Lieberman Aiden | UM1HG009375 | Encyclopedia of DNA Elements Mapping Center Award |
Australian Research Council | Ramiciotti Sequencing Centre | LE150100031 | LIEF |
Date | Action |
---|---|
February 18, 2020 | Dataset publish |
March 17, 2020 | Manuscript Link added : 10.1093/gigascience/giaa027 |
October 7, 2022 | Manuscript Link updated : 10.1093/gigascience/giaa027 |
February 21, 2023 | Relationship added : DOI 102356 |