Supporting data for "Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C"

Dataset type: Genomic, Epigenomic
Data released on February 18, 2020

Field MA; Rosen BD; Dudchenko O; Chan EKF; Minoche AE; Edwards RJ; Barton K; Lyons RJ; Enosi Tuipulotu D; Hayes VM; Omer A; Colaric Z; Keilwagen J; Skvortsova K; Bogdanovic O; Smith MA; Aiden EL; Smith TPL; Zammit RA (2020): Supporting data for "Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C" GigaScience Database. https://doi.org/10.5524/100712

DOI: 10.5524/100712

The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as protection, disability assistance and search-and-rescue. Yet GSDs are well known to be afflicted with a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, and fully trained animals are not able to continue their duties. Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam_GSD) utilising a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is approximately 80 times as contiguous as the current canid reference genome, CanFam v3.1 (20.9 Mb vs 0.267 Mb contig N50), and contains far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis of the genome assembly shows that 93.0% of the conserved single-copy genes are complete in the GSD assembly, compared to 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to about 99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely seven copies of the gene, indicative of a duplication of four ancestral copies and the disruption of one copy. The GSD genome assembly and annotation show major improvements in completeness, continuity and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.
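The contiguity comparison above rests on contig N50: the length L such that contigs of length L or longer together contain at least half of the assembled bases. The following is a minimal Python sketch of that calculation, not the authors' pipeline. The function names `n50` and `contig_lengths` are hypothetical, and splitting a scaffold into contigs at every run of N is an assumption (assembly tools often apply a minimum gap length instead).

```python
import re


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


def contig_lengths(scaffold_sequence):
    """Split one scaffold at runs of N (assumed gap marker) and return contig lengths."""
    pieces = re.split(r"[Nn]+", scaffold_sequence)
    return [len(piece) for piece in pieces if piece]


# Fold difference between the two contig N50 values quoted in the abstract (in Mb).
print(round(20.9 / 0.267, 1))  # 78.3, i.e. "approximately 80 times"
```

Applied to an assembly, `contig_lengths` would be called on every scaffold in the FASTA file and the pooled lengths passed to `n50`.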

Additional details

Read the peer-reviewed publication(s):

  • Field, M. A., Rosen, B. D., Dudchenko, O., Chan, E. K. F., Minoche, A. E., Edwards, R. J., Barton, K., Lyons, R. J., Tuipulotu, D. E., Hayes, V. M., Omer, A. D., Colaric, Z., Keilwagen, J., Skvortsova, K., Bogdanovic, O., Smith, M. A., Aiden, E. L., Smith, T. P. L., Zammit, R. A., & Ballard, J. W. O. (2020). Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. GigaScience, 9(4). https://doi.org/10.1093/gigascience/giaa027 (PubMed:32236524)

Related datasets:

doi:10.5524/100712 IsCitedBy doi:10.5524/102356

Additional information:

https://www.dnazoo.org/assemblies/Canis_lupus_familiaris_German_Shepherd

Accessions (data included in GigaDB):

BioProject: PRJNA560310
BioProject: PRJNA512907
GEO: GSE136348

Sample ID | Common Name | Scientific Name | Sample Attributes | Taxonomic ID | Genbank Name
SAMN12574852 | dogs | Canis lupus familiaris | Geographic location (longitude): 151.2099; Geographic location (latitude): -33.865143; Life stage: adult; ... | 9615 | dog
GSM4047136 | dogs | Canis lupus familiaris | Geographic location (longitude): 151.2099; Geographic location (latitude): -33.865143; Description: DNA methylome of blood extracted from ...; ... | 9615 | dog
SAMN13072002 | dogs | Canis lupus familiaris | Geographic location (latitude): 40.7128; Geographic location (longitude): -74.006; Description: Chromatin for in situ Hi-C preparation...; ... | 9615 | dog

Description | Data Type | File Format | Size | Release Date | File Attributes
 | Readme | TEXT | 4.27 kB | 2020-02-17 |
Genome assembly fasta | Sequence assembly | FASTA | 2.44 GB | 2020-02-14 | MD5 checksum: c25db10f2e18bd1448acb94d904faa1c
Coding gene annotations | Annotation | GFF | 79.48 MB | 2020-02-14 | MD5 checksum: 6830516348137c6f53b4a7250ccc8dc4
Coding gene nucleotides fasta | Coding sequence | FASTA | 127.07 MB | 2020-02-14 | MD5 checksum: b67dc61f20530d59d39fe605469b9284
VCF file generated with MUMmer show-snps | Sequence variants | VCF | 204.29 MB | 2020-02-14 | MD5 checksum: a5b55542bbaecf115b76d68e8062167a
BUSCO summary for assembly | Other | TEXT | 964 B | 2020-02-14 | MD5 checksum: c1eabc846651976ebb4982a9d3c89f3c
BUSCO summary for current dog reference genome CanFam3.1 | Other | TEXT | 757 B | 2020-02-14 | MD5 checksum: 228822e3ad32852b48e3fff1b21b2e42
Tabular data with additional sample details | Tabular data | TEXT | 346 B | 2020-02-14 | MD5 checksum: 347ab475d21633a0557dff05979c4c11
Complete mitochondrial sequence | Sequence assembly | FASTA | 16.85 kB | 2020-02-14 | MD5 checksum: a951cef9315f4a6476815bb58324ab12
Final combined genome assembly fasta | Sequence assembly | FASTA | 2.41 GB | 2020-02-14 | MD5 checksum: ab5ca3da02c3867315c8e70b6a8b5765
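
The MD5 checksums listed for each file can be used to confirm that a download is intact. The following is a minimal Python sketch under stated assumptions: the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the local path in the usage comment is a placeholder, since the actual file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to handle multi-GB files."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the value copied from the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (placeholder path; substitute the name of the downloaded file):
# ok = matches_listed_checksum("downloaded_assembly.fasta",
#                              "ab5ca3da02c3867315c8e70b6a8b5765")
```

Reading in fixed-size chunks keeps memory use constant, which matters for the assembly FASTA files of more than 2 GB.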

Funding body | Awardee | Award ID | Comments
NHMRC | Matt A Field | APP1139756 | CJ Martin Fellowship
NSF | Erez Lieberman Aiden | PHY1427654 | Physics Frontiers Center Award
Welch Foundation | Erez Lieberman Aiden | Q-1866 |
USDA Agriculture and Food Research Initiative Grant | Erez Lieberman Aiden | 2017-05741 |
NIH | Erez Lieberman Aiden | U01HL130010 | 4D Nucleome Grant
NIH | Erez Lieberman Aiden | UM1HG009375 | Encyclopedia of DNA Elements Mapping Center Award
Australian Research Council | Ramiciotti Sequencing Centre | LE150100031 | LIEF

Date | Action
February 18, 2020 | Dataset publish
March 17, 2020 | Manuscript Link added: 10.1093/gigascience/giaa027
October 7, 2022 | Manuscript Link updated: 10.1093/gigascience/giaa027
February 21, 2023 | Relationship added: DOI 102356