Picky comprehensively detects high-resolution structural variants in nanopore long reads

Gong, Liang; Wong, Chee-Hong; Cheng, Wei-Chung; Tjong, Harianto; Menghi, Francesca; Ngan, Chew Yee; Liu, Edison T.; Wei, Chia-Lin

doi:10.1038/s41592-018-0002-6

Article
Published: 30 April 2018


Nature Methods volume 15, pages 455–460 (2018)


Abstract

Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.


**Fig. 1: A customized pipeline for long-read SV analysis.**

**Fig. 2: The sensitivity of the Picky pipeline in SV detection.**

**Fig. 3: Long reads uncover repeat-rich SVs and the presence of micro-insertions within SV junctions.**

**Fig. 4: Analysis of the genomic distribution of breakpoints and their affected genes.**


References

1. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
2. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
3. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
4. Bochukova, E. G. et al. Large, rare chromosomal deletions associated with severe early-onset obesity. Nature 463, 666–670 (2010).
5. Diskin, S. J. et al. Copy number variation at 1q21.1 associated with neuroblastoma. Nature 459, 987–991 (2009).
6. Edwards, P. A. Fusion genes and chromosome translocations in the common epithelial cancers. J. Pathol. 220, 244–254 (2010).
7. Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl. Acad. Sci. USA 113, E2373–E2382 (2016).
8. Weischenfeldt, J., Symmons, O., Spitz, F. & Korbel, J. O. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat. Rev. Genet. 14, 125–138 (2013).
9. Stankiewicz, P. & Lupski, J. R. Structural variation in the human genome and its role in disease. Annu. Rev. Med. 61, 437–455 (2010).
10. Chaisson, M. J. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015).
11. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
12. Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
13. Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
14. Sović, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat. Commun. 7, 11307 (2016).
15. Spies, N. et al. Genome-wide reconstruction of complex structural variants using read clouds. Nat. Methods 14, 915–920 (2017).
16. Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. 8, 1326 (2017).
17. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single molecule sequencing. bioRxiv Preprint at https://www.biorxiv.org/content/early/2017/07/28/169557 (2017).
18. Jain, M. et al. Improved data analysis for the MinION nanopore sequencer. Nat. Methods 12, 351–356 (2015).
19. Deamer, D., Akeson, M. & Branton, D. Three decades of nanopore sequencing. Nat. Biotechnol. 34, 518–524 (2016).
20. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
21. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
22. Gazdar, A. F. et al. Characterization of paired tumor and non-tumor cell lines established from patients with breast cancer. Int. J. Cancer 78, 766–774 (1998).
23. Li, H. Minimap2: fast pairwise alignment for long nucleotide sequences. arXiv Preprint at https://arxiv.org/abs/1708.01492 (2017).
24. Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).
25. Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).
26. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
27. Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010 (2009).
28. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
29. Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007).
30. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
31. Cahill, D., Connor, B. & Carney, J. P. Mechanisms of eukaryotic DNA double strand break repair. Front. Biosci. 11, 1958–1976 (2006).
32. Howarth, K. D. et al. Array painting reveals a high frequency of balanced translocations in breast cancer cell lines that break in cancer-relevant genes. Oncogene 27, 3345–3359 (2008).
33. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
34. Branco, M. R. & Pombo, A. Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 4, e138 (2006).
35. Tjong, H. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl. Acad. Sci. USA 113, E1663–E1672 (2016).
36. Chung, I. F. et al. DriverDBv2: a database for human cancer driver gene research. Nucleic Acids Res. 44, D975–D979 (2016).


Acknowledgements

The authors thank P. Shreckengast for collecting the HCC1187 cells; C. Robinett and A. Lau for their comments on the manuscript; and B. Hanson and M. Bolisetty for their help in setting up the initial nanopore runs. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

These authors contributed equally: Liang Gong, Chee-Hong Wong.

Authors and Affiliations

The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
Liang Gong, Chee-Hong Wong, Harianto Tjong, Francesca Menghi, Chew Yee Ngan, Edison T. Liu & Chia-Lin Wei
China Medical University, Taichung, Taiwan
Wei-Chung Cheng & Chia-Lin Wei


Contributions

L.G., C.-H.W., and C.-L.W. designed the experiment, analyzed the data, and wrote the manuscript. L.G. performed the experiments. C.-H.W. developed the Picky pipeline. W.-C.C. analyzed the TCGA data. H.T. performed the ICP analysis. F.M., C.Y.N., E.T.L., and C.-L.W. contributed to manuscript preparation.

Corresponding author

Correspondence to Chia-Lin Wei.

Ethics declarations

Competing interests

L.G., C.-H.W., and C.-L.W. have received a few batches of reagent from Oxford Nanopore. C.-L.W. has received travel and accommodation support from Oxford Nanopore as an invited speaker at the Oxford Nanopore user meeting.


Integrated supplementary information

Supplementary Figure 1 Correlation between read length and percentage of reads with breakpoints.

Each blue dot represents a single 2D nanopore run (N = 13).
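The relationship in Supplementary Figure 1 can be summarized with a correlation coefficient over per-run summaries. The sketch below uses a Pearson coefficient as an illustration only: the caption does not state which statistic was used, and the per-run values are hypothetical placeholders, not data from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def percent_with_breakpoints(reads_with_bp, total_reads):
    """Percentage of reads in a run that carry at least one breakpoint."""
    return 100.0 * reads_with_bp / total_reads

# Hypothetical per-run summaries: (mean read length in bp, reads with breakpoints, total reads)
runs = [
    (6000, 300, 10000),
    (8000, 450, 10000),
    (10000, 610, 10000),
]
lengths = [length for length, _, _ in runs]
percents = [percent_with_breakpoints(bp, total) for _, bp, total in runs]
r = pearson(lengths, percents)
```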

Supplementary Figure 2 Analysis for phased SVs using multi-breakpoint long reads.

(a) Total counts and (b) log-likelihood of adjacent SVs phased by multi-breakpoint long reads. Red counts indicate observations more than 2× the expected value; blue counts indicate observations less than 0.5× the expected value. N = 2,374.
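The red/blue flags can be illustrated with a small contingency-table calculation. This is a minimal sketch, assuming the expected count for each pair of adjacent SV types comes from an independence model (row total × column total / grand total) and that a log₂ observed/expected ratio stands in for the plotted log-likelihood; the caption specifies neither, and the counts below are invented.

```python
import math

def expected_counts(observed):
    """Expected counts under independence: row_total * col_total / grand_total.

    `observed` maps first SV type -> {second SV type -> count}; every row is
    assumed to list the same set of column keys.
    """
    row_totals = {a: sum(row.values()) for a, row in observed.items()}
    col_totals = {}
    for row in observed.values():
        for b, n in row.items():
            col_totals[b] = col_totals.get(b, 0) + n
    total = sum(row_totals.values())
    return {a: {b: row_totals[a] * col_totals[b] / total for b in observed[a]}
            for a in observed}

def flag_pairs(observed, high=2.0, low=0.5):
    """Label each (first SV, second SV) pair 'red' (> high x expected),
    'blue' (< low x expected) or None, together with its log2 obs/exp ratio."""
    expected = expected_counts(observed)
    flags = {}
    for a, row in observed.items():
        for b, obs in row.items():
            exp = expected[a][b]
            if exp == 0:
                flags[(a, b)] = (None, None)  # no information for this pair
                continue
            ratio = obs / exp
            label = "red" if ratio > high else "blue" if ratio < low else None
            flags[(a, b)] = (label, math.log2(ratio) if ratio > 0 else None)
    return flags

# Hypothetical counts of adjacent SV pairs phased on the same reads.
observed = {"DEL": {"DEL": 60, "TLC": 5}, "TLC": {"DEL": 5, "TLC": 10}}
flags = flag_pairs(observed)
```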

Supplementary Figure 3 Examples of validated breakpoints and their detailed junction sequences.

Nanopore read-to-genome alignments, junction sequences and affected genes are shown for each SV class. Micro-homologous sequences shared between junctions are highlighted with red boxes. (a) TDJ. (b) INS. (c) DEL. (d) INV. (e) TLC. The t(1;8) translocation identified here at base resolution is consistent with the translocation previously identified by spectral karyotyping (SKY)³². (f) Bioanalyzer (Agilent Technologies) analysis of PCR fragments amplified across the breakpoints of each SV shown in (a)–(e). L, molecular size markers. Independent repeats = 2.

Supplementary Figure 4 The sensitivity and specificity of the Picky-called SVs.

(a) Summary of SVs validated by PCR. (b) Numbers of SVs called by LUMPY from short-read data at different sequencing depths. *: deletion and DEL in INDEL. **: thresholds used in SV calling by LUMPY (see Online Methods). ***: not called by the standard LUMPY pipeline. (c) Numbers of high-confidence SVs previously described in HCC1187 that were detected by nanopore sequencing.

Supplementary Figure 5 The prevalence of SV heterozygosity in the HCC1187 genome.

(a) PCR products corresponding to different haplotypes in two validated SVs. Independent repeats = 2. (b) Reads supporting the SV and the normal genotype at the same locus, visualized in the IGV browser. (c) Heterozygosity analysis of 50 randomly selected loci for each of the seven SV types.
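Panels (b) and (c) rest on counting, at each locus, reads that support the SV and reads that support the normal configuration. A minimal sketch of such a call, assuming per-locus read counts are already available; the `min_reads` threshold and the "both genotypes observed" rule are hypothetical choices, not Picky's documented criteria.

```python
def call_zygosity(sv_reads, ref_reads, min_reads=2):
    """Classify a locus from counts of SV-supporting and normal-supporting reads.

    Hypothetical rule: a genotype counts as observed when at least
    `min_reads` reads support it.
    """
    has_sv = sv_reads >= min_reads
    has_ref = ref_reads >= min_reads
    if has_sv and has_ref:
        return "heterozygous"
    if has_sv:
        return "homozygous_sv"
    if has_ref:
        return "reference"
    return "insufficient_coverage"

def heterozygous_fraction(loci, min_reads=2):
    """Fraction of loci, given as (sv_reads, ref_reads) pairs, called heterozygous."""
    calls = [call_zygosity(s, r, min_reads) for s, r in loci]
    return calls.count("heterozygous") / len(calls)
```

Requiring a minimum read count for each genotype guards against calling a locus heterozygous from a single stray read, at the cost of missing low-coverage heterozygous sites.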

Supplementary Figure 6 A comprehensive comparison of SV detection in long-read and short-read analyses.

LR, long-read. SR, short-read. (a) Numbers of SVs found in each dataset and their overlaps. (b) Distributions of SV span sizes.
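Counting the overlaps in panel (a) requires a rule for deciding when a long-read call and a short-read call describe the same SV. A minimal sketch, assuming two calls match when they share type and chromosome and both breakpoints fall within a fixed tolerance; the `SVCall` class and the 500-bp window are hypothetical, and this single-chromosome representation does not cover translocations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SVCall:
    sv_type: str
    chrom: str
    start: int
    end: int

def same_sv(a, b, tolerance=500):
    """True if both calls share type and chromosome and their breakpoints are close."""
    return (
        a.sv_type == b.sv_type
        and a.chrom == b.chrom
        and abs(a.start - b.start) <= tolerance
        and abs(a.end - b.end) <= tolerance
    )

def overlap_counts(long_read_calls, short_read_calls, tolerance=500):
    """Return (LR-only, shared, SR-only) counts using greedy one-to-one matching."""
    unmatched_sr = list(short_read_calls)
    shared = 0
    for lr in long_read_calls:
        for i, sr in enumerate(unmatched_sr):
            if same_sv(lr, sr, tolerance):
                shared += 1
                del unmatched_sr[i]  # each SR call can match at most one LR call
                break
    return len(long_read_calls) - shared, shared, len(unmatched_sr)

lr_calls = [SVCall("DEL", "chr1", 1000, 5000), SVCall("INV", "chr2", 100, 900),
            SVCall("DEL", "chr3", 10, 20)]
sr_calls = [SVCall("DEL", "chr1", 1200, 5100), SVCall("INV", "chr2", 5000, 9000)]
counts = overlap_counts(lr_calls, sr_calls)
```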

Supplementary Figure 7 Comparison of Picky, Sniffles, and NanoSV.

Overview of the components and features of Picky, Sniffles and NanoSV. "Yes" indicates that the pipeline can report the SV type; "N/A" indicates that it cannot.

Supplementary Figure 8 The SV span distributions and the SVs enriched in repeat regions.

(a) The span distribution of DEL, INS and INDEL. (b) Relative percentages of repeats across different span sizes in simple DEL. (c) Relative percentages of repeats across different span sizes in simple INS.

Supplementary Figure 9 Selected cases of micro-insertions from nanopore results confirmed by PacBio sequencing.

(a) A 36 bp insertion associated with a 329 bp deletion on chromosome 20. (b) A 75 bp insertion associated with a 3,262 bp deletion on chromosome X.

Supplementary Figure 10 Distribution of the SV breakpoints along the genomic features of transcription.

(a) Enrichment of breakpoints from each SV class. (b) Distributions of breakpoints from different types of TDCs.
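Enrichment of breakpoints in a genomic feature is commonly expressed as the fraction of breakpoints falling in the feature divided by the fraction of the genome the feature occupies. A minimal sketch under that assumption; the caption does not specify the background model used, and the sizes in the example are made up.

```python
def enrichment(breakpoints_in_feature, total_breakpoints, feature_bp, genome_bp):
    """Observed/expected ratio of breakpoints in a feature, with the
    expectation proportional to the feature's share of the genome."""
    if total_breakpoints == 0 or feature_bp == 0 or genome_bp == 0:
        raise ValueError("counts and sizes must be non-zero")
    observed_fraction = breakpoints_in_feature / total_breakpoints
    expected_fraction = feature_bp / genome_bp
    return observed_fraction / expected_fraction

# Hypothetical: 300 of 1,000 breakpoints fall in a feature covering 5% of a 3-Gb genome.
score = enrichment(300, 1000, 150_000_000, 3_000_000_000)
```

A value above 1 means the feature holds more breakpoints than its size alone would predict; in this example the ratio is about 6.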

Supplementary Figure 11 Control of the multidimensional scaling (MDS) analysis.

(a) Histogram of the expression of SV-genes (log₂ transformed). (b) Histogram of the expression of control genes, chosen to match the SV-genes in expression profile and number (log₂ transformed). (c) MDS plot of the expression of SV-genes by sample-wise permutation. Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113. (d) MDS plot of the expression of the control genes. All data are from the breast carcinoma (BRCA) dataset of The Cancer Genome Atlas (TCGA). Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113.
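For readers unfamiliar with MDS, the sketch below implements classical (Torgerson) MDS in plain Python: it double-centres the squared distance matrix and extracts the leading eigenvectors by power iteration with deflation. It illustrates the technique only; the caption does not say which MDS variant, distance measure or software the authors used, and power iteration as written assumes Euclidean-like distances (no dominant negative eigenvalues).

```python
import math

def classical_mds(dist, k=2, iterations=500):
    """Embed n points in k dimensions from an n x n distance matrix."""
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    row_means = [sum(r) / n for r in d2]
    col_means = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J.
    b = [[-0.5 * (d2[i][j] - row_means[i] - col_means[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 0:
            break  # no further positive eigenvalue; remaining axes stay zero
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For expression data, `dist` would typically hold pairwise distances between samples computed from their gene expression vectors.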

Supplementary Figure 12 The logic and criteria used to define seven SV types by Picky.

The i-th high-scoring segment pair (HSP) between the read segment and the reference segment is denoted $Q_i$ and $S_i$, respectively. A linked alignment extension with three segments therefore has three HSPs: $Q_1{:}S_1$, $Q_2{:}S_2$ and $Q_3{:}S_3$. Each segment span is written as (start, end], following the UCSC 0-start, half-open coordinate system. The gap between consecutive reference segments is $\mathrm{sDiff} = S_{i+1}(\mathrm{start}) - S_i(\mathrm{end})$, and the gap between consecutive read segments is $\mathrm{qDiff} = Q_{i+1}(\mathrm{start}) - Q_i(\mathrm{end})$.
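The sDiff and qDiff definitions translate directly into code. This sketch computes both gaps for each pair of consecutive HSPs, assuming the HSPs are already ordered along the read and lie on the same chromosome and strand; the `HSP` class and coordinates are hypothetical, and the rules Picky applies to these gaps to assign the seven SV types are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HSP:
    q_start: int  # read segment Q_i = (q_start, q_end]
    q_end: int
    s_start: int  # reference segment S_i = (s_start, s_end]
    s_end: int

def segment_gaps(hsps):
    """Return [(sDiff, qDiff), ...] for consecutive HSPs Q_i:S_i and Q_{i+1}:S_{i+1}."""
    gaps = []
    for cur, nxt in zip(hsps, hsps[1:]):
        s_diff = nxt.s_start - cur.s_end  # sDiff = S_{i+1}(start) - S_i(end)
        q_diff = nxt.q_start - cur.q_end  # qDiff = Q_{i+1}(start) - Q_i(end)
        gaps.append((s_diff, q_diff))
    return gaps

hsps = [HSP(0, 1000, 50000, 51000), HSP(1000, 2000, 54000, 55000),
        HSP(2030, 3000, 55000, 55970)]
gaps = segment_gaps(hsps)
```

Here the first pair skips 3,000 bp of reference with no gap in the read (a deletion-like pattern), and the second pair has 30 bp of extra read sequence with no reference gap (an insertion-like pattern).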

Supplementary Figure 13 Homopolymer analysis of nanopore reads.

(a) Ratio of observed to expected counts for all 1,024 5-mers; the four under-called homopolymers are highlighted. (b) Annotated current trace for the segment harboring a basecalled deletion. The trace clearly shows the two homopolymers (marked (A)₂₀ and (T)₁₈) rather than a deletion flanked by (A)₅ and (T)₅.
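Panel (a) compares observed and expected counts for every 5-mer (4⁵ = 1,024). A minimal sketch, assuming the expected count for each 5-mer is its frequency in a reference sequence scaled to the number of 5-mers in the reads; the caption does not state how the expected counts were derived.

```python
from collections import Counter
from itertools import product

K = 5
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**5 = 1,024

def kmer_counts(seqs, k=K):
    """Count k-mers over all sequences, skipping k-mers with non-ACGT symbols."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def observed_vs_expected(reads, reference, k=K):
    """Observed/expected ratio per k-mer; None if the k-mer is absent from the reference."""
    obs = kmer_counts(reads, k)
    ref = kmer_counts([reference], k)
    n_obs, n_ref = sum(obs.values()), sum(ref.values())
    ratios = {}
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        expected = ref[kmer] * n_obs / n_ref if n_ref else 0.0
        ratios[kmer] = obs[kmer] / expected if expected > 0 else None
    return ratios

# Toy example: the read has a shortened A homopolymer relative to the reference.
ratios = observed_vs_expected(["AAAAAACGT"], "AAAAAAAACGT")
```

An under-called homopolymer shows up as a ratio below 1 for its poly-N 5-mer, as for AAAAA in the toy example.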

Supplementary Figure 14

Overview of the process of assigning breakpoints to their corresponding genomic features on the basis of the gene model.
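Assigning a breakpoint to a feature amounts to an interval lookup against the gene model. A minimal sketch, assuming sorted, non-overlapping feature intervals on a single chromosome in 0-based half-open [start, end) coordinates, and a hypothetical "intergenic" label for positions outside every interval; real gene models contain overlapping features that need a priority order, which the caption does not specify.

```python
import bisect

def build_index(features):
    """features: list of (start, end, label), non-overlapping, 0-based half-open [start, end)."""
    features = sorted(features)
    starts = [f[0] for f in features]
    return starts, features

def assign(position, index, default="intergenic"):
    """Return the label of the feature containing `position`, else `default`."""
    starts, features = index
    i = bisect.bisect_right(starts, position) - 1  # last feature starting at or before position
    if i >= 0:
        start, end, label = features[i]
        if start <= position < end:
            return label
    return default

index = build_index([(100, 200, "promoter"), (200, 1000, "gene_body"),
                     (5000, 6000, "promoter")])
```

Binary search keeps each lookup logarithmic in the number of features, which matters when assigning genome-wide breakpoint sets.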

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14

Reporting Summary

Combined Supplementary Information

Supplementary Note 1

Supplementary Table 1

Summary of the 15 nanopore runs in this study

Supplementary Table 2

Summary of the mapping and SV-calling results

Supplementary Table 3

List of seven SV types detected in nanopore data

Supplementary Table 4

SVs selected for validation analysis

Supplementary Table 5

Details of nanopore sequencing kits, devices, and software

Supplementary Table 6

List of all primers used in this study


About this article

Cite this article

Gong, L., Wong, CH., Cheng, WC. et al. Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat Methods 15, 455–460 (2018). https://doi.org/10.1038/s41592-018-0002-6


Received: 01 September 2017
Accepted: 14 March 2018
Published: 30 April 2018
Issue Date: June 2018
DOI: https://doi.org/10.1038/s41592-018-0002-6
