A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data

Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong

Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. Thanks to current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays, we now have datasets that contain both integer copy numbers in CNV regions and SNP genotypes. At the same time, haplotypes, which have been shown to offer advantages over genotypes in identifying disease traits, are available for SNP genotypes but largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, ‘Tree-Based Deterministic Sampling CNV’ (TDSCNV). We compare our method with polyHap (v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. We found that both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster while scaling linearly with the number of markers and the number of individuals, and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at www.ee.columbia.edu/~anastas/tdscnv.

Also Published In

Title: EURASIP Journal on Bioinformatics and Systems Biology
DOI: https://doi.org/10.1186/1687-4153-2014-7

More About This Work

Academic Units: Electrical Engineering
Published Here: September 23, 2014