Discovery and genotyping of novel sequence insertions in many sequenced individuals

dc.citation.epage: i169 (en_US)
dc.citation.issueNumber: 14 (en_US)
dc.citation.spage: i161 (en_US)
dc.citation.volumeNumber: 33 (en_US)
dc.contributor.author: Kavak, P. (en_US)
dc.contributor.author: Lin, Yen-Yi (en_US)
dc.contributor.author: Numanagić, I. (en_US)
dc.contributor.author: Asghari, H. (en_US)
dc.contributor.author: Güngör, T. (en_US)
dc.contributor.author: Alkan, C. (en_US)
dc.contributor.author: Hach, F. (en_US)
dc.date.accessioned: 2018-04-12T11:46:34Z
dc.date.available: 2018-04-12T11:46:34Z
dc.date.issued: 2017 (en_US)
dc.department: Department of Computer Engineering (en_US)
dc.description.abstract: Motivation: Despite recent advances in algorithm design for characterizing structural variation using high-throughput short-read sequencing (HTS) data, characterization of novel sequence insertions longer than the average read length remains a challenging task. This is mainly due to both computational difficulties and the complexities that genomic repeats impose on generating reliable assemblies that accurately capture both the sequence content and the exact location of such insertions. Additionally, de novo genome assembly algorithms typically require very high depth of coverage, which may be a limiting factor for most genome studies. Therefore, characterization of novel sequence insertions is not a routine part of most sequencing projects. Only a handful of algorithms have been developed specifically for novel sequence insertion discovery that bypass the need for whole-genome de novo assembly. Still, most such algorithms rely on high depth of coverage, and to our knowledge there is only one method (PopIns) that can use multi-sample data to "collectively" obtain a very high coverage dataset to accurately find insertions common in a given population. Results: Here, we present Pamir, a new algorithm to efficiently and accurately discover and genotype novel sequence insertions using either single or multiple genome sequencing datasets. Pamir detects the breakpoint locations of the insertions and calculates their zygosity (i.e. heterozygous versus homozygous) by analyzing multiple sequence signatures, matching one-end-anchored sequences to small-scale de novo assemblies of unmapped reads, and conducting strand-aware local assembly. We test the efficacy of Pamir on both simulated and real data, and demonstrate its potential use in accurate and routine identification of novel sequence insertions in genome projects. (en_US)
dc.description.provenance: Made available in DSpace on 2018-04-12T11:46:34Z (GMT). No. of bitstreams: 1. bilkent-research-paper.pdf: 179475 bytes, checksum: ea0bedeb05ac9ccfb983c327e155f0c2 (MD5). Previous issue date: 2017 (en)
dc.identifier.doi: 10.1093/bioinformatics/btx254 (en_US)
dc.identifier.eissn: 1460-2059 (en_US)
dc.identifier.issn: 1367-4803 (en_US)
dc.identifier.uri: http://hdl.handle.net/11693/37643 (en_US)
dc.language.iso: English (en_US)
dc.publisher: Oxford University Press (en_US)
dc.relation.isversionof: http://dx.doi.org/10.1093/bioinformatics/btx254 (en_US)
dc.source.title: Bioinformatics (en_US)
dc.title: Discovery and genotyping of novel sequence insertions in many sequenced individuals (en_US)
dc.type: Article (en_US)

Files

Original bundle

Name: Discovery_and_genotyping_of_novel_sequence_insertions_in_many_sequenced_individuals.pdf
Size: 707.19 KB
Format: Adobe Portable Document Format
Description: Full printable version