On genomic repeats and reproducibility

DC field | Value | Language
dc.citation.epage | 2247 | en_US
dc.citation.issueNumber | 15 | en_US
dc.citation.spage | 2243 | en_US
dc.citation.volumeNumber | 32 | en_US
dc.contributor.author | Firtina, C. | en_US
dc.contributor.author | Alkan, C. | en_US
dc.date.accessioned | 2018-04-12T10:44:41Z |
dc.date.available | 2018-04-12T10:44:41Z |
dc.date.issued | 2016 | en_US
dc.department | Department of Computer Engineering | en_US
dc.description.abstract | Results: Here, we present a comprehensive analysis of the reproducibility of computational characterization of genomic variants using high-throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, and altered only the order of reads in the input (i.e. the FASTQ file). Reshuffling caused reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter/gather approach to read mapping, without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle ambiguous read mappings accurately when random locations are selected. In addition, we observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we pinpoint to the variant filtration step. We conclude that algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results. Availability and Implementation: Code, scripts and the generated VCF files are available at DOI:10.5281/zenodo.32611. | en_US
dc.description.provenance | Made available in DSpace on 2018-04-12T10:44:41Z (GMT). No. of bitstreams: 1; bilkent-research-paper.pdf: 179475 bytes, checksum: ea0bedeb05ac9ccfb983c327e155f0c2 (MD5). Previous issue date: 2016 | en
dc.identifier.doi | 10.1093/bioinformatics/btw139 | en_US
dc.identifier.issn | 1367-4803 |
dc.identifier.uri | http://hdl.handle.net/11693/36571 |
dc.language.iso | English | en_US
dc.publisher | Oxford University Press | en_US
dc.relation.isversionof | http://dx.doi.org/10.1093/bioinformatics/btw139 | en_US
dc.source.title | Bioinformatics | en_US
dc.subject | DNA | en_US
dc.subject | DNA sequence | en_US
dc.subject | Genome | en_US
dc.subject | Genomics | en_US
dc.subject | High throughput sequencing | en_US
dc.subject | Reproducibility | en_US
dc.subject | High-throughput nucleotide sequencing | en_US
dc.subject | Reproducibility of results | en_US
dc.subject | Sequence analysis | en_US
dc.title | On genomic repeats and reproducibility | en_US
dc.type | Article | en_US
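The abstract concludes that ambiguous (multi-mapping) reads should be placed deterministically. The Python sketch below illustrates that idea and is not the authors' code or the behavior of any particular aligner. The read names, coordinates, and the functions `random_choice_mapping`, `deterministic_mapping` and `scatter_gather` are all hypothetical. It contrasts two approaches. The first uses a single seeded random generator consumed in input order, so a read's placement can change when the reads are reshuffled or split into chunks. The second derives the choice from a hash of the read name, so the choice depends only on the read itself.

```python
import hashlib
import random

# Hypothetical multi-mapping reads: each read is assumed to align equally
# well to every listed location. Names and coordinates are made up.
CANDIDATES = {
    "read_1": ["chr1:1000", "chr1:5000", "chr7:220"],
    "read_2": ["chr2:300", "chr2:9300"],
    "read_3": ["chr5:42", "chr5:8042", "chrX:77", "chrX:1077"],
    "read_4": ["chr3:10", "chr3:610"],
}


def random_choice_mapping(read_order, seed=0):
    """Break ties with one seeded RNG consumed in input order (hypothetical).

    The seed makes a single run repeatable, but each read's placement
    depends on how many draws happened before it, so reordering or
    chunking the input can move reads to different locations.
    """
    rng = random.Random(seed)
    return {name: rng.choice(CANDIDATES[name]) for name in read_order}


def deterministic_mapping(read_order):
    """Break ties using only a hash of the read name (hypothetical).

    Candidates are sorted first, so the order in which locations are
    reported does not matter either.
    """
    placements = {}
    for name in read_order:
        candidates = sorted(CANDIDATES[name])
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(candidates)
        placements[name] = candidates[index]
    return placements


def scatter_gather(read_order, n_chunks, mapper):
    """Split reads into chunks, map each chunk independently, merge results."""
    merged = {}
    for i in range(n_chunks):
        merged.update(mapper(read_order[i::n_chunks]))
    return merged


original = ["read_1", "read_2", "read_3", "read_4"]
shuffled = ["read_4", "read_2", "read_1", "read_3"]

baseline = deterministic_mapping(original)
print(deterministic_mapping(shuffled) == baseline)                 # True
print(scatter_gather(original, 2, deterministic_mapping) == baseline)  # True

# The RNG-based result is only guaranteed to repeat for the same order and seed.
print(random_choice_mapping(original) == random_choice_mapping(shuffled))
```

Hashing the read name is one simple way to make tie-breaking independent of input order and chunking. A real tool would also have to apply the same principle to every downstream step that resolves ambiguity.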

Files

Original bundle
Name: On genomic repeats and reproducibility.pdf
Size: 101.48 KB
Format: Adobe Portable Document Format
Description: Full printable version