On genomic repeats and reproducibility

Date

2016

Authors

Firtina, C.
Alkan, C.

Source Title

Bioinformatics

Print ISSN

1367-4803

Publisher

Oxford University Press

Volume

32

Issue

15

Pages

2243 - 2247

Language

English

Abstract

Results: Here, we present a comprehensive analysis of the reproducibility of computational characterization of genomic variants using high-throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, altering only the order of reads in the input (i.e. the FASTQ file). Reshuffling caused reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we applied only a scatter/gather approach for read mapping, without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle ambiguous read mappings accurately when random locations are selected. In addition, we observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we trace to the variant filtration step. We conclude that algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results.

Availability and Implementation: Code, scripts and the generated VCF files are available at DOI:10.5281/zenodo.32611.
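
The read-order perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' script (their code is at the DOI above); it assumes single-end reads in plain four-line FASTQ records, and the names `parse_fastq`, `shuffle_records` and `format_fastq` are hypothetical. A fixed seed is used so that the shuffled order itself can be reproduced.

```python
import random
from typing import List, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(text: str) -> List[FastqRecord]:
    """Split FASTQ text into records, assuming exactly four lines per read."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def shuffle_records(records: List[FastqRecord], seed: int) -> List[FastqRecord]:
    """Return a reordered copy of the reads; the read content is unchanged."""
    shuffled = list(records)
    # A dedicated, seeded generator keeps the permutation repeatable.
    random.Random(seed).shuffle(shuffled)
    return shuffled


def format_fastq(records: List[FastqRecord]) -> str:
    """Serialize records back to FASTQ text."""
    return "".join("\n".join(record) + "\n" for record in records)


if __name__ == "__main__":
    example = "@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n@r3\nGGGG\n+\nIIII\n"
    reads = parse_fastq(example)
    print(format_fastq(shuffle_records(reads, seed=42)), end="")
```

Paired-end data would need both mates permuted identically, which this sketch does not handle. Mapping the original and the shuffled file with the same tool and parameters, then comparing the resulting call sets, is the kind of comparison the abstract reports.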
