Causal mutation discovery using next generation sequencing data: development and application of a pipeline to reduce false positive calls and to map regions of shared homozygosity and IBD
Abstract
Next-generation sequencing technologies have brought enormous successes for disease gene discovery but also challenges for data analysis, particularly in genomic regions with low or low-quality sequence coverage. Errors in variant calling may cause true variants to be missed or many false positives to be called. The false discovery rate can be reduced by optimizing variant calling thresholds, such as those on the quality of base calling, mapping, and alignment. However, such optimization strategies are often associated with the loss of true variants. We present and apply a pipeline for variant identification and verification using aligned sequences of related individuals. It comprises three modules: (1) an identification pipeline for de novo variants, in which data of parents and siblings are aligned in order to rule out false positive calls in children, false negative calls in parents, and indel artifacts; (2) a homozygosity mapping and IBD analysis module; and (3) a variant read depth module that reveals variants that may have been missed due to sequence coverage and quality issues. We applied module (1) to a large trio-based gene discovery project and reduced the number of variant calling errors by 74%, thereby significantly streamlining the experimental validation protocol for potential de novo variants. We also applied the pipeline to the discovery of the gene responsible for mega corpus callosum and microcephaly with developmental delay and epilepsy in a brother and sister whose unaffected parents were first cousins. Our error correction pipeline significantly improved homozygosity mapping and IBD analysis and facilitated the rapid identification of the causal allele in this family.
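The kind of check that module (1) describes can be illustrated with a minimal sketch. It assumes per-sample genotype calls and read counts are already available for a child and both parents at one site. The class `SampleCall`, the function `classify_de_novo`, and the thresholds `min_depth` and `max_parent_alt_fraction` are hypothetical names and values chosen for illustration; they are not taken from the pipeline itself, and the actual rules it uses may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleCall:
    """Call for one sample at one site (hypothetical structure)."""
    genotype: str   # "0/0", "0/1", or "1/1"
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele


def classify_de_novo(child, mother, father,
                     min_depth=10, max_parent_alt_fraction=0.05):
    """Classify a heterozygous child call against both parents.

    Returns one of:
      "not_de_novo"         child is not 0/1, or a parent carries the allele
      "low_parent_coverage" a parent has too few reads to trust a 0/0 call
      "parent_alt_evidence" a parent called 0/0 still shows alt reads,
                            suggesting a false negative call in the parent
      "candidate"           consistent with a de novo variant
    The thresholds are illustrative assumptions, not published values.
    """
    parents = (mother, father)
    if child.genotype != "0/1":
        return "not_de_novo"
    if any(p.genotype != "0/0" for p in parents):
        return "not_de_novo"
    # A 0/0 call at low depth may simply have missed the alternate allele.
    if any(p.depth < max(min_depth, 1) for p in parents):
        return "low_parent_coverage"
    # Alt reads in a 0/0 parent point to an inherited variant.
    if any(p.alt_reads / p.depth > max_parent_alt_fraction for p in parents):
        return "parent_alt_evidence"
    return "candidate"


child = SampleCall("0/1", depth=40, alt_reads=18)
clean_parent = SampleCall("0/0", depth=35, alt_reads=0)
shallow_parent = SampleCall("0/0", depth=4, alt_reads=0)
leaky_parent = SampleCall("0/0", depth=40, alt_reads=6)

print(classify_de_novo(child, clean_parent, clean_parent))
print(classify_de_novo(child, clean_parent, shallow_parent))
print(classify_de_novo(child, leaky_parent, clean_parent))
```

Separating "low_parent_coverage" and "parent_alt_evidence" from outright rejection keeps uncertain sites visible for follow-up instead of silently discarding them, which is in the spirit of reducing false positives without losing true variants.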