Breakpoint refinement of genomic structural variation using split read analysis
Genomic variations, ranging from single nucleotide polymorphisms (SNPs) and small INDELs to structural variations (SVs), have been found to have significant phenotypic effects on individuals. Among these, SVs are changes that affect more than 50 nucleotides of DNA. SVs have been linked to many genetic diseases, such as autism, schizophrenia and chronic myelogenous leukemia. Accurate and precise characterization of these structural variants not only enables the diagnosis of genetic diseases previously correlated with them, but also provides more reliable information for downstream research in genomic pipelines. Many SV detection tools aim to find the approximate locations of SVs in the genome; a further step in the pipeline is to refine the breakpoints of these variants through a closer and more focused examination. With refined breakpoints, genotyping of structural variations with k-mer based alignment-free methods would be faster, and also more accurate, since the locations of SVs would be known at 1-5 base pair resolution rather than within confidence intervals 300-500 base pairs long. Moreover, later steps in genomic pipelines that depend on the results of SV detection algorithms would have more definite data to build on. In this thesis, we propose BROSV (Breakpoint Refinement of Structural Variation), a breakpoint refinement algorithm that obtains better resolution of SV breakpoints through split read analysis and local assembly, using Illumina short reads and the BWA alignment tool. The implementation is available at https://github.com/BilkentCompGen/brosv.
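To make the idea of split read analysis concrete, the following is a minimal sketch of how soft-clipped alignments can narrow a wide confidence interval down to a single candidate coordinate. It is not the BROSV implementation: the function names `clip_breakpoints` and `refine_breakpoint`, the `min_support` threshold, and the simple majority vote are all illustrative assumptions. Only the SAM conventions it relies on (1-based leftmost position, CIGAR operations) are standard.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "60M" or "40S" (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_breakpoints(pos, cigar):
    """Return reference coordinates (1-based) implied by soft clips.

    `pos` is the 1-based leftmost mapped position, as in a SAM record.
    Hypothetical helper for illustration only.
    """
    # Hard clips carry no sequence, so ignore them when looking at read ends.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    points = []
    if not ops:
        return points
    # Leading soft clip: the candidate breakpoint is the first aligned base.
    if ops[0][1] == "S":
        points.append(pos)
    # Trailing soft clip: the candidate breakpoint is the last aligned base.
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        points.append(pos + ref_len - 1)
    return points


def refine_breakpoint(alignments, interval, min_support=2):
    """Pick the clip position with the most supporting reads in `interval`.

    `alignments` is an iterable of (pos, cigar) pairs; `interval` is an
    inclusive (start, end) confidence interval from an SV caller.
    Returns None when no position reaches `min_support` (an assumed cutoff).
    """
    start, end = interval
    votes = Counter()
    for pos, cigar in alignments:
        for point in clip_breakpoints(pos, cigar):
            if start <= point <= end:
                votes[point] += 1
    if not votes:
        return None
    best, support = votes.most_common(1)[0]
    return best if support >= min_support else None


if __name__ == "__main__":
    reads = [(1000, "60M40S"), (1010, "50M50S"), (1060, "30S70M"), (900, "100M")]
    print(refine_breakpoint(reads, (800, 1300)))
```

A real pipeline would read alignments from a BAM file, apply mapping-quality filters, and combine this signal with local assembly as the abstract describes; the sketch only shows the clip-position voting step.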