dc.contributor.advisor: Alkan, Can [en_US]
dc.contributor.author: Demir, Gülfem [en_US]
dc.date.accessioned: 2017-03-31T10:00:17Z
dc.date.available: 2017-03-31T10:00:17Z
dc.date.copyright: 2017-03
dc.date.issued: 2017-03
dc.date.submitted: 2017-03-22
dc.identifier.uri: http://hdl.handle.net/11693/32937
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2017. [en_US]
dc.description: Includes bibliographical references (leaves 43-48). [en_US]
dc.description.abstract: Tandem repeats are stretches of DNA in which a pattern occurs as multiple consecutive, adjacent copies. If the repeat unit (pattern) consists of 2 to 6 nucleotides, it is referred to as a short tandem repeat (STR) or a microsatellite. Many genetic diseases (such as Huntington's disease and Fragile-X syndrome) are linked with STR expansions, and because tandem repeats make up 3% of the sequenced human genome, STR detection research is significant. STR variations have always been a challenge for genome assembly and sequence alignment due to their repetitive nature, sequencing errors, short read lengths, and the high incidence of polymerase slippage at STR regions. Although the information they carry is very valuable, STR variations have not gained enough attention to become a permanent step in genome sequence analysis pipelines. After the 1000 Genomes Project, which aimed to establish the most detailed genetic variation catalogue for humans, the consortium released only two STR prediction sets, which were identified by two STR caller tools, lobSTR and RepeatSeq. Many other large research efforts have failed to shed light on STR variations. The main aim of this study is to use sequence assembly methods for regions where we know, based on the reference genome, that there is an STR, and to release a complete pipeline from a sample's reads to STR genotype. The assembly problem we deal with in the scope of this thesis can be considered local assembly, which is the assembly of reads that map to a small part of the genome. We focus on two general assembly approaches that make use of graph data structures: de Bruijn graph (DBG) based methods, which rely on a variant of the k-mer graph, and overlap-layout-consensus (OLC) methods, which are based on an overlap graph. Even though sequence assembly is a well-studied problem, there is no work that uses assembly algorithms to characterize STRs.
We demonstrate that using sequence assembly on STR regions increases the true positive rate compared to state-of-the-art tools. We evaluated the performance of three different local assembly methods in three different experimental settings, focusing on (i) genotype-based performance, (ii) coverage impact, and (iii) pre-processing and the inclusion of flanking regions. All these experiments supported our initial expectations on using assembly. In addition, we show that OLC-based assembly methods bring much higher sensitivity to STR variant calling than the DBG-based approach. We conclude that, according to our experiments, assembly with OLC is a better way of genotyping STRs. [en_US]
dc.description.statementofresponsibility: by Gülfem Demir. [en_US]
dc.format.extent: xiv, 58 leaves : charts (some color) ; 29 cm. [en_US]
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Short tandem repeat [en_US]
dc.subject: Sequence assembly [en_US]
dc.subject: Next generation sequencing [en_US]
dc.title: Characterization of short tandem repeats using local assembly [en_US]
dc.title.alternative: Lokal DNA birlestirme methodu ile mikrosatellitlerin bulunması [en_US]
dc.type: Thesis [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.publisher: Bilkent University [en_US]
dc.description.degree: M.S. [en_US]
dc.identifier.itemid: B155347
dc.embargo.release: 2018-04-20
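
Note on the abstract's definition: a short tandem repeat is described there as a unit of 2 to 6 nucleotides occurring in consecutive copies. The sketch below only illustrates that definition on a plain string; it is not the thesis's assembly-based pipeline. The function names (`find_strs`, `is_primitive`) and the `min_copies` threshold are assumptions made for illustration, and only perfect (error-free) repeats are reported.

```python
import re


def is_primitive(unit):
    """True if `unit` is not itself a repetition of a shorter string."""
    # A string is a repetition of a shorter one exactly when it occurs
    # inside its own doubling with the first and last characters removed.
    return unit not in (unit + unit)[1:-1]


def find_strs(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    Hypothetical helper: `min_copies` is an illustrative threshold,
    not a value taken from the thesis.
    """
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # Capture one unit, then require at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units such as "AA" or "ACAC", which are repeats of a
            # shorter unit and would otherwise be counted twice.
            if is_primitive(unit):
                copies = len(match.group(0)) // unit_len
                hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    example = "TTA" + "CAG" * 5 + "TCGT"
    for start, unit, copies in find_strs(example):
        print(start, unit, copies)
```

Because the scan takes the leftmost match, the reported unit can be a rotation of the one a reader expects (for example `GCA` instead of `CAG`) when the flanking base happens to extend the repeat. Real reads also contain sequencing errors and slippage artifacts, which is part of why the abstract argues for assembly-based genotyping over simple pattern matching.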

