Algorithms for the discovery of large genomic inversions using pooled clone sequencing

buir.advisor: Alkan, Can
dc.contributor.author: Rasekh, Marzieh Eslami
dc.date.accessioned: 2016-07-01T11:11:28Z
dc.date.available: 2016-07-01T11:11:28Z
dc.date.issued: 2015
dc.department: Department of Computer Engineering (en_US)
dc.description: Cataloged from PDF version of article. (en_US)
dc.description.abstract: An inversion is a chromosomal rearrangement in which an internal segment of a chromosome has been broken twice, flipped 180 degrees, and rejoined. Most known examples of large inversions were found indirectly from studies on human disease in which inversions have no detectable effect in parents but increase the risk of a disease-associated rearrangement in the offspring. The development of a map of inversion polymorphisms will provide valuable information regarding their distribution and frequency in the human genome and will help unravel how inversions, and the segmental duplication architecture associated with inverted haplotypes, contribute to genomic susceptibility to disease rearrangements. The 1000 Genomes Project spearheaded the development of several methods to identify inversions; however, these are limited to relatively short inversions, and no algorithms are currently available to discover large inversions using high-throughput sequencing (HTS) technologies. This is mainly because the breakpoints of such events typically lie within segmental duplications and common repeats, reducing the mappability of short reads. We propose using pooled clone sequencing (PCS), a method originally developed to improve haplotype phasing, to characterize large genomic inversions. PCS merges the advantages of clone-based sequencing approaches with the speed and cost efficiency of HTS technologies. Using this sequencing data, we developed a novel algorithm, dipSeq, for discovering large inversions (>500 Kbp), based on the observation that clones spanning an inversion breakpoint will be split into two sections (split clones) when mapped to the reference genome. We evaluate the performance of dipSeq on three sets of simulated data, demonstrating its correctness and its robustness to structural duplications and other types of structural variation. We further applied dipSeq to the genome of a HapMap individual (NA12878). dipSeq was able to accurately discover all previously known and experimentally validated large inversions. We also identified a new inversion and confirmed it using fluorescence in situ hybridization. Although dipSeq displays a relatively high false positive rate on real data, it performed better on simulated data, suggesting that its performance on the NA12878 genome may be improved with higher depth of coverage. (en_US)
dc.description.degree: M.S. (en_US)
dc.description.statementofresponsibility: Rasekh, Marzieh Eslami (en_US)
dc.format.extent: xvii, 109 leaves (en_US)
dc.identifier.itemid: B151123
dc.identifier.uri: http://hdl.handle.net/11693/30059
dc.language.iso: English (en_US)
dc.publisher: Bilkent University (en_US)
dc.rights: info:eu-repo/semantics/openAccess (en_US)
dc.subject: structural variation (en_US)
dc.subject: pooled clone sequencing (en_US)
dc.subject: inversion detection (en_US)
dc.subject.lcc: B151123 (en_US)
dc.title: Algorithms for the discovery of large genomic inversions using pooled clone sequencing (en_US)
dc.type: Thesis (en_US)

Files

Original bundle
Name: 0006986.pdf
Size: 10.22 MB
Format: Adobe Portable Document Format
Description: Full printable version