Algorithms for the discovery of large genomic inversions using pooled clone sequencing

Rasekh, Marzieh Eslami

buir.advisor	Alkan, Can
dc.contributor.author	Rasekh, Marzieh Eslami
dc.date.accessioned	2016-07-01T11:11:28Z
dc.date.available	2016-07-01T11:11:28Z
dc.date.issued	2015
dc.description	Cataloged from PDF version of article.	en_US
dc.description.abstract	An inversion is a chromosomal rearrangement in which an internal segment of a chromosome is broken at two points, flipped 180 degrees, and rejoined. Most known examples of large inversions were found indirectly in studies of human disease, where an inversion has no detectable effect in the parents but increases the risk of a disease-associated rearrangement in the offspring. The development of a map of inversion polymorphisms will provide valuable information regarding their distribution and frequency in the human genome, and will help unravel how inversions, and the segmental duplication architecture associated with inverted haplotypes, contribute to genomic susceptibility to disease rearrangements. The 1000 Genomes Project spearheaded the development of several methods to identify inversions; however, these methods are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high-throughput sequencing (HTS) technologies. This is mainly because the breakpoints of such events typically lie within segmental duplications and common repeats, which reduces the mappability of short reads. We propose using pooled clone sequencing (PCS), a method originally developed to improve haplotype phasing, to characterize large genomic inversions. PCS combines the advantages of clone-based sequencing approaches with the speed and cost efficiency of HTS technologies. Using this sequencing data, we developed a novel algorithm, dipSeq, for discovering large inversions (>500 Kbp). It builds on the observation that a clone spanning an inversion breakpoint is split into two sections when mapped to the reference genome; we call such clones split clones. We evaluated the performance of dipSeq on three sets of simulated data, demonstrating its correctness and its robustness to structural duplications and other types of structural variation. We further applied dipSeq to the genome of a HapMap individual (NA12878).
dipSeq accurately discovered all previously known and experimentally validated large inversions. We also identified a new inversion and confirmed it using fluorescence in situ hybridization. Although dipSeq displays a relatively high false positive rate on real data, it performed better on simulated data, suggesting that its performance on the NA12878 genome may improve with higher depth of coverage.	en_US
dc.description.statementofresponsibility	Rasekh, Marzieh Eslami	en_US
dc.format.extent	xvii, 109 leaves	en_US
dc.identifier.itemid	B151123
dc.identifier.uri	http://hdl.handle.net/11693/30059
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	structural variation	en_US
dc.subject	pooled clone sequencing	en_US
dc.subject	inversion detection	en_US
dc.subject.lcc	B151123	en_US
dc.title	Algorithms for the discovery of large genomic inversions using pooled clone sequencing	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
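
The split-clone observation stated in the abstract can be illustrated with a small coordinate sketch. This is not the dipSeq algorithm itself, only a toy model under stated assumptions: the inversion is balanced (no gain or loss of sequence), intervals are half-open, and clone coordinates are given on the sample genome that carries the inversion. The function names `map_clone_to_reference` and `is_split_clone` are hypothetical and do not come from the thesis.

```python
from typing import List, Tuple

# (ref_start, ref_end, strand) with half-open coordinates on the reference
Section = Tuple[int, int, str]


def map_clone_to_reference(start: int, end: int,
                           inv_start: int, inv_end: int) -> List[Section]:
    """Map a clone [start, end) on the sample genome to the reference.

    Assumption: the sample differs from the reference only by one balanced
    inversion of the reference segment [inv_start, inv_end).
    """
    sections: List[Section] = []

    # Part of the clone to the left of the inversion maps unchanged.
    left_end = min(end, inv_start)
    if start < left_end:
        sections.append((start, left_end, "+"))

    # Part of the clone inside the inversion is mirrored within the segment.
    in_start, in_end = max(start, inv_start), min(end, inv_end)
    if in_start < in_end:
        mirror = inv_start + inv_end
        sections.append((mirror - in_end, mirror - in_start, "-"))

    # Part of the clone to the right of the inversion maps unchanged.
    right_start = max(start, inv_end)
    if right_start < end:
        sections.append((right_start, end, "+"))

    return sections


def is_split_clone(sections: List[Section]) -> bool:
    """A clone is 'split' here if its reference sections are not contiguous."""
    ordered = sorted(sections)
    return any(a[1] != b[0] for a, b in zip(ordered, ordered[1:]))


# A 150 Kbp clone crossing the left breakpoint of a 1 Mbp inversion.
left_spanning = map_clone_to_reference(90_000, 240_000, 100_000, 1_100_000)
```

In this toy model, a clone that crosses exactly one breakpoint maps as two sections separated by roughly the inversion length, while a clone lying entirely outside or entirely inside the inversion maps as a single section.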

Files

Original bundle

Name: 0006986.pdf
Size: 10.22 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Graduate School of Engineering and Science