Read mapping methods optimized for multiple GPGPU

buir.advisorAlkan, Can
dc.contributor.authorNouri, Azita
dc.date.accessioned2016-09-05T12:30:39Z
dc.date.available2016-09-05T12:30:39Z
dc.date.copyright2016-07
dc.date.issued2016-07
dc.date.submitted2016-09-02
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionThesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2016.en_US
dc.descriptionIncludes bibliographical references (leaves 52-57).en_US
dc.description.abstractThe DNA sequence alignment problem can be broadly defined as the character-level comparison of DNA sequences obtained from one or more samples against a database of reference (i.e., consensus) genome sequences of the same or a similar species. High-throughput sequencing (HTS) technologies were introduced in 2006, and the latest iterations of HTS technologies are able to read the genome of a human individual in just three days for a cost of ~$1,000. HTS technologies produce a massive number of reads of varying lengths, and they also present a computational problem, since the analysis of HTS data requires the comparison of >1 billion short (100 characters, or base pairs) "reads" against a very long (3 billion base pairs) reference genome. Since DNA molecules are composed of two opposing strands (i.e., two complementary strings), the number of required comparisons is doubled. Mapping this volume of short reads of different sizes therefore presents a difficult and important challenge in terms of execution time and scalability. Instead of calculating billions of local alignments of short vs. long sequences using a quadratic-time algorithm, heuristics are applied to speed up the process. First, partial sequence matches, called "seeds", are quickly found using either the Burrows-Wheeler Transform (BWT) followed by the Ferragina-Manzini (FM) index, or a simple hash table. Next, the candidate locations are verified using a dynamic programming alignment algorithm that calculates the Levenshtein edit distance (mismatches, insertions, and deletions relative to the reference), which runs in quadratic time. Although these heuristics are substantially faster than local alignment, because of the repetitive nature of the human genome, they often require hundreds of verification runs per read, imposing a heavy computational burden. 
However, all of these billions of alignments are independent of each other, so the read mapping problem is embarrassingly parallel. In this thesis we propose novel algorithms that are optimized for multiple graphics processing units (GPGPUs) to accelerate the read mapping procedure beyond the capabilities of algorithmic improvements that only use CPUs. We distribute the read mapping workload across the massively parallel architecture of GPGPUs to perform millions of alignments simultaneously, using one or many GPGPUs together with multi-core CPUs. Our aim is to reduce the need for large-scale clusters or cloud platforms to a single server with advanced parallel processing units.en_US
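The seed-and-verify heuristic described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the thesis implementation: the function names (`build_seed_index`, `reverse_complement`, `edit_distance`, `map_read`), the non-overlapping seed layout, and the choice to verify each read against an equal-length reference window are all assumptions made for illustration. It uses the simple hash-table option for seeding and a quadratic-time Levenshtein computation for verification, and it checks both strands.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def reverse_complement(seq):
    """Return the opposite-strand sequence (hypothetical helper)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def edit_distance(a, b):
    """Levenshtein distance via quadratic-time dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]


def map_read(read, reference, index, k, max_edits):
    """Return sorted (position, strand, distance) hits for one read.

    Simplifying assumption: each candidate is verified against a reference
    window of exactly the read's length.
    """
    hits = set()
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        candidates = set()
        # Seeding: look up non-overlapping k-mers of the read in the hash table.
        for offset in range(0, len(seq) - k + 1, k):
            for pos in index.get(seq[offset:offset + k], ()):
                start = pos - offset
                if 0 <= start <= len(reference) - len(seq):
                    candidates.add(start)
        # Verification: each candidate is independent of all the others.
        for start in candidates:
            dist = edit_distance(seq, reference[start:start + len(seq)])
            if dist <= max_edits:
                hits.add((start, strand, dist))
    return sorted(hits)
```

Calling `map_read(read, reference, build_seed_index(reference, 4), 4, 2)` returns the candidate positions that pass verification. Because every verification in the inner loop is independent, that loop is the part that parallelizes naturally across threads or devices.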
dc.description.provenanceSubmitted by Betül Özen (ozen@bilkent.edu.tr) on 2016-09-05T12:30:39Z No. of bitstreams: 1 Thesis.pdf: 2688789 bytes, checksum: 251ac46211136207410ab0c8c62f849a (MD5)en
dc.description.provenanceMade available in DSpace on 2016-09-05T12:30:39Z (GMT). No. of bitstreams: 1 Thesis.pdf: 2688789 bytes, checksum: 251ac46211136207410ab0c8c62f849a (MD5) Previous issue date: 2016-09en
dc.description.statementofresponsibilityby Azita Nouri.en_US
dc.embargo.release2018-06-01
dc.format.extentxiii, 57 leaves : illustrations (some color), charts.en_US
dc.identifier.itemidB154007
dc.identifier.urihttp://hdl.handle.net/11693/32197
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectCUDAen_US
dc.subjectNeedleman-Wunsch.en_US
dc.titleRead mapping methods optimized for multiple GPGPUen_US
dc.title.alternativeÇoklu GPGPU sistemleri için eniyilenmiş dizi hizalama yöntemlerien_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)

Files

Original bundle

Name:
Thesis.pdf
Size:
2.56 MB
Format:
Adobe Portable Document Format
Description:
Full printable version

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission