Distributed stream-processing framework for graph-based sequence alignment

buir.advisor: Alkan, Can
dc.contributor.author: Gökkaya, Alim Şükrücan
dc.date.accessioned: 2020-02-21T11:51:42Z
dc.date.available: 2020-02-21T11:51:42Z
dc.date.copyright: 2020-01
dc.date.issued: 2020-01
dc.date.submitted: 2020-02-20
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2020. [en_US]
dc.description: Includes bibliographical references (leaves 37-43). [en_US]
dc.description.abstract: Optimized sequence alignment pipelines are needed to minimize the time required to process short-read genomic data. Many sequence alignment tools exist today, yet few of them can directly ingest streaming base-call data. Sequencing has to be completed entirely before mainstream aligners can begin mapping the reads to the reference, and the sequencing process can take days. The output then needs to be demultiplexed into individual reads and aligned to the reference, which can take several more hours. The overall time of a genomic analysis can be shortened significantly by computing the alignments progressively while the reads are still being generated. It is important to complete genomic analysis as quickly as possible, especially in life-critical situations. Here we introduce a distributed stream-processing framework for aligning short reads to a graph representation of the genome. The massively parallel nature of genomic sequencing data requires a massively parallel computation architecture. We have therefore designed our pipeline, called R2G2Flow, to align many reads to a de Bruijn graph in parallel. Our alignment method is specialized for sequencing technologies that are based on base-call cycles, such as Illumina's. The results are made available soon after the sequencing devices have emitted the final bases. R2G2Flow is available at https://github.com/BilkentCompGen/r2g2 [en_US]
dc.description.provenance: Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2020-02-21T11:51:42Z No. of bitstreams: 1 10328114.pdf: 413293 bytes, checksum: 1931e14a10fb3985777d7ffa1ace7efc (MD5) [en]
dc.description.provenance: Made available in DSpace on 2020-02-21T11:51:42Z (GMT). No. of bitstreams: 1 10328114.pdf: 413293 bytes, checksum: 1931e14a10fb3985777d7ffa1ace7efc (MD5) Previous issue date: 2020-02 [en]
dc.description.statementofresponsibility: by Alim Şükrücan Gökkaya [en_US]
dc.format.extent: x, 45 leaves : charts (some color) ; 30 cm. [en_US]
dc.identifier.itemid: B149891
dc.identifier.uri: http://hdl.handle.net/11693/53472
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Read mapping [en_US]
dc.subject: de Bruijn graphs [en_US]
dc.subject: Stream processing [en_US]
dc.title: Distributed stream-processing framework for graph-based sequence alignment [en_US]
dc.title.alternative: Çizge tabanlı okuma hızalandırması için dağıtık akıntı işleme sistemi [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
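
The abstract describes aligning reads to a de Bruijn graph progressively, one base-call cycle at a time. The sketch below is only an illustration of that general idea and is not taken from R2G2Flow: it runs in a single process, accepts exact matches only, and every name in it (`build_de_bruijn`, `StreamingRead`, `align_cycles`) is hypothetical. It assumes each cycle arrives as a string holding one base per read.

```python
from collections import defaultdict


def build_de_bruijn(reference, k):
    """Map each k-mer of the reference to the set of k-mers that follow it."""
    graph = defaultdict(set)
    for i in range(len(reference) - k):
        graph[reference[i:i + k]].add(reference[i + 1:i + k + 1])
    # The last k-mer has no successor but is still a node of the graph.
    graph.setdefault(reference[-k:], set())
    return dict(graph)


class StreamingRead:
    """Per-read state that is updated as each base arrives (hypothetical)."""

    def __init__(self, k):
        self.k = k
        self.prefix = ""    # bases collected before the first full k-mer
        self.path = []      # k-mer nodes walked so far
        self.alive = True   # False once the read leaves the graph

    def push(self, base, graph):
        if not self.alive:
            return
        if not self.path:
            # Still collecting the first k bases of the read.
            self.prefix += base
            if len(self.prefix) == self.k:
                if self.prefix in graph:
                    self.path.append(self.prefix)
                else:
                    self.alive = False
            return
        # Extend the walk by one edge (exact match only in this sketch).
        candidate = self.path[-1][1:] + base
        if candidate in graph[self.path[-1]]:
            self.path.append(candidate)
        else:
            self.alive = False


def align_cycles(cycles, graph, k):
    """Feed base-call cycles to all reads; cycles[i][j] is base i of read j."""
    reads = [StreamingRead(k) for _ in cycles[0]]
    for cycle in cycles:
        for read, base in zip(reads, cycle):
            read.push(base, graph)
    return reads
```

Because each read keeps its own small state and is updated independently, the inner loop could in principle be partitioned across workers, which is the property a distributed stream processor would exploit.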

Files

Original bundle
Name:
10328114.pdf
Size:
403.61 KB
Format:
Adobe Portable Document Format
Description:
Full printable version