Whole genome alignment via Alternating Lyndon Factorization Tree traversal
buir.advisor | Alkan, Can | |
dc.contributor.author | Aydın, Mahmud Sami | |
dc.date.accessioned | 2023-07-26T08:01:33Z | |
dc.date.available | 2023-07-26T08:01:33Z | |
dc.date.copyright | 2023-07 | |
dc.date.issued | 2023-07 | |
dc.date.submitted | 2023-07 | |
dc.description | Cataloged from PDF version of article. | |
dc.description | Thesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2023. | |
dc.description | Includes bibliographical references (leaves 67-74). | |
dc.description.abstract | The Whole Genome Alignment Problem (WGA) is an important challenge in the field of genomics, especially in the context of pangenome construction. Here we propose a novel indexing structure called the Alternating Lyndon Factorization Tree (ALFTree), which incorporates both spatial and lexicographical information within its nodes. The ALFTree is a powerful tool for WGA, as it can efficiently store and retrieve information about large DNA sequences. We present an algorithm, namely Idoneous, specifically designed to construct the ALFTree from a given DNA sequence. The algorithm works by generating intervals of specific sizes, identifying matches within these intervals, and performing a sanity check through alignment procedures. The algorithm is efficient and scalable, making it a valuable tool for WGA. Some of the key features of the ALFTree are 1) a compact and efficient data structure for storing large DNA sequences; 2) efficient retrieval of information about specific regions of a DNA sequence; 3) the ability to handle both spatial and lexicographical information; and 4) scalability to large DNA sequences. Our experimental results on different genomes highlight the effects of parameter selections on coverage and identity. Idoneous demonstrates competitive performance in terms of coverage and provides flexibility in adjusting sensitivity and specificity for different alignment scenarios. The ALFTree has the potential to significantly improve the performance of WGA algorithms. We believe that the ALFTree is a valuable contribution to the field of genomics, and we hope that it will be used by researchers to accelerate the pace of discovery. | |
dc.description.provenance | Made available in DSpace on 2023-07-26T08:01:33Z (GMT). No. of bitstreams: 1 B162254.pdf: 1456115 bytes, checksum: 2920c1a928d3a66cb7c19bd059cbc23c (MD5) Previous issue date: 2023-07 | en |
dc.description.statementofresponsibility | by Mahmud Sami Aydın | |
dc.embargo.release | 2024-01-18 | |
dc.format.extent | xvi, 75 leaves : charts ; 30 cm. | |
dc.identifier.itemid | B162254 | |
dc.identifier.uri | https://hdl.handle.net/11693/112437 | |
dc.language.iso | English | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | Whole genome alignment | |
dc.subject | Indexing | |
dc.subject | Lyndon factorization | |
dc.title | Whole genome alignment via Alternating Lyndon Factorization Tree traversal | |
dc.title.alternative | Almaşık Lyndon Faktörizasyon Ağacında gezinerek tüm genom hizalama | |
dc.type | Thesis | |
thesis.degree.discipline | Computer Engineering | |
thesis.degree.grantor | Bilkent University | |
thesis.degree.level | Master's | |
thesis.degree.name | MS (Master of Science) |
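
Note on the term "Lyndon factorization" used in the abstract and subject fields: a Lyndon word is a string that is strictly smaller, in lexicographic order, than all of its proper suffixes, and every string factors uniquely into a non-increasing sequence of Lyndon words. The sketch below is the standard Duval algorithm for that plain factorization, applied to a short DNA-like string. It is offered only as background; it is not the ALFTree construction or the Idoneous algorithm, whose details this record does not give, and the function name `lyndon_factorization` is a hypothetical choice for illustration.

```python
def lyndon_factorization(s):
    """Return the Lyndon factorization of s using Duval's algorithm.

    Background illustration only: this is the classic factorization,
    not the alternating variant or the tree described in the thesis.
    """
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j = i + 1  # position being compared
        k = i      # position inside the current candidate factor
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i      # candidate extends to a longer Lyndon word
            else:
                k += 1     # candidate repeats periodically
            j += 1
        # Emit every full repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


if __name__ == "__main__":
    print(lyndon_factorization("GATTACA"))
```

The factors always concatenate back to the input, and each factor is lexicographically greater than or equal to the one after it.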