Whole genome alignment via Alternating Lyndon Factorization Tree traversal

Limited Access
This item is unavailable until:
2024-01-18

Date

2023-07

Editor(s)

Advisor

Alkan, Can

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Source Title

Print ISSN

Electronic ISSN

Publisher

Bilkent University

Volume

Issue

Pages

Language

English

Journal Title

Journal ISSN

Volume Title

Series

Abstract

The Whole Genome Alignment Problem (WGA) is an important challenge in the field of genomics, especially in the context of pangenome construction. Here we propose a novel indexing structure called the Alternating Lyndon Factor-ization Tree (ALFTree), which incorporates both spatial and lexicographical information within its nodes. The ALFTree is a powerful tool for WGA, as it can efficiently store and retrieve information about large DNA sequences. We present an algorithm, namely Idoneous, specifically designed to construct the ALFTree from a given DNA sequence. The algorithm works by generating intervals of specific sizes, identifying matches within these intervals, and perform-ing a sanity check through alignment procedures. The algorithm is efficient and scalable, making it a valuable tool for WGA. Some of the key features of the ALFTree are 1) compact and efficient data structure for storing large DNA sequences; 2) efficient retrieval of information about specific regions of a DNA sequence; 3) ability to handle both spatial and lexicographical information; and 4) scalability to large DNA sequences. Our experimental results on different genomes highlight the effects of param-eter selections on coverage and identity. Idoneous demonstrates competitive per-formance in terms of coverage and provides flexibility in adjusting sensitivity and specificity for different alignment scenarios. The ALFTree has the potential to significantly improve the performance of WGA algorithms. We believe that the ALFTree is a valuable contribution to the field of genomics, and we hope that it will be used by researchers to accelerate the pace of discovery.

Course

Other identifiers

Book Title

Citation

item.page.isversionof