Efficient variation graph construction using locally consistent parsing

buir.advisor: Alkan, Can
dc.contributor.author: Ashyralyyev, Akmuhammet
dc.date.accessioned: 2025-09-02T12:44:02Z
dc.date.available: 2025-09-02T12:44:02Z
dc.date.copyright: 2025-08
dc.date.issued: 2025-08
dc.date.submitted: 2025-08-29
dc.description: Cataloged from PDF version of article.
dc.description: Includes bibliographical references (leaves 31-40).
dc.description.abstract: Efficient and consistent string processing is critical in an era of exponentially growing genomic data. Locally Consistent Parsing (LCP) addresses this need by partitioning an input genome string into short, exactly matching substrings (called "cores"), ensuring consistency across partitions. Labeling the cores of an input string consistently not only provides a compact representation of the input but also enables the reapplication of LCP to refine the cores over multiple iterations, providing a progressively longer and more informative set of substrings for downstream analyses. We present the first iterative implementation of LCP, Lcptools, and demonstrate its effectiveness in identifying cores with minimal collisions. Experimental results show that the number of cores at the i-th iteration is O(n/c^i) for c ≈ 2.34, while the average length of cores and the average distance between consecutive cores are both O(c^i). Compared to popular sketching techniques, LCP produces significantly fewer cores, enabling a more compact representation and faster analyses. To demonstrate the advantages of LCP in genomic string processing in terms of computation and memory efficiency, we also introduce LCPan, an efficient variation graph constructor. We show that LCPan generates variation graphs more than 10× faster than vg, while using more than 13× less memory.
dc.description.statementofresponsibility: by Akmuhammet Ashyralyyev
dc.embargo.release: 2026-02-28
dc.format.extent: ix, 40 leaves : charts ; 30 cm.
dc.identifier.itemid: B163194
dc.identifier.uri: https://hdl.handle.net/11693/117474
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Locally consistent parsing
dc.subject: Genome representation
dc.subject: Variation graph
dc.title: Efficient variation graph construction using locally consistent parsing
dc.title.alternative: Yerel tutarlı ayrıştırma kullanan verimli varyasyon çizgesi oluşturucu
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: B163194.pdf
Size: 360.96 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.1 KB
Format: Item-specific license agreed upon to submission