Using Bloom filters to quickly and efficiently characterize genomic repeats and segmental duplications

Limited Access
This item is unavailable until:
2026-02-28

Date

2025-08

Advisor

Alkan, Can

Abstract

Advances in sequencing technologies are expected to further reduce the occurrence of sequencing-related misassemblies. Nevertheless, errors caused by repetitive sequences and duplications remain a persistent challenge and are likely to continue affecting genome assemblies. This highlights the need for fast and efficient algorithms designed specifically to address repeat-induced errors. In this study, we present KonuSeg, a versatile k-mer counting tool that leverages Bloom filters and assigns copy numbers to genomic regions in a segment-based manner across the genome. KonuSeg employs a non-mapping-based approach that is computationally efficient and readily integrable into assembly graph frameworks, providing improved scalability and memory efficiency. We demonstrate its effectiveness through comprehensive analyses of data from multiple species under various configurations, and we evaluate its performance in combination with a widely used scaffolding algorithm to showcase its potential for enhancing assembly quality.
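The core idea summarized above, counting k-mers in a Bloom-filter structure and assigning a copy number to each genomic segment without mapping reads, can be illustrated with a minimal sketch. This is not KonuSeg's implementation. The counting Bloom filter, the fixed segment length, the median-count estimator, the `coverage` parameter, and all function names (`CountingBloomFilter`, `segment_copy_numbers`, `simulate_repeat_example`) are hypothetical choices made for illustration. Reads are idealized as error-free copies of the genome at uniform depth.

```python
import hashlib
import random
from statistics import median


class CountingBloomFilter:
    """Approximate k-mer counter: hash collisions can inflate a count, never deflate it."""

    def __init__(self, size: int = 1 << 20, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, item: str) -> set[int]:
        # Derive num_hashes positions by salting the key with the hash index.
        return {
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "little"
            ) % self.size
            for seed in range(self.num_hashes)
        }

    def add(self, item: str) -> None:
        for i in self._indexes(item):
            self.counters[i] += 1

    def count(self, item: str) -> int:
        # The minimum over an item's counters is its least-inflated estimate.
        return min(self.counters[i] for i in self._indexes(item))


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    # Treat a k-mer and its reverse complement as one key (reads come from both strands).
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq: str, k: int):
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def segment_copy_numbers(genome, cbf, k, segment_len, coverage):
    """Call each fixed-length segment as round(median k-mer count / per-copy coverage)."""
    calls = []
    for start in range(0, len(genome), segment_len):
        counts = [cbf.count(km) for km in kmers(genome[start:start + segment_len], k)]
        if counts:  # skip a trailing segment shorter than k
            calls.append((start, round(median(counts) / coverage)))
    return calls


def simulate_repeat_example(k=21, block=1000, coverage=10, seed=0):
    rng = random.Random(seed)

    def random_block() -> str:
        return "".join(rng.choices("ACGT", k=block))

    a, r, b, c = random_block(), random_block(), random_block(), random_block()
    genome = a + r + b + r + c  # block r is a two-copy repeat
    cbf = CountingBloomFilter()
    for _ in range(coverage):  # idealized error-free reads at uniform depth
        for km in kmers(genome, k):
            cbf.add(km)
    return segment_copy_numbers(genome, cbf, k, block, coverage)


print(simulate_repeat_example())
```

Taking the median rather than the mean keeps a few collision-inflated k-mers from shifting a segment's call. Real sequencing data would additionally require filtering erroneous k-mers and estimating per-copy coverage, which this sketch omits.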

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Language

English

Type