Characterization of structural variation through assembly-to-assembly comparison

buir.advisor: Alkan, Can
dc.contributor.author: Çoktalaş, Muhammet Rafi
dc.date.accessioned: 2024-09-18T06:43:07Z
dc.date.available: 2024-09-18T06:43:07Z
dc.date.copyright: 2024-09
dc.date.issued: 2024-09
dc.date.submitted: 2024-09-13
dc.description: Cataloged from PDF version of article. (en_US)
dc.description: Includes bibliographical references (leaves 48-56). (en_US)
dc.description.abstract: Structural variations (SVs) are genomic variations affecting more than 50 nucleotides of DNA. SVs play a crucial role in evolution and have critical phenotypic effects on organisms, including human diseases such as autism, schizophrenia, epilepsy, and cancer. Thus, SV characterization is of great significance. In the past, read-based methodologies were used because constructing genome assemblies was infeasible. However, with technological advancements, assembling genomes has become significantly more feasible, and complete assemblies of human and other primate genomes have been constructed. Despite these high-quality assemblies, SV discovery in human genomes remains challenging due to the genome's repetitive nature and complex rearrangements caused by combinations of SVs. Most existing SV discovery tools that operate on genome assemblies require whole-genome alignments, leading to high preprocessing times and memory usage. Therefore, new algorithms are still needed to discover SVs efficiently. Here, we propose Strive, a linear-time algorithm that operates on genome assembly sketches instead of whole-genome alignments to characterize insertions, deletions, and inversions. We evaluated the performance of Strive in two experiments: simulated data from the human reference genome (GRCh38.p14 / hg38) and real data using a full genome assembly from the Telomere-to-Telomere Consortium (CHM13). Strive accurately detects insertions, deletions, and inversions in 11 to 12 seconds, in addition to preprocessing times ranging from 50 to 55 seconds. Strive achieved over 95% precision and recall in the simulations without duplications. In the simulations that included segmental duplications and SNPs, and in the experiment with the CHM13 assembly, Strive still maintained over 95% recall in inversion discovery, but precision and recall for insertions and deletions were lower, suggesting a need for increased robustness to duplications.
dc.description.statementofresponsibility: by Muhammet Rafi Çoktalaş
dc.format.extent: xiv, 56 leaves : color illustrations, charts ; 30 cm.
dc.identifier.itemid: B162654
dc.identifier.uri: https://hdl.handle.net/11693/115819
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Structural variation
dc.subject: Genome assembly
dc.subject: Sketching
dc.title: Characterization of structural variation through assembly-to-assembly comparison
dc.title.alternative: Tüm genom karşılaştırması ile yapısal varyasyonların karakterizasyonu
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name:
B162654.pdf
Size:
3.97 MB
Format:
Adobe Portable Document Format
