Paralog-specific gene copy number discovery within segmental duplications
With advances in genome sequencing and analysis technology, it has become evident that structural variations are the main source of alteration in the human genome. Despite their significance for understanding disease susceptibility, no single algorithm can yet find structural variations of all types and sizes at once. Structural variation discovery remains problematic because variants often overlap segmental duplications, nearly identical segments of DNA that appear more than once in the genome. Researchers often exclude these regions, which make up about 5% of the genome, because of the complexity they add to their studies. The few approaches that do work in these regions require a special sequence alignment file in which reads are mapped to multiple locations. Here, we present ParaCoND, which discovers paralog-specific gene copy number within segmental duplications using a sequence alignment file with unique mapping. We use singly unique nucleotides (SUNs), which distinguish paralogs from each other, in the sequence alignment of the duplicated regions. Because our method is based on read depth, it can detect only duplications and deletions. We computed the absolute copy numbers of genes using only the read depth at SUNs. We also computed paralog-specific absolute copy numbers for genes residing in the same segmental duplication.
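As a rough illustration of the read-depth idea, and not ParaCoND's actual implementation (which the abstract does not detail), the Python sketch below estimates one paralog's copy number by scaling the median read depth at its SUN positions against an assumed diploid baseline depth, then sums the paralogs to get a gene total. The function names, the median aggregation, the rounding rule, and the fixed `diploid_depth` baseline are all hypothetical choices made for this example.

```python
from statistics import median


def paralog_copy_number(sun_depths, diploid_depth):
    """Estimate the copy number of one paralog from read depth at its SUNs.

    Hypothetical assumption: reads covering a SUN come only from this
    paralog, so depth scales linearly with its copy number, and a
    depth of `diploid_depth` corresponds to two copies.
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    if not sun_depths:
        raise ValueError("need at least one SUN position")
    # The median damps the effect of a few SUNs with mapping artifacts.
    return 2 * median(sun_depths) / diploid_depth


def gene_copy_numbers(suns_by_paralog, diploid_depth):
    """Return integer copy number per paralog and the gene-level total."""
    per_paralog = {}
    for name, depths in suns_by_paralog.items():
        estimate = paralog_copy_number(depths, diploid_depth)
        # Round half up (estimates are non-negative).
        per_paralog[name] = int(estimate + 0.5)
    return per_paralog, sum(per_paralog.values())


def call_event(copy_number, reference_copy_number=2):
    """Read depth alone only supports duplication/deletion calls."""
    if copy_number > reference_copy_number:
        return "duplication"
    if copy_number < reference_copy_number:
        return "deletion"
    return "normal"


if __name__ == "__main__":
    depths = {
        "paralog_A": [31, 29, 30, 28, 32],
        "paralog_B": [45, 44, 46],
        "paralog_C": [0, 1, 0],
    }
    per_paralog, total = gene_copy_numbers(depths, diploid_depth=30.0)
    for name, cn in per_paralog.items():
        print(name, cn, call_event(cn))
    print("total", total)
```

With a baseline of 30x, these made-up depths give copy numbers of 2, 3 and 0 (a normal copy, a duplication and a deletion) and a gene total of 5. A real tool would also need to estimate the baseline depth and correct for biases such as GC content, which this sketch ignores.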