Doubletdetector: a method to detect doublet cells in open chromatin regions
Abstract
Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a simple and effective technique in genomic studies that profiles chromatin accessibility across the genome. The open regions of the genome play an important role in DNA replication and transcription. ATAC-seq has many practical applications, such as nucleosome mapping, identification of regulatory elements, cancer research, and the study of immune system aging. With advances in the underlying technology, the technique is now applied at the single-cell level in the form of single-nucleus ATAC-seq (snATAC-seq). Single-cell resolution extends the reach of ATAC-seq by helping to detect rare cell types that play roles in regulatory networks. Like other single-cell technologies, snATAC-seq suffers from doublets, which occur when multiple cells are captured and sequenced together and which confound downstream analyses. A unique property of snATAC-seq data is that at a given locus in the genome there can be at most two overlapping reads, one from the maternal and the other from the paternal chromosome. When a locus has more than two reads, this can be due to doublets or to alignment or sequencing errors. We propose a count-based method, DoubletDetector, that makes use of this property to detect doublets. It identifies doublets by counting the number of loci within a cell that have more than two ATAC-seq reads. It also identifies the types of the cells that formed each doublet, to help further understand their nature. DoubletDetector achieved high recall, near 90%, for detecting simulated doublets in human PBMC and islet snATAC-seq samples. Artificial doublets were then traced back to their cells of origin with near 78% recall using a marker peak-based algorithm. DoubletDetector is the first method to effectively identify both homotypic and heterotypic doublets from snATAC-seq.
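The counting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the DoubletDetector implementation: the function names, the input layout (per-cell lists of half-open `(chrom, start, end)` fragments), and the `min_loci` cutoff are all assumptions made for illustration, and the abstract does not state how the actual method turns the count into a doublet call.

```python
from collections import defaultdict


def count_overcovered_loci(fragments, max_expected=2):
    """Count regions in one cell covered by more than `max_expected` fragments.

    `fragments` is an iterable of (chrom, start, end) half-open intervals.
    Each maximal stretch with depth > max_expected counts as one locus.
    """
    events = defaultdict(list)
    for chrom, start, end in fragments:
        events[chrom].append((start, 1))   # fragment opens
        events[chrom].append((end, -1))    # fragment closes

    n_loci = 0
    for chrom_events in events.values():
        # Tuples sort by position, then delta, so closes (-1) come before
        # opens (+1) at the same position: touching fragments do not overlap.
        chrom_events.sort()
        depth = 0
        in_region = False
        for _, delta in chrom_events:
            depth += delta
            if depth > max_expected:
                if not in_region:
                    n_loci += 1
                    in_region = True
            else:
                in_region = False
    return n_loci


def call_doublets(cells, min_loci):
    """Flag cells whose over-covered locus count reaches `min_loci`.

    `cells` maps a barcode to its fragment list. `min_loci` is a
    hypothetical cutoff chosen by the caller, not a value from the source.
    """
    return {
        barcode: count_overcovered_loci(frags) >= min_loci
        for barcode, frags in cells.items()
    }


if __name__ == "__main__":
    cells = {
        "A": [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 1000, 1100)],
        "B": [
            ("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 180, 300),
            ("chr2", 10, 50), ("chr2", 20, 60), ("chr2", 30, 70),
        ],
    }
    print(call_doublets(cells, min_loci=2))
```

In this toy input, cell "A" never exceeds a depth of two, while cell "B" has one over-covered region on each chromosome. A fixed cutoff is the simplest possible decision rule; because alignment and sequencing errors also produce loci with more than two reads, a real analysis would need to account for that background when interpreting the counts.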