Doubletdetector: a method to detect doublet cells in open chromatin regions
Author(s)
Advisor
Çiçek, A. Ercüment
Date
2020-09
Publisher
Bilkent University
Language
English
Type
Thesis
Item Usage Stats
241 views
94 downloads
Abstract
Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is
a simple and effective technique in genomic studies that shows the chromatin
accessibility of the genome. The open regions of the genome play an important
role in DNA replication and transcription. ATAC-seq has many practical applications,
such as nucleosome mapping, identifying regulatory elements, and studies of cancer
and immune system aging. With advances in the underlying technology, this
technique is now applied at the single-cell level in the form of single-nucleus ATAC-seq
(snATAC-seq). Single-cell resolution extends the reach of ATAC-seq by helping
to detect rare cell types that play roles in regulatory networks.
Like other single-cell technologies, snATAC-seq suffers from
the existence of doublet cells, which occur when multiple cells are simultaneously
captured and sequenced, confounding downstream analyses. A unique property of
snATAC-seq data is that at a given locus in the genome there can be at
most two overlapping reads, one from the maternal and the other from the paternal
chromosome. When a locus has more than two reads, this can be due to doublets
or to alignment/sequencing errors. We propose a count-based method,
DoubletDetector, that makes use of this property to detect doublets. It identifies doublets
by counting the number of loci within a cell that have more than two ATAC-seq
reads. It also finds the types of the cells that formed the doublets, to further
help understand their nature. DoubletDetector achieved high recall (near 90%) for
detecting simulated doublets in human PBMC and islet snATAC-seq samples.
Artificial doublets were then traced back to their cells of origin with near 78%
recall using a marker peak-based algorithm. DoubletDetector is the first method to
effectively identify both homotypic and heterotypic doublets from snATAC-seq.
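The counting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the `(barcode, chrom, start, end)` input layout, the definition of a locus as a maximal region of excess coverage, and the `min_loci` cutoff are all assumptions made for this example; the abstract does not specify how the counts are turned into doublet calls.

```python
from collections import defaultdict


def count_overcovered_loci(fragments, max_depth=2):
    """Count, per cell barcode, the loci covered by more than max_depth reads.

    fragments: iterable of (barcode, chrom, start, end), half-open [start, end).
    A "locus" here is a maximal contiguous region whose read depth exceeds
    max_depth (an assumption made for this sketch).
    """
    # Collect coverage change events for each (barcode, chromosome) pair.
    events = defaultdict(list)
    for barcode, chrom, start, end in fragments:
        events[(barcode, chrom)].append((start, 1))   # a read begins
        events[(barcode, chrom)].append((end, -1))    # a read ends

    # Every barcode seen gets an entry, even if its count stays at zero.
    counts = {barcode: 0 for barcode, _chrom in events}

    for (barcode, _chrom), evs in events.items():
        depth = 0
        in_region = False
        # Sorting places -1 before +1 at equal positions, so reads that
        # merely touch end-to-start are not treated as overlapping.
        for _pos, delta in sorted(evs):
            depth += delta
            if depth > max_depth and not in_region:
                counts[barcode] += 1
                in_region = True
            elif depth <= max_depth:
                in_region = False
    return counts


def flag_candidate_doublets(counts, min_loci):
    """Return barcodes with at least min_loci over-covered loci.

    min_loci is a hypothetical cutoff chosen by the user of this sketch.
    """
    return sorted(b for b, n in counts.items() if n >= min_loci)


if __name__ == "__main__":
    example = [
        ("cellA", "chr1", 100, 200),
        ("cellA", "chr1", 150, 250),
        ("cellA", "chr1", 180, 300),
        ("cellB", "chr1", 100, 200),
        ("cellB", "chr1", 200, 300),
    ]
    print(count_overcovered_loci(example))
```

Because alignment and sequencing errors can also produce loci with more than two reads, a single over-covered locus is weak evidence; any cutoff such as `min_loci` would need to be calibrated on real data.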