DoubletDetector: a method to detect doublet cells in open chromatin regions

Date

2020-09

Advisor

Çiçek, A. Ercüment

Publisher

Bilkent University

Language

English

Abstract

Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a simple and effective technique in genomic studies that profiles chromatin accessibility across the genome. The open regions of the genome play an important role in DNA replication and transcription. ATAC-seq has many practical applications, such as nucleosome mapping, identification of regulatory elements, cancer research, and the study of immune system aging. With advances in the underlying technology, the technique is now applied at the single-cell level in the form of single-nucleus ATAC-seq (snATAC-seq). Single-cell resolution extends the reach of ATAC-seq by helping to detect rare cell types that play roles in regulatory networks. Like other single-cell technologies, snATAC-seq suffers from doublets, which occur when multiple cells are captured and sequenced together and which confound downstream analyses. A unique property of snATAC-seq data is that at a given locus in the genome there can be at most two overlapping reads, one from the maternal and the other from the paternal chromosome. When a locus has more than two reads, this can be due to doublets or to alignment/sequencing errors. We propose a count-based method, DoubletDetector, that makes use of this property to detect doublets. It identifies doublets by counting the number of loci within a cell that have more than two ATAC-seq reads. It also finds the types of the cells that formed each doublet, to help further understand their nature. DoubletDetector achieved high recall, near 90%, for detecting simulated doublets in human PBMC and islet snATAC-seq samples. Artificial doublets were then traced back to their cells of origin with near 78% recall using a marker peak-based algorithm. DoubletDetector is the first method to effectively identify both homotypic and heterotypic doublets from snATAC-seq.
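
The counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the DoubletDetector implementation: the input layout (per-cell lists of half-open (chrom, start, end) intervals), the function names, the `min_loci` cutoff, and the toy coordinates are all assumptions made for illustration. The sketch sweeps over read start and end positions, counts maximal regions where more than two reads overlap, and flags cells whose count reaches the hypothetical cutoff.

```python
from collections import defaultdict


def count_overcovered_loci(fragments, max_copies=2):
    """Count maximal regions covered by more than `max_copies` reads.

    `fragments` is an iterable of (chrom, start, end) half-open intervals
    for a single cell. This input layout is an assumption of the sketch.
    """
    events = defaultdict(list)
    for chrom, start, end in fragments:
        events[chrom].append((start, 1))   # a read opens at `start`
        events[chrom].append((end, -1))    # a read closes at `end`

    n_loci = 0
    for chrom_events in events.values():
        # Tuples sort by position, then delta, so a read closing at x is
        # processed before one opening at x (half-open intervals).
        chrom_events.sort()
        depth = 0
        in_region = False
        for _, delta in chrom_events:
            depth += delta
            if depth > max_copies:
                if not in_region:
                    n_loci += 1        # entered a new over-covered region
                    in_region = True
            else:
                in_region = False
    return n_loci


def flag_doublet_candidates(cells, min_loci):
    """Return ids of cells whose over-covered locus count reaches `min_loci`.

    `min_loci` is a hypothetical cutoff, not a value taken from the thesis.
    """
    return [cell_id for cell_id, frags in cells.items()
            if count_overcovered_loci(frags) >= min_loci]


# Toy data with made-up coordinates, for illustration only.
cells = {
    "cell_a": [("chr1", 0, 100), ("chr1", 50, 150)],
    "cell_b": [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 80, 120)],
}
candidates = flag_doublet_candidates(cells, min_loci=1)
```

A fixed cutoff is used here only to keep the example short. Because a locus with more than two reads can also arise from alignment or sequencing errors, a practical cutoff would need to account for that background rate.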
