Bias correction in finding copy number variation with using read depth-based methods in exome sequencing data
Item Usage Stats
MetadataShow full item record
Medical research has striven for identifying the causes of disorders with the ultimate goal of establishing therapeutic treatments and finding cures since its early years. This aim is now becoming a reality thanks to recent developments in whole genome (WGS) and whole exome sequencing (WES). Despite the decrease in the cost of sequencing, WGS is still a very costly approach because of the need to evaluate large number of populations for more concise results. Therefore, sequencing only the protein coding regions (WES) is a more cost effective alternative. With the help of WES approach, most of the functionally important variants can be detected. Additionally, single nucleotide polymorphisms (SNPs) that are located within coding regions are the most common causes for Mendelian diseases (i.e. diseases caused by a single mutation). Moreover, WES approaches require less analysis effort compared to whole genome sequencing approaches since only 1% of whole genome is sequenced. Besides the advantages, there are also some shortcomings that need to be addressed such as biases in GC−content and probe efficiency. Although there are some previous studies on correcting GC−content related issues, there are no studies on correcting probe efficiency effect. In this thesis, we provide a formal study on the effects of both GC−content and probe efficiency on the distribution of read depth in exome sequencing data. The correction of probe efficiency will make it possible to develop new CNV discovery methods using exome sequencing data.
KeywordsCopy number variations