Browsing by Author "Zhao, S."
Item Open Access
Detailed modeling of positive selection improves detection of cancer driver genes (Nature Publishing Group, 2019-07)
Zhao, S.; Liu, J.; Nanga, P.; Liu, Y.; Çiçek, A. Ercüment; Knoblauch, N.; He, C.; Stephens, M.; He, X.
Identifying driver genes from somatic mutations is a central problem in cancer biology. Existing methods, however, either lack explicit statistical models, or use models based on simplistic assumptions. Here, we present driverMAPS (Model-based Analysis of Positive Selection), a model-based approach to driver gene identification. This method explicitly models positive selection at the single-base level, as well as highly heterogeneous background mutational processes. In particular, the selection model captures elevated mutation rates in functionally important sites using multiple external annotations, and spatial clustering of mutations. Simulations under realistic evolutionary models demonstrate the increased power of driverMAPS over current approaches. Applying driverMAPS to TCGA data of 20 tumor types, we identified 159 new potential driver genes, including the mRNA methyltransferase METTL3-METTL14. We experimentally validated METTL3 as a tumor suppressor gene in bladder cancer, providing support to the important role mRNA modification plays in tumorigenesis.

Item Open Access
A statistical framework for mapping risk genes from de novo mutations in whole-genome-sequencing studies (Cell Press, 2018)
Liu, Y.; Liang, Y.; Çiçek, A. Ercüment; Li, Z.; Li, J.; Muhle, R. A.; Krenzer, M.; Mei, Y.; Wang, Y.; Knoblauch, N.; Morrison, J.; Zhao, S.; Jiang, Y.; Geller, E.; Ionita-Laza, I.; Wu, J.; Xia, K.; Noonan, J. P.; Sun, Z. S.; He, X.
Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWASs) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is challenging, however, because the functional significance of non-coding mutations is difficult to predict. We propose a statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, to learn from data which annotations are informative of pathogenic mutations, and to combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism-affected family trios across five studies and discovered several autism risk genes. The software is freely available for all research uses.