Browsing by Subject "Sequence alignment"
Now showing 1 - 4 of 4
- Results Per Page
- Sort Options
Item Open Access Demographically-based evaluation of genomic regions under selection in domestic dogs(Public Library of Science, 2016) Freedman, A. H.; Schweizer, R. M.; Vecchyo, D. Ortega-Del; Han, E.; Davis, B. W.; Gronau, I.; Silva, P. M.; Galaverni, M.; Fan, Z.; Marx, P.; Lorente-Galdos, B.; Ramirez, O.; Hormozdiari, F.; Alkan C.; Vilà, C.; Squire K.; Geffen, E.; Kusak, J.; Boyko, A. R.; Parker, H. G.; Lee C.; Tadigotla, V.; Siepel, A.; Bustamante, C. D.; Harkins, T. T.; Nelson, S. F.; Marques Bonet, T.; Ostrander, E. A.; Wayne, R. K.; Novembre, J.Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in the domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rdtop hit, CCRN4L, plays a major role in lipid metabolism, that is supported by additional metabolism related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower-rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. © 2016, Public Library of Science. All Rights Reserved.Item Open Access Diverse sequence search and alignment(2013) Eser, ElifSequence similarity tools, such as BLAST, seek sequences from a database most similar to a query. They return results signi cantly similar to the query sequence that are typically also highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach where the initial results guide the user to new searches. However, diversity has not been considered as an integral component of sequence search tools yet. Repetitions in the result can be avoided by introducing non-redundancy during database construction; however, it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produces non-redundant results optimized for any given query. We de ne diversity measures for sequences, and propose methods to obtain diverse results extracted from current sequence similarity search tools. We propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a similarity query. We evaluate the e ectiveness of the proposed methods in post-processing PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Our experiments show that the proposed methods are able to achieve more diverse yet similar result sets compared to static non-redundancy approaches. In both sequence based and functional diversity evaluation, the proposed diversi cation methods outperform original BLAST results signi cantly. We built an online diverse sequence search tool Div-BLAST that supports queries using BLAST web services. It re-ranks the results diversely according to given parameters.Item Open Access Generalization of predicates with string arguments(2002) Canıtezer, GökerString/sequence generalization is used in many different areas such as machine learning, example-based machine translation and DNA sequence alignment. In this thesis, a method is proposed to find the generalizations of the predicates with string arguments from the given examples. Trying to learn from examples is a very hard problem in machine learning, since finding the global optimal point to stop generalization is a difficult and time consuming process. All the work done until now is about employing a heuristic to find the best solution. This work is one of them. In this study, some restrictions applied by the SLGG (Specific Least General Generalization) algorithm, which is developed to be used in an example-based machine translation system, are relaxed to find the all possible alignments of two strings. Moreover, a Euclidian distance like scoring mechanism is used to find the most specific generalizations. Some of the generated templates are eliminated by four different selection/filtering approaches to get a good solution set. Finally, the result set is presented as a decision list, which provides the handling of exceptional cases.Item Open Access mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications(Oxford University Press, 2014) Hach, F.; Sarrafi, I.; Hormozdiari, F.; Alkan C.; Eichler, E. E.; Sahinalp, S. C.High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to reference genome while discounting the mismatches that occur at common SNP locations provided by db-SNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure and are not simple post-processing steps and thus are performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net. © 2014 The Author(s).