Browsing by Author "Ferhatosmanoglu, H."

Now showing 1 - 7 of 7

Open Access
Diverse relevance feedback for time series with autoencoder based summarizations
(IEEE Computer Society, 2018) Eravci, B.; Ferhatosmanoglu, H.
We present a relevance feedback based browsing methodology using different representations for time series data. The outperforming representation type, e.g., among dual-tree complex wavelet transformation, Fourier, symbolic aggregate approximation (SAX), is learned based on user annotations of the presented query results with representation feedback. We present the use of autoencoder type neural networks to summarize time series or its representations into sparse vectors, which serves as another representation learned from the data. Experiments on 85 real data sets confirm that diversity in the result set increases precision, representation feedback incorporates item diversity and helps to identify the appropriate representation. The results also illustrate that the autoencoders can enhance the base representations, and achieve comparably accurate results with reduced data sizes.
Open Access
Fair task allocation in crowdsourced delivery
(Institute of Electrical and Electronics Engineers, 2018) Basik, F.; Gedik, B.; Ferhatosmanoglu, H.; Wu, K.
Faster and more cost-efficient, crowdsourced delivery is needed to meet the growing customer demands of many industries. In this work, we introduce a new crowdsourced delivery platform that takes fairness towards workers into consideration, while maximizing the task completion ratio. Since redundant assignments are not possible in delivery tasks, we first introduce a 2-phase assignment model that increases the reliability of a worker to complete a given task. To realize the effectiveness of our model in practice, we present both offline and online versions of our proposed algorithm called F-Aware. Given a task-to-worker bipartite graph, F-Aware assigns each task to a worker that maximizes fairness, while allocating tasks to use worker capacities as much as possible. We present an evaluation of our algorithms with respect to running time, task completion ratio, as well as fairness and assignment ratio. Experiments show that F-Aware runs around $10^7\times$ faster than the TAR-optimal solution and assigns 96.9% of the tasks that can be assigned by it. Moreover, it is shown that, F-Aware is able to provide a much fair distribution of tasks to workers than the best competitor algorithm. IEEE
Open Access
Lineking: coffee shop wait-time monitoring using smartphones
(Institute of Electrical and Electronics Engineers, 2015) Bulut, M. F.; Demirbas, M.; Ferhatosmanoglu, H.
This article describes LineKing, a crowdsensing system for monitoring and forecasting coffee shop line wait times. LineKing consists of a smartphone component that provides automatic and accurate wait-time detection, and a cloud backend that uses the collected data to provide accurate wait-time estimation. LineKing is used on a daily basis by hundreds of users to monitor the wait-times of a coffee shop in the University at Buffalo, SUNY. The novel wait-time estimation algorithms of LineKing deployed at the cloud backend provide median absolute errors of less than 3 minutes.
Open Access
Scaling forecasting algorithms using clustered modeling
(Association for Computing Machinery, 2015) Gür, İ.; Güvercin, M.; Ferhatosmanoglu, H.
Research on forecasting has traditionally focused on building more accurate statistical models for a given time series. The models are mostly applied to limited data due to efficiency and scalability problems. However, many enterprise applications require scalable forecasting on large number of data series. For example, telecommunication companies need to forecast each of their customers’ traffic load to understand their usage behavior and to tailor targeted campaigns. Forecasting models are typically applied on aggregate data to estimate the total traffic volume for revenue estimation and resource planning. However, they cannot be easily applied to each user individually as building accurate models for large number of users would be time consuming. The problem is exacerbated when the forecasting process is continuous and the models need to be updated periodically. This paper addresses the problem of building and updating forecasting models continuously for multiple data series. We propose dynamic clustered modeling for forecasting by utilizing representative models as an analogy to cluster centers. We apply the models to each individual series through iterative nonlinear optimization. We develop two approaches: The Integrated Clustered Modeling integrates clustering and modeling simultaneously, and the Sequential Clustered Modeling applies them sequentially. Our findings indicate that modeling an individual’s behavior using its segment can be more scalable and accurate than the individual model itself. The grouped models avoid overfits and capture common motifs even on noisy data. Experimental results from a telco CRM application show the method is efficient and scalable, and also more accurate than having separate individual models.
Open Access
Smolign: a spatial motifs-based protein multiple structural alignment method
(Institute of Electrical and Electronics Engineers, 2012) Sun, H.; Sacan, A.; Ferhatosmanoglu, H.; Wang Y.
Availability of an effective tool for protein multiple structural alignment (MSTA) is essential for discovery and analysis of biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on: building a contactwindow based motif library from the protein structural data, discovery and extension of common alignment seeds from this library, and optimal superimposition of multiple structures according to these alignment seeds by an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to catch alignments globally, and to support flexible alignments, endorse a sensitive and robust automated algorithm that can expose similarities among protein structures even under low similarity conditions. Our method yields better alignment results compared to other popular MSTA methods, on several protein structure data sets that span various structural folds and represent different protein similarity levels. A web-based alignment tool, a downloadable executable, and detailed alignment results for the data sets used here are available at http://sacan.biomed. drexel.edu/Smolign and http://bio.cse.ohio-state.edu/Smolign.
Open Access
Temporal workload-aware replicated partitioning for social networks
(Institute of Electrical and Electronics Engineers, 2014-11) Turk, A.; Selvitopi, R. O.; Ferhatosmanoglu, H.; Aykanat, Cevdet
Most frequent and expensive queries in social networks involve multi-user operations such as requesting the latest tweets or news-feeds of friends. The performance of such queries are heavily dependent on the data partitioning and replication methodologies adopted by the underlying systems. Existing solutions for data distribution in these systems involve hash- or graph-based approaches that ignore the multi-way relations among data. In this work, we propose a novel data partitioning and selective replication method that utilizes the temporal information in prior workloads to predict future query patterns. Our method utilizes the social network structure and the temporality of the interactions among its users to construct a hypergraph that correctly models multi-user operations. It then performs simultaneous partitioning and replication of this hypergraph to reduce the query span while respecting load balance and I/O load constraints under replication. To test our model, we enhance the Cassandra NoSQL system to support selective replication and we implement a social network application (a Twitter clone) utilizing our enhanced Cassandra. We conduct experiments on a cloud computing environment (Amazon EC2) to test the developed systems. Comparison of the proposed method with hash- and enhanced graph-based schemes indicate that it significantly improves latency and throughput.
Open Access
λ-diverse nearest neighbors browsing for multidimensional data
(Institute of Electrical and Electronics Engineers, 2013-03) Kucuktunc, O.; Ferhatosmanoglu, H.
Traditional search methods try to obtain the most relevant information and rank it according to the degree of similarity to the queries. Diversity in query results is also preferred by a variety of applications since results very similar to each other cannot capture all aspects of the queried topic. In this paper, we focus on the -diverse k-nearest neighbor search problem on spatial and multidimensional data. Unlike the approach of diversifying query results in a postprocessing step, we naturally obtain diverse results with the proposed geometric and index-based methods. We first make an analogy with the concept of Natural Neighbors (NatN) and propose a natural neighbor-based method for 2D and 3D data and an incremental browsing algorithm based on Gabriel graphs for higher dimensional spaces. We then introduce a diverse browsing method based on the distance browsing feature of spatial index structures, such as R-trees. The algorithm maintains a Priority Queue with mindivdist of the objects depending on both relevancy and angular diversity and efficiently prunes nondiverse items and nodes. We experiment with a number of spatial and high-dimensional data sets, including Factual’s (http://www.factual.com/) US points-of-interest data set of 13M entries. On the experimental setup, the diverse browsing method is shown to be more efficient (regarding disk accesses) than k-NN search on R-trees, and more effective (regarding Maximal Marginal Relevance (MMR)) than the diverse nearest neighbor search techniques found in the literature.