Browsing by Subject "Greedy algorithms"
Now showing 1 - 8 of 8
Item Open Access
Aggregate profile clustering for streaming analytics (Oxford University Press, 2015)
Abbasoğlu, M. A.; Gedik, B.; Ferhatosmanoğlu, H.

Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models can be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and the high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. To this end, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause.
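The greedy, affinity-based re-partitioning step described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names and a deliberately simple scoring rule, not the paper's actual heuristics: each micro-cluster summary is greedily re-assigned, with a bonus for staying on its current node (to limit migration) and a penalty for heavily loaded nodes (to keep balance).

```python
# Hypothetical sketch of greedy, affinity-based re-partitioning:
# each micro-cluster summary goes to the node with the best score,
# where staying on its current node earns an affinity bonus (fewer
# migrations) and already-loaded nodes are penalized (balance).

def repartition(summaries, nodes, affinity_bonus=0.5):
    """summaries: list of (cluster_id, load_weight, current_node)."""
    load = {n: 0.0 for n in nodes}
    assignment = {}
    # Place the heaviest summaries first, a common greedy ordering.
    for cid, weight, current in sorted(summaries, key=lambda s: -s[1]):
        def score(node):
            bonus = affinity_bonus if node == current else 0.0
            return bonus - load[node]  # prefer light nodes, plus affinity
        best = max(nodes, key=score)
        assignment[cid] = best
        load[best] += weight
    return assignment

summaries = [("c1", 5.0, "n1"), ("c2", 4.0, "n1"), ("c3", 3.0, "n2")]
print(repartition(summaries, ["n1", "n2"]))
# → {'c1': 'n1', 'c2': 'n2', 'c3': 'n2'}
```

In this toy run, c1 keeps its node, while c2 migrates because the load penalty on n1 outweighs its affinity bonus; tuning the bonus trades migration cost against balance.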
We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles.

Item Open Access
Aggregate profile clustering for telco analytics (2013)
Abbasoğlu, M. A.; Gedik, B.; Ferhatosmanoğlu, H.

Many telco analytics require maintaining call profiles based on recent customer call patterns. Such call profiles are typically organized as aggregations computed at different time scales over the recent customer interactions. Customer call profiles are key inputs for analytics targeted at improving the operations, marketing, and sales of telco providers. Many of these analytics require clustering customer call profiles, so that customers with similar calling patterns can be modeled as a group. Example applications include optimizing tariffs, customer segmentation, and usage forecasting. In this demo, we present our system for scalable aggregate profile clustering in a streaming setting. We focus on managing anonymized segments of customers for tariff optimization. Due to the large number of customers, maintaining profile clusters has high processing and memory resource requirements. In order to tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over machines (nodes) such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, to adapt to potentially changing customer calling patterns, the partitioning of profiles to machines should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We provide a re-partitioning technique that achieves all these goals. We keep micro-cluster summaries at each node, collect these summaries at a central node, and use a greedy algorithm with novel affinity heuristics to revise the partitioning.
We present a demo that showcases our Storm- and HBase-based implementation of the proposed solution in the context of a customer segmentation application. © 2013 VLDB Endowment.

Item Open Access
Algorithms for sparsity constrained principal component analysis (2023-07)
Aktaş, Fatih Selim

The classical Principal Component Analysis problem consists of finding a linear transform that reduces the dimensionality of the original dataset while keeping most of the variation. An additional sparsity constraint sets most of the coefficients to zero, which makes interpretation of the linear transform easier. We present two approaches to sparsity constrained Principal Component Analysis. First, we develop computationally cheap heuristics that can be deployed in very high-dimensional problems. Our heuristics are justified with linear algebra approximations and theoretical guarantees. Furthermore, we strengthen our algorithms by exploiting the necessary conditions of the optimization model. Second, we use a non-convex log-sum penalty in the semidefinite space. We show a connection to the cardinality function and develop an algorithm, PCA Sparsified, to solve the problem locally by solving a sequence of convex optimization problems. We analyze the theoretical properties of this algorithm and comment on its numerical implementation. Moreover, we derive a pre-processing method that can be used with the previous approaches. Finally, our numerical experiments show that our greedy algorithms scale easily to high-dimensional problems while being highly competitive with state-of-the-art algorithms on many problems, even beating them uniformly in some cases. Additionally, we illustrate the effectiveness of PCA Sparsified on small-dimensional problems in terms of variance explained.
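To make the greedy flavor of sparsity-constrained PCA concrete, here is a minimal forward-selection sketch. This is an illustration under assumed simplifications, not the thesis's algorithms: the support is grown one variable at a time, each time adding the variable that most increases the variance explained, measured as the leading eigenvalue of the covariance matrix restricted to the support (computed here with a small pure-Python power iteration).

```python
# Hypothetical greedy forward selection for sparsity-constrained PCA:
# grow a support set of size k, each step adding the variable whose
# inclusion maximizes the leading eigenvalue of the restricted covariance.

def leading_eigenvalue(A, iters=500):
    """Largest eigenvalue of a small symmetric PSD matrix via power iteration."""
    n = len(A)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w) or 1.0
        v = [x / lam for x in w]  # renormalize by the max-abs entry
    return lam

def greedy_sparse_pca_support(cov, k):
    """Greedily pick k variables whose restricted covariance has the
    largest leading eigenvalue (the variance explained by one component)."""
    support, remaining = [], list(range(len(cov)))
    for _ in range(k):
        def gain(j):
            idx = support + [j]
            sub = [[cov[a][b] for b in idx] for a in idx]
            return leading_eigenvalue(sub)
        best = max(remaining, key=gain)
        support.append(best)
        remaining.remove(best)
    return sorted(support)

cov = [[4.0, 1.9, 0.1],
       [1.9, 3.0, 0.0],
       [0.1, 0.0, 1.0]]
print(greedy_sparse_pca_support(cov, 2))  # → [0, 1]: the strongly coupled pair
```

Each greedy step here costs one eigenvalue computation per candidate; the cheap heuristics mentioned in the abstract presumably avoid exactly this kind of repeated evaluation in high dimensions.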
Although PCA Sparsified is computationally very demanding, it consistently outperforms the local and greedy approaches.

Item Open Access
Discriminative fine-grained mixing for adaptive compression of data streams (Institute of Electrical and Electronics Engineers, 2014)
Gedik, B.

This paper introduces an adaptive compression algorithm for the transfer of data streams across operators in stream processing systems. The algorithm is adaptive in the sense that it can adjust the amount of compression applied based on bandwidth, CPU, and workload availability. It is discriminative in the sense that it can judiciously apply partial compression by selecting a subset of attributes that can provide a good reduction in the used bandwidth at a low cost. The algorithm relies on the significant differences that exist among stream attributes with respect to their relative sizes, compression ratios, compression costs, and their amenability to the application of custom compressors. As part of this study, we present a model of uniform and discriminative mixing, and provide various greedy algorithms and associated metrics to locate an effective setting when model parameters are available at run-time. Furthermore, we provide online and adaptive algorithms for real-world systems in which the system parameters that can be measured at run-time are limited. We present a detailed experimental study that illustrates the superiority of discriminative mixing over uniform mixing. © 2013 IEEE.

Item Open Access
Expectation maximization based matching pursuit (IEEE, 2012)
Gurbuz, A. C.; Pilanci, M.; Arıkan, Orhan

A novel expectation maximization based matching pursuit (EMMP) algorithm is presented. The method uses the measurements as the incomplete data and obtains the complete data, which corresponds to the sparse solution, using an iterative EM-based framework.
In standard greedy methods such as matching pursuit or orthogonal matching pursuit, a selected atom cannot be changed during the course of the algorithm, even if the signal has no support on that atom. In contrast, the proposed EMMP algorithm is flexible in this sense. The results show that the proposed method has lower reconstruction errors than other greedy algorithms under the same conditions. © 2012 IEEE.

Item Open Access
Learning-based compressive MRI (Institute of Electrical and Electronics Engineers, 2018)
Gözcü, B.; Mahabadi, R. K.; Li, Y. H.; Ilıcak, E.; Çukur, Tolga; Scarlett, J.; Cevher, V.

In the area of magnetic resonance imaging (MRI), an extensive range of non-linear reconstruction algorithms has been proposed that can be used with general Fourier subsampling patterns. However, the design of these subsampling patterns has typically been considered in isolation from the reconstruction rule and the anatomy under consideration. In this paper, we propose a learning-based framework for optimizing MRI subsampling patterns for a specific reconstruction rule and anatomy, considering both the noiseless and noisy settings. Our learning algorithm has access to a representative set of training signals, and searches for a sampling pattern that performs well on average for the signals in this set. We present a novel parameter-free greedy mask selection method and show it to be effective for a variety of reconstruction rules and performance metrics. Moreover, we support our numerical findings with a rigorous justification of our framework via statistical learning theory.

Item Open Access
A tabu search algorithm for sparse placement of wavelength converters in optical networks (Springer, 2004)
Sengezer, N.; Karasan, E.

In this paper, we study the problem of placing a limited number of wavelength converting nodes in a multi-fiber network with static traffic demands and propose a tabu search based heuristic algorithm.
The objective of the algorithm is to match the performance of full wavelength conversion, in terms of minimizing the total number of fibers used in the network, while placing the minimum number of wavelength converting nodes. We also present a greedy algorithm and compare its performance with the tabu search algorithm. Finally, we present numerical results that demonstrate the high correlation between placing a wavelength converting node and the amount of transit traffic passing through that node. © Springer-Verlag 2004.

Item Open Access
TSCP: a tabu search algorithm for wavelength converting node placement in WDM optical networks (IEEE, 2005)
Şengezer, Namık; Karasan, Ezhan

Sparse wavelength conversion can significantly increase the performance of all-optical wavelength division multiplexing (WDM) networks by relaxing the wavelength continuity constraint. In this paper, we study the wavelength converter placement problem in multi-fiber networks with static traffic demands. We present a tabu search based heuristic algorithm. The objective of the algorithm is to satisfy all the traffic demands at the minimum total fiber cost achieved in the full conversion case, by placing the minimum number of wavelength converting nodes. We also implement a greedy algorithm and compare the performances of these converter placement algorithms against the optimum solutions on a sample network. The tabu search based algorithm achieves the optimum solution in 72% of the test cases, and it increases the average number of wavelength converting nodes by less than 10% with respect to the optimum solution. The effect of the utilized routing scheme on the generated solutions, and the correlation between the converter node locations and the amount of traffic passing through the nodes, are also investigated.
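The correlation reported in both converter-placement abstracts, between a node's transit traffic and its value as a converter location, suggests a simple greedy baseline like the one the papers compare against. The sketch below is hypothetical (the papers do not specify their greedy algorithm in these abstracts): repeatedly place a converter at the unconverted node carrying the most transit traffic until the budget is exhausted.

```python
# Hypothetical greedy baseline for wavelength converter placement,
# motivated by the reported correlation between converter usefulness
# and transit traffic: always pick the busiest remaining node.

def greedy_converter_placement(transit_traffic, budget):
    """transit_traffic: {node: transit traffic units}; budget: number of converters."""
    placed = []
    candidates = dict(transit_traffic)
    for _ in range(min(budget, len(candidates))):
        best = max(candidates, key=candidates.get)  # busiest unconverted node
        placed.append(best)
        del candidates[best]
    return placed

traffic = {"A": 120, "B": 45, "C": 300, "D": 80}
print(greedy_converter_placement(traffic, 2))  # → ['C', 'A']
```

A tabu search, by contrast, explores swaps of converter locations and can escape the local optima such a one-shot ranking gets stuck in, which is consistent with the tabu algorithm reaching the optimum far more often in the reported experiments.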