Browsing by Subject "Distributed databases"


Open Access
Efficient community identification and maintenance at multiple resolutions on distributed datastores
(Elsevier BV, 2015) Aksu, H.; Canim, M.; Chang, Yuan-Chi; Korpeoglu, I.; Ulusoy, Özgür
Network community identification at multiple resolutions is of great practical interest for discovering highly cohesive subnetworks on different subjects within a network. For instance, one might examine the interconnections among web pages, blogs and social content to identify pockets of influencers on subjects like 'Big Data', 'smart phone' or 'global warming'. Because the graph representation and its content change dynamically, the incremental maintenance of a community poses significant computational challenges. Moreover, the intensity of community engagement can be distinguished at multiple levels, resulting in a multi-resolution community representation that has to be maintained over time. In this paper, we first formalize this problem using the k-core metric projected at multiple k-values, so that multiple community resolutions are represented with multiple k-core graphs. Recognizing that large graphs and their even larger attributed content cannot be stored and managed by a single server, we then propose distributed algorithms to construct and maintain a multi-k-core graph, implemented on the scalable Big Data platform Apache HBase. Our experimental evaluation results demonstrate orders-of-magnitude speedup from maintaining the multi-k-core graph incrementally rather than reconstructing it completely. Our algorithms thus enable practitioners to create and maintain communities at multiple resolutions on multiple subjects in rich network content simultaneously.
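To make the k-core idea in the abstract above concrete, here is a minimal single-machine sketch of how one graph can be projected onto several k values. It is not the paper's distributed or incremental algorithm and does not use HBase; it uses the standard peeling procedure, and the function names `core_numbers` and `multi_k_core` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} by repeatedly peeling a minimum-degree vertex.

    Simple O(n^2) illustration; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def multi_k_core(edges, ks):
    """Return {k: set of vertices in the k-core} for each requested k."""
    core = core_numbers(edges)
    return {k: {v for v, c in core.items() if c >= k} for k in ks}
```

Under this sketch, a larger k selects a smaller and more tightly connected vertex set, which is one way to read the "multiple resolutions" in the abstract. Computing the core numbers once and filtering per k avoids rerunning the peeling for every resolution.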
Open Access
Hypergraph based declustering for multi-disk databases
(2000) Koyutürk, Mehmet
In very large distributed database systems, the data is declustered in order to exploit parallelism while processing a query. Declustering refers to allocating the data across multiple disks in such a way that the tuples belonging to a relation are distributed evenly across disks. Many declustering strategies have been proposed in the literature; however, these strategies are domain specific or have deficiencies. We propose a model that exactly fits the problem and show that iterative improvement schemes can capture the detailed, per-relation declustering objective. We provide a two-phase, iterative-improvement-based algorithm and appropriate gain functions for it. The experimental results show that the proposed algorithm provides a significant performance improvement over the state-of-the-art graph-partitioning-based declustering strategy.
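The general idea of iterative improvement with a gain function can be sketched as follows. This is an illustrative toy, not the thesis's hypergraph model or its two-phase algorithm. It assumes that each query is a non-empty set of data items and that a query's parallel response time is the largest number of its items stored on any single disk. All names (`query_cost`, `total_cost`, `improve`) are hypothetical.

```python
from collections import Counter


def query_cost(query, disk_of):
    """Assumed cost model: the most items of this query found on one disk."""
    return max(Counter(disk_of[item] for item in query).values())


def total_cost(queries, disk_of):
    """Sum of per-query costs under the given item-to-disk assignment."""
    return sum(query_cost(q, disk_of) for q in queries)


def improve(items, queries, num_disks, disk_of):
    """Greedily move single items between disks while a move has positive gain."""
    disk_of = dict(disk_of)  # work on a copy
    # Only queries containing an item are affected when that item moves.
    touching = {item: [q for q in queries if item in q] for item in items}

    improved = True
    while improved:
        improved = False
        for item in items:
            here = disk_of[item]
            base = total_cost(touching[item], disk_of)
            best_disk, best_gain = here, 0
            for d in range(num_disks):
                if d == here:
                    continue
                disk_of[item] = d  # tentatively move the item
                gain = base - total_cost(touching[item], disk_of)
                if gain > best_gain:
                    best_disk, best_gain = d, gain
            disk_of[item] = best_disk  # commit the best move, or restore
            if best_gain > 0:
                improved = True
    return disk_of
```

Every committed move strictly lowers an integer-valued total cost, so the loop terminates. Like any greedy local search, it may stop at a local optimum. Starting from a poor assignment, such as all items on one disk, the moves spread each query's items across disks.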