Efficient community identification and maintenance at multiple resolutions on distributed datastores

Aksu, H.; Canim, M.; Chang, Yuan-Chi; Korpeoglu, I.; Ulusoy, Özgür

buir.contributor.author	Ulusoy, Özgür
dc.citation.epage	147	en_US
dc.citation.spage	133	en_US
dc.citation.volumeNumber	100	en_US
dc.contributor.author	Aksu, H.	en_US
dc.contributor.author	Canim, M.	en_US
dc.contributor.author	Chang, Yuan-Chi	en_US
dc.contributor.author	Korpeoglu, I.	en_US
dc.contributor.author	Ulusoy, Özgür	en_US
dc.date.accessioned	2016-02-08T12:20:07Z
dc.date.available	2016-02-08T12:20:07Z
dc.date.issued	2015	en_US
dc.department	Department of Computer Engineering	en_US
dc.description.abstract	Network community identification at multiple resolutions is of great practical interest for discovering highly cohesive subnetworks on different subjects in a network. For instance, one might examine the interconnections among web pages, blogs and social content to identify pockets of influencers on subjects like 'Big Data', 'smart phone' or 'global warming'. Because both the graph representation and its content change dynamically, the incremental maintenance of a community poses significant computational challenges. Moreover, the intensity of community engagement can be distinguished at multiple levels, resulting in a multi-resolution community representation that has to be maintained over time. In this paper, we first formalize this problem using the k-core metric projected at multiple k-values, so that multiple community resolutions are represented with multiple k-core graphs. Recognizing that large graphs and their even larger attributed content cannot be stored and managed by a single server, we then propose distributed algorithms to construct and maintain a multi-k-core graph, implemented on the scalable Big Data platform Apache HBase. Our experimental evaluation demonstrates orders-of-magnitude speedup from maintaining the multi-k-core graph incrementally rather than reconstructing it completely. Our algorithms thus enable practitioners to create and maintain communities at multiple resolutions on multiple subjects in rich network content simultaneously.	en_US
dc.identifier.doi	10.1016/j.datak.2015.06.001	en_US
dc.identifier.issn	0169-023X
dc.identifier.uri	http://hdl.handle.net/11693/28419
dc.language.iso	English	en_US
dc.publisher	Elsevier BV	en_US
dc.relation.isversionof	http://dx.doi.org/10.1016/j.datak.2015.06.001	en_US
dc.source.title	Data & Knowledge Engineering	en_US
dc.subject	Big data analytics	en_US
dc.subject	Distributed databases	en_US
dc.subject	k-Core	en_US
dc.subject	Algorithms	en_US
dc.subject	Global warming	en_US
dc.subject	Mining	en_US
dc.subject	Smartphones	en_US
dc.subject	Social networking	en_US
dc.subject	Websites	en_US
dc.subject	Community identification	en_US
dc.subject	Data analytics	en_US
dc.subject	HBase	en_US
dc.subject	Mining methods and algorithms	en_US
dc.subject	Big data	en_US
dc.title	Efficient community identification and maintenance at multiple resolutions on distributed datastores	en_US
dc.type	Article	en_US
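The abstract describes representing community resolutions as k-core graphs at multiple k-values. As a rough illustration of that idea only, the sketch below computes core numbers on a single machine by repeatedly peeling the lowest-degree vertex, then reads off the vertex set of each requested k-core. It is not the distributed HBase-based algorithm of the paper and does no incremental maintenance; the function names core_numbers and multi_k_core, and the sample graph, are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected graph (illustrative, O(n^2))."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        n = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def multi_k_core(edges, ks):
    """Return {k: set of vertices in the k-core} for each requested k."""
    core = core_numbers(edges)
    return {k: {n for n, c in core.items() if c >= k} for k in ks}


# Hypothetical sample: a 4-clique (a, b, c, d) with a two-vertex tail (e, f).
sample_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"),
]
resolutions = multi_k_core(sample_edges, ks=(1, 2, 3))
```

In this sample, the tail vertices belong only to the 1-core, while the clique survives at every requested resolution, which is the sense in which a higher k selects a more tightly engaged community.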

Files

Original bundle

Name: Efficient community identification and maintenance at multiple resolutions on distributed datastores.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Department of Computer Engineering