Efficient community identification and maintenance at multiple resolutions on distributed datastores

buir.contributor.author: Ulusoy, Özgür
dc.citation.epage: 147 [en_US]
dc.citation.spage: 133 [en_US]
dc.citation.volumeNumber: 100 [en_US]
dc.contributor.author: Aksu, H. [en_US]
dc.contributor.author: Canim, M. [en_US]
dc.contributor.author: Chang, Yuan-Chi [en_US]
dc.contributor.author: Korpeoglu, I. [en_US]
dc.contributor.author: Ulusoy, Özgür [en_US]
dc.date.accessioned: 2016-02-08T12:20:07Z
dc.date.available: 2016-02-08T12:20:07Z
dc.date.issued: 2015 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: Network community identification at multiple resolutions is of great practical interest for discovering highly cohesive subnetworks on different subjects in a network. For instance, one might examine the interconnections among web pages, blogs and social content to identify pockets of influencers on subjects like 'Big Data', 'smart phone' or 'global warming'. Because the graph representation and its content change dynamically, the incremental maintenance of a community poses significant computational challenges. Moreover, the intensity of community engagement can be distinguished at multiple levels, resulting in a multi-resolution community representation that has to be maintained over time. In this paper, we first formalize this problem using the k-core metric projected at multiple k-values, so that multiple community resolutions are represented with multiple k-core graphs. Recognizing that large graphs and their even larger attributed content cannot be stored and managed by a single server, we then propose distributed algorithms to construct and maintain a multi-k-core graph, implemented on the scalable Big Data platform Apache HBase. Our experimental evaluation demonstrates orders-of-magnitude speedup from maintaining the multi-k-core graph incrementally rather than reconstructing it completely. Our algorithms thus enable practitioners to create and maintain communities at multiple resolutions on multiple subjects in rich network content simultaneously. [en_US]
dc.identifier.doi: 10.1016/j.datak.2015.06.001 [en_US]
dc.identifier.issn: 0169-023X
dc.identifier.uri: http://hdl.handle.net/11693/28419
dc.language.iso: English [en_US]
dc.publisher: Elsevier BV [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1016/j.datak.2015.06.001 [en_US]
dc.source.title: Data & Knowledge Engineering [en_US]
dc.subject: Big data analytics [en_US]
dc.subject: Distributed databases [en_US]
dc.subject: k-Core [en_US]
dc.subject: Algorithms [en_US]
dc.subject: Global warming [en_US]
dc.subject: Mining [en_US]
dc.subject: Smartphones [en_US]
dc.subject: Social networking [en_US]
dc.subject: Websites [en_US]
dc.subject: Community identification [en_US]
dc.subject: Data analytics [en_US]
dc.subject: HBase [en_US]
dc.subject: Mining methods and algorithms [en_US]
dc.subject: Big data [en_US]
dc.title: Efficient community identification and maintenance at multiple resolutions on distributed datastores [en_US]
dc.type: Article [en_US]
Files
Original bundle
Name: Efficient community identification and maintenance at multiple resolutions on distributed datastores.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format
Description: Full printable version