Scalable layout of large graphs on disk
Author
Yaşar, Abdurrahman
Advisor
Güdükbay, Uğur
Date
2015
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
We are witnessing enormous growth in social networks and in the volume of data
they generate. As a consequence, processing this massive amount of data has
become a major challenge. A significant portion of this data takes the form of
graphs. In recent years, several graph processing and management systems have
emerged to handle large-scale graphs. The primary goal of these systems is to
run graph algorithms efficiently and scalably. Unlike relational data, graphs
are semi-structured in nature. Thus, storing and accessing graph data on
secondary storage requires new solutions that can provide locality of access for
graph processing workloads. In this work, we propose a novel, scalable disk
layout technique for graphs that aims to reduce the I/O cost of disk-based graph
processing algorithms. To achieve this goal, we design a scalable
Map/Reduce-style method called ICBP, which divides the graph into a series of
disk blocks that contain sub-graphs with high locality. Furthermore, ICBP can
order the resulting blocks on disk to reduce non-local accesses further. We
experimentally evaluate ICBP's scalability, its layout quality, and the
effectiveness of its automatic parameter tuning. We also deploy the graph
layouts generated by ICBP in the Neo4j [1] graph database management system.
Our experimental results show that the default layout leads to running times
1.5 to 2.5 times higher than those obtained with ICBP's layouts.
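The abstract names two goals for a disk layout: grouping vertices into blocks with high locality, and ordering those blocks to reduce non-local accesses. The sketch below illustrates only how those two goals can be measured on a toy graph; it is not ICBP, whose internals the abstract does not describe. The cost model (a fixed number of vertices per block, a block boundary crossing as a proxy for an extra I/O) and all function names are hypothetical assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def block_of(layout: List[int], block_size: int) -> Dict[int, int]:
    """Map each vertex to the index of the disk block that holds it, assuming
    (hypothetically) vertices are written in `layout` order, `block_size` per block."""
    return {v: pos // block_size for pos, v in enumerate(layout)}


def non_local_edges(edges: List[Edge], layout: List[int], block_size: int) -> int:
    """Count edges whose endpoints sit in different blocks. Following such an
    edge during a traversal may force an extra block read."""
    blk = block_of(layout, block_size)
    return sum(1 for u, v in edges if blk[u] != blk[v])


def block_distance(edges: List[Edge], layout: List[int], block_size: int) -> int:
    """Sum of |block(u) - block(v)| over all edges: a rough proxy for how far
    apart on disk neighboring vertices are. Reordering whole blocks leaves
    non_local_edges unchanged but can change this value."""
    blk = block_of(layout, block_size)
    return sum(abs(blk[u] - blk[v]) for u, v in edges)


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

    # Partitioning: keeping each triangle in one block cuts only one edge.
    print(non_local_edges(edges, [0, 1, 2, 3, 4, 5], 3))  # 1
    print(non_local_edges(edges, [0, 3, 1, 4, 2, 5], 3))  # 5

    # Ordering: the same three blocks {0,1}, {2,3}, {4,5}, placed in two orders.
    print(block_distance(edges, [0, 1, 2, 3, 4, 5], 2))  # 4
    print(block_distance(edges, [0, 1, 4, 5, 2, 3], 2))  # 6
```

In this toy model, the first pair of layouts contrasts a locality-preserving partition with an interleaved one. The second pair keeps the block contents fixed, and therefore the count of cut edges, while changing only the block order, which is the kind of improvement a block-ordering step targets.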