PetaShare: A reliable, efficient and transparent distributed storage management system

Date
2011
Authors
Kosar, T.
Akturk, I.
Balman, M.
Wang, X.
Source Title
Scientific Programming
Print ISSN
1058-9244
Volume
19
Issue
1
Pages
27 - 43
Language
English
Type
Article
Abstract

Modern collaborative science has placed an increasing burden on data management infrastructure to handle the increasingly large data archives generated. Besides functionality, reliability and availability are also key factors in delivering a data management system that can efficiently and effectively meet the challenges posed and compounded by the unbounded increase in the size of data generated by scientific applications. We have developed a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides light-weight clients that enable easy, transparent and scalable access. In PetaShare, we have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability, and an advanced buffering system for improved data transfer performance. In this paper, we present the details of our design and implementation, show performance results, and describe our experience in developing a reliable and efficient distributed data management system for data-intensive science. © 2011 - IOS Press and the authors. All rights reserved.

Keywords
advanced buffer, asynchronous replication, data-intensive science, Distributed data storage, metadata management, performance, PetaShare, reliability, Distributed data, Buffer storage, Data storage equipment, Data transfer, Design, Distributed computer systems, Distributed database systems, Metadata, Information management