The RESAR OpenStack Swift Project Team
I would like to describe the latest project that I am working on for my PhD thesis program. This project involves a number of participants, so I will first provide short biographies.
First, my thesis adviser at Santa Clara University is Dr. Ahmed Amer. Dr. Amer received his Doctorate in Computer Science from the University of California, Santa Cruz. He is now an Associate Professor at both UC Santa Cruz and Santa Clara University.
Ignacio Corderi is one of Dr. Amer’s PhD students at UC Santa Cruz. Recently Ignacio was the lead author of a rather interesting paper: “RESAR Storage: a System for Two-Failure Tolerant, Self-Adjusting Million Disk Storage Clusters.” The additional authors of this paper are Dr. Darrell D. E. Long (University of California, Santa Cruz), Dr. Thomas M. Kroeger (Sandia National Laboratories), and Dr. Thomas Schwarz (Universidad Católica del Uruguay).
Dr. Schwarz is a professor in the Computer Engineering Department at Santa Clara University. He is currently on loan to the Universidad Católica del Uruguay. He helped sponsor Ignacio as an exchange graduate student from the Universidad Católica del Uruguay to UC Santa Cruz.
RESAR Swift Project Abstract
So enough of the biographies and back to Ignacio’s paper. The following is the abstract:
The demand for large-scale storage is greater than ever. The wide availability of broadband networking has made cloud-based storage a vibrant and growing market. Additionally, as we explore exascale high performance computing (HPC) systems with exabytes of data, power considerations become a significant factor. Most existing systems rely on replication to protect user data, maintaining as many as six copies. This high overhead leads to unnecessary costs in equipment, maintenance and energy. While storage appliances using erasure coding schemes are available, their long rebuild times and lack of continuity of service during rebuild make them unsuitable as building blocks for large-scale storage systems.
We present RESAR (Robust, Efficient, Scalable, Autonomous, Reliable) storage, a reliable distributed storage volume provider that scales to millions of drives. We implemented our system and tested it on a large-scale emulation platform called Megatux. Our results show that RESAR is capable of scaling to millions of drives, and its rebuild performance benefits from this scale by distributing the recovery across many disks. In our emulations, the work of rebuilding a one terabyte hard drive was distributed across 400 disks and completed in less than four minutes with no interruption of service. With an annual durability of 99.999999% and a storage overhead cost of 20%, RESAR has great promise for both exascale HPC and cloud storage.
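To put the abstract's numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch under my own assumptions: that "one terabyte" means 10^12 bytes, that the rebuild work is split evenly across the 400 disks, and that "storage overhead" means extra raw storage relative to user data. The function names are mine and do not come from the paper.

```python
# Figures quoted in the abstract.
DRIVE_BYTES = 10**12          # one terabyte (decimal TB is an assumption)
REBUILD_DISKS = 400           # disks sharing the rebuild work
REBUILD_SECONDS = 4 * 60      # upper bound: "less than four minutes"

# Assumption: the work is spread evenly over the participating disks.
per_disk_bytes = DRIVE_BYTES / REBUILD_DISKS

# Because four minutes is an upper bound on time, these rates are lower bounds.
per_disk_rate = per_disk_bytes / REBUILD_SECONDS    # bytes per second per disk
aggregate_rate = DRIVE_BYTES / REBUILD_SECONDS      # bytes per second overall


def overhead_from_copies(copies):
    """Overhead of n-way replication as a fraction of the user data."""
    return copies - 1


def raw_capacity(user_data, overhead):
    """Raw storage needed for `user_data` at a given overhead fraction."""
    return user_data * (1 + overhead)


print(f"share per disk:      {per_disk_bytes / 1e9:.2f} GB")
print(f"rate per disk:    >= {per_disk_rate / 1e6:.1f} MB/s")
print(f"aggregate rate:   >= {aggregate_rate / 1e9:.2f} GB/s")
print(f"six-copy overhead:   {overhead_from_copies(6):.0%}")
print(f"raw PB per user PB:  {raw_capacity(1.0, 0.20):.1f} (RESAR) "
      f"vs {raw_capacity(1.0, overhead_from_copies(6)):.1f} (six copies)")
```

Under these assumptions each disk only has to handle a small share of the rebuild at a modest rate, which is why spreading the recovery across many disks keeps the rebuild short.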
Robust, Efficient, Scalable, Autonomous, Reliable Storage
Ignacio named his research RESAR, which stands for Robust, Efficient, Scalable, Autonomous, and Reliable storage. As mentioned in the abstract, RESAR is a natural fit for Cloud Storage. And here is where I come into the picture. My job is to integrate RESAR into an existing cloud storage infrastructure. Specifically, I will map RESAR into OpenStack Swift.
OpenStack is an open source collaboration between Rackspace and NASA. NASA created the Cloud Computing portion of the project and called it Nova. Rackspace developed the Cloud Storage product and called it Swift.
The Swift protocol consists of a Hierarchy of Items: {Account, Container, Object}. Each Item is stored on multiple Storage Pairs {Server, Device}. The mapping information is stored in Rings, which are actually Consistent Hashes.
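To make the Ring idea concrete, here is a minimal sketch of a generic consistent-hash ring in Python. It is a textbook-style illustration, not Swift's actual ring code. The class name `ToyRing`, the `vnodes` parameter, and the server, device, and account names are all hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Illustrative consistent-hash ring; NOT Swift's real implementation."""

    def __init__(self, pairs, vnodes=64):
        # pairs: iterable of (server, device) tuples, i.e. Storage Pairs.
        # Each pair is placed at `vnodes` pseudo-random points on the ring.
        self._points = sorted(
            (_hash(f"{server}/{device}#{i}"), (server, device))
            for server, device in pairs
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def lookup(self, item_path, replicas=3):
        """Return up to `replicas` distinct Storage Pairs for an Item path."""
        start = bisect.bisect_right(self._keys, _hash(item_path))
        chosen = []
        # Walk clockwise from the Item's position, collecting distinct pairs.
        for offset in range(len(self._points)):
            pair = self._points[(start + offset) % len(self._points)][1]
            if pair not in chosen:
                chosen.append(pair)
                if len(chosen) == replicas:
                    break
        return chosen


# Hypothetical cluster: 4 servers with 2 devices each.
demo_pairs = [(f"server{s}", f"sdb{d}") for s in range(4) for d in range(2)]
demo_ring = ToyRing(demo_pairs)
print(demo_ring.lookup("/AUTH_demo/photos/cat.jpg"))
```

The useful property is that removing a Storage Pair only remaps the Items that were assigned to it; every other Item keeps its placement.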
In a Storage Pair, each Device is actually a Logical Unit Number (LUN), and OpenStack Swift does not concern itself with LUN Parity. It is up to the Swift Administrator to manage LUN Parity on each device host. This manual approach is extremely inefficient and does not scale to millions of devices. RESAR will therefore be used to extend Swift so that it can directly manage not just Storage Items, but also Storage Devices.
Since Swift is implemented in Python, Swift RESAR will also be implemented in Python.
My Piece of the Project
So my piece of this project will consist of:
1) IMPLEMENT RESAR USING MYSQL.
MySQL will provide disk persistence. RESAR data structures will thus be MySQL Tables. Creation time will be measured for 1 million devices. The strengths of MySQL are that it is an established package and easy to install. The weakness of MySQL is its performance. It is after all a traditional relational database.
2) IMPLEMENT RESAR IN A MEMORY CACHE.
The cache will be populated from MySQL. The time to populate it will be measured for 1 million devices. Data manipulations in the memory cache should be much faster than in MySQL, since they avoid both disk access and SQL processing.
3) PORT MY STREAM STAR SCHEMA TO PYTHON.
I originally did my research in C. Python will make it more portable. The Stream Star Schema has much better performance than MySQL.
4) INTEGRATE RESAR WITH THE STREAM STAR SCHEMA.
Populate the RESAR memory cache from the Stream Star Schema. This function will be measured for 1 million devices.
5) IMPORT RESAR LUNS INTO SWIFT.
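Steps 1 and 2 above can be sketched as a small timing harness. This is only an illustration under loud assumptions: Python's built-in `sqlite3` module stands in for MySQL so the example runs without a database server, the `devices` table and its columns are hypothetical (this post does not describe the real RESAR data structures), and the demo uses 10,000 devices rather than 1 million to keep it quick.

```python
import sqlite3
import time


def build_device_table(conn, n_devices):
    """Create and fill a hypothetical `devices` table; return elapsed seconds."""
    start = time.perf_counter()
    conn.execute(
        "CREATE TABLE devices ("
        "device_id INTEGER PRIMARY KEY, "
        "server TEXT NOT NULL, "
        "lun TEXT NOT NULL)"
    )
    # Made-up layout: 100 LUNs per server.
    rows = (
        (i, f"server{i // 100}", f"lun{i % 100}") for i in range(n_devices)
    )
    conn.executemany("INSERT INTO devices VALUES (?, ?, ?)", rows)
    conn.commit()
    return time.perf_counter() - start


def load_cache(conn):
    """Populate an in-memory dict from the table; return (cache, seconds)."""
    start = time.perf_counter()
    cache = {
        device_id: (server, lun)
        for device_id, server, lun in conn.execute(
            "SELECT device_id, server, lun FROM devices"
        )
    }
    return cache, time.perf_counter() - start


# Demo run. ":memory:" keeps the example self-contained; a file path (or a
# real MySQL connection) would be needed to measure disk persistence.
demo_conn = sqlite3.connect(":memory:")
create_seconds = build_device_table(demo_conn, 10_000)
demo_cache, load_seconds = load_cache(demo_conn)
print(f"created {len(demo_cache)} devices in {create_seconds:.3f} s")
print(f"loaded cache in {load_seconds:.3f} s")
demo_conn.close()
```

Raising the device count to 1,000,000 and pointing the connection at a real database would give the kind of measurements described in steps 1 and 2.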
Stay tuned, as I will post frequent project status updates.