Now that I’ve presented an overview of OLTP databases and OLAP and addressed the problems I found during my research, I’ll list the projects and papers that currently make up my PhD Thesis.
1. A new type of OLAP hypercube that links cells to the underlying OLTP database.
This work was written in C. I used the Network Data Stream as an implementation example. Each Network Data Stream record has the fields: {content, time stamp, destination ip, destination location, destination port, mail bcc, mail cc, mail file name, mail recipient, mail sender, mail subject, protocol, size, source ip, source location, source port}.
This work was reported in the paper “Extending OLAP beyond Aggregates and Summaries”.
2. A new type of OLTP database (Stream Star Schema) that keeps up with high streaming rates of gigabytes per second.
This work was written in C. I used the Network Data Stream as an implementation example. In this case, the Stream Star Schema improved database insertion speed by a factor of 177.
This work was reported in the paper “Efficient Lossless Real-Time Stream Processing”.
3. Stream Star Schema applied to the File Meta Data problem.
This work was written in C. The File Meta Data problem consists of logging changes to the metadata in a file system, such as file updates, file reads, and permission changes. In this case, the Stream Star Schema improved database insertion speed by a factor of 40.
This work was reported in the paper “Enabling OLAP for Security Applications and Forensic Processing”.
4. Stream Star Schema applied to the Swift RESAR problem.
Swift is a powerful and popular Cloud Storage implementation. Swift organizes data as a hierarchy of Cloud Items: {Account, Container, Object}. Each Cloud Item is stored on multiple Storage Pairs: {Server, Device}. Swift does not manage device construction or parity; this critical infrastructure is left to the administrator to manage outside of Swift. In this project, we extend Swift with device management, and we call the result Swift RESAR. This extension makes it far easier for Swift administrators to manage large numbers of cloud devices. In this case, the Stream Star Schema improved database insertion speed by a factor of more than 1,000.
I am finishing the paper on this work, “Building a High Performance Key Store for Petascale Device Management in RESAR”.
In the Thesis Queue
In addition, the following projects are in the thesis queue:
1. Create a Swift cloud with lots of devices.
To prove the Swift RESAR concept, the team has decided to construct a Swift Cloud with tens of thousands of devices, built from cloud virtual machines (VMs). In its simplest form, Swift consists of a single Proxy VM and multiple Storage VMs. A Swift Proxy VM services Swift API requests, and a Swift Storage VM services storage devices. The challenge, then, is to automate as much as possible the creation of a Swift Cloud with thousands of VMs and devices. To that end, I have already written Python scripts that install Swift on Proxy and Storage VMs. The next step is to automate VM creation itself. One possible approach is to build a single Swift Storage VM and then replicate it. We shall see.
2. Apply the Stream Star Schema to a Shingle Disk application.
The Shingle Disk is a current project at the Storage Systems Research Center (SSRC) at the University of California, Santa Cruz. The following description is an excerpt from their website:
Any further dramatic improvements to the capacity of hard disk drives is bound to come with some major changes to the currently employed techniques, as they are reaching their limitations imposed by the laws of physics. Of the new technologies being explored, only Shingled writing promises a high areal density increase with minimal changes to the manufacturing process.
Shingled writing overlaps the currently written track with the previous track, leaving only a relatively small strip of the previous write track untouched. Hence a random write in a Shingled Write Disk (SWD) will destroy the data on adjacent tracks. This forces the SWD to be a largely sequential write device with unrestricted random reads.
Although it is possible to treat a SWD like a virtual tape with random reads, exploring its potential to replace Hard Disk Drives in their traditional roles is more desirable. We are exploring the design issues in a shingled write disk system and are looking at solutions for problems ranging from data layout management to system software changes.
It turns out that some Shingle Disk applications need to store large amounts of streaming data, so they could benefit from the Stream Star Schema. We shall see.
PhD Thesis Complete Yet Research Continues
This completes my PhD Thesis presentation. I have given an overview of OLTP databases and OLAP, addressed the problems with them that I found in my research, and listed the projects and papers that currently make up my PhD Thesis. I will keep you up to date on any new research.