OLTP Database & OLAP: PhD Thesis Overview Pt 2

In my last blog post, I presented an overview of OLTP and OLAP. I would now like to address two problems with OLTP database and OLAP that I have found in my research.

2 Problems with OLTP Database & OLAP

Problem 1: The OLTP database is not linked to the OLAP hypercube. The OLAP hypercube is unable to provide query details (by design). In general, this is ok for queries that are focused on trend analysis. But what about anomaly detection, outliers, forensics, or security? In these cases, a count of 1 is significant and the desire is to obtain quick details. This is called hypercube drill down. My solution to this problem is to link the OLAP hypercube cells to the OLTP database. The result is that each hypercube cell points to a linked list of data values. This does add additional processing to hypercube construction, but this additional processing is performed in constant time and thus does not significantly effect hypercube construction performance.

Problem 2: OLTP database construction is too slow. OLTP and OLAP are historically performed on Data Warehouses. That is, data at rest. This typically consists of End of Month processing. But the growth of the internet has resulted in constant, high speed input rates for OLTP. Examples would be Ethernet Traffic, RFID, Vehicular Traffic Analysis, and Web Site Traffic. The result is often times a consistent input rate of gigabits per second. OLTP databases cannot not keep up with such traffic. So I created a new kind of database (Stream Star Schema) that is designed to keep up with high speed traffic and provide OLTP details. Traditional OLTP databases are implemented as Star Schemas, that are optimized for disk space compression. The downside is that database insertion time is poor. My new kind of database (the Stream Star Schema) is optimized for insertion performance. The downside is that disk space usage is high. But this is ok since disks are so cheap now days.

Star Schema & Strings

In a Star Schema, all strings are stored in sorted tables to minimize the total number of strings. The cost is string table lookup which is log2(n).

In a Stream Star Schema, all strings are stored in a single global table that is NOT sorted.
Thus the global string table can contain duplicates. The cost of this technique is thus decreased disk space efficiency. But the benefit is that string table insertion is a constant.
Just add the string to the end of the table.

The end result is that Stream Star Schema insertion performance is constant as a function of database size. Correspondingly, Star Schema insertion performance is log2(n) and is thus a function of database size. Star Schema insertion performance thus decreases as the database size increases.

So I have now presented an overview of OLTP database and OLAP. I have also addressed problems with OLTP and OLAP that I have found in my research. My next blog post will present a list of projects and papers that currently comprise my PhD Thesis.