A PhD thesis is not a single paper or project but rather a body of work. It is a progression of projects, papers, and research that, hopefully, falls under the umbrella of a single unifying theme or subject.
With that said, the subject of my thesis is “Streaming OLAP”, where OLAP is OnLine Analytical Processing. OLAP stands in contrast to OLTP, which is OnLine Transaction Processing. So how do OLTP, OLAP, and Streaming OLAP relate to each other?
OLTP can be thought of as traditional processing performed on relational databases. Back in the 1970s, these databases typically consisted of megabytes of data. Queries against them were highly detailed and required fast response times.
The problem with OLTP is that modern databases commonly consist of terabytes of data. Some databases contain petabytes of data, and there is talk of databases that use exabytes. OLTP performance falls apart under these conditions.
Enter OLAP. OLAP is designed to deal with large amounts of data: terabytes, petabytes, and exabytes. It does this by changing the query paradigm. Instead of focusing on highly detailed queries, OLAP focuses on less detailed queries that group and summarize the data. The best way to explain this is via an example. So here goes.
Assume that we have a database table that is used to track items that are sold across multiple stores. We will call this table the “sales” table. It has the following attributes:
*) store – the name of the store.
*) item – the name of the item sold.
*) price – the sales price of the item.
*) count – the number of items sold of this type and price at this store.
So a basic OLTP query for this table would be:
“SELECT store, item, price, count FROM sales ORDER BY store”.
The corresponding OLAP query is:
“SELECT store, SUM(count) FROM sales GROUP BY store”.
So the OLTP query wants complete details on each item sold, with the rows for each store listed together. The OLAP query wants only the number of items sold for each store.
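To make the contrast concrete, here is a minimal sketch that runs a detail query and a per-store summary query against an in-memory copy of the “sales” table. It assumes SQLite through Python's standard sqlite3 module, which is my choice for illustration and not tied to any particular database. The sample rows and the function names (build_sales_table, detail_query, summary_query) are made up for this example. The detail query sorts by store so that each store's rows appear together, and the summary query adds up the count column for each store.

```python
import sqlite3

# Hypothetical sample rows: (store, item, price, count).
ROWS = [
    ("North", "pen", 1.50, 10),
    ("North", "ink", 4.00, 3),
    ("South", "pen", 1.75, 7),
]


def build_sales_table():
    """Create an in-memory 'sales' table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    # "count" is quoted because it is also the name of an SQL function.
    conn.execute(
        'CREATE TABLE sales (store TEXT, item TEXT, price REAL, "count" INTEGER)'
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)
    return conn


def detail_query(conn):
    """OLTP-style: every row in full detail, sorted so stores stay together."""
    sql = 'SELECT store, item, price, "count" FROM sales ORDER BY store, item'
    return conn.execute(sql).fetchall()


def summary_query(conn):
    """OLAP-style: one row per store with the total number of items sold."""
    sql = 'SELECT store, SUM("count") FROM sales GROUP BY store ORDER BY store'
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_sales_table()
    for row in detail_query(conn):
        print(row)
    for row in summary_query(conn):
        print(row)
```

With these sample rows the detail query returns three rows, one per sale record, while the summary query returns just two rows, one per store. On a table with billions of sale records the detail result keeps growing, but the summary stays at one row per store.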
So the big difference between OLTP and OLAP is that OLTP is concerned with details, while OLAP is concerned with counts. And since OLAP is not concerned with details, it can construct a higher level view of the database called a hypercube. The OLAP hypercube thus contains counts or aggregates for an underlying OLTP database.
So this blog post has presented an overview of my PhD Thesis: OLTP and OLAP. In my next blog post, I will present problems with OLTP and OLAP and my solutions.