The Swift MySQL RESAR project had less than stellar results on its iteration. Last time I presented a new approach to Swift RESAR: the Stream Star Schema. In this post we will present results for this new approach.
Implementing Stream Star Schema to MySQL RESAR
Once the Stream Star Schema was defined, the Swift RESAR database was also implemented as a Stream Star Schema. The following is the Stream Star Schema for RESAR:
MetaData Fact Table
- CreateTime – string
- DiskletSize – int32
Device Fact Table
- ID – int32
- CreateTime – string
- HostName – string, indexed
- DeviceName – string, indexed
- DeviceStart – int64
- DeviceSize – int64
- InUse – int8
- NumDisklets – int32
Disklet Fact Table
- ID – int32
- DeviceID – int32, indexed
- DeviceIndex – int64
- Type – string
ReliabilityGroup Fact Table
- ID – int32
- CreateTime – string
- NumDisklets – int32
- InUse – int8
ReliabilityGroupsDisklets Fact Table
- ID – int32
- ReliabilityGroupID – int32, indexed
- DiskletID – int32, indexed
- DiskletType – string
- DeviceID – int32, indexed
Final Piece of Puzzle
But there was a final piece to the puzzle. The stream star schema needed a query language, like SQL. Once again, Python came to the rescue. It turns out that manipulating Python dictionaries is a lot like executing SQL queries. Thus, in this case, Python code was remarkably similar to SQL. For example, consider the following code that deletes a device from the RESAR MySQL database, given a device ID (DEVICE):
#
# Determine if the Device ID is valid.
#
string = “SELECT InUse FROM Device WHERE (ID == ‘%s’)” % (DEVICE)
cur.execute(string)
if cur.rowcount != 1:
raise Exception(“Not a Device.”)
#
# Determine if the Device is in use.
#
row = cur.fetchone()
InUse = row[0]
if InUse != 0:
raise Exception(“Device is in use.”)
#
# Delete the disklets
#
string = “DELETE FROM Disklet WHERE DeviceID == %s” % (DEVICE)
cur.execute(string)
#
# Delete the device
#
string = “DELETE FROM Device WHERE ID == %s” % (DEVICE)
cur.execute(string)
Now consider the corresponding code that deletes a device from the RESAR stream star schema database, given a stream star schema (SSS) and device ID (DEVICE):
#
# Determine if the Device ID is valid.
#
if DEVICE < 0 or DEVICE >= len(SSS[‘fact_tables’][‘device’][‘data’]):
raise Exception(“Unkown Device.”)
if “_deleted_” in SSS[‘fact_tables’][‘device’][‘data’][DEVICE]:
raise Exception(“Unkown Device.”)
#
# Determine if the Device is in use.
#
if SSS[‘fact_tables’][‘device’][‘data’][DEVICE][‘inuse’] == 1:
raise Exception(“Device is in use.”)
#
# Delete the disklets.
#
for index in SSS[‘fact_tables’][‘disklet’][‘dimensions’][‘deviceid’][‘index’][DEVICE]:
del SSS[‘fact_tables’][‘disklet’][‘dimensions’][‘deviceid’][‘index’][DEVICE][index]
if len(SSS[‘fact_tables’][‘disklet’][‘dimensions’][‘deviceid’][‘index’][DEVICE]) == 0:
del SSS[‘fact_tables’][‘disklet’][‘dimensions’][‘deviceid’][‘index’][DEVICE]
SSS[‘fact_tables’][‘disklet’][‘data’][disklet_index][‘_deleted_’] = 1
#
# Delete the device.
#
SSS[‘fact_tables’][‘device’][‘data’][DEVICE][‘_deleted_’] = 1
Results with All Pieces Assembled
With all of the pieces in place, we could now construct our RESAR stream star schema database. The following shows database construction time for a given number of disk devices.
stream count = 100
total write time = 0:00:00.011036
write time per stream = 0.00011036
MySQL/SSS ratio = 1147
stream count = 1,000
total write time = 0:00:00.111014
write time per stream = 0.000111014
MySQL/SSS ratio = 1147
stream count = 10,000
total write time = 0:00:01.149159
write time per stream = 0.0001149159
MySQL/SSS ratio = 1082
stream count = 100,000
total write time = 0:00:11.503280
write time per stream = 0.0001150328
MySQL/SSS ratio = 1092
stream count = 1,000,000
total write time = 0:01:52.200627
write time per stream = 0.000112200627
MySQL/SSS ratio = 1108
So for a cloud cluster of 1 million devices, database construction time was (on
average) 0.00011 seconds for a single device and reliability group. It only required less than 2 minutes to create the entire database of 1 million devices and 1 million reliability groups. These results were greatly significant when compared to the Star Schema. In
fact, the Stream Star Schema creation time was on average 1,108 times faster than Star Schema creation. That is a order of magnitude of 3. The following shows database query times for the tables.
Query Device Table by (ID)
average time = .000006425888
factor = 60
Query Device Table by (HostName, DeviceName)
average time = .00000221678
factor = 207
Query Disklet Table by (ID)
average time = .000000501782666667
factor = 562
Query Disklet Table by (DeviceID)
average time = .000001366214
factor = 233
Query ReliabilityGroup Table by (ID)
average time = .000000525165
factor = 519
Query ReliabilityGroupsDisklets Table by (ID)
average time = .000000524541
factor = 519
Query ReliabilityGroupsDisklets Table by (ReliabilityGroupID)
average time = .000001360337
factor = 264
The minimum query time was .000000501782666667 seconds and the maximum was .000006425888 seconds. This is a factor 60 – 562 times faster than the star schema database, which is significant.
So we have presented the performance results for Swift RESAR and the Stream
Star Schema. The next and final post for this discussion will summarize the project.