In my last two posts (Python’s Strengths & Weaknesses) I have been describing the operation of OpenStack Swift Storage. Swift storage basically consists of four components: Ring, Database, Zones, and File System. I’m proposing some performance improvements to this design, but first we need to understand the Swift database schema. An OpenStack Account Database consists of two tables: Account Stat and Container. A Container Database also consists of two tables: Container Stat and Object.
OpenStack Account Stat Table
The following is a detailed view of the Account Stat Table in the OpenStack Account Database:
account
created_at
put_timestamp
delete_timestamp
container_count
object_count
bytes_used
hash
id
status
status_changed_at
metadata
OpenStack Container Table
And the following is a detailed view of the Container Table:
ROWID
name
put_timestamp
delete_timestamp
object_count
bytes_used
deleted
Notice that both the Account Stat Table and the Container Table have deletion attributes (delete_timestamp in both, plus a deleted flag in the Container Table). These attributes are required because rows in these tables are never deleted; they are just marked as deleted. The reason is that deleting a row would require some type of synchronization (locking) in case another thread was accessing the same database. And we all know that locking in the Cloud is a very bad thing: it would destroy scaling. So these tables are append or update only; deletes are not allowed.
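To make the mark-as-deleted idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names come from the Container Table listed above; the column types, the default values, and the helper names (make_account_db, put_container, delete_container, list_containers, total_rows) are my own assumptions for illustration, not Swift's actual code.

```python
import sqlite3


def make_account_db():
    """Create an in-memory database with a Container Table (types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE container (
            ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT,
            put_timestamp TEXT,
            delete_timestamp TEXT,
            object_count INTEGER,
            bytes_used INTEGER,
            deleted INTEGER DEFAULT 0
        )
        """
    )
    return conn


def put_container(conn, name, timestamp):
    # Creating a container appends a row.
    conn.execute(
        "INSERT INTO container (name, put_timestamp, delete_timestamp,"
        " object_count, bytes_used, deleted) VALUES (?, ?, '0', 0, 0, 0)",
        (name, timestamp),
    )


def delete_container(conn, name, timestamp):
    # "Deleting" a container is an UPDATE: the row stays, only the flag changes.
    conn.execute(
        "UPDATE container SET deleted = 1, delete_timestamp = ?"
        " WHERE name = ? AND deleted = 0",
        (timestamp, name),
    )


def list_containers(conn):
    # Every listing has to filter out the rows that are marked as deleted.
    cursor = conn.execute(
        "SELECT name FROM container WHERE deleted = 0 ORDER BY name"
    )
    return [row[0] for row in cursor]


def total_rows(conn):
    # Counts live and deleted rows alike.
    return conn.execute("SELECT COUNT(*) FROM container").fetchone()[0]


conn = make_account_db()
put_container(conn, "photos", "1.0")
put_container(conn, "backups", "2.0")
delete_container(conn, "photos", "3.0")

live = list_containers(conn)
rows = total_rows(conn)
print(live, rows)
```

After the delete, the listing shows only one container, but the table still holds two rows. That leftover row is the price of never issuing a DELETE.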
This is all well and good for performance, but what happens when these tables grow? The Account Stat Table will never grow; it will always have one and only one row. But the Container Table will grow with time as containers are created for the account. So what happens to SQLite performance when the Container Table gets large? First, since an SQLite database is a file, file performance will degrade as the file grows. Second, database query performance will also degrade as the file grows.
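The growth is easy to observe with a small experiment. The sketch below is only an illustration under my own assumptions: it uses a stripped-down two-column container table rather than the full schema, and the function name db_size_after_churn and the row counts are hypothetical. It creates containers, marks every one of them as deleted, and then reports the size of the database file.

```python
import os
import sqlite3
import tempfile


def db_size_after_churn(path, n_containers):
    """Create n containers, mark them all deleted, return the file size in bytes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS container (name TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO container (name) VALUES (?)",
        ((f"container-{i:08d}",) for i in range(n_containers)),
    )
    # Mark every row as deleted; no row is physically removed.
    conn.execute("UPDATE container SET deleted = 1")
    conn.commit()
    conn.close()
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    small = db_size_after_churn(os.path.join(tmp, "small.db"), 100)
    large = db_size_after_churn(os.path.join(tmp, "large.db"), 100_000)

print("bytes after 100 containers:    ", small)
print("bytes after 100,000 containers:", large)
```

Both databases contain zero live containers at the end, yet the second file is much larger than the first, because the rows that were marked as deleted still occupy space in the file.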
Container Database
The following is a detailed view of the Container Stat Table:
account
container
created_at
put_timestamp
delete_timestamp
object_count
reported_put_timestamp
reported_delete_timestamp
reported_object_count
reported_bytes_used
hash
id
status
status_changed_at
metadata
Object Table
And the following is a detailed view of the Object Table:
ROWID
name
created_at
size
content_type
etag
deleted
Notice that both the Container Stat Table and the Object Table have deletion attributes, just like the tables in the Account Database, and for the same reasons. So both the Container and Object Tables will have performance problems as containers and objects are created over time.
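One way to see how much of an Object Table is dead weight is to count live rows against rows marked as deleted. The following is a sketch under my own assumptions: the column names are taken from the Object Table above, but the column types, the sample data (every fourth object marked as deleted), and the placeholder etag value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE object (
        ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        created_at TEXT,
        size INTEGER,
        content_type TEXT,
        etag TEXT,
        deleted INTEGER DEFAULT 0
    )
    """
)

# Made-up sample data: 1000 objects, every fourth one marked as deleted.
sample = [
    (f"obj-{i}", str(i), 1024, "application/octet-stream",
     "etag-placeholder", 1 if i % 4 == 0 else 0)
    for i in range(1000)
]
conn.executemany(
    "INSERT INTO object (name, created_at, size, content_type, etag, deleted)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    sample,
)

# In SQLite a comparison evaluates to 0 or 1, so SUM() counts matching rows.
live, tombstones = conn.execute(
    "SELECT SUM(deleted = 0), SUM(deleted = 1) FROM object"
).fetchone()
print("live rows:", live, "rows marked deleted:", tombstones)
```

With this sample data a quarter of the table consists of rows that no listing will ever return, but that every scan of the table still has to read.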
Next time I’ll propose a solution that consists of two parts: MySQL and Database Chunking.