OpenStack Swift Data Storage Explained

I would like to take this opportunity to describe how OpenStack Swift stores data.  Swift storage consists of four components: the ring, databases, zones, and the file system.

OpenStack Swift Accounts, Containers & Objects

OpenStack Swift data consists of accounts, containers, and objects.  A Swift item, then, can be an account, a container, or an object.  A Swift cluster can have multiple accounts, an account can have multiple containers, and a container can have multiple objects.  Containers cannot be nested: a container can only contain objects, not other containers.  Swift storage is therefore considered flat rather than hierarchical.
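To make the flat layout concrete, here is a minimal sketch that splits a request path of the form /v1/<account>/<container>/<object> into its three parts.  The helper name split_swift_path and the sample account AUTH_demo are made up for illustration; this is not code from Swift itself.

```python
from typing import NamedTuple, Optional


class SwiftPath(NamedTuple):
    account: str
    container: Optional[str]
    obj: Optional[str]


def split_swift_path(path: str) -> SwiftPath:
    """Split '/v1/<account>[/<container>[/<object>]]' into its parts.

    Hypothetical helper for illustration only.
    """
    # Split at most three times so that any further slashes stay
    # inside the object name instead of creating deeper levels.
    parts = path.lstrip("/").split("/", 3)
    if len(parts) < 2 or parts[0] != "v1" or not parts[1]:
        raise ValueError(f"not a Swift v1 path: {path!r}")
    account = parts[1]
    container = parts[2] if len(parts) > 2 and parts[2] else None
    obj = parts[3] if len(parts) > 3 and parts[3] else None
    return SwiftPath(account, container, obj)
```

Note that a path such as /v1/AUTH_demo/photos/2024/cat.jpg still has only one container (photos); the rest, 2024/cat.jpg, is simply the object's name.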

By default, Swift uses three replicas.  This means that when an item gets created (via a PUT), two additional copies are also created.  Thus even if two copies of the item are unavailable, the item can still be read from the third.
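The read side of this can be sketched in a few lines.  This is only a toy model, assuming each replica is represented as a (node name, data) pair where None stands for an unavailable copy; the function name read_first_available is hypothetical.

```python
from typing import List, Optional, Tuple


def read_first_available(
    replicas: List[Tuple[str, Optional[bytes]]]
) -> Tuple[str, bytes]:
    """Return (node, data) for the first replica that is available.

    A data value of None models a replica that cannot be reached.
    """
    for node, data in replicas:
        if data is not None:
            return node, data
    raise LookupError("no replica available")
```

With three replicas, the read only fails when all three copies are unavailable at the same time.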

Two Types of Nodes

Basically, a Swift cluster consists of two types of nodes: Proxy and Storage.  Proxy nodes provide an external interface into the Swift repository: they perform authentication and provide a REST API to the data.  Storage nodes store the actual data items: accounts, containers, and objects.  Each Storage node consists of a number of Storage Devices, and each Storage Device is mounted as a separate file system using XFS.  A Swift cluster could (and should) contain a number of Proxy and Storage nodes.

OpenStack Swift groups Storage nodes into zones.  By default, a Swift cluster has four zones.  Thus if a Swift cluster has 40 Storage nodes spread evenly, each zone consists of 10 Storage nodes.  A zone is commonly a single Storage Rack, where a Storage Rack consists of multiple Storage nodes.  Item replicas are guaranteed to be distributed across zones, so the three replicas of a given item reside in three different zones.  This greatly improves data reliability: if a Storage Rack is down, then only a single replica is unavailable.
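The arithmetic and the placement rule above can be sketched as follows.  This is a simplified illustration, not Swift's actual placement logic: the function names build_zones and place_replicas, the node names, and the choice of SHA-256 to pick a starting zone are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Tuple


def build_zones(num_nodes: int, num_zones: int) -> Dict[int, List[str]]:
    """Spread nodes evenly over zones (40 nodes / 4 zones = 10 per zone)."""
    zones: Dict[int, List[str]] = {z: [] for z in range(num_zones)}
    for n in range(num_nodes):
        zones[n % num_zones].append(f"node{n:02d}")
    return zones


def place_replicas(
    item: str, zones: Dict[int, List[str]], replicas: int = 3
) -> List[Tuple[int, str]]:
    """Pick one node in each of `replicas` distinct zones for an item."""
    if replicas > len(zones):
        raise ValueError("need at least as many zones as replicas")
    digest = int(hashlib.sha256(item.encode("utf-8")).hexdigest(), 16)
    zone_ids = sorted(zones)
    start = digest % len(zone_ids)
    placement = []
    for i in range(replicas):
        # Walk to the next zone for every replica, so no zone repeats.
        zone = zone_ids[(start + i) % len(zone_ids)]
        nodes = zones[zone]
        placement.append((zone, nodes[digest % len(nodes)]))
    return placement
```

Calling place_replicas("/AUTH_demo/photos/cat.jpg", build_zones(40, 4)) returns three (zone, node) pairs with three different zone numbers, so losing any one zone removes at most one replica.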

So how do Proxy nodes keep track of Storage nodes?

When a Proxy node receives a GET Object request, the object is stored on three different Storage nodes.  How does the Proxy node figure out which Storage nodes are holding the object data?  The answer is: by using consistent hashing.  In OpenStack Swift, this hashing scheme is called a “ring.”  There is a separate ring for accounts, containers, and objects.
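As a preview, here is a minimal sketch of the idea: hash the item's path, use the top bits of the hash as a partition number, and look that partition up in a table that lists one device per replica.  Everything here is a simplifying assumption for illustration (the tiny partition count, the device names, the hand-built table, and a single table instead of one ring each for accounts, containers, and objects).  A real Swift ring is produced by the swift-ring-builder tool and is considerably more involved.

```python
import hashlib
import struct
from typing import List, Optional, Tuple

PART_POWER = 4                    # 2**4 = 16 partitions (tiny, for illustration)
PART_SHIFT = 32 - PART_POWER      # keep only the top PART_POWER bits of 32
REPLICAS = 3

# Hypothetical devices: 4 zones with 2 devices each, e.g. "z0-d1".
DEVICES = [f"z{zone}-d{dev}" for zone in range(4) for dev in range(2)]

# Toy lookup table: replica -> partition -> device index.
# Stepping by 2 devices per replica moves each replica to the next zone.
REPLICA2PART2DEV = [
    [(part + 2 * replica) % len(DEVICES) for part in range(2 ** PART_POWER)]
    for replica in range(REPLICAS)
]


def get_nodes(
    account: str, container: Optional[str] = None, obj: Optional[str] = None
) -> Tuple[int, List[str]]:
    """Map an item to (partition, [device for each replica])."""
    path = "/" + "/".join(p for p in (account, container, obj) if p)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a big-endian unsigned int and
    # shift away the low bits to get the partition number.
    part = struct.unpack_from(">I", digest)[0] >> PART_SHIFT
    return part, [DEVICES[row[part]] for row in REPLICA2PART2DEV]
```

Because the lookup depends only on the item's path and the table, any Proxy node holding the same table computes the same three devices for get_nodes("AUTH_demo", "photos", "cat.jpg") without asking anyone else.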

Next time I’ll cover more about Rings.