K8s – StatefulSets

In a stateless application, the instances are interchangeable because they share the same backing state (or source of data).

In contrast, in a distributed data system like Kafka, Cassandra, or Solr, each computing node has its own data or state. Similarly, in a clustered database like Redis or MySQL, each node has a server and its own replicated data store. When a server goes down, the replacing server has to connect to the same data replica that lost its server.

Because each instance has its own unique state, we cannot replace it with a new one or interchange it with an existing one. Instead, we have to restore it using its backing state. Due to these unique backing states, we call them stateful applications.

K8s allows us to deploy and manage such stateful applications using StatefulSets.


StatefulSets – How do they manage unique instances?


The diagram below, showing a MySQL cluster with three instances, illustrates the various aspects of how K8s manages a StatefulSet.

As we have seen above, managing these stateful applications is all about:

  1. Knowing that each of its instances is unique, and
  2. Restoring an instance along with its backing state if it goes down.

With this in mind, let us see how StatefulSets handle it.



Unique Instances

A StatefulSet assigns an ordinal index to each instance.

Since each instance is unique, the StatefulSet assigns an ordinal index (0…N-1) to each of them as a unique identifier. Both the Pod (mysql) and the backing state (my-pvc) use this index as part of their names. Thus, if a Pod goes down, we can easily identify its corresponding state using the index number.

As shown above, the StatefulSet creates the Pods in order (0, 1 … N-1) when we deploy. We can make this parallel if we need to.

When we un-deploy, it deletes them in the reverse order.

  • Each Pod gets a host name of the form $(statefulset-name)-$(index)
  • Each PVC gets a unique name of the form $(pvc-template-name)-$(index)
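As a minimal sketch, a StatefulSet manifest that produces this kind of naming could look as follows (the names mysql, mysqlservice and my-pvc, the image tag, and the storage sizes are assumptions matching the diagram, not a production configuration):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql                  # Pods become mysql-0, mysql-1, mysql-2
spec:
  serviceName: mysqlservice    # headless service that governs the Pod sub-domains
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0       # assumed image
        volumeMounts:
        - name: my-pvc
          mountPath: /var/lib/mysql
  volumeClaimTemplates:        # each Pod gets its own PVC created from this template
  - metadata:
      name: my-pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "ssd-vol-class"
      resources:
        requests:
          storage: 1Gi
```

Because the selector must match the Pod template's labels, scaling the set up simply stamps out further Pods (mysql-3, …) with their own claims.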


Stable Storage

Each instance uses a separate persistent volume.

Because the states are unique, each instance uses a separate persistent volume (PV) to maintain its state. Since PVC-to-PV is a one-to-one mapping, we use volumeClaimTemplates, as shown, to dynamically create a claim (PVC) for each instance.

The PVCs are shown using ‘my-pvc-0’, ‘my-pvc-1’ and ‘my-pvc-2’ in the diagram.

  volumeClaimTemplates:
  - metadata:
      name: my-pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "ssd-vol-class"
      resources:
        requests:
          storage: 1Gi

As for the PVs, we have used a StorageClass that creates them dynamically for each claim, as shown.
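For reference, such a StorageClass might be defined as follows (the provisioner and its parameters are assumptions for a GCE persistent disk; substitute the provisioner of your cloud or storage backend):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-vol-class
provisioner: kubernetes.io/gce-pd    # example provisioner; environment-specific
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer   # delay binding until a Pod needs the PV
```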

Failover Management

It's a restoration using a stable host name, not a complete replacement.

When a Pod, mysql-2 in the diagram, goes down, the StatefulSet assigns the same name to the replacing instance.

The ordinal index of the replaced Pod helps us find the matching PVC; here it is my-pvc-2.

Moreover, since we re-use the host name of the failed Pod, the stable host name makes the replacement transparent to its clients. Thus, the instance is restored with both its host name and its backing state.

Note: In our example, the default ordered creation of the instances is fine, as we do not want every instance to replicate from the master. With ordered creation, each new instance replicates its database from its previous instance.

If such dependencies do not exist, we can opt for parallel instantiation instead, using:

podManagementPolicy: Parallel
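In context, this field sits at the top level of the StatefulSet spec, alongside replicas and serviceName, for example:

```yaml
spec:
  serviceName: mysqlservice
  replicas: 3
  podManagementPolicy: Parallel   # default is OrderedReady
```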


Services to Access the StatefulSet



Use a headless service to get a separate service URL for each instance.

Since the instances are unique, there are many cases where we need to access a particular instance. For example, we might want to address the master instance for our write transactions.

We can use a headless service to get such separate URLs.

  • It creates a service domain as follows (assuming Namespace = p1-dev and Headless Service Name = mysqlservice):
    • $(service-name).$(namespace).svc.cluster.local
    • mysqlservice.p1-dev.svc.cluster.local
  • Each Pod gets a sub-domain of the form $(pod-name).$(service-domain):
    • mysql-0.mysqlservice.p1-dev.svc.cluster.local
    • mysql-1.mysqlservice.p1-dev.svc.cluster.local
    • mysql-2.mysqlservice.p1-dev.svc.cluster.local, as shown in the diagram
  • These sub-domain names allow us to target individual Pods.
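A headless service is an ordinary Service with clusterIP set to None. A sketch matching the names above (the port and the selector labels are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysqlservice
  namespace: p1-dev
spec:
  clusterIP: None    # headless: DNS resolves to individual Pod IPs, no load balancing
  selector:
    app: mysql
  ports:
  - port: 3306
```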

Use a regular cluster service to get a single load-balanced URL for the cluster.

For example, we may want a load-balanced service for our read-only access. For such cases, we can deploy a regular (ClusterIP) service to get a single load-balanced URL.
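As a sketch, such a ClusterIP Service could look as follows (the name mysql-read and the selector are assumptions; K8s then distributes requests across all matching Pods):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: p1-dev
spec:
  selector:
    app: mysql     # matches every Pod of the set; requests are load-balanced
  ports:
  - port: 3306
```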