K8s – Volume and PersistentVolume

What is a K8s Volume?

When we create or restart a container, it starts with a fresh filesystem. Hence, we cannot save any data in its workspace for later reuse.

A K8s Volume enables us to plug external storage into the container workspace. With the plugged-in storage, containers can save and reuse data across restarts and failures. K8s supports a wide range of storage types for this purpose.

The Volume provides the simplest way for a Pod to plug in external storage.

Note : Apart from plugging in external storage, the regular Volume supports a few special storage solutions like ConfigMap, Secret, and emptyDir, where the files are not meant to be stored on external storage beyond the life of the containers.
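For example, here is a minimal sketch of an emptyDir volume (all names are illustrative): the directory is created when the Pod starts, is shared by the containers that mount it, and is removed when the Pod goes away.

apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo           # illustrative name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}             # lives and dies with the Pod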

PersistentVolume

A storage handle which helps us provision storage and manage and monitor its usage.

The Volume, being part of a Pod, follows the life cycle of the Pod and provides only a runtime reference to the storage. But to manage the storage beyond the lifetime of the Pods, K8s needs a standalone handle, which is what the PersistentVolume provides.

The PersistentVolume is a cluster-level object that holds a reference to external storage. Since it is a standalone entity, it enables the admin to provision the required capacity of storage well in advance. It also enables us to monitor the usage of the storage.

The PersistentVolume allows us to reserve the storage for a specific namespace, using a volume claim.

Moreover, through its reclaim policy we can retain its data and prevent others from tampering with it, even beyond the project's usage.

Comparing the Volume inside a Pod with the PersistentVolume API object, we can notice that both can point to the same storage. But the latter, being a standalone object, provides the extra controls mentioned just above.

# A Pod using a regular volume (hostPath)
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: demo
      image: sample/my-app
      volumeMounts:
        - name: vol-myapp
          mountPath: "/data/order"
  volumes:
    - name: vol-myapp
      hostPath:
        path: /data/vol-01
        type: DirectoryOrCreate

# A PersistentVolume pointing to the same storage
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-01
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/vol-01
    type: DirectoryOrCreate

Note : A volumeMount makes the plugged-in storage available inside the container workspace as a local directory.
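For instance (a hedged sketch, assuming the my-app Pod above is running and its image provides a shell), we can write and read a file through the mount:

$ kubectl exec my-app -- sh -c 'echo hello > /data/order/test.txt'
$ kubectl exec my-app -- cat /data/order/test.txt
hello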

Deploy a PersistentVolume

The commands below show how we can deploy and verify a PersistentVolume.

$ kubectl apply -f pv.yaml
persistentvolume/pv-01 created

$ kubectl get pv -o wide
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE   VOLUMEMODE
pv-01   5Gi        RWO            Retain           Available           manual                  34s   Filesystem


//Note : The status shows Available as the CLAIM column is empty.

Since a PV is a non-namespaced object, it is available across all namespaces. A PersistentVolumeClaim object from any namespace can bind to a PV, if the PV meets its requirements, and reserve it for that namespace.
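As a quick check (a sketch; team-a is an illustrative namespace), PVs are listed without any namespace flag, while claims are always namespaced:

$ kubectl get pv
//cluster-scoped: the same output from any namespace

$ kubectl get pvc -n team-a
//namespaced: lists only the claims of the team-a namespace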

 

PersistentVolumeClaim

A claim that specifies our storage requirements and enables us to reserve a PV for our usage.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pvc-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: manual

The spec section of the claim object holds the minimum requirements of our project. When we deploy a claim, it looks for the closest match among the available PVs and binds to it, as shown below.

Once a PVC claims a PV, the PV becomes unavailable for others. It becomes available for other claims only when the current claim releases it and the PV is reclaimed, as we will see below. This avoids any conflict between projects using a volume.

$ kubectl apply -f pvc.yaml
persistentvolumeclaim/task-pvc-claim created

$ kubectl get pv,pvc -o wide

NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE   VOLUMEMODE
persistentvolume/pv-01   5Gi        RWO            Retain           Bound    default/task-pvc-claim   manual                  15m   Filesystem

NAME                                   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
persistentvolumeclaim/task-pvc-claim   Bound    pv-01    5Gi        RWO            manual         39s   Filesystem


// We can see the PV and PVC are now bound to each other
//(ref. STATUS, CLAIM on PV & STATUS, VOLUME on PVC).

//The above PVC belongs to default namespace.
//The PVs are shared resources and never belong to a namespace.

Note : As we see here, the capacity of the claim shows as 5Gi although it requested only 1Gi. The claim only specifies the minimum requirement; the bound PV decides the actual available capacity.
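To free the PV, we delete its claim. Below is a sketch of the expected flow: with the Retain policy, the PV moves to Released rather than straight back to Available, so an admin has to clean it up manually before it can be bound again.

$ kubectl delete pvc task-pvc-claim
persistentvolumeclaim "task-pvc-claim" deleted

$ kubectl get pv pv-01
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                     STORAGECLASS   REASON   AGE
pv-01   5Gi        RWO            Retain           Released   default/task-pvc-claim    manual                  20m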

 

PVC makes the choice of storage flexible

# A regular volume pointing directly to a storage
volumes:
  - name: myapp-volume
    awsElasticBlockStore:     
      volumeID: <volume-id>
      fsType: ext4

//A Volume needs the technical details 
//of the targeted storage 
# A volume using a PVC
volumes:
  - name: myapp-volume
    persistentVolumeClaim:
      claimName: pvc-myapp

//While using PV through PVC,
//we just need the name of the PVC

First, as we can see above, with a PVC we just specify the claim name, whereas a regular volume has to hard-code the storage details. Hence, if the storage type changes, we would have to change every Pod that uses it.

Second, being just a requirement, a PVC can use any storage type provisioned for it by an admin. It can be NFS, GlusterFS, a Google Persistent Disk, etc.
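Putting the pieces together, a complete Pod consuming the task-pvc-claim from earlier could look like the sketch below (the image and mount path are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app-pvc             # illustrative name
spec:
  containers:
    - name: demo
      image: sample/my-app
      volumeMounts:
        - name: myapp-volume
          mountPath: /data/order
  volumes:
    - name: myapp-volume
      persistentVolumeClaim:
        claimName: task-pvc-claim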

 

StorageClass

It enables us to provision PVs dynamically.

Usually, an admin provisions the PVs to meet the storage requirements.

But large systems, such as distributed or clustered database systems, might need lots of separate volumes, and the number of volumes needed may change dynamically. In such cases, dynamic provisioning of PVs becomes very useful.

A StorageClass can create PVs dynamically as per the needs of our PVCs. We can define each StorageClass to provision volumes from a particular storage type, and the PVC requests the required class through its storageClassName field, as shown below.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-ssd
  # annotations:
  #   storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

//Dynamically provisions SSD storage 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-project-X
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: sc-ssd
  resources:
    requests:
      storage: 10Gi

//The PVC points to the required storageClassName
//as part of its requirements

The example shows a StorageClass which dynamically provisions SSD storage from Google Persistent Disk. The PVC refers to the class using storageClassName.
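Assuming the class above is saved as sc.yaml, deploying and verifying it follows the same pattern as before (a sketch; the exact columns of kubectl get sc vary across versions):

$ kubectl apply -f sc.yaml
storageclass.storage.k8s.io/sc-ssd created

$ kubectl get sc sc-ssd
NAME     PROVISIONER            AGE
sc-ssd   kubernetes.io/gce-pd   12s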

Note : In the above example, we can make the StorageClass the default by un-commenting the annotation. Any PVC not specifying a storageClassName uses the default StorageClass.
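For instance, a PVC that relies on the default StorageClass simply omits the field (a sketch; the name is illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-default            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # no storageClassName here: the default StorageClass, if one is set, is used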

 

Different storage types supported by Kubernetes

K8s provides a wide range of volume plug-ins. Here is a subset of those, categorized under different storage types; an example NFS-backed PV follows the list.

  • Node-Local Storage : emptyDir, hostPath, local
  • Network File System Based Storage : NFS, cephfs, cinder
  • Distributed File System Based Storage : glusterfs, quobyte
  • Cloud Storage : gcePersistentDisk, awsElasticBlockStore, azureDisk, azureFile
  • Cloud Native Storage Solutions : portworxVolume, storageos, flocker, scaleIO
  • Special Volume Types : ConfigMap, Secret
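As an illustration of one of these plug-ins, here is a sketch of an NFS-backed PV (the server address and export path are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs                 # illustrative name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany            # NFS allows many nodes to mount read-write
  nfs:
    server: nfs.example.com    # placeholder server
    path: /exports/data        # placeholder export path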