Kubernetes Volumes

Motivation for volumes

Containers in a pod share the same network stack, but each has its own file system. It can be useful to share data between containers, for example having an init container prepare files that the main container depends on. A container's file system is also limited to the lifetime of the container, which can have undesirable side effects. For example, if the data tier container we are using in our examples crashes or fails a liveness probe, it will be restarted and all of the data it had been storing will be lost forever.

In this post, I will cover the different ways Kubernetes handles non-ephemeral data, allowing us to separate data from containers. We will look at Kubernetes Volumes and Kubernetes PersistentVolumes. By the end of this post, our goal is to deploy the data tier for our sample application using a PersistentVolume so that the data can outlive the data-tier pod. Again, this post builds on the code from the previous posts, specifically the deployments post. Let's first discuss the options for storing persistent data and then apply them to our data tier.

Kubernetes includes two different data storage types: Volumes and PersistentVolumes. Both are used by mounting a directory in a container, and both can be shared by containers in the same pod. A pod can also use more than one Volume and PersistentVolume. Their differences lie mainly in how their lifetimes are managed: one exists for the lifetime of a particular pod, while the other is independent of the lifetime of any pod.

Volumes

Volumes are tied to a pod and share its lifecycle. They are used to share data between containers in a pod and to tolerate container restarts. Although you can configure volumes to use durable storage types that survive pod deletion, you should generally reserve volumes for non-durable storage that is deleted along with the pod.

The simplest volume type, emptyDir, creates an initially empty directory on the node running the pod to back the storage used by the volume. Any data written to the directory remains available if a container in the pod restarts. Once the pod is deleted, the data in the volume is permanently deleted.

It’s worth noting that since the data is stored on a specific node, if a pod is rescheduled to a different node, the data will be lost. If the data is too valuable to lose when a pod is deleted or rescheduled, you should consider using PersistentVolumes.
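To make this concrete, here is a minimal sketch of a pod that shares an emptyDir volume between two containers (the pod, container, and volume names are hypothetical and not part of our sample application):

apiVersion: v1
kind: Pod
metadata:
  name: shared-data-example # hypothetical pod for illustration
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /cache/greeting && sleep 3600"]
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {} # node-local storage, deleted when the pod is deleted

Both containers see the same /cache directory, and the data survives container restarts but not deletion of the pod.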

PersistentVolumes

PersistentVolumes are independent of the lifetime of pods and are managed separately by Kubernetes. They work a little differently than volumes.

Pods may claim a persistent volume, and use it throughout their lifetime.

PersistentVolumes continue to exist outside of the pods that use them. A persistent volume can even be mounted by multiple pods on different nodes, provided the underlying storage supports multiple readers or writers.

Persistent volumes can be provisioned statically in advance by a cluster admin or dynamically for more flexible self-serve use cases.

PersistentVolumeClaims (PVCs)

Pods must make a request for storage before they can use a persistent volume. The request is made using a PersistentVolumeClaim, or PVC. A PVC declares how much storage the pod needs, the type of persistent volume, and the access mode. The access mode describes how the persistent volume is mounted: whether it is read-only or read-write, and whether it can be mounted by one node or many. There are three supported access modes to choose from: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany. If there isn't a persistent volume available to satisfy the claim and dynamic provisioning isn't enabled, the claim will stay in a Pending state until such a persistent volume becomes available.

The persistent volume claim is connected to a pod by adding a regular volume to the pod spec with its type set to persistentVolumeClaim, as sketched below.
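Here is a minimal sketch of that connection (the volume and claim names are hypothetical placeholders):

spec: # Pod spec excerpt
  containers:
  - name: app
    image: redis
    volumeMounts:
    - mountPath: /data
      name: my-data
  volumes:
  - name: my-data
    persistentVolumeClaim:
      claimName: my-claim # must match the name of an existing PVC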

Storage Volume types

Both Volumes and PersistentVolumes may be backed by a wide variety of volume types. As we learned before, it is usually preferable to use PersistentVolumes for more durable storage and Volumes for more ephemeral storage needs. Durable volume types include the persistent disks of many cloud vendors, such as Google Compute Engine persistent disks, Azure Disks, and Amazon Elastic Block Store. There is also support for more generic volume types such as Network File System (NFS) and iSCSI.
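For example, a PersistentVolume backed by an NFS share might be declared like the following sketch (the server address and export path are hypothetical):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-example # hypothetical PV name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany # NFS can support multiple readers and writers
  nfs:
    server: nfs.example.com # hypothetical NFS server
    path: /exports/data     # hypothetical export path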

Demo

That is quite a lot to take in, but everything should solidify with an example. Our objective is to use a PersistentVolume for the sample application's data tier, since we want the data to outlive its pod. In our example, the cluster has an Amazon Elastic Block Store volume statically provisioned and ready for us to use.

To see dynamic provisioning in action, I will cover it in another post: “Deploy a Stateful Application in a Kubernetes Cluster”.

What is the issue we are trying to address?

Before we get into volumes, I want to cement the issue we are trying to solve. We can illustrate how pod containers lose their data when they restart by forcing a restart of the data tier pod. First, let's look at the counter that will be running after we create the three-tier application from the deployments post.

ubuntu@ip-10-0-128-5:~# kubectl create -f 5.1-namespace.yaml
namespace/deployments created

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 5.2-data_tier.yaml -f 5.3-app_tier.yaml -f 5.4-support_tier.yaml -n deployments
service/data-tier created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments.
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           14s
data-tier      1/1     1            1           14s
support-tier   1/1     1            1           14s
ubuntu@ip-10-0-128-5:~/src#

Check the counter logs using kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1

ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 1350
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 1370
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 1386
ubuntu@ip-10-0-128-5:~/src#

Sure enough, the counter is being incremented.

Kill the container to emulate pod restart

Now if I force the container to be restarted, we can observe the impact on the counter. One way to do that is to kill the redis process, which will cause the data-tier container to exit; the data-tier pod will then automatically restart it. The kubectl exec command allows us to run a command inside a container, the same way docker exec does. Let's open a bash shell inside the container: kubectl exec -n deployments data-tier-599bc4fcf8-p5d86 -it /bin/bash

ubuntu@ip-10-0-128-5:~/src# kubectl exec -n deployments data-tier-599bc4fcf8-p5d86 -it /bin/bash
root@data-tier-599bc4fcf8-p5d86:/data#

The change of command prompt tells us we are now inside the container. We can use the kill command to stop the container's main process. But what is the process ID? The main process, which is redis in this case, always has PID 1 since it is the first process that runs in the container.
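If you want to double-check what PID 1 is before killing it, you can inspect the proc filesystem from the shell we just opened:

cat /proc/1/cmdline; echo # prints the command line of PID 1, which should show redis-server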

root@data-tier-599bc4fcf8-p5d86:/data# kill 1
root@data-tier-599bc4fcf8-p5d86:/data# command terminated with exit code 137

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments.
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           10m
data-tier      1/1     1            1           10m
support-tier   1/1     1            1           10m
ubuntu@ip-10-0-128-5:~/src#

ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments get pods
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-748cdbdcc5-fjpcv       1/1     Running   1          16m
data-tier-599bc4fcf8-p5d86      1/1     Running   1          16m
support-tier-58d5d545b6-clltf   2/2     Running   0          16m
ubuntu@ip-10-0-128-5:~/src#

ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 114
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 133
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 147
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller
...
Current counter: 3561
Current counter: 3571
Current counter: 3587
Current counter: 3604
Current counter:
Current counter:
Current counter: 6
Current counter: 11
Current counter: 23
Current counter: 35
Current counter: 53
Current counter: 61
Current counter: 75
Current counter: 87
Current counter: 92
Current counter: 98

The output tells us that, yes, the data-tier container has been restarted (note the RESTARTS column). The counter values in the poller logs also reveal that the counter was reset when the container restarted. This is what we want to avoid.

Create a new namespace called volumes

Let’s start by creating a new volumes namespace

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 9.1-namespace.yaml
namespace/volumes created
ubuntu@ip-10-0-128-5:~/src#

Data-tier manifest

Now, on to the data tier. There are three additions to the manifest:

  1. a persistent volume,
  2. a persistent volume claim, and
  3. a volume to connect the claim to the pod.

Here is the full manifest (9.2-pv_data_tier.yaml):
apiVersion: v1
kind: Service
metadata:
  name: data-tier
  labels:
    app: microservices
spec:
  ports:
  - port: 6379
    protocol: TCP # default
    name: redis # optional when only 1 port
  selector:
    tier: data
  type: ClusterIP # default
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-tier-volume
spec:
  capacity:
    storage: 1Gi # 1 gibibyte
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: INSERT_VOLUME_ID # replace with actual ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-tier-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi # 128 mebibytes
---
apiVersion: apps/v1 # apps API group
kind: Deployment
metadata:
  name: data-tier
  labels:
    app: microservices
    tier: data
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: data
  template:
    metadata:
      labels:
        app: microservices
        tier: data
    spec: # Pod spec
      containers:
      - name: redis
        image: redis:latest
        imagePullPolicy: IfNotPresent
        ports:
          - containerPort: 6379
            name: redis
        livenessProbe:
          tcpSocket:
            port: redis # named port
          initialDelaySeconds: 15
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
        volumeMounts:
          - mountPath: /data
            name: data-tier-volume
      volumes:
      - name: data-tier-volume
        persistentVolumeClaim:
          claimName: data-tier-volume-claim

PersistentVolume (PV)

First is the PersistentVolume. It is the raw storage where data is ultimately written by the pod's container. It has a declared storage capacity and other attributes. Here, we've allocated 1 gibibyte. The access mode of ReadWriteOnce means this volume may be mounted for reading and writing by a single node at a time. Note that it is a limit on node attachment, not pod attachment. PersistentVolumes may list multiple access modes, and the claim specifies the mode it requires. The persistent volume can only be claimed in a single access mode at any time. Lastly, we have an awsElasticBlockStore mapping, which is specific to the type of storage backing the PV. You would use a different mapping if you were not using an EBS volume for storage. The only required key for awsElasticBlockStore is the volume ID, which uniquely identifies the EBS volume. It will be different in your environment than in mine, so I've added an INSERT_VOLUME_ID placeholder that we will replace before we create the PV.

Persistent Volume Claim (PVC)

Next we have the PersistentVolumeClaim. The PVC spec outlines what it is looking for in a PV. For a PV to be bound to a PVC, it must satisfy all of the constraints in the claim. We are looking for a PV that provides the ReadWriteOnce access mode and has at least 128 mebibytes of storage. The claim's request is less than or equal to the persistent volume's capacity, and the requested access mode is among the PV's available access modes. This means the PVC's request is satisfied by our PV and the claim will be bound to it.

Volumes in Deployment’s pod template

Lastly, the deployment's pod template now includes a volume that links the PVC to the deployment's pods. This is accomplished by using the persistentVolumeClaim mapping and setting the claim name to the name of the PVC, which is data-tier-volume-claim. You will always use persistentVolumeClaim when working with PVs. If you wanted ephemeral storage instead, you would replace it with an emptyDir mapping or another type that doesn't connect to a PV.

VolumeMounts

Volumes can be used in the pod's containers and init containers, but they must be mounted to be available inside a container. The volumeMounts list includes all the volume mounts for a given container. The mountPath can be different for different containers even when they mount the same volume, as sketched below. In our case we only have one mount, and we mount the volume at /data, which is where redis is configured to store its data. This causes all of the redis data to be written to the PV.
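To illustrate the point about mount paths, here is a hypothetical sketch where a second sidecar container (not part of our sample application) mounts the same volume at a different path:

spec: # Pod spec excerpt
  containers:
  - name: redis
    image: redis:latest
    volumeMounts:
    - mountPath: /data # where redis stores its data
      name: data-tier-volume
  - name: backup-agent # hypothetical sidecar
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /snapshots # same volume, different mount path
      name: data-tier-volume
  volumes:
  - name: data-tier-volume
    persistentVolumeClaim:
      claimName: data-tier-volume-claim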

VolumeID placeholder for EBS

Now we are left with replacing the volume ID placeholder with the actual ID of the Amazon EBS volume the lab environment created for us. You could get it from the EC2 console in your browser, but we'll use the AWS CLI for this example. The volume ID can be obtained with the aws ec2 describe-volumes command.

aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text

ubuntu@ip-10-0-128-5:~/src# aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text
vol-09bc5324eb947dcdb
ubuntu@ip-10-0-128-5:~/src# vol_id=$(aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text)
ubuntu@ip-10-0-128-5:~/src# sed -i "s/INSERT_VOLUME_ID/$vol_id/" 9.2-pv_data_tier.yaml
ubuntu@ip-10-0-128-5:~/src#

The filter selects only the PV volume, which is labeled with a Type=PV tag, and the query outputs only the volume ID property of the volume. I'll store the ID in a variable named vol_id. Then we can use the stream editor, sed, to substitute the occurrence of INSERT_VOLUME_ID with the volume ID stored in vol_id.

Create the data-tier

And with that we are ready to create the data-tier using a persistent volume. We’ll also create the app and support tiers which don’t have anything new compared to previous versions.

ubuntu@ip-10-0-128-5:~/src# kubectl create -n volumes -f 9.2-pv_data_tier.yaml -f 9.3-app_tier.yaml -f 9.4-support_tier.yaml
service/data-tier created
persistentvolume/data-tier-volume created
persistentvolumeclaim/data-tier-volume-claim created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created
ubuntu@ip-10-0-128-5:~/src#

Describe the PVC

Let's describe the persistent volume claim, which has the short name pvc in kubectl, to confirm the claim's request is satisfied by a PV:

ubuntu@ip-10-0-128-5:~/src# kubectl describe -n volumes pvc
Name:          data-tier-volume-claim
Namespace:     volumes
StorageClass:  gp2
Status:        Bound
Volume:        pvc-7eda5dd0-38e6-46a5-a8e5-f4e2a098f4d3
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    data-tier-8689f7ffc-nk8h4
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  40s   persistentvolume-controller  Successfully provisioned volume pvc-7eda5dd0-38e6-46a5-a8e5-f4e2a098f4d3 using kubernetes.io/aws-ebs
ubuntu@ip-10-0-128-5:~/src#

The Status of Bound confirms that the PVC is bound to the PV.
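If you also want to see the binding from the PV side, note that PersistentVolumes are cluster-scoped, so you can list them without a namespace flag:

kubectl get pv

The STATUS column for the bound volume should show Bound, and its CLAIM column should reference volumes/data-tier-volume-claim.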

Describe the data-tier pod to check the event logs

Now if we describe the data-tier pod

ubuntu@ip-10-0-128-5:~/src# kubectl describe -n volumes pod data-tier-8689f7ffc-nk8h4
Name:           data-tier-8689f7ffc-nk8h4
Namespace:      volumes
Priority:       0
Node:           ip-10-0-27-39.us-west-2.compute.internal/10.0.27.39
Start Time:     Tue, 05 May 2020 23:37:44 +0000
Labels:         app=microservices
                pod-template-hash=8689f7ffc
                tier=data
Annotations:    <none>
Status:         Running
IP:             192.168.95.67
Controlled By:  ReplicaSet/data-tier-8689f7ffc
Containers:
  redis:
    Container ID:   docker://4f284571569b06d4ef337030f401856b30b74b83ad3b06449ec38514f3e6223a
    Image:          redis:latest
    Image ID:       docker-pullable://redis@sha256:f7ee67d8d9050357a6ea362e2a7e8b65a6823d9b612bc430d057416788ef6df9
    Port:           6379/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 05 May 2020 23:37:59 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       tcp-socket :redis delay=15s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [redis-cli ping] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /data from data-tier-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jpwt8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data-tier-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-tier-volume-claim
    ReadOnly:   false
  default-token-jpwt8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jpwt8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                               Message
  ----     ------                  ----                   ----                                               -------
  Warning  FailedScheduling        2m18s (x3 over 2m20s)  default-scheduler                                  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               2m15s                  default-scheduler                                  Successfully assigned volumes/data-tier-8689f7ffc-nk8h4 to ip-10-0-27-39.us-west-2.compute.internal
  Normal   SuccessfulAttachVolume  2m13s                  attachdetach-controller                            AttachVolume.Attach succeeded for volume "pvc-7eda5dd0-38e6-46a5-a8e5-f4e2a098f4d3"
  Normal   Pulling                 2m5s                   kubelet, ip-10-0-27-39.us-west-2.compute.internal  Pulling image "redis:latest"
  Normal   Pulled                  2m                     kubelet, ip-10-0-27-39.us-west-2.compute.internal  Successfully pulled image "redis:latest"
  Normal   Created                 2m                     kubelet, ip-10-0-27-39.us-west-2.compute.internal  Created container redis
  Normal   Started                 2m                     kubelet, ip-10-0-27-39.us-west-2.compute.internal  Started container redis
ubuntu@ip-10-0-128-5:~/src#

We can see the pod initially failed to schedule because the claim needed a little time before it was bound to a PV. Once the claim was bound, the pod was scheduled, and we can see the SuccessfulAttachVolume event.

Delete the data-tier pod

Not only can our new design tolerate a restart of the data tier pod's container, but the data will also persist even if we delete the entire data tier deployment, which deletes the data tier pod and prevents any new pods from being created. (Recall that this time we are not just killing the redis process, which would only kill the container and cause the pod to restart it; we are removing the deployment and its pod entirely.)

If everything goes to plan, we should be able to recover the redis data when we then recreate the deployment. That is because the deployment template is configured to use the same PVC, and the PVC is still bound to the PV storing the original redis data. Let's verify all of this.

Before we delete the data-tier deployment, let's get the last log line from the poller to see where our counter is at:

ubuntu@ip-10-0-128-5:~/src# kubectl logs -n volumes support-tier-687789db8-45d5b poller --tail 1
Current counter: 750

If we delete the deployment and then replace it, we should see a number higher than this if the data is persisted. Let's do that: delete the data tier deployment and confirm that there are no data tier pods running.

ubuntu@ip-10-0-128-5:~/src# kubectl delete -n volumes deployments. data-tier
deployment.extensions "data-tier" deleted
ubuntu@ip-10-0-128-5:~/src# kubectl get -n volumes pods
NAME                           READY   STATUS    RESTARTS   AGE
app-tier-6bf4d544c-v7m4l       1/1     Running   0          8m1s
support-tier-687789db8-45d5b   2/2     Running   0          8m1s
ubuntu@ip-10-0-128-5:~/src#

Re-create the data-tier pod

Now recreate the data tier deployment

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 9.2-pv_data_tier.yaml -n volumes
deployment.apps/data-tier created
Error from server (AlreadyExists): error when creating "9.2-pv_data_tier.yaml": services "data-tier" already exists
Error from server (AlreadyExists): error when creating "9.2-pv_data_tier.yaml": persistentvolumes "data-tier-volume" already exists
Error from server (AlreadyExists): error when creating "9.2-pv_data_tier.yaml": persistentvolumeclaims "data-tier-volume-claim" already exists
ubuntu@ip-10-0-128-5:~/src#

kubectl create tells us that everything except the deployment already exists, so only the deployment was created. It takes a couple of minutes for all of the readiness checks to start passing again and for some old connections to time out. This is mainly a side effect of the example application not being particularly good at handling this situation, not of delays intrinsic to Kubernetes. The fact that Kubernetes can self-heal the application is a testament to its abilities.

After a minute or two we can get the poller’s last log

ubuntu@ip-10-0-128-5:~/src# kubectl logs -n volumes support-tier-687789db8-45d5b poller --tail 1
Current counter: 1360

ubuntu@ip-10-0-128-5:~/src# kubectl logs -n volumes support-tier-687789db8-45d5b poller
Current counter: 1159
Current counter: 1176
Current counter: 1183
Current counter:
Current counter:
Current counter: 1208
Current counter: 1224
Current counter: 1239
Current counter: 1251
Current counter: 1259
Current counter: 1268
Current counter: 1280
Current counter: 1291
Current counter: 1296
Current counter: 1304
Current counter: 1315
Current counter: 1328
Current counter: 1341
Current counter: 1355
Current counter: 1360
Current counter: 1369

And voila, the counter has kept on ticking upward from where we left off before deleting the deployment. Our persistent volume has lived up to its name.

Conclusion

This concludes our lesson on volumes. We've covered Volumes, PersistentVolumes, and PersistentVolumeClaims. In our example, we've shown how to use a persistent volume to avoid data loss by keeping the data independent of the lifecycle of the pod and the pod's volumes. We also saw how kubectl exec allows us to run commands in existing containers when we demonstrated how container restarts cause data loss when volumes aren't used. We now have a solid foundation for working with volumes and persistent volumes.