Kubernetes Volumes
Motivation for volumes
Containers in a pod share the same network stack, but each has its own file system. It can be useful to share data between containers, for example having an init container prepare files that the main container depends on. A container's file system is also limited to the lifetime of the container, which can have undesirable side effects. For example, if the data tier container we are using in our examples crashes or fails a liveness probe, it will be restarted and all of the data it had been storing will be lost forever.
In this post, I will cover the different ways Kubernetes handles non-ephemeral data, allowing us to separate data from containers. We will see Kubernetes Volumes and Kubernetes PersistentVolumes. By the end of this post, our goal is to deploy the data tier for our sample application using PersistentVolumes so that the data can outlive the data-tier pod. Again, this post builds on the code from the previous posts, specifically the deployments post. Let's first discuss the options for storing persistent data and then apply them to our data tier.
Kubernetes includes two different data storage types. Both are used by mounting a directory in a container and can be shared by containers in the same pod. Pods can also use more than one Volume and PersistentVolume. Their differences are mainly in how their lifetime is managed. One type exists for the lifetime of a particular pod, and the other is independent from the lifetime of pods.
Volumes
Volumes are tied to a pod and its lifecycle. Volumes are used to share data between containers in a pod and to tolerate container restarts. Although you can configure volumes to use durable storage types that survive pod deletion, you should generally reserve volumes for non-durable storage that is deleted along with the pod.
The default type of volume is called emptyDir. It creates an initially empty directory on the node running the pod to back the storage used by the volume. Any data written to the directory remains if a container in the pod is restarted. Once the pod is deleted, the data in the volume is permanently deleted. It's worth noting that since the data is stored on a specific node, if a pod is rescheduled to a different node, the data will be lost. If the data is too valuable to lose when a pod is deleted or rescheduled, you should consider using PersistentVolumes.
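As a minimal sketch of how this looks in practice, the following hypothetical pod (the pod and container names are illustrative, not from our sample application) shares an emptyDir volume between two containers:

apiVersion: v1
kind: Pod
metadata:
  name: shared-data-example # hypothetical pod
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/greeting && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 5 && cat /data/greeting && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {} # data is removed when the pod is deleted

Both containers see the same directory, and the greeting file survives container restarts but not deletion of the pod itself.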
PersistentVolumes
PersistentVolumes are independent from the lifetime of pods and are managed separately by Kubernetes. They work a little differently than volumes.
Pods may claim a persistent volume, and use it throughout their lifetime.
PersistentVolumes will continue to exist outside of their pods. Persistent volumes can even be mounted by multiple pods on different nodes, if the underlying storage supports multiple readers or writers.
Persistent volumes can be provisioned statically in advance by a cluster admin or dynamically for more flexible self-serve use cases.
PersistentVolumeClaims (PVCs)
Pods must make a request for storage before they can use a persistent volume. The request is made using a PersistentVolumeClaim, or PVC. A PVC declares how much storage the pod needs, the type of persistent volume, and the access mode. The access mode describes how the persistent volume is mounted: whether it is read-only or read-write, and whether it can be mounted by one node or many. There are three supported access modes to choose from: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany. If there isn't a persistent volume available to satisfy the claim and dynamic provisioning isn't enabled, the claim will stay in a Pending state until such a persistent volume is available.
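As a minimal sketch, a PVC requesting 1 gibibyte of ReadWriteOnce storage looks like this (the claim name is hypothetical):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce # mounted read-write by a single node
  resources:
    requests:
      storage: 1Gi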
The persistent volume claim is connected to a pod by declaring a regular volume with the type set to persistentVolumeClaim.
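Continuing the sketch above, a hypothetical pod would reference the claim like so:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod # hypothetical
spec:
  containers:
  - name: app
    image: redis:latest
    volumeMounts:
    - name: storage # must match a volume name below
      mountPath: /data
  volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: example-claim # the PVC from the previous sketch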
Storage Volume types
Both volumes and PersistentVolumes may be backed by a wide variety of volume types. As we learned before, it is usually preferable to use persistent volumes for more durable storage and volumes for more ephemeral storage needs. Durable volume types include the persistent disks of many cloud vendors, such as Google Compute Engine persistent disks, Azure Disks, and Amazon Elastic Block Store (EBS). There is also support for more generic volume types such as Network File System (NFS) and iSCSI.
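For example, a PersistentVolume backed by NFS might look like the following sketch (the server address and export path are hypothetical):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-example # hypothetical
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany # NFS supports multiple readers and writers
  nfs:
    server: nfs.example.com # hypothetical NFS server
    path: /exports/data # hypothetical export path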
Demo
That is quite a lot to take in, but everything should solidify with an example. Our objective is to use a PersistentVolume for the sample application's data tier, since we want the data to outlive its pod. In our example, the cluster has an Amazon EBS volume statically provisioned and ready for us to use.
To see dynamic provisioning in action, I will cover it in another post: “Deploy a Stateful Application in a Kubernetes Cluster”.
What is the issue we are trying to address?
Before we get into volumes, I want to cement the issue we are trying to solve. We can illustrate how pod containers lose their data when they restart by forcing a restart of the data-tier pod's container. First of all, let's look at the counter that will be running after we create the 3-tier application from the deployments post.
ubuntu@ip-10-0-128-5:~# kubectl create -f 5.1-namespace.yaml
namespace/deployments created
ubuntu@ip-10-0-128-5:~/src# kubectl create -f 5.2-data_tier.yaml -f 5.3-app_tier.yaml -f 5.4-support_tier.yaml -n deployments
service/data-tier created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments.
NAME READY UP-TO-DATE AVAILABLE AGE
app-tier 1/1 1 1 14s
data-tier 1/1 1 1 14s
support-tier 1/1 1 1 14s
ubuntu@ip-10-0-128-5:~/src#
Check the counter logs using kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 1350
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 1370
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 1386
ubuntu@ip-10-0-128-5:~/src#
Sure enough, the counter is getting incremented.
Kill the container to emulate pod restart
Now if I force the pod's container to be restarted, we can observe the impact on the counter. One way to do that is to kill the redis process, which will cause the data-tier container to exit, and the data-tier pod will automatically restart it. We can use the exec command, which allows us to run a command inside of a container, the same way docker exec does. Let's open a bash shell inside the container:
kubectl exec -n deployments data-tier-599bc4fcf8-p5d86 -it /bin/bash
ubuntu@ip-10-0-128-5:~/src# kubectl exec -n deployments data-tier-599bc4fcf8-p5d86 -it /bin/bash
root@data-tier-599bc4fcf8-p5d86:/data#
The change of command prompt tells us we are in the container now. We can now use the kill command to stop the main process of the container. But what is the ID of the process? The ID of the main process, which is redis in this case, will always be 1, since it is the first process that runs in the container.
root@data-tier-599bc4fcf8-p5d86:/data# kill 1
root@data-tier-599bc4fcf8-p5d86:/data# command terminated with exit code 137
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments.
NAME READY UP-TO-DATE AVAILABLE AGE
app-tier 1/1 1 1 10m
data-tier 1/1 1 1 10m
support-tier 1/1 1 1 10m
ubuntu@ip-10-0-128-5:~/src#
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments get pods
NAME READY STATUS RESTARTS AGE
app-tier-748cdbdcc5-fjpcv 1/1 Running 1 16m
data-tier-599bc4fcf8-p5d86 1/1 Running 1 16m
support-tier-58d5d545b6-clltf 2/2 Running 0 16m
ubuntu@ip-10-0-128-5:~/src#
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 114
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 133
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller --tail 1
Current counter: 147
ubuntu@ip-10-0-128-5:~/src# kubectl -n deployments logs support-tier-58d5d545b6-clltf poller
...
Current counter: 3561
Current counter: 3571
Current counter: 3587
Current counter: 3604
Current counter:
Current counter:
Current counter: 6
Current counter: 11
Current counter: 23
Current counter: 35
Current counter: 53
Current counter: 61
Current counter: 75
Current counter: 87
Current counter: 92
Current counter: 98
The output tells us that, yes, there has been a restart of the data-tier pod's container. The counter value in the poller logs also reveals that the counter was reset when the container restarted. This is what we want to avoid.
Create a new namespace called volumes
Let's start by creating a new volumes namespace:
ubuntu@ip-10-0-128-5:~/src# kubectl create -f 9.1-namespace.yaml
namespace/volumes created
ubuntu@ip-10-0-128-5:~/src#
Data-tier manifest
Now, on to the data tier. There are three additions to the manifest:
- a persistent volume,
- a persistent volume claim, and
- a volume to connect the claim to the pod.
apiVersion: v1
kind: Service
metadata:
  name: data-tier
  labels:
    app: microservices
spec:
  ports:
  - port: 6379
    protocol: TCP # default
    name: redis # optional when only 1 port
  selector:
    tier: data
  type: ClusterIP # default
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-tier-volume
spec:
  capacity:
    storage: 1Gi # 1 gibibyte
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: INSERT_VOLUME_ID # replace with actual ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-tier-volume-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi # 128 mebibytes
---
apiVersion: apps/v1 # apps API group
kind: Deployment
metadata:
  name: data-tier
  labels:
    app: microservices
    tier: data
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: data
  template:
    metadata:
      labels:
        app: microservices
        tier: data
    spec: # Pod spec
      containers:
      - name: redis
        image: redis:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 6379
          name: redis
        livenessProbe:
          tcpSocket:
            port: redis # named port
          initialDelaySeconds: 15
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
        volumeMounts:
        - mountPath: /data
          name: data-tier-volume
      volumes:
      - name: data-tier-volume
        persistentVolumeClaim:
          claimName: data-tier-volume-claim
PersistentVolume (PV)
First is the PersistentVolume. It is the raw storage where data is ultimately written by the pod's container. It has a declared storage capacity and other attributes. Here, we've allocated 1 gibibyte. The access mode of ReadWriteOnce means this volume may be mounted for reading and writing by a single node at a time. Note that it is a limit on node attachment, not pod attachment. PersistentVolumes may list multiple access modes, and the claim specifies the mode it requires. The persistent volume can only be claimed in a single access mode at any time. Lastly, we have an awsElasticBlockStore mapping, which is specific to the type of storage backing the PV. You would use a different mapping if you were not using an EBS volume for storage. The only required key for awsElasticBlockStore is the volumeID, which uniquely identifies the EBS volume. It will be different in your environment than mine, so I've added an INSERT_VOLUME_ID placeholder that we will replace before we create the PV.
Persistent Volume Claim (PVC)
Next we have the persistent volume claim. The PVC spec outlines what it is looking for in a PV. For a PV to be bound to a PVC, it must satisfy all of the constraints in the claim. We are looking for a PV that provides the ReadWriteOnce access mode and has at least 128 mebibytes of storage. The claim's request is less than or equal to the persistent volume's capacity, and the requested access mode is among the PV's available access modes. This means the PVC's request is satisfied by our PV, and the claim will be bound to it.
Volumes in Deployment’s pod template
Lastly, the deployment's pod template now includes a volume which links the PVC to the deployment's pod. This is accomplished by using the persistentVolumeClaim mapping and setting the claim name to the name of the PVC, which is data-tier-volume-claim. You will always use persistentVolumeClaim when working with PVs. If you wanted to use an ephemeral storage volume instead, you would replace it with an emptyDir mapping or another type that doesn't connect to a PV, as sketched below.
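For example, the volumes section of the pod spec above would become the following if you swapped in ephemeral storage (a sketch, not what we want for the data tier):

      volumes:
      - name: data-tier-volume
        emptyDir: {} # ephemeral: data is deleted along with the pod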
VolumeMounts
Volumes can be used in the pod's containers and init containers, but they must be mounted to be available inside a container. The volumeMounts list includes all the volume mounts for a given container. The mountPaths for different containers can be different, even if the volume is the same; see the sketch below. In our case we only have one container, and we are mounting the volume at /data, which is where redis is configured to store its data. This will cause all of the data to be written to the PV.
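As a sketch, if the data tier had a second, hypothetical container, each container could mount the same volume at its own path:

      containers:
      - name: redis
        volumeMounts:
        - mountPath: /data
          name: data-tier-volume
      - name: backup-agent # hypothetical second container
        volumeMounts:
        - mountPath: /backup-source # same volume, different path
          name: data-tier-volume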
VolumeID placeholder for EBS
Now we are left with replacing the volume ID placeholder with the actual ID of the Amazon EBS volume the lab environment created for us. You could get it from the EC2 console in your browser, but we'll use the AWS CLI for this example. The volume ID can be obtained from the aws ec2 describe-volumes command.
aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text
ubuntu@ip-10-0-128-5:~/src# aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text
vol-09bc5324eb947dcdb
ubuntu@ip-10-0-128-5:~/src# vol_id=$(aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text)
ubuntu@ip-10-0-128-5:~/src# sed -i "s/INSERT_VOLUME_ID/$vol_id/" 9.2-pv_data_tier.yaml
ubuntu@ip-10-0-128-5:~/src#
The filter selects only the PV volume, which is labeled with a Type = PV tag, and the query outputs only the volume ID property of the volume. I'll store the ID in a variable named vol_id. Then we can use the stream editor, sed, to substitute the occurrence of INSERT_VOLUME_ID with the volume ID stored in vol_id.
Create the data-tier
And with that we are ready to create the data tier using a persistent volume. We'll also create the app and support tiers, which don't have anything new compared to previous versions.
ubuntu@ip-10-0-128-5:~/src# kubectl create -n volumes -f 9.2-pv_data_tier.yaml -f 9.3-app_tier.yaml -f 9.4-support_tier.yaml
service/data-tier created
persistentvolume/data-tier-volume created
persistentvolumeclaim/data-tier-volume-claim created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created
ubuntu@ip-10-0-128-5:~/src#
Describe the PVC
Let's describe the persistent volume claim, which has the short name of pvc in kubectl, to confirm the claim's request is satisfied by the PV:
ubuntu@ip-10-0-128-5:~/src# kubectl describe -n volumes pvc
Name: data-tier-volume-claim
Namespace: volumes
StorageClass: gp2
Status: Bound
Volume: pvc-7eda5dd0-38e6-46a5-a8e5-f4e2a098f4d3
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: data-tier-8689f7ffc-nk8h4
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ProvisioningSucceeded 40s persistentvolume-controller Successfully provisioned volume pvc-7eda5dd0-38e6-46a5-a8e5-f4e2a098f4d3 using kubernetes.io/aws-ebs
ubuntu@ip-10-0-128-5:~/src#
The Status of Bound confirms that the PVC is bound to the PV.
Describe the data-tier pod to check the event logs
Now if we describe the data-tier pod:
ubuntu@ip-10-0-128-5:~/src# kubectl describe -n volumes pod data-tier-8689f7ffc-nk8h4
Name: data-tier-8689f7ffc-nk8h4
Namespace: volumes
Priority: 0
Node: ip-10-0-27-39.us-west-2.compute.internal/10.0.27.39
Start Time: Tue, 05 May 2020 23:37:44 +0000
Labels: app=microservices
pod-template-hash=8689f7ffc
tier=data
Annotations: <none>
Status: Running
IP: 192.168.95.67
Controlled By: ReplicaSet/data-tier-8689f7ffc
Containers:
redis:
Container ID: docker://4f284571569b06d4ef337030f401856b30b74b83ad3b06449ec38514f3e6223a
Image: redis:latest
Image ID: docker-pullable://redis@sha256:f7ee67d8d9050357a6ea362e2a7e8b65a6823d9b612bc430d057416788ef6df9
Port: 6379/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 05 May 2020 23:37:59 +0000
Ready: True
Restart Count: 0
Liveness: tcp-socket :redis delay=15s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [redis-cli ping] delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from data-tier-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jpwt8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
data-tier-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-tier-volume-claim
ReadOnly: false
default-token-jpwt8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-jpwt8
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m18s (x3 over 2m20s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
Normal Scheduled 2m15s default-scheduler Successfully assigned volumes/data-tier-8689f7ffc-nk8h4 to ip-10-0-27-39.us-west-2.compute.internal
Normal SuccessfulAttachVolume 2m13s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-7eda5dd0-38e6-46a5-a8e5-f4e2a098f4d3"
Normal Pulling 2m5s kubelet, ip-10-0-27-39.us-west-2.compute.internal Pulling image "redis:latest"
Normal Pulled 2m kubelet, ip-10-0-27-39.us-west-2.compute.internal Successfully pulled image "redis:latest"
Normal Created 2m kubelet, ip-10-0-27-39.us-west-2.compute.internal Created container redis
Normal Started 2m kubelet, ip-10-0-27-39.us-west-2.compute.internal Started container redis
ubuntu@ip-10-0-128-5:~/src#
We can see the pod initially failed to schedule because the claim had not yet been bound to the PV. Once the claim is bound, the pod is scheduled and we see the SuccessfulAttachVolume event.
Delete the data-tier deployment
Not only can our new design tolerate a data-tier pod container restart, but the data will persist even if we delete the entire data-tier deployment, which will delete the data-tier pod and prevent any new pods from being created. (Note that this time we are not killing the redis process, which would only kill the container and cause the pod to restart it; we are deleting the deployment itself.)
If everything goes to plan we should be able to recover the redis data if we then replace the deployment. That is because the deployment template is configured to use the same PVC and the PVC is still bound to the PV storing the original redis data. Let’s verify all of this.
Before we delete the data-tier deployment, let's get the last log line from the poller to see where our counter is at:
ubuntu@ip-10-0-128-5:~/src# kubectl logs -n volumes support-tier-687789db8-45d5b poller --tail 1
Current counter: 750
If we delete the deployment and then replace it, we should see a number higher than this if the data is persisted. Let's do that. Delete the data-tier deployment, and confirm that there are no data-tier pods running.
ubuntu@ip-10-0-128-5:~/src# kubectl delete -n volumes deployments. data-tier
deployment.extensions "data-tier" deleted
ubuntu@ip-10-0-128-5:~/src# kubectl get -n volumes pods
NAME READY STATUS RESTARTS AGE
app-tier-6bf4d544c-v7m4l 1/1 Running 0 8m1s
support-tier-687789db8-45d5b 2/2 Running 0 8m1s
ubuntu@ip-10-0-128-5:~/src#
Re-create the data-tier deployment
Now recreate the data-tier deployment:
ubuntu@ip-10-0-128-5:~/src# kubectl create -f 9.2-pv_data_tier.yaml -n volumes
deployment.apps/data-tier created
Error from server (AlreadyExists): error when creating "9.2-pv_data_tier.yaml": services "data-tier" already exists
Error from server (AlreadyExists): error when creating "9.2-pv_data_tier.yaml": persistentvolumes "data-tier-volume" already exists
Error from server (AlreadyExists): error when creating "9.2-pv_data_tier.yaml": persistentvolumeclaims "data-tier-volume-claim" already exists
ubuntu@ip-10-0-128-5:~/src#
The create output tells us everything except the deployment already exists, so only the deployment was created. Now it takes a couple of minutes for all of the readiness checks to start passing again and for some old connections to time out. This is mainly a side effect of the example application not being particularly good at handling this situation, not because of delays intrinsic to Kubernetes. The fact that Kubernetes can self-heal the application is a testament to Kubernetes' abilities.
After a minute or two, we can get the poller's last log:
ubuntu@ip-10-0-128-5:~/src# kubectl logs -n volumes support-tier-687789db8-45d5b poller --tail 1
Current counter: 1360
ubuntu@ip-10-0-128-5:~/src# kubectl logs -n volumes support-tier-687789db8-45d5b poller
Current counter: 1159
Current counter: 1176
Current counter: 1183
Current counter:
Current counter:
Current counter: 1208
Current counter: 1224
Current counter: 1239
Current counter: 1251
Current counter: 1259
Current counter: 1268
Current counter: 1280
Current counter: 1291
Current counter: 1296
Current counter: 1304
Current counter: 1315
Current counter: 1328
Current counter: 1341
Current counter: 1355
Current counter: 1360
Current counter: 1369
And voila, the counter has kept on ticking upward from where we left off before deleting the deployment. Our persistent volume has lived up to its name.
Conclusion
This concludes our lesson on volumes. We've covered Volumes, PersistentVolumes, and PersistentVolumeClaims. In our example, we've shown how to use a persistent volume to avoid data loss by keeping the data independent from the lifecycle of the pod or the pod's volume. We also saw how kubectl exec allows us to run commands in existing containers when we demonstrated how container restarts cause data loss when volumes aren't used. We now have a solid foundation for volumes and persistent volumes.