Kubernetes Rolling Updates and Rollbacks

The last topic with respect to deployments is how updates work. Kubernetes uses rollouts to update deployments.

A Kubernetes rollout is the process of updating or replacing replicas with replicas matching a new deployment template.

Changes may be configuration changes, such as updating environment variables or labels, or code changes that result in updating the image key of the deployment's template.

In a nutshell, any change to the deployment’s template will trigger a rollout.
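
For example, you can trigger a rollout without opening an editor by changing the image in the template. A minimal sketch, assuming the deployment created later in this post and a hypothetical server-v2 tag:

# server-v2 is a hypothetical tag; any template change like this triggers a rollout
kubectl set image -n deployments deployment/app-tier server=lrakai/microservices:server-v2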

Rollout strategies

RollingUpdate (default)

Deployments have different rollout strategies. Kubernetes uses rolling updates by default.

Replicas are updated in groups instead of all at once until the rollout completes.

This allows the service to continue uninterrupted while the update is rolled out. However, keep in mind that during the rollout there will be pods running both the old and the new configuration, and the application should handle that gracefully.

Recreate strategy

As an alternative, deployments can also be configured to use the recreate strategy, which kills all of the old template's pods before creating the new ones. That, of course, incurs downtime for the application. In this post I will focus on rolling updates.
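
A minimal sketch of what selecting the recreate strategy looks like in a deployment spec:

spec:
  strategy:
    type: Recreate   # all old pods are terminated before any new pods are created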

Scaling is an orthogonal concept to rollouts

We actually rolled out an update in the previous post when we added the cpu request to the app-tier deployment's pod template. Scaling events, on the other hand, do not create rollouts: the number of replicas is not part of the deployment's template, so changing it does not trigger one.
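
You can confirm this yourself: scale the deployment, then check that kubectl rollout history shows no new revision.

# Scaling changes .spec.replicas, which lives outside the pod template...
kubectl scale -n deployments deployment app-tier --replicas=3
# ...so no new revision appears in the history.
kubectl rollout history -n deployments deployment app-tier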

Demo of RollingUpdate

kubectl includes commands to conveniently

  • check,
  • pause,
  • resume, and
  • roll back rollouts, as summarized below.
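
These map onto the kubectl rollout subcommands; all of them accept the usual -n flag for the namespace:

kubectl rollout status  deployment app-tier    # check rollout progress
kubectl rollout pause   deployment app-tier    # pause an in-progress rollout
kubectl rollout resume  deployment app-tier    # resume a paused rollout
kubectl rollout undo    deployment app-tier    # roll back to the previous revision
kubectl rollout history deployment app-tier    # list recorded revisions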

Let’s see how all of this works. Create the namespace and deployments to get started.

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 5.1-namespace.yaml
namespace/deployments created
ubuntu@ip-10-0-128-5:~/src# kubectl create -n deployments -f 5.2-data_tier.yaml -f 6.1-app_tier_cpu_request.yaml -f 5.4-support_tier.yaml
service/data-tier created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created
ubuntu@ip-10-0-128-5:~/src# kubectl get deployments. -n deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       0/5     5            0           9s
data-tier      1/1     1            1           9s
support-tier   0/1     1            0           9s
ubuntu@ip-10-0-128-5:~/src# kubectl get deployments. -n deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       5/5     5            5           16s
data-tier      1/1     1            1           16s
support-tier   1/1     1            1           16s
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments pods
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-74c7df4f88-2584t       1/1     Running   0          72s
app-tier-74c7df4f88-4ksht       1/1     Running   0          72s
app-tier-74c7df4f88-mkdtw       1/1     Running   0          72s
app-tier-74c7df4f88-trqmb       1/1     Running   0          72s
app-tier-74c7df4f88-wn7jl       1/1     Running   0          72s
data-tier-599bc4fcf8-5bf6p      1/1     Running   0          72s
support-tier-58d5d545b6-t8jkc   2/2     Running   0          72s
ubuntu@ip-10-0-128-5:~/src#

Autoscaling and rollouts are compatible, but to easily observe rollouts as they progress we’ll need many replicas in action. Next, let’s edit the app tier deployment with kubectl edit -n deployments deployment app-tier, which opens the manifest in a vi editor:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-05-03T21:08:26Z"
  generation: 1
  labels:
    app: microservices
    tier: app
  name: app-tier
  namespace: deployments
  resourceVersion: "4657"
  selfLink: /apis/extensions/v1beta1/namespaces/deployments/deployments/app-tier
  uid: a628e3b6-70dc-42d9-abb0-d92eca54e6c1
spec:
  progressDeadlineSeconds: 600
  replicas: 10
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      tier: app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: microservices
        tier: app
    spec:
      containers:
      - env:
        - name: REDIS_URL
          value: redis://$(DATA_TIER_SERVICE_HOST):$(DATA_TIER_SERVICE_PORT_REDIS)
        image: lrakai/microservices:server-v1
        imagePullPolicy: IfNotPresent
        name: server
        ports:
        - containerPort: 8080
          protocol: TCP
        #resources:
          #requests:
            #cpu: 20m
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 5
  conditions:
  - lastTransitionTime: "2020-05-03T21:08:38Z"
    lastUpdateTime: "2020-05-03T21:08:38Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-05-03T21:08:26Z"
    lastUpdateTime: "2020-05-03T21:08:39Z"
    message: ReplicaSet "app-tier-74c7df4f88" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 5
  replicas: 5
  updatedReplicas: 5

It’ll be easier to see the rollout in action with a large number of replicas. Edit replicas to be 10, then search for resources and delete the resources block that is shown commented out above. This avoids any potential scheduling problems if all 10 of the cpu requests can’t be satisfied. Then watch the deployment until all the replicas are ready: watch -n 1 kubectl get -n deployments deployments app-tier
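
If you’d rather not make these changes interactively, a rough non-interactive equivalent is sketched below; the JSON patch path is an assumption based on the manifest shown above.

# Scaling alone does not trigger a rollout...
kubectl scale -n deployments deployment app-tier --replicas=10
# ...but removing the resources block from the pod template does
# (path assumed from the manifest above).
kubectl patch -n deployments deployment app-tier --type json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/resources"}]'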

Every 1.0s: kubectl get -n deployments deployments app-tier
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   10/10   10           10          92m

Just to confirm, get the pods:

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments pods
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-748cdbdcc5-59mms       1/1     Running   0          100s
app-tier-748cdbdcc5-6rm6t       1/1     Running   0          93s
app-tier-748cdbdcc5-8h57v       1/1     Running   0          100s
app-tier-748cdbdcc5-dmsrs       1/1     Running   0          100s
app-tier-748cdbdcc5-f6gwg       1/1     Running   0          100s
app-tier-748cdbdcc5-hvxdq       1/1     Running   0          96s
app-tier-748cdbdcc5-qgpkk       1/1     Running   0          93s
app-tier-748cdbdcc5-tjlk4       1/1     Running   0          95s
app-tier-748cdbdcc5-wjpm8       1/1     Running   0          100s
app-tier-748cdbdcc5-xr4ms       1/1     Running   0          93s
data-tier-599bc4fcf8-5bf6p      1/1     Running   0          93m
support-tier-58d5d545b6-t8jkc   2/2     Running   0          93m
ubuntu@ip-10-0-128-5:~/src#

Edit the deployment

Now it’s time to trigger a rollout. Open the app-tier deployment with kubectl edit -n deployments deployment app-tier; this shows the same output as above. From it we can see that the server added some default values for the deployment strategy: the type is RollingUpdate, and the corresponding maxSurge and maxUnavailable fields control the rate at which updates are rolled out.

  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Configure maxSurge and maxUnavailable

  • maxSurge: specifies how many replicas over the desired total are allowed during a rollout. A higher surge allows new pods to be created without waiting for old ones to be deleted.
  • maxUnavailable: controls how many old pods can be deleted without waiting for new pods to be ready. We’ll keep the defaults of 25%.

You may want to configure them to trade off the impact on availability or resource utilization against the speed of the rollout. For example, you can have all the new pods start immediately, but in the worst case all the new pods and all the old pods consume resources at the same time, effectively doubling resource utilization for a short period.
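
To make the arithmetic concrete: with 10 replicas and the 25% defaults, maxSurge rounds up to 3 extra pods (at most 13 total) while maxUnavailable rounds down to 2 (at least 8 replicas stay available). A sketch of the fast-but-resource-hungry configuration described above:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%      # start all of the new pods immediately...
      maxUnavailable: 0   # ...while keeping every old pod until its replacement is ready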

Trigger a rollout

With those fields out of the way, we can trigger a rollout.

Remember that any change to the deployment’s template triggers a rollout.

Run kubectl edit -n deployments deployments. app-tier. In the pod template spec, change the name from server to api as shown in the snippet below, then save and quit.

    spec:
      containers:
      - env:
        - name: REDIS_URL
          value: redis://$(DATA_TIER_SERVICE_HOST):$(DATA_TIER_SERVICE_PORT_REDIS)
        image: lrakai/microservices:server-v1
        imagePullPolicy: IfNotPresent
        #name: server
        name: api
        ports:
        - containerPort: 8080
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Check rollout status

Check the status with kubectl rollout -n deployments status deployment app-tier. Unless you run it immediately after saving, the rollout will all be over in a flash, so try doing it in two windows with a tmux session: run tmux, press ctrl+b % to split the screen vertically, and use ctrl+b followed by the left or right arrow key to move between panes.

ubuntu@ip-10-0-128-5:~/src# kubectl edit -n deployments deployments. app-tier
deployment.extensions/app-tier edited
ubuntu@ip-10-0-128-5:~/src# kubectl rollout -n deployments status deployment app-tier
Waiting for deployment "app-tier" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 5 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 6 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 7 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 8 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 9 out of 10 new replicas have been updated...
Waiting for deployment "app-tier" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 3 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "app-tier" rollout to finish: 8 of 10 updated replicas are available...
Waiting for deployment "app-tier" rollout to finish: 9 of 10 updated replicas are available...
deployment "app-tier" successfully rolled out
ubuntu@ip-10-0-128-5:~/src#
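
Because kubectl rollout status blocks until the rollout finishes and exits non-zero if it fails or exceeds the deployment’s progressDeadlineSeconds, it also works as a gate in deploy scripts. A minimal sketch:

# Fail the script (and roll back) if the rollout doesn't finish in time.
if ! kubectl rollout -n deployments status deployment app-tier --timeout=120s; then
  echo "rollout failed, rolling back" >&2
  kubectl rollout -n deployments undo deployment app-tier
fi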

An illustration of pause and resume

For the purpose of illustration, we will edit the deployment again using kubectl edit -n deployments deployments app-tier. This time, change the name to a different name (api-pause), then save and quit; this triggers another rollout. Quickly run the pause command: kubectl rollout -n deployments pause deployment app-tier

Now the rollout is paused. Pausing won’t affect replicas that were created before the pause; they will continue to progress to ready. However, no new replicas are created while the rollout is paused. We can try a few things at this point. One thing you can do is inspect the new pods before deciding to continue or roll back. We’ll simply get the deployment. With the split panes from earlier, the right side runs kubectl rollout -n deployments status deployment app-tier, which reports that 9 out of 10 new replicas have been updated; the same fact is reflected on the left side, where we ran kubectl get deployments. -n deployments app-tier.

Let’s say everything is fine and we decide to resume. Run kubectl rollout -n deployments resume deployment app-tier. The rollout picks up right where it left off and goes about its business.
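
Pause has another practical use: while the rollout is paused you can batch several template changes and then resume, producing a single rollout instead of one per change. A sketch, where the api container name and server-v2 tag are only illustrative:

kubectl rollout -n deployments pause deployment app-tier
# Both template changes below are applied while paused...
kubectl set image -n deployments deployment/app-tier api=lrakai/microservices:server-v2
kubectl set resources -n deployments deployment app-tier -c api --requests=cpu=20m
# ...and roll out together on resume.
kubectl rollout -n deployments resume deployment app-tier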

Rollbacks

So now suppose you found a bug in this new revision and need to roll back. kubectl rollout undo to the rescue. By default it rolls back to the previous revision. You may also roll back to a specific revision: use kubectl rollout history to get a list of all revisions, then pass the specific revision to kubectl rollout undo.
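
A minimal sketch of rolling back to a specific revision from the history:

# List revisions, then roll back to revision 1 (the original server template).
kubectl rollout -n deployments history deployment app-tier
kubectl rollout -n deployments undo deployment app-tier --to-revision=1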

ubuntu@ip-10-0-128-5:~/src# kubectl rollout history -n deployments deployment app-tier
deployment.extensions/app-tier
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>

ubuntu@ip-10-0-128-5:~/src#
ubuntu@ip-10-0-128-5:~/src# kubectl rollout -n deployments undo deployment app-tier
deployment.extensions/app-tier rolled back
ubuntu@ip-10-0-128-5:~/src#

Now do another rollback, with the status command running in a parallel pane.

Then do a describe to check what the name of the container is; a minimal sketch of commands that show it:
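
kubectl describe -n deployments deployment app-tier | grep -A2 'Containers:'
# Or pull just the container name out of the template:
kubectl get -n deployments deployment app-tier \
  -o jsonpath='{.spec.template.spec.containers[0].name}'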

When you roll back again, you are taken back to the new version, not the previous-previous version: undo returns you to the revision you just left, so repeated undos toggle between the two most recent revisions rather than walking further back through history. Notice the name of the container in the describe output: it is api-pause, the version 3 template (version 1 was server, version 2 was api, version 3 was api-pause). Undo once more and you will be taken back to version 2.

That’s all for this demonstration of rolling updates and rollbacks. Before we move on, let’s scale the app tier back to one replica to give back some CPU resources:

ubuntu@ip-10-0-128-5:~/src# kubectl scale -n deployments deployment app-tier --replicas=1
deployment.extensions/app-tier scaled
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments.
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           86m
data-tier      1/1     1            1           86m
support-tier   1/1     1            1           86m
ubuntu@ip-10-0-128-5:~/src#

Conclusion

Deployments and rollouts are very powerful constructs. Their features cover a large swath of use cases. Let’s summarize what we learned in this post:

  1. We learned that rollouts are triggered by updates to a deployment’s template.
  2. Kubernetes uses a rolling update strategy by default.
  3. We also learned how to pause, resume, and undo rollouts of deployments.

There’s still so much more we can do with deployments. Rollouts depend on container status: Kubernetes assumes that created containers are immediately ready and that the rollout should continue. This does not work in all cases. We may need to wait for a web server to accept connections. Here’s another scenario: consider an application using a relational database. The containers may start, but the application will fail until the database and tables are created. These scenarios must be considered to build reliable applications. This is where probes and init containers come into the picture. We’ll integrate probes and init containers in the next posts.