Kubernetes Autoscaling Demonstration


We’ve seen deployments work their magic, and we also saw how to scale deployment replicas by running the kubectl scale command. But it would be nice not to have to scale the deployment manually. That’s where autoscaling comes in.

Kubernetes supports CPU-based autoscaling as well as autoscaling based on a custom metric you define. We’ll focus on CPU in this post.

Autoscaling works by specifying:

  • a desired target CPU percentage, and
  • a minimum and maximum number of allowed replicas.

The CPU percentage is expressed as a percentage of the pod’s CPU resource request.

Recall that pods can set resource requests for CPU to ensure that they are scheduled on a node with at least that much CPU available.
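This scheduling constraint boils down to simple arithmetic. The following is a toy sketch (not the real scheduler, which also weighs memory, taints, affinity, and more): a pod's CPU request must fit within the node's remaining allocatable CPU.

```python
def fits(node_allocatable_m, scheduled_requests_m, pod_request_m):
    """Toy version of the scheduler's CPU check: a pod's CPU request
    (in millicores) must fit in the node's remaining allocatable CPU."""
    remaining = node_allocatable_m - sum(scheduled_requests_m)
    return pod_request_m <= remaining

# A 1-CPU (1000m) node with 900m already requested still has
# room for a 20m pod, but not for a 200m one:
print(fits(1000, [500, 400], 20))   # -> True
print(fits(1000, [500, 400], 200))  # -> False
```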

If no CPU resource request is set, then autoscaling won’t take any action.

Kubernetes will increase or decrease the number of replicas according to the average CPU usage across all the replicas. The autoscaler increases the number of replicas when the actual CPU usage of the current pods exceeds the target, and decreases it when the usage falls below the target.

The autoscaler will never create more replicas than the maximum you set, nor will it decrease the number of replicas below your configured minimum. You can configure some of the parameters of the autoscaler, but the defaults will work fine for us. With the defaults, the autoscaler compares the actual CPU usage to the target and either increases the replicas if the actual CPU is sufficiently above the target, or decreases them if it is sufficiently below. Otherwise it keeps the status quo.
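The scaling rule can be sketched in a few lines of Python. This is a simplification of the real algorithm (which also applies stabilization windows and scaling rate limits), but it captures the ratio-based calculation and the min/max clamping described above:

```python
import math

def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the HPA scaling rule: scale the replica count by the
    ratio of actual to target CPU, clamped to the [min, max] bounds."""
    ratio = current_cpu_pct / target_cpu_pct
    # Within the tolerance band around the target, keep the status quo.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 5 replicas averaging 10% CPU against a 70% target scale down to 1:
print(desired_replicas(5, 10, 70, min_replicas=1, max_replicas=5))  # -> 1
```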

Metrics Server

Autoscaling depends on metrics being collected in the cluster so that the average pod CPU can be computed. Kubernetes integrates with several solutions for collecting metrics. We will use metrics server, a solution maintained by the Kubernetes project. There are several manifest files in the kubernetes metrics-server GitHub repository that declare all the required resources.

We will need to get metrics server up and running before we can use autoscaling.

Once metrics server is running, autoscalers can retrieve metrics using the Kubernetes metrics API.

Autoscaling demonstration using a 3-tier application

To recall, shown below is the lab architecture.

Create namespace, deployments and verify application is running

The first thing we need to do is create a namespace called deployments and then create the three tiers of the application. The following commands will get you there:

  1. kubectl create -f 5.1-namespace.yaml: Create a namespace
  2. kubectl create -n deployments -f 5.2-data_tier.yaml -f 5.3-app_tier.yaml -f 5.4-support_tier.yaml : Create the 3 tier application
  3. kubectl get pods -n deployments : View the pods
  4. kubectl get deployments -n deployments : View the deployments
  5. kubectl logs -n deployments support-tier-58d5d545b6-58h4q counter --tail 10 : Verify app is running
  6. kubectl logs -n deployments support-tier-58d5d545b6-58h4q poller --tail 10 : Verify app is running
ubuntu@ip-10-0-128-5:~/src# kubectl create -f 5.1-namespace.yaml
namespace/deployments created
ubuntu@ip-10-0-128-5:~/src# kubectl create -n deployments -f 5.2-data_tier.yaml -f 5.3-app_tier.yaml -f 5.4-support_tier.yaml
service/data-tier created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created
ubuntu@ip-10-0-128-5:~/src# kubectl get pods -n deployments
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-748cdbdcc5-jj4l6       1/1     Running   0          17s
data-tier-599bc4fcf8-mf7bt      1/1     Running   0          17s
support-tier-58d5d545b6-58h4q   2/2     Running   0          17s
ubuntu@ip-10-0-128-5:~/src# kubectl get deployments -n deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           25s
data-tier      1/1     1            1           25s
support-tier   1/1     1            1           25s

ubuntu@ip-10-0-128-5:~/src# kubectl logs -n deployments support-tier-58d5d545b6-58h4q counter --tail 10
Incrementing counter by 5 ...
Incrementing counter by 5 ...
Incrementing counter by 6 ...
Incrementing counter by 6 ...
Incrementing counter by 8 ...
Incrementing counter by 10 ...
Incrementing counter by 4 ...
Incrementing counter by 6 ...
Incrementing counter by 2 ...
Incrementing counter by 7 ...

ubuntu@ip-10-0-128-5:~/src# kubectl logs -n deployments support-tier-58d5d545b6-58h4q poller --tail 10
Current counter: 2242
Current counter: 2255
Current counter: 2263
Current counter: 2272
Current counter: 2291
Current counter: 2304
Current counter: 2313
Current counter: 2327
Current counter: 2339
Current counter: 2354

Create the metrics server

The bastion host also includes the metrics server manifests in the metrics-server subdirectory. It is outside the scope of this post to discuss all of the resources that comprise metrics server. All we need to do is create them, and we can count on metrics being collected in the cluster. To do that we can use our trusty kubectl create command and specify the directory as the file target.

ubuntu@ip-10-0-128-5:~/src# kubectl create -f metrics-server/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.extensions/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

kubectl then creates all of the manifests it finds in the directory. You can see quite a few resources are created. One of them is a deployment (deployment.extensions/metrics-server created).

Metrics server runs as a pod in the cluster and that pod is managed by a deployment.

It takes a minute for the first metrics to start trickling in. You can confirm that metrics server is doing its thing by running watch kubectl top pods -n deployments. This lists the CPU and memory usage of each pod in the namespace. You can use the top command to benchmark a pod’s resource utilization and to debug resource utilization issues.

Our pods are all using a small fraction of one cpu.

Every 2.0s: kubectl top pods -n deployments                                       
NAME                            CPU(cores)   MEMORY(bytes)
app-tier-748cdbdcc5-fqsw7       2m           47Mi
data-tier-599bc4fcf8-xzh92      2m           2Mi
support-tier-58d5d545b6-wdx49   3m           2Mi

The m stands for milli. 1000 milliCPUs equals one CPU.
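To make the units concrete, here is a small Python sketch that parses these quantities and computes the utilization percentage the autoscaler will work with. The 20m request is the one we add to the app tier below; the parsing logic is a simplified stand-in for Kubernetes quantity parsing:

```python
def millicores(cpu_str):
    """Parse a kubectl-style CPU quantity such as '2m' or '0.5' into millicores."""
    if cpu_str.endswith("m"):
        return int(cpu_str[:-1])
    return int(float(cpu_str) * 1000)

usage = millicores("2m")       # what `kubectl top pods` reports
request = millicores("20m")    # the CPU request we give the app tier below
print(usage / 1000)            # -> 0.002 (CPUs)
print(100 * usage / request)   # -> 10.0 (percent of request, as the HPA sees it)
```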

Now that we have metrics, the other thing the autoscaler depends on is a CPU request in the deployment’s pod spec.

Declare a CPU request in the deployment’s pod spec

Let’s see how that looks in the app-tier deployment. The change from the previous post is the new resources request in the container spec.

Each pod will now request 20 millicpu. Kubernetes will only schedule the pods on nodes with at least 0.02 CPUs remaining. I also set the replicas to 5 to keep 5 replicas running. The manifest is shown here:

apiVersion: v1
kind: Service
metadata:
  name: app-tier
  labels:
    app: microservices
spec:
  ports:
  - port: 8080
  selector:
    tier: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-tier
  labels:
    app: microservices
    tier: app
spec:
  replicas: 5
  selector:
    matchLabels:
      tier: app
  template:
    metadata:
      labels:
        app: microservices
        tier: app
    spec:
      containers:
      - name: server
        image: lrakai/microservices:server-v1
        ports:
          - containerPort: 8080
        resources:
          requests:
            cpu: 20m # 20 milliCPU / 0.02 CPU
        env:
          - name: REDIS_URL
            # Environment variable service discovery
            # Naming pattern:
            #   IP address: <all_caps_service_name>_SERVICE_HOST
            #   Port: <all_caps_service_name>_SERVICE_PORT
            #   Named Port: <all_caps_service_name>_SERVICE_PORT_<all_caps_port_name>
            # In multi-container example value was
            # value: redis://localhost:6379
            value: redis://$(DATA_TIER_SERVICE_HOST):$(DATA_TIER_SERVICE_PORT_REDIS)

Now if we try to create the resources kubectl will tell us they already exist.

ubuntu@ip-10-0-128-5:~/src# kubectl create -n deployments -f 6.1-app_tier_cpu_request.yaml
Error from server (AlreadyExists): error when creating "6.1-app_tier_cpu_request.yaml": services "app-tier" already exists
Error from server (AlreadyExists): error when creating "6.1-app_tier_cpu_request.yaml": deployments.apps "app-tier" already exists

Recall that we already have these three tiers created:

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           9m27s
data-tier      1/1     1            1           9m27s
support-tier   1/1     1            1           9m27s

First encounter with kubectl apply

Create checks whether a resource of a given type and name already exists, and fails if it does. We could delete the deployment and then create it again, but it would be nice to avoid the downtime that involves. Instead, Kubernetes provides a command that can apply changes to existing resources.

ubuntu@ip-10-0-128-5:~/src# kubectl apply -f 6.1-app_tier_cpu_request.yaml -n deployments
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/app-tier configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
deployment.apps/app-tier configured

Apply will update our deployment to include the CPU request. It will warn us about mixing create and apply, but we can ignore that for now.

We set the request low enough that the five replicas can all remain scheduled in the cluster, as we can see from the get deployments output:

ubuntu@ip-10-0-128-5:~/src$ kubectl get -n deployments deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       5/5     5            5           17m
data-tier      1/1     1            1           17m
support-tier   1/1     1            1           17m

There will be 5 app-tier pods running. Kubernetes ensured that the 5 actual ready pods match the 5 replicas we desired.

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments pods
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-74c7df4f88-bzh8n       1/1     Running   0          2m22s
app-tier-74c7df4f88-gl76r       1/1     Running   0          2m22s
app-tier-74c7df4f88-gr2xz       1/1     Running   0          2m17s
app-tier-74c7df4f88-j9swj       1/1     Running   0          2m22s
app-tier-74c7df4f88-s8rb4       1/1     Running   0          2m18s
data-tier-599bc4fcf8-xzh92      1/1     Running   0          17m
support-tier-58d5d545b6-wdx49   2/2     Running   0          17m

You can check the watch output to see the resource consumption:

Every 2.0s: kubectl top pods -n deployments

NAME                            CPU(cores)   MEMORY(bytes)
app-tier-74c7df4f88-bzh8n       1m           49Mi
app-tier-74c7df4f88-gl76r       1m           48Mi
app-tier-74c7df4f88-gr2xz       1m           48Mi
app-tier-74c7df4f88-j9swj       3m           48Mi
app-tier-74c7df4f88-s8rb4       1m           48Mi
data-tier-599bc4fcf8-xzh92      2m           2Mi
support-tier-58d5d545b6-wdx49   10m          2Mi

This completes the prereqs for using autoscaling.

Everything about the HorizontalPodAutoscaler

The autoscaler, whose full name is HorizontalPodAutoscaler because it scales horizontally (out rather than up), is just another resource in Kubernetes, so we can use a manifest to declare it. The HorizontalPodAutoscaler kind is part of the autoscaling/v1 API. Its spec includes a min and max to set lower and upper bounds on the number of running replicas.

The targetCPUUtilizationPercentage field sets the target average CPU percentage across the replicas. With the target set to 70 percent, Kubernetes will decrease the number of replicas if the average CPU utilization is 63% or below, and increase the replicas if it is 77% or higher, using the default tolerance of 10% of the target. The tolerance ensures that Kubernetes isn’t constantly scaling up and down around the target. Lastly, the spec also includes a scaleTargetRef that identifies what is being scaled. We are targeting the app-tier deployment.
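The 63% and 77% thresholds fall directly out of that tolerance math. A quick sketch (the 10% tolerance used here is the default; it is configurable on the controller manager):

```python
def scaling_band(target_pct, tolerance=0.1):
    """Return the (lower, upper) utilization bounds inside which the
    autoscaler leaves the replica count alone."""
    return target_pct * (1 - tolerance), target_pct * (1 + tolerance)

lower, upper = scaling_band(70)
print(lower, upper)  # -> 63.0 77.0
```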

I’ve added the equivalent kubectl autoscale command that achieves the same result, but we’ll stick with manifests for everything.

kubectl autoscale deployment app-tier --max=5 --min=1 --cpu-percent=70

Create the autoscaler

Let’s now create the autoscaler using this manifest:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-tier
  labels:
    app: microservices
    tier: app
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-tier
  targetCPUUtilizationPercentage: 70

# Equivalent to
# kubectl autoscale deployment app-tier --max=5 --min=1 --cpu-percent=70

Run kubectl create -f 6.2-autoscale.yaml -n deployments

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 6.2-autoscale.yaml -n deployments
horizontalpodautoscaler.autoscaling/app-tier created

Now we can watch the deployment until the autoscaler kicks in: watch -n 1 kubectl get -n deployments deployments app-tier. We should expect to see the 5 app-tier replicas come down to 1.

Every 1.0s: kubectl get -n deployments deployments app-tier  
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   5/5     5            5           19m

It took a few minutes, but the autoscaler eventually brought the replica count down to 1.

Every 1.0s: kubectl get -n deployments deployments app-tier     
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   1/1     1            1           34m

Sure enough, the counts updated. K8s does not disappoint. The outputs above show the number of pods before and after the HorizontalPodAutoscaler kicked in.

A word about api-resources

We can also describe the horizontal pod autoscaler to see what events took place. Now, it would be painful to type out horizontalpodautoscaler many times. Fortunately, kubectl accepts shorthand notations for resource types. Shown here is the whole gamut of resources that you have at your fingertips.

ubuntu@ip-10-0-128-5:~/src$ kubectl api-resources
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
bindings                                                                      true         Binding
componentstatuses                 cs                                          false        ComponentStatus
configmaps                        cm                                          true         ConfigMap
endpoints                         ep                                          true         Endpoints
events                            ev                                          true         Event
limitranges                       limits                                      true         LimitRange
namespaces                        ns                                          false        Namespace
nodes                             no                                          false        Node
persistentvolumeclaims            pvc                                         true         PersistentVolumeClaim
persistentvolumes                 pv                                          false        PersistentVolume
pods                              po                                          true         Pod
podtemplates                                                                  true         PodTemplate
replicationcontrollers            rc                                          true         ReplicationController
resourcequotas                    quota                                       true         ResourceQuota
secrets                                                                       true         Secret
serviceaccounts                   sa                                          true         ServiceAccount
services                          svc                                         true         Service
mutatingwebhookconfigurations                  admissionregistration.k8s.io   false        MutatingWebhookConfiguration
validatingwebhookconfigurations                admissionregistration.k8s.io   false        ValidatingWebhookConfiguration
customresourcedefinitions         crd,crds     apiextensions.k8s.io           false        CustomResourceDefinition
apiservices                                    apiregistration.k8s.io         false        APIService
controllerrevisions                            apps                           true         ControllerRevision
daemonsets                        ds           apps                           true         DaemonSet
deployments                       deploy       apps                           true         Deployment
replicasets                       rs           apps                           true         ReplicaSet
statefulsets                      sts          apps                           true         StatefulSet
tokenreviews                                   authentication.k8s.io          false        TokenReview
localsubjectaccessreviews                      authorization.k8s.io           true         LocalSubjectAccessReview
selfsubjectaccessreviews                       authorization.k8s.io           false        SelfSubjectAccessReview
selfsubjectrulesreviews                        authorization.k8s.io           false        SelfSubjectRulesReview
subjectaccessreviews                           authorization.k8s.io           false        SubjectAccessReview
horizontalpodautoscalers          hpa          autoscaling                    true         HorizontalPodAutoscaler
cronjobs                          cj           batch                          true         CronJob
jobs                                           batch                          true         Job
certificatesigningrequests        csr          certificates.k8s.io            false        CertificateSigningRequest
leases                                         coordination.k8s.io            true         Lease
events                            ev           events.k8s.io                  true         Event
daemonsets                        ds           extensions                     true         DaemonSet
deployments                       deploy       extensions                     true         Deployment
ingresses                         ing          extensions                     true         Ingress
networkpolicies                   netpol       extensions                     true         NetworkPolicy
podsecuritypolicies               psp          extensions                     false        PodSecurityPolicy
replicasets                       rs           extensions                     true         ReplicaSet
nodes                                          metrics.k8s.io                 false        NodeMetrics
pods                                           metrics.k8s.io                 true         PodMetrics
ingresses                         ing          networking.k8s.io              true         Ingress
networkpolicies                   netpol       networking.k8s.io              true         NetworkPolicy
runtimeclasses                                 node.k8s.io                    false        RuntimeClass
poddisruptionbudgets              pdb          policy                         true         PodDisruptionBudget
podsecuritypolicies               psp          policy                         false        PodSecurityPolicy
clusterrolebindings                            rbac.authorization.k8s.io      false        ClusterRoleBinding
clusterroles                                   rbac.authorization.k8s.io      false        ClusterRole
rolebindings                                   rbac.authorization.k8s.io      true         RoleBinding
roles                                          rbac.authorization.k8s.io      true         Role
priorityclasses                   pc           scheduling.k8s.io              false        PriorityClass
csidrivers                                     storage.k8s.io                 false        CSIDriver
csinodes                                       storage.k8s.io                 false        CSINode
storageclasses                    sc           storage.k8s.io                 false        StorageClass
volumeattachments                              storage.k8s.io                 false        VolumeAttachment

Describe the autoscaler to check the actions it took

The lone autoscaling resource is horizontalpodautoscalers and we can use hpa for short.

ubuntu@ip-10-0-128-5:~/src# kubectl describe -n deployments hpa
Name:                                                  app-tier
Namespace:                                             deployments
Labels:                                                app=microservices
Annotations:                                           <none>
CreationTimestamp:                                     Sun, 03 May 2020 15:16:50 +0000
Reference:                                             Deployment/app-tier
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  5% (1m) / 70%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  34s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

We can see the successful rescale events and that the current metrics are all below the target.

View the current stats of the autoscaler

We can also get the hpa for a quick summary of the current state:

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
app-tier   Deployment/app-tier   10%/70%   1         5         1          8m44s

The first number in the targets column expresses the current average CPU utilization as a percentage of the CPU request. We can see that we are well below the target, but we are already at the minimum number of replicas, so the autoscaler won’t scale down any further.

First encounter with kubectl edit

Let’s say we wanted to raise the minimum to two replicas. We could modify the manifest, save it, and use the apply command, or we can use the kubectl edit command, which combines those three actions into one.

Run kubectl edit -n deployments hpa. This fetches the live resource and opens its manifest in your configured editor (vi by default). Change minReplicas from 1 to 2, save and quit, then run the watch command to see the deployment scale up to 2 replicas as the autoscaler enforces the new minimum. It should happen within 15 seconds, which is the default period for the hpa to check whether it should scale.
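For reference, after the edit the relevant portion of the HPA spec looks like this (only minReplicas changes from the manifest we created earlier):

```yaml
spec:
  maxReplicas: 5
  minReplicas: 2   # changed from 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-tier
  targetCPUUtilizationPercentage: 70
```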

ubuntu@ip-10-0-128-5:~/src# kubectl edit -n deployments hpa
horizontalpodautoscaler.autoscaling/app-tier edited
ubuntu@ip-10-0-128-5:~/src# watch -n 1 kubectl get -n deployments deployments app-tier
Every 1.0s: kubectl get -n deployments deployments app-tier
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   2/2     2            2           31m

Now get the hpa summary again to confirm the same:

ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
app-tier   Deployment/app-tier   12%/70%   2         5         2          15m

Using kubectl edit may be tempting, but ideally you want to keep track of the changes you make by editing the manifest in your repository and then running kubectl apply.


To recap our adventure with autoscaling:

  1. Kubernetes depends on metrics being collected in the cluster before you can use autoscaling. We accomplished that by adding metrics server to the cluster.
  2. You must also declare a CPU request in your deployment’s pod template so that autoscaling can compute each pod’s CPU utilization percentage.
  3. With those prerequisites taken care of, you can use the HorizontalPodAutoscaler, or hpa for short. You configure it with a target CPU percentage and minimum and maximum replica counts. Once it is created, Kubernetes does all the heavy lifting to dynamically scale your deployment based on the current load.
  4. Along the way we also picked up kubectl apply to update resources rather than deleting and recreating them.
  5. We also learned how to use the kubectl edit command, which updates the manifest and applies the change in one step.