We’ve seen deployments work their magic in this post, and we also saw how to scale the deployment replicas by running the
kubectl scale command. But it would be nice not to have to scale the deployment manually. That’s where autoscaling comes in.
Kubernetes supports CPU-based autoscaling as well as autoscaling based on a custom metric you define. We’ll focus on CPU here.
Autoscaling works by specifying:
- a desired target CPU percentage, and
- a minimum and maximum number of allowed replicas.
The CPU percentage is expressed as a percentage of the pod’s CPU resource request.
Recall that pods can set resource requests for CPU to ensure that they are scheduled on a node with at least that much CPU available.
If no CPU resource request is set, the autoscaler won’t take any action.
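As a quick illustration (this fragment is only a sketch with placeholder names, not part of our app), the request is declared under resources in the container spec, and the autoscaler measures usage against it; with a 100m request, a container actually using 70m is reported as 70% utilization:

spec:
  containers:
  - name: example          # placeholder container name
    image: example:latest  # placeholder image
    resources:
      requests:
        cpu: 100m          # 0.1 CPU; actual usage of 70m counts as 70% utilization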
Kubernetes will increase or decrease the number of replicas according to the average CPU usage across all of the replicas: the autoscaler adds replicas when the actual CPU usage of the current pods exceeds the target, and removes replicas when usage falls below it.
The autoscaler will never create more replicas than the maximum you set, nor will it decrease the number of replicas below your configured minimum. You can tune some of the autoscaler’s parameters, but the defaults will work fine for us. With the defaults, the autoscaler compares the actual CPU usage to the target and increases the replicas if the actual CPU is sufficiently above the target, or decreases them if it is sufficiently below. Otherwise it keeps the status quo.
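Roughly speaking, the replica count the autoscaler aims for is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). For example, with 2 replicas averaging 140% of their CPU request against a 70% target, it would scale to ceil(2 × 140 / 70) = 4 replicas; if the average then dropped to 20%, it would scale back down to ceil(4 × 20 / 70) = 2.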
Autoscaling depends on metrics being collected in the cluster so that the average pod CPU can be computed. Kubernetes integrates with several solutions for collecting metrics. We will use metrics server, a solution maintained by the Kubernetes project. The kubernetes metrics-server GitHub repository includes several manifest files that declare all of the required resources.
We will need to get metrics server up and running before we can use autoscaling. Once metrics server is running and collecting metrics, autoscalers can retrieve them using the Kubernetes metrics API.
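If you’re curious, you can also hit the metrics API directly once metrics are flowing; a sketch, assuming a namespace named deployments like the one we create below:

# Query the same metrics API that kubectl top and the autoscaler consume
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/deployments/pods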
Autoscaling demonstration using a 3-tier application
To recall, shown below is the lab architecture.
Create namespace, deployments and verify application is running
The first thing we need to do is create a namespace called
deployments and then create the 3 tiers of the application. The following commands will get you there:
kubectl create -f 5.1-namespace.yaml: Create a namespace
kubectl create -n deployments -f 5.2-data_tier.yaml -f 5.3-app_tier.yaml -f 5.4-support_tier.yaml: Create the 3 tier application
kubectl get pods -n deployments: View the pods
kubectl get deployments. -n deployments: View the deployments
kubectl logs -n deployments support-tier-58d5d545b6-58h4q counter --tail 10: Verify app is running
kubectl logs -n deployments support-tier-58d5d545b6-58h4q poller --tail 10: Verify app is running
ubuntu@ip-10-0-128-5:~/src# kubectl create -f 5.1-namespace.yaml
namespace/deployments created
ubuntu@ip-10-0-128-5:~/src# kubectl create -n deployments -f 5.2-data_tier.yaml -f 5.3-app_tier.yaml -f 5.4-support_tier.yaml
service/data-tier created
deployment.apps/data-tier created
service/app-tier created
deployment.apps/app-tier created
deployment.apps/support-tier created
ubuntu@ip-10-0-128-5:~/src# kubectl get pods -n deployments
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-748cdbdcc5-jj4l6       1/1     Running   0          17s
data-tier-599bc4fcf8-mf7bt      1/1     Running   0          17s
support-tier-58d5d545b6-58h4q   2/2     Running   0          17s
ubuntu@ip-10-0-128-5:~/src# kubectl get deployments. -n deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           25s
data-tier      1/1     1            1           25s
support-tier   1/1     1            1           25s
ubuntu@ip-10-0-128-5:~/src# kubectl logs -n deployments support-tier-58d5d545b6-58h4q counter --tail 10
Incrementing counter by 5 ...
Incrementing counter by 5 ...
Incrementing counter by 6 ...
Incrementing counter by 6 ...
Incrementing counter by 8 ...
Incrementing counter by 10 ...
Incrementing counter by 4 ...
Incrementing counter by 6 ...
Incrementing counter by 2 ...
Incrementing counter by 7 ...
ubuntu@ip-10-0-128-5:~/src# kubectl logs -n deployments support-tier-58d5d545b6-58h4q poller --tail 10
Current counter: 2242
Current counter: 2255
Current counter: 2263
Current counter: 2272
Current counter: 2291
Current counter: 2304
Current counter: 2313
Current counter: 2327
Current counter: 2339
Current counter: 2354
ubuntu@ip-10-0-128-5:~/src#
Create the metrics server
The bastion host also includes the metrics server manifests in the metrics-server subdirectory. It is outside the scope of this post to discuss all of the resources that comprise metrics server. All we need to do is create them, and we can count on metrics being collected in the cluster. To do that we can use our trusty
kubectl create command and specify the directory as the file target.
ubuntu@ip-10-0-128-5:~/src# kubectl create -f metrics-server/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.extensions/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
ubuntu@ip-10-0-128-5:~/src#
kubectl then creates all of the manifests it finds in the directory. You can see quite a few resources are created. One of them is a deployment: metrics server runs as a pod in the cluster, and that pod is managed by a deployment.
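If you want to confirm that deployment is up before moving on, a quick check (assuming these manifests place metrics server in the kube-system namespace, as the upstream ones do; adjust -n if your copy differs):

# Confirm the metrics-server deployment is ready
kubectl get -n kube-system deployments metrics-server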
It takes a minute for the first metrics to start trickling in. You can confirm that metrics server is doing its thing by running the
watch kubectl top pods -n deployments command. This lists the CPU and memory usage of each pod in the namespace. You can use the top command to benchmark a pod’s resource utilization and to debug resource utilization issues.
Our pods are all using a small fraction of one CPU.
Every 2.0s: kubectl top pods -n deployments

NAME                            CPU(cores)   MEMORY(bytes)
app-tier-748cdbdcc5-fqsw7       2m           47Mi
data-tier-599bc4fcf8-xzh92      2m           2Mi
support-tier-58d5d545b6-wdx49   3m           2Mi
The m stands for milli. 1000 milliCPUs equals one CPU.
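So the 2m and 3m readings above work out to roughly 0.002–0.003 of a CPU, or about 0.2–0.3% of a single core.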
Now that we have metrics, the other thing the autoscaler depends on is a CPU request in the deployment’s pod spec.
Declare a CPU request in the deployment’s pod spec
Let’s see how that looks in the
app-tier deployment. I’ve highlighted the change from the previous post.
Each pod will now request
20 milliCPU, so Kubernetes will only schedule the pods on nodes with at least 0.02 CPU available. I also set replicas to 5 to keep five replicas running. The manifest is shown here:
apiVersion: v1
kind: Service
metadata:
  name: app-tier
  labels:
    app: microservices
spec:
  ports:
  - port: 8080
  selector:
    tier: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-tier
  labels:
    app: microservices
    tier: app
spec:
  replicas: 5
  selector:
    matchLabels:
      tier: app
  template:
    metadata:
      labels:
        app: microservices
        tier: app
    spec:
      containers:
      - name: server
        image: lrakai/microservices:server-v1
        ports:
          - containerPort: 8080
        resources:
          requests:
            cpu: 20m # 20 milliCPU / 0.02 CPU
        env:
          - name: REDIS_URL
            # Environment variable service discovery
            # Naming pattern:
            #   IP address: <all_caps_service_name>_SERVICE_HOST
            #   Port: <all_caps_service_name>_SERVICE_PORT
            #   Named Port: <all_caps_service_name>_SERVICE_PORT_<all_caps_port_name>
            value: redis://$(DATA_TIER_SERVICE_HOST):$(DATA_TIER_SERVICE_PORT_REDIS)
            # In multi-container example value was
            # value: redis://localhost:6379
Now if we try to create the resources kubectl will tell us they already exist.
ubuntu@ip-10-0-128-5:~/src# kubectl create -n deployments -f 6.1-app_tier_cpu_request.yaml
Error from server (AlreadyExists): error when creating "6.1-app_tier_cpu_request.yaml": services "app-tier" already exists
Error from server (AlreadyExists): error when creating "6.1-app_tier_cpu_request.yaml": deployments.apps "app-tier" already exists
ubuntu@ip-10-0-128-5:~/src#
If you recall, we already have these 3 tiers created:
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       1/1     1            1           9m27s
data-tier      1/1     1            1           9m27s
support-tier   1/1     1            1           9m27s
ubuntu@ip-10-0-128-5:~/src#
First encounter with kubectl apply
Create will check if a resource of the given type and name already exists, and it fails if it does. We could delete the deployment and then create it again, but it would be nice to avoid the downtime involved in that. Instead, Kubernetes provides a command that can apply changes to existing resources.
ubuntu@ip-10-0-128-5:~/src# kubectl apply -f 6.1-app_tier_cpu_request.yaml -n deployments
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/app-tier configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
deployment.apps/app-tier configured
ubuntu@ip-10-0-128-5:~/src#
Apply will update our deployment to include the CPU request. It will warn us about mixing create and apply, but we can ignore that for now.
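If you’d rather avoid the warning in the future, one option (sketched here, not required for this lab) is to record the configuration at creation time, or simply to use apply from the start:

# Create with --save-config so later applies have a recorded baseline...
kubectl create --save-config -n deployments -f 6.1-app_tier_cpu_request.yaml
# ...or use apply for both the initial create and later updates
kubectl apply -n deployments -f 6.1-app_tier_cpu_request.yaml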
We set the request low enough that the five replicas can remain scheduled in the cluster, as we can see from the get deployments output:
ubuntu@ip-10-0-128-5:~/src$ kubectl get -n deployments deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       5/5     5            5           17m
data-tier      1/1     1            1           17m
support-tier   1/1     1            1           17m
ubuntu@ip-10-0-128-5:~/src$
There are now 5 app-tier pods running; Kubernetes ensured that the actual number of ready pods matches the 5 replicas we desired.
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments pods
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-74c7df4f88-bzh8n       1/1     Running   0          2m22s
app-tier-74c7df4f88-gl76r       1/1     Running   0          2m22s
app-tier-74c7df4f88-gr2xz       1/1     Running   0          2m17s
app-tier-74c7df4f88-j9swj       1/1     Running   0          2m22s
app-tier-74c7df4f88-s8rb4       1/1     Running   0          2m18s
data-tier-599bc4fcf8-xzh92      1/1     Running   0          17m
support-tier-58d5d545b6-wdx49   2/2     Running   0          17m
ubuntu@ip-10-0-128-5:~/src#
You can check the watch output to see each pod’s resource consumption:
Every 2.0s: kubectl top pods -n deployments

NAME                            CPU(cores)   MEMORY(bytes)
app-tier-74c7df4f88-bzh8n       1m           49Mi
app-tier-74c7df4f88-gl76r       1m           48Mi
app-tier-74c7df4f88-gr2xz       1m           48Mi
app-tier-74c7df4f88-j9swj       3m           48Mi
app-tier-74c7df4f88-s8rb4       1m           48Mi
data-tier-599bc4fcf8-xzh92      2m           2Mi
support-tier-58d5d545b6-wdx49   10m          2Mi
This completes the prereqs for using autoscaling.
Everything about the HorizontalPodAutoscaler
The autoscaler, which has the full name of
HorizontalPodAutoscaler because it scales horizontally (out rather than up), is just another resource in Kubernetes, so we can use a manifest to declare it. The
HorizontalPodAutoscaler kind is part of the
autoscaling/v1 API. Its spec includes
minReplicas and maxReplicas to set lower and upper bounds on running replicas. The
targetCPUUtilizationPercentage field sets the target average CPU percentage across the replicas. With the target set to 70 percent, Kubernetes will decrease the number of replicas if the average CPU utilization is 63% or below, and increase the replicas if it is 77% or higher, using the default tolerance of 10% of the target. The tolerance ensures that Kubernetes isn’t constantly scaling up and down around the target. Lastly, the spec also includes a
scaleTargetRef that identifies what it is scaling. We are targeting the app-tier deployment.
I’ve added the equivalent kubectl autoscale command that achieves the same result, but we’ll stick with manifests for everything.
kubectl autoscale deployment app-tier --max=5 --min=1 --cpu-percent=70
Create the autoscaler
Let’s now create the autoscaler using this manifest:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-tier
  labels:
    app: microservices
    tier: app
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-tier
  targetCPUUtilizationPercentage: 70

# Equivalent to
# kubectl autoscale deployment app-tier --max=5 --min=1 --cpu-percent=70
kubectl create -f 6.2-autoscale.yaml -n deployments
ubuntu@ip-10-0-128-5:~/src# kubectl create -f 6.2-autoscale.yaml -n deployments
horizontalpodautoscaler.autoscaling/app-tier created
ubuntu@ip-10-0-128-5:~/src#
Now we can watch the deployment until the autoscaler kicks in.
watch -n 1 kubectl get -n deployments deployments app-tier. We should expect to see the 5 app-tier replicas come down to 1.
Every 1.0s: kubectl get -n deployments deployments app-tier

NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   5/5     5            5           19m
It took a few minutes, but the autoscaler eventually brought it down to 1.
Every 1.0s: kubectl get -n deployments deployments app-tier

NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   1/1     1            1           34m
Sure enough, the counts updated. K8s does not disappoint. Shown below is the number of pods before and after the HorizontalPodAutoscaler kicked in.
A word about resource name shorthand
We can also describe the horizontal pod autoscaler to see what events took place. Now, it would be painful to type out horizontalpodautoscaler many times. Fortunately, kubectl accepts shorthand notations for resource types. Shown here is the whole gamut of resources you have at your fingertips:
ubuntu@ip-10-0-128-5:~/src$ kubectl api-resources
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
bindings                                                                      true         Binding
componentstatuses                 cs                                          false        ComponentStatus
configmaps                        cm                                          true         ConfigMap
endpoints                         ep                                          true         Endpoints
events                            ev                                          true         Event
limitranges                       limits                                      true         LimitRange
namespaces                        ns                                          false        Namespace
nodes                             no                                          false        Node
persistentvolumeclaims            pvc                                         true         PersistentVolumeClaim
persistentvolumes                 pv                                          false        PersistentVolume
pods                              po                                          true         Pod
podtemplates                                                                  true         PodTemplate
replicationcontrollers            rc                                          true         ReplicationController
resourcequotas                    quota                                       true         ResourceQuota
secrets                                                                       true         Secret
serviceaccounts                   sa                                          true         ServiceAccount
services                          svc                                         true         Service
mutatingwebhookconfigurations                  admissionregistration.k8s.io   false        MutatingWebhookConfiguration
validatingwebhookconfigurations                admissionregistration.k8s.io   false        ValidatingWebhookConfiguration
customresourcedefinitions         crd,crds     apiextensions.k8s.io           false        CustomResourceDefinition
apiservices                                    apiregistration.k8s.io         false        APIService
controllerrevisions                            apps                           true         ControllerRevision
daemonsets                        ds           apps                           true         DaemonSet
deployments                       deploy       apps                           true         Deployment
replicasets                       rs           apps                           true         ReplicaSet
statefulsets                      sts          apps                           true         StatefulSet
tokenreviews                                   authentication.k8s.io          false        TokenReview
localsubjectaccessreviews                      authorization.k8s.io           true         LocalSubjectAccessReview
selfsubjectaccessreviews                       authorization.k8s.io           false        SelfSubjectAccessReview
selfsubjectrulesreviews                        authorization.k8s.io           false        SelfSubjectRulesReview
subjectaccessreviews                           authorization.k8s.io           false        SubjectAccessReview
horizontalpodautoscalers          hpa          autoscaling                    true         HorizontalPodAutoscaler
cronjobs                          cj           batch                          true         CronJob
jobs                                           batch                          true         Job
certificatesigningrequests        csr          certificates.k8s.io            false        CertificateSigningRequest
leases                                         coordination.k8s.io            true         Lease
events                            ev           events.k8s.io                  true         Event
daemonsets                        ds           extensions                     true         DaemonSet
deployments                       deploy       extensions                     true         Deployment
ingresses                         ing          extensions                     true         Ingress
networkpolicies                   netpol       extensions                     true         NetworkPolicy
podsecuritypolicies               psp          extensions                     false        PodSecurityPolicy
replicasets                       rs           extensions                     true         ReplicaSet
nodes                                          metrics.k8s.io                 false        NodeMetrics
pods                                           metrics.k8s.io                 true         PodMetrics
ingresses                         ing          networking.k8s.io              true         Ingress
networkpolicies                   netpol       networking.k8s.io              true         NetworkPolicy
runtimeclasses                                 node.k8s.io                    false        RuntimeClass
poddisruptionbudgets              pdb          policy                         true         PodDisruptionBudget
podsecuritypolicies               psp          policy                         false        PodSecurityPolicy
clusterrolebindings                            rbac.authorization.k8s.io      false        ClusterRoleBinding
clusterroles                                   rbac.authorization.k8s.io      false        ClusterRole
rolebindings                                   rbac.authorization.k8s.io      true         RoleBinding
roles                                          rbac.authorization.k8s.io      true         Role
priorityclasses                   pc           scheduling.k8s.io              false        PriorityClass
csidrivers                                     storage.k8s.io                 false        CSIDriver
csinodes                                       storage.k8s.io                 false        CSINode
storageclasses                    sc           storage.k8s.io                 false        StorageClass
volumeattachments                              storage.k8s.io                 false        VolumeAttachment
ubuntu@ip-10-0-128-5:~/src$
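Rather than scanning that whole list, you can also filter it to a single API group; for example, a quick way to find the autoscaling resources:

# List only the resources in the autoscaling API group
kubectl api-resources --api-group=autoscaling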
Describe the autoscaler to check the actions it took
The lone autoscaling resource is
horizontalpodautoscalers and we can use
hpa for short.
ubuntu@ip-10-0-128-5:~/src# kubectl describe -n deployments hpa
Name:                                                  app-tier
Namespace:                                             deployments
Labels:                                                app=microservices
                                                       tier=app
Annotations:                                           <none>
CreationTimestamp:                                     Sun, 03 May 2020 15:16:50 +0000
Reference:                                             Deployment/app-tier
Metrics:                                               ( current / target )
  resource cpu on pods (as a percentage of request):   5% (1m) / 70%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  34s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
ubuntu@ip-10-0-128-5:~/src#
We can see the successful rescale events and that the current metrics are all below the target.
View the current stats of the autoscaler
We can also get the hpa for a quick summary of the current state:
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
app-tier   Deployment/app-tier   10%/70%   1         5         1          8m44s
ubuntu@ip-10-0-128-5:~/src#
The first number in the target expresses the current average CPU utilization as a percentage of the CPU request. We can see that we are well below the target, but we are already at the minimum replicas so it won’t scale down any further.
First encounter with kubectl edit
Let’s say we wanted to raise the minimum to two replicas. We could modify the manifest, save it, and use the apply command, or we can use the kubectl edit command, which combines those three actions into one.
Running kubectl edit -n deployments hpa will open the HorizontalPodAutoscaler's live manifest in an editor (
vi by default). Change minReplicas from 1 to 2, save and quit, then run the watch command to see the deployment scale up to 2 replicas as the autoscaler honors the new minimum. It should happen within 15 seconds, which is the default period for the
hpa to check whether it should scale.
ubuntu@ip-10-0-128-5:~/src# kubectl edit -n deployments hpa
horizontalpodautoscaler.autoscaling/app-tier edited
ubuntu@ip-10-0-128-5:~/src# watch -n 1 kubectl get -n deployments deployments app-tier

Every 1.0s: kubectl get -n deployments deployments app-tier

NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   2/2     2            2           31m
Now get the hpa summary again to confirm the same:
ubuntu@ip-10-0-128-5:~/src# kubectl get -n deployments hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
app-tier   Deployment/app-tier   12%/70%   2         5         2          15m
ubuntu@ip-10-0-128-5:~/src#
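If you only need to flip a single field, kubectl patch is another option that skips the editor entirely. A sketch of the same minReplicas change:

# Patch minReplicas on the app-tier hpa without opening an editor
kubectl patch -n deployments hpa app-tier --patch '{"spec": {"minReplicas": 2}}'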
kubectl edit may be tempting, but ideally you want to keep track of the changes you are making by editing the manifest in your repo and then running an apply.
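In practice that workflow is just: change the field in the copy you keep in version control, then apply it. A sketch, assuming the minReplicas change is made in 6.2-autoscale.yaml first:

# After editing minReplicas: 1 to minReplicas: 2 in the repo copy
kubectl apply -n deployments -f 6.2-autoscale.yaml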
To recap our adventure with autoscaling:
- Kubernetes depends on metrics being collected in the cluster before you can use autoscaling. We accomplished that by adding metrics server to the cluster.
- You must also declare a CPU request in your deployment’s pod template so that autoscaling can compute each pod’s CPU utilization percentage.
- With those prerequisites taken care of, you can use the horizontal pod autoscaler, or hpa. You configure it with a target CPU percentage and minimum and maximum replicas. Once it is created, Kubernetes does all the heavy lifting to dynamically scale your deployment based on the current load.
- While we were doing this we also picked up the
kubectl apply command to update resources rather than deleting and recreating them.
- We also learned how to use the
kubectl edit command, which updates the manifest and applies the change in a single step.