Why do we need Kubernetes probes?
Is your pod ready as soon as the container starts? This is the key question we will explore in this post through the use of Kubernetes probes.
In an earlier post I covered deployment rollouts. Kubernetes assumed that a pod was ready as soon as the container started. That isn’t always true.
- For example, if the container needs time to warm up then Kubernetes should wait before sending any traffic to the new pod.
- It’s also possible that a pod is fully operational but becomes non-responsive after some time, for example if it enters a deadlock state. Kubernetes shouldn’t send any more requests to the pod and would be better off restarting it.
Kubernetes provides probes to remedy both of these situations.
Probes are sometimes referred to as health checks.
The first type of probe is a readiness probe. Readiness probes are used to detect when a pod is ready to serve traffic. As I mentioned before, a pod is often not ready right after its containers have started. The containers may need time to warm caches or load configurations.
Readiness probes can monitor the containers until they are ready to serve traffic.
But readiness probes are also useful long after startup. For example, if the pod depends on an external service and that service goes down, it’s not worth sending traffic to the pod since it can’t complete requests until the external service is back online.
Readiness probes control the ready condition of a pod.
If a readiness probe succeeds then the ready condition is *true*, otherwise it is *false*. Services use the ready condition to determine if pods should be sent traffic. In this way probes integrate with Services to ensure that traffic doesn’t flow to pods that aren’t ready for it.
This is a familiar concept if you have used a cloud load balancer. Backend instances that fail health checks are not served traffic, just as Services won’t send traffic to pods that aren’t ready. Services are our load balancers in Kubernetes.
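To make the ready condition concrete, here is an abridged, illustrative excerpt of the kind of status that `kubectl get pod <pod-name> -o yaml` reports for a pod whose readiness probe is passing. The values are examples rather than output captured from this demo, and a real pod lists more conditions and fields.

```yaml
# Abridged pod status (illustrative values, not captured output)
status:
  conditions:
    - type: ContainersReady   # every container in the pod is ready
      status: "True"
    - type: Ready             # the condition Services rely on
      status: "True"
```

While a readiness probe is failing, the Ready condition would report "False" and the pod would be left out of the Service's endpoints.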
The second type of probe is called a liveness probe. Liveness probes are used to detect when a pod has entered a broken state and can no longer serve traffic. In this case, Kubernetes will restart the pod for you (more precisely, the kubelet restarts the failing container within the pod). That is the key difference between the two types of probes.
- Readiness probes determine when a Service can send traffic to a pod, and take the pod out of service while it is temporarily not ready.
- Liveness probes decide when a pod should be restarted because it won’t come back to life on its own.
You declare both probes in the same way. You just have to decide which course of action is appropriate if a probe fails: stop serving traffic or restart.
Probes are declared on containers in a pod. All of a pod’s container probes must pass for the pod to pass. You can define any of the following as the action a probe performs to check the container:
- A command that runs inside the container
- An HTTP GET request
- Opening a TCP socket
A command probe succeeds if the exit code of the command is 0, otherwise it fails.
An HTTP GET request probe succeeds if the response status code is between 200 and 399 inclusive.
A TCP socket probe succeeds if a connection can be established.
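As a rough sketch of how the three actions look in a manifest, the hypothetical Pod below uses each one once. The pod name, container names, images, port, path, and file are all placeholders invented for illustration and are not part of the sample application.

```yaml
# Hypothetical Pod: names, images, ports and paths are illustrative only
apiVersion: v1
kind: Pod
metadata:
  name: probe-actions-demo
spec:
  containers:
    - name: web
      image: example.com/web:1.0      # placeholder image
      ports:
        - containerPort: 8080
          name: http
      readinessProbe:
        httpGet:                      # passes on a 200-399 response status
          path: /healthz
          port: http
      livenessProbe:
        tcpSocket:                    # passes if the TCP connection opens
          port: http
    - name: worker
      image: example.com/worker:1.0   # placeholder image
      livenessProbe:
        exec:                         # passes if the command exits with code 0
          command:
            - cat
            - /tmp/healthy
```

Each probe takes exactly one action, so a container that needs both an HTTP check and a command check would put them in different probes.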
By default the probes check the pods every 10 seconds.
Demo: Adding liveness and readiness probes to data and app-tier containers
Our objective in this demo is to test our containers using probes. Specifically, we will add readiness and liveness probes to our application. We will use the application manifests from the deployments post as the base of our work in this lesson.
Before we start creating probes let’s first crystallize the concepts by relating these probes to our application.
The data tier contains one Redis container. This container is alive if it accepts TCP connections. The Redis container is ready if it responds to Redis commands such as get or ping. There is a small but important difference between the two. A server may be alive but not necessarily ready to handle incoming requests.
In the app-tier, the API server is alive if it accepts HTTP requests but the API server is only ready if it is online and has a connection to Redis to request and increment the counter.
The sample application has a path for each of these probes. The counter and poller containers are live and ready if they can make HTTP requests back to the API server. Let’s apply this knowledge to the deployment templates. We will go in the same order we just discussed but skip the support tier because the server demonstrates the same functionality.
We’ll start by creating a probes namespace to isolate the resources in this post.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: probes
      labels:
        app: counter
    ubuntu@ip-10-0-128-5:~/src# kubectl create -f 7.1-namespace.yaml
    namespace/probes created
Create the data-tier
Now take a look at the data tier manifest. The only changes to the data tier deployment are the addition of a name for the port and the probes.
The liveness probe uses the TCP socket type of probe in this example. By using a named port we can simply write the name rather than the port number. That protects us in the future if the port number ever changes and someone forgets to update the probe port number. We also set initialDelaySeconds to give the Redis server adequate time to start.
    apiVersion: v1
    kind: Service
    metadata:
      name: data-tier
      labels:
        app: microservices
    spec:
      ports:
      - port: 6379
        protocol: TCP # default
        name: redis # optional when only 1 port
      selector:
        tier: data
      type: ClusterIP # default
    ---
    apiVersion: apps/v1 # apps API group
    kind: Deployment
    metadata:
      name: data-tier
      labels:
        app: microservices
        tier: data
    spec:
      replicas: 1
      selector:
        matchLabels:
          tier: data
      template:
        metadata:
          labels:
            app: microservices
            tier: data
        spec: # Pod spec
          containers:
          - name: redis
            image: redis:latest
            imagePullPolicy: IfNotPresent
            ports:
              - containerPort: 6379
                name: redis
            livenessProbe:
              tcpSocket:
                port: redis # named port
              initialDelaySeconds: 15
            readinessProbe:
              exec:
                command:
                - redis-cli
                - ping
              initialDelaySeconds: 5
Notice how we specify the liveness and readiness probes in the container spec.
We can also configure failure thresholds, delays and timeouts for all probes. The default values work well for this example. You can reference the Kubernetes documentation for complete information. Next, the readiness probe uses the exec type of probe to specify a command. This runs the command inside the container, similar to docker exec if you’ve used that before. The redis-cli ping command tests if the server is up and ready to actually process Redis-specific commands.
Commands are specified as lists of strings. We also set initialDelaySeconds. Given that the consequence of failing a liveness probe is a restart, it’s generally advisable to give the liveness probe a higher delay than the readiness probe.
I’ll also point out that by default 3 consecutive probe attempts need to fail before the probe is marked as failed.
So there is some buffer built in. Kubernetes won’t immediately restart the pod the first time a probe fails, unless you configure it that way.
The particular delay values depend on your application and how long it reasonably requires to start up. 5 seconds should be enough to start checking readiness. By default only a single successful probe is needed before traffic is sent to the pod, so setting the readiness initial delay too high will prevent pods that are able to handle traffic from receiving any.
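As a sketch of the tunable settings discussed above, the data tier’s liveness probe could be written with the remaining fields spelled out. Only initialDelaySeconds comes from the manifest in this post. The other values are, to my knowledge, the Kubernetes defaults written explicitly, so check the documentation for your cluster version before relying on them.

```yaml
# Fragment of a container spec (not a complete manifest)
livenessProbe:
  tcpSocket:
    port: redis
  initialDelaySeconds: 15   # wait before the first probe (from this post)
  periodSeconds: 10         # interval between probes (assumed default)
  timeoutSeconds: 1         # time allowed for each attempt (assumed default)
  failureThreshold: 3       # consecutive failures before acting (assumed default)
  successThreshold: 1       # consecutive successes to count as passing (assumed default)
```

With these values, a container that never accepts connections would be probed at roughly 15, 25 and 35 seconds after it starts, so the earliest restart would come about 35 seconds in.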
Let’s create the new and improved data tier using the manifest shown above, then watch the deployment:
    ubuntu@ip-10-0-128-5:~/src# kubectl get deployments -n probes -w
    NAME        READY   UP-TO-DATE   AVAILABLE   AGE
    data-tier   0/1     1            0           12s
    data-tier   1/1     1            1           20s
    ^C
The -w watch option is especially handy for this case. Note the READY column. It shows one of one replicas once the readiness check passes. With the watch option, new changes are appended to the bottom of the output, so the bottom line shows that the pod became ready after the number of seconds in its AGE column.
Watch the deployment for a while to make sure things stay running. If no new lines appear, there are no changes and everything has stayed up and running. If something does go awry, I’d recommend using a combination of the describe and logs commands to debug the issue. Failed probes are normally recorded as Unhealthy warning events in the describe output for the pod, and the pod restart count is another indicator of failed liveness probes. Logs are the most direct way to see the probe requests themselves.
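The restart count also appears in the pod’s status. The excerpt below is an abridged, illustrative example of what `kubectl get pod <pod-name> -o yaml` might show after a liveness probe has caused one restart. The values are made up for illustration and are not output from this demo.

```yaml
# Abridged pod status (illustrative values, not captured output)
status:
  containerStatuses:
    - name: redis
      ready: true        # reflects the latest readiness probe result
      restartCount: 1    # increases each time the container is restarted
```

A restart count that keeps climbing is usually a hint that the liveness probe is too strict or that its initial delay is too short for the application.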
We will add some debug logging to the app-tier server so that you can see all the incoming probe requests next.
Create the app-tier
On to the app tier. Notice that the DEBUG environment variable has been added, which will cause all the server’s requests to be logged. Note that this environment variable is specific to the sample application and is not a general-purpose setting.
    apiVersion: v1
    kind: Service
    metadata:
      name: app-tier
      labels:
        app: microservices
    spec:
      ports:
      - port: 8080
      selector:
        tier: app
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app-tier
      labels:
        app: microservices
        tier: app
    spec:
      replicas: 1
      selector:
        matchLabels:
          tier: app
      template:
        metadata:
          labels:
            app: microservices
            tier: app
        spec:
          containers:
          - name: server
            image: lrakai/microservices:server-v1
            ports:
              - containerPort: 8080
                name: server
            env:
              - name: REDIS_URL
                # Environment variable service discovery
                # Naming pattern:
                #   IP address: <all_caps_service_name>_SERVICE_HOST
                #   Port: <all_caps_service_name>_SERVICE_PORT
                #   Named Port: <all_caps_service_name>_SERVICE_PORT_<all_caps_port_name>
                value: redis://$(DATA_TIER_SERVICE_HOST):$(DATA_TIER_SERVICE_PORT_REDIS)
                # In multi-container example value was
                # value: redis://localhost:6379
              - name: DEBUG
                value: express:*
            livenessProbe:
              httpGet:
                path: /probe/liveness
                port: server
              initialDelaySeconds: 5
            readinessProbe:
              httpGet:
                path: /probe/readiness
                port: server
              initialDelaySeconds: 3
Further down, both probes are declared, and this time they are httpGet probes. They send requests to endpoints built into the server specifically for checking its health. The liveness probe endpoint does not actually communicate with Redis. It’s a dummy that returns a 200 OK response for all requests. The readiness probe endpoint checks that the data tier is available. We also set initialDelaySeconds so the process has adequate time to start.
Let’s create the app tier deployment using the above manifest.
    ubuntu@ip-10-0-128-5:~/src# kubectl create -f 7.3-app_tier.yaml -n probes
    service/app-tier created
    deployment.apps/app-tier created
Watch the deployment like before to verify containers are alive and ready. It may take some time to start the containers and wait for the initial delay seconds on the readiness probe. But after a short delay the replica is ready.
    ubuntu@ip-10-0-128-5:~/src# kubectl get deployments -n probes app-tier -w
    NAME       READY   UP-TO-DATE   AVAILABLE   AGE
    app-tier   0/1     1            0           3s
    app-tier   1/1     1            1           13s
    ^C
Now let’s stream some logs to see what’s happening behind the scenes. First, get the pods to find a pod in the deployment:
    ubuntu@ip-10-0-128-5:~/src# kubectl get -n probes pods
    NAME                         READY   STATUS    RESTARTS   AGE
    app-tier-8445876447-mxf92    1/1     Running   0          14m
    data-tier-64cd74d68b-lb6ps   1/1     Running   0          30m
    ubuntu@ip-10-0-128-5:~/src# kubectl logs -n probes app-tier-8445876447-mxf92 | cut -d' ' -f5,8-11
    if with ok firstname.lastname@example.org email@example.com
    14:29:44 set "x-powered-by" to true
    14:29:44 set "etag" to 'weak'
    14:29:44 set "etag fn" to
    14:29:44 set "env" to 'development'
    14:29:44 set "query parser" to
    14:29:44 set "query parser fn"
    14:29:44 set "subdomain offset" to
    14:29:44 set "trust proxy" to
    14:29:44 set "trust proxy fn"
    14:29:44 booting in development mode
    14:29:44 set "view" to [Function:
    14:29:44 set "views" to '/usr/src/app/views'
    14:29:44 set "jsonp callback name"
    14:29:44 use '/' query
    14:29:44 new '/'
    14:29:44 use '/' expressInit
    14:29:44 new '/'
    14:29:44 use '/' urlencodedParser
    14:29:44 new '/'
    14:29:44 set "redis" to RedisClient
    14:29:44 new '/'
    14:29:44 new '/'
    14:29:44 get '/'
    14:29:44 new '/'
    14:29:44 new '/'
    14:29:44 new '/'
    14:29:44 post '/'
    14:29:44 new '/'
    14:29:44 new '/probe/liveness'
    14:29:44 new '/probe/liveness'
    14:29:44 get '/probe/liveness'
    14:29:44 new '/'
    14:29:44 new '/probe/readiness'
    14:29:44 new '/probe/readiness'
    14:29:44 get '/probe/readiness'
    14:29:44 new '/'
    8080!
    14:29:49 dispatching GET /probe/liveness
    14:29:49 query : /probe/liveness
    14:29:49 expressInit : /probe/liveness
    14:29:49 urlencodedParser : /probe/liveness
    14:29:55 dispatching GET /probe/readiness
    14:29:55 query : /probe/readiness
    14:29:55 expressInit : /probe/readiness
    14:29:55 urlencodedParser : /probe/readiness
    14:29:59 dispatching GET /probe/liveness
    14:29:59 query : /probe/liveness
    14:29:59 expressInit : /probe/liveness
    14:29:59 urlencodedParser : /probe/liveness
    14:30:05 dispatching GET /probe/readiness
    14:30:05 query : /probe/readiness
    14:30:05 expressInit : /probe/readiness
    14:30:05 urlencodedParser : /probe/readiness
    14:30:09 dispatching GET /probe/liveness
    14:30:09 query : /probe/liveness
    14:30:09 expressInit : /probe/liveness
    14:30:09 urlencodedParser : /probe/liveness
    14:30:15 dispatching GET /probe/readiness
    14:30:15 query : /probe/readiness
    14:30:15 expressInit : /probe/readiness
    14:30:15 urlencodedParser : /probe/readiness
    14:30:19 dispatching GET /probe/liveness
    14:30:19 query : /probe/liveness
    14:30:19 expressInit : /probe/liveness
    14:30:19 urlencodedParser : /probe/liveness
    14:30:25 dispatching GET /probe/readiness
    14:30:25 query : /probe/readiness
    14:30:25 expressInit : /probe/readiness
We can see that Kubernetes is firing both probes at 10-second intervals. With the help of these probes, Kubernetes can take pods out of service when they aren’t ready and restart them when they enter a broken state.
- Containers in pods can declare readiness probes to allow Kubernetes to monitor when they are ready to serve traffic and when they should temporarily be taken out of service.
- Containers in pods can declare liveness probes to allow Kubernetes to detect when they have entered a broken state and the pod should be restarted.
- Both types of probes have the same format in manifest files and can use any of the command, HTTP GET, or TCP socket probe types.
Remember that probes kick in after containers are started. If you need to test or prepare things before the containers start, there is a way to do that as well. That is the role of init containers, which I will explore in the next post.