Kubernetes Liveness and Readiness Probes

Why do we need Kubernetes probes?

Is your pod ready as soon as the container starts? This is the key question we will explore in this post through the use of Kubernetes probes.

In an earlier post I covered deployment rollouts. Kubernetes assumed that a pod was ready as soon as the container started. That isn’t always true.

  • For example, if the container needs time to warm up then Kubernetes should wait before sending any traffic to the new pod.
  • It’s also possible that a pod is fully operational but becomes non-responsive after some time, for example if it enters a deadlock state. In that case Kubernetes shouldn’t send it any more requests and would be better off restarting the pod.

Kubernetes provides probes to remedy both of these situations.

Probes are sometimes referred to as health checks.

Readiness Probe

The first type of probe is a readiness probe. Readiness probes are used to detect when a pod is ready to serve traffic. As I mentioned before, a pod often isn’t ready right after its containers start. They may need time to warm caches or load configuration.

Readiness probes can monitor the containers until they are ready to serve traffic.

But readiness probes are also useful long after startup. For example, if the pod depends on an external service and that service goes down, it’s not worth sending traffic to the pod since it can’t complete requests until the external service is back online.

Readiness probes control the ready condition of a pod.

If a readiness probe succeeds, the ready condition is *true*; otherwise it is *false*.

Services use the ready condition to determine if pods should be sent traffic. In this way probes integrate with services to ensure that traffic doesn’t flow to pods that aren’t ready for it.

This is a familiar concept if you have used a cloud load balancer. Backend instances that fail health checks don’t receive traffic, just as Services won’t send traffic to pods that aren’t ready. Services are our load balancers in Kubernetes.
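
If you want to see this condition for yourself, you can read it straight from a pod’s status. The commands below are only a sketch; the pod and Service names are placeholders.

# Read a pod's ready condition (pod name is a placeholder)
kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

# By default, a Service only lists pods whose ready condition is True as endpoints
kubectl get endpoints <service-name>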

Liveness Probe

The second type of probe is called a liveness probe. Liveness probes are used to detect when a pod has entered a broken state and can no longer serve traffic. In that case, Kubernetes will restart the pod for you. That is the key difference between the two types of probes.

  1. Readiness probes determine when a Service should temporarily stop sending traffic to a pod because it isn’t ready.
  2. Liveness probes determine when a pod should be restarted because it won’t come back to life on its own.

You declare both probes in the same way; you just have to decide which course of action is appropriate if a probe fails: stop serving traffic, or restart.

Declaring Probes

Probes can be declared on containers in a pod. All of a pod’s container probes must pass for the pod to pass. You can define any of the following as the action a probe performs to check the container:

  • A command that runs inside the container
  • An HTTP GET request
  • A TCP socket connection attempt

A command probe succeeds if the exit code of the command is 0, otherwise it fails.

An HTTP GET request probe succeeds if the response status code is between 200 and 399 inclusive.

A TCP socket probe succeeds if a connection can be established.

By default, probes check the containers every 10 seconds.
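
To make that concrete, here is a minimal sketch of the three probe actions as they appear in a container spec. The paths, ports, and commands are placeholders rather than values from our application, and each probe declares exactly one action.

# Command (exec) probe: passes if the command exits with code 0
readinessProbe:
  exec:
    command: ["redis-cli", "ping"]

# HTTP GET probe: passes if the response status code is between 200 and 399
readinessProbe:
  httpGet:
    path: /healthz   # placeholder path
    port: 8080

# TCP socket probe: passes if a connection can be established
livenessProbe:
  tcpSocket:
    port: 6379
  periodSeconds: 10  # the default check interval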

Demo: Adding liveness and readiness probes to data and app-tier containers

Our objective in this demo is to test our containers using probes. Specifically we will add Readiness and Liveness Probes to our application. We will use the application manifests from the deployments post as the base of our work in this lesson.

Before we start creating probes let’s first crystallize the concepts by relating these probes to our application.

The data-tier contains one Redis container. This container is alive if it accepts TCP connections. The Redis container is ready if it responds to Redis commands such as get or ping. There is a small but important difference between the two: a server may be alive but not necessarily ready to handle incoming requests.
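
If you haven’t used it before, redis-cli ping simply asks the server to respond; a healthy server answers PONG. As a rough sketch, you could run the same check by hand (the pod name is a placeholder):

# Run the readiness check manually inside the Redis container
kubectl exec <data-tier-pod> -- redis-cli ping   # a healthy server replies PONG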

In the app-tier, the API server is alive if it accepts HTTP requests, but the API server is only ready if it is online and has a connection to Redis to request and increment the counter.

The sample application has a path for each of these probes. The counter and poller containers are live and ready if they can make HTTP requests back to the API server. Let’s apply this knowledge to the deployment templates. We will go in the same order we just discussed but skip the support tier because the server demonstrates the same functionality.

We’ll start by creating a probes namespace to isolate the resources in this post.

apiVersion: v1
kind: Namespace
metadata:
  name: probes
  labels:
    app: counter

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 7.1-namespace.yaml
namespace/probes created
ubuntu@ip-10-0-128-5:~/src#

Create the data-tier

Now take a look at the manifest below: the addition of a name for the port and the probes are the only changes to the data tier deployment.

The liveness probe uses the TCP socket type of probe in this example. By using a named port we can simply write the name rather than the port number. That protects us in the future if the port number ever changes and someone forgets to update the probe’s port. We also set initialDelaySeconds to give the Redis server adequate time to start.

apiVersion: v1
kind: Service
metadata:
  name: data-tier
  labels:
    app: microservices
spec:
  ports:
  - port: 6379
    protocol: TCP # default
    name: redis # optional when only 1 port
  selector:
    tier: data
  type: ClusterIP # default
---
apiVersion: apps/v1 # apps API group
kind: Deployment
metadata:
  name: data-tier
  labels:
    app: microservices
    tier: data
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: data
  template:
    metadata:
      labels:
        app: microservices
        tier: data
    spec: # Pod spec
      containers:
      - name: redis
        image: redis:latest
        imagePullPolicy: IfNotPresent
        ports:
          - containerPort: 6379
            name: redis
        livenessProbe:
          tcpSocket:
            port: redis # named port
          initialDelaySeconds: 15
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5

Notice how we specify the liveness and readiness probes in the container spec.

We can also configure failure thresholds, delays, and timeouts for all probes. The default values work well for this example; you can reference the Kubernetes documentation for the complete details. Next, the readiness probe uses the exec type of probe to specify a command. This runs the command inside the container, similar to docker exec if you’ve used that before. The redis-cli ping command tests if the server is up and ready to actually process Redis-specific commands.

Commands are specified as lists of strings. We also set initialDelaySeconds here. Given that the consequence of failing a liveness probe is a pod restart, it’s generally advisable to give the liveness probe a higher delay than the readiness probe.

I’ll also point out that by default three consecutive probes need to fail before a probe is marked as failed.

So there is some buffer built in. Kubernetes won’t immediately restart the pod the first time a probe fails, unless you configure it that way.

The particular delay values depend on your application and how long it reasonably requires to start up. Five seconds should be enough before we start checking readiness. By default only a single probe needs to pass before traffic is sent to the pod, so setting the readiness initial delay too high will keep pods that are able to handle traffic from receiving any. The full set of tuning knobs is summarized in the sketch below.
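
For reference, here is a sketch of the fields you can tune on any probe. The endpoint is a placeholder, and the values shown are the Kubernetes defaults except for initialDelaySeconds, which defaults to 0.

readinessProbe:
  httpGet:
    path: /healthz           # placeholder endpoint
    port: 8080
  initialDelaySeconds: 5     # wait this long after the container starts (default 0)
  periodSeconds: 10          # check interval (default 10)
  timeoutSeconds: 1          # a check that takes longer than this fails (default 1)
  failureThreshold: 3        # consecutive failures before the probe is marked failed (default 3)
  successThreshold: 1        # consecutive successes needed to pass again (default 1)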

Let’s create the new and improved data tier using the manifest shown above, then watch the deployment:

ubuntu@ip-10-0-128-5:~/src# kubectl get deployments -n probes -w
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
data-tier   0/1     1            0           12s
data-tier   1/1     1            1           20s
^Cubuntu@ip-10-0-128-5:~/src#

The -w watch option is especially handy for this case. Note the READY column. It shows one of one replicas once the readiness check passes. With the watch option, new changes are appended to the bottom of the output, so we can see from the last line that the pod transitioned to ready after the number of seconds shown in its AGE column.

Watch the deployment for a while to make sure things stay running. If no new lines appear, there are no changes and everything has stayed up and running. If something did go awry, I’d recommend using a combination of the describe and logs commands to debug the issue. Unfortunately, failed probe events don’t show in the events output, but you can use the pod restart count as an indicator of failed liveness probes. Logs are the most direct way to get at them.
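
As a sketch of that debugging workflow (the pod name is a placeholder; use one from kubectl get pods):

# Check the restart count; restarts hint at failed liveness probes
kubectl get pods -n probes

# Inspect the pod's conditions and recent events
kubectl describe pod -n probes <pod-name>

# Read the container logs directly
kubectl logs -n probes <pod-name>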

Next, we will add some debug logging to the app-tier server so that you can see all the incoming probe requests.

Create the app-tier

On to the app tier. Notice the DEBUG environment variable has been added, which will cause all the server’s requests to be logged. Note that this environment variable is specific to the sample application and not a general-purpose setting.

apiVersion: v1
kind: Service
metadata:
  name: app-tier
  labels:
    app: microservices
spec:
  ports:
  - port: 8080
  selector:
    tier: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-tier
  labels:
    app: microservices
    tier: app
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: app
  template:
    metadata:
      labels:
        app: microservices
        tier: app
    spec:
      containers:
      - name: server
        image: lrakai/microservices:server-v1
        ports:
          - containerPort: 8080
            name: server
        env:
          - name: REDIS_URL
            # Environment variable service discovery
            # Naming pattern:
            #   IP address: <all_caps_service_name>_SERVICE_HOST
            #   Port: <all_caps_service_name>_SERVICE_PORT
            #   Named Port: <all_caps_service_name>_SERVICE_PORT_<all_caps_port_name>
            value: redis://$(DATA_TIER_SERVICE_HOST):$(DATA_TIER_SERVICE_PORT_REDIS)
            # In multi-container example value was
            # value: redis://localhost:6379
          - name: DEBUG
            value: express:*
        livenessProbe:
          httpGet:
            path: /probe/liveness
            port: server
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            path: /probe/readiness
            port: server
          initialDelaySeconds: 3

Further down, both probes are declared, and this time they are httpGet probes. They send requests to endpoints built into the server specifically for checking its health. The liveness probe endpoint does not actually communicate with Redis; it’s a dummy that returns a 200 OK response for all requests. The readiness probe endpoint checks that the data tier is available. We also set initialDelaySeconds so the process has adequate time to start.

Let’s create the app tier deployment using the above manifest.

ubuntu@ip-10-0-128-5:~/src# kubectl create -f 7.3-app_tier.yaml -n probes
service/app-tier created
deployment.apps/app-tier created
ubuntu@ip-10-0-128-5:~/src#

Watch the deployment like before to verify containers are alive and ready. It may take some time to start the containers and wait for the initial delay seconds on the readiness probe. But after a short delay the replica is ready.

ubuntu@ip-10-0-128-5:~/src# kubectl get deployments. -n probes app-tier -w
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   0/1     1            0           3s
app-tier   1/1     1            1           13s

^Cubuntu@ip-10-0-128-5:~/src#

Stream logs

Now let’s stream some logs to see what’s happening behind the scenes. First, get the pods to find a pod in the deployment:

ubuntu@ip-10-0-128-5:~/src# kubectl get -n probes pods
NAME                         READY   STATUS    RESTARTS   AGE
app-tier-8445876447-mxf92    1/1     Running   0          14m
data-tier-64cd74d68b-lb6ps   1/1     Running   0          30m
ubuntu@ip-10-0-128-5:~/src#
ubuntu@ip-10-0-128-5:~/src# kubectl logs -n probes app-tier-8445876447-mxf92 | cut -d' ' -f5,8-11
if with ok


server@1.0.0
server@1.0.0




14:29:44 set "x-powered-by" to true
14:29:44 set "etag" to 'weak'
14:29:44 set "etag fn" to
14:29:44 set "env" to 'development'
14:29:44 set "query parser" to
14:29:44 set "query parser fn"
14:29:44 set "subdomain offset" to
14:29:44 set "trust proxy" to
14:29:44 set "trust proxy fn"
14:29:44 booting in development mode
14:29:44 set "view" to [Function:
14:29:44 set "views" to '/usr/src/app/views'
14:29:44 set "jsonp callback name"
14:29:44 use '/' query
14:29:44 new '/'
14:29:44 use '/' expressInit
14:29:44 new '/'
14:29:44 use '/' urlencodedParser
14:29:44 new '/'
14:29:44 set "redis" to RedisClient
14:29:44 new '/'
14:29:44 new '/'
14:29:44 get '/'
14:29:44 new '/'
14:29:44 new '/'
14:29:44 new '/'
14:29:44 post '/'
14:29:44 new '/'
14:29:44 new '/probe/liveness'
14:29:44 new '/probe/liveness'
14:29:44 get '/probe/liveness'
14:29:44 new '/'
14:29:44 new '/probe/readiness'
14:29:44 new '/probe/readiness'
14:29:44 get '/probe/readiness'
14:29:44 new '/'
8080!
14:29:49 dispatching GET /probe/liveness
14:29:49 query  : /probe/liveness
14:29:49 expressInit  : /probe/liveness
14:29:49 urlencodedParser  : /probe/liveness
14:29:55 dispatching GET /probe/readiness
14:29:55 query  : /probe/readiness
14:29:55 expressInit  : /probe/readiness
14:29:55 urlencodedParser  : /probe/readiness
14:29:59 dispatching GET /probe/liveness
14:29:59 query  : /probe/liveness
14:29:59 expressInit  : /probe/liveness
14:29:59 urlencodedParser  : /probe/liveness
14:30:05 dispatching GET /probe/readiness
14:30:05 query  : /probe/readiness
14:30:05 expressInit  : /probe/readiness
14:30:05 urlencodedParser  : /probe/readiness
14:30:09 dispatching GET /probe/liveness
14:30:09 query  : /probe/liveness
14:30:09 expressInit  : /probe/liveness
14:30:09 urlencodedParser  : /probe/liveness
14:30:15 dispatching GET /probe/readiness
14:30:15 query  : /probe/readiness
14:30:15 expressInit  : /probe/readiness
14:30:15 urlencodedParser  : /probe/readiness
14:30:19 dispatching GET /probe/liveness
14:30:19 query  : /probe/liveness
14:30:19 expressInit  : /probe/liveness
14:30:19 urlencodedParser  : /probe/liveness
14:30:25 dispatching GET /probe/readiness
14:30:25 query  : /probe/readiness
14:30:25 expressInit  : /probe/readiness

We can see that Kubernetes is firing both probes at 10-second intervals. With the help of these probes, Kubernetes can take pods out of service when they aren’t ready and restart them when they enter a broken state.

Conclusion

  • Containers in pods can declare readiness probes to allow Kubernetes to monitor when they are ready to serve traffic and when they should temporarily be taken out of service.
  • Containers in pods can declare liveness probes to allow Kubernetes to detect when they have entered a broken state and the pod should be restarted.
  • Both types of probes have the same format in manifest files and can use any of the command, HTTP GET, or TCP socket probe types.

Remember that probes kick in after containers are started. If you need to test or prepare things before the containers start, there is a way to do that as well. That is the role of init containers which I will explore in the next post.