Simulate failures of Keycloak in Kubernetes

How to automate simulating failures of Keycloak Pods in a Kubernetes environment to test how Keycloak recovers after a failure.

Why failure testing

The introduction to the chaos testing tool krkn contains an excellent writeup on why chaos testing tools are needed in general.

Running the failure test from the CLI

Preparations

Extract the keycloak-benchmark-${version}.[zip|tar.gz] file
Preparing Keycloak for testing
Make sure you can access the Kubernetes cluster from the machine where you plan to run the failure tests, and that commands such as kubectl get pods -n keycloak-keycloak succeed from there.
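The access check from the last preparation step can be scripted so that it fails early with a readable message. The following is a minimal sketch, not part of the benchmark distribution: the function name check_cluster_access and the NAMESPACE variable are hypothetical, and the namespace keycloak-keycloak is taken from the example command above.

```shell
#!/usr/bin/env bash
# Namespace to probe; defaults to the one used in the example command above.
# NAMESPACE is a hypothetical variable introduced for this sketch.
NAMESPACE="${NAMESPACE:-keycloak-keycloak}"

# Hypothetical helper: returns 0 only if kubectl is installed and can list
# the pods in the given namespace.
check_cluster_access() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # --request-timeout keeps the check from hanging on an unreachable cluster.
  kubectl get pods -n "$1" --request-timeout=10s
}

if check_cluster_access "$NAMESPACE"; then
  echo "Cluster access OK for namespace $NAMESPACE"
else
  echo "Cannot list pods in namespace $NAMESPACE" >&2
fi
```

If the check prints the list of Keycloak Pods, the machine is ready to run the failure tests.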

Simulating load

Use the Running benchmarks from the CLI guide to simulate load against a specific Kubernetes environment.

Running the failure tests

Once there is enough load on the Keycloak application hosted on an existing Kubernetes/OpenShift cluster, execute the command below:

./kc-chaos.sh <RESULT_DIR_PATH>

Set the environment variables below to configure how and where the script runs.

INITIAL_DELAY_SECS: Time in seconds the script waits before it triggers the first failure.
CHAOS_DELAY_SECS: Time in seconds the script waits between simulating failures.
PROJECT: Namespace of the Keycloak pods.
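As an illustration of how the variables and the command fit together, here is a minimal sketch. The delay values and the results directory name are assumptions chosen for the example, not defaults of the script. The namespace matches the one used in the preparation step.

```shell
#!/usr/bin/env bash
# Illustrative values only; tune them to the length of your load test.
export INITIAL_DELAY_SECS=30        # wait 30 seconds before the first failure
export CHAOS_DELAY_SECS=10          # wait 10 seconds between failures
export PROJECT=keycloak-keycloak    # namespace of the Keycloak pods

# Assumed location for the results; replace with your own <RESULT_DIR_PATH>.
RESULT_DIR_PATH="${RESULT_DIR_PATH:-results}"

# Only start the run if the script is present in the current directory.
if [ -x ./kc-chaos.sh ]; then
  ./kc-chaos.sh "$RESULT_DIR_PATH"
else
  echo "kc-chaos.sh not found in $(pwd); run this from the directory that contains the script" >&2
fi
```

Exporting the variables, rather than setting them inline, keeps the same configuration for repeated runs in the same shell session.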

Collecting the results

The chaos script also collects information about the Keycloak failures, Keycloak pod utilization, Keycloak pod restarts, and the Keycloak logs, both before each Keycloak pod is killed and at the end of the run. It stores all of this under the results/logs directory.
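After a run, a quick listing shows what was collected. This is a minimal sketch that assumes results/logs is relative to the directory the script was started from; the LOG_DIR variable is hypothetical, and the file names inside the directory are not specified here.

```shell
#!/usr/bin/env bash
# Assumption: results/logs is relative to the current working directory.
# LOG_DIR is a hypothetical variable introduced for this sketch.
LOG_DIR="${LOG_DIR:-results/logs}"

if [ -d "$LOG_DIR" ]; then
  # List the collected files, newest first.
  ls -lt "$LOG_DIR"
else
  echo "No log directory found at $LOG_DIR" >&2
fi
```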