Keycloak under load
This topic describes how Keycloak behaves under high load and offers configuration best practices to prepare Keycloak for such a load.
Database connections
Creating new database connections is expensive as it takes time. Creating them when a request arrives delays the response, so it is good to have them created before the request arrives. Creating connections on demand can also contribute to a stampede effect, where opening many connections in a short time makes things worse because it slows down the system and blocks threads. Closing a connection also invalidates all server-side statement caching for that connection.
For the best performance, the values for the initial, minimal and maximum DB connection pool size should all be equal. This avoids creating new DB connections when a new request comes in, which would be costly.
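A minimal sketch, assuming the Keycloak Quarkus distribution and its db-pool-* options (verify the option names against your Keycloak version; the value 30 is purely illustrative and must be sized for your database and workload):

# keycloak.conf - all three pool sizes set to the same illustrative value
db-pool-initial-size=30
db-pool-min-size=30
db-pool-max-size=30

With all three values equal, the pool is filled at startup and neither shrinks nor grows while serving requests.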
Keeping the DB connection open for as long as possible allows for server-side statement caching bound to a connection. In the case of PostgreSQL, in order to use a server-side prepared statement, a query needs to be executed (by default) at least five times.
See the PostgreSQL docs on prepared statements for more information.
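The five-executions threshold comes from the prepareThreshold parameter of the PostgreSQL JDBC driver. As a hedged illustration, it can be made explicit on the Keycloak database URL; the host and database name below are placeholders:

# keycloak.conf - prepareThreshold is a PostgreSQL JDBC driver parameter (5 is its default)
db-url=jdbc:postgresql://postgres.example.com:5432/keycloak?prepareThreshold=5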
Threads
- Keycloak requests, as well as all liveness and readiness probes, are handled on the Quarkus executor pool. There is an open issue to investigate how to make the probes non-blocking so they are not queued in the Quarkus executor pool; see keycloak#22109.
- The Quarkus executor thread pool is configured via quarkus.thread-pool.max-threads and has a maximum size of at least 200 threads. Depending on the available CPU cores it can grow even larger. Threads are created as needed and end when no longer needed, so the system scales up and down on its own. When the load and the number of threads increase, the bottleneck will usually be the database connections. Once a request cannot acquire a database connection, it fails with a log message like Unable to acquire JDBC Connection or similar, as described in the known error messages, and the caller receives a response with a 5xx HTTP status code indicating a server-side error. With the number of threads in the executor pool being an order of magnitude larger than the number of database connections, and with requests failing when no database connection is available within quarkus.datasource.jdbc.acquisition-timeout (5 seconds by default), this is somewhat of a load-shedding behavior: Keycloak returns an error response instead of queueing requests for an indefinite amount of time.
- The combined number of executor threads in all Keycloak nodes of the cluster should not exceed the number of threads available in the JGroups thread pool, to avoid the error described in 'org.jgroups.util.ThreadPool: thread pool is full'. Due to the bug ISPN-14780, before Keycloak 23.0.0 and 22.0.2 there was no known way to configure a size different from the JGroups default of 100 max threads, so assuming a Keycloak cluster with 4 Pods, each Pod should not have more than 25 worker threads. Use the Quarkus configuration option quarkus.thread-pool.max-threads to configure the maximum number of worker threads.
- Configure quarkus.thread-pool.queue-size to specify a maximum queue length that allows for effective load shedding once this queue size is exceeded: Keycloak will return HTTP status code 503 (Service Unavailable) in this situation (available in Keycloak 23, see keycloak#21962 for details). Assuming a Keycloak Pod processes around 200 requests per second, a queue of 1000 would lead to maximum waiting times of around 5 seconds. A configuration sketch combining quarkus.thread-pool.max-threads and quarkus.thread-pool.queue-size follows after this list.
- All health and liveness probes are handled in the Quarkus executor worker pool. When requests queue up in Keycloak, readiness and liveness probes are delayed as well, which might trigger failure detection in Kubernetes and lead to Pod restarts in overload or load-shedding situations. This is tracked in keycloak#22109. For the time being, consider a longer timeout for the probes to survive spikes in the delay, or disable the liveness probe to avoid Pod restarts; a probe sketch follows after this list.
- In order for Java to create threads when running on Linux, it needs to have file handles available. Therefore, the number of open files (as retrieved with ulimit -n on Linux) needs to provide headroom for Keycloak to increase the number of threads as needed. Each thread also consumes memory, so the container memory limits need to be set to a value that allows for this, or the Pod will be killed by Kubernetes. A quick check for the open file limit is sketched after this list.
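A minimal configuration sketch for the 4-Pod example above, assuming raw Quarkus properties are passed to Keycloak through a conf/quarkus.properties file (an advanced mechanism; verify it is available in your Keycloak version). The values are illustrative:

# conf/quarkus.properties - 4 Pods sharing a 100-thread JGroups pool, so at most 25 worker threads each
quarkus.thread-pool.max-threads=25
# shed load with HTTP 503 once roughly 5 seconds of work (at ~200 requests/second) is queued
quarkus.thread-pool.queue-size=1000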
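On the Kubernetes side, a longer probe timeout and an explicit memory limit could look as follows. This is a sketch only; the container name, probe path, port, and limit values are placeholders to adapt to your deployment:

# Excerpt of a Deployment Pod template - all values are placeholders
containers:
- name: keycloak
  resources:
    limits:
      memory: "2Gi"           # leave room for the stack and native memory of every worker thread
  livenessProbe:
    httpGet:
      path: /health/live      # Keycloak health endpoint, assuming health checks are enabled
      port: 8080
    timeoutSeconds: 10        # longer timeout to survive spikes in probe latency
    failureThreshold: 5       # tolerate several slow probes before restarting the Pod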
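To confirm the open file limit inside a running Pod, a quick check could be (the Pod name is a placeholder):

# the shell built-in ulimit reports the per-process open file limit
kubectl exec keycloak-0 -- sh -c 'ulimit -n'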