Client Failover

This guide describes how to achieve automatic client failover for a Keycloak deployment when a given site fails.

Route 53 for Client Failover

To provide client failover, we can leverage AWS Route 53's DNS failover capabilities to automatically re-route traffic when the primary site is down. An AWS health check tests every 30 seconds whether a site is responding, and the DNS record seen by clients is updated accordingly.
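The state of a health check can also be inspected from a terminal. The following is a minimal sketch, assuming the AWS CLI is installed and configured; `checker_statuses` and `summarise_statuses` are hypothetical helper names, and the sketch assumes that each checker's status string starts with `Success` or `Failure`.

```shell
# Sketch only: summarise what the Route 53 health checkers report.
# The health check ID is a placeholder; IDs can be listed with
# `aws route53 list-health-checks`.

# Print one checker status per line for the given health check ID.
checker_statuses() (
  aws route53 get-health-check-status \
    --health-check-id "$1" \
    --query 'HealthCheckObservations[].StatusReport.Status' \
    --output text | tr '\t' '\n'
)

# Read statuses on stdin and print "<healthy>/<total>".
# Assumption: a healthy status line starts with "Success".
summarise_statuses() (
  awk '/^Success/ { ok++ } NF { total++ } END { printf "%d/%d\n", ok, total }'
)
```

A possible invocation is `checker_statuses <health-check-id> | summarise_statuses`, which prints how many checkers currently consider the endpoint healthy.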

The route53_create.sh script used in the procedure below generates a subdomain name with three host entries under our root domain keycloak-benchmark.com.

primary.<generated-subdomain>.keycloak-benchmark.com

Subdomain for Keycloak site 1

backup.<generated-subdomain>.keycloak-benchmark.com

Subdomain for Keycloak site 2

client.<generated-subdomain>.keycloak-benchmark.com

Subdomain used by Keycloak clients, which automatically fails over from site 1 to site 2 in the event of a failure.

Those DNS entries are registered with the OpenShift clusters so that they respond to requests for those hostnames. After the setup, the Keycloak deployment is updated to use the new hostnames.
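The naming scheme above can be sketched as a small shell helper. This is an illustration only: `failover_hostnames` is a hypothetical function and not part of the repository, and `abc123` in the usage note stands in for whatever subdomain the script generates.

```shell
# Sketch only: print the three hostnames for a generated subdomain.
ROOT_DOMAIN="keycloak-benchmark.com"

failover_hostnames() (
  subdomain="$1"   # the generated part, e.g. the hypothetical "abc123"
  for role in primary backup client; do
    printf '%s.%s.%s\n' "$role" "$subdomain" "$ROOT_DOMAIN"
  done
)
```

Calling `failover_hostnames abc123` prints the primary, backup and client hostnames, one per line, in that order.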

See below for the newly created elements (green) and the updated elements (yellow).

[Figure: Route 53 configuration]

Set up new Route 53 failover

Prerequisites

A hosted zone already exists for keycloak-benchmark.com

Procedure

  1. Create two ROSA clusters

  2. Create subdomain records and health checks

    PRIMARY_CLUSTER=<name-of-rosa-cluster> \
    BACKUP_CLUSTER=<name-of-rosa-cluster> \
    ./provision/aws/route53/route53_create.sh

    Note down the domain and URLs generated by the script for the following steps. The generated part of the subdomain name allows for multiple Keycloak instances in the different clusters.

    Domain: <generated-subdomain>.keycloak-benchmark.com
    Client Site URL: client.<generated-subdomain>.keycloak-benchmark.com
    Primary Site URL: primary.<generated-subdomain>.keycloak-benchmark.com
    Backup Site URL: backup.<generated-subdomain>.keycloak-benchmark.com
  3. Deploy Keycloak as normal, but with the following environment variables set.

    1. Primary cluster:

      KC_HOSTNAME_OVERRIDE=client.<generated-subdomain>.keycloak-benchmark.com # Hostname used by clients
      KC_HEALTH_HOSTNAME=primary.<generated-subdomain>.keycloak-benchmark.com # Hostname used by AWS health checks
    2. Backup cluster:

      KC_HOSTNAME_OVERRIDE=client.<generated-subdomain>.keycloak-benchmark.com # Hostname used by clients
      KC_HEALTH_HOSTNAME=backup.<generated-subdomain>.keycloak-benchmark.com # Hostname used by AWS health checks
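Both sites share the client hostname and differ only in the health check hostname. A minimal sketch of that rule, assuming a hypothetical helper named `site_env` that is not part of the repository:

```shell
# Sketch only: print the two environment variables for one site.
site_env() (
  site="$1"     # "primary" or "backup"
  domain="$2"   # <generated-subdomain>.keycloak-benchmark.com
  case "$site" in
    primary|backup) ;;
    *) echo "site must be 'primary' or 'backup'" >&2; exit 1 ;;
  esac
  # Same client hostname on both sites, so clients can fail over transparently.
  printf 'KC_HOSTNAME_OVERRIDE=client.%s\n' "$domain"
  # Site-specific hostname that the AWS health check probes.
  printf 'KC_HEALTH_HOSTNAME=%s.%s\n' "$site" "$domain"
)
```

For example, `site_env backup <generated-subdomain>.keycloak-benchmark.com` prints the two variables listed for the backup cluster above.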

Testing Failover

To test failover from primary to the backup site, do the following:

  1. Verify that client.<generated-subdomain>.keycloak-benchmark.com connects to primary.

    ./provision/aws/route53/route53_test_primary_used.sh <generated-subdomain>.keycloak-benchmark.com; echo $?

    The script returns 0 if the client. subdomain is pointing to the same IP as the primary. subdomain.

    This script will fail if the PRIMARY_CLUSTER and BACKUP_CLUSTER are set to the same ROSA cluster.
  2. Log in to the primary ROSA cluster and delete the aws-health-route Route from the keycloak namespace.

  3. Wait about 30 seconds for the health checks to determine that primary.<generated-subdomain>.keycloak-benchmark.com is no longer healthy. This can be confirmed by inspecting the health check in the AWS console.

  4. Run the script from the first step again; it should now return an exit code of 1.

If the aws-health-route is recreated on the PRIMARY_CLUSTER, the health check will eventually pass and the client. record will revert to routing requests to the primary cluster.
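The check performed in steps 1 and 4 can be approximated by comparing DNS answers directly. The following is a sketch of that idea, not the contents of route53_test_primary_used.sh; it assumes `dig` is installed, and all function names, the retry count and the delay are illustrative choices.

```shell
# Sketch only: compare the IPs behind the client. and primary. records.

# Print the sorted IPv4 addresses a hostname resolves to.
resolve_ips() (
  dig +short A "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$' | sort
)

# Succeed if both address lists are non-empty and identical.
same_ips() (
  [ -n "$1" ] && [ "$1" = "$2" ]
)

# Succeed if client.<domain> resolves to the same IPs as primary.<domain>.
client_uses_primary() (
  same_ips "$(resolve_ips "client.$1")" "$(resolve_ips "primary.$1")"
)

# Poll until the client record no longer matches primary, or give up.
# Note: a failed DNS lookup also counts as "no longer matches".
wait_for_failover() (
  domain="$1"; attempts="${2:-12}"; delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    client_uses_primary "$domain" || exit 0
    sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)
```

After deleting the aws-health-route Route, `wait_for_failover <generated-subdomain>.keycloak-benchmark.com` returns 0 once the client record stops matching the primary site, or 1 if that has not happened after the configured number of attempts.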

Remove Route 53 Failover

To delete the generated subdomain including the health checks, run the following command. If no subdomain is specified, all health checks are removed.

SUBDOMAIN=<generated-subdomain> \
./provision/aws/route53/route53_delete.sh