JWKS rotation

Operate JWKS rotations without breaking in-flight tokens.

JWKS rotations break production at 3am because nobody noticed the new keys arrived. The mistake is treating rotation as a one-step operation; in practice it is a four-state machine, and only one of those states is safe to advance from. This guide describes how to operate the machine using JWTShield’s /v1/validate/jwks-rotation endpoint as your check.

The four states

State | Description | Action
no_change | previous == current. | Nothing to do.
safe_overlap | All previous keys retained; at least one new key added. | New tokens can be issued under the new key. Wait for max_token_ttl_seconds before dropping the old key.
overlap | Some previous keys retained, some dropped. | Risky unless you can prove no in-flight tokens reference the dropped keys.
disjoint | No keys in common. | Rollback or accept a verification outage during cutover.

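The desired flow is no_change → safe_overlap → no_change: you add keys, wait, then drop the old ones, returning to a single-key state. Note that under the definitions above, the check that first sees the old key gone will report overlap, because a previous key was dropped. That report is expected, provided the wait has already elapsed.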
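To make the state definitions concrete, here is a minimal bash sketch that classifies a rotation from two lists of key IDs. It assumes a key is identified by its kid alone and that both lists are non-empty. The function names has_kid and classify_rotation are illustrative, and JWTShield's own comparison may consider more than the kid.

```shell
#!/usr/bin/env bash
# Local sketch of the four states; not JWTShield's implementation.
# Assumption: each key is identified by its kid only.

# Succeeds if kid $1 appears in the whitespace-separated list $2.
has_kid() {
  local k
  for k in $2; do
    if [[ $k == "$1" ]]; then return 0; fi
  done
  return 1
}

# classify_rotation "<previous kids>" "<current kids>"
classify_rotation() {
  local prev=$1 cur=$2 kid
  local retained=0 dropped=0 added=0

  for kid in $prev; do
    if has_kid "$kid" "$cur"; then
      retained=$((retained + 1))
    else
      dropped=$((dropped + 1))
    fi
  done
  for kid in $cur; do
    if ! has_kid "$kid" "$prev"; then added=$((added + 1)); fi
  done

  if   (( retained == 0 )); then echo disjoint      # no keys in common
  elif (( dropped > 0 ));  then echo overlap        # some kept, some dropped
  elif (( added > 0 ));    then echo safe_overlap   # all kept, at least one new
  else                          echo no_change      # identical sets
  fi
}
```

For example, classify_rotation "k1" "k1 k2" prints safe_overlap, while classify_rotation "k1 k2" "k2" prints overlap.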

The cron pattern

Run a job every N minutes that:

  1. Fetches the provider’s current JWKS (e.g. GET https://acme.auth0.com/.well-known/jwks.json).
  2. Compares it against the snapshot from the previous run.
  3. Calls POST /v1/validate/jwks-rotation with both.
  4. Acts on rotation_state.

#!/usr/bin/env bash
set -euo pipefail

PROVIDER_JWKS_URL="${PROVIDER_JWKS_URL:?}"
JWTSHIELD_URL="${JWTSHIELD_URL:?}"
JWTSHIELD_KEY="${JWTSHIELD_KEY:?}"
SNAPSHOT_PATH="${SNAPSHOT_PATH:-/var/lib/jwks/snapshot.json}"

current=$(curl -sSf "$PROVIDER_JWKS_URL")
previous=$(cat "$SNAPSHOT_PATH" 2>/dev/null || echo "$current")

response=$(curl -sSf -X POST "$JWTSHIELD_URL/v1/validate/jwks-rotation" \
  -H "Authorization: Bearer $JWTSHIELD_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
    --argjson previous "$previous" \
    --argjson current  "$current" \
    --argjson policy   '{"min_overlap_count": 1, "max_token_ttl_seconds": 86400}' \
    '{previous_jwks: $previous, current_jwks: $current, overlap_policy: $policy}'
  )")

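state=$(jq -r '.rotation_state' <<<"$response")
case "$state" in
  no_change|safe_overlap) ;; # benign
  overlap)  echo "WARN: in-progress rotation"; jq . <<<"$response" ;;
  # Exiting here skips the snapshot update, so the alert repeats on every run until resolved.
  disjoint) echo "ALERT: no key overlap"; jq . <<<"$response" ; exit 1 ;;
  # A missing or unrecognised state must not silently advance the snapshot.
  *)        echo "ERROR: unexpected rotation_state: $state" >&2; exit 2 ;;
esac

# Write to a temp file and rename, so a crash mid-write cannot leave a truncated snapshot.
tmp=$(mktemp "$SNAPSHOT_PATH.XXXXXX")
printf '%s\n' "$current" > "$tmp"
mv "$tmp" "$SNAPSHOT_PATH"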

Wire disjoint to a page; wire overlap to a Slack alert.
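One way to express that wiring is a small routing function, sketched below. The function names and the SLACK_WEBHOOK_URL variable are hypothetical, and routing unknown states to the pager is an assumption rather than something JWTShield prescribes. The Slack call uses the standard incoming-webhook payload with a single text field. Paging is left out because it depends on your on-call tool.

```shell
#!/usr/bin/env bash
# Map a rotation_state to an alert channel.
# Assumption: anything unexpected goes to the loudest channel.
alert_channel() {
  case "$1" in
    no_change|safe_overlap) echo none ;;
    overlap)                echo slack ;;
    *)                      echo page ;;   # disjoint and unknown states
  esac
}

# Post a message to a Slack incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical variable you would supply yourself.
notify_slack() {
  curl -sSf -X POST "$SLACK_WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg text "$1" '{text: $text}')"
}
```

In the cron script this could be called after the state is parsed, for example by running notify_slack "JWKS rotation in progress" when alert_channel "$state" prints slack.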

Sample-token verification

The endpoint accepts up to three sample tokens to confirm key behaviour empirically.

Use the audit suite (/v1/test/auth-regression) to run these sample checks alongside your standard validation fixtures.

Operational thresholds

overlap_policy.max_token_ttl_seconds is your safety margin. Set it to the longest token lifetime your services issue — typically the access-token TTL, not the refresh-token TTL. JWTShield uses this to advise when it is safe to drop the previous key after a safe_overlap.
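The arithmetic behind that advice can be sketched locally. The example assumes you record the epoch time at which safe_overlap was first observed (the cron script above does not do this), and that the provider stops signing with the old key at that moment. If it keeps signing with the old key for longer, start the clock from when it stops. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Earliest epoch second at which the previous key can be dropped:
# the time safe_overlap was first seen plus the longest token TTL.
earliest_drop_epoch() {
  local first_seen=$1 max_ttl=$2
  echo $(( first_seen + max_ttl ))
}

# Succeeds (exit 0) once the wait has elapsed.
# The third argument is the current epoch time; it defaults to now.
safe_to_drop() {
  local first_seen=$1 max_ttl=$2 now=${3:-$(date +%s)}
  (( now >= first_seen + max_ttl ))
}
```

With a first observation at 1700000000 and a TTL of 86400 seconds, earliest_drop_epoch prints 1700086400, and safe_to_drop fails for any earlier timestamp.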

overlap_policy.min_overlap_count defaults to 1. Set it higher if your provider runs several keys in parallel, each with its own kid, and you want to require that all of them remain during the rotation.
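As a rough local illustration, the sketch below reads min_overlap_count as the minimum number of previous keys that must still be present in the current JWKS. That reading is an assumption drawn from the paragraph above, not a confirmed description of how JWTShield evaluates the policy. The function names are hypothetical, and kids are assumed to be unique within each list.

```shell
#!/usr/bin/env bash
# Count kids that appear in both whitespace-separated lists.
retained_count() {
  local kid other n=0
  for kid in $1; do
    for other in $2; do
      if [[ $kid == "$other" ]]; then
        n=$((n + 1))
        break
      fi
    done
  done
  echo "$n"
}

# meets_min_overlap "<previous kids>" "<current kids>" <min_overlap_count>
# Succeeds when enough previous keys are still published.
meets_min_overlap() {
  (( $(retained_count "$1" "$2") >= $3 ))
}
```

With previous keys "k1 k2" and current keys "k2 k3", one key is retained, so a threshold of 1 passes and a threshold of 2 fails.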

Why not just trust the provider

Cloud IdPs rotate without warning. The 2018 and 2021 outages where Auth0/Cognito/Okta-backed services started rejecting tokens en masse all came down to the same sequence: the provider rotated keys faster than client caches expired, in-flight tokens started failing, and no monitoring was in place to catch it. Run rotation checks on a schedule independent of your verification path.

Reference