* Cleanup excessive comments. * Remove leftovers hanging around from previous refactor * Improve test coverage
14 KiB
Redis Cache for Distributed Deployments
Redis cache support for multi-replica Traefik deployments with shared state.
Table of Contents
- Overview
- Why Use Redis Cache?
- Configuration
- Cache Modes
- Deployment Examples
- Performance Tuning
- Monitoring
- Troubleshooting
- Migration Guide
Overview
The Redis cache feature provides distributed caching for the Traefik OIDC plugin, enabling seamless operation across multiple Traefik instances.
Key Features
- Distributed JTI Replay Detection: Prevents token replay attacks across all instances
- Shared Session Management: Consistent user sessions across replicas
- Circuit Breaker: Automatic fallback to memory cache during Redis outages
- Health Checking: Continuous monitoring of Redis connectivity
- Flexible Cache Modes: Memory, Redis, or hybrid caching strategies
- Pure-Go Implementation: Yaegi-compatible, works with dynamic plugin loading
Architecture
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Traefik #1 │ │ Traefik #2 │ │ Traefik #3 │
│ (Plugin) │ │ (Plugin) │ │ (Plugin) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└────────────────────┼────────────────────┘
│
┌──────▼──────┐
│ Redis │
│ (Shared │
│ Cache) │
└─────────────┘
Why Use Redis Cache?
The Problem
When running multiple Traefik instances without shared cache:
-
False Positive Replay Detection
- User authenticates → Token stored in Instance A's JTI cache
- Next request → Load balancer routes to Instance B
- Instance B doesn't have the JTI → Falsely detects replay attack
-
Session Inconsistency
- User session created on Instance A
- Subsequent request routed to Instance B
- Instance B has no knowledge of the session
-
Token Metadata Fragmentation
- Token refresh happens on Instance A
- Other instances continue using old tokens
The Solution
Redis provides centralized cache that all instances share, ensuring:
- Consistent Authentication: All instances share authentication state
- True Replay Detection: JTI cache shared across all instances
- Seamless Scaling: Add/remove instances without affecting sessions
- High Availability: Circuit breaker with automatic fallback
Configuration
Basic Configuration
redis:
enabled: true
address: "redis:6379"
password: "your-password" # Optional
db: 0
keyPrefix: "traefikoidc:"
cacheMode: "hybrid"
All Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
enabled |
bool | false |
Enable Redis caching |
address |
string | - | Redis server address (host:port) |
password |
string | - | Redis password (optional) |
db |
int | 0 |
Redis database number (0-15) |
keyPrefix |
string | traefikoidc: |
Prefix for all Redis keys |
cacheMode |
string | redis |
Cache mode: memory, redis, hybrid |
poolSize |
int | 10 |
Connection pool size |
connectTimeout |
int | 5 |
Connection timeout (seconds) |
readTimeout |
int | 3 |
Read timeout (seconds) |
writeTimeout |
int | 3 |
Write timeout (seconds) |
enableTLS |
bool | false |
Enable TLS for connections |
tlsSkipVerify |
bool | false |
Skip TLS certificate verification |
enableCircuitBreaker |
bool | true |
Enable circuit breaker |
circuitBreakerThreshold |
int | 5 |
Failures before circuit opens |
circuitBreakerTimeout |
int | 60 |
Circuit reset timeout (seconds) |
enableHealthCheck |
bool | true |
Enable periodic health checks |
healthCheckInterval |
int | 30 |
Health check interval (seconds) |
hybridL1Size |
int | 500 |
Max items in L1 cache (hybrid mode) |
hybridL1MemoryMB |
int64 | 10 |
Max memory for L1 cache in MB |
Environment Variables (Fallback)
If not configured through Traefik, these environment variables are used:
REDIS_ENABLED=true
REDIS_ADDRESS=redis:6379
REDIS_PASSWORD=your-password
REDIS_DB=0
REDIS_KEY_PREFIX=traefikoidc:
REDIS_CACHE_MODE=hybrid
REDIS_POOL_SIZE=10
REDIS_CONNECT_TIMEOUT=5
REDIS_READ_TIMEOUT=3
REDIS_WRITE_TIMEOUT=3
REDIS_ENABLE_TLS=false
REDIS_TLS_SKIP_VERIFY=false
Cache Modes
Memory Mode (Default without Redis)
redis:
cacheMode: "memory"
- Uses only in-memory cache
- Suitable for single-instance deployments
- No Redis dependency
- Fastest performance
Redis Mode
redis:
enabled: true
address: "redis:6379"
cacheMode: "redis"
- All operations go directly to Redis
- Ensures consistency across replicas
- Slightly higher latency
Hybrid Mode (Recommended)
redis:
enabled: true
address: "redis:6379"
cacheMode: "hybrid"
Two-tier caching strategy:
┌─────────────────────────────────────────┐
│ Client Request │
└────────────────┬────────────────────────┘
▼
┌────────────────┐
│ Local Cache │ ← L1 Cache (Fast)
│ (Memory) │
└────────┬───────┘
│ Miss
▼
┌────────────────┐
│ Remote Cache │ ← L2 Cache (Shared)
│ (Redis) │
└────────────────┘
Read Path:
- Check local memory cache (L1)
- On miss, check Redis (L2)
- On hit in Redis, populate L1
- Return value
Write Path:
- Write to Redis (L2) for durability
- Write to local cache (L1) for speed
Performance Comparison
| Operation | Memory Mode | Redis Mode | Hybrid Mode |
|---|---|---|---|
| Read (p50) | 0.1ms | 2ms | 0.2ms |
| Read (p99) | 0.5ms | 10ms | 5ms |
| Write (p50) | 0.2ms | 3ms | 3ms |
| Throughput | 100k/s | 20k/s | 80k/s |
Deployment Examples
Docker Compose
version: '3.8'
services:
redis:
image: redis:7-alpine
command: redis-server --requirepass ${REDIS_PASSWORD}
volumes:
- redis-data:/data
healthcheck:
test: ["CMD", "redis-cli", "--raw", "incr", "ping"]
interval: 30s
timeout: 3s
retries: 3
traefik:
image: traefik:v3.2
deploy:
replicas: 3
labels:
- "traefik.http.middlewares.oidc.plugin.traefikoidc.redis.enabled=true"
- "traefik.http.middlewares.oidc.plugin.traefikoidc.redis.address=redis:6379"
- "traefik.http.middlewares.oidc.plugin.traefikoidc.redis.password=${REDIS_PASSWORD}"
- "traefik.http.middlewares.oidc.plugin.traefikoidc.redis.cacheMode=hybrid"
depends_on:
redis:
condition: service_healthy
volumes:
redis-data:
Kubernetes
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: oidc-with-redis
spec:
plugin:
traefikoidc:
providerURL: https://accounts.google.com
clientID: your-client-id
clientSecret: your-client-secret
sessionEncryptionKey: your-encryption-key
callbackURL: /oauth2/callback
redis:
enabled: true
address: "redis-service.redis-namespace:6379"
password: "urn:k8s:secret:redis-secret:password"
db: 0
keyPrefix: "traefikoidc:"
cacheMode: "hybrid"
poolSize: 20
enableCircuitBreaker: true
circuitBreakerThreshold: 5
AWS ElastiCache
redis:
enabled: true
address: "your-cache.abc123.cache.amazonaws.com:6379"
cacheMode: "hybrid"
enableTLS: true
password: "your-elasticache-auth-token"
Performance Tuning
Connection Pool Sizing
redis:
poolSize: 20 # Formula: 2 * CPU cores * replicas
# For 4 cores, 3 replicas: poolSize = 24
TTL Strategy
The plugin automatically sets TTLs based on token lifetimes:
- JTI Cache: Matches token lifetime (typically 1 hour)
- Session: Matches
sessionMaxAgeconfiguration - Token Metadata: 5 minutes (short-lived)
Redis Server Configuration
# Recommended Redis settings for cache
maxmemory 512mb
maxmemory-policy allkeys-lru # Evict least recently used
# For cache data, disable persistence for better performance
save ""
appendonly no
Hybrid Mode Tuning
redis:
cacheMode: "hybrid"
hybridL1Size: 500 # Max items in local cache
hybridL1MemoryMB: 10 # Max memory for local cache
Monitoring
Key Metrics
- Cache hit rate (target: >90% for hybrid mode)
- Redis latency (target: <10ms p99)
- Circuit breaker state
- **Connection pool utilization
Redis Commands for Monitoring
# Monitor commands in real-time
redis-cli MONITOR
# Check slow queries
redis-cli SLOWLOG GET 10
# Memory usage
redis-cli INFO memory
# Key statistics
redis-cli DBSIZE
# List keys with prefix
redis-cli --scan --pattern "traefikoidc:*"
# Check key TTL
redis-cli TTL "traefikoidc:session:abc123"
Health Check Endpoint
The plugin provides health information including:
{
"status": "healthy",
"cache": {
"mode": "hybrid",
"redis": {
"connected": true,
"latency": "2ms"
},
"circuit_breaker": {
"state": "closed",
"failures": 0
}
}
}
Troubleshooting
Connection Refused
Symptoms: dial tcp: connection refused
Solutions:
- Verify Redis is running:
redis-cli ping - Check network connectivity:
telnet redis-host 6379 - Verify address configuration
Authentication Failure
Symptoms: NOAUTH Authentication required
Solutions:
- Set Redis password in configuration
- Verify password is correct
Circuit Breaker Open
Symptoms: Circuit breaker is open, falling back to memory
Solutions:
- Check Redis health:
redis-cli INFO server - Review network latency:
redis-cli --latency - Adjust circuit breaker thresholds if needed
High Memory Usage
Symptoms: Redis memory constantly growing, OOM errors
Solutions:
- Configure eviction policy:
CONFIG SET maxmemory 512mb CONFIG SET maxmemory-policy allkeys-lru - Review key count:
redis-cli DBSIZE - Check for large keys:
redis-cli --bigkeys
Inconsistent Cache State
Symptoms: Different responses from different replicas
Solutions:
- Verify all instances use the same Redis address
- Check cache mode consistency across instances
- Verify time synchronization on all hosts
Migration Guide
From Memory-Only to Redis
Phase 1: Preparation
- Deploy Redis infrastructure
- Test Redis connectivity
- Configure monitoring
Phase 2: Gradual Rollout
- Enable Redis on one instance:
redis: enabled: true address: "redis:6379" cacheMode: "hybrid" - Monitor for errors
- Gradually enable on more instances
Phase 3: Full Migration
- Enable Redis on all instances
- Remove
disableReplayDetection: trueif set - Monitor for issues
Rollback Plan
If issues occur:
- Set
redis.enabled: false - Plugin falls back to memory cache automatically
- Investigate and resolve issues
Migration Checklist
- Redis deployed and accessible
- Redis password configured
- Network connectivity verified
- Monitoring configured
- Backup plan prepared
- Test environment validated
- Gradual rollout planned
Best Practices
Security
- Always use Redis password authentication
- Enable TLS for production deployments
- Use network segmentation (private subnets)
- Rotate Redis passwords regularly
High Availability
- Use Redis Sentinel or Cluster for HA
- Configure appropriate circuit breaker thresholds
- Implement proper health checks
- Use connection pooling
Performance
- Use hybrid cache mode for best performance
- Monitor cache hit rates
- Size Redis memory appropriately
- Disable persistence for cache-only usage
Operations
- Implement comprehensive monitoring
- Set up alerting for circuit breaker state
- Document Redis configuration
- Test failover scenarios
FAQ
Is Redis required?
No, Redis is optional. The plugin works with in-memory cache for single-instance deployments.
What happens if Redis goes down?
The circuit breaker opens after threshold failures, and the plugin falls back to in-memory cache. It periodically attempts to reconnect.
Which cache mode should I use?
For production multi-replica deployments, use hybrid mode for best performance and consistency.
How much memory does Redis need?
Depends on active sessions and token sizes:
- Small (1-1000 users): 128MB
- Medium (1000-10000 users): 256-512MB
- Large (10000+ users): 1GB+
Can I use managed Redis services?
Yes, the plugin works with AWS ElastiCache, Azure Cache for Redis, Google Cloud Memorystore, and Redis Enterprise Cloud.
Is data encrypted in Redis?
Session data is encrypted before storing using sessionEncryptionKey. Additionally, you can enable TLS for Redis connections.