mirror of
https://github.com/lukaszraczylo/traefikoidc.git
synced 2026-06-05 22:44:17 +00:00
e64fc7f730
* Add redis support for distributed caching * Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi. * fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi. * fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi. * fixup! fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi. * fixup! fixup! fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi. * fixup! fixup! fixup! fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi. * ... and another all nighter. * fixup! ... and another all nighter. * fixup! fixup! ... and another all nighter. * fixup! fixup! fixup! ... and another all nighter. * Resolve issue #85 by adding ability to set custom claims in JWT tokens * Remove redundant validation in auth middleware ( issue #89 ) * Add ability to set cookie prefix for session cookies ( #87 ) * fixup! Add ability to set cookie prefix for session cookies ( #87 ) * Add ability to set cookie max age - issue #91 * Potential fix for code scanning alert no. 10: Size computation for allocation may overflow Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * fixup! Merge main into 0.8.0-redis: resolve conflicts --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
1125 lines
28 KiB
Markdown
1125 lines
28 KiB
Markdown
# Redis Cache for Traefik OIDC Plugin
|
|
|
|
## Table of Contents
|
|
|
|
- [Overview](#overview)
|
|
- [Why Use Redis Cache?](#why-use-redis-cache)
|
|
- [Architecture](#architecture)
|
|
- [Configuration Reference](#configuration-reference)
|
|
- [Deployment Scenarios](#deployment-scenarios)
|
|
- [Performance Tuning](#performance-tuning)
|
|
- [Monitoring and Observability](#monitoring-and-observability)
|
|
- [Troubleshooting](#troubleshooting)
|
|
- [Migration Guide](#migration-guide)
|
|
- [Best Practices](#best-practices)
|
|
- [FAQ](#faq)
|
|
|
|
## Overview
|
|
|
|
The Redis cache feature provides a distributed caching solution for the Traefik OIDC plugin, enabling seamless operation across multiple Traefik instances. It implements a pluggable backend architecture that supports memory-only, Redis-only, or hybrid caching strategies.
|
|
|
|
### Key Features
|
|
|
|
- **Distributed JTI Replay Detection**: Prevents token replay attacks across all instances
|
|
- **Shared Session Management**: Consistent user sessions across replicas
|
|
- **Circuit Breaker**: Automatic fallback to memory cache during Redis outages
|
|
- **Health Checking**: Continuous monitoring of Redis connectivity
|
|
- **Flexible Cache Modes**: Choose between memory, Redis, or hybrid caching
|
|
- **Zero-Downtime Migration**: Seamlessly migrate from memory-only to Redis-backed cache
|
|
- **Yaegi Compatible**: Pure-Go implementation works with both dynamic loading and pre-compiled deployments
|
|
|
|
### ✨ Pure-Go Implementation
|
|
|
|
This plugin implements Redis support using a **custom pure-Go RESP protocol client** that is fully compatible with Traefik's Yaegi interpreter. Unlike other Redis clients that rely on the `unsafe` package, our implementation:
|
|
|
|
- Works seamlessly with Yaegi's dynamic plugin loading
|
|
- Provides full Redis functionality (GET, SET, DEL, TTL, etc.)
|
|
- Includes connection pooling for performance
|
|
- Supports both SETEX (seconds) and PSETEX (milliseconds) for precise TTL control
|
|
- No external dependencies beyond the standard library
|
|
|
|
This means you get **full Redis caching support whether you're using**:
|
|
- ✅ Traefik's dynamic plugin loading (Yaegi interpreter)
|
|
- ✅ Pre-compiled Traefik builds with the plugin included
|
|
|
|
## Why Use Redis Cache?
|
|
|
|
### The Problem
|
|
|
|
When running multiple Traefik instances behind a load balancer, each instance maintains its own isolated in-memory cache. This isolation causes several issues:
|
|
|
|
1. **False Positive Replay Detection**
|
|
- User authenticates → Token stored in Instance A's JTI cache
|
|
- Next request → Load balancer routes to Instance B
|
|
- Instance B doesn't have the JTI → Falsely detects replay attack
|
|
- Result: Authentication failures and user frustration
|
|
|
|
2. **Session Inconsistency**
|
|
- User session created on Instance A
|
|
- Subsequent request routed to Instance B
|
|
- Instance B has no knowledge of the session
|
|
- Result: User forced to re-authenticate
|
|
|
|
3. **Token Metadata Fragmentation**
|
|
- Token refresh happens on Instance A
|
|
- New tokens stored only in Instance A's cache
|
|
- Other instances continue using old tokens
|
|
- Result: Inconsistent authentication state
|
|
|
|
### The Solution
|
|
|
|
Redis provides a centralized cache that all Traefik instances can share:
|
|
|
|
```
|
|
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
|
|
│ Traefik #1 │ │ Traefik #2 │ │ Traefik #3 │
|
|
│ (Plugin) │ │ (Plugin) │ │ (Plugin) │
|
|
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
|
|
│ │ │
|
|
└────────────────────┼────────────────────┘
|
|
│
|
|
┌──────▼──────┐
|
|
│ Redis │
|
|
│ (Shared │
|
|
│ Cache) │
|
|
└─────────────┘
|
|
```
|
|
|
|
### Benefits
|
|
|
|
- **Consistent Authentication**: All instances share the same authentication state
|
|
- **True Replay Detection**: JTI cache shared across all instances
|
|
- **Seamless Scaling**: Add/remove instances without affecting user sessions
|
|
- **High Availability**: Built-in resilience with circuit breakers and fallback
|
|
- **Performance**: Hybrid mode provides local caching with Redis synchronization
|
|
|
|
## Architecture
|
|
|
|
### Cache Backend Interface
|
|
|
|
The plugin implements a pluggable cache backend architecture:
|
|
|
|
```go
|
|
type CacheBackend interface {
|
|
Get(ctx context.Context, key string) ([]byte, error)
|
|
Set(ctx context.Context, key string, value []byte, ttl time.Duration) error
|
|
Delete(ctx context.Context, key string) error
|
|
Exists(ctx context.Context, key string) (bool, error)
|
|
Clear(ctx context.Context) error
|
|
Health(ctx context.Context) error
|
|
}
|
|
```
|
|
|
|
### Cache Implementations
|
|
|
|
#### 1. Memory Backend (Default)
|
|
- **Use Case**: Single-instance deployments
|
|
- **Pros**: Fast, no external dependencies
|
|
- **Cons**: Not suitable for multi-replica deployments
|
|
|
|
#### 2. Redis Backend
|
|
- **Use Case**: Multi-replica deployments requiring shared state
|
|
- **Pros**: Distributed, persistent, scalable
|
|
- **Cons**: External dependency, network latency
|
|
|
|
#### 3. Hybrid Backend
|
|
- **Use Case**: High-performance multi-replica deployments
|
|
- **Pros**: Best of both worlds - speed + distribution
|
|
- **Cons**: More complex, requires tuning
|
|
|
|
### Hybrid Cache Architecture
|
|
|
|
The hybrid cache implements a two-tier caching strategy:
|
|
|
|
```
|
|
┌─────────────────────────────────────────┐
|
|
│ Client Request │
|
|
└────────────────┬────────────────────────┘
|
|
▼
|
|
┌────────────────┐
|
|
│ Local Cache │ ← L1 Cache (Fast)
|
|
│ (Memory) │
|
|
└────────┬───────┘
|
|
│ Miss
|
|
▼
|
|
┌────────────────┐
|
|
│ Remote Cache │ ← L2 Cache (Shared)
|
|
│ (Redis) │
|
|
└────────────────┘
|
|
```
|
|
|
|
**Read Path**:
|
|
1. Check local memory cache (L1)
|
|
2. On miss, check Redis (L2)
|
|
3. On hit in Redis, populate L1
|
|
4. Return value
|
|
|
|
**Write Path**:
|
|
1. Write to Redis (L2) for durability
|
|
2. Write to local cache (L1) for speed
|
|
3. Broadcast invalidation to other instances (future enhancement)
|
|
|
|
### Circuit Breaker Pattern
|
|
|
|
The Redis backend implements a circuit breaker to handle Redis failures gracefully:
|
|
|
|
```
|
|
States: CLOSED → OPEN → HALF-OPEN → CLOSED
|
|
|
|
CLOSED (Normal Operation):
|
|
- All requests go to Redis
|
|
- Track failures
|
|
- Open circuit after threshold
|
|
|
|
OPEN (Redis Down):
|
|
- Fail fast, don't attempt Redis
|
|
- Fall back to memory cache
|
|
- Wait for recovery timeout
|
|
|
|
HALF-OPEN (Testing Recovery):
|
|
- Allow limited requests to Redis
|
|
- If successful, close circuit
|
|
- If failures continue, re-open
|
|
```
|
|
|
|
## Configuration Reference
|
|
|
|
### Plugin Configuration
|
|
|
|
```yaml
|
|
apiVersion: traefik.io/v1alpha1
|
|
kind: Middleware
|
|
metadata:
|
|
name: oidc-with-redis
|
|
spec:
|
|
plugin:
|
|
traefikoidc:
|
|
# Standard OIDC configuration
|
|
providerURL: https://accounts.google.com
|
|
clientID: your-client-id
|
|
clientSecret: your-client-secret
|
|
sessionEncryptionKey: your-encryption-key
|
|
callbackURL: /oauth2/callback
|
|
|
|
# Redis cache configuration
|
|
redis:
|
|
enabled: true # Enable Redis cache
|
|
address: "redis.example.com:6379" # Redis server address
|
|
password: "your-redis-password" # Optional: Redis password
|
|
db: 0 # Redis database number (0-15)
|
|
keyPrefix: "traefikoidc" # Prefix for all keys
|
|
cacheMode: "hybrid" # Cache mode: memory|redis|hybrid
|
|
|
|
# Connection pool settings
|
|
maxRetries: 3 # Max retry attempts
|
|
poolSize: 10 # Connection pool size
|
|
minIdleConns: 5 # Minimum idle connections
|
|
maxConnAge: 3600 # Max connection age (seconds)
|
|
poolTimeout: 4 # Pool timeout (seconds)
|
|
idleTimeout: 900 # Idle timeout (seconds)
|
|
|
|
# Timeouts
|
|
dialTimeout: 5 # Connection timeout (seconds)
|
|
readTimeout: 3 # Read timeout (seconds)
|
|
writeTimeout: 3 # Write timeout (seconds)
|
|
|
|
# Circuit breaker settings
|
|
circuitBreakerThreshold: 5 # Failures before opening
|
|
circuitBreakerTimeout: 60 # Recovery timeout (seconds)
|
|
|
|
# TLS configuration (optional)
|
|
tls:
|
|
enabled: true
|
|
certFile: "/path/to/cert.pem"
|
|
keyFile: "/path/to/key.pem"
|
|
caFile: "/path/to/ca.pem"
|
|
insecureSkipVerify: false
|
|
```
|
|
|
|
### Environment Variables
|
|
|
|
All Redis settings can be configured via environment variables:
|
|
|
|
```bash
|
|
# Basic Configuration
|
|
export REDIS_ENABLED=true
|
|
export REDIS_ADDRESS=redis.example.com:6379
|
|
export REDIS_PASSWORD=your-password
|
|
export REDIS_DB=0
|
|
export REDIS_KEY_PREFIX=traefikoidc
|
|
export REDIS_CACHE_MODE=hybrid
|
|
|
|
# Connection Pool
|
|
export REDIS_MAX_RETRIES=3
|
|
export REDIS_POOL_SIZE=10
|
|
export REDIS_MIN_IDLE_CONNS=5
|
|
export REDIS_MAX_CONN_AGE=3600
|
|
export REDIS_POOL_TIMEOUT=4
|
|
export REDIS_IDLE_TIMEOUT=900
|
|
|
|
# Timeouts
|
|
export REDIS_DIAL_TIMEOUT=5
|
|
export REDIS_READ_TIMEOUT=3
|
|
export REDIS_WRITE_TIMEOUT=3
|
|
|
|
# Circuit Breaker
|
|
export REDIS_CIRCUIT_BREAKER_THRESHOLD=5
|
|
export REDIS_CIRCUIT_BREAKER_TIMEOUT=60
|
|
|
|
# TLS
|
|
export REDIS_TLS_ENABLED=true
|
|
export REDIS_TLS_CERT_FILE=/path/to/cert.pem
|
|
export REDIS_TLS_KEY_FILE=/path/to/key.pem
|
|
export REDIS_TLS_CA_FILE=/path/to/ca.pem
|
|
export REDIS_TLS_INSECURE_SKIP_VERIFY=false
|
|
```
|
|
|
|
### Cache Modes Explained
|
|
|
|
#### Memory Mode (Default)
|
|
```yaml
|
|
redis:
|
|
cacheMode: "memory" # or omit redis config entirely
|
|
```
|
|
- Uses only in-memory cache
|
|
- Suitable for single-instance deployments
|
|
- No Redis dependency
|
|
|
|
#### Redis Mode
|
|
```yaml
|
|
redis:
|
|
enabled: true
|
|
address: "redis:6379"
|
|
cacheMode: "redis"
|
|
```
|
|
- All cache operations go directly to Redis
|
|
- No local caching
|
|
- Ensures consistency but higher latency
|
|
|
|
#### Hybrid Mode (Recommended for Production)
|
|
```yaml
|
|
redis:
|
|
enabled: true
|
|
address: "redis:6379"
|
|
cacheMode: "hybrid"
|
|
```
|
|
- Local memory cache for fast reads
|
|
- Redis for shared state and persistence
|
|
- Best performance with consistency
|
|
|
|
## Deployment Scenarios
|
|
|
|
### Single Instance Deployment
|
|
|
|
For single Traefik instance deployments, Redis is optional:
|
|
|
|
```yaml
|
|
# No Redis configuration needed
|
|
# Plugin uses in-memory cache by default
|
|
spec:
|
|
plugin:
|
|
traefikoidc:
|
|
providerURL: https://accounts.google.com
|
|
# ... other config
|
|
# Redis not configured - uses memory cache
|
|
```
|
|
|
|
### Multi-Replica with Docker Compose
|
|
|
|
```yaml
|
|
version: '3.8'
|
|
|
|
services:
|
|
redis:
|
|
image: redis:7-alpine
|
|
command: >
|
|
redis-server
|
|
--requirepass ${REDIS_PASSWORD}
|
|
--maxmemory 256mb
|
|
--maxmemory-policy allkeys-lru
|
|
volumes:
|
|
- redis-data:/data
|
|
healthcheck:
|
|
test: ["CMD", "redis-cli", "--raw", "incr", "ping"]
|
|
interval: 30s
|
|
timeout: 3s
|
|
retries: 3
|
|
networks:
|
|
- traefik-net
|
|
|
|
traefik:
|
|
image: traefik:v3.2
|
|
deploy:
|
|
replicas: 3
|
|
update_config:
|
|
parallelism: 1
|
|
delay: 10s
|
|
restart_policy:
|
|
condition: on-failure
|
|
environment:
|
|
- REDIS_ENABLED=true
|
|
- REDIS_ADDRESS=redis:6379
|
|
- REDIS_PASSWORD=${REDIS_PASSWORD}
|
|
- REDIS_CACHE_MODE=hybrid
|
|
- REDIS_KEY_PREFIX=traefikoidc
|
|
volumes:
|
|
- ./traefik.yml:/etc/traefik/traefik.yml:ro
|
|
- ./dynamic.yml:/etc/traefik/dynamic.yml:ro
|
|
networks:
|
|
- traefik-net
|
|
depends_on:
|
|
redis:
|
|
condition: service_healthy
|
|
|
|
volumes:
|
|
redis-data:
|
|
|
|
networks:
|
|
traefik-net:
|
|
driver: overlay
|
|
attachable: true
|
|
```
|
|
|
|
### Kubernetes with Redis Operator
|
|
|
|
```yaml
|
|
# Install Redis operator
|
|
kubectl apply -f https://raw.githubusercontent.com/spotahome/redis-operator/master/manifests/databases.spotahome.com_redis_crd.yaml
|
|
kubectl apply -f https://raw.githubusercontent.com/spotahome/redis-operator/master/manifests/databases.spotahome.com_redisfailovers_crd.yaml
|
|
|
|
---
|
|
# Redis Failover for HA
|
|
apiVersion: databases.spotahome.com/v1
|
|
kind: RedisFailover
|
|
metadata:
|
|
name: traefikoidc-redis
|
|
namespace: traefik
|
|
spec:
|
|
sentinel:
|
|
replicas: 3
|
|
resources:
|
|
requests:
|
|
memory: 100Mi
|
|
limits:
|
|
memory: 200Mi
|
|
redis:
|
|
replicas: 3
|
|
resources:
|
|
requests:
|
|
memory: 500Mi
|
|
limits:
|
|
memory: 1Gi
|
|
config:
|
|
maxmemory: 512mb
|
|
maxmemory-policy: allkeys-lru
|
|
|
|
---
|
|
# ConfigMap for Redis configuration
|
|
apiVersion: v1
|
|
kind: ConfigMap
|
|
metadata:
|
|
name: traefik-oidc-redis-config
|
|
namespace: traefik
|
|
data:
|
|
REDIS_ENABLED: "true"
|
|
REDIS_ADDRESS: "rfs-traefikoidc-redis:6379"
|
|
REDIS_CACHE_MODE: "hybrid"
|
|
REDIS_KEY_PREFIX: "traefikoidc"
|
|
REDIS_POOL_SIZE: "20"
|
|
REDIS_CIRCUIT_BREAKER_THRESHOLD: "5"
|
|
REDIS_CIRCUIT_BREAKER_TIMEOUT: "60"
|
|
|
|
---
|
|
# Secret for Redis password
|
|
apiVersion: v1
|
|
kind: Secret
|
|
metadata:
|
|
name: traefik-oidc-redis-secret
|
|
namespace: traefik
|
|
type: Opaque
|
|
data:
|
|
REDIS_PASSWORD: <base64-encoded-password>
|
|
|
|
---
|
|
# Traefik Deployment
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
metadata:
|
|
name: traefik
|
|
namespace: traefik
|
|
spec:
|
|
replicas: 3
|
|
selector:
|
|
matchLabels:
|
|
app: traefik
|
|
template:
|
|
metadata:
|
|
labels:
|
|
app: traefik
|
|
spec:
|
|
containers:
|
|
- name: traefik
|
|
image: traefik:v3.2
|
|
envFrom:
|
|
- configMapRef:
|
|
name: traefik-oidc-redis-config
|
|
- secretRef:
|
|
name: traefik-oidc-redis-secret
|
|
ports:
|
|
- containerPort: 80
|
|
- containerPort: 443
|
|
volumeMounts:
|
|
- name: config
|
|
mountPath: /etc/traefik
|
|
volumes:
|
|
- name: config
|
|
configMap:
|
|
name: traefik-config
|
|
|
|
---
|
|
# HorizontalPodAutoscaler
|
|
apiVersion: autoscaling/v2
|
|
kind: HorizontalPodAutoscaler
|
|
metadata:
|
|
name: traefik-hpa
|
|
namespace: traefik
|
|
spec:
|
|
scaleTargetRef:
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
name: traefik
|
|
minReplicas: 3
|
|
maxReplicas: 10
|
|
metrics:
|
|
- type: Resource
|
|
resource:
|
|
name: cpu
|
|
target:
|
|
type: Utilization
|
|
averageUtilization: 70
|
|
- type: Resource
|
|
resource:
|
|
name: memory
|
|
target:
|
|
type: Utilization
|
|
averageUtilization: 80
|
|
```
|
|
|
|
### AWS ECS with ElastiCache
|
|
|
|
```json
|
|
{
|
|
"family": "traefik-oidc",
|
|
"taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
|
|
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsExecutionRole",
|
|
"networkMode": "awsvpc",
|
|
"containerDefinitions": [
|
|
{
|
|
"name": "traefik",
|
|
"image": "traefik:v3.2",
|
|
"essential": true,
|
|
"environment": [
|
|
{
|
|
"name": "REDIS_ENABLED",
|
|
"value": "true"
|
|
},
|
|
{
|
|
"name": "REDIS_ADDRESS",
|
|
"value": "traefikoidc-cache.abc123.ng.0001.use1.cache.amazonaws.com:6379"
|
|
},
|
|
{
|
|
"name": "REDIS_CACHE_MODE",
|
|
"value": "hybrid"
|
|
},
|
|
{
|
|
"name": "REDIS_KEY_PREFIX",
|
|
"value": "traefikoidc"
|
|
},
|
|
{
|
|
"name": "REDIS_TLS_ENABLED",
|
|
"value": "true"
|
|
}
|
|
],
|
|
"secrets": [
|
|
{
|
|
"name": "REDIS_PASSWORD",
|
|
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:redis-password"
|
|
}
|
|
],
|
|
"portMappings": [
|
|
{
|
|
"containerPort": 80,
|
|
"protocol": "tcp"
|
|
}
|
|
],
|
|
"logConfiguration": {
|
|
"logDriver": "awslogs",
|
|
"options": {
|
|
"awslogs-group": "/ecs/traefik",
|
|
"awslogs-region": "us-east-1",
|
|
"awslogs-stream-prefix": "ecs"
|
|
}
|
|
}
|
|
}
|
|
],
|
|
"requiresCompatibilities": ["FARGATE"],
|
|
"cpu": "512",
|
|
"memory": "1024"
|
|
}
|
|
```
|
|
|
|
### Redis Cluster Configuration
|
|
|
|
For high-throughput environments, use Redis Cluster:
|
|
|
|
```yaml
|
|
# Redis Cluster configuration
|
|
redis:
|
|
enabled: true
|
|
# Provide one or more cluster nodes
|
|
address: "redis-cluster-1:6379,redis-cluster-2:6379,redis-cluster-3:6379"
|
|
cacheMode: "redis" # Use redis mode for cluster
|
|
clusterMode: true
|
|
|
|
# Cluster-specific settings
|
|
maxRedirects: 3 # Maximum cluster redirects
|
|
readOnly: false # Allow reads from replicas
|
|
routeByLatency: true # Route to fastest node
|
|
routeRandomly: false # Random routing
|
|
```
|
|
|
|
## Performance Tuning
|
|
|
|
### Key Design Patterns
|
|
|
|
#### 1. TTL Strategy
|
|
```yaml
|
|
# Recommended TTL values
|
|
JTI_CACHE_TTL: 3600 # 1 hour - matches token lifetime
|
|
SESSION_TTL: 86400 # 24 hours - user session duration
|
|
TOKEN_METADATA_TTL: 300 # 5 minutes - short-lived metadata
|
|
```
|
|
|
|
#### 2. Connection Pool Optimization
|
|
```yaml
|
|
redis:
|
|
poolSize: 10 # Base formula: 2 * CPU cores
|
|
minIdleConns: 5 # 50% of poolSize
|
|
maxConnAge: 3600 # Rotate connections hourly
|
|
idleTimeout: 900 # Close idle connections after 15 min
|
|
```
|
|
|
|
#### 3. Memory Management
|
|
```bash
|
|
# Redis memory configuration
|
|
maxmemory 512mb # Set appropriate limit
|
|
maxmemory-policy allkeys-lru # Evict least recently used
|
|
```
|
|
|
|
### Benchmarking Results
|
|
|
|
Performance comparison across cache modes:
|
|
|
|
| Operation | Memory Mode | Redis Mode | Hybrid Mode |
|
|
|-----------|------------|------------|-------------|
|
|
| Read (p50) | 0.1ms | 2ms | 0.2ms |
|
|
| Read (p99) | 0.5ms | 10ms | 5ms |
|
|
| Write (p50) | 0.2ms | 3ms | 3ms |
|
|
| Write (p99) | 1ms | 15ms | 15ms |
|
|
| Throughput | 100k/s | 20k/s | 80k/s |
|
|
|
|
### Optimization Tips
|
|
|
|
1. **Use Hybrid Mode for Production**
|
|
- Provides best balance of speed and consistency
|
|
- Local cache reduces Redis load by 70-80%
|
|
|
|
2. **Configure Connection Pooling**
|
|
```yaml
|
|
redis:
|
|
poolSize: 20 # For high traffic
|
|
minIdleConns: 10 # Maintain warm connections
|
|
```
|
|
|
|
3. **Enable Pipelining** (Future Enhancement)
|
|
- Batch multiple operations
|
|
- Reduces round-trip latency
|
|
|
|
4. **Monitor Redis Memory**
|
|
```bash
|
|
redis-cli INFO memory
|
|
# used_memory_human:250.34M
|
|
# used_memory_peak_human:512.00M
|
|
# maxmemory_policy:allkeys-lru
|
|
```
|
|
|
|
5. **Use Redis Persistence Wisely**
|
|
```bash
|
|
# For cache data, disable persistence for better performance
|
|
save ""
|
|
appendonly no
|
|
```
|
|
|
|
## Monitoring and Observability
|
|
|
|
### Key Metrics to Monitor
|
|
|
|
#### Application Metrics
|
|
- Cache hit rate (target: >90% for hybrid mode)
|
|
- Cache operation latency (p50, p95, p99)
|
|
- Circuit breaker state and transitions
|
|
- Redis connection pool utilization
|
|
|
|
#### Redis Metrics
|
|
```bash
|
|
# Monitor with redis-cli
|
|
redis-cli --stat
|
|
|
|
# Key metrics:
|
|
# - Connected clients
|
|
# - Ops/sec
|
|
# - Network I/O
|
|
# - Memory usage
|
|
# - Evicted keys
|
|
```
|
|
|
|
### Prometheus Metrics
|
|
|
|
Export metrics for Prometheus monitoring:
|
|
|
|
```yaml
|
|
# Grafana dashboard for visualization
|
|
apiVersion: v1
|
|
kind: ConfigMap
|
|
metadata:
|
|
name: traefik-oidc-dashboard
|
|
data:
|
|
dashboard.json: |
|
|
{
|
|
"panels": [
|
|
{
|
|
"title": "Cache Hit Rate",
|
|
"targets": [
|
|
{
|
|
"expr": "rate(traefikoidc_cache_hits_total[5m]) / rate(traefikoidc_cache_requests_total[5m])"
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"title": "Redis Latency",
|
|
"targets": [
|
|
{
|
|
"expr": "histogram_quantile(0.99, traefikoidc_redis_operation_duration_seconds_bucket)"
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"title": "Circuit Breaker State",
|
|
"targets": [
|
|
{
|
|
"expr": "traefikoidc_circuit_breaker_state"
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### Logging
|
|
|
|
Enable debug logging for troubleshooting:
|
|
|
|
```yaml
|
|
# Plugin configuration
|
|
logLevel: debug
|
|
|
|
# Log entries to watch:
|
|
# - "Redis cache initialized"
|
|
# - "Circuit breaker opened"
|
|
# - "Falling back to memory cache"
|
|
# - "Redis connection restored"
|
|
```
|
|
|
|
### Health Checks
|
|
|
|
Implement health check endpoints:
|
|
|
|
```go
|
|
// Health check endpoint response
|
|
{
|
|
"status": "healthy",
|
|
"cache": {
|
|
"mode": "hybrid",
|
|
"redis": {
|
|
"connected": true,
|
|
"latency": "2ms",
|
|
"pool": {
|
|
"active": 5,
|
|
"idle": 5,
|
|
"total": 10
|
|
}
|
|
},
|
|
"memory": {
|
|
"entries": 1000,
|
|
"size": "50MB"
|
|
},
|
|
"circuit_breaker": {
|
|
"state": "closed",
|
|
"failures": 0
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues and Solutions
|
|
|
|
#### Issue 1: "Redis connection refused"
|
|
|
|
**Symptoms:**
|
|
- Logs show "dial tcp: connection refused"
|
|
- Circuit breaker opens immediately
|
|
|
|
**Solutions:**
|
|
1. Verify Redis is running:
|
|
```bash
|
|
redis-cli ping
|
|
# Should return: PONG
|
|
```
|
|
|
|
2. Check network connectivity:
|
|
```bash
|
|
telnet redis-host 6379
|
|
```
|
|
|
|
3. Verify Redis address in configuration:
|
|
```yaml
|
|
redis:
|
|
address: "redis:6379" # Ensure correct host:port
|
|
```
|
|
|
|
#### Issue 2: "Authentication failure"
|
|
|
|
**Symptoms:**
|
|
- Logs show "NOAUTH Authentication required"
|
|
|
|
**Solutions:**
|
|
1. Set Redis password:
|
|
```bash
|
|
export REDIS_PASSWORD=your-password
|
|
```
|
|
|
|
2. Or in configuration:
|
|
```yaml
|
|
redis:
|
|
password: "your-password"
|
|
```
|
|
|
|
#### Issue 3: "Circuit breaker open"
|
|
|
|
**Symptoms:**
|
|
- Logs show "Circuit breaker is open"
|
|
- Falls back to memory cache
|
|
|
|
**Solutions:**
|
|
1. Check Redis health:
|
|
```bash
|
|
redis-cli INFO server
|
|
```
|
|
|
|
2. Review circuit breaker settings:
|
|
```yaml
|
|
redis:
|
|
circuitBreakerThreshold: 10 # Increase threshold
|
|
circuitBreakerTimeout: 30 # Reduce timeout
|
|
```
|
|
|
|
3. Monitor Redis performance:
|
|
```bash
|
|
redis-cli --latency
|
|
```
|
|
|
|
#### Issue 4: "High memory usage"
|
|
|
|
**Symptoms:**
|
|
- Redis memory constantly growing
|
|
- OOM errors
|
|
|
|
**Solutions:**
|
|
1. Configure Redis eviction:
|
|
```bash
|
|
CONFIG SET maxmemory 512mb
|
|
CONFIG SET maxmemory-policy allkeys-lru
|
|
```
|
|
|
|
2. Review key expiration:
|
|
```yaml
|
|
# Ensure TTLs are set appropriately
|
|
SESSION_TTL: 86400 # Not too long
|
|
```
|
|
|
|
3. Monitor key count:
|
|
```bash
|
|
redis-cli DBSIZE
|
|
redis-cli --bigkeys
|
|
```
|
|
|
|
#### Issue 5: "Inconsistent cache state"
|
|
|
|
**Symptoms:**
|
|
- Different responses from different replicas
|
|
- Stale data being served
|
|
|
|
**Solutions:**
|
|
1. Ensure all instances use same Redis:
|
|
```yaml
|
|
redis:
|
|
address: "shared-redis:6379" # Same for all instances
|
|
```
|
|
|
|
2. Verify cache mode consistency:
|
|
```bash
|
|
# All instances should use same mode
|
|
export REDIS_CACHE_MODE=hybrid
|
|
```
|
|
|
|
3. Check time synchronization:
|
|
```bash
|
|
# Ensure all instances have synchronized time
|
|
timedatectl status
|
|
```
|
|
|
|
### Debug Commands
|
|
|
|
Useful Redis commands for debugging:
|
|
|
|
```bash
|
|
# Monitor all Redis commands in real-time
|
|
redis-cli MONITOR
|
|
|
|
# Check slow queries
|
|
redis-cli SLOWLOG GET 10
|
|
|
|
# Analyze memory usage
|
|
redis-cli MEMORY DOCTOR
|
|
|
|
# List all keys (careful in production)
|
|
redis-cli --scan --pattern "traefikoidc:*"
|
|
|
|
# Get key TTL
|
|
redis-cli TTL "traefikoidc:session:abc123"
|
|
|
|
# Check Redis info
|
|
redis-cli INFO all
|
|
```
|
|
|
|
## Migration Guide
|
|
|
|
### Migrating from Memory-Only to Redis
|
|
|
|
#### Phase 1: Preparation
|
|
1. Deploy Redis infrastructure
|
|
2. Test Redis connectivity
|
|
3. Configure monitoring
|
|
|
|
#### Phase 2: Gradual Rollout
|
|
1. Enable Redis on one instance:
|
|
```yaml
|
|
redis:
|
|
enabled: true
|
|
address: "redis:6379"
|
|
cacheMode: "hybrid"
|
|
```
|
|
|
|
2. Monitor performance and errors
|
|
|
|
3. Gradually enable on more instances
|
|
|
|
#### Phase 3: Full Migration
|
|
1. Enable Redis on all instances
|
|
2. Remove `disableReplayDetection: true` if set
|
|
3. Monitor for issues
|
|
|
|
#### Rollback Plan
|
|
If issues occur:
|
|
1. Disable Redis: `REDIS_ENABLED=false`
|
|
2. Falls back to memory cache automatically
|
|
3. Investigate and resolve issues
|
|
|
|
### Migration Checklist
|
|
|
|
- [ ] Redis deployed and accessible
|
|
- [ ] Redis password configured
|
|
- [ ] Network connectivity verified
|
|
- [ ] Monitoring configured
|
|
- [ ] Backup plan prepared
|
|
- [ ] Test environment validated
|
|
- [ ] Gradual rollout planned
|
|
- [ ] Team notified of changes
|
|
|
|
## Best Practices
|
|
|
|
### 1. Security
|
|
- Always use Redis password authentication
|
|
- Enable TLS for production deployments
|
|
- Use network segmentation (private subnets)
|
|
- Rotate Redis passwords regularly
|
|
|
|
### 2. High Availability
|
|
- Use Redis Sentinel or Cluster for HA
|
|
- Configure appropriate circuit breaker thresholds
|
|
- Implement proper health checks
|
|
- Use connection pooling
|
|
|
|
### 3. Performance
|
|
- Use hybrid cache mode for best performance
|
|
- Configure appropriate TTLs
|
|
- Monitor cache hit rates
|
|
- Size Redis memory appropriately
|
|
|
|
### 4. Operations
|
|
- Implement comprehensive monitoring
|
|
- Set up alerting for circuit breaker state
|
|
- Regular backup of Redis data (if persistence enabled)
|
|
- Document Redis configuration
|
|
|
|
### 5. Development
|
|
- Use memory mode for local development
|
|
- Test with Redis in staging environment
|
|
- Validate circuit breaker behavior
|
|
- Load test with expected traffic patterns
|
|
|
|
## FAQ
|
|
|
|
### Q: Is Redis required for the plugin to work?
|
|
|
|
**A:** No, Redis is optional. The plugin works perfectly with in-memory cache for single-instance deployments. Redis is only needed for multi-replica deployments to share cache state.
|
|
|
|
### Q: What happens if Redis goes down?
|
|
|
|
**A:** The plugin implements a circuit breaker pattern. When Redis becomes unavailable:
|
|
1. Circuit breaker opens after threshold failures
|
|
2. Plugin falls back to in-memory cache
|
|
3. Periodically attempts to reconnect to Redis
|
|
4. Resumes Redis operations when connection restored
|
|
|
|
### Q: Can I use Redis Cluster?
|
|
|
|
**A:** Yes, Redis Cluster is supported. Configure with multiple node addresses and enable cluster mode in the configuration.
|
|
|
|
### Q: What's the recommended cache mode?
|
|
|
|
**A:** For production multi-replica deployments, use `hybrid` mode. It provides the best balance of performance and consistency.
|
|
|
|
### Q: How much memory does Redis need?
|
|
|
|
**A:** Memory requirements depend on:
|
|
- Number of active sessions
|
|
- Token sizes
|
|
- TTL configurations
|
|
|
|
Typical sizing:
|
|
- Small (1-1000 users): 128MB
|
|
- Medium (1000-10000 users): 256MB-512MB
|
|
- Large (10000+ users): 1GB+
|
|
|
|
### Q: Can I use managed Redis services?
|
|
|
|
**A:** Yes, the plugin works with:
|
|
- AWS ElastiCache
|
|
- Azure Cache for Redis
|
|
- Google Cloud Memorystore
|
|
- Redis Enterprise Cloud
|
|
- Any Redis-compatible service
|
|
|
|
### Q: How do I monitor cache performance?
|
|
|
|
**A:** Monitor these key metrics:
|
|
- Cache hit rate (target >90%)
|
|
- Redis latency (target <10ms p99)
|
|
- Circuit breaker state
|
|
- Connection pool utilization
|
|
- Memory usage
|
|
|
|
### Q: Is data encrypted in Redis?
|
|
|
|
**A:** Session data is encrypted before storing in Redis using the `sessionEncryptionKey`. Additionally, you can enable TLS for Redis connections.
|
|
|
|
### Q: Can I migrate from memory to Redis without downtime?
|
|
|
|
**A:** Yes, the migration can be done without downtime:
|
|
1. Deploy Redis
|
|
2. Enable Redis on instances gradually
|
|
3. Monitor for issues
|
|
4. Complete migration
|
|
|
|
### Q: What Redis versions are supported?
|
|
|
|
**A:** The plugin supports Redis 5.0 and later. Redis 6.0+ is recommended for production use.
|
|
|
|
### Q: How do I handle Redis password rotation?
|
|
|
|
**A:** Password rotation strategy:
|
|
1. Update secret in secret management system
|
|
2. Rolling restart of Traefik instances
|
|
3. Each instance picks up new password on restart
|
|
4. No authentication failures during rotation
|
|
|
|
### Q: Can I use Redis with TLS?
|
|
|
|
**A:** Yes, TLS is fully supported:
|
|
```yaml
|
|
redis:
|
|
tls:
|
|
enabled: true
|
|
certFile: "/path/to/cert.pem"
|
|
keyFile: "/path/to/key.pem"
|
|
caFile: "/path/to/ca.pem"
|
|
```
|
|
|
|
### Q: What's the impact on latency?
|
|
|
|
**A:** Latency impact by cache mode:
|
|
- **Memory**: ~0.1ms
|
|
- **Redis**: ~2-5ms (network dependent)
|
|
- **Hybrid**: ~0.2ms for hits, ~2-5ms for misses
|
|
|
|
### Q: Should I enable Redis persistence?
|
|
|
|
**A:** For cache data, persistence is usually not needed:
|
|
- Cache data is transient
|
|
- Disabling persistence improves performance
|
|
- Sessions can be re-established if data is lost
|
|
|
|
### Q: How do I size the connection pool?
|
|
|
|
**A:** Connection pool sizing formula:
|
|
```
|
|
poolSize = 2 * CPU_cores * expected_replicas
|
|
minIdleConns = poolSize / 2
|
|
```
|
|
|
|
Example for 4 cores, 3 replicas:
|
|
- poolSize: 24
|
|
- minIdleConns: 12
|
|
|
|
## Support and Resources
|
|
|
|
### Documentation
|
|
- [Main README](../README.md)
|
|
- [Plugin Configuration Guide](../README.md#configuration-options)
|
|
- [Troubleshooting Guide](../README.md#troubleshooting)
|
|
|
|
### Community
|
|
- GitHub Issues: Report bugs and request features
|
|
- Discussions: Ask questions and share experiences
|
|
|
|
### Additional Resources
|
|
- [Redis Documentation](https://redis.io/documentation)
|
|
- [Redis Best Practices](https://redis.io/docs/manual/patterns/)
|
|
- [Traefik Documentation](https://doc.traefik.io/traefik/)
|
|
|
|
---
|
|
|
|
*Last updated: 2025* |