mirror of
https://github.com/lukaszraczylo/traefikoidc.git
synced 2026-06-05 22:44:17 +00:00
a548665edb
* docs: bearer-token auth design spec

* docs: harden bearer-auth spec with security review findings

* feat(bearer): opt-in M2M bearer-token authentication

Adds an opt-in Authorization: Bearer <jwt> path for machine-to-machine clients. Replaces and supersedes the broken approach in PR #93 (synthetic-session that omitted user_identifier and skipped ID-token rejection / replay-protection-semantics / kid-pinning / etc.).

Design

Two auth entrypoints feed one shared post-auth pipeline:

  cookie path ─┐
               ├── forwardAuthorized(rw, req, *principal)
  bearer path ─┘   (roles/groups, header injection, security headers, cookie strip, forward)

buildPrincipalFromSession and buildPrincipalFromBearerToken produce the same `principal` value type. forwardAuthorized is session-agnostic and runs the existing post-auth work; processAuthorizedRequest now wraps it with the session-specific concerns (backchannel-logout, dirty/Save).

The cookie path's behaviour is byte-identical to before this PR; the existing test suite passes unmodified.

Security hardening baked into the bearer path

- Audience MANDATORY. Startup fails when EnableBearerAuth=true and Audience is empty.
- BearerIdentifierClaim defaults to "sub"; "email" is rejected at startup to avoid the unverified-email spoofing footgun. Cookie path's UserIdentifierClaim is unaffected and still defaults to "email".
- ID tokens explicitly rejected via the existing detectTokenType helper (nonce, typ=at+jwt, token_use, scope, aud-vs-clientID heuristics); belt-and-braces nonce/token_use=id rejection on top.
- alg pinned to asymmetric allowlist (RS/PS/ES 256/384/512) BEFORE JWKS fetch, blocking alg=none and alg=HS* probes from amplifying into upstream calls.
- kid length capped at 256 bytes and charset-restricted before JWKS fetch, blocking pathological-kid JWKS amplification.
- Multi-audience tokens require azp == clientID.
- iat upper-age bound (MaxTokenAgeSeconds, default 24h) bounds clock-manipulation and forever-token abuse.
- Identifier sanitization: length cap, control-char + bidi-override + delimiter (, ; =) rejection.
- Per-IP failure throttle: configurable threshold/window/penalty; returns 429 + Retry-After. Limits offline-guessing-style attacks and protects the shared rate-limiter / JWKS endpoint.
- JTI replay marking suppressed via new internal verifyOpts {skipReplayMarking} so the same bearer can be reused until exp; the blacklist Get stays active so RevokeToken still terminates a bearer token immediately. The existing exported VerifyToken interface is unchanged so all mocks continue to work.
- Cookie wins by default when both bearer and cookie are present (safer against browser/extension/proxy bearer injection). Operator can flip via BearerOverridesCookie.
- Authorization header stripped on forward by default; also stripped on excluded URLs so the token can't leak into health/metrics downstream logs.
- Optional RFC 7662 introspection via existing requireTokenIntrospection. Introspection-endpoint failure returns 503 (distinguishes infra from token rejection).
- 401s use RFC 6750 WWW-Authenticate hints (toggleable). Failure reason is logged at debug; raw tokens are never logged.

Implementation

- principal.go: pure-data principal type and buildPrincipalFromSession.
- bearer_auth.go: alg/kid pin, classifier, identifier sanitization, multi-aud azp gate, iat age check, per-IP failure tracker, handleBearerRequest, buildPrincipalFromBearerToken.
- token_manager.go: VerifyToken now wraps a new verifyTokenWithOpts that accepts internal-only verifyOpts. Existing callers, the TokenVerifier interface, and all mocks unchanged.
- middleware.go: extracted forwardAuthorized from processAuthorizedRequest; wired bearer detection after init wait + after bypass; excluded-URL Authorization strip when bearer enabled.
- settings.go: ten new config fields with defaults applied in CreateConfig.
- main.go: startup validation for audience + identifier-claim guard; bearer failure tracker init.

Tests

- bearer_auth_test.go: table-driven helper tests for every new component (parseBearerJOSEHeader, sanitizeBearerIdentifier, resolveBearerIdentifier, enforceMultiAudienceAzp, enforceIatAge, bearerFailureTracker, detectBearerToken). Integration tests through ServeHTTP covering happy path, ID-token rejection, alg=none rejection, oversized kid, multi-aud with/without azp, iat-too-old, bidi identifier, replay (100x reuse), 429 throttle trip, excluded-URL strip, roles gate, cookie-wins precedence, BearerOverridesCookie, oversized token, malformed JWT, feature-off pass-through. Startup validation for audience-required and email-identifier-rejected.
- All existing tests pass unmodified (cookie-path regression).
- go vet clean. golangci-lint clean (0 issues). Race detector clean on bearer tests.

Documentation

- README.md: bearer auth section with security highlights and config snippet; doc link in the index.
- .traefik.yml: commented config block exposing every bearer knob.
- docs/CONFIGURATION.md: new subsection with full parameter table.
- docs/BEARER_AUTH.md: threat model, hardening matrix, failure response table, operational guidance, known follow-ups.
- docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md: design spec + security-review hardening history.

* fix(cache): redact raw cache keys in debug logs (CodeQL go/clear-text-logging)

CodeQL flagged 9 high-severity alerts (go/clear-text-logging) where the in-memory cache and the hybrid L1+L2 backend printed `key=%s` at debug. Cache callers (token cache, blacklist, introspection cache) pass raw access / refresh / id tokens as cache keys, so any debug-enabled deployment would write them to log streams.

Pre-existing issue. CodeQL started flagging it on this PR because the new bearer-auth path adds a data-flow source (req.Header.Get("Authorization")) that reaches the existing logging sinks via the same cache. The cookie path had the same risk but wasn't tracked as taint by CodeQL.

Fix: hash the key (SHA-256[:8] hex) before printing. Same approach the bearer-auth logger uses for principal identifiers (spec §13). Doesn't change cache semantics: same key still produces the same hash, so debug correlation across log lines is preserved without exposing the raw value.

Touches both affected packages:

- internal/cache/cache.go (2 sites: Set + LRU eviction)
- internal/cache/backends/hybrid.go (12 sites: L1/L2 read/write/fallback)

New helper `redactKey` colocated with each package (unexported, package-local) keeps the change blast radius narrow.

Tests green; lint clean.

* docs(bearer): how to obtain bearer tokens from the OIDC provider

Adds a section walking operators through the OAuth 2.0 client_credentials flow (RFC 6749 §4.4) and the JWT bearer assertion alternative (RFC 7523), with a worked Auth0-shape curl example, a per-provider quick reference (Auth0, Okta, Keycloak, Entra v2, Cognito, GitLab, Google), operational notes (token TTL, caching, JWKS rotation, revocation, scope vs audience, secret hygiene), and a three-line validation loop.

Most common operator confusion: "I enabled the feature but tokens get 401'd", almost always missing or wrong audience. The new section makes the audience-matching requirement loud, with per-provider parameter names so people don't have to dig through IdP docs.

Locations:

- docs/BEARER_AUTH.md: full section under "Quick start"
- README.md: short snippet + deep link
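The `redactKey` helper named in the cache fix is not part of the file shown on this page. As a rough illustration of the "SHA-256[:8] hex" idea, here is a minimal sketch that assumes the intent is to hex-encode the first 8 bytes of the digest; the name `redactKeySketch` and that exact truncation are assumptions for illustration, not the repository's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactKeySketch is a hypothetical stand-in for the package-local redactKey
// helper. It hashes the raw cache key and returns the first 8 bytes of the
// SHA-256 digest as 16 hex characters, so a log line can still be correlated
// by key without containing the raw token.
func redactKeySketch(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:8])
}

func main() {
	a := redactKeySketch("example-access-token")
	b := redactKeySketch("example-access-token")
	c := redactKeySketch("another-access-token")

	// Same input gives the same digest prefix; different inputs differ.
	fmt.Println(len(a), a == b, a == c) // prints "16 true false"
}
```

Because the output is deterministic, two debug lines for the same key still show the same value, which is the correlation property the commit message describes.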
415 lines
9.4 KiB
Go
package cache

import (
	"container/list"
	"context"
	"encoding/json"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Type defines the type of cache for optimized behavior
type Type string

const (
	TypeToken    Type = "token"
	TypeMetadata Type = "metadata"
	TypeJWK      Type = "jwk"
	TypeSession  Type = "session"
	TypeGeneral  Type = "general"
)

// Logger interface for cache operations
type Logger interface {
	Debug(msg string)
	Debugf(format string, args ...interface{})
	Info(msg string)
	Infof(format string, args ...interface{})
	Error(msg string)
	Errorf(format string, args ...interface{})
}

// Config provides configuration for the cache
type Config struct {
	Logger            Logger
	JWKConfig         *JWKConfig
	MetadataConfig    *MetadataConfig
	TokenConfig       *TokenConfig
	Type              Type
	DefaultTTL        time.Duration
	CleanupInterval   time.Duration
	MaxMemoryBytes    int64
	MaxSize           int
	EnableMetrics     bool
	EnableAutoCleanup bool
	EnableMemoryLimit bool
	EnableCompression bool
}

// TokenConfig provides token-specific cache configuration
type TokenConfig struct {
	BlacklistTTL        time.Duration
	RefreshTokenTTL     time.Duration
	EnableTokenRotation bool
}

// MetadataConfig provides metadata-specific cache configuration
type MetadataConfig struct {
	SecurityCriticalFields         []string
	GracePeriod                    time.Duration
	ExtendedGracePeriod            time.Duration
	MaxGracePeriod                 time.Duration
	SecurityCriticalMaxGracePeriod time.Duration
}

// JWKConfig provides JWK-specific cache configuration
type JWKConfig struct {
	RefreshInterval time.Duration
	MinRefreshTime  time.Duration
	MaxKeyAge       time.Duration
}

// Item represents a single cache entry
type Item struct {
	ExpiresAt    time.Time
	LastAccessed time.Time
	Value        interface{}
	Metadata     map[string]interface{}
	element      *list.Element
	Key          string
	CacheType    Type
	Size         int64
	AccessCount  int64
}

// Cache provides a single, unified cache implementation
type Cache struct {
	config        Config
	ctx           context.Context
	logger        Logger
	cancel        context.CancelFunc
	lruList       *list.List
	items         map[string]*Item
	stopCleanup   chan bool
	wg            sync.WaitGroup
	currentSize   int64
	currentMemory int64
	hits          int64
	misses        int64
	evictions     int64
	sets          int64
	mu            sync.RWMutex
	closed        int32
}

// DefaultConfig returns a default cache configuration
func DefaultConfig() Config {
	return Config{
		Type:              TypeGeneral,
		MaxSize:           1000,
		MaxMemoryBytes:    64 * 1024 * 1024, // 64MB
		DefaultTTL:        10 * time.Minute,
		CleanupInterval:   5 * time.Minute,
		EnableAutoCleanup: true,
		EnableMemoryLimit: true,
		EnableMetrics:     true,
	}
}

// New creates a new cache instance
func New(config Config) *Cache {
	if config.Logger == nil {
		config.Logger = &noOpLogger{}
	}

	ctx, cancel := context.WithCancel(context.Background())
	c := &Cache{
		items:   make(map[string]*Item),
		lruList: list.New(),
		config:  config,
		logger:  config.Logger,
		ctx:     ctx,
		cancel:  cancel,
	}

	if config.EnableAutoCleanup && config.CleanupInterval > 0 {
		c.stopCleanup = make(chan bool)
		c.startCleanupRoutine()
	}

	return c
}

// Set stores a value with TTL
func (c *Cache) Set(key string, value interface{}, ttl time.Duration) error {
	if atomic.LoadInt32(&c.closed) == 1 {
		return fmt.Errorf("cache is closed")
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	// Calculate size
	size := c.estimateSize(value)

	// Remove the old item first so that updating an existing key neither
	// counts it twice against the limits nor evicts an unrelated entry
	if oldItem, exists := c.items[key]; exists {
		c.removeItem(key, oldItem)
	}

	// Check memory limit: evict until the new item fits or the cache is empty
	if c.config.EnableMemoryLimit {
		for c.currentMemory+size > c.config.MaxMemoryBytes && c.lruList.Len() > 0 {
			c.evictLRU()
		}
	}

	// Check size limit
	if c.config.MaxSize > 0 && len(c.items) >= c.config.MaxSize {
		c.evictLRU()
	}

	// Create the new item
	now := time.Now()
	item := &Item{
		Key:          key,
		Value:        value,
		Size:         size,
		ExpiresAt:    now.Add(ttl),
		LastAccessed: now,
		AccessCount:  0,
		CacheType:    c.config.Type,
		Metadata:     make(map[string]interface{}),
	}

	// Add new item
	item.element = c.lruList.PushFront(item)
	c.items[key] = item
	c.currentMemory += size
	c.currentSize++
	atomic.AddInt64(&c.sets, 1)

	c.logger.Debugf("Cache: Set key=%s, size=%d, ttl=%v", redactKey(key), size, ttl)
	return nil
}

// Get retrieves a value from cache
func (c *Cache) Get(key string) (interface{}, bool) {
	if atomic.LoadInt32(&c.closed) == 1 {
		return nil, false
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	item, exists := c.items[key]
	if !exists {
		atomic.AddInt64(&c.misses, 1)
		return nil, false
	}

	// Check expiration
	if time.Now().After(item.ExpiresAt) {
		c.removeItem(key, item)
		atomic.AddInt64(&c.misses, 1)
		return nil, false
	}

	// Update LRU
	c.lruList.MoveToFront(item.element)
	item.LastAccessed = time.Now()
	item.AccessCount++
	atomic.AddInt64(&c.hits, 1)

	return item.Value, true
}

// Delete removes a key from cache
func (c *Cache) Delete(key string) {
	if atomic.LoadInt32(&c.closed) == 1 {
		return
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	if item, exists := c.items[key]; exists {
		c.removeItem(key, item)
	}
}

// Clear removes all items from cache
func (c *Cache) Clear() {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.items = make(map[string]*Item)
	c.lruList.Init()
	c.currentSize = 0
	c.currentMemory = 0
}

// Size returns the number of items in cache
func (c *Cache) Size() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.items)
}

// SetMaxSize updates the maximum cache size
func (c *Cache) SetMaxSize(size int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.config.MaxSize = size

	// Evict items if necessary
	for len(c.items) > size && c.lruList.Len() > 0 {
		c.evictLRU()
	}
}

// GetStats returns cache statistics
func (c *Cache) GetStats() map[string]interface{} {
	c.mu.RLock()
	defer c.mu.RUnlock()

	return map[string]interface{}{
		"size":       c.currentSize,
		"memory":     c.currentMemory,
		"hits":       atomic.LoadInt64(&c.hits),
		"misses":     atomic.LoadInt64(&c.misses),
		"evictions":  atomic.LoadInt64(&c.evictions),
		"sets":       atomic.LoadInt64(&c.sets),
		"hit_rate":   c.calculateHitRate(),
		"cache_type": string(c.config.Type),
	}
}

// Close gracefully shuts down the cache
func (c *Cache) Close() error {
	if !atomic.CompareAndSwapInt32(&c.closed, 0, 1) {
		return fmt.Errorf("cache already closed")
	}

	c.cancel()
	// stopCleanup only exists when New started the cleanup goroutine
	// (EnableAutoCleanup with a positive CleanupInterval); closing a nil
	// channel would panic
	if c.stopCleanup != nil {
		close(c.stopCleanup)
		c.wg.Wait()
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Clear inline to avoid double locking
	c.items = make(map[string]*Item)
	c.lruList.Init()
	c.currentSize = 0
	c.currentMemory = 0

	return nil
}

// Cleanup removes expired items
func (c *Cache) Cleanup() {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := time.Now()
	var toRemove []string

	for key, item := range c.items {
		if now.After(item.ExpiresAt) {
			toRemove = append(toRemove, key)
		}
	}

	for _, key := range toRemove {
		if item, exists := c.items[key]; exists {
			c.removeItem(key, item)
		}
	}

	c.logger.Debugf("Cache cleanup: removed %d expired items", len(toRemove))
}

// Private methods

// removeItem unlinks an item from the LRU list and the map and updates the
// size counters. Callers must hold c.mu.
func (c *Cache) removeItem(key string, item *Item) {
	c.lruList.Remove(item.element)
	delete(c.items, key)
	c.currentMemory -= item.Size
	c.currentSize--
}

// evictLRU removes the least recently used item. Callers must hold c.mu.
func (c *Cache) evictLRU() {
	elem := c.lruList.Back()
	if elem == nil {
		return
	}
	item, ok := elem.Value.(*Item)
	if !ok {
		// Only *Item values are pushed onto the list; drop anything else
		// instead of dereferencing a nil item
		c.lruList.Remove(elem)
		return
	}
	c.removeItem(item.Key, item)
	atomic.AddInt64(&c.evictions, 1)
	c.logger.Debugf("Cache: Evicted LRU item key=%s", redactKey(item.Key))
}

func (c *Cache) estimateSize(value interface{}) int64 {
	// Simple size estimation
	switch v := value.(type) {
	case string:
		return int64(len(v))
	case []byte:
		return int64(len(v))
	case map[string]interface{}:
		// Rough estimation for maps
		data, _ := json.Marshal(v)
		return int64(len(data))
	default:
		// Default size for unknown types
		return 256
	}
}

func (c *Cache) calculateHitRate() float64 {
	hits := atomic.LoadInt64(&c.hits)
	misses := atomic.LoadInt64(&c.misses)
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func (c *Cache) startCleanupRoutine() {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		ticker := time.NewTicker(c.config.CleanupInterval)
		defer ticker.Stop()

		for {
			select {
			case <-ticker.C:
				c.Cleanup()
			case <-c.stopCleanup:
				return
			case <-c.ctx.Done():
				return
			}
		}
	}()
}

// noOpLogger provides a no-op logger implementation
type noOpLogger struct{}

func (l *noOpLogger) Debug(msg string)                                    {}
func (l *noOpLogger) Debugf(format string, args ...interface{})           {}
func (l *noOpLogger) Info(msg string)                                     {}
func (l *noOpLogger) Infof(format string, args ...interface{})            {}
func (l *noOpLogger) Error(msg string)                                    {}
func (l *noOpLogger) Errorf(format string, args ...interface{})           {}
func (l *noOpLogger) Warn(msg string)                                     {}
func (l *noOpLogger) Warnf(format string, args ...interface{})            {}
func (l *noOpLogger) Fatal(msg string)                                    {}
func (l *noOpLogger) Fatalf(format string, args ...interface{})           {}
func (l *noOpLogger) WithField(key string, value interface{}) Logger      { return l }
func (l *noOpLogger) WithFields(fields map[string]interface{}) Logger     { return l }