traefikoidc/internal/cache/cache.go
lukaszraczylo a548665edb feat: opt-in M2M bearer-token authentication (supersedes #93) (#140)
* docs: bearer-token auth design spec

* docs: harden bearer-auth spec with security review findings

* feat(bearer): opt-in M2M bearer-token authentication

Adds an opt-in Authorization: Bearer <jwt> path for machine-to-machine
clients. Replaces and supersedes the broken approach in PR #93
(synthetic-session that omitted user_identifier and skipped ID-token
rejection / replay-protection-semantics / kid-pinning / etc.).

Design

  Two auth entrypoints feed one shared post-auth pipeline:

    cookie path  ─┐
                  ├── forwardAuthorized(rw, req, *principal)
    bearer path  ─┘    (roles/groups, header injection, security
                        headers, cookie strip, forward)

  buildPrincipalFromSession and buildPrincipalFromBearerToken produce
  the same `principal` value type. forwardAuthorized is session-agnostic
  and runs the existing post-auth work; processAuthorizedRequest now
  wraps it with the session-specific concerns (backchannel-logout,
  dirty/Save). The cookie path's behaviour is byte-identical to before
  this PR; the existing test suite passes unmodified.
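
The shared-pipeline shape described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `principal` fields, the `session` struct, the header name, and the simplified signatures (no `http.ResponseWriter` or `*http.Request`) are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// principal is a hypothetical stand-in for the PR's pure-data principal
// type. The real field set is not shown in the commit message.
type principal struct {
	Identifier string   // e.g. email (cookie path) or sub (bearer path)
	Roles      []string // roles/groups used by the post-auth gate
	Source     string   // "cookie" or "bearer"; illustrative only
}

// session is a hypothetical, heavily simplified session.
type session struct {
	Email string
	Roles []string
}

// buildPrincipalFromSession maps a session onto the shared principal type.
func buildPrincipalFromSession(s session) *principal {
	return &principal{Identifier: s.Email, Roles: s.Roles, Source: "cookie"}
}

// buildPrincipalFromBearerToken maps already-verified claims onto the same type.
func buildPrincipalFromBearerToken(claims map[string]interface{}) (*principal, error) {
	sub, ok := claims["sub"].(string)
	if !ok || sub == "" {
		return nil, errors.New("bearer token has no usable sub claim")
	}
	return &principal{Identifier: sub, Source: "bearer"}, nil
}

// forwardAuthorized is session-agnostic: it only looks at the principal.
// Here it just returns the identity header it would inject (assumed name).
func forwardAuthorized(p *principal) string {
	return "X-Forwarded-User: " + p.Identifier
}

func main() {
	fromCookie := buildPrincipalFromSession(session{Email: "alice@example.com"})
	fromBearer, err := buildPrincipalFromBearerToken(map[string]interface{}{"sub": "svc-billing"})
	if err != nil {
		panic(err)
	}
	fmt.Println(forwardAuthorized(fromCookie))
	fmt.Println(forwardAuthorized(fromBearer))
}
```

Both entrypoints end in the same call, which is what lets the post-auth work stay unaware of how the caller authenticated.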

Security hardening baked into the bearer path

  - Audience MANDATORY. Startup fails when EnableBearerAuth=true and
    Audience is empty.
  - BearerIdentifierClaim defaults to "sub"; "email" is rejected at
    startup to avoid the unverified-email spoofing footgun. Cookie
    path's UserIdentifierClaim is unaffected and still defaults to
    "email".
  - ID tokens explicitly rejected via the existing detectTokenType
    helper (nonce, typ=at+jwt, token_use, scope, aud-vs-clientID
    heuristics); belt-and-braces nonce/token_use=id rejection on top.
  - alg pinned to asymmetric allowlist (RS/PS/ES 256/384/512) BEFORE
    JWKS fetch, blocking alg=none and alg=HS* probes from amplifying
    into upstream calls.
  - kid length capped at 256 bytes and charset-restricted before JWKS
    fetch, blocking pathological-kid JWKS amplification.
  - Multi-audience tokens require azp == clientID.
  - iat upper-age bound (MaxTokenAgeSeconds, default 24h) bounds clock-
    manipulation and forever-token abuse.
  - Identifier sanitization: length cap, control-char + bidi-override
    + delimiter (, ; =) rejection.
  - Per-IP failure throttle: configurable threshold/window/penalty;
    returns 429 + Retry-After. Limits online token-guessing attacks
    and protects the shared rate-limiter / JWKS endpoint.
  - JTI replay marking suppressed via new internal verifyOpts
    {skipReplayMarking} so the same bearer can be reused until exp;
    the blacklist Get stays active so RevokeToken still terminates a
    bearer token immediately. The existing exported VerifyToken
    interface is unchanged so all mocks continue to work.
  - Cookie wins by default when both bearer and cookie are present
    (safer against browser/extension/proxy bearer injection).
    Operator can flip via BearerOverridesCookie.
  - Authorization header stripped on forward by default; also stripped
    on excluded URLs so the token can't leak into health/metrics
    downstream logs.
  - Optional RFC 7662 introspection via existing
    requireTokenIntrospection. Introspection-endpoint failure returns
    503 (distinguishes infra from token rejection).
  - 401s use RFC 6750 WWW-Authenticate hints (toggleable). Failure
    reason is logged at debug; raw tokens are never logged.
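
As an illustration of the alg and kid gates in the list above, here is a minimal sketch of a header check that runs before any JWKS fetch. The function names, the exact kid charset (letters, digits, `-`, `_`, `.`), and the decision to accept an empty kid are assumptions for the example; the commit message only states the allowlist, the 256-byte cap, and that the charset is restricted.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedAlgs mirrors the asymmetric allowlist named above (RS/PS/ES 256/384/512).
var allowedAlgs = map[string]bool{
	"RS256": true, "RS384": true, "RS512": true,
	"PS256": true, "PS384": true, "PS512": true,
	"ES256": true, "ES384": true, "ES512": true,
}

// maxKidLen is the 256-byte cap stated above.
const maxKidLen = 256

// isSafeKidByte encodes an ASSUMED charset; the real restriction may differ.
func isSafeKidByte(c byte) bool {
	switch {
	case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9':
		return true
	case c == '-', c == '_', c == '.':
		return true
	}
	return false
}

// checkJOSEHeader (hypothetical name) rejects a token from its header alone,
// so alg=none, alg=HS* and pathological kid values never trigger a JWKS call.
func checkJOSEHeader(alg, kid string) error {
	if !allowedAlgs[alg] {
		return fmt.Errorf("alg %q is not in the asymmetric allowlist", alg)
	}
	if len(kid) > maxKidLen {
		return errors.New("kid exceeds 256 bytes")
	}
	for i := 0; i < len(kid); i++ {
		if !isSafeKidByte(kid[i]) {
			return errors.New("kid contains a disallowed character")
		}
	}
	return nil
}

func main() {
	fmt.Println(checkJOSEHeader("RS256", "key-2026-05") == nil)
	fmt.Println(checkJOSEHeader("none", "key-2026-05"))
	fmt.Println(checkJOSEHeader("HS256", "key-2026-05"))
}
```

Checking the header first keeps the rejection path free of network I/O, which is the amplification concern the list describes.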

Implementation

  - principal.go: pure-data principal type and buildPrincipalFromSession.
  - bearer_auth.go: alg/kid pin, classifier, identifier sanitization,
    multi-aud azp gate, iat age check, per-IP failure tracker,
    handleBearerRequest, buildPrincipalFromBearerToken.
  - token_manager.go: VerifyToken now wraps a new verifyTokenWithOpts
    that accepts internal-only verifyOpts. Existing callers, the
    TokenVerifier interface, and all mocks unchanged.
  - middleware.go: extracted forwardAuthorized from
    processAuthorizedRequest; wired bearer detection after init wait
    + after bypass; excluded-URL Authorization strip when bearer
    enabled.
  - settings.go: ten new config fields with defaults applied in
    CreateConfig.
  - main.go: startup validation for audience + identifier-claim
    guard; bearer failure tracker init.
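
The token_manager.go change (an exported method wrapping an internal variant with options) can be modelled roughly as below. This is a toy sketch under stated assumptions: the real VerifyToken takes a raw token and does full JWT validation, whereas this version takes a JTI string directly, uses plain maps for the blacklist and replay state, and is not safe for concurrent use. Only the names `VerifyToken`, `verifyTokenWithOpts`, `verifyOpts`, `skipReplayMarking`, and `RevokeToken` come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// verifyOpts carries internal-only switches; callers of VerifyToken never see it.
type verifyOpts struct {
	skipReplayMarking bool
}

// tokenManager is a hypothetical, map-backed stand-in for the real manager.
type tokenManager struct {
	revoked map[string]bool // explicit revocations (blacklist)
	seen    map[string]bool // JTIs already marked by replay protection
}

func newTokenManager() *tokenManager {
	return &tokenManager{revoked: map[string]bool{}, seen: map[string]bool{}}
}

// RevokeToken blacklists a JTI; this applies to every verification path.
func (m *tokenManager) RevokeToken(jti string) { m.revoked[jti] = true }

// VerifyToken keeps its existing shape and simply uses default options.
func (m *tokenManager) VerifyToken(jti string) error {
	return m.verifyTokenWithOpts(jti, verifyOpts{})
}

// verifyTokenWithOpts always consults the blacklist, but only marks the JTI
// as used when replay marking has not been suppressed.
func (m *tokenManager) verifyTokenWithOpts(jti string, opts verifyOpts) error {
	if m.revoked[jti] {
		return errors.New("token revoked")
	}
	if opts.skipReplayMarking {
		return nil
	}
	if m.seen[jti] {
		return errors.New("token replay detected")
	}
	m.seen[jti] = true
	return nil
}

func main() {
	m := newTokenManager()
	bearer := verifyOpts{skipReplayMarking: true}

	fmt.Println(m.verifyTokenWithOpts("jti-1", bearer)) // first use
	fmt.Println(m.verifyTokenWithOpts("jti-1", bearer)) // reuse still accepted
	m.RevokeToken("jti-1")
	fmt.Println(m.verifyTokenWithOpts("jti-1", bearer)) // rejected after revocation
}
```

Because the options struct is unexported and the exported method signature does not change, existing interface implementations and mocks need no edits.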

Tests

  - bearer_auth_test.go: table-driven helper tests for every new
    component (parseBearerJOSEHeader, sanitizeBearerIdentifier,
    resolveBearerIdentifier, enforceMultiAudienceAzp, enforceIatAge,
    bearerFailureTracker, detectBearerToken). Integration tests
    through ServeHTTP covering happy path, ID-token rejection,
    alg=none rejection, oversized kid, multi-aud with/without azp,
    iat-too-old, bidi identifier, replay (100x reuse), 429 throttle
    trip, excluded-URL strip, roles gate, cookie-wins precedence,
    BearerOverridesCookie, oversized token, malformed JWT,
    feature-off pass-through. Startup validation for audience-
    required and email-identifier-rejected.
  - All existing tests pass unmodified (cookie-path regression).
  - go vet clean. golangci-lint clean (0 issues). Race detector
    clean on bearer tests.

Documentation

  - README.md: bearer auth section with security highlights and
    config snippet; doc link in the index.
  - .traefik.yml: commented config block exposing every bearer knob.
  - docs/CONFIGURATION.md: new subsection with full parameter table.
  - docs/BEARER_AUTH.md: threat model, hardening matrix, failure
    response table, operational guidance, known follow-ups.
  - docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md:
    design spec + security-review hardening history.

* fix(cache): redact raw cache keys in debug logs (CodeQL go/clear-text-logging)

CodeQL flagged 9 high-severity alerts (go/clear-text-logging) where the
in-memory cache and the hybrid L1+L2 backend printed `key=%s` at debug.
Cache callers (token cache, blacklist, introspection cache) pass raw
access / refresh / id tokens as cache keys, so any debug-enabled
deployment would write them to log streams.

Pre-existing issue. CodeQL started flagging it on this PR because the
new bearer-auth path adds a data-flow source (req.Header.Get("Authorization"))
that reaches the existing logging sinks via the same cache. The cookie
path had the same risk but wasn't tracked as taint by CodeQL.

Fix: hash the key (SHA-256[:8] hex) before printing. Same approach the
bearer-auth logger uses for principal identifiers (spec §13). Doesn't
change cache semantics — same key still produces the same hash, so
debug correlation across log lines is preserved without exposing the
raw value.

Touches both affected packages:
  - internal/cache/cache.go (2 sites: Set + LRU eviction)
  - internal/cache/backends/hybrid.go (12 sites: L1/L2 read/write/fallback)

New helper `redactKey` colocated with each package (unexported,
package-local) keeps the change blast radius narrow. Tests green; lint
clean.
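
For readers who want to see the redaction approach concretely, here is a minimal sketch of such a helper. It is an assumption-laden illustration rather than the code from this commit: in particular, "SHA-256[:8] hex" is read here as the first 8 bytes of the digest, hex-encoded to 16 characters.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactKey returns a short, stable fingerprint of a cache key for logging.
// Assumption: the first 8 bytes of the SHA-256 digest, hex-encoded.
func redactKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:8])
}

func main() {
	token := "eyJhbGciOi.example.token" // placeholder, not a real token

	// The log line carries the fingerprint, never the raw key.
	fmt.Printf("Cache: Set key=%s\n", redactKey(token))

	// Same key, same fingerprint: log lines can still be correlated.
	fmt.Println(redactKey(token) == redactKey(token))
}
```

A truncated hash is enough for correlating debug lines; it is not meant to be collision-proof.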

* docs(bearer): how to obtain bearer tokens from the OIDC provider

Adds a section walking operators through the OAuth 2.0 client_credentials
flow (RFC 6749 §4.4) and the JWT bearer assertion alternative (RFC 7523),
with a worked Auth0-shape curl example, a per-provider quick reference
(Auth0, Okta, Keycloak, Entra v2, Cognito, GitLab, Google), operational
notes (token TTL, caching, JWKS rotation, revocation, scope vs audience,
secret hygiene), and a three-line validation loop.

Most common operator confusion: "I enabled the feature but tokens get
401'd" — almost always missing or wrong audience. The new section makes
the audience-matching requirement loud, with per-provider parameter
names so people don't have to dig through IdP docs.
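
To make the audience requirement concrete, here is a hedged Go sketch of a client_credentials token request in the Auth0 shape. The token URL, client ID, secret, and audience values are placeholders, the helper names are hypothetical, and the `audience` form field is the Auth0 parameter name; other providers in the list use different parameters.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// tokenForm builds the form body for an RFC 6749 §4.4 client_credentials grant.
// The "audience" field follows the Auth0 shape and is provider-specific.
func tokenForm(clientID, clientSecret, audience string) string {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("client_id", clientID)
	form.Set("client_secret", clientSecret)
	form.Set("audience", audience)
	return form.Encode() // keys are emitted in sorted order
}

// newTokenRequest prepares (but does not send) the POST to the token endpoint.
func newTokenRequest(tokenURL, clientID, clientSecret, audience string) (*http.Request, error) {
	body := tokenForm(clientID, clientSecret, audience)
	req, err := http.NewRequest(http.MethodPost, tokenURL, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := newTokenRequest(
		"https://idp.example.com/oauth/token", // placeholder endpoint
		"my-client", "s3cret",
		"https://api.example.com", // must match the Audience the middleware expects
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(tokenForm("my-client", "s3cret", "https://api.example.com"))
}
```

If the audience sent here differs from the one configured for the middleware, the resulting token is the kind that gets rejected with a 401.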

Locations:
  - docs/BEARER_AUTH.md  — full section under "Quick start"
  - README.md            — short snippet + deep link
2026-05-18 17:35:37 +01:00

415 lines
9.4 KiB
Go

package cache

import (
	"container/list"
	"context"
	"encoding/json"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Type defines the type of cache for optimized behavior
type Type string

const (
	TypeToken    Type = "token"
	TypeMetadata Type = "metadata"
	TypeJWK      Type = "jwk"
	TypeSession  Type = "session"
	TypeGeneral  Type = "general"
)

// Logger interface for cache operations
type Logger interface {
	Debug(msg string)
	Debugf(format string, args ...interface{})
	Info(msg string)
	Infof(format string, args ...interface{})
	Error(msg string)
	Errorf(format string, args ...interface{})
}

// Config provides configuration for the cache
type Config struct {
	Logger            Logger
	JWKConfig         *JWKConfig
	MetadataConfig    *MetadataConfig
	TokenConfig       *TokenConfig
	Type              Type
	DefaultTTL        time.Duration
	CleanupInterval   time.Duration
	MaxMemoryBytes    int64
	MaxSize           int
	EnableMetrics     bool
	EnableAutoCleanup bool
	EnableMemoryLimit bool
	EnableCompression bool
}

// TokenConfig provides token-specific cache configuration
type TokenConfig struct {
	BlacklistTTL        time.Duration
	RefreshTokenTTL     time.Duration
	EnableTokenRotation bool
}

// MetadataConfig provides metadata-specific cache configuration
type MetadataConfig struct {
	SecurityCriticalFields         []string
	GracePeriod                    time.Duration
	ExtendedGracePeriod            time.Duration
	MaxGracePeriod                 time.Duration
	SecurityCriticalMaxGracePeriod time.Duration
}

// JWKConfig provides JWK-specific cache configuration
type JWKConfig struct {
	RefreshInterval time.Duration
	MinRefreshTime  time.Duration
	MaxKeyAge       time.Duration
}

// Item represents a single cache entry
type Item struct {
	ExpiresAt    time.Time
	LastAccessed time.Time
	Value        interface{}
	Metadata     map[string]interface{}
	element      *list.Element
	Key          string
	CacheType    Type
	Size         int64
	AccessCount  int64
}

// Cache provides a single, unified cache implementation
type Cache struct {
	config        Config
	ctx           context.Context
	logger        Logger
	cancel        context.CancelFunc
	lruList       *list.List
	items         map[string]*Item
	stopCleanup   chan bool
	wg            sync.WaitGroup
	currentSize   int64
	currentMemory int64
	hits          int64
	misses        int64
	evictions     int64
	sets          int64
	mu            sync.RWMutex
	closed        int32
}

// DefaultConfig returns a default cache configuration
func DefaultConfig() Config {
	return Config{
		Type:              TypeGeneral,
		MaxSize:           1000,
		MaxMemoryBytes:    64 * 1024 * 1024, // 64MB
		DefaultTTL:        10 * time.Minute,
		CleanupInterval:   5 * time.Minute,
		EnableAutoCleanup: true,
		EnableMemoryLimit: true,
		EnableMetrics:     true,
	}
}

// New creates a new cache instance
func New(config Config) *Cache {
	if config.Logger == nil {
		config.Logger = &noOpLogger{}
	}

	ctx, cancel := context.WithCancel(context.Background())

	c := &Cache{
		items:   make(map[string]*Item),
		lruList: list.New(),
		config:  config,
		logger:  config.Logger,
		ctx:     ctx,
		cancel:  cancel,
	}

	if config.EnableAutoCleanup && config.CleanupInterval > 0 {
		c.stopCleanup = make(chan bool)
		c.startCleanupRoutine()
	}

	return c
}

// Set stores a value with TTL
func (c *Cache) Set(key string, value interface{}, ttl time.Duration) error {
	if atomic.LoadInt32(&c.closed) == 1 {
		return fmt.Errorf("cache is closed")
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	// Calculate size
	size := c.estimateSize(value)

	// Remove the old item first, so that updating an existing key does not
	// count against the limits below and evict an unrelated entry
	if oldItem, exists := c.items[key]; exists {
		c.removeItem(key, oldItem)
	}

	// Check memory limit: evict until the new value fits or the cache is empty
	for c.config.EnableMemoryLimit && c.currentMemory+size > c.config.MaxMemoryBytes && c.lruList.Len() > 0 {
		c.evictLRU()
	}

	// Check size limit
	if c.config.MaxSize > 0 && len(c.items) >= c.config.MaxSize {
		c.evictLRU()
	}

	// Create the new item
	now := time.Now()
	item := &Item{
		Key:          key,
		Value:        value,
		Size:         size,
		ExpiresAt:    now.Add(ttl),
		LastAccessed: now,
		AccessCount:  0,
		CacheType:    c.config.Type,
		Metadata:     make(map[string]interface{}),
	}

	// Add new item
	item.element = c.lruList.PushFront(item)
	c.items[key] = item
	c.currentMemory += size
	c.currentSize++

	atomic.AddInt64(&c.sets, 1)
	c.logger.Debugf("Cache: Set key=%s, size=%d, ttl=%v", redactKey(key), size, ttl)

	return nil
}

// Get retrieves a value from cache
func (c *Cache) Get(key string) (interface{}, bool) {
	if atomic.LoadInt32(&c.closed) == 1 {
		return nil, false
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	item, exists := c.items[key]
	if !exists {
		atomic.AddInt64(&c.misses, 1)
		return nil, false
	}

	// Check expiration
	if time.Now().After(item.ExpiresAt) {
		c.removeItem(key, item)
		atomic.AddInt64(&c.misses, 1)
		return nil, false
	}

	// Update LRU
	c.lruList.MoveToFront(item.element)
	item.LastAccessed = time.Now()
	item.AccessCount++

	atomic.AddInt64(&c.hits, 1)
	return item.Value, true
}

// Delete removes a key from cache
func (c *Cache) Delete(key string) {
	if atomic.LoadInt32(&c.closed) == 1 {
		return
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	if item, exists := c.items[key]; exists {
		c.removeItem(key, item)
	}
}

// Clear removes all items from cache
func (c *Cache) Clear() {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.items = make(map[string]*Item)
	c.lruList.Init()
	c.currentSize = 0
	c.currentMemory = 0
}

// Size returns the number of items in cache
func (c *Cache) Size() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.items)
}

// SetMaxSize updates the maximum cache size
func (c *Cache) SetMaxSize(size int) {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.config.MaxSize = size

	// Evict items if necessary. Set treats MaxSize <= 0 as "no limit",
	// so a non-positive size must not empty the cache here
	for size > 0 && len(c.items) > size && c.lruList.Len() > 0 {
		c.evictLRU()
	}
}

// GetStats returns cache statistics
func (c *Cache) GetStats() map[string]interface{} {
	c.mu.RLock()
	defer c.mu.RUnlock()

	return map[string]interface{}{
		"size":       c.currentSize,
		"memory":     c.currentMemory,
		"hits":       atomic.LoadInt64(&c.hits),
		"misses":     atomic.LoadInt64(&c.misses),
		"evictions":  atomic.LoadInt64(&c.evictions),
		"sets":       atomic.LoadInt64(&c.sets),
		"hit_rate":   c.calculateHitRate(),
		"cache_type": string(c.config.Type),
	}
}

// Close gracefully shuts down the cache
func (c *Cache) Close() error {
	if !atomic.CompareAndSwapInt32(&c.closed, 0, 1) {
		return fmt.Errorf("cache already closed")
	}

	c.cancel()

	// stopCleanup is only created when the cleanup goroutine was started
	// (EnableAutoCleanup with a positive CleanupInterval), so check the
	// channel itself: closing a nil channel panics
	if c.stopCleanup != nil {
		close(c.stopCleanup)
		c.wg.Wait()
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	// Clear inline to avoid double locking
	c.items = make(map[string]*Item)
	c.lruList.Init()
	c.currentSize = 0
	c.currentMemory = 0

	return nil
}

// Cleanup removes expired items
func (c *Cache) Cleanup() {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := time.Now()
	var toRemove []string

	for key, item := range c.items {
		if now.After(item.ExpiresAt) {
			toRemove = append(toRemove, key)
		}
	}

	for _, key := range toRemove {
		if item, exists := c.items[key]; exists {
			c.removeItem(key, item)
		}
	}

	c.logger.Debugf("Cache cleanup: removed %d expired items", len(toRemove))
}

// Private methods

func (c *Cache) removeItem(key string, item *Item) {
	c.lruList.Remove(item.element)
	delete(c.items, key)
	c.currentMemory -= item.Size
	c.currentSize--
}

func (c *Cache) evictLRU() {
	elem := c.lruList.Back()
	if elem == nil {
		return
	}

	// Only *Item values are pushed onto lruList; check the assertion anyway
	// so an unexpected element cannot cause a nil dereference
	item, ok := elem.Value.(*Item)
	if !ok {
		c.lruList.Remove(elem)
		return
	}

	c.removeItem(item.Key, item)
	atomic.AddInt64(&c.evictions, 1)
	c.logger.Debugf("Cache: Evicted LRU item key=%s", redactKey(item.Key))
}

func (c *Cache) estimateSize(value interface{}) int64 {
	// Simple size estimation
	switch v := value.(type) {
	case string:
		return int64(len(v))
	case []byte:
		return int64(len(v))
	case map[string]interface{}:
		// Rough estimation for maps; use the default if marshaling fails
		data, err := json.Marshal(v)
		if err != nil {
			return 256
		}
		return int64(len(data))
	default:
		// Default size for unknown types
		return 256
	}
}

func (c *Cache) calculateHitRate() float64 {
	hits := atomic.LoadInt64(&c.hits)
	misses := atomic.LoadInt64(&c.misses)
	total := hits + misses

	if total == 0 {
		return 0
	}

	return float64(hits) / float64(total)
}

func (c *Cache) startCleanupRoutine() {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()

		ticker := time.NewTicker(c.config.CleanupInterval)
		defer ticker.Stop()

		for {
			select {
			case <-ticker.C:
				c.Cleanup()
			case <-c.stopCleanup:
				return
			case <-c.ctx.Done():
				return
			}
		}
	}()
}
// noOpLogger provides a no-op logger implementation
type noOpLogger struct{}
func (l *noOpLogger) Debug(msg string) {}
func (l *noOpLogger) Debugf(format string, args ...interface{}) {}
func (l *noOpLogger) Info(msg string) {}
func (l *noOpLogger) Infof(format string, args ...interface{}) {}
func (l *noOpLogger) Error(msg string) {}
func (l *noOpLogger) Errorf(format string, args ...interface{}) {}
func (l *noOpLogger) Warn(msg string) {}
func (l *noOpLogger) Warnf(format string, args ...interface{}) {}
func (l *noOpLogger) Fatal(msg string) {}
func (l *noOpLogger) Fatalf(format string, args ...interface{}) {}
func (l *noOpLogger) WithField(key string, value interface{}) Logger { return l }
func (l *noOpLogger) WithFields(fields map[string]interface{}) Logger { return l }