traefikoidc/token_resilience.go
lukaszraczylo 1b49e133da Complete rebuild of the plugin
* Fix bug affecting Azure OIDC authentication (and most likely others)

* Fixes issue #51

* Ensure that appended roles are unique. Update the documentation.

* Improvements targeting possible memory usage spikes.

* Additional fixes and cleanup

* Refactoring code to fix the issues identified by the users.

* Modernize run

* Fieldalignment

* Multiple changes to improve performance and reduce complexity.
- Optimise the errors and recovery.
- Deduplicate code in metadata cache.
- Remove unused performance monitoring code.
- Simplify session management and settings handling.

* Fix claims issue.

* Add ability to overwrite the default scopes in the settings file

* Well.. that escalated quickly.

Completely forgot that Traefik uses an outdated Yaegi and requires compatibility with Go 1.20 (code written without generics).

* Bugfix #51: Ensures that user provided scopes overrides work.

* fixup! Bugfix #51: Ensures that user provided scopes overrides work.

* fixup! fixup! Bugfix #51: Ensures that user provided scopes overrides work.

* Abstract the provider logic into a separate package.

* Additional micro fixes and cleanups.

* Simplify all the things.

* fixup! Simplify all the things.

* fixup! fixup! Simplify all the things.

* fixup! fixup! fixup! Simplify all the things.

* fixup! fixup! fixup! fixup! Simplify all the things.

* ...

* Cleanup tests.

* fixup! Cleanup tests.

* fixup! fixup! fixup! Cleanup tests.

* fixup! fixup! fixup! fixup! Cleanup tests.

* fixup! fixup! fixup! fixup! fixup! Cleanup tests.

* Issue #53: Fix CSRF token handling in reverse proxy

1.  HTTPS Detection Fixed (session.go:723)
- Now uses X-Forwarded-Proto header instead of r.URL.Scheme
- Properly detects HTTPS in reverse proxy environments
2.  SameSite Cookie Attribute Fixed
- Removed automatic SameSiteStrictMode for HTTPS (would break OAuth)
- Keeps SameSiteLaxMode to allow OAuth callbacks from external domains
- Only uses Strict for AJAX requests which don't involve OAuth redirects
3.  Cookie Domain Handling Fixed
- Now respects X-Forwarded-Host header for cookie domain
- Ensures cookies are set for the public domain, not internal proxy domain
4.  EnhanceSessionSecurity Properly Integrated
- Function is now actually called during session save
- Applies security enhancements without breaking OAuth flow

Why Issue #53 Failed Before:

1. Cookies were not marked Secure in HTTPS environments (browser wouldn't send them back)
2. If they had been Secure with SameSite=Strict, Azure callbacks would still fail
3. Cookie domain might have been wrong (internal vs public domain)

Why It Works Now:

1. Cookies are properly marked Secure for HTTPS
2. Uses SameSite=Lax to allow OAuth provider callbacks
3. Cookie domain uses public domain from X-Forwarded-Host
4. CSRF token persists through the entire OAuth flow
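
The cookie behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `isHTTPS`, `publicHost`, `firstValue`, `sessionCookie` and `exampleRequest` are hypothetical names, the AJAX-only Strict case is left out, and the sketch assumes the `X-Forwarded-*` headers are set by a trusted proxy.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// firstValue returns the first entry of a comma-separated header value.
func firstValue(v string) string {
	if i := strings.IndexByte(v, ','); i >= 0 {
		v = v[:i]
	}
	return strings.TrimSpace(v)
}

// isHTTPS reports whether the client-facing connection used HTTPS,
// preferring X-Forwarded-Proto over the scheme seen behind the proxy.
func isHTTPS(r *http.Request) bool {
	if r.TLS != nil {
		return true
	}
	return strings.EqualFold(firstValue(r.Header.Get("X-Forwarded-Proto")), "https")
}

// publicHost returns the host the client addressed, without a port.
func publicHost(r *http.Request) string {
	host := firstValue(r.Header.Get("X-Forwarded-Host"))
	if host == "" {
		host = r.Host
	}
	if h, _, err := net.SplitHostPort(host); err == nil {
		return h
	}
	return host
}

// sessionCookie builds a cookie that survives an OAuth redirect round trip:
// Secure when the public side is HTTPS, SameSite=Lax so provider callbacks
// still carry it, and scoped to the public domain.
func sessionCookie(r *http.Request, name, value string) *http.Cookie {
	return &http.Cookie{
		Name:     name,
		Value:    value,
		Path:     "/",
		Domain:   publicHost(r),
		Secure:   isHTTPS(r),
		HttpOnly: true,
		SameSite: http.SameSiteLaxMode,
	}
}

// exampleRequest simulates a request as seen behind a TLS-terminating proxy.
func exampleRequest() *http.Request {
	r := httptest.NewRequest(http.MethodGet, "http://traefik.internal:8080/callback", nil)
	r.Header.Set("X-Forwarded-Proto", "https")
	r.Header.Set("X-Forwarded-Host", "app.example.com")
	return r
}

func main() {
	c := sessionCookie(exampleRequest(), "csrf", "token")
	fmt.Println(c.Domain, c.Secure, c.SameSite == http.SameSiteLaxMode)
}
```

Running it prints `app.example.com true true`: the cookie is bound to the public host and marked Secure even though the proxied request itself arrived over plain HTTP.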

* Next set of enhancements together with memory usage improvements.

* Memory leak fixes and optimisations.

* CSRF and Cookie Domain fixes

* fixup! CSRF and Cookie Domain fixes

* Metadata cache leak fix + profiling

* fixup! Metadata cache leak fix + profiling

* Memory leaks hunting, part 1337.

* Further pursue of perfection.

* fixup! Further pursue of perfection.

* fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* Clear race conditions

* fixup! Clear race conditions

* Weekend fun with memory leaks

* Splitting code into multiple files with reasonable testing coverage.

```
ok      github.com/lukaszraczylo/traefikoidc    117.017s        coverage: 72.6% of statements
ok      github.com/lukaszraczylo/traefikoidc/auth       0.505s  coverage: 87.1% of statements
ok      github.com/lukaszraczylo/traefikoidc/circuit_breaker    0.283s  coverage: 99.0% of statements
        github.com/lukaszraczylo/traefikoidc/config             coverage: 0.0% of statements
ok      github.com/lukaszraczylo/traefikoidc/handlers   0.349s  coverage: 98.2% of statements
ok      github.com/lukaszraczylo/traefikoidc/internal/providers (cached)        coverage: 94.3% of statements
ok      github.com/lukaszraczylo/traefikoidc/middleware 0.808s  coverage: 78.0% of statements
ok      github.com/lukaszraczylo/traefikoidc/recovery   0.653s  coverage: 100.0% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/chunking   (cached)        coverage: 87.8% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/core       (cached)        coverage: 85.6% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/crypto     (cached)        coverage: 81.8% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/storage    (cached)        coverage: 93.5% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/validators (cached)        coverage: 98.8% of statements
```

* fixup! Splitting code into multiple files with reasonable testing coverage.

* fixup! fixup! Splitting code into multiple files with reasonable testing coverage.

* Weekend fun with further optimisations.

* fixup! Weekend fun with further optimisations.

* fixup! fixup! Weekend fun with further optimisations.

* fixup! fixup! fixup! Weekend fun with further optimisations.

* fixup! fixup! fixup! fixup! Weekend fun with further optimisations.

* fixup! fixup! fixup! fixup! fixup! Weekend fun with further optimisations.

* Pre-release cleanup.

* Enhance test coverage.

* fixup! Enhance test coverage.

* fixup! fixup! Enhance test coverage.

* fixup! fixup! fixup! Enhance test coverage.
2025-09-18 11:01:30 +01:00

245 lines
7.4 KiB
Go

package traefikoidc

import (
	"context"
	"fmt"
	"time"
)

// TokenResilienceConfig centralizes resilience configuration for token operations
type TokenResilienceConfig struct {
	// Circuit breaker configuration for token operations
	CircuitBreakerEnabled bool
	CircuitBreakerConfig  CircuitBreakerConfig
	// Retry configuration for token operations
	RetryEnabled bool
	RetryConfig  RetryConfig
	// Metadata cache progressive grace period configuration
	MetadataCacheConfig MetadataCacheResilienceConfig
}

// MetadataCacheResilienceConfig defines resilience settings for metadata cache
type MetadataCacheResilienceConfig struct {
	// EnableProgressiveGracePeriod allows extending cache TTL on failures
	EnableProgressiveGracePeriod bool
	// InitialGracePeriod is the first extension when service is unavailable (default: 5 minutes)
	InitialGracePeriod time.Duration
	// ExtendedGracePeriod is the second extension for continued failures (default: 15 minutes)
	ExtendedGracePeriod time.Duration
	// MaxGracePeriod is the maximum extension allowed for non-critical metadata (default: 30 minutes)
	MaxGracePeriod time.Duration
	// SecurityCriticalMaxGracePeriod enforces Allan's security limit for critical metadata (default: 15 minutes)
	SecurityCriticalMaxGracePeriod time.Duration
	// SecurityCriticalFields defines which metadata fields are security-critical
	SecurityCriticalFields []string
}

// DefaultTokenResilienceConfig returns the default resilience configuration for token operations
func DefaultTokenResilienceConfig() TokenResilienceConfig {
	return TokenResilienceConfig{
		CircuitBreakerEnabled: true,
		CircuitBreakerConfig: CircuitBreakerConfig{
			MaxFailures:  3,
			Timeout:      30 * time.Second,
			ResetTimeout: 15 * time.Second,
		},
		RetryEnabled: true,
		RetryConfig: RetryConfig{
			MaxAttempts:   3,
			InitialDelay:  250 * time.Millisecond,
			MaxDelay:      2 * time.Second,
			BackoffFactor: 2.0,
			EnableJitter:  true,
			RetryableErrors: []string{
				"connection refused",
				"timeout",
				"temporary failure",
				"network unreachable",
				"connection reset",
				"no route to host",
			},
		},
		MetadataCacheConfig: DefaultMetadataCacheResilienceConfig(),
	}
}

// DefaultMetadataCacheResilienceConfig returns the default metadata cache resilience configuration
func DefaultMetadataCacheResilienceConfig() MetadataCacheResilienceConfig {
	return MetadataCacheResilienceConfig{
		EnableProgressiveGracePeriod:   true,
		InitialGracePeriod:             5 * time.Minute,
		ExtendedGracePeriod:            15 * time.Minute,
		MaxGracePeriod:                 30 * time.Minute,
		SecurityCriticalMaxGracePeriod: 15 * time.Minute, // Allan's security limit
		SecurityCriticalFields: []string{
			"jwks_uri",
			"authorization_endpoint",
			"token_endpoint",
			"revocation_endpoint",
			"end_session_endpoint",
		},
	}
}

// TokenResilienceManager coordinates resilience mechanisms for token operations
type TokenResilienceManager struct {
	config               TokenResilienceConfig
	errorRecoveryManager *ErrorRecoveryManager
	circuitBreaker       *CircuitBreaker
	retryExecutor        *RetryExecutor
	logger               *Logger
}

// NewTokenResilienceManager creates a new token resilience manager
func NewTokenResilienceManager(config TokenResilienceConfig, logger *Logger) *TokenResilienceManager {
	manager := &TokenResilienceManager{
		config: config,
		logger: logger,
	}

	// Initialize error recovery manager
	manager.errorRecoveryManager = NewErrorRecoveryManager(logger)

	// Initialize circuit breaker if enabled
	if config.CircuitBreakerEnabled {
		manager.circuitBreaker = NewCircuitBreaker(config.CircuitBreakerConfig, logger)
	}

	// Initialize retry executor if enabled
	if config.RetryEnabled {
		manager.retryExecutor = NewRetryExecutor(config.RetryConfig, logger)
	}

	return manager
}

// ExecuteTokenOperation executes a token operation with full resilience support
func (trm *TokenResilienceManager) ExecuteTokenOperation(ctx context.Context, operation string, fn func() error) error {
	if trm.logger != nil {
		trm.logger.Debugf("Executing token operation %s with resilience", operation)
	}

	// If no resilience mechanisms are enabled, execute directly
	if !trm.config.CircuitBreakerEnabled && !trm.config.RetryEnabled {
		return fn()
	}

	// Compose resilience mechanisms
	finalOperation := fn

	// Wrap with circuit breaker if enabled. This is the inner layer, so every
	// retry attempt below passes through the circuit breaker.
	if trm.config.CircuitBreakerEnabled && trm.circuitBreaker != nil {
		originalOp := finalOperation
		finalOperation = func() error {
			return trm.circuitBreaker.ExecuteWithContext(ctx, originalOp)
		}
	}

	// Wrap with retry if enabled (outer layer)
	if trm.config.RetryEnabled && trm.retryExecutor != nil {
		originalOp := finalOperation
		finalOperation = func() error {
			return trm.retryExecutor.ExecuteWithContext(ctx, originalOp)
		}
	}

	err := finalOperation()
	if err != nil && trm.logger != nil {
		trm.logger.Errorf("Token operation %s failed after resilience mechanisms: %v", operation, err)
	} else if trm.logger != nil {
		trm.logger.Debugf("Token operation %s completed successfully", operation)
	}
	return err
}

// ExecuteTokenExchange executes token exchange with resilience
func (trm *TokenResilienceManager) ExecuteTokenExchange(ctx context.Context, t *TraefikOidc, grantType, codeOrToken, redirectURL, codeVerifier string) (*TokenResponse, error) {
	var result *TokenResponse
	var err error

	operation := fmt.Sprintf("token_exchange_%s", grantType)
	err = trm.ExecuteTokenOperation(ctx, operation, func() error {
		result, err = t.exchangeTokens(ctx, grantType, codeOrToken, redirectURL, codeVerifier)
		return err
	})
	return result, err
}

// ExecuteTokenRefresh executes token refresh with resilience.
// ctx is passed to the resilience wrappers only; the call to
// getNewTokenWithRefreshToken itself does not take a context.
func (trm *TokenResilienceManager) ExecuteTokenRefresh(ctx context.Context, t *TraefikOidc, refreshToken string) (*TokenResponse, error) {
	var result *TokenResponse
	var err error

	err = trm.ExecuteTokenOperation(ctx, "token_refresh", func() error {
		result, err = t.getNewTokenWithRefreshToken(refreshToken)
		return err
	})
	return result, err
}

// GetMetrics returns metrics for all resilience mechanisms
func (trm *TokenResilienceManager) GetMetrics() map[string]interface{} {
	metrics := make(map[string]interface{})
	if trm.circuitBreaker != nil {
		metrics["circuit_breaker"] = trm.circuitBreaker.GetMetrics()
	}
	if trm.retryExecutor != nil {
		metrics["retry_executor"] = trm.retryExecutor.GetMetrics()
	}
	if trm.errorRecoveryManager != nil {
		recoveryMetrics := trm.errorRecoveryManager.GetRecoveryMetrics()
		metrics["error_recovery"] = recoveryMetrics
	}
	return metrics
}

// Reset resets the circuit breaker and retry executor, if configured
func (trm *TokenResilienceManager) Reset() {
	if trm.circuitBreaker != nil {
		trm.circuitBreaker.Reset()
	}
	if trm.retryExecutor != nil {
		trm.retryExecutor.Reset()
	}
	if trm.logger != nil {
		trm.logger.Infof("Token resilience manager has been reset")
	}
}

// IsSecurityCriticalField checks if a metadata field is security-critical
func (config MetadataCacheResilienceConfig) IsSecurityCriticalField(fieldName string) bool {
	for _, criticalField := range config.SecurityCriticalFields {
		if fieldName == criticalField {
			return true
		}
	}
	return false
}

// GetEffectiveMaxGracePeriod returns the effective maximum grace period for a field
// considering Allan's security limits
func (config MetadataCacheResilienceConfig) GetEffectiveMaxGracePeriod(fieldName string) time.Duration {
	if config.IsSecurityCriticalField(fieldName) {
		return config.SecurityCriticalMaxGracePeriod
	}
	return config.MaxGracePeriod
}
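
As an illustration of how the progressive grace periods and the per-field cap might combine, here is a minimal standalone sketch. It does not use the plugin's types: `graceConfig`, `defaultGraceConfig` and `gracePeriod` are hypothetical names, and the mapping from consecutive failure count to extension step is an assumption, since the file above defines only the durations and the cap, not the escalation policy.

```go
package main

import (
	"fmt"
	"time"
)

// graceConfig mirrors the duration fields of MetadataCacheResilienceConfig
// (hypothetical, simplified type used for illustration only).
type graceConfig struct {
	Initial             time.Duration
	Extended            time.Duration
	Max                 time.Duration
	SecurityCriticalMax time.Duration
	CriticalFields      []string
}

// defaultGraceConfig uses the same default values as the file above.
func defaultGraceConfig() graceConfig {
	return graceConfig{
		Initial:             5 * time.Minute,
		Extended:            15 * time.Minute,
		Max:                 30 * time.Minute,
		SecurityCriticalMax: 15 * time.Minute,
		CriticalFields:      []string{"jwks_uri", "token_endpoint"},
	}
}

// effectiveMax returns the cap for a field, tighter for critical fields.
func (c graceConfig) effectiveMax(field string) time.Duration {
	for _, f := range c.CriticalFields {
		if f == field {
			return c.SecurityCriticalMax
		}
	}
	return c.Max
}

// gracePeriod picks an extension from the number of consecutive refresh
// failures (assumed policy: 1 -> Initial, 2 -> Extended, 3+ -> Max) and
// then clamps it to the field's effective maximum.
func (c graceConfig) gracePeriod(field string, failures int) time.Duration {
	var d time.Duration
	switch {
	case failures <= 0:
		d = 0
	case failures == 1:
		d = c.Initial
	case failures == 2:
		d = c.Extended
	default:
		d = c.Max
	}
	if limit := c.effectiveMax(field); d > limit {
		d = limit
	}
	return d
}

func main() {
	cfg := defaultGraceConfig()
	fmt.Println(cfg.gracePeriod("jwks_uri", 3)) // capped by the security-critical limit
	fmt.Println(cfg.gracePeriod("issuer", 3))   // non-critical field gets the full maximum
}
```

With the defaults this prints `15m0s` and then `30m0s`: a security-critical field such as `jwks_uri` never gets more than the 15-minute limit, however many refresh attempts have failed.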