traefikoidc/recovery/error_handler.go
lukaszraczylo 1b49e133da Complete rebuild of the plugin
* Fix bug affecting Azure OIDC authentication (and most likely others)

* Fixes issue #51

* Ensure that appended roles are unique. Update the documentation.

* Improvements targeting possible memory usage spikes.

* Additional fixes and cleanup

* Refactoring code to fix the issues identified by the users.

* Modernize run

* Fieldalignment

* Multiple changes to improve performance and reduce complexity.
- Optimise the errors and recovery.
- Deduplicate code in metadata cache.
- Remove unused performance monitoring code.
- Simplify session management and settings handling.

* Fix claims issue.

* Add ability to overwrite the default scopes in the settings file

* Well.. that escalated quickly.

Completely forgot that Traefik uses an outdated Yaegi and requires compatibility with Go 1.20 (pre-generic Go code).

* Bugfix #51: Ensures that user provided scopes overrides work.

* fixup! Bugfix #51: Ensures that user provided scopes overrides work.

* fixup! fixup! Bugfix #51: Ensures that user provided scopes overrides work.

* Abstract the provider logic into a separate package.

* Additional micro fixes and cleanups.

* Simplify all the things.

* fixup! Simplify all the things.

* fixup! fixup! Simplify all the things.

* fixup! fixup! fixup! Simplify all the things.

* fixup! fixup! fixup! fixup! Simplify all the things.

* ...

* Cleanup tests.

* fixup! Cleanup tests.

* fixup! fixup! fixup! Cleanup tests.

* fixup! fixup! fixup! fixup! Cleanup tests.

* fixup! fixup! fixup! fixup! fixup! Cleanup tests.

* Issue #53: Fix CSRF token handling in reverse proxy

1. HTTPS Detection Fixed (session.go:723)
   - Now uses X-Forwarded-Proto header instead of r.URL.Scheme
   - Properly detects HTTPS in reverse proxy environments
2. SameSite Cookie Attribute Fixed
   - Removed automatic SameSiteStrictMode for HTTPS (would break OAuth)
   - Keeps SameSiteLaxMode to allow OAuth callbacks from external domains
   - Only uses Strict for AJAX requests which don't involve OAuth redirects
3. Cookie Domain Handling Fixed
   - Now respects X-Forwarded-Host header for cookie domain
   - Ensures cookies are set for the public domain, not internal proxy domain
4. EnhanceSessionSecurity Properly Integrated
   - Function is now actually called during session save
   - Applies security enhancements without breaking OAuth flow

Why Issue #53 Failed Before:

1. Cookies were not marked Secure in HTTPS environments (browser wouldn't send them back)
2. If they had been Secure with SameSite=Strict, Azure callbacks would still fail
3. Cookie domain might have been wrong (internal vs public domain)

Why It Works Now:

1. Cookies are properly marked Secure for HTTPS
2. Uses SameSite=Lax to allow OAuth provider callbacks
3. Cookie domain uses public domain from X-Forwarded-Host
4. CSRF token persists through the entire OAuth flow

* Next set of enhancements together with memory usage improvements.

* Memory leak fixes and optimisations.

* CSRF and Cookie Domain fixes

* fixup! CSRF and Cookie Domain fixes

* Metadata cache leak fix + profiling

* fixup! Metadata cache leak fix + profiling

* Memory leaks hunting, part 1337.

* Further pursue of perfection.

* fixup! Further pursue of perfection.

* fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! Further pursue of perfection.

* Clear race conditions

* fixup! Clear race conditions

* Weekend fun with memory leaks

* Splitting code into multiple files with reasonable testing coverage.

```
ok      github.com/lukaszraczylo/traefikoidc    117.017s        coverage: 72.6% of statements
ok      github.com/lukaszraczylo/traefikoidc/auth       0.505s  coverage: 87.1% of statements
ok      github.com/lukaszraczylo/traefikoidc/circuit_breaker    0.283s  coverage: 99.0% of statements
        github.com/lukaszraczylo/traefikoidc/config             coverage: 0.0% of statements
ok      github.com/lukaszraczylo/traefikoidc/handlers   0.349s  coverage: 98.2% of statements
ok      github.com/lukaszraczylo/traefikoidc/internal/providers (cached)        coverage: 94.3% of statements
ok      github.com/lukaszraczylo/traefikoidc/middleware 0.808s  coverage: 78.0% of statements
ok      github.com/lukaszraczylo/traefikoidc/recovery   0.653s  coverage: 100.0% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/chunking   (cached)        coverage: 87.8% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/core       (cached)        coverage: 85.6% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/crypto     (cached)        coverage: 81.8% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/storage    (cached)        coverage: 93.5% of statements
ok      github.com/lukaszraczylo/traefikoidc/session/validators (cached)        coverage: 98.8% of statements
```

* fixup! Splitting code into multiple files with reasonable testing coverage.

* fixup! fixup! Splitting code into multiple files with reasonable testing coverage.

* Weekend fun with further optimisations.

* fixup! Weekend fun with further optimisations.

* fixup! fixup! Weekend fun with further optimisations.

* fixup! fixup! fixup! Weekend fun with further optimisations.

* fixup! fixup! fixup! fixup! Weekend fun with further optimisations.

* fixup! fixup! fixup! fixup! fixup! Weekend fun with further optimisations.

* Pre-release cleanup.

* Enhance test coverage.

* fixup! Enhance test coverage.

* fixup! fixup! Enhance test coverage.

* fixup! fixup! fixup! Enhance test coverage.
2025-09-18 11:01:30 +01:00

259 lines
7.6 KiB
Go

// Package recovery provides error recovery and resilience mechanisms.
package recovery

import (
	"context"
	"strconv"
	"sync"
	"sync/atomic"
	"time"
)

// ErrorRecoveryMechanism defines the interface for error recovery strategies.
// It provides a common contract for implementing various resilience patterns
// (circuit breaker, retry, graceful degradation) to handle transient failures
// and protect downstream services from cascading failures.
type ErrorRecoveryMechanism interface {
	// ExecuteWithContext executes a function with error recovery mechanisms.
	ExecuteWithContext(ctx context.Context, fn func() error) error
	// GetMetrics returns metrics about the recovery mechanism's performance.
	GetMetrics() map[string]interface{}
	// Reset resets the mechanism to its initial state.
	Reset()
	// IsAvailable returns whether the mechanism is available for requests.
	IsAvailable() bool
}

// Logger is the logging interface injected into recovery mechanisms.
type Logger interface {
	Infof(format string, args ...interface{})
	Errorf(format string, args ...interface{})
	Debugf(format string, args ...interface{})
}

// BaseRecoveryMechanism provides common functionality and metrics tracking
// for all error recovery mechanisms. It handles request/failure/success counting,
// timing information, and logging capabilities for derived recovery mechanisms.
type BaseRecoveryMechanism struct {
	// startTime tracks when the mechanism was created.
	startTime time.Time
	// lastFailureTime records the most recent failure timestamp.
	lastFailureTime time.Time
	// lastSuccessTime records the most recent success timestamp.
	lastSuccessTime time.Time
	// logger is used for debugging and monitoring.
	logger Logger
	// name identifies this recovery mechanism instance.
	name string
	// totalRequests counts all requests processed (accessed atomically).
	totalRequests int64
	// totalFailures counts failed requests (accessed atomically).
	totalFailures int64
	// totalSuccesses counts successful requests (accessed atomically).
	totalSuccesses int64
	// mutex protects the timestamp fields.
	mutex sync.RWMutex
}

// NewBaseRecoveryMechanism creates a new base recovery mechanism with the given name and logger.
// A nil logger is replaced with a no-op logger.
// This serves as the foundation for specific recovery mechanism implementations.
func NewBaseRecoveryMechanism(name string, logger Logger) *BaseRecoveryMechanism {
	if logger == nil {
		logger = NewNoOpLogger()
	}
	return &BaseRecoveryMechanism{
		name:      name,
		logger:    logger,
		startTime: time.Now(),
	}
}

// RecordRequest increments the total request counter.
// This method is thread-safe using atomic operations.
func (b *BaseRecoveryMechanism) RecordRequest() {
	atomic.AddInt64(&b.totalRequests, 1)
}

// RecordSuccess increments the success counter and updates the last success timestamp.
// This method is thread-safe using atomic operations for counters
// and mutex protection for timestamp updates.
func (b *BaseRecoveryMechanism) RecordSuccess() {
	atomic.AddInt64(&b.totalSuccesses, 1)
	b.mutex.Lock()
	defer b.mutex.Unlock()
	b.lastSuccessTime = time.Now()
}

// RecordFailure increments the failure counter and updates the last failure timestamp.
// This method is thread-safe using atomic operations for counters
// and mutex protection for timestamp updates.
func (b *BaseRecoveryMechanism) RecordFailure() {
	atomic.AddInt64(&b.totalFailures, 1)
	b.mutex.Lock()
	defer b.mutex.Unlock()
	b.lastFailureTime = time.Now()
}

// GetBaseMetrics returns basic metrics collected by the base recovery mechanism.
// This includes request counts, success/failure rates, and timing information.
// The rate keys are present only once at least one request has been recorded,
// and the last-success/last-failure keys only after the corresponding event.
func (b *BaseRecoveryMechanism) GetBaseMetrics() map[string]interface{} {
	b.mutex.RLock()
	defer b.mutex.RUnlock()

	totalReqs := atomic.LoadInt64(&b.totalRequests)
	totalSucc := atomic.LoadInt64(&b.totalSuccesses)
	totalFail := atomic.LoadInt64(&b.totalFailures)

	metrics := map[string]interface{}{
		"name":            b.name,
		"total_requests":  totalReqs,
		"total_successes": totalSucc,
		"total_failures":  totalFail,
		"start_time":      b.startTime,
	}

	if totalReqs > 0 {
		metrics["success_rate"] = float64(totalSucc) / float64(totalReqs)
		metrics["failure_rate"] = float64(totalFail) / float64(totalReqs)
	}
	if !b.lastSuccessTime.IsZero() {
		metrics["last_success_time"] = b.lastSuccessTime
		metrics["time_since_last_success"] = time.Since(b.lastSuccessTime)
	}
	if !b.lastFailureTime.IsZero() {
		metrics["last_failure_time"] = b.lastFailureTime
		metrics["time_since_last_failure"] = time.Since(b.lastFailureTime)
	}
	metrics["uptime"] = time.Since(b.startTime)

	return metrics
}

// LogInfo logs an info message if a logger is available.
func (b *BaseRecoveryMechanism) LogInfo(format string, args ...interface{}) {
	if b.logger != nil {
		b.logger.Infof(format, args...)
	}
}

// LogError logs an error message if a logger is available.
func (b *BaseRecoveryMechanism) LogError(format string, args ...interface{}) {
	if b.logger != nil {
		b.logger.Errorf(format, args...)
	}
}

// LogDebug logs a debug message if a logger is available.
func (b *BaseRecoveryMechanism) LogDebug(format string, args ...interface{}) {
	if b.logger != nil {
		b.logger.Debugf(format, args...)
	}
}

// ErrorHandler provides centralized error handling and recovery coordination.
type ErrorHandler struct {
	mechanisms []ErrorRecoveryMechanism
	logger     Logger
	mutex      sync.RWMutex
}

// NewErrorHandler creates a new error handler with the given mechanisms.
func NewErrorHandler(logger Logger, mechanisms ...ErrorRecoveryMechanism) *ErrorHandler {
	return &ErrorHandler{
		mechanisms: mechanisms,
		logger:     logger,
	}
}

// AddMechanism adds a recovery mechanism to the handler.
func (eh *ErrorHandler) AddMechanism(mechanism ErrorRecoveryMechanism) {
	eh.mutex.Lock()
	defer eh.mutex.Unlock()
	eh.mechanisms = append(eh.mechanisms, mechanism)
}

// ExecuteWithRecovery executes a function with all configured recovery mechanisms.
// The first mechanism in the list becomes the outermost wrapper.
func (eh *ErrorHandler) ExecuteWithRecovery(ctx context.Context, fn func() error) error {
	// Work on a snapshot so fn runs without holding the lock.
	eh.mutex.RLock()
	mechanisms := make([]ErrorRecoveryMechanism, len(eh.mechanisms))
	copy(mechanisms, eh.mechanisms)
	eh.mutex.RUnlock()

	// If no mechanisms are configured, execute directly.
	if len(mechanisms) == 0 {
		return fn()
	}

	// Chain the mechanisms: each wraps the next. Iterating backwards makes
	// mechanisms[0] the outermost call.
	wrappedFn := fn
	for i := len(mechanisms) - 1; i >= 0; i-- {
		// Per-iteration copies so each closure captures its own values.
		mechanism := mechanisms[i]
		currentFn := wrappedFn
		wrappedFn = func() error {
			return mechanism.ExecuteWithContext(ctx, currentFn)
		}
	}
	return wrappedFn()
}

// GetAllMetrics returns metrics from all configured mechanisms,
// keyed as "mechanism_0", "mechanism_1", and so on.
func (eh *ErrorHandler) GetAllMetrics() map[string]interface{} {
	eh.mutex.RLock()
	defer eh.mutex.RUnlock()

	allMetrics := make(map[string]interface{})
	for i, mechanism := range eh.mechanisms {
		// strconv.Itoa yields the decimal index; string(rune(i)) would
		// produce a control character instead of a digit.
		mechanismKey := "mechanism_" + strconv.Itoa(i)
		allMetrics[mechanismKey] = mechanism.GetMetrics()
	}
	return allMetrics
}

// ResetAll resets all configured mechanisms.
func (eh *ErrorHandler) ResetAll() {
	eh.mutex.RLock()
	defer eh.mutex.RUnlock()
	for _, mechanism := range eh.mechanisms {
		mechanism.Reset()
	}
}

// IsHealthy returns true if all mechanisms are available.
func (eh *ErrorHandler) IsHealthy() bool {
	eh.mutex.RLock()
	defer eh.mutex.RUnlock()
	for _, mechanism := range eh.mechanisms {
		if !mechanism.IsAvailable() {
			return false
		}
	}
	return true
}

// NoOpLogger provides a logger that does nothing.
type NoOpLogger struct{}

// NewNoOpLogger creates a new no-op logger.
func NewNoOpLogger() *NoOpLogger {
	return &NoOpLogger{}
}

// Infof does nothing.
func (l *NoOpLogger) Infof(format string, args ...interface{}) {}

// Errorf does nothing.
func (l *NoOpLogger) Errorf(format string, args ...interface{}) {}

// Debugf does nothing.
func (l *NoOpLogger) Debugf(format string, args ...interface{}) {}
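
The wrapping order used by ExecuteWithRecovery may be easier to see in isolation. The standalone sketch below is an illustration, not part of the package: the `mechanism` interface mirrors only the `ExecuteWithContext` method, and `tracer`, `chain`, and `run` are hypothetical names introduced here. It suggests that the first registered mechanism ends up as the outermost wrapper.

```go
package main

import (
	"context"
	"fmt"
)

// mechanism mirrors only the ExecuteWithContext method of
// ErrorRecoveryMechanism; the other methods are omitted for brevity.
type mechanism interface {
	ExecuteWithContext(ctx context.Context, fn func() error) error
}

// tracer is a hypothetical mechanism that records when it is entered
// and left, so the nesting order becomes visible.
type tracer struct {
	name string
	log  *[]string
}

func (t tracer) ExecuteWithContext(ctx context.Context, fn func() error) error {
	*t.log = append(*t.log, "enter "+t.name)
	err := fn()
	*t.log = append(*t.log, "exit "+t.name)
	return err
}

// chain wraps fn the same way ExecuteWithRecovery does: it walks the
// slice backwards so that ms[0] becomes the outermost call.
func chain(ctx context.Context, fn func() error, ms ...mechanism) error {
	wrapped := fn
	for i := len(ms) - 1; i >= 0; i-- {
		m, next := ms[i], wrapped // per-iteration copies for the closure
		wrapped = func() error { return m.ExecuteWithContext(ctx, next) }
	}
	return wrapped()
}

// run executes a trivial function through two tracers and returns the trace.
func run() []string {
	var log []string
	_ = chain(context.Background(), func() error {
		log = append(log, "call fn")
		return nil
	}, tracer{"breaker", &log}, tracer{"retry", &log})
	return log
}

func main() {
	for _, line := range run() {
		fmt.Println(line)
	}
	// Prints:
	// enter breaker
	// enter retry
	// call fn
	// exit retry
	// exit breaker
}
```

With this ordering, a circuit breaker registered before a retry mechanism sees one call per outer request and observes the final outcome of all retries, which is one reason the registration order matters.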