traefikoidc

mirror of https://github.com/lukaszraczylo/traefikoidc.git synced 2026-06-05 22:44:17 +00:00

Author	SHA1	Message	Date
lukaszraczylo	72e2b682bb	fix: eliminate per-request global mutexes in Yaegi hot paths
The v1.0.14 fix replaced one contended sync.RWMutex (RefreshCoordinator.refreshMutex) with sync.Map. Production showed the same death-spiral signature recurring ~2 hours later (same shape, different mutex): 65 goroutines stuck on a sync.(*RWMutex).Lock at one address, pod pinned at 1000m CPU, identical Yaegi runCfg/reflect.Value.Call stack pattern. The mutex was RefreshCoordinator.attemptsMutex.
Generalising: under Yaegi (interpreted Go for traefik plugins), any per-request global mutex acquisition is a latent serialization point. reflect.Value.Call dispatch on a held lock turns a microsecond critical section into a multi-millisecond one, and on a GOMAXPROCS=1 pod the queue is unbounded.
This commit removes every per-request global mutex on the hot path:
1. RefreshCoordinator.attemptsMutex (sync.RWMutex)
   sessionRefreshAttempts: map -> sync.Map. refreshAttemptTracker: all fields atomic (int32, int64 UnixNano, cooldownEndNano == 0 as the not-in-cooldown sentinel, replacing the inCooldown bool). isInCooldown / recordRefreshAttempt / recordRefreshSuccess / recordRefreshFailure all become lock-free. Cooldown entry uses CompareAndSwapInt64 so only one goroutine logs the transition.
2. RefreshCircuitBreaker.mutex (sync.RWMutex)
   lastFailureTime / lastSuccessTime -> atomic.Int64 UnixNano. state and failures already atomic. AllowRequest / RecordSuccess / RecordFailure now pure atomic ops.
3. TraefikOidc.firstRequestMutex (sync.Mutex)
   firstRequestReceived bool -> firstRequestStarted int32. metadataRefreshStarted bool -> metadataRefreshStartedAtomic int32. ServeHTTP bootstrap path uses CompareAndSwapInt32: fires once, zero steady-state cost. Previously the mutex was acquired on every non-health request forever.
4. TraefikOidc.metadataRetryMutex (sync.Mutex)
   lastMetadataRetryTime time.Time -> lastMetadataRetryNano int64. The 30-second retry throttle is now a CAS on lastMetadataRetryNano.
cleanupStaleEntries iterates via sync.Map.Range; eviction is a CompareAndDelete by pointer identity so a tracker freshly re-used by a concurrent caller is not lost.
Empirical evidence (3 specialist-agent analysis of the v1.0.14 spike, profiles in /tmp/traefik-spike-1779511683/):
* mutex profile: 97% delay in sync.(*Mutex).Unlock via HTTPHandlerSwitcher -> accesslog -> metrics -> backoff.RetryNotify
* 65 stuck goroutines at one RWMutex address (0x40022eb648), identical Yaegi CFG pointer, all on rc.attemptsMutex via recordRefreshAttempt + isInCooldown
* traffic driver: long-lived in-cluster Go-http-client doing ~5.4 req/s POST embeddings via OIDC cookie session → same sessionID → contention all funnels to one tracker entry
Yaegi support for sync/atomic confirmed at github.com/traefik/yaegi@v0.16.1/stdlib/go1_22_sync_atomic.go: AddInt32/Int64, LoadInt32/Int64, StoreInt32/Int64, CompareAndSwapInt32/Int64 all exposed via reflect.ValueOf. Yaegi dispatches each call through reflect.Value.Call to the COMPILED atomic.* function, which executes a single hardware CAS/LOCK-XADD instruction. Each atomic op still pays Yaegi dispatch cost but cannot block: no queueing, no death spiral.
Trade-off acknowledged: v1.0.15 issues ~6-8 atomic/sync.Map ops per leader-path request vs the 4 mutex ops of v1.0.14. Under low contention this is a modest CPU bump. Under high contention it's an unbounded → bounded transformation. Net win.
All tests pass with -race; golangci-lint clean.	2026-05-23 10:47:21 +01:00
lukaszraczylo	9cbca4c4fb	fix(refresh): honor userIdentifierClaim in token refresh path (#132)
patch-release
The refresh path in token_manager.go hardcoded the "email" claim when extracting the user identifier from a refreshed ID token, ignoring the configured userIdentifierClaim. Keycloak users without an email claim (using sub or another identifier) were kicked out on refresh even though their initial login worked.
The callback path (auth_flow.go:226-239) already honored userIdentifierClaim with "sub" fallback; PR #100 (commit `a316a98`) added that support but missed the refresh path. Mirror the callback logic in refreshToken so both paths behave the same.
Cleanup: rename Get/SetEmail to Get/SetUserIdentifier on SessionData to match the actual semantics. The slot already stored the configured identifier (email, sub, oid, upn, preferred_username), only the API name was misleading. Storage key "email" → "user_identifier" and combinedSessionPayload field E (json:"e") → Ui (json:"ui").
Compat note: existing user sessions invalidate on upgrade: every active user re-authenticates once after deploying this change.	2026-05-07 09:21:41 +01:00
lukaszraczylo	7816e05c98	fix issue with logout url (#112)
* fix(logout): handle logout requests before OIDC initialization
  - [x] Add debug logging to logout handler entry point
  - [x] Move logout path check before OIDC initialization to enable logout when provider unavailable
  - [x] Move excluded URL and SSE checks before initialization wait
  - [x] Add debug logging for initialization wait to diagnose hanging requests
  - [x] Add test for logout functionality without OIDC provider availability
* feat(logout): implement OIDC backchannel and front-channel logout
  - [x] Add logout token validation and backchannel logout handler
  - [x] Add front-channel logout handler with iframe support
  - [x] Implement session invalidation cache for distributed deployments
  - [x] Add comprehensive logout token claim verification (issuer, audience, events, iat, sid/sub)
  - [x] Integrate session invalidation checks into authorization flow
  - [x] Add configuration options for enabling backchannel/front-channel logout
  - [x] Add extensive test coverage for logout flows and edge cases
  - [x] Update documentation with logout configuration examples
  - [x] Add middleware routing for logout endpoints
  - [x] Extend cache manager with session invalidation cache support
  Resolves #110
* fixup! feat(logout): implement OIDC backchannel and front-channel logout
* fixup! Merge branch 'main' into fix-issue-with-logout-url	2026-01-04 01:59:50 +00:00
lukaszraczylo	ae59a5e88a	0.7.10 (#80)
* Add ability to disable replay protection.
  - This is useful for runs with multiple traefik replicas to avoid false positives and tokens re-creation.
* Enhance the CI/CD pipelines
* Increase test coverage.
* Update vendored dependencies.
* Update behaviour on forceHTTPS as per issue #82	2025-10-16 10:56:28 +01:00

4 Commits