traefikoidc

mirror of https://github.com/lukaszraczylo/traefikoidc.git synced 2026-06-06 22:49:43 +00:00

Author	SHA1	Message	Date
lukaszraczylo	2af05701dc	build(release): publish multi-arch oidcgate Docker image per release tag - Add 'oidcgate' build entry (linux/darwin × amd64/arm64) to goreleaser. - Add per-OS/arch tar.gz archives for the daemon binary. - Add dockers + docker_manifests entries publishing ghcr.io/lukaszraczylo/oidcgate:vX.Y.Z (release tag), :vX.Y, :vX, :latest as multi-arch manifests (linux/amd64 + linux/arm64). - Add cmd/oidcgate/Dockerfile (distroless static, nonroot user). - Sign images with cosign keyless (docker_signs). - Preserve existing source-only Traefik plugin archive via meta:true. - Update README to advertise the published image.	2026-05-19 17:14:29 +01:00
lukaszraczylo	03a755cb53	docs(oidcgate): expand user guide and cross-link - Add HAProxy and Envoy ext_authz_http wiring snippets. - Add full OIDCGATE_* env-var inventory (26 fields). - Add Security Posture section (X-Forwarded-Uri sanitisation, excludedURLs guardrail, callbackURL/logoutURL validation). - Add Bearer-token (M2M) auth composition section with link to BEARER_AUTH.md. - Add Operational Guidance section (healthz/readyz ACL, Redis for multi-replica, no built-in metrics, graceful shutdown deadline). - Add Debugging section (sentinel path, silent open-redirect rejections, /readyz warm-up). - Cross-link from docs/CONFIGURATION.md.	2026-05-19 16:59:15 +01:00
lukaszraczylo	dc0e7e0238	fix(oidcgate): gosec G304 — clean config path + native #nosec directive The //nolint:gosec directive only suppresses golangci-lint; the standalone gosec GitHub Action uses its own '#nosec G304 -- reason' syntax. Use both filepath.Clean as canonical mitigation and the native directive.	2026-05-19 16:41:57 +01:00
lukaszraczylo	b2e79d8798	Merge remote-tracking branch 'origin/main' into conflict-resolve # Conflicts: # docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md # middleware.go # settings.go # types.go	2026-05-19 16:41:34 +01:00
lukaszraczylo	52ef32ece7	fix(oidcgate): security hardening — sanitize XFU, guardrails, validations	2026-05-19 15:17:04 +01:00
lukaszraczylo	3bf7c60ef4	chore: gofmt	2026-05-19 15:00:42 +01:00
lukaszraczylo	775ca7afc3	docs(oidcgate): user-facing setup guide and nginx/Caddy/Traefik wiring	2026-05-19 14:25:38 +01:00
lukaszraczylo	a1273e6883	feat(oidcgate): main entrypoint with graceful shutdown	2026-05-19 14:22:46 +01:00
lukaszraczylo	0bc0079a58	refactor(oidcgate): WriteTimeout for slowloris guard, nolint reason	2026-05-19 14:18:28 +01:00
lukaszraczylo	20294f1339	feat(oidcgate): mux wiring and http.Server with graceful shutdown	2026-05-19 14:13:13 +01:00
lukaszraczylo	43938ed8a8	feat(oidcgate): healthz and readyz endpoints	2026-05-19 14:08:53 +01:00
lukaszraczylo	46679c82eb	refactor(oidcgate): simplify cloneAndRewrite, flip ?rd precedence, assert XFU passthrough	2026-05-19 14:07:44 +01:00
lukaszraczylo	a46be72be5	feat(oidcgate): auth/start/callback/logout endpoint handlers	2026-05-19 13:59:20 +01:00
lukaszraczylo	91966c1bec	refactor(oidcgate): idempotent Finalize; document and test 307/308 intercept	2026-05-19 13:57:15 +01:00
lukaszraczylo	c465fc888b	feat(oidcgate): response-writer interceptor converts 302->401 for /oauth2/auth	2026-05-19 13:50:03 +01:00
lukaszraczylo	047fea3c75	refactor(oidcgate): drop unreachable lowercase prefix; add multi-value mirror test	2026-05-19 13:48:13 +01:00
lukaszraczylo	0c092a5a22	feat(oidcgate): synthetic success handler mirrors X-* headers to response	2026-05-19 13:41:51 +01:00
lukaszraczylo	8f458b4f6e	fix(oidcgate): quality fixes — rune-safe snake-upper, drop dead import, listen validation, nested-struct test	2026-05-19 13:40:24 +01:00
lukaszraczylo	17c28fd574	feat(oidcgate): YAML config loader with env-var overrides	2026-05-19 13:30:28 +01:00
lukaszraczylo	21cc2ed747	refactor(lib): match codebase metadataMu lock pattern in Ready()	2026-05-19 13:25:13 +01:00
lukaszraczylo	ded90e5dc1	feat(lib): add (*TraefikOidc).Ready() metadata-discovery readiness accessor	2026-05-19 13:19:20 +01:00
lukaszraczylo	46777d0510	fix(lib): also route X-Auth-Request-Redirect through originalRequestURI helper	2026-05-19 13:14:16 +01:00
lukaszraczylo	f990365cb8	feat(lib): add TrustForwardedURI to honor X-Forwarded-Uri for post-login redirect target	2026-05-19 13:07:35 +01:00
lukaszraczylo	85eb9ecd16	docs: add oidcgate implementation plan	2026-05-19 13:00:56 +01:00
lukaszraczylo	3495e70cbb	docs: add oidcgate Tier 1 forward-auth daemon design	2026-05-19 12:51:41 +01:00
lukaszraczylo	a548665edb	feat: opt-in M2M bearer-token authentication (supersedes #93 ) (#140 ) * docs: bearer-token auth design spec * docs: harden bearer-auth spec with security review findings * feat(bearer): opt-in M2M bearer-token authentication Adds an opt-in Authorization: Bearer <jwt> path for machine-to-machine clients. Replaces and supersedes the broken approach in PR #93 (synthetic-session that omitted user_identifier and skipped ID-token rejection / replay-protection-semantics / kid-pinning / etc.). Design Two auth entrypoints feed one shared post-auth pipeline: cookie path ─┐ ├── forwardAuthorized(rw, req, principal) bearer path ─┘ (roles/groups, header injection, security headers, cookie strip, forward) buildPrincipalFromSession and buildPrincipalFromBearerToken produce the same `principal` value type. forwardAuthorized is session-agnostic and runs the existing post-auth work; processAuthorizedRequest now wraps it with the session-specific concerns (backchannel-logout, dirty/Save). The cookie path's behaviour is byte-identical to before this PR; the existing test suite passes unmodified. Security hardening baked into the bearer path - Audience MANDATORY. Startup fails when EnableBearerAuth=true and Audience is empty. - BearerIdentifierClaim defaults to "sub"; "email" is rejected at startup to avoid the unverified-email spoofing footgun. Cookie path's UserIdentifierClaim is unaffected and still defaults to "email". - ID tokens explicitly rejected via the existing detectTokenType helper (nonce, typ=at+jwt, token_use, scope, aud-vs-clientID heuristics); belt-and-braces nonce/token_use=id rejection on top. - alg pinned to asymmetric allowlist (RS/PS/ES 256/384/512) BEFORE JWKS fetch, blocking alg=none and alg=HS probes from amplifying into upstream calls. - kid length capped at 256 bytes and charset-restricted before JWKS fetch, blocking pathological-kid JWKS amplification. - Multi-audience tokens require azp == clientID. 
- iat upper-age bound (MaxTokenAgeSeconds, default 24h) bounds clock- manipulation and forever-token abuse. - Identifier sanitization: length cap, control-char + bidi-override + delimiter (, ; =) rejection. - Per-IP failure throttle: configurable threshold/window/penalty; returns 429 + Retry-After. Limits offline-guessing-style attacks and protects the shared rate-limiter / JWKS endpoint. - JTI replay marking suppressed via new internal verifyOpts {skipReplayMarking} so the same bearer can be reused until exp; the blacklist Get stays active so RevokeToken still terminates a bearer token immediately. The existing exported VerifyToken interface is unchanged so all mocks continue to work. - Cookie wins by default when both bearer and cookie are present (safer against browser/extension/proxy bearer injection). Operator can flip via BearerOverridesCookie. - Authorization header stripped on forward by default; also stripped on excluded URLs so the token can't leak into health/metrics downstream logs. - Optional RFC 7662 introspection via existing requireTokenIntrospection. Introspection-endpoint failure returns 503 (distinguishes infra from token rejection). - 401s use RFC 6750 WWW-Authenticate hints (toggleable). Failure reason is logged at debug; raw tokens are never logged. Implementation - principal.go: pure-data principal type and buildPrincipalFromSession. - bearer_auth.go: alg/kid pin, classifier, identifier sanitization, multi-aud azp gate, iat age check, per-IP failure tracker, handleBearerRequest, buildPrincipalFromBearerToken. - token_manager.go: VerifyToken now wraps a new verifyTokenWithOpts that accepts internal-only verifyOpts. Existing callers, the TokenVerifier interface, and all mocks unchanged. - middleware.go: extracted forwardAuthorized from processAuthorizedRequest; wired bearer detection after init wait + after bypass; excluded-URL Authorization strip when bearer enabled. - settings.go: ten new config fields with defaults applied in CreateConfig. 
- main.go: startup validation for audience + identifier-claim guard; bearer failure tracker init. Tests - bearer_auth_test.go: table-driven helper tests for every new component (parseBearerJOSEHeader, sanitizeBearerIdentifier, resolveBearerIdentifier, enforceMultiAudienceAzp, enforceIatAge, bearerFailureTracker, detectBearerToken). Integration tests through ServeHTTP covering happy path, ID-token rejection, alg=none rejection, oversized kid, multi-aud with/without azp, iat-too-old, bidi identifier, replay (100x reuse), 429 throttle trip, excluded-URL strip, roles gate, cookie-wins precedence, BearerOverridesCookie, oversized token, malformed JWT, feature-off pass-through. Startup validation for audience- required and email-identifier-rejected. - All existing tests pass unmodified (cookie-path regression). - go vet clean. golangci-lint clean (0 issues). Race detector clean on bearer tests. Documentation - README.md: bearer auth section with security highlights and config snippet; doc link in the index. - .traefik.yml: commented config block exposing every bearer knob. - docs/CONFIGURATION.md: new subsection with full parameter table. - docs/BEARER_AUTH.md: threat model, hardening matrix, failure response table, operational guidance, known follow-ups. - docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md: design spec + security-review hardening history. * fix(cache): redact raw cache keys in debug logs (CodeQL go/clear-text-logging) CodeQL flagged 9 high-severity alerts (go/clear-text-logging) where the in-memory cache and the hybrid L1+L2 backend printed `key=%s` at debug. Cache callers (token cache, blacklist, introspection cache) pass raw access / refresh / id tokens as cache keys, so any debug-enabled deployment would write them to log streams. Pre-existing issue. CodeQL started flagging it on this PR because the new bearer-auth path adds a data-flow source (req.Header.Get("Authorization")) that reaches the existing logging sinks via the same cache. 
The cookie path had the same risk but wasn't tracked as taint by CodeQL. Fix: hash the key (SHA-256[:8] hex) before printing. Same approach the bearer-auth logger uses for principal identifiers (spec §13). Doesn't change cache semantics — same key still produces the same hash, so debug correlation across log lines is preserved without exposing the raw value. Touches both affected packages: - internal/cache/cache.go (2 sites: Set + LRU eviction) - internal/cache/backends/hybrid.go (12 sites: L1/L2 read/write/fallback) New helper `redactKey` colocated with each package (unexported, package-local) keeps the change blast radius narrow. Tests green; lint clean. * docs(bearer): how to obtain bearer tokens from the OIDC provider Adds a section walking operators through the OAuth 2.0 client_credentials flow (RFC 6749 §4.4) and the JWT bearer assertion alternative (RFC 7523), with a worked Auth0-shape curl example, a per-provider quick reference (Auth0, Okta, Keycloak, Entra v2, Cognito, GitLab, Google), operational notes (token TTL, caching, JWKS rotation, revocation, scope vs audience, secret hygiene), and a three-line validation loop. Most common operator confusion: "I enabled the feature but tokens get 401'd" — almost always missing or wrong audience. The new section makes the audience-matching requirement loud, with per-provider parameter names so people don't have to dig through IdP docs. Locations: - docs/BEARER_AUTH.md — full section under "Quick start" - README.md — short snippet + deep link v1.0.11	2026-05-18 17:35:37 +01:00
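The key-redaction fix above ("SHA-256[:8] hex") could look roughly like this. The commit does not say whether the slice is taken before or after hex encoding; this sketch assumes the first 8 bytes of the digest, giving 16 hex characters. It is not the repository's `redactKey`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactKey returns a short, stable fingerprint of a cache key so debug logs
// can correlate entries without printing the raw token.
// Assumption: first 8 bytes of the SHA-256 digest, hex encoded.
func redactKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:8])
}

func main() {
	token := "eyJhbGciOi.example.token" // placeholder, not a real JWT
	fmt.Printf("cache set key=%s\n", redactKey(token))
}
```

The same input always yields the same fingerprint, so log lines for one key can still be matched up.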
lukaszraczylo	fcb21a36e6	docs: harden bearer-auth spec with security review findings	2026-05-18 16:24:52 +01:00
lukaszraczylo	a6c38c0747	docs: bearer-token auth design spec	2026-05-18 15:35:12 +01:00
lukaszraczylo	8c5df82dcf	fix(azure): treat Microsoft proprietary access tokens as opaque (#134 ) (#138 ) Followup to issue #134 — two reporters returned saying that even with the JWKS caching fix in v1.0.7/v1.0.8, every request emitted: ERROR: TraefikOidcPlugin: UNKNOWN token verification failed: signature verification failed: crypto/rsa: verification error ERROR: TraefikOidcPlugin: DIAGNOSTIC: Signature verification failed for kid=<kid>, alg=RS256: crypto/rsa: verification error Root cause: when an Azure tenant is configured without a custom API resource, Microsoft issues access tokens for Microsoft Graph (or Azure Mgmt). These tokens carry a `nonce` value in the JWT header; the bytes that get signed contain SHA256(nonce), while the wire token ships the original nonce. Any standard JWS verifier rejects the signature, which is exactly Microsoft's intent — they document the format as proprietary and tell client apps not to validate it (https://learn.microsoft.com/en-us/entra/identity-platform/access-tokens "you can't validate tokens for Microsoft Graph according to these rules due to their proprietary format"). validateAzureTokens was nonetheless attempting JWT verification on every JWT-shaped access token, then silently falling back to the ID token when verification failed. Auth still worked end-to-end, but every request spammed two error log lines. Two-layer defense: * validateAzureTokens now detects the proprietary-nonce header before calling verifyToken on the access token. When detected, the token is treated as opaque (matching the existing branch for non-JWT tokens) and validation proceeds via the ID token, exactly as Microsoft prescribes. * VerifyJWTSignatureAndClaims downgrades the DIAGNOSTIC error log to debug for tokens carrying the same proprietary marker, in case any path outside validateAzureTokens reaches it. Authorization still hinges on a separately-verifiable ID token — the confused-deputy guard from CWE-441 is preserved (and explicitly tested). 
v1.0.10	2026-05-11 17:31:37 +01:00
lukaszraczylo	aa96e9dbee	Add sponsorship Just in case you appreciate this project, feel generous and want to sponsor my caffeine addiction.	2026-05-10 21:25:26 +01:00
lukaszraczylo	1e33bb0a4d	feat(auth): support private_key_jwt and client_secret_basic (#137 ) revocation endpoints, joining the existing client_secret_post default. Both are opt-in via the new clientAuthMethod config field. Closes #135. private_key_jwt (RFC 7523 §2.2 / OpenID Connect Core §9) ======================================================== Plugin signs a short-lived JWT with a configured private key and presents it as client_assertion. Use when the IdP enforces short secret TTLs or requires secretless client auth (Microsoft Entra ID / Azure AD, Okta, Auth0, Keycloak). New Config fields: clientAuthMethod (default: client_secret_post) clientAssertionPrivateKey (inline PEM) clientAssertionKeyPath (PEM file path; mutually exclusive) clientAssertionKeyID (JWS kid header — required) clientAssertionAlg (default: RS256; RS/PS/ES 256–512 supported) PEM forms accepted: PKCS#8, PKCS#1, SEC1. Assertion claims: iss=sub=clientID, aud=tokenURL, iat=now, exp=now+60s, random 16-byte hex jti per request. ECDSA signatures are raw r\|\|s per RFC 7515 (not ASN.1). client_secret_basic (RFC 6749 §2.3.1) ===================================== Sends credentials in the Authorization: Basic header instead of the body. Both halves are form-urlencoded individually before base64 — that encoding step is required by the spec and is NOT what stdlib's http.Request.SetBasicAuth does, so the plugin uses its own helper. The form body omits client_id and client_secret on this path. Wire-up ======= Both methods are dispatched at the same two call sites: helpers.go:exchangeTokens — auth_code + refresh_token grants token_manager.go:RevokeTokenWithProvider — RFC 7009 revocation Existing clientSecret deployments are unaffected — empty clientAuthMethod maps to the historical client_secret_post behavior, and clientAssertion remains nil unless the new fields are set. 
Yaegi compatibility =================== All required crypto/rsa, crypto/ecdsa, crypto/x509, encoding/pem and crypto/sha256/384/512 symbols are exposed by the traefik/yaegi stdlib symbol tables (RSA SignPKCS1v15 + SignPSS, ECDSA Sign, ParsePKCS8/1PrivateKey, ParseECPrivateKey). Tests (16 new) ============== Algorithm-family coverage: TestIssue135_SignerRSAFamily — RS256/384/512 + PS256/384/512 TestIssue135_SignerECDSAFamily — ES256/384/512, raw r\|\|s shape TestIssue135_SignerRejectsAlgKeyMismatch TestIssue135_SignerJTIUniqueness — 50 sigs, all jti distinct TestIssue135_SignerPEMVariants — PKCS#8, PKCS#1, SEC1 Config validation: TestIssue135_ConfigValidation — full Validate() matrix TestIssue135_ConfigKeyPathLoadsFile Wire-up: TestIssue135_AuthCodeExchangeUsesAssertion TestIssue135_RefreshTokenUsesAssertion TestIssue135_BackcompatClientSecretPath TestIssue135_RevocationUsesAssertion TestIssue135_BuildSignerFromInlineConfig TestIssue135_BuildSignerDefaultsToRS256 TestIssue135_ClientSecretBasicAuth — Authorization header, no body creds TestIssue135_ClientSecretBasicURLEncodesReservedChars — :, +, /, @, =, & TestIssue135_ClientSecretBasicRevocation — revocation parity Documentation ============= README.md — required-row note + 5 optional rows + dedicated section docs/CONFIGURATION.md — new Client Authentication section with three method subsections, OpenSSL keygen snippet, RFC links docs/index.html — 5 new config-table rows + Private Key JWT explainer card .traefik.yml + examples/complete-traefik-config.yaml — commented opt-in example Out of scope (deferred) ======================= mTLS / tls_client_auth (RFC 8705) — separate change; requires per-call http.Client with tls.Config.Certificates and conflicts with the current pooled HTTP client architecture. v1.0.8	2026-05-09 18:02:41 +01:00
lukaszraczylo	bfd702a447	fix(jwk): keep parsed JWKS in local cache only (#134 ) (#136 ) Under yaegi (Traefik's plugin runtime) json.Marshal exposes unexported struct fields with an X-prefixed name. parsedJWKS{ keys map[string] crypto.PublicKey } therefore round-tripped through Redis as {"Xkeys":{"<kid>":{"N":<huge>,"E":65537}}} — rsa.PublicKey.N is a big.Int that marshals to a JSON number hundreds of digits long. On read, json.Unmarshal into interface{} parses numbers as float64, which cannot represent that range: Failed to deserialize value for key .../discovery/v2.0/keys:parsed: json: cannot unmarshal number 2251513... into Go value of type float64 Auth still worked (the JWKCache rebuilt the keys in memory on every miss) but the error log spammed every request. Two structural problems were behind it: * parsedJWKS holds crypto.PublicKey interface values that aren't meaningfully JSON-serializable. Even on compiled Go (where the unexported field marshals to {}), the post-roundtrip type assertion v.(parsedJWKS) silently failed and the cache was useless. The same pattern applied to *JWKSet — the struct shape survived JSON but the type assertion still failed, defeating the cache for every call that went through Redis. Both keys now use the new UniversalCache.SetLocal/GetLocal pair, which skips the configured distributed backend entirely. JWK rotation is rare and a per-replica HTTP fetch on cold cache is cheap, so cross-replica coherence buys nothing for these entries. Stale Redis entries written by previous versions are simply ignored — the new code never reads under those keys, and Redis TTL retires them. Includes regression coverage for the Azure round-trip, the poisoned-stale-data scenario, and the SetLocal/GetLocal isolation contract. patch-release v1.0.7	2026-05-08 13:35:23 +01:00
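The deserialization failure quoted above is easy to reproduce in isolation. The sketch uses a synthetic 617-digit number as a stand-in for an RSA modulus (roughly the length of a 2048-bit value in decimal); `decodeModulus` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeModulus decodes a JSON object whose "N" field is a decimal number of
// the given length into interface{}, the way a generic cache layer might.
func decodeModulus(digits int) error {
	payload := `{"N":` + strings.Repeat("9", digits) + `,"E":65537}`
	var v interface{}
	return json.Unmarshal([]byte(payload), &v)
}

func main() {
	// Numbers decoded into interface{} become float64; a value this large is
	// out of range, so Unmarshal reports an error.
	fmt.Println(decodeModulus(617) != nil)
	fmt.Println(decodeModulus(10) != nil)
}
```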
lukaszraczylo	68c150eba4	fix(cache/redis): honor enableTLS for Redis backend (#133) The redis.enableTLS / redis.tlsSkipVerify settings were accepted by the config layer but silently dropped before reaching the connection pool, so the plugin always dialed Redis in plaintext. This blocked TLS-only Redis deployments such as AWS ElastiCache with in-transit encryption. - Add EnableTLS, TLSSkipVerify, TLSServerName to backends.Config and PoolConfig and forward them through universal_cache_singleton -> backends.Config -> PoolConfig. - In the connection pool, dial via tls.Dialer.DialContext (TLS 1.2 minimum) with SNI defaulting to the host part of the configured Address when TLSServerName is empty, so ElastiCache cluster endpoints validate out of the box. Plain dial path now also propagates ctx. - Add regression tests covering successful TLS negotiation with skip-verify, rejection of self-signed certs without skip-verify, rejection of plain TCP servers when EnableTLS=true, and unaffected plaintext behavior. - Document maxRefreshTokenAgeSeconds (added in `1b6c861`) and the implicit SSE / WebSocket auth bypass (added in `684a990`) in README.md, docs/CONFIGURATION.md and docs/index.html. - Add the missing redis.tlsSkipVerify row to docs/index.html and clarify the redis.enableTLS description. patch-release v1.0.5	2026-05-07 12:24:13 +01:00
lukaszraczylo	9cbca4c4fb	fix(refresh): honor userIdentifierClaim in token refresh path (#132 ) patch-release The refresh path in token_manager.go hardcoded the "email" claim when extracting the user identifier from a refreshed ID token, ignoring the configured userIdentifierClaim. Keycloak users without an email claim (using sub or another identifier) were kicked out on refresh even though their initial login worked. The callback path (auth_flow.go:226-239) already honored userIdentifierClaim with "sub" fallback; PR #100 (commit `a316a98`) added that support but missed the refresh path. Mirror the callback logic in refreshToken so both paths behave the same. Cleanup: rename Get/SetEmail to Get/SetUserIdentifier on SessionData to match the actual semantics. The slot already stored the configured identifier (email, sub, oid, upn, preferred_username), only the API name was misleading. Storage key "email" → "user_identifier" and combinedSessionPayload field E (json:"e") → Ui (json:"ui"). Compat note: existing user sessions invalidate on upgrade — every active user re-authenticates once after deploying this change. v1.0.3	2026-05-07 09:21:41 +01:00
lukaszraczylo	684a990f59	fix: reduce yaegi CPU footprint + require auth on SSE/WebSocket bypass minor-release Behaviour changes (potentially breaking for operators relying on the prior unauthenticated SSE bypass): * SSE (`Accept: text/event-stream`) and WebSocket upgrade requests now return 401 when no authenticated session is present. Previously the bypass forwarded unconditionally, which let any caller reach the backend by setting the right header. Excluded URLs are unchanged. Operators relying on unauthenticated SSE/WS access must move the path into ExcludedURLs. Performance fixes (target: long-running dashboards like Grafana / ArgoCD where many panels poll concurrently while the page stays open): * Stop honouring isTestMode() for the singleton-token-cleanup interval under yaegi (the Traefik plugin runtime). In production the plugin was running a 20 Hz no-op cleanup ticker because runtime.Compiler == "yaegi" tripped the test-mode branch. * processAuthorizedRequest now resolves ID-token claims at most once per request via SessionData.GetIDTokenClaims (already cached on the session) and reuses them for both groups/roles extraction and header-template rendering. Previously every authenticated request parsed the JWT twice. * Added extractGroupsAndRolesFromClaims to drive groups/roles off pre-parsed claims; extractGroupsAndRoles still works for tests. * Removed the unconditional session.MarkDirty() in the header-templates branch. Templates only mutate request headers, not session state, so the prior MarkDirty was re-encrypting and rewriting all session cookies on every authenticated request that used header templates. Other: * Added isWebSocketUpgrade (RFC 6455 handshake detection — Connection: Upgrade + Upgrade: websocket, tolerant of multi-token Connection headers and case). * Renamed applySSEUserHeaders -> applyBypassUserHeaders; it now returns bool so the dispatcher can reject unauthenticated SSE/WS with 401. 
* Added tests for SSE and WS bypass covering both the auth-rejection path and the authenticated forward path. v1.0.1	2026-05-02 03:12:20 +01:00
lukaszraczylo	1b6c8616fd	fix(refresh): coalesce refresh-token grants + bound goroutines + cache hot path (target v0.8.27) (#131 ) * fix(refresh): wire RefreshCoordinator into the live refresh path The RefreshCoordinator existed but was never instantiated. The actual refresh path used only session.refreshMutex, which is per-SessionData instance - and SessionData is pulled from a sync.Pool per request - so concurrent requests sharing a refresh token had ZERO coordination. Symptom: when access_token expired (e.g. 5min Zitadel default), every in-flight request from a polling client (Grafana panels) entered the refresh path simultaneously and POSTed the same refresh_token to the IdP. With refresh-token rotation enabled (Zitadel/Authentik default), only one grant succeeded; the rest got invalid_grant and each cleared the entire session. Subsequent requests then thrashed in re-auth loops. This commit: - adds refreshCoordinator field on TraefikOidc - instantiates it in NewWithContext with DefaultRefreshCoordinatorConfig - shuts it down in Close() under shutdownOnce - routes refreshToken() through the coordinator via coordinatedTokenRefresh, which collapses concurrent grants to a single upstream call per refresh_token hash - exports refreshCoordinatorSessionID for both internal hashing and the middleware-level wireup so dedup keys stay aligned Behavioural notes: - nil-coordinator fallback preserves existing tests that build TraefikOidc literals without going through the constructor - followers receive the same TokenResponse/error as the leader, so no per-instance code paths change - existing TestGetNewTokenWithRefreshToken_Concurrency still passes because it hits GetNewTokenWithRefreshToken directly, below the coordinator boundary Tests: - refresh_coordinator_wireup_test.go: 50 concurrent refreshes coalesce to <=2 upstream calls; distinct tokens still run in parallel; nil coordinator falls back cleanly * perf(cache): bound L1 backfill goroutines in HybridBackend Get() and 
GetMany() previously spawned a goroutine per L2 hit to write the value through to L1. Under sustained polling traffic (e.g. a Grafana dashboard refreshing every 30s with N panels) this minted thousands of goroutines, each running in Yaegi - directly contributing to the ~1000% CPU spike that pairs with the refresh-token herd. Replace the per-hit goroutines with a single l1BackfillWorker fed by l1BackfillBuffer, mirroring the existing asyncWriteBuffer/asyncWriteWorker pattern for L2 writes. Buffer overflow drops the backfill (counted via l1BackfillDrops) - a dropped backfill just means the next L2 hit for that key re-queues it, which is safe. Tests: - TestHybridBackend_L1BackfillBounded: 1000 distinct L2 hits keep goroutine count within +20 of baseline (pre-fix it grew by ~1000) - TestHybridBackend_L1BackfillFullDrops: drops are accounted for when the buffer is saturated and the worker is stopped * feat(refresh): implement isRefreshTokenExpired heuristic Replace the placeholder `return false` with a real check based on the issued_at timestamp that SetRefreshToken already stamps into the session. Gated by a new MaxRefreshTokenAgeSeconds config field (default 21600 = 6h, matching the existing comment). 0 disables the check. This wires the previously-dead refreshTokenExpired branch in middleware.go, which short-circuits AJAX requests with a 401 instead of letting them hammer the IdP for a refresh token that's almost certainly stale - the classic Grafana-after-long-pause failure mode. 
Behaviour: - maxRefreshTokenAge=0 disables the check (preserves prior behaviour) - legacy sessions without issued_at still attempt one refresh; the IdP remains the source of truth on first try - nil-receiver and nil-session guards keep test code that builds TraefikOidc literals safe Tests: - TestIsRefreshTokenExpired_DisabledWhenAgeZero - TestIsRefreshTokenExpired_LegacySessionWithoutTimestamp - TestIsRefreshTokenExpired_WithinWindow - TestIsRefreshTokenExpired_BeyondWindow - TestIsRefreshTokenExpired_NilGuards * perf(token): skip parseJWT on cache hit in VerifyToken The token cache fast-return existed but ran AFTER parseJWT, so every validation paid for base64 + JSON unmarshal even on a hit. Under bursty traffic (e.g. 10+ concurrent panel requests on every Grafana dashboard refresh, each calling validateStandardTokens which verifies BOTH the access token and the ID token), this is two redundant parses per request multiplied by the panel count. Move the cache lookup ahead of parseJWT. On a hit the function returns nil immediately. On a miss the original flow runs unchanged. Also nil-guard t.tokenCache to keep partial-literal test instances safe (matches the same pattern we already use for tokenBlacklist). Tests: - TestVerifyToken_CacheHitSkipsParse: cache pre-populated with claims for a token whose body would fail parseJWT - returns nil iff the fast-path bypasses the parse - TestVerifyToken_CacheMissStillParses: a syntactically valid but unsigned token still errors past parseJWT on cache miss * feat(refresh): cross-replica refresh-grant dedup via shared cache The in-process RefreshCoordinator added in `9f96d8c` already collapses concurrent refresh-token grants on a single Traefik replica. 
With the plugin's existing Redis (Dragonfly) cache infrastructure available, we can extend that dedup across replicas: if pod A refreshes a token at T+0 and pod B receives a request for the same session at T+1, pod B should reuse pod A's result rather than POSTing the now-rotated refresh token to the IdP. Implementation: - Add a refreshResultCache to UniversalCacheManager (memory-only when Redis is disabled, Redis-backed in production via the existing hybrid/Redis-only mode selection) - Expose it through CacheManager.GetSharedRefreshResultCache and on the TraefikOidc struct as refreshResultCache (CacheInterface) - Inside the closure passed to RefreshCoordinator.CoordinateRefresh, consult the cache first; on hit return immediately, on miss exchange with the IdP and populate the cache for peers - 5s TTL: long enough for siblings to observe, short enough that a rotated refresh token cannot be re-supplied after the IdP has moved on - Errors are intentionally NOT cached - peers must always be able to retry on their own Pragmatic choice: optimistic cache rather than a hard distributed lock. - A hard lock (SET NX + poll) doubles Redis RTT and risks dead-locks if a Traefik pod dies mid-grant. - The user's BGP+Local externalTrafficPolicy already pins ingress for a session to one node in steady state, so cross-pod racing is rare. - This optimistic path catches the rare failover case without adding failure modes. Tests: - TestCoordinatedTokenRefresh_CrossReplicaCacheHit: pre-populated cache short-circuits the upstream call entirely (0 IdP calls) - TestCoordinatedTokenRefresh_PopulatesCrossReplicaCache: leader stores a successful result for peers to find - TestCoordinatedTokenRefresh_ErrorIsNotCached: invalid_grant must not poison the dedup cache - peers must retry independently v0.8.27	2026-04-30 18:52:39 +01:00
lukaszraczylo	4d28fa01ab	perf(jwk,cache): cache parsed public keys + RLock token cache reads Hot-path JWT verification rebuilt the public key on every call: jwk -> ToRSAPublicKey -> x509.MarshalPKIXPublicKey -> pem.Encode -> verifySignature -> pem.Decode -> x509.ParsePKIXPublicKey -> verify Under yaegi this pinned a CPU when many concurrent dashboard panels poll behind the middleware. The PEM round trip is pure waste. * jwk.go: cache pre-parsed crypto.PublicKey per kid alongside the raw JWKSet (parallel cache entry, same 1h TTL, invalidates together). * jwt.go: split verifySignatureWithKey from verifySignature; existing PEM-input entry point preserved for backchannel-logout callers. * token_manager.go: VerifyJWTSignatureAndClaims now goes straight from jwks cache to verifySignatureWithKey, no PEM round trip and no per-request availableKids slice. * universal_cache.go: token/JWK/session Get() takes RLock when the entry is unexpired, so concurrent token verifications no longer serialize on a single mutex. LRU semantics for general and metadata caches are unchanged (tests cover the strict-LRU contract there). * mocks: MockJWKCache, EnhancedMockJWKCache, mockJWKCacheForLogout, staticJWKCache satisfy the extended interface. v0.8.26	2026-04-30 10:14:10 +01:00
lukaszraczylo	2d1b04c637	review fixes apr 2026 (#130) * Multiple fixes - refresh coordinator dedup + memory pressure wire - middleware sse consolidation + timer leak + claim cache - universal cache sync backfill + isDebug gate - lazy background task race - memory monitor stw cached + refresh() api * fix(auth): suppress OIDC redirects on non-navigation requests - [x] Add isNonNavigationRequest using Sec-Fetch-Mode and Accept headers - [x] Add comprehensive TestIsNonNavigationRequest - [x] Update ServeHTTP to 401 non-navigation and AJAX requests Fixes #129 * feat(config): add custom CA and insecure skip verify for OIDC TLS - [x] Add CACertPath, CACertPEM, InsecureSkipVerify to Config - [x] Implement loadCACertPool for CA bundle loading - [x] Update HTTPClientConfig with RootCAs and InsecureSkipVerify - [x] Apply CA pool and skip verify to pooled HTTP clients - [x] Enhance configKey to distinguish TLS configs - [x] Add comprehensive ca_cert_test.go Fixes #125 * feat(oidc): add custom CA certificate support for private OIDC providers - [x] Add caCertPath, caCertPEM, insecureSkipVerify config options - [x] Update traefik.yml with new OIDC client config fields - [x] Add configuration schema descriptions for new options - [x] Update README table and add Custom CA Certificates section * Fix the documentation. * test(redis): add oversized argument rejection test - [x] Add TestRedisConn_RejectOversizedArgumentBytes - [x] Import strings package * Dependencies cleanup v0.8.25	2026-04-19 10:12:00 +01:00
lukaszraczylo	ccbb98b9dd	fix-issue-122 (#128) v0.8.24	2026-03-04 00:23:30 +00:00
Serhii Vasyliev	1362cc0dac	Improve debug logging around callback URL matching (#126) * Add debug logging around callback URL matching in ServeHTTP The callback URL comparison at the core of OIDC flow had zero logging, making it extremely difficult to diagnose redirect loop issues caused by misconfigured callbackURL (e.g., full URL vs path-only). Every other path comparison in ServeHTTP already logs debug info (logout, backchannel, frontchannel, excluded URLs), but the callback URL check was completely silent. Added debug logs that show: - The values being compared (request path vs configured callback) - Whether the match succeeded or failed - Configured redirURLPath during initialization This would have immediately revealed the root cause of issue #1 where callbackURL was set as a full URL but compared against req.URL.Path which only contains the path component. Closes #3 * improve-callback-url-logging: Add init-time logging for callbackURL config v0.8.23	2026-02-23 10:36:37 +00:00
Yuval Bar-On	249dcad1b3	fix: prevent deadlock in SessionData.Clear method (#114) Move mutex unlock before calling Save() to prevent potential deadlock when Save() method needs to acquire the same mutex. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com> v0.8.22	2026-02-16 15:02:33 +00:00
lukaszraczylo	de4b4d7258	fix(cache): remove sync.Pool for Yaegi compatibility (#121) - [x] Remove sync.Pool implementation that causes reflection panics - [x] Replace pool-based NewRESPWriter with direct instantiation - [x] Replace pool-based NewRESPReader with direct instantiation - [x] Convert Release() methods to no-ops for API compatibility - [x] Add documentation explaining sync.Pool removal for Yaegi - [x] Remove "sync" import Resolves #120 v0.8.21	2026-01-19 17:52:31 +00:00
lukaszraczylo	9d52f1b018	feat(core): refactor linters config and improve code quality (#119) - [x] Reorganize golangci-lint configuration with documented disable reasons - [x] Simplify errcheck and revive linter rules with targeted exclusions - [x] Pre-compile regex patterns in input_validation.go for performance - [x] Fix type assertions in memory_shard.go and resp.go with safety checks - [x] Replace string comparison with EqualFold for case-insensitive matching - [x] Fix loop variable captures in jwk.go and logout.go - [x] Change high goroutine log level from Info to Debug in autocleanup.go - [x] Replace deprecated "cancelled" spelling with "canceled" throughout - [x] Add nolint annotations for intentional unused parameters - [x] Improve comment formatting for deprecated functions - [x] Fix comment spelling: "marshalling" → "marshaling" - [x] Refactor provider warnings formatting in internal/providers/warnings.go - [x] Simplify metrics summary building in internal/recovery/metrics.go - [x] Pre-allocate slice in error_recovery.go GetDegradedServices - [x] Refactor context cancellation checks in redis.go v0.8.20	2026-01-15 10:40:49 +00:00
lukaszraczylo	57724918fe	fix 116 (#118) * Fix cache serialisation * fix(cache): add integer overflow protection for serialization - [x] Add maxCacheEntrySize constant (64 MiB) to prevent memory overflow - [x] Validate byte slice size before adding marker byte - [x] Validate JSON-serialized data size before marker addition - [x] Add comprehensive overflow protection test cases * docs: add security fix documentation for integer overflow protection * test: fix goroutine tests to use mock OIDC servers The TestContextAwareGoroutineManagement tests were making real HTTP calls to hardcoded URLs like https://example.com, causing failures in CI when those requests timeout or return HTTP errors. Changes: - Added createMockOIDCServer() helper function using httptest - Updated GoroutineCleanupOnContextCancel to use mock server - Updated NoGoroutineLeakOnMultipleInstances to use 3 mock servers - Updated SingletonTasksAcrossInstances to use mock servers array This prevents network calls and makes tests more reliable and faster. Fixes test failures in GitHub Actions CI. v0.8.19	2026-01-08 22:50:46 +00:00
lukaszraczylo	775de2ada1	Fix cache serialisation (#117) * Fix cache serialisation * fix(cache): add integer overflow protection for serialization - [x] Add maxCacheEntrySize constant (64 MiB) to prevent memory overflow - [x] Validate byte slice size before adding marker byte - [x] Validate JSON-serialized data size before marker addition - [x] Add comprehensive overflow protection test cases	2026-01-08 22:06:19 +00:00
lukaszraczylo	7816e05c98	fix issue with logout url (#112) * fix(logout): handle logout requests before OIDC initialization - [x] Add debug logging to logout handler entry point - [x] Move logout path check before OIDC initialization to enable logout when provider unavailable - [x] Move excluded URL and SSE checks before initialization wait - [x] Add debug logging for initialization wait to diagnose hanging requests - [x] Add test for logout functionality without OIDC provider availability * feat(logout): implement OIDC backchannel and front-channel logout - [x] Add logout token validation and backchannel logout handler - [x] Add front-channel logout handler with iframe support - [x] Implement session invalidation cache for distributed deployments - [x] Add comprehensive logout token claim verification (issuer, audience, events, iat, sid/sub) - [x] Integrate session invalidation checks into authorization flow - [x] Add configuration options for enabling backchannel/front-channel logout - [x] Add extensive test coverage for logout flows and edge cases - [x] Update documentation with logout configuration examples - [x] Add middleware routing for logout endpoints - [x] Extend cache manager with session invalidation cache support Resolves #110 * fixup! feat(logout): implement OIDC backchannel and front-channel logout * fixup! Merge branch 'main' into fix-issue-with-logout-url v0.8.17	2026-01-04 01:59:50 +00:00
Dominik Chilla	8bf7998150	Fix for Hashicorp Vault - accept opaque access tokens with dot-characters (#113) v0.8.16	2026-01-02 16:42:22 +00:00
muffn_	22c4323fcb	fix: set X-Forwarded-User header for SSE requests from existing session (#111) Co-authored-by: muffin <MonsterMuffin@users.noreply.github.com> v0.8.15	2026-01-02 02:50:11 +00:00
lukaszraczylo	06b219d1f8	feat(dcr): Add Redis storage support for multi-replica deployments (#109) - [x] Add file and Redis storage backends for DCR credentials - [x] Implement storage abstraction with FileStore and RedisStore - [x] Add factory function for automatic backend selection (auto/file/redis) - [x] Integrate DCR credentials cache into UniversalCacheManager - [x] Add comprehensive tests for storage backends and factory - [x] Update configuration schema with storage backend options - [x] Update documentation with multi-replica deployment guidance - [x] Add Redis key prefix configuration for credential isolation v0.8.14	2025-12-31 12:52:39 +00:00
lukaszraczylo	413e4a1b7d	LRU + cache conflicts prevention. (#104) * LRU + cache conflicts prevention. * Bugfix universalCache flooding (issue #105) 1. Traefik cancels the context for old plugin instances 2. Each plugin's Close() method is called 3. The CacheInterfaceWrapper.Close() was calling cache.Close() on the shared singleton caches 4. Each Close() triggered Clear() which logged "Cleared all items" at INFO level v0.8.13	2025-12-24 18:54:39 +00:00


212 Commits