Commit Graph

212 Commits

Author SHA1 Message Date
lukaszraczylo 2af05701dc build(release): publish multi-arch oidcgate Docker image per release tag
- Add 'oidcgate' build entry (linux/darwin × amd64/arm64) to goreleaser.
- Add per-OS/arch tar.gz archives for the daemon binary.
- Add dockers + docker_manifests entries publishing
  ghcr.io/lukaszraczylo/oidcgate:vX.Y.Z (release tag), :vX.Y, :vX, :latest
  as multi-arch manifests (linux/amd64 + linux/arm64).
- Add cmd/oidcgate/Dockerfile (distroless static, nonroot user).
- Sign images with cosign keyless (docker_signs).
- Preserve existing source-only Traefik plugin archive via meta:true.
- Update README to advertise the published image.
2026-05-19 17:14:29 +01:00
lukaszraczylo 03a755cb53 docs(oidcgate): expand user guide and cross-link
- Add HAProxy and Envoy ext_authz_http wiring snippets.
- Add full OIDCGATE_* env-var inventory (26 fields).
- Add Security Posture section (X-Forwarded-Uri sanitisation, excludedURLs
  guardrail, callbackURL/logoutURL validation).
- Add Bearer-token (M2M) auth composition section with link to BEARER_AUTH.md.
- Add Operational Guidance section (healthz/readyz ACL, Redis for multi-replica,
  no built-in metrics, graceful shutdown deadline).
- Add Debugging section (sentinel path, silent open-redirect rejections,
  /readyz warm-up).
- Cross-link from docs/CONFIGURATION.md.
2026-05-19 16:59:15 +01:00
lukaszraczylo dc0e7e0238 fix(oidcgate): gosec G304 — clean config path + native #nosec directive
The //nolint:gosec directive only suppresses golangci-lint; the standalone
gosec GitHub Action uses its own '#nosec G304 -- reason' syntax. Use both
filepath.Clean as canonical mitigation and the native directive.
2026-05-19 16:41:57 +01:00
lukaszraczylo b2e79d8798 Merge remote-tracking branch 'origin/main' into conflict-resolve
# Conflicts:
#	docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md
#	middleware.go
#	settings.go
#	types.go
2026-05-19 16:41:34 +01:00
lukaszraczylo 52ef32ece7 fix(oidcgate): security hardening — sanitize XFU, guardrails, validations 2026-05-19 15:17:04 +01:00
lukaszraczylo 3bf7c60ef4 chore: gofmt 2026-05-19 15:00:42 +01:00
lukaszraczylo 775ca7afc3 docs(oidcgate): user-facing setup guide and nginx/Caddy/Traefik wiring 2026-05-19 14:25:38 +01:00
lukaszraczylo a1273e6883 feat(oidcgate): main entrypoint with graceful shutdown 2026-05-19 14:22:46 +01:00
lukaszraczylo 0bc0079a58 refactor(oidcgate): WriteTimeout for slowloris guard, nolint reason 2026-05-19 14:18:28 +01:00
lukaszraczylo 20294f1339 feat(oidcgate): mux wiring and http.Server with graceful shutdown 2026-05-19 14:13:13 +01:00
lukaszraczylo 43938ed8a8 feat(oidcgate): healthz and readyz endpoints 2026-05-19 14:08:53 +01:00
lukaszraczylo 46679c82eb refactor(oidcgate): simplify cloneAndRewrite, flip ?rd precedence, assert XFU passthrough 2026-05-19 14:07:44 +01:00
lukaszraczylo a46be72be5 feat(oidcgate): auth/start/callback/logout endpoint handlers 2026-05-19 13:59:20 +01:00
lukaszraczylo 91966c1bec refactor(oidcgate): idempotent Finalize; document and test 307/308 intercept 2026-05-19 13:57:15 +01:00
lukaszraczylo c465fc888b feat(oidcgate): response-writer interceptor converts 302->401 for /oauth2/auth 2026-05-19 13:50:03 +01:00
lukaszraczylo 047fea3c75 refactor(oidcgate): drop unreachable lowercase prefix; add multi-value mirror test 2026-05-19 13:48:13 +01:00
lukaszraczylo 0c092a5a22 feat(oidcgate): synthetic success handler mirrors X-* headers to response 2026-05-19 13:41:51 +01:00
lukaszraczylo 8f458b4f6e fix(oidcgate): quality fixes — rune-safe snake-upper, drop dead import, listen validation, nested-struct test 2026-05-19 13:40:24 +01:00
lukaszraczylo 17c28fd574 feat(oidcgate): YAML config loader with env-var overrides 2026-05-19 13:30:28 +01:00
lukaszraczylo 21cc2ed747 refactor(lib): match codebase metadataMu lock pattern in Ready() 2026-05-19 13:25:13 +01:00
lukaszraczylo ded90e5dc1 feat(lib): add (*TraefikOidc).Ready() metadata-discovery readiness accessor 2026-05-19 13:19:20 +01:00
lukaszraczylo 46777d0510 fix(lib): also route X-Auth-Request-Redirect through originalRequestURI helper 2026-05-19 13:14:16 +01:00
lukaszraczylo f990365cb8 feat(lib): add TrustForwardedURI to honor X-Forwarded-Uri for post-login redirect target 2026-05-19 13:07:35 +01:00
lukaszraczylo 85eb9ecd16 docs: add oidcgate implementation plan 2026-05-19 13:00:56 +01:00
lukaszraczylo 3495e70cbb docs: add oidcgate Tier 1 forward-auth daemon design 2026-05-19 12:51:41 +01:00
lukaszraczylo a548665edb feat: opt-in M2M bearer-token authentication (supersedes #93) (#140)
* docs: bearer-token auth design spec

* docs: harden bearer-auth spec with security review findings

* feat(bearer): opt-in M2M bearer-token authentication

Adds an opt-in Authorization: Bearer <jwt> path for machine-to-machine
clients. Replaces and supersedes the broken approach in PR #93
(synthetic-session that omitted user_identifier and skipped ID-token
rejection / replay-protection-semantics / kid-pinning / etc.).

Design

  Two auth entrypoints feed one shared post-auth pipeline:

    cookie path  ─┐
                  ├── forwardAuthorized(rw, req, *principal)
    bearer path  ─┘    (roles/groups, header injection, security
                        headers, cookie strip, forward)

  buildPrincipalFromSession and buildPrincipalFromBearerToken produce
  the same `principal` value type. forwardAuthorized is session-agnostic
  and runs the existing post-auth work; processAuthorizedRequest now
  wraps it with the session-specific concerns (backchannel-logout,
  dirty/Save). The cookie path's behaviour is byte-identical to before
  this PR; the existing test suite passes unmodified.

Security hardening baked into the bearer path

  - Audience MANDATORY. Startup fails when EnableBearerAuth=true and
    Audience is empty.
  - BearerIdentifierClaim defaults to "sub"; "email" is rejected at
    startup to avoid the unverified-email spoofing footgun. Cookie
    path's UserIdentifierClaim is unaffected and still defaults to
    "email".
  - ID tokens explicitly rejected via the existing detectTokenType
    helper (nonce, typ=at+jwt, token_use, scope, aud-vs-clientID
    heuristics); belt-and-braces nonce/token_use=id rejection on top.
  - alg pinned to asymmetric allowlist (RS/PS/ES 256/384/512) BEFORE
    JWKS fetch, blocking alg=none and alg=HS* probes from amplifying
    into upstream calls.
  - kid length capped at 256 bytes and charset-restricted before JWKS
    fetch, blocking pathological-kid JWKS amplification.
  - Multi-audience tokens require azp == clientID.
  - iat upper-age bound (MaxTokenAgeSeconds, default 24h) bounds clock-
    manipulation and forever-token abuse.
  - Identifier sanitization: length cap, control-char + bidi-override
    + delimiter (, ; =) rejection.
  - Per-IP failure throttle: configurable threshold/window/penalty;
    returns 429 + Retry-After. Limits offline-guessing-style attacks
    and protects the shared rate-limiter / JWKS endpoint.
  - JTI replay marking suppressed via new internal verifyOpts
    {skipReplayMarking} so the same bearer can be reused until exp;
    the blacklist Get stays active so RevokeToken still terminates a
    bearer token immediately. The existing exported VerifyToken
    interface is unchanged so all mocks continue to work.
  - Cookie wins by default when both bearer and cookie are present
    (safer against browser/extension/proxy bearer injection).
    Operator can flip via BearerOverridesCookie.
  - Authorization header stripped on forward by default; also stripped
    on excluded URLs so the token can't leak into health/metrics
    downstream logs.
  - Optional RFC 7662 introspection via existing
    requireTokenIntrospection. Introspection-endpoint failure returns
    503 (distinguishes infra from token rejection).
  - 401s use RFC 6750 WWW-Authenticate hints (toggleable). Failure
    reason is logged at debug; raw tokens are never logged.

Implementation

  - principal.go: pure-data principal type and buildPrincipalFromSession.
  - bearer_auth.go: alg/kid pin, classifier, identifier sanitization,
    multi-aud azp gate, iat age check, per-IP failure tracker,
    handleBearerRequest, buildPrincipalFromBearerToken.
  - token_manager.go: VerifyToken now wraps a new verifyTokenWithOpts
    that accepts internal-only verifyOpts. Existing callers, the
    TokenVerifier interface, and all mocks unchanged.
  - middleware.go: extracted forwardAuthorized from
    processAuthorizedRequest; wired bearer detection after init wait
    + after bypass; excluded-URL Authorization strip when bearer
    enabled.
  - settings.go: ten new config fields with defaults applied in
    CreateConfig.
  - main.go: startup validation for audience + identifier-claim
    guard; bearer failure tracker init.

Tests

  - bearer_auth_test.go: table-driven helper tests for every new
    component (parseBearerJOSEHeader, sanitizeBearerIdentifier,
    resolveBearerIdentifier, enforceMultiAudienceAzp, enforceIatAge,
    bearerFailureTracker, detectBearerToken). Integration tests
    through ServeHTTP covering happy path, ID-token rejection,
    alg=none rejection, oversized kid, multi-aud with/without azp,
    iat-too-old, bidi identifier, replay (100x reuse), 429 throttle
    trip, excluded-URL strip, roles gate, cookie-wins precedence,
    BearerOverridesCookie, oversized token, malformed JWT,
    feature-off pass-through. Startup validation for audience-
    required and email-identifier-rejected.
  - All existing tests pass unmodified (cookie-path regression).
  - go vet clean. golangci-lint clean (0 issues). Race detector
    clean on bearer tests.

Documentation

  - README.md: bearer auth section with security highlights and
    config snippet; doc link in the index.
  - .traefik.yml: commented config block exposing every bearer knob.
  - docs/CONFIGURATION.md: new subsection with full parameter table.
  - docs/BEARER_AUTH.md: threat model, hardening matrix, failure
    response table, operational guidance, known follow-ups.
  - docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md:
    design spec + security-review hardening history.

* fix(cache): redact raw cache keys in debug logs (CodeQL go/clear-text-logging)

CodeQL flagged 9 high-severity alerts (go/clear-text-logging) where the
in-memory cache and the hybrid L1+L2 backend printed `key=%s` at debug.
Cache callers (token cache, blacklist, introspection cache) pass raw
access / refresh / id tokens as cache keys, so any debug-enabled
deployment would write them to log streams.

Pre-existing issue. CodeQL started flagging it on this PR because the
new bearer-auth path adds a data-flow source (req.Header.Get("Authorization"))
that reaches the existing logging sinks via the same cache. The cookie
path had the same risk but wasn't tracked as taint by CodeQL.

Fix: hash the key (SHA-256[:8] hex) before printing. Same approach the
bearer-auth logger uses for principal identifiers (spec §13). Doesn't
change cache semantics — same key still produces the same hash, so
debug correlation across log lines is preserved without exposing the
raw value.

Touches both affected packages:
  - internal/cache/cache.go (2 sites: Set + LRU eviction)
  - internal/cache/backends/hybrid.go (12 sites: L1/L2 read/write/fallback)

New helper `redactKey` colocated with each package (unexported,
package-local) keeps the change blast radius narrow. Tests green; lint
clean.

* docs(bearer): how to obtain bearer tokens from the OIDC provider

Adds a section walking operators through the OAuth 2.0 client_credentials
flow (RFC 6749 §4.4) and the JWT bearer assertion alternative (RFC 7523),
with a worked Auth0-shape curl example, a per-provider quick reference
(Auth0, Okta, Keycloak, Entra v2, Cognito, GitLab, Google), operational
notes (token TTL, caching, JWKS rotation, revocation, scope vs audience,
secret hygiene), and a three-line validation loop.

Most common operator confusion: "I enabled the feature but tokens get
401'd" — almost always missing or wrong audience. The new section makes
the audience-matching requirement loud, with per-provider parameter
names so people don't have to dig through IdP docs.

Locations:
  - docs/BEARER_AUTH.md  — full section under "Quick start"
  - README.md            — short snippet + deep link
v1.0.11
2026-05-18 17:35:37 +01:00
lukaszraczylo fcb21a36e6 docs: harden bearer-auth spec with security review findings 2026-05-18 16:24:52 +01:00
lukaszraczylo a6c38c0747 docs: bearer-token auth design spec 2026-05-18 15:35:12 +01:00
lukaszraczylo 8c5df82dcf fix(azure): treat Microsoft proprietary access tokens as opaque (#134) (#138)
Followup to issue #134 — two reporters returned saying that even with the
JWKS caching fix in v1.0.7/v1.0.8, every request emitted:

  ERROR: TraefikOidcPlugin: UNKNOWN token verification failed:
    signature verification failed: crypto/rsa: verification error
  ERROR: TraefikOidcPlugin: DIAGNOSTIC: Signature verification failed for
    kid=<kid>, alg=RS256: crypto/rsa: verification error

Root cause: when an Azure tenant is configured without a custom API
resource, Microsoft issues access tokens for Microsoft Graph (or Azure
Mgmt). These tokens carry a `nonce` value in the JWT *header*; the bytes
that get signed contain SHA256(nonce), while the wire token ships the
original nonce. Any standard JWS verifier rejects the signature, which is
exactly Microsoft's intent — they document the format as proprietary and
tell client apps not to validate it
(https://learn.microsoft.com/en-us/entra/identity-platform/access-tokens
"you can't validate tokens for Microsoft Graph according to these rules
due to their proprietary format").

validateAzureTokens was nonetheless attempting JWT verification on every
JWT-shaped access token, then silently falling back to the ID token when
verification failed. Auth still worked end-to-end, but every request
spammed two error log lines.

Two-layer defense:

* validateAzureTokens now detects the proprietary-nonce header before
  calling verifyToken on the access token. When detected, the token is
  treated as opaque (matching the existing branch for non-JWT tokens) and
  validation proceeds via the ID token, exactly as Microsoft prescribes.

* VerifyJWTSignatureAndClaims downgrades the DIAGNOSTIC error log to
  debug for tokens carrying the same proprietary marker, in case any
  path outside validateAzureTokens reaches it.

Authorization still hinges on a separately-verifiable ID token — the
confused-deputy guard from CWE-441 is preserved (and explicitly tested).
v1.0.10
2026-05-11 17:31:37 +01:00
lukaszraczylo aa96e9dbee Add sponsorship
Just in case you appreciate this project, feel generous and want to sponsor my caffeine addiction.
2026-05-10 21:25:26 +01:00
lukaszraczylo 1e33bb0a4d feat(auth): support private_key_jwt and client_secret_basic (#137)
revocation endpoints, joining the existing client_secret_post default.
Both are opt-in via the new clientAuthMethod config field. Closes #135.

private_key_jwt (RFC 7523 §2.2 / OpenID Connect Core §9)
========================================================
Plugin signs a short-lived JWT with a configured private key and presents
it as client_assertion. Use when the IdP enforces short secret TTLs or
requires secretless client auth (Microsoft Entra ID / Azure AD, Okta,
Auth0, Keycloak).

New Config fields:
  clientAuthMethod          (default: client_secret_post)
  clientAssertionPrivateKey (inline PEM)
  clientAssertionKeyPath    (PEM file path; mutually exclusive)
  clientAssertionKeyID      (JWS kid header — required)
  clientAssertionAlg        (default: RS256; RS/PS/ES 256–512 supported)

PEM forms accepted: PKCS#8, PKCS#1, SEC1.
Assertion claims: iss=sub=clientID, aud=tokenURL, iat=now, exp=now+60s,
random 16-byte hex jti per request. ECDSA signatures are raw r||s per
RFC 7515 (not ASN.1).

client_secret_basic (RFC 6749 §2.3.1)
=====================================
Sends credentials in the Authorization: Basic header instead of the
body. Both halves are form-urlencoded individually before base64 — that
encoding step is required by the spec and is NOT what stdlib's
http.Request.SetBasicAuth does, so the plugin uses its own helper. The
form body omits client_id and client_secret on this path.

Wire-up
=======
Both methods are dispatched at the same two call sites:
  helpers.go:exchangeTokens — auth_code + refresh_token grants
  token_manager.go:RevokeTokenWithProvider — RFC 7009 revocation

Existing clientSecret deployments are unaffected — empty
clientAuthMethod maps to the historical client_secret_post behavior, and
clientAssertion remains nil unless the new fields are set.

Yaegi compatibility
===================
All required crypto/rsa, crypto/ecdsa, crypto/x509, encoding/pem and
crypto/sha256/384/512 symbols are exposed by the traefik/yaegi stdlib
symbol tables (RSA SignPKCS1v15 + SignPSS, ECDSA Sign,
ParsePKCS8/1PrivateKey, ParseECPrivateKey).

Tests (16 new)
==============
Algorithm-family coverage:
  TestIssue135_SignerRSAFamily — RS256/384/512 + PS256/384/512
  TestIssue135_SignerECDSAFamily — ES256/384/512, raw r||s shape
  TestIssue135_SignerRejectsAlgKeyMismatch
  TestIssue135_SignerJTIUniqueness — 50 sigs, all jti distinct
  TestIssue135_SignerPEMVariants — PKCS#8, PKCS#1, SEC1

Config validation:
  TestIssue135_ConfigValidation — full Validate() matrix
  TestIssue135_ConfigKeyPathLoadsFile

Wire-up:
  TestIssue135_AuthCodeExchangeUsesAssertion
  TestIssue135_RefreshTokenUsesAssertion
  TestIssue135_BackcompatClientSecretPath
  TestIssue135_RevocationUsesAssertion
  TestIssue135_BuildSignerFromInlineConfig
  TestIssue135_BuildSignerDefaultsToRS256
  TestIssue135_ClientSecretBasicAuth — Authorization header, no body creds
  TestIssue135_ClientSecretBasicURLEncodesReservedChars — :, +, /, @, =, &
  TestIssue135_ClientSecretBasicRevocation — revocation parity

Documentation
=============
  README.md — required-row note + 5 optional rows + dedicated section
  docs/CONFIGURATION.md — new Client Authentication section with three
    method subsections, OpenSSL keygen snippet, RFC links
  docs/index.html — 5 new config-table rows + Private Key JWT
    explainer card
  .traefik.yml + examples/complete-traefik-config.yaml — commented
    opt-in example

Out of scope (deferred)
=======================
mTLS / tls_client_auth (RFC 8705) — separate change; requires per-call
http.Client with tls.Config.Certificates and conflicts with the current
pooled HTTP client architecture.
v1.0.8
2026-05-09 18:02:41 +01:00
lukaszraczylo bfd702a447 fix(jwk): keep parsed JWKS in local cache only (#134) (#136)
Under yaegi (Traefik's plugin runtime) json.Marshal exposes unexported
struct fields with an X-prefixed name. parsedJWKS{ keys map[string]
crypto.PublicKey } therefore round-tripped through Redis as
{"Xkeys":{"<kid>":{"N":<huge>,"E":65537}}} — *rsa.PublicKey.N is a
*big.Int that marshals to a JSON number hundreds of digits long. On
read, json.Unmarshal into interface{} parses numbers as float64, which
cannot represent that range:

  Failed to deserialize value for key .../discovery/v2.0/keys:parsed:
  json: cannot unmarshal number 2251513...
    into Go value of type float64

Auth still worked (the JWKCache rebuilt the keys in memory on every
miss) but the error log spammed every request.

Two structural problems were behind it:

* parsedJWKS holds crypto.PublicKey interface values that aren't
  meaningfully JSON-serializable. Even on compiled Go (where the
  unexported field marshals to {}), the post-roundtrip type assertion
  v.(*parsedJWKS) silently failed and the cache was useless.
* The same pattern applied to *JWKSet — the struct shape survived JSON
  but the type assertion still failed, defeating the cache for every
  call that went through Redis.

Both keys now use the new UniversalCache.SetLocal/GetLocal pair, which
skips the configured distributed backend entirely. JWK rotation is rare
and a per-replica HTTP fetch on cold cache is cheap, so cross-replica
coherence buys nothing for these entries.

Stale Redis entries written by previous versions are simply ignored —
the new code never reads under those keys, and Redis TTL retires them.

Includes regression coverage for the Azure round-trip, the
poisoned-stale-data scenario, and the SetLocal/GetLocal isolation
contract.

patch-release
v1.0.7
2026-05-08 13:35:23 +01:00
lukaszraczylo 68c150eba4 fix(cache/redis): honor enableTLS for Redis backend (#133)
The redis.enableTLS / redis.tlsSkipVerify settings were accepted by the
config layer but silently dropped before reaching the connection pool, so
the plugin always dialed Redis in plaintext. This blocked TLS-only Redis
deployments such as AWS ElastiCache with in-transit encryption.

- Add EnableTLS, TLSSkipVerify, TLSServerName to backends.Config and
  PoolConfig and forward them through universal_cache_singleton ->
  backends.Config -> PoolConfig.
- In the connection pool, dial via tls.Dialer.DialContext (TLS 1.2
  minimum) with SNI defaulting to the host part of the configured
  Address when TLSServerName is empty, so ElastiCache cluster endpoints
  validate out of the box. Plain dial path now also propagates ctx.
- Add regression tests covering successful TLS negotiation with skip-
  verify, rejection of self-signed certs without skip-verify, rejection
  of plain TCP servers when EnableTLS=true, and unaffected plaintext
  behavior.
- Document maxRefreshTokenAgeSeconds (added in 1b6c861) and the implicit
  SSE / WebSocket auth bypass (added in 684a990) in README.md,
  docs/CONFIGURATION.md and docs/index.html.
- Add the missing redis.tlsSkipVerify row to docs/index.html and clarify
  the redis.enableTLS description.

patch-release
v1.0.5
2026-05-07 12:24:13 +01:00
lukaszraczylo 9cbca4c4fb fix(refresh): honor userIdentifierClaim in token refresh path (#132)
patch-release

The refresh path in token_manager.go hardcoded the "email" claim when
extracting the user identifier from a refreshed ID token, ignoring the
configured userIdentifierClaim. Keycloak users without an email claim
(using sub or another identifier) were kicked out on refresh even
though their initial login worked.

The callback path (auth_flow.go:226-239) already honored
userIdentifierClaim with "sub" fallback; PR #100 (commit a316a98)
added that support but missed the refresh path.

Mirror the callback logic in refreshToken so both paths behave the same.

Cleanup: rename Get/SetEmail to Get/SetUserIdentifier on SessionData
to match the actual semantics. The slot already stored the configured
identifier (email, sub, oid, upn, preferred_username), only the API
name was misleading. Storage key "email" → "user_identifier" and
combinedSessionPayload field E (json:"e") → Ui (json:"ui").

Compat note: existing user sessions invalidate on upgrade — every active
user re-authenticates once after deploying this change.
v1.0.3
2026-05-07 09:21:41 +01:00
lukaszraczylo 684a990f59 fix: reduce yaegi CPU footprint + require auth on SSE/WebSocket bypass
minor-release

Behaviour changes (potentially breaking for operators relying on the prior
unauthenticated SSE bypass):

* SSE (`Accept: text/event-stream`) and WebSocket upgrade requests now
  return 401 when no authenticated session is present. Previously the
  bypass forwarded unconditionally, which let any caller reach the
  backend by setting the right header. Excluded URLs are unchanged.
  Operators relying on unauthenticated SSE/WS access must move the path
  into ExcludedURLs.

Performance fixes (target: long-running dashboards like Grafana / ArgoCD
where many panels poll concurrently while the page stays open):

* Stop honouring isTestMode() for the singleton-token-cleanup interval
  under yaegi (the Traefik plugin runtime). In production the plugin was
  running a 20 Hz no-op cleanup ticker because runtime.Compiler ==
  "yaegi" tripped the test-mode branch.
* processAuthorizedRequest now resolves ID-token claims at most once per
  request via SessionData.GetIDTokenClaims (already cached on the
  session) and reuses them for both groups/roles extraction and
  header-template rendering. Previously every authenticated request
  parsed the JWT twice.
* Added extractGroupsAndRolesFromClaims to drive groups/roles off
  pre-parsed claims; extractGroupsAndRoles still works for tests.
* Removed the unconditional session.MarkDirty() in the header-templates
  branch. Templates only mutate request headers, not session state, so
  the prior MarkDirty was re-encrypting and rewriting all session
  cookies on every authenticated request that used header templates.

Other:

* Added isWebSocketUpgrade (RFC 6455 handshake detection — Connection:
  Upgrade + Upgrade: websocket, tolerant of multi-token Connection
  headers and case).
* Renamed applySSEUserHeaders -> applyBypassUserHeaders; it now returns
  bool so the dispatcher can reject unauthenticated SSE/WS with 401.
* Added tests for SSE and WS bypass covering both the auth-rejection
  path and the authenticated forward path.
v1.0.1
2026-05-02 03:12:20 +01:00
lukaszraczylo 1b6c8616fd fix(refresh): coalesce refresh-token grants + bound goroutines + cache hot path (target v0.8.27) (#131)
* fix(refresh): wire RefreshCoordinator into the live refresh path

The RefreshCoordinator existed but was never instantiated. The actual
refresh path used only session.refreshMutex, which is per-SessionData
instance - and SessionData is pulled from a sync.Pool per request -
so concurrent requests sharing a refresh token had ZERO coordination.

Symptom: when access_token expired (e.g. 5min Zitadel default), every
in-flight request from a polling client (Grafana panels) entered the
refresh path simultaneously and POSTed the same refresh_token to the
IdP. With refresh-token rotation enabled (Zitadel/Authentik default),
only one grant succeeded; the rest got invalid_grant and each cleared
the entire session. Subsequent requests then thrashed in re-auth loops.

This commit:
- adds refreshCoordinator field on TraefikOidc
- instantiates it in NewWithContext with DefaultRefreshCoordinatorConfig
- shuts it down in Close() under shutdownOnce
- routes refreshToken() through the coordinator via coordinatedTokenRefresh,
  which collapses concurrent grants to a single upstream call per
  refresh_token hash
- exports refreshCoordinatorSessionID for both internal hashing and the
  middleware-level wireup so dedup keys stay aligned

Behavioural notes:
- nil-coordinator fallback preserves existing tests that build TraefikOidc
  literals without going through the constructor
- followers receive the same TokenResponse/error as the leader, so no
  per-instance code paths change
- existing TestGetNewTokenWithRefreshToken_Concurrency still passes
  because it hits GetNewTokenWithRefreshToken directly, below the
  coordinator boundary

Tests:
- refresh_coordinator_wireup_test.go: 50 concurrent refreshes coalesce
  to <=2 upstream calls; distinct tokens still run in parallel; nil
  coordinator falls back cleanly

* perf(cache): bound L1 backfill goroutines in HybridBackend

Get() and GetMany() previously spawned a goroutine per L2 hit to write
the value through to L1. Under sustained polling traffic (e.g. a Grafana
dashboard refreshing every 30s with N panels) this minted thousands of
goroutines, each running in Yaegi - directly contributing to the
~1000% CPU spike that pairs with the refresh-token herd.

Replace the per-hit goroutines with a single l1BackfillWorker fed by
l1BackfillBuffer, mirroring the existing asyncWriteBuffer/asyncWriteWorker
pattern for L2 writes. Buffer overflow drops the backfill (counted via
l1BackfillDrops) - a dropped backfill just means the next L2 hit for
that key re-queues it, which is safe.

Tests:
- TestHybridBackend_L1BackfillBounded: 1000 distinct L2 hits keep
  goroutine count within +20 of baseline (pre-fix it grew by ~1000)
- TestHybridBackend_L1BackfillFullDrops: drops are accounted for when
  the buffer is saturated and the worker is stopped

* feat(refresh): implement isRefreshTokenExpired heuristic

Replace the placeholder `return false` with a real check based on the
issued_at timestamp that SetRefreshToken already stamps into the session.
Gated by a new MaxRefreshTokenAgeSeconds config field (default 21600 =
6h, matching the existing comment). 0 disables the check.

This wires the previously-dead refreshTokenExpired branch in middleware.go,
which short-circuits AJAX requests with a 401 instead of letting them
hammer the IdP for a refresh token that's almost certainly stale - the
classic Grafana-after-long-pause failure mode.

Behaviour:
- maxRefreshTokenAge=0 disables the check (preserves prior behaviour)
- legacy sessions without issued_at still attempt one refresh; the IdP
  remains the source of truth on first try
- nil-receiver and nil-session guards keep test code that builds
  TraefikOidc literals safe

Tests:
- TestIsRefreshTokenExpired_DisabledWhenAgeZero
- TestIsRefreshTokenExpired_LegacySessionWithoutTimestamp
- TestIsRefreshTokenExpired_WithinWindow
- TestIsRefreshTokenExpired_BeyondWindow
- TestIsRefreshTokenExpired_NilGuards

* perf(token): skip parseJWT on cache hit in VerifyToken

The token cache fast-return existed but ran AFTER parseJWT, so every
validation paid for base64 + JSON unmarshal even on a hit. Under bursty
traffic (e.g. 10+ concurrent panel requests on every Grafana dashboard
refresh, each calling validateStandardTokens which verifies BOTH the
access token and the ID token), this is two redundant parses per
request multiplied by the panel count.

Move the cache lookup ahead of parseJWT. On a hit the function returns
nil immediately. On a miss the original flow runs unchanged.

Also nil-guard t.tokenCache to keep partial-literal test instances safe
(matches the same pattern we already use for tokenBlacklist).

Tests:
- TestVerifyToken_CacheHitSkipsParse: cache pre-populated with claims
  for a token whose body would fail parseJWT - returns nil iff the
  fast-path bypasses the parse
- TestVerifyToken_CacheMissStillParses: a syntactically valid but
  unsigned token still errors past parseJWT on cache miss

* feat(refresh): cross-replica refresh-grant dedup via shared cache

The in-process RefreshCoordinator added in 9f96d8c already collapses
concurrent refresh-token grants on a single Traefik replica. With the
plugin's existing Redis (Dragonfly) cache infrastructure available, we
can extend that dedup across replicas: if pod A refreshes a token at
T+0 and pod B receives a request for the same session at T+1, pod B
should reuse pod A's result rather than POSTing the now-rotated refresh
token to the IdP.

Implementation:
- Add a refreshResultCache to UniversalCacheManager (memory-only when
  Redis is disabled, Redis-backed in production via the existing
  hybrid/Redis-only mode selection)
- Expose it through CacheManager.GetSharedRefreshResultCache and on the
  TraefikOidc struct as refreshResultCache (CacheInterface)
- Inside the closure passed to RefreshCoordinator.CoordinateRefresh,
  consult the cache first; on hit return immediately, on miss exchange
  with the IdP and populate the cache for peers
- 5s TTL: long enough for siblings to observe, short enough that a
  rotated refresh token cannot be re-supplied after the IdP has moved on
- Errors are intentionally NOT cached - peers must always be able to
  retry on their own

Pragmatic choice: optimistic cache rather than a hard distributed lock.
- A hard lock (SET NX + poll) doubles Redis RTT and risks dead-locks
  if a Traefik pod dies mid-grant.
- The user's BGP+Local externalTrafficPolicy already pins ingress for
  a session to one node in steady state, so cross-pod racing is rare.
- This optimistic path catches the rare failover case without adding
  failure modes.

Tests:
- TestCoordinatedTokenRefresh_CrossReplicaCacheHit: pre-populated cache
  short-circuits the upstream call entirely (0 IdP calls)
- TestCoordinatedTokenRefresh_PopulatesCrossReplicaCache: leader stores
  a successful result for peers to find
- TestCoordinatedTokenRefresh_ErrorIsNotCached: invalid_grant must not
  poison the dedup cache - peers must retry independently
v0.8.27
2026-04-30 18:52:39 +01:00
lukaszraczylo 4d28fa01ab perf(jwk,cache): cache parsed public keys + RLock token cache reads
Hot-path JWT verification rebuilt the public key on every call:
  jwk -> ToRSAPublicKey -> x509.MarshalPKIXPublicKey -> pem.Encode
  -> verifySignature -> pem.Decode -> x509.ParsePKIXPublicKey -> verify
Under yaegi this pinned a CPU when many concurrent dashboard panels
poll behind the middleware. The PEM round trip is pure waste.

* jwk.go: cache pre-parsed crypto.PublicKey per kid alongside the
  raw JWKSet (parallel cache entry, same 1h TTL, invalidates together).
* jwt.go: split verifySignatureWithKey from verifySignature; existing
  PEM-input entry point preserved for backchannel-logout callers.
* token_manager.go: VerifyJWTSignatureAndClaims now goes straight from
  jwks cache to verifySignatureWithKey, no PEM round trip and no
  per-request availableKids slice.
* universal_cache.go: token/JWK/session Get() takes RLock when the
  entry is unexpired, so concurrent token verifications no longer
  serialize on a single mutex. LRU semantics for general and metadata
  caches are unchanged (tests cover the strict-LRU contract there).
* mocks: MockJWKCache, EnhancedMockJWKCache, mockJWKCacheForLogout,
  staticJWKCache satisfy the extended interface.
v0.8.26
2026-04-30 10:14:10 +01:00
lukaszraczylo 2d1b04c637 review fixes apr 2026 (#130)
* Multiple fixes

- refresh coordinator dedup + memory pressure wire
- middleware sse consolidation + timer leak + claim cache
- universal cache sync backfill + isDebug gate
- lazy background task race
- memory monitor stw cached + refresh() api

* fix(auth): suppress OIDC redirects on non-navigation requests

- [x] Add isNonNavigationRequest using Sec-Fetch-Mode and Accept headers
- [x] Add comprehensive TestIsNonNavigationRequest
- [x] Update ServeHTTP to 401 non-navigation and AJAX requests

Fixes #129

* feat(config): add custom CA and insecure skip verify for OIDC TLS

- [x] Add CACertPath, CACertPEM, InsecureSkipVerify to Config
- [x] Implement loadCACertPool for CA bundle loading
- [x] Update HTTPClientConfig with RootCAs and InsecureSkipVerify
- [x] Apply CA pool and skip verify to pooled HTTP clients
- [x] Enhance configKey to distinguish TLS configs
- [x] Add comprehensive ca_cert_test.go

Fixes #125

* feat(oidc): add custom CA certificate support for private OIDC providers

- [x] Add caCertPath, caCertPEM, insecureSkipVerify config options
- [x] Update traefik.yml with new OIDC client config fields
- [x] Add configuration schema descriptions for new options
- [x] Update README table and add Custom CA Certificates section

* Fix the documentation.

* test(redis): add oversized argument rejection test

- [x] Add TestRedisConn_RejectOversizedArgumentBytes
- [x] Import strings package

* Dependencies cleanup
v0.8.25
2026-04-19 10:12:00 +01:00
lukaszraczylo ccbb98b9dd fix-issue-122 (#128) v0.8.24 2026-03-04 00:23:30 +00:00
Serhii Vasyliev 1362cc0dac Improve debug logging around callback URL matching (#126)
* Add debug logging around callback URL matching in ServeHTTP

The callback URL comparison at the core of OIDC flow had zero logging,
making it extremely difficult to diagnose redirect loop issues caused
by misconfigured callbackURL (e.g., full URL vs path-only).

Every other path comparison in ServeHTTP already logs debug info
(logout, backchannel, frontchannel, excluded URLs), but the callback
URL check was completely silent.

Added debug logs that show:
- The values being compared (request path vs configured callback)
- Whether the match succeeded or failed
- Configured redirURLPath during initialization

This would have immediately revealed the root cause of issue #1
where callbackURL was set as a full URL but compared against
req.URL.Path which only contains the path component.

Closes #3

* improve-callback-url-logging: Add init-time logging for callbackURL config
v0.8.23
2026-02-23 10:36:37 +00:00
Yuval Bar-On 249dcad1b3 fix: prevent deadlock in SessionData.Clear method (#114)
Move mutex unlock before calling Save() to prevent potential deadlock
when Save() method needs to acquire the same mutex.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
v0.8.22
2026-02-16 15:02:33 +00:00
lukaszraczylo de4b4d7258 fix(cache): remove sync.Pool for Yaegi compatibility (#121)
- [x] Remove sync.Pool implementation that causes reflection panics
- [x] Replace pool-based NewRESPWriter with direct instantiation
- [x] Replace pool-based NewRESPReader with direct instantiation
- [x] Convert Release() methods to no-ops for API compatibility
- [x] Add documentation explaining sync.Pool removal for Yaegi
- [x] Remove "sync" import

Resolves #120
v0.8.21
2026-01-19 17:52:31 +00:00
lukaszraczylo 9d52f1b018 feat(core): refactor linters config and improve code quality (#119)
- [x] Reorganize golangci-lint configuration with documented disable reasons
- [x] Simplify errcheck and revive linter rules with targeted exclusions
- [x] Pre-compile regex patterns in input_validation.go for performance
- [x] Fix type assertions in memory_shard.go and resp.go with safety checks
- [x] Replace string comparison with EqualFold for case-insensitive matching
- [x] Fix loop variable captures in jwk.go and logout.go
- [x] Change high goroutine log level from Info to Debug in autocleanup.go
- [x] Replace deprecated "cancelled" spelling with "canceled" throughout
- [x] Add nolint annotations for intentional unused parameters
- [x] Improve comment formatting for deprecated functions
- [x] Fix comment spelling: "marshalling" → "marshaling"
- [x] Refactor provider warnings formatting in internal/providers/warnings.go
- [x] Simplify metrics summary building in internal/recovery/metrics.go
- [x] Pre-allocate slice in error_recovery.go GetDegradedServices
- [x] Refactor context cancellation checks in redis.go
v0.8.20
2026-01-15 10:40:49 +00:00
lukaszraczylo 57724918fe fix 116 (#118)
* Fix cache serialisation

* fix(cache): add integer overflow protection for serialization

- [x] Add maxCacheEntrySize constant (64 MiB) to prevent memory overflow
- [x] Validate byte slice size before adding marker byte
- [x] Validate JSON-serialized data size before marker addition
- [x] Add comprehensive overflow protection test cases

* docs: add security fix documentation for integer overflow protection

* test: fix goroutine tests to use mock OIDC servers

The TestContextAwareGoroutineManagement tests were making real HTTP
calls to hardcoded URLs like https://example.com, causing failures
in CI when those requests timeout or return HTTP errors.

Changes:
- Added createMockOIDCServer() helper function using httptest
- Updated GoroutineCleanupOnContextCancel to use mock server
- Updated NoGoroutineLeakOnMultipleInstances to use 3 mock servers
- Updated SingletonTasksAcrossInstances to use mock servers array

This prevents network calls and makes tests more reliable and faster.

Fixes test failures in GitHub Actions CI.
v0.8.19
2026-01-08 22:50:46 +00:00
lukaszraczylo 775de2ada1 Fix cache serialisation (#117)
* Fix cache serialisation

* fix(cache): add integer overflow protection for serialization

- [x] Add maxCacheEntrySize constant (64 MiB) to prevent memory overflow
- [x] Validate byte slice size before adding marker byte
- [x] Validate JSON-serialized data size before marker addition
- [x] Add comprehensive overflow protection test cases
2026-01-08 22:06:19 +00:00
lukaszraczylo 7816e05c98 fix issue with logout url (#112)
* fix(logout): handle logout requests before OIDC initialization

- [x] Add debug logging to logout handler entry point
- [x] Move logout path check before OIDC initialization to enable logout when provider unavailable
- [x] Move excluded URL and SSE checks before initialization wait
- [x] Add debug logging for initialization wait to diagnose hanging requests
- [x] Add test for logout functionality without OIDC provider availability

* feat(logout): implement OIDC backchannel and front-channel logout

- [x] Add logout token validation and backchannel logout handler
- [x] Add front-channel logout handler with iframe support
- [x] Implement session invalidation cache for distributed deployments
- [x] Add comprehensive logout token claim verification (issuer, audience, events, iat, sid/sub)
- [x] Integrate session invalidation checks into authorization flow
- [x] Add configuration options for enabling backchannel/front-channel logout
- [x] Add extensive test coverage for logout flows and edge cases
- [x] Update documentation with logout configuration examples
- [x] Add middleware routing for logout endpoints
- [x] Extend cache manager with session invalidation cache support

Resolves #110

* fixup! feat(logout): implement OIDC backchannel and front-channel logout

* fixup! Merge branch 'main' into fix-issue-with-logout-url
v0.8.17
2026-01-04 01:59:50 +00:00
Dominik Chilla 8bf7998150 Fix for Hashicorp Vault - accept opaque access tokens with dot-characters (#113) v0.8.16 2026-01-02 16:42:22 +00:00
muffn_ 22c4323fcb fix: set X-Forwarded-User header for SSE requests from existing session (#111)
Co-authored-by: muffin <MonsterMuffin@users.noreply.github.com>
v0.8.15
2026-01-02 02:50:11 +00:00
lukaszraczylo 06b219d1f8 feat(dcr): Add Redis storage support for multi-replica deployments (#109)
- [x] Add file and Redis storage backends for DCR credentials
- [x] Implement storage abstraction with FileStore and RedisStore
- [x] Add factory function for automatic backend selection (auto/file/redis)
- [x] Integrate DCR credentials cache into UniversalCacheManager
- [x] Add comprehensive tests for storage backends and factory
- [x] Update configuration schema with storage backend options
- [x] Update documentation with multi-replica deployment guidance
- [x] Add Redis key prefix configuration for credential isolation
v0.8.14
2025-12-31 12:52:39 +00:00
lukaszraczylo 413e4a1b7d LRU + cache conflicts prevention. (#104)
* LRU + cache conflicts prevention.

* Bugfix universalCache flooding ( issue #105 )

  1. Traefik cancels the context for old plugin instances
  2. Each plugin's Close() method is called
  3. The CacheInterfaceWrapper.Close() was calling cache.Close() on the shared singleton caches
  4. Each Close() triggered Clear() which logged "Cleared all items" at INFO level
v0.8.13
2025-12-24 18:54:39 +00:00