Commit Graph

18 Commits

Author SHA1 Message Date
lukaszraczylo 5f9c574f95 refactor: delete dead non-RS validators; tests use RS variants
After v1.0.20 the non-RS validation chain had no production callers —
middleware.ServeHTTP dispatched exclusively through isUserAuthenticatedRS.
The orphaned functions stayed reachable only from a handful of test
files and risked silent logic drift against their RS counterparts.

Deleted from production code (~440 LOC):
  - auth_flow.go:        isUserAuthenticated
  - token_manager.go:    validateAzureTokens
  - token_manager.go:    validateGoogleTokens
  - token_manager.go:    validateStandardTokens
  - token_manager.go:    validateTokenExpiry
  - removed now-unused encoding/base64 and encoding/json imports
    from token_manager.go (only the deleted validateStandardTokens
    needed them; the RS variant in token_validation_rs.go keeps its
    own imports).

Added (3 LOC):
  - token_validation_rs.go: validateGoogleTokensRS (trivial delegator,
    parity with the deleted non-RS variant so isUserAuthenticatedRS
    can dispatch cleanly).

Tests ported (10 call sites across 3 files):
  - audience_test.go:                ts.tOidc.validateStandardTokens
  - azure_oidc_test.go:              tOidc.validateAzureTokens,
                                     ts.tOidc.validateGoogleTokens,
                                     ts.tOidc.validateAzureTokens,
                                     ts.tOidc.isUserAuthenticated
  - issue134_followup_graph_test.go: oidc.validateAzureTokens (4x)

Each ported site now constructs a *requestState from its existing
*SessionData via (&requestState{}).captureSession(session) and calls
the *RS variant. Same data, different read source.

Net diff: -440 LOC production, ~+25 LOC tests, +3 LOC stub.
Production now has a single source of truth for token validation;
no parallel implementations to keep in sync.

All tests pass with -race; golangci-lint clean.
2026-05-23 13:04:26 +01:00
lukaszraczylo a548665edb feat: opt-in M2M bearer-token authentication (supersedes #93) (#140)
* docs: bearer-token auth design spec

* docs: harden bearer-auth spec with security review findings

* feat(bearer): opt-in M2M bearer-token authentication

Adds an opt-in Authorization: Bearer <jwt> path for machine-to-machine
clients. Replaces and supersedes the broken approach in PR #93
(synthetic-session that omitted user_identifier and skipped ID-token
rejection / replay-protection-semantics / kid-pinning / etc.).

Design

  Two auth entrypoints feed one shared post-auth pipeline:

    cookie path  ─┐
                  ├── forwardAuthorized(rw, req, *principal)
    bearer path  ─┘    (roles/groups, header injection, security
                        headers, cookie strip, forward)

  buildPrincipalFromSession and buildPrincipalFromBearerToken produce
  the same `principal` value type. forwardAuthorized is session-agnostic
  and runs the existing post-auth work; processAuthorizedRequest now
  wraps it with the session-specific concerns (backchannel-logout,
  dirty/Save). The cookie path's behaviour is byte-identical to before
  this PR; the existing test suite passes unmodified.

Security hardening baked into the bearer path

  - Audience MANDATORY. Startup fails when EnableBearerAuth=true and
    Audience is empty.
  - BearerIdentifierClaim defaults to "sub"; "email" is rejected at
    startup to avoid the unverified-email spoofing footgun. Cookie
    path's UserIdentifierClaim is unaffected and still defaults to
    "email".
  - ID tokens explicitly rejected via the existing detectTokenType
    helper (nonce, typ=at+jwt, token_use, scope, aud-vs-clientID
    heuristics); belt-and-braces nonce/token_use=id rejection on top.
  - alg pinned to asymmetric allowlist (RS/PS/ES 256/384/512) BEFORE
    JWKS fetch, blocking alg=none and alg=HS* probes from amplifying
    into upstream calls.
  - kid length capped at 256 bytes and charset-restricted before JWKS
    fetch, blocking pathological-kid JWKS amplification.
  - Multi-audience tokens require azp == clientID.
  - iat upper-age bound (MaxTokenAgeSeconds, default 24h) bounds clock-
    manipulation and forever-token abuse.
  - Identifier sanitization: length cap, control-char + bidi-override
    + delimiter (, ; =) rejection.
  - Per-IP failure throttle: configurable threshold/window/penalty;
    returns 429 + Retry-After. Limits offline-guessing-style attacks
    and protects the shared rate-limiter / JWKS endpoint.
  - JTI replay marking suppressed via new internal verifyOpts
    {skipReplayMarking} so the same bearer can be reused until exp;
    the blacklist Get stays active so RevokeToken still terminates a
    bearer token immediately. The existing exported VerifyToken
    interface is unchanged so all mocks continue to work.
  - Cookie wins by default when both bearer and cookie are present
    (safer against browser/extension/proxy bearer injection).
    Operator can flip via BearerOverridesCookie.
  - Authorization header stripped on forward by default; also stripped
    on excluded URLs so the token can't leak into health/metrics
    downstream logs.
  - Optional RFC 7662 introspection via existing
    requireTokenIntrospection. Introspection-endpoint failure returns
    503 (distinguishes infra from token rejection).
  - 401s use RFC 6750 WWW-Authenticate hints (toggleable). Failure
    reason is logged at debug; raw tokens are never logged.

Implementation

  - principal.go: pure-data principal type and buildPrincipalFromSession.
  - bearer_auth.go: alg/kid pin, classifier, identifier sanitization,
    multi-aud azp gate, iat age check, per-IP failure tracker,
    handleBearerRequest, buildPrincipalFromBearerToken.
  - token_manager.go: VerifyToken now wraps a new verifyTokenWithOpts
    that accepts internal-only verifyOpts. Existing callers, the
    TokenVerifier interface, and all mocks unchanged.
  - middleware.go: extracted forwardAuthorized from
    processAuthorizedRequest; wired bearer detection after init wait
    + after bypass; excluded-URL Authorization strip when bearer
    enabled.
  - settings.go: ten new config fields with defaults applied in
    CreateConfig.
  - main.go: startup validation for audience + identifier-claim
    guard; bearer failure tracker init.

Tests

  - bearer_auth_test.go: table-driven helper tests for every new
    component (parseBearerJOSEHeader, sanitizeBearerIdentifier,
    resolveBearerIdentifier, enforceMultiAudienceAzp, enforceIatAge,
    bearerFailureTracker, detectBearerToken). Integration tests
    through ServeHTTP covering happy path, ID-token rejection,
    alg=none rejection, oversized kid, multi-aud with/without azp,
    iat-too-old, bidi identifier, replay (100x reuse), 429 throttle
    trip, excluded-URL strip, roles gate, cookie-wins precedence,
    BearerOverridesCookie, oversized token, malformed JWT,
    feature-off pass-through. Startup validation for audience-
    required and email-identifier-rejected.
  - All existing tests pass unmodified (cookie-path regression).
  - go vet clean. golangci-lint clean (0 issues). Race detector
    clean on bearer tests.

Documentation

  - README.md: bearer auth section with security highlights and
    config snippet; doc link in the index.
  - .traefik.yml: commented config block exposing every bearer knob.
  - docs/CONFIGURATION.md: new subsection with full parameter table.
  - docs/BEARER_AUTH.md: threat model, hardening matrix, failure
    response table, operational guidance, known follow-ups.
  - docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md:
    design spec + security-review hardening history.

* fix(cache): redact raw cache keys in debug logs (CodeQL go/clear-text-logging)

CodeQL flagged 9 high-severity alerts (go/clear-text-logging) where the
in-memory cache and the hybrid L1+L2 backend printed `key=%s` at debug.
Cache callers (token cache, blacklist, introspection cache) pass raw
access / refresh / id tokens as cache keys, so any debug-enabled
deployment would write them to log streams.

Pre-existing issue. CodeQL started flagging it on this PR because the
new bearer-auth path adds a data-flow source (req.Header.Get("Authorization"))
that reaches the existing logging sinks via the same cache. The cookie
path had the same risk but wasn't tracked as taint by CodeQL.

Fix: hash the key (SHA-256[:8] hex) before printing. Same approach the
bearer-auth logger uses for principal identifiers (spec §13). Doesn't
change cache semantics — same key still produces the same hash, so
debug correlation across log lines is preserved without exposing the
raw value.

Touches both affected packages:
  - internal/cache/cache.go (2 sites: Set + LRU eviction)
  - internal/cache/backends/hybrid.go (12 sites: L1/L2 read/write/fallback)

New helper `redactKey` colocated with each package (unexported,
package-local) keeps the change blast radius narrow. Tests green; lint
clean.

* docs(bearer): how to obtain bearer tokens from the OIDC provider

Adds a section walking operators through the OAuth 2.0 client_credentials
flow (RFC 6749 §4.4) and the JWT bearer assertion alternative (RFC 7523),
with a worked Auth0-shape curl example, a per-provider quick reference
(Auth0, Okta, Keycloak, Entra v2, Cognito, GitLab, Google), operational
notes (token TTL, caching, JWKS rotation, revocation, scope vs audience,
secret hygiene), and a three-line validation loop.

Most common operator confusion: "I enabled the feature but tokens get
401'd" — almost always missing or wrong audience. The new section makes
the audience-matching requirement loud, with per-provider parameter
names so people don't have to dig through IdP docs.

Locations:
  - docs/BEARER_AUTH.md  — full section under "Quick start"
  - README.md            — short snippet + deep link
2026-05-18 17:35:37 +01:00
lukaszraczylo 8c5df82dcf fix(azure): treat Microsoft proprietary access tokens as opaque (#134) (#138)
Followup to issue #134 — two reporters returned saying that even with the
JWKS caching fix in v1.0.7/v1.0.8, every request emitted:

  ERROR: TraefikOidcPlugin: UNKNOWN token verification failed:
    signature verification failed: crypto/rsa: verification error
  ERROR: TraefikOidcPlugin: DIAGNOSTIC: Signature verification failed for
    kid=<kid>, alg=RS256: crypto/rsa: verification error

Root cause: when an Azure tenant is configured without a custom API
resource, Microsoft issues access tokens for Microsoft Graph (or Azure
Mgmt). These tokens carry a `nonce` value in the JWT *header*; the bytes
that get signed contain SHA256(nonce), while the wire token ships the
original nonce. Any standard JWS verifier rejects the signature, which is
exactly Microsoft's intent — they document the format as proprietary and
tell client apps not to validate it
(https://learn.microsoft.com/en-us/entra/identity-platform/access-tokens
"you can't validate tokens for Microsoft Graph according to these rules
due to their proprietary format").

validateAzureTokens was nonetheless attempting JWT verification on every
JWT-shaped access token, then silently falling back to the ID token when
verification failed. Auth still worked end-to-end, but every request
spammed two error log lines.

Two-layer defense:

* validateAzureTokens now detects the proprietary-nonce header before
  calling verifyToken on the access token. When detected, the token is
  treated as opaque (matching the existing branch for non-JWT tokens) and
  validation proceeds via the ID token, exactly as Microsoft prescribes.

* VerifyJWTSignatureAndClaims downgrades the DIAGNOSTIC error log to
  debug for tokens carrying the same proprietary marker, in case any
  path outside validateAzureTokens reaches it.

Authorization still hinges on a separately-verifiable ID token — the
confused-deputy guard from CWE-441 is preserved (and explicitly tested).
2026-05-11 17:31:37 +01:00
lukaszraczylo 1e33bb0a4d feat(auth): support private_key_jwt and client_secret_basic (#137)
revocation endpoints, joining the existing client_secret_post default.
Both are opt-in via the new clientAuthMethod config field. Closes #135.

private_key_jwt (RFC 7523 §2.2 / OpenID Connect Core §9)
========================================================
Plugin signs a short-lived JWT with a configured private key and presents
it as client_assertion. Use when the IdP enforces short secret TTLs or
requires secretless client auth (Microsoft Entra ID / Azure AD, Okta,
Auth0, Keycloak).

New Config fields:
  clientAuthMethod          (default: client_secret_post)
  clientAssertionPrivateKey (inline PEM)
  clientAssertionKeyPath    (PEM file path; mutually exclusive)
  clientAssertionKeyID      (JWS kid header — required)
  clientAssertionAlg        (default: RS256; RS/PS/ES 256–512 supported)

PEM forms accepted: PKCS#8, PKCS#1, SEC1.
Assertion claims: iss=sub=clientID, aud=tokenURL, iat=now, exp=now+60s,
random 16-byte hex jti per request. ECDSA signatures are raw r||s per
RFC 7515 (not ASN.1).

client_secret_basic (RFC 6749 §2.3.1)
=====================================
Sends credentials in the Authorization: Basic header instead of the
body. Both halves are form-urlencoded individually before base64 — that
encoding step is required by the spec and is NOT what stdlib's
http.Request.SetBasicAuth does, so the plugin uses its own helper. The
form body omits client_id and client_secret on this path.

Wire-up
=======
Both methods are dispatched at the same two call sites:
  helpers.go:exchangeTokens — auth_code + refresh_token grants
  token_manager.go:RevokeTokenWithProvider — RFC 7009 revocation

Existing clientSecret deployments are unaffected — empty
clientAuthMethod maps to the historical client_secret_post behavior, and
clientAssertion remains nil unless the new fields are set.

Yaegi compatibility
===================
All required crypto/rsa, crypto/ecdsa, crypto/x509, encoding/pem and
crypto/sha256/384/512 symbols are exposed by the traefik/yaegi stdlib
symbol tables (RSA SignPKCS1v15 + SignPSS, ECDSA Sign,
ParsePKCS8/1PrivateKey, ParseECPrivateKey).

Tests (16 new)
==============
Algorithm-family coverage:
  TestIssue135_SignerRSAFamily — RS256/384/512 + PS256/384/512
  TestIssue135_SignerECDSAFamily — ES256/384/512, raw r||s shape
  TestIssue135_SignerRejectsAlgKeyMismatch
  TestIssue135_SignerJTIUniqueness — 50 sigs, all jti distinct
  TestIssue135_SignerPEMVariants — PKCS#8, PKCS#1, SEC1

Config validation:
  TestIssue135_ConfigValidation — full Validate() matrix
  TestIssue135_ConfigKeyPathLoadsFile

Wire-up:
  TestIssue135_AuthCodeExchangeUsesAssertion
  TestIssue135_RefreshTokenUsesAssertion
  TestIssue135_BackcompatClientSecretPath
  TestIssue135_RevocationUsesAssertion
  TestIssue135_BuildSignerFromInlineConfig
  TestIssue135_BuildSignerDefaultsToRS256
  TestIssue135_ClientSecretBasicAuth — Authorization header, no body creds
  TestIssue135_ClientSecretBasicURLEncodesReservedChars — :, +, /, @, =, &
  TestIssue135_ClientSecretBasicRevocation — revocation parity

Documentation
=============
  README.md — required-row note + 5 optional rows + dedicated section
  docs/CONFIGURATION.md — new Client Authentication section with three
    method subsections, OpenSSL keygen snippet, RFC links
  docs/index.html — 5 new config-table rows + Private Key JWT
    explainer card
  .traefik.yml + examples/complete-traefik-config.yaml — commented
    opt-in example

Out of scope (deferred)
=======================
mTLS / tls_client_auth (RFC 8705) — separate change; requires per-call
http.Client with tls.Config.Certificates and conflicts with the current
pooled HTTP client architecture.
2026-05-09 18:02:41 +01:00
lukaszraczylo 9cbca4c4fb fix(refresh): honor userIdentifierClaim in token refresh path (#132)
patch-release

The refresh path in token_manager.go hardcoded the "email" claim when
extracting the user identifier from a refreshed ID token, ignoring the
configured userIdentifierClaim. Keycloak users without an email claim
(using sub or another identifier) were kicked out on refresh even
though their initial login worked.

The callback path (auth_flow.go:226-239) already honored
userIdentifierClaim with "sub" fallback; PR #100 (commit a316a98)
added that support but missed the refresh path.

Mirror the callback logic in refreshToken so both paths behave the same.

Cleanup: rename Get/SetEmail to Get/SetUserIdentifier on SessionData
to match the actual semantics. The slot already stored the configured
identifier (email, sub, oid, upn, preferred_username), only the API
name was misleading. Storage key "email" → "user_identifier" and
combinedSessionPayload field E (json:"e") → Ui (json:"ui").

Compat note: existing user sessions invalidate on upgrade — every active
user re-authenticates once after deploying this change.
2026-05-07 09:21:41 +01:00
lukaszraczylo 684a990f59 fix: reduce yaegi CPU footprint + require auth on SSE/WebSocket bypass
minor-release

Behaviour changes (potentially breaking for operators relying on the prior
unauthenticated SSE bypass):

* SSE (`Accept: text/event-stream`) and WebSocket upgrade requests now
  return 401 when no authenticated session is present. Previously the
  bypass forwarded unconditionally, which let any caller reach the
  backend by setting the right header. Excluded URLs are unchanged.
  Operators relying on unauthenticated SSE/WS access must move the path
  into ExcludedURLs.

Performance fixes (target: long-running dashboards like Grafana / ArgoCD
where many panels poll concurrently while the page stays open):

* Stop honouring isTestMode() for the singleton-token-cleanup interval
  under yaegi (the Traefik plugin runtime). In production the plugin was
  running a 20 Hz no-op cleanup ticker because runtime.Compiler ==
  "yaegi" tripped the test-mode branch.
* processAuthorizedRequest now resolves ID-token claims at most once per
  request via SessionData.GetIDTokenClaims (already cached on the
  session) and reuses them for both groups/roles extraction and
  header-template rendering. Previously every authenticated request
  parsed the JWT twice.
* Added extractGroupsAndRolesFromClaims to drive groups/roles off
  pre-parsed claims; extractGroupsAndRoles still works for tests.
* Removed the unconditional session.MarkDirty() in the header-templates
  branch. Templates only mutate request headers, not session state, so
  the prior MarkDirty was re-encrypting and rewriting all session
  cookies on every authenticated request that used header templates.

Other:

* Added isWebSocketUpgrade (RFC 6455 handshake detection — Connection:
  Upgrade + Upgrade: websocket, tolerant of multi-token Connection
  headers and case).
* Renamed applySSEUserHeaders -> applyBypassUserHeaders; it now returns
  bool so the dispatcher can reject unauthenticated SSE/WS with 401.
* Added tests for SSE and WS bypass covering both the auth-rejection
  path and the authenticated forward path.
2026-05-02 03:12:20 +01:00
lukaszraczylo 1b6c8616fd fix(refresh): coalesce refresh-token grants + bound goroutines + cache hot path (target v0.8.27) (#131)
* fix(refresh): wire RefreshCoordinator into the live refresh path

The RefreshCoordinator existed but was never instantiated. The actual
refresh path used only session.refreshMutex, which is per-SessionData
instance - and SessionData is pulled from a sync.Pool per request -
so concurrent requests sharing a refresh token had ZERO coordination.

Symptom: when access_token expired (e.g. 5min Zitadel default), every
in-flight request from a polling client (Grafana panels) entered the
refresh path simultaneously and POSTed the same refresh_token to the
IdP. With refresh-token rotation enabled (Zitadel/Authentik default),
only one grant succeeded; the rest got invalid_grant and each cleared
the entire session. Subsequent requests then thrashed in re-auth loops.

This commit:
- adds refreshCoordinator field on TraefikOidc
- instantiates it in NewWithContext with DefaultRefreshCoordinatorConfig
- shuts it down in Close() under shutdownOnce
- routes refreshToken() through the coordinator via coordinatedTokenRefresh,
  which collapses concurrent grants to a single upstream call per
  refresh_token hash
- exports refreshCoordinatorSessionID for both internal hashing and the
  middleware-level wireup so dedup keys stay aligned

Behavioural notes:
- nil-coordinator fallback preserves existing tests that build TraefikOidc
  literals without going through the constructor
- followers receive the same TokenResponse/error as the leader, so no
  per-instance code paths change
- existing TestGetNewTokenWithRefreshToken_Concurrency still passes
  because it hits GetNewTokenWithRefreshToken directly, below the
  coordinator boundary

Tests:
- refresh_coordinator_wireup_test.go: 50 concurrent refreshes coalesce
  to <=2 upstream calls; distinct tokens still run in parallel; nil
  coordinator falls back cleanly

* perf(cache): bound L1 backfill goroutines in HybridBackend

Get() and GetMany() previously spawned a goroutine per L2 hit to write
the value through to L1. Under sustained polling traffic (e.g. a Grafana
dashboard refreshing every 30s with N panels) this minted thousands of
goroutines, each running in Yaegi - directly contributing to the
~1000% CPU spike that pairs with the refresh-token herd.

Replace the per-hit goroutines with a single l1BackfillWorker fed by
l1BackfillBuffer, mirroring the existing asyncWriteBuffer/asyncWriteWorker
pattern for L2 writes. Buffer overflow drops the backfill (counted via
l1BackfillDrops) - a dropped backfill just means the next L2 hit for
that key re-queues it, which is safe.

Tests:
- TestHybridBackend_L1BackfillBounded: 1000 distinct L2 hits keep
  goroutine count within +20 of baseline (pre-fix it grew by ~1000)
- TestHybridBackend_L1BackfillFullDrops: drops are accounted for when
  the buffer is saturated and the worker is stopped

* feat(refresh): implement isRefreshTokenExpired heuristic

Replace the placeholder `return false` with a real check based on the
issued_at timestamp that SetRefreshToken already stamps into the session.
Gated by a new MaxRefreshTokenAgeSeconds config field (default 21600 =
6h, matching the existing comment). 0 disables the check.

This wires the previously-dead refreshTokenExpired branch in middleware.go,
which short-circuits AJAX requests with a 401 instead of letting them
hammer the IdP for a refresh token that's almost certainly stale - the
classic Grafana-after-long-pause failure mode.

Behaviour:
- maxRefreshTokenAge=0 disables the check (preserves prior behaviour)
- legacy sessions without issued_at still attempt one refresh; the IdP
  remains the source of truth on first try
- nil-receiver and nil-session guards keep test code that builds
  TraefikOidc literals safe

Tests:
- TestIsRefreshTokenExpired_DisabledWhenAgeZero
- TestIsRefreshTokenExpired_LegacySessionWithoutTimestamp
- TestIsRefreshTokenExpired_WithinWindow
- TestIsRefreshTokenExpired_BeyondWindow
- TestIsRefreshTokenExpired_NilGuards

* perf(token): skip parseJWT on cache hit in VerifyToken

The token cache fast-return existed but ran AFTER parseJWT, so every
validation paid for base64 + JSON unmarshal even on a hit. Under bursty
traffic (e.g. 10+ concurrent panel requests on every Grafana dashboard
refresh, each calling validateStandardTokens which verifies BOTH the
access token and the ID token), this is two redundant parses per
request multiplied by the panel count.

Move the cache lookup ahead of parseJWT. On a hit the function returns
nil immediately. On a miss the original flow runs unchanged.

Also nil-guard t.tokenCache to keep partial-literal test instances safe
(matches the same pattern we already use for tokenBlacklist).

Tests:
- TestVerifyToken_CacheHitSkipsParse: cache pre-populated with claims
  for a token whose body would fail parseJWT - returns nil iff the
  fast-path bypasses the parse
- TestVerifyToken_CacheMissStillParses: a syntactically valid but
  unsigned token still errors past parseJWT on cache miss

* feat(refresh): cross-replica refresh-grant dedup via shared cache

The in-process RefreshCoordinator added in 9f96d8c already collapses
concurrent refresh-token grants on a single Traefik replica. With the
plugin's existing Redis (Dragonfly) cache infrastructure available, we
can extend that dedup across replicas: if pod A refreshes a token at
T+0 and pod B receives a request for the same session at T+1, pod B
should reuse pod A's result rather than POSTing the now-rotated refresh
token to the IdP.

Implementation:
- Add a refreshResultCache to UniversalCacheManager (memory-only when
  Redis is disabled, Redis-backed in production via the existing
  hybrid/Redis-only mode selection)
- Expose it through CacheManager.GetSharedRefreshResultCache and on the
  TraefikOidc struct as refreshResultCache (CacheInterface)
- Inside the closure passed to RefreshCoordinator.CoordinateRefresh,
  consult the cache first; on hit return immediately, on miss exchange
  with the IdP and populate the cache for peers
- 5s TTL: long enough for siblings to observe, short enough that a
  rotated refresh token cannot be re-supplied after the IdP has moved on
- Errors are intentionally NOT cached - peers must always be able to
  retry on their own

Pragmatic choice: optimistic cache rather than a hard distributed lock.
- A hard lock (SET NX + poll) doubles Redis RTT and risks dead-locks
  if a Traefik pod dies mid-grant.
- The user's BGP+Local externalTrafficPolicy already pins ingress for
  a session to one node in steady state, so cross-pod racing is rare.
- This optimistic path catches the rare failover case without adding
  failure modes.

Tests:
- TestCoordinatedTokenRefresh_CrossReplicaCacheHit: pre-populated cache
  short-circuits the upstream call entirely (0 IdP calls)
- TestCoordinatedTokenRefresh_PopulatesCrossReplicaCache: leader stores
  a successful result for peers to find
- TestCoordinatedTokenRefresh_ErrorIsNotCached: invalid_grant must not
  poison the dedup cache - peers must retry independently
2026-04-30 18:52:39 +01:00
lukaszraczylo 4d28fa01ab perf(jwk,cache): cache parsed public keys + RLock token cache reads
Hot-path JWT verification rebuilt the public key on every call:
  jwk -> ToRSAPublicKey -> x509.MarshalPKIXPublicKey -> pem.Encode
  -> verifySignature -> pem.Decode -> x509.ParsePKIXPublicKey -> verify
Under yaegi this pinned a CPU when many concurrent dashboard panels
poll behind the middleware. The PEM round trip is pure waste.

* jwk.go: cache pre-parsed crypto.PublicKey per kid alongside the
  raw JWKSet (parallel cache entry, same 1h TTL, invalidates together).
* jwt.go: split verifySignatureWithKey from verifySignature; existing
  PEM-input entry point preserved for backchannel-logout callers.
* token_manager.go: VerifyJWTSignatureAndClaims now goes straight from
  jwks cache to verifySignatureWithKey, no PEM round trip and no
  per-request availableKids slice.
* universal_cache.go: token/JWK/session Get() takes RLock when the
  entry is unexpired, so concurrent token verifications no longer
  serialize on a single mutex. LRU semantics for general and metadata
  caches are unchanged (tests cover the strict-LRU contract there).
* mocks: MockJWKCache, EnhancedMockJWKCache, mockJWKCacheForLogout,
  staticJWKCache satisfy the extended interface.
2026-04-30 10:14:10 +01:00
lukaszraczylo 9d52f1b018 feat(core): refactor linters config and improve code quality (#119)
- [x] Reorganize golangci-lint configuration with documented disable reasons
- [x] Simplify errcheck and revive linter rules with targeted exclusions
- [x] Pre-compile regex patterns in input_validation.go for performance
- [x] Fix type assertions in memory_shard.go and resp.go with safety checks
- [x] Replace string comparison with EqualFold for case-insensitive matching
- [x] Fix loop variable captures in jwk.go and logout.go
- [x] Change high goroutine log level from Info to Debug in autocleanup.go
- [x] Replace deprecated "cancelled" spelling with "canceled" throughout
- [x] Add nolint annotations for intentional unused parameters
- [x] Improve comment formatting for deprecated functions
- [x] Fix comment spelling: "marshalling" → "marshaling"
- [x] Refactor provider warnings formatting in internal/providers/warnings.go
- [x] Simplify metrics summary building in internal/recovery/metrics.go
- [x] Pre-allocate slice in error_recovery.go GetDegradedServices
- [x] Refactor context cancellation checks in redis.go
2026-01-15 10:40:49 +00:00
lukaszraczylo c474bbafd6 Cleanup [dec2025] (#101)
* Cleanup excessive comments.

* Remove leftovers hanging around from previous refactor

* Improve test coverage
2025-12-09 01:38:02 +00:00
lukaszraczylo 66b9ed0861 Reauthentication + redis fix
When introspection explicitly returns that a token is inactive/revoked/expired, the plugin now properly triggers re-authentication or refresh instead of falling back to ID token validation. This fixes the functional issue where users
weren't being redirected to re-authenticate.
Redis change ensures that when the caller's context is cancelled (e.g., the 200ms timeout in UniversalCache.Get()), the operation aborts quickly instead of continuing with retries.
2025-12-01 13:47:28 +00:00
lukaszraczylo e64fc7f730 Add redis support for distributed caching (#83)
* Add redis support for distributed caching

* Move towards the self-provided Redis connection pool and RESP protocol implementation.
Official redis client library won't work with yaegi.

* fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi.

* fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi.

* fixup! fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi.

* fixup! fixup! fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi.

* fixup! fixup! fixup! fixup! fixup! Move towards the self-provided Redis connection pool and RESP protocol implementation. Official redis client library won't work with yaegi.

* ... and another all nighter.

* fixup! ... and another all nighter.

* fixup! fixup! ... and another all nighter.

* fixup! fixup! fixup! ... and another all nighter.

* Resolve issue #85 by adding ability to set custom claims in JWT tokens

* Remove redundant validation in auth middleware ( issue #89 )

* Add ability to set cookie prefix for session cookies ( #87 )

* fixup! Add ability to set cookie prefix for session cookies ( #87 )

* Add ability to set cookie max age - issue #91

* Potential fix for code scanning alert no. 10: Size computation for allocation may overflow

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fixup! Merge main into 0.8.0-redis: resolve conflicts

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-11-30 02:18:46 +00:00
lukaszraczylo 5fcbd54955 Add sharded cache and prevention of CPU spikes / locks (#96)
* Add sharded cache and prevention of CPU spikes / locks

* Add dynamic client registration with oidc provider

* Fix race condition introduced during the sharded cache implementation.

* Add page for traefikoidc.
2025-11-30 01:41:12 +00:00
lukaszraczylo ae59a5e88a 0.7.10 (#80)
* Add ability to disable replay protection. - This is useful for runs with multiple traefik replicas to avoid false positives and tokens re-creation.
* Enhance the CI/CD pipelines
* Increase test coverage.
* Update vendored dependencies.
* Update behaviour on forceHTTPS as per issue #82
2025-10-16 10:56:28 +01:00
lukaszraczylo 79e9b164f9 release 0.7.9 (#78)
* Speed improvements.

After introduction of introspection the plugin became significantly slower.
This commit introduces several optimizations to bring the speed back up.

* Add relevant documentation and tests.
2025-10-13 10:43:35 +01:00
lukaszraczylo eff9bd7bd2 Multiple issues addressed (#76)
- Issue #74
- Issue #14
2025-10-09 00:44:03 +01:00
lukaszraczylo bde1db1c3b traefik plugin 0.7.7 (#73)
* Automatic discovery of the scopes.

Issue #61 raised very valid concerns about users configuring scopes that are not supported by the provider.
This change introduces automatic discovery of supported scopes by fetching the provider's discovery document and filtering out unsupported scopes.

Before:
User configures: scopes: ["openid", "profile", "email", "offline_access"]
Self-hosted GitLab: "The requested scope is invalid, unknown, or malformed"
Authentication:  FAILS

After:
User configures: scopes: ["openid", "profile", "email", "offline_access"]
Middleware checks discovery doc → offline_access not supported
Automatically filters to: ["openid", "profile", "email"]
Authentication:  SUCCEEDS

* Resolves issue #74 by enabling user to specify expected audience in the configuration.

* Fix flaky tests.
2025-10-08 11:44:00 +01:00
lukaszraczylo c3f23cb99b Release 0.7.5 (#70)
* Resolve issue with opaque tokens not being parsed correctly

* Increase test coverage

* Further improvements to test coverage and code quality

* Add new providers.

* fixup! Add new providers.

* Cleanup.

* fixup! Cleanup.

* fixup! fixup! Cleanup.

* fixup! fixup! fixup! Cleanup.

* fixup! fixup! fixup! fixup! Cleanup.

* Memory management optimisation

24 bytes per Put < 256-4096 bytes per buffer allocation avoided (10-170x difference)

* Pooling cleanup.
2025-10-01 12:13:10 +01:00