feat: opt-in M2M bearer-token authentication (supersedes #93) (#140)

* docs: bearer-token auth design spec

* docs: harden bearer-auth spec with security review findings

* feat(bearer): opt-in M2M bearer-token authentication

Adds an opt-in Authorization: Bearer <jwt> path for machine-to-machine
clients. Replaces and supersedes the broken approach in PR #93
(synthetic-session that omitted user_identifier and skipped ID-token
rejection / replay-protection-semantics / kid-pinning / etc.).

Design

Two auth entrypoints feed one shared post-auth pipeline:

    cookie path ─┐
                 ├── forwardAuthorized(rw, req, *principal)
    bearer path ─┘    (roles/groups, header injection, security headers,
                       cookie strip, forward)

buildPrincipalFromSession and buildPrincipalFromBearerToken produce the
same `principal` value type. forwardAuthorized is session-agnostic and
runs the existing post-auth work; processAuthorizedRequest now wraps it
with the session-specific concerns (backchannel-logout, dirty/Save).

The cookie path's behaviour is byte-identical to before this PR; the
existing test suite passes unmodified.

Security hardening baked into the bearer path

- Audience MANDATORY. Startup fails when EnableBearerAuth=true and
  Audience is empty.
- BearerIdentifierClaim defaults to "sub"; "email" is rejected at startup
  to avoid the unverified-email spoofing footgun. Cookie path's
  UserIdentifierClaim is unaffected and still defaults to "email".
- ID tokens explicitly rejected via the existing detectTokenType helper
  (nonce, typ=at+jwt, token_use, scope, aud-vs-clientID heuristics);
  belt-and-braces nonce/token_use=id rejection on top.
- alg pinned to asymmetric allowlist (RS/PS/ES 256/384/512) BEFORE JWKS
  fetch, blocking alg=none and alg=HS* probes from amplifying into
  upstream calls.
- kid length capped at 256 bytes and charset-restricted before JWKS
  fetch, blocking pathological-kid JWKS amplification.
- Multi-audience tokens require azp == clientID.
- iat upper-age bound (MaxTokenAgeSeconds, default 24h) bounds
  clock-manipulation and forever-token abuse.
- Identifier sanitization: length cap, control-char + bidi-override +
  delimiter (, ; =) rejection.
- Per-IP failure throttle: configurable threshold/window/penalty; returns
  429 + Retry-After. Limits offline-guessing-style attacks and protects
  the shared rate-limiter / JWKS endpoint.
- JTI replay marking suppressed via new internal verifyOpts
  {skipReplayMarking} so the same bearer can be reused until exp; the
  blacklist Get stays active so RevokeToken still terminates a bearer
  token immediately. The existing exported VerifyToken interface is
  unchanged so all mocks continue to work.
- Cookie wins by default when both bearer and cookie are present (safer
  against browser/extension/proxy bearer injection). Operator can flip
  via BearerOverridesCookie.
- Authorization header stripped on forward by default; also stripped on
  excluded URLs so the token can't leak into health/metrics downstream
  logs.
- Optional RFC 7662 introspection via existing requireTokenIntrospection.
  Introspection-endpoint failure returns 503 (distinguishes infra from
  token rejection).
- 401s use RFC 6750 WWW-Authenticate hints (toggleable). Failure reason
  is logged at debug; raw tokens are never logged.

Implementation

- principal.go: pure-data principal type and buildPrincipalFromSession.
- bearer_auth.go: alg/kid pin, classifier, identifier sanitization,
  multi-aud azp gate, iat age check, per-IP failure tracker,
  handleBearerRequest, buildPrincipalFromBearerToken.
- token_manager.go: VerifyToken now wraps a new verifyTokenWithOpts that
  accepts internal-only verifyOpts. Existing callers, the TokenVerifier
  interface, and all mocks unchanged.
- middleware.go: extracted forwardAuthorized from
  processAuthorizedRequest; wired bearer detection after init wait +
  after bypass; excluded-URL Authorization strip when bearer enabled.
- settings.go: ten new config fields with defaults applied in
  CreateConfig.
- main.go: startup validation for audience + identifier-claim guard;
  bearer failure tracker init.

Tests

- bearer_auth_test.go: table-driven helper tests for every new component
  (parseBearerJOSEHeader, sanitizeBearerIdentifier,
  resolveBearerIdentifier, enforceMultiAudienceAzp, enforceIatAge,
  bearerFailureTracker, detectBearerToken). Integration tests through
  ServeHTTP covering happy path, ID-token rejection, alg=none rejection,
  oversized kid, multi-aud with/without azp, iat-too-old, bidi
  identifier, replay (100x reuse), 429 throttle trip, excluded-URL strip,
  roles gate, cookie-wins precedence, BearerOverridesCookie, oversized
  token, malformed JWT, feature-off pass-through. Startup validation for
  audience-required and email-identifier-rejected.
- All existing tests pass unmodified (cookie-path regression).
- go vet clean. golangci-lint clean (0 issues). Race detector clean on
  bearer tests.

Documentation

- README.md: bearer auth section with security highlights and config
  snippet; doc link in the index.
- .traefik.yml: commented config block exposing every bearer knob.
- docs/CONFIGURATION.md: new subsection with full parameter table.
- docs/BEARER_AUTH.md: threat model, hardening matrix, failure response
  table, operational guidance, known follow-ups.
- docs/superpowers/specs/2026-05-18-bearer-token-auth-design.md: design
  spec + security-review hardening history.

* fix(cache): redact raw cache keys in debug logs (CodeQL go/clear-text-logging)

CodeQL flagged 9 high-severity alerts (go/clear-text-logging) where the
in-memory cache and the hybrid L1+L2 backend printed `key=%s` at debug.
Cache callers (token cache, blacklist, introspection cache) pass raw
access / refresh / id tokens as cache keys, so any debug-enabled
deployment would write them to log streams.

Pre-existing issue. CodeQL started flagging it on this PR because the new
bearer-auth path adds a data-flow source
(req.Header.Get("Authorization")) that reaches the existing logging sinks
via the same cache. The cookie path had the same risk but wasn't tracked
as taint by CodeQL.

Fix: hash the key (SHA-256[:8] hex) before printing. Same approach the
bearer-auth logger uses for principal identifiers (spec §13). Doesn't
change cache semantics: the same key still produces the same hash, so
debug correlation across log lines is preserved without exposing the raw
value.

Touches both affected packages:

- internal/cache/cache.go (2 sites: Set + LRU eviction)
- internal/cache/backends/hybrid.go (12 sites: L1/L2 read/write/fallback)

New helper `redactKey` colocated with each package (unexported,
package-local) keeps the change blast radius narrow.

Tests green; lint clean.

* docs(bearer): how to obtain bearer tokens from the OIDC provider

Adds a section walking operators through the OAuth 2.0
client_credentials flow (RFC 6749 §4.4) and the JWT bearer assertion
alternative (RFC 7523), with a worked Auth0-shape curl example, a
per-provider quick reference (Auth0, Okta, Keycloak, Entra v2, Cognito,
GitLab, Google), operational notes (token TTL, caching, JWKS rotation,
revocation, scope vs audience, secret hygiene), and a three-line
validation loop.

Most common operator confusion: "I enabled the feature but tokens get
401'd", which is almost always a missing or wrong audience. The new
section makes the audience-matching requirement loud, with per-provider
parameter names so people don't have to dig through IdP docs.

Locations:

- docs/BEARER_AUTH.md: full section under "Quick start"
- README.md: short snippet + deep link
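
For readers following the fix(cache) entry above, the key redaction can be sketched as below. This is an approximation rather than the helper that was committed: it reads "SHA-256[:8] hex" as the first 8 bytes of the digest (16 hex characters), which is an assumption, and the log line in `main` is illustrative only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactKey returns a short, stable fingerprint of a cache key so that
// debug logs can correlate operations on the same key without printing
// the raw value (which may be an access, refresh, or ID token).
// ASSUMPTION: "[:8]" is taken to mean the first 8 digest bytes.
func redactKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:8])
}

func main() {
	rawKey := "raw-token-used-as-cache-key" // stand-in for a real token
	// The same input always yields the same fingerprint, so two log
	// lines about one key can still be matched up.
	fmt.Printf("cache set key=%s\n", redactKey(rawKey))
	fmt.Printf("cache get key=%s\n", redactKey(rawKey))
}
```

A truncated hash is one-way for practical purposes, so the log line no longer carries a usable credential, while equal keys still map to equal fingerprints.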
2026-06-05 22:44:17 +00:00 · 2026-05-18 17:35:37 +01:00
parent 8c5df82dcf
commit a548665edb
17 changed files with 2702 additions and 107 deletions
@@ -9,6 +9,7 @@ manages sessions, and forwards user identity to downstream services.
 - [Configuration reference](docs/CONFIGURATION.md) — every parameter
 - [Provider guide](docs/PROVIDERS.md) — Google, Azure, Auth0, Okta, Keycloak, Cognito, GitLab, GitHub, generic
 - [Auth0 audience guide](docs/AUTH0_AUDIENCE_GUIDE.md) — custom APIs, opaque tokens, token confusion
+- [Bearer-token (M2M) auth](docs/BEARER_AUTH.md) — opt-in `Authorization: Bearer` path, threat model
 - [Redis cache](docs/REDIS.md) — multi-replica deployments
 - [Dynamic Client Registration](docs/DCR.md) — RFC 7591
 - [Development](docs/DEVELOPMENT.md) · [Testing](docs/TESTING.md)
@@ -171,6 +172,92 @@ Each instance must use a unique `cookiePrefix` **and** `sessionEncryptionKey`,
 otherwise a session minted by one instance can grant access through another.
 See [issue #87](https://github.com/lukaszraczylo/traefikoidc/issues/87).

+### Bearer-token (M2M) authentication
+
+Opt-in path for API clients that present `Authorization: Bearer <jwt>` instead
+of logging in via the browser flow. Default off. When enabled, the middleware
+validates the bearer JWT against the configured OIDC provider (signature,
+issuer, audience, expiry) and forwards the request downstream with the
+principal headers — no cookie session is created.
+
+```yaml
+enableBearerAuth: true
+audience: https://api.example.com   # REQUIRED when bearer is enabled
+# optional, defaults shown:
+bearerIdentifierClaim: sub          # claim used as X-Forwarded-User
+stripAuthorizationHeader: true      # drop the raw token before forwarding
+bearerEmitWWWAuthenticate: true     # RFC 6750 hint on 401s
+bearerOverridesCookie: false        # cookie wins when both are present (safer)
+maxTokenAgeSeconds: 86400           # 24h cap on iat
+bearerFailureThreshold: 20          # consecutive 401s/IP before 429 throttle
+```
+
+Hardening built in by default:
+
+- **Audience required.** Startup fails if `enableBearerAuth=true` and
+  `audience` is unset. Eliminates the "token issued for service B accepted
+  by A" confusion vector.
+- **ID tokens explicitly rejected.** Bearer is access-token-only. Tokens
+  classified as ID tokens (using `nonce`, `typ: at+jwt`, `token_use`,
+  `scope`, and audience-shape signals) return `401`.
+- **`alg` and `kid` pinned at the entrypoint.** Asymmetric-only allowlist
+  (`RS256/384/512`, `PS256/384/512`, `ES256/384/512`); `kid` length and
+  charset capped — both checked **before** any JWKS fetch so attacker noise
+  can't amplify into upstream calls.
+- **Identifier sanitised.** Default identifier source is `sub`. Setting
+  `bearerIdentifierClaim: email` is rejected at startup to avoid the
+  unverified-email spoofing footgun. Control characters, bidi-override
+  codepoints, and the delimiters `, ; =` are all rejected before
+  the value reaches `X-Forwarded-User`.
+- **Multi-audience tokens require `azp`.** When `aud` is an array of more
+  than one element, the token must carry `azp == clientID`.
+- **`iat` upper-age bound.** Tokens older than `maxTokenAgeSeconds` are
+  rejected even if `exp` is far in the future.
+- **Per-IP 401 throttle.** After `bearerFailureThreshold` consecutive 401s
+  from one source IP, further bearer requests from that IP are rejected
+  with `429 Too Many Requests` + `Retry-After`.
+- **Cookie-wins by default.** When both a session cookie and an
+  `Authorization: Bearer` header arrive on the same request, the cookie path
+  runs (safer against browser/extension/proxy bearer injection). Set
+  `bearerOverridesCookie: true` for the AWS/GCP/Kubernetes convention.
+- **Replay protection preserved.** The bearer path skips the JTI **Set**
+  (so the same token can be reused) but the **Get** stays active —
+  `RevokeToken` still terminates a bearer token immediately.
+- **Excluded URLs strip Authorization.** When `enableBearerAuth=true`,
+  excluded paths (e.g. `/health`, `/metrics`) get the `Authorization` header
+  removed before forwarding so the token can't leak into public endpoint
+  logs.
+- **Optional real-time revocation.** Set `requireTokenIntrospection: true`
+  to call RFC 7662 introspection on every cache miss; revoked tokens fail
+  immediately. Introspection endpoint failures return `503` (distinguishes
+  infra outage from credential rejection).
+
+**Obtaining bearer tokens** — minting is the IdP's job, not the
+middleware's. The canonical M2M flow is OAuth 2.0 `client_credentials`
+(RFC 6749 §4.4); Google requires JWT bearer assertion (RFC 7523) instead.
+Minimal Auth0-shape request:
+
+```bash
+curl -s -X POST https://issuer.example.com/oauth/token \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "grant_type":    "client_credentials",
+    "client_id":     "your-m2m-client-id",
+    "client_secret": "your-m2m-client-secret",
+    "audience":      "https://api.example.com",
+    "scope":         "api:read api:write"
+  }'
+```
+
+The `audience` you request from the IdP **must match** the `audience` you
+configured on the middleware. Per-provider endpoints, parameter names, and
+gotchas (Entra v2 endpoint, Cognito Resource Servers, Keycloak audience
+mappers, Google's opaque-token quirk) are documented in
+[docs/BEARER_AUTH.md](docs/BEARER_AUTH.md#obtaining-bearer-tokens-from-your-oidc-provider).
+
+Full threat model, configuration matrix, and follow-up gaps in
+[docs/BEARER_AUTH.md](docs/BEARER_AUTH.md).
+
 ### SSE and WebSocket endpoints

 Browser clients cannot follow an OIDC `302` redirect on an SSE stream or a