universal_cache: stop the write-lock convoy / 100%-CPU spin (observed via pprof: one ServeHTTP goroutine holding c.mu.Lock for hours while 119 requests queued). The per-request populate path (updateLocalCache) pushed a duplicate LRU node to the front and overwrote items[key] without removing the prior node; once eviction deleted the key, the orphan nodes at Back() could never be removed and the eviction loop spun forever under the write lock. The fix replaces the entry in place (mirroring setLocal) and hardens evictOldest with a forward-progress guard. Adds universal_cache_orphan_test.go.
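A minimal sketch of the two ideas named above (replace in place, and an eviction loop that always makes progress). The type and method names here (lruCache, entry, set) are hypothetical stand-ins, not the plugin's actual code, and locking is omitted for brevity:

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the payload stored in each LRU node (hypothetical shape).
type entry struct {
	key   string
	value string
}

// lruCache is a simplified, single-goroutine stand-in for the real cache.
type lruCache struct {
	maxSize int
	items   map[string]*list.Element
	order   *list.List // front = most recently used
}

func newLRUCache(maxSize int) *lruCache {
	return &lruCache{
		maxSize: maxSize,
		items:   make(map[string]*list.Element),
		order:   list.New(),
	}
}

// set replaces an existing entry in place instead of pushing a second node,
// so the map and the list hold exactly one node per key.
func (c *lruCache) set(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	c.evictOldest()
}

// evictOldest trims the cache to maxSize. Every iteration removes the back
// node from the list, even when that node is an orphan with no map entry,
// so the loop cannot spin on a node it is unable to remove.
func (c *lruCache) evictOldest() {
	for len(c.items) > c.maxSize {
		back := c.order.Back()
		if back == nil {
			return // list exhausted; nothing more can be evicted
		}
		e := c.order.Remove(back).(*entry)
		// Only drop the map entry if it still points at this exact node.
		if cur, ok := c.items[e.key]; ok && cur == back {
			delete(c.items, e.key)
		}
	}
}

func main() {
	c := newLRUCache(2)
	c.set("a", "1")
	c.set("a", "2") // replaced in place; no duplicate node
	c.set("b", "3")
	c.set("c", "4") // evicts "a", the least recently used key
	fmt.Println(c.order.Len(), len(c.items)) // prints "2 2"
}
```

The guard relies on the list shrinking on every pass; a real implementation would hold the write lock around both methods.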
telemetry: delete the hand-rolled client; call oss-telemetry v0.2.3 (vendored, Yaegi-safe) directly from New(), once per process via sync.Once.
version: add version.go + workflow-prepare.sh so the release semver is stamped into source at build time (the value cannot be resolved at runtime under Yaegi). dev/source builds keep the 0.0.0-dev sentinel and emit no telemetry.
UniversalCache.getLocal(): when a cached token expired, the RLock fast
path (lines 385-398) previously fell through to c.mu.Lock() (write lock).
Under Yaegi, the write-lock holder takes 10-100ms for LRU manipulation,
and Go's RWMutex blocks ALL new RLock callers while a writer is waiting.
A single expired-token event turned every concurrent request from
read-parallel into write-serialized: the convoy that produced the
737-goroutine pileup at 0x400275a608 (pprof captured at
/tmp/traefik-spike-1779663149).
Fix: return (nil, false) immediately on expiry for Token/JWK/Session
cache types. The periodic cleanup goroutine handles eviction, so the
write lock is never taken on the read path for these cache types.
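The read path described above could look roughly like the following. This is a sketch under assumptions: readCache, cacheItem, cleanupExpired and newDemoCache are hypothetical names, and the real cache carries more state (LRU list, cache types) than shown here:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheItem is a hypothetical entry with an absolute expiry time.
type cacheItem struct {
	value     interface{}
	expiresAt time.Time
}

type readCache struct {
	mu    sync.RWMutex
	items map[string]*cacheItem
}

// getLocal only ever takes the read lock. An expired entry is reported as a
// miss and left in place for the background cleanup pass to remove.
func (c *readCache) getLocal(key string, now time.Time) (interface{}, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	item, ok := c.items[key]
	if !ok || !now.Before(item.expiresAt) {
		return nil, false
	}
	return item.value, true
}

// cleanupExpired is the only place in this sketch that takes the write lock
// for expiry. It returns the number of entries removed.
func (c *readCache) cleanupExpired(now time.Time) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for k, item := range c.items {
		if !now.Before(item.expiresAt) {
			delete(c.items, k)
			removed++
		}
	}
	return removed
}

// newDemoCache builds a cache with one fresh and one expired entry.
func newDemoCache() (*readCache, time.Time) {
	now := time.Now()
	return &readCache{items: map[string]*cacheItem{
		"fresh": &cacheItem{value: "token-a", expiresAt: now.Add(time.Minute)},
		"stale": &cacheItem{value: "token-b", expiresAt: now.Add(-time.Minute)},
	}}, now
}

func main() {
	c, now := newDemoCache()
	_, ok := c.getLocal("stale", now)
	fmt.Println(ok, len(c.items))                    // prints "false 2"
	fmt.Println(c.cleanupExpired(now), len(c.items)) // prints "1 1"
}
```

The trade-off is that an expired entry occupies memory until the next cleanup pass, in exchange for readers never queueing behind a writer.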
refreshAttemptTracker.mutateState(): the CAS loop used
t.state.CompareAndSwap(t.state.Load(), next) — a second Load that can
see a different value from a concurrent writer, silently overwriting
their update. Fixed to CompareAndSwap(cur, next) using the snapshot we
computed the mutation from.
Under yaegi (Traefik's plugin runtime) json.Marshal exposes unexported
struct fields with an X-prefixed name. parsedJWKS{ keys map[string]
crypto.PublicKey } therefore round-tripped through Redis as
{"Xkeys":{"<kid>":{"N":<huge>,"E":65537}}} — *rsa.PublicKey.N is a
*big.Int that marshals to a JSON number hundreds of digits long. On
read, json.Unmarshal into interface{} parses numbers as float64, which
cannot represent that range:
    Failed to deserialize value for key .../discovery/v2.0/keys:parsed:
    json: cannot unmarshal number 2251513...
    into Go value of type float64
Auth still worked (the JWKCache rebuilt the keys in memory on every
miss) but the error log spammed every request.
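The float64 failure can be reproduced on compiled Go without Yaegi or Redis. The sketch below marshals a 2048-bit *big.Int (standing in for an RSA modulus) and decodes the result into interface{}; the roundTrip helper is a name invented for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

// roundTrip marshals a 2048-bit integer the way a *big.Int modulus is
// marshaled (as a bare JSON number), then decodes it into interface{}.
// Decoding fails because the number is outside the float64 range.
func roundTrip() error {
	n := new(big.Int).Lsh(big.NewInt(1), 2047) // over 600 decimal digits
	raw, err := json.Marshal(map[string]*big.Int{"N": n})
	if err != nil {
		return err
	}
	var generic interface{}
	return json.Unmarshal(raw, &generic)
}

func main() {
	// The error text has the same shape as the log line quoted above.
	fmt.Println(roundTrip())
}
```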
Two structural problems were behind it:
* parsedJWKS holds crypto.PublicKey interface values that aren't
  meaningfully JSON-serializable. Even on compiled Go (where the
  unexported field marshals to {}), the post-roundtrip type assertion
  v.(*parsedJWKS) silently failed and the cache was useless.
* The same pattern applied to *JWKSet: the struct shape survived JSON
  but the type assertion still failed, defeating the cache for every
  call that went through Redis.
Both keys now use the new UniversalCache.SetLocal/GetLocal pair, which
skips the configured distributed backend entirely. JWK rotation is rare
and a per-replica HTTP fetch on cold cache is cheap, so cross-replica
coherence buys nothing for these entries.
Stale Redis entries written by previous versions are simply ignored —
the new code never reads under those keys, and Redis TTL retires them.
Includes regression coverage for the Azure round-trip, the
poisoned-stale-data scenario, and the SetLocal/GetLocal isolation
contract.
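A rough sketch of the isolation contract, assuming a two-tier cache in which Set serializes to a remote backend and SetLocal does not. Everything here (twoTierCache, countingBackend, parsedKeys, the key strings) is hypothetical and only mirrors the behaviour described above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// countingBackend stands in for a distributed store and counts writes.
type countingBackend struct{ writes int }

func (b *countingBackend) Set(key string, raw []byte) { b.writes++ }

// parsedKeys mimics a value that must keep its Go type to be useful.
type parsedKeys struct {
	keys map[string]string
}

type twoTierCache struct {
	local  map[string]interface{}
	remote *countingBackend
}

func newTwoTierCache() *twoTierCache {
	return &twoTierCache{local: make(map[string]interface{}), remote: &countingBackend{}}
}

// Set stores the value locally and also pushes a JSON copy to the backend.
func (c *twoTierCache) Set(key string, value interface{}) error {
	raw, err := json.Marshal(value)
	if err != nil {
		return err
	}
	c.local[key] = value
	c.remote.Set(key, raw)
	return nil
}

// SetLocal stores the value in process memory only; nothing is serialized.
func (c *twoTierCache) SetLocal(key string, value interface{}) {
	c.local[key] = value
}

// GetLocal reads from process memory only, so the original type survives.
func (c *twoTierCache) GetLocal(key string) (interface{}, bool) {
	v, ok := c.local[key]
	return v, ok
}

func main() {
	c := newTwoTierCache()
	c.SetLocal("jwks:parsed", &parsedKeys{keys: map[string]string{"kid-1": "key material"}})
	v, _ := c.GetLocal("jwks:parsed")
	_, typed := v.(*parsedKeys)
	fmt.Println(typed, c.remote.writes) // prints "true 0"
}
```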
patch-release
Hot-path JWT verification rebuilt the public key on every call:
    jwk -> ToRSAPublicKey -> x509.MarshalPKIXPublicKey -> pem.Encode
        -> verifySignature -> pem.Decode -> x509.ParsePKIXPublicKey -> verify
Under yaegi this pinned a CPU when many concurrent dashboard panels
poll behind the middleware. The PEM round trip is pure waste.
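As an illustration of verifying against an already-parsed key, here is a sketch for RS256 only. The names keyCache, verifyRS256WithKey and signAndVerify are invented for this example and are not the plugin's verifySignatureWithKey:

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
)

// keyCache maps a key ID to an already-parsed public key, so the hot path
// never re-encodes or re-parses PEM.
type keyCache map[string]crypto.PublicKey

// verifyRS256WithKey checks an RS256 signature directly against a parsed key.
func verifyRS256WithKey(key crypto.PublicKey, signingInput, sig []byte) error {
	pub, ok := key.(*rsa.PublicKey)
	if !ok {
		return errors.New("not an RSA public key")
	}
	digest := sha256.Sum256(signingInput)
	return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig)
}

// signAndVerify signs a sample input with a throwaway key and verifies it
// through the cache. When tamper is true the input is altered after signing.
func signAndVerify(tamper bool) error {
	priv, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return err
	}
	cache := keyCache{"kid-1": &priv.PublicKey}

	input := []byte("header.payload")
	digest := sha256.Sum256(input)
	sig, err := rsa.SignPKCS1v15(rand.Reader, priv, crypto.SHA256, digest[:])
	if err != nil {
		return err
	}
	if tamper {
		input = []byte("header.tampered")
	}
	return verifyRS256WithKey(cache["kid-1"], input, sig)
}

func main() {
	fmt.Println(signAndVerify(false) == nil) // prints "true"
	fmt.Println(signAndVerify(true) == nil)  // prints "false"
}
```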
* jwk.go: cache pre-parsed crypto.PublicKey per kid alongside the
  raw JWKSet (parallel cache entry, same 1h TTL, invalidates together).
* jwt.go: split verifySignatureWithKey from verifySignature; existing
  PEM-input entry point preserved for backchannel-logout callers.
* token_manager.go: VerifyJWTSignatureAndClaims now goes straight from
  jwks cache to verifySignatureWithKey, no PEM round trip and no
  per-request availableKids slice.
* universal_cache.go: token/JWK/session Get() takes RLock when the
  entry is unexpired, so concurrent token verifications no longer
  serialize on a single mutex. LRU semantics for general and metadata
  caches are unchanged (tests cover the strict-LRU contract there).
* mocks: MockJWKCache, EnhancedMockJWKCache, mockJWKCacheForLogout,
  staticJWKCache satisfy the extended interface.
* LRU + cache conflicts prevention.
* Bugfix universalCache flooding ( issue #105 )
1. Traefik cancels the context for old plugin instances
2. Each plugin's Close() method is called
3. The CacheInterfaceWrapper.Close() was calling cache.Close() on the shared singleton caches
4. Each Close() triggered Clear() which logged "Cleared all items" at INFO level
* Smarter approach to the cookies
- Single maxCookieSize = 1400 constant with clear documentation
- Combined cookie storage for ~40-45% size reduction
- Backward compatible migration from legacy cookies
* Tune up the code.
* Add redis support for distributed caching
* Move to a self-provided Redis connection pool and RESP protocol implementation.
The official Redis client library won't work with Yaegi.
* ... and another all nighter.
* Resolve issue #85 by adding ability to set custom claims in JWT tokens
* Remove redundant validation in auth middleware ( issue #89 )
* Add ability to set cookie prefix for session cookies ( #87 )
* Add ability to set cookie max age - issue #91
* Potential fix for code scanning alert no. 10: Size computation for allocation may overflow
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* fixup! Merge main into 0.8.0-redis: resolve conflicts
* Add sharded cache and prevention of CPU spikes / locks
* Add dynamic client registration with oidc provider
* Fix race condition introduced during the sharded cache implementation.
* Add page for traefikoidc.
* Add ability to disable replay protection. This is useful when running multiple Traefik replicas, to avoid false positives and token re-creation.
* Enhance the CI/CD pipelines
* Increase test coverage.
* Update vendored dependencies.
* Update behaviour on forceHTTPS as per issue #82
* Fix bug affecting Azure OIDC authentication ( and most likely others )
* Fixes issue #51
* Ensure that appended roles are unique. Update the documentation.
* Improvements targeting possible memory usage spikes.
* Additional fixes and cleanup
* Refactoring code to fix the issues identified by the users.
* Modernize run
* Fieldalignment
* Multiple changes to improve performance and reduce complexity.
- Optimise the errors and recovery.
- Deduplicate code in metadata cache.
- Remove unused performance monitoring code.
- Simplify session management and settings handling.
* Fix claims issue.
* Add ability to overwrite the default scopes in the settings file
* Well.. that escalated quickly.
Completely forgot that Traefik uses outdated Yaegi and requires compatibility with 1.20 ( pre-generic Go code ).
* Bugfix #51: Ensure that user-provided scope overrides work.
* Abstract the provider logic into a separate package.
* Additional micro fixes and cleanups.
* Simplify all the things.
* ...
* Cleanup tests.
* Issue #53: Fix CSRF token handling in reverse proxy
1. ✅ HTTPS Detection Fixed (session.go:723)
- Now uses X-Forwarded-Proto header instead of r.URL.Scheme
- Properly detects HTTPS in reverse proxy environments
2. ✅ SameSite Cookie Attribute Fixed
- Removed automatic SameSiteStrictMode for HTTPS (would break OAuth)
- Keeps SameSiteLaxMode to allow OAuth callbacks from external domains
- Only uses Strict for AJAX requests which don't involve OAuth redirects
3. ✅ Cookie Domain Handling Fixed
- Now respects X-Forwarded-Host header for cookie domain
- Ensures cookies are set for the public domain, not internal proxy domain
4. ✅ EnhanceSessionSecurity Properly Integrated
- Function is now actually called during session save
- Applies security enhancements without breaking OAuth flow
Why Issue #53 Failed Before:
1. Cookies were not marked Secure in HTTPS environments (browser wouldn't send them back)
2. If they had been Secure with SameSite=Strict, Azure callbacks would still fail
3. Cookie domain might have been wrong (internal vs public domain)
Why It Works Now:
1. Cookies are properly marked Secure for HTTPS
2. Uses SameSite=Lax to allow OAuth provider callbacks
3. Cookie domain uses public domain from X-Forwarded-Host
4. CSRF token persists through the entire OAuth flow
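The proxy-aware cookie handling summarised above can be sketched as follows. The helper names (isHTTPS, cookieDomain, csrfCookie, newProxiedRequest) and the cookie name are hypothetical, and the sketch assumes the X-Forwarded-* headers come from a trusted proxy and carry a single value:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// isHTTPS prefers X-Forwarded-Proto, since r.URL.Scheme is typically empty
// for server-side requests behind a reverse proxy.
func isHTTPS(r *http.Request) bool {
	if strings.EqualFold(r.Header.Get("X-Forwarded-Proto"), "https") {
		return true
	}
	return r.TLS != nil
}

// cookieDomain uses the public host from X-Forwarded-Host when present and
// strips any port, because cookie domains must not include one.
func cookieDomain(r *http.Request) string {
	host := r.Header.Get("X-Forwarded-Host")
	if host == "" {
		host = r.Host
	}
	if h, _, err := net.SplitHostPort(host); err == nil {
		return h
	}
	return host
}

// csrfCookie builds a cookie that survives the redirect back from the
// OAuth provider: Secure on HTTPS, SameSite=Lax, public domain.
func csrfCookie(r *http.Request, token string) *http.Cookie {
	return &http.Cookie{
		Name:     "csrf_token", // hypothetical cookie name
		Value:    token,
		Path:     "/",
		Domain:   cookieDomain(r),
		Secure:   isHTTPS(r),
		HttpOnly: true,
		SameSite: http.SameSiteLaxMode,
	}
}

// newProxiedRequest builds a request as it might arrive from a proxy.
func newProxiedRequest(proto, forwardedHost, host string) *http.Request {
	r := &http.Request{Header: http.Header{}, Host: host}
	if proto != "" {
		r.Header.Set("X-Forwarded-Proto", proto)
	}
	if forwardedHost != "" {
		r.Header.Set("X-Forwarded-Host", forwardedHost)
	}
	return r
}

func main() {
	r := newProxiedRequest("https", "app.example.com", "10.0.0.5:8080")
	c := csrfCookie(r, "token")
	fmt.Println(c.Secure, c.Domain) // prints "true app.example.com"
}
```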
* Next set of enhancements together with memory usage improvements.
* Memory leak fixes and optimisations.
* CSRF and Cookie Domain fixes
* Metadata cache leak fix + profiling
* Memory leaks hunting, part 1337.
* Further pursuit of perfection.
* Clear race conditions
* Weekend fun with memory leaks
* Splitting code into multiple files with reasonable testing coverage.
```
ok github.com/lukaszraczylo/traefikoidc 117.017s coverage: 72.6% of statements
ok github.com/lukaszraczylo/traefikoidc/auth 0.505s coverage: 87.1% of statements
ok github.com/lukaszraczylo/traefikoidc/circuit_breaker 0.283s coverage: 99.0% of statements
github.com/lukaszraczylo/traefikoidc/config coverage: 0.0% of statements
ok github.com/lukaszraczylo/traefikoidc/handlers 0.349s coverage: 98.2% of statements
ok github.com/lukaszraczylo/traefikoidc/internal/providers (cached) coverage: 94.3% of statements
ok github.com/lukaszraczylo/traefikoidc/middleware 0.808s coverage: 78.0% of statements
ok github.com/lukaszraczylo/traefikoidc/recovery 0.653s coverage: 100.0% of statements
ok github.com/lukaszraczylo/traefikoidc/session/chunking (cached) coverage: 87.8% of statements
ok github.com/lukaszraczylo/traefikoidc/session/core (cached) coverage: 85.6% of statements
ok github.com/lukaszraczylo/traefikoidc/session/crypto (cached) coverage: 81.8% of statements
ok github.com/lukaszraczylo/traefikoidc/session/storage (cached) coverage: 93.5% of statements
ok github.com/lukaszraczylo/traefikoidc/session/validators (cached) coverage: 98.8% of statements
```
* Weekend fun with further optimisations.
* Pre-release cleanup.
* Enhance test coverage.