perf+coverage: optimisation pass + coverage push to ≥70%

Performance / resource usage:
- circuit_breaker_metrics: fix data race on failCounters map (RWMutex + double-checked locking)
- server.go: drop user_id and op_name metric labels (Prometheus cardinality bound); de-duplicate extractUserInfo
- graphql.go: gate runtime.ReadMemStats per-request behind ENABLE_ALLOCATION_TRACKING flag (default off)
- graphql.go: collapse two-pass AST scan into single pass; lower-case once
- sanitization.go: cache compiled redaction regexes per pattern via sync.Map; hoist inner constants to pkg vars
- proxy.go: hoist connection/timeout substrings to pkg vars; sentinel errors for static error paths; drop dead Headers map alloc
- metrics_aggregator.go: log-field allocation guarded by Logger.IsLevelEnabled
- logging/logger.go: add IsLevelEnabled helper
- lru_cache.go: 16-shard sharding, FNV-1a routing (concurrent throughput +22%)
- cache/memory/lru_memory_cache.go: gzip compress/decompress moved outside mu.Lock
- rps_tracker.go: RWMutex+uint64 -> atomic.Uint64
- retry_budget.go: drop unused mutex
- api.go: bannedUsersIDs map+RWMutex -> sync.Map (+ snapshot/replace helpers)
- tracing/tracing.go: pkg-level constSpanAttrs, copy-then-append in StartSpanWithAttributes
- admin_dashboard.go: handleStatsWebSocket reuses bytes.Buffer + json.Encoder per connection

Build / runtime:
- Makefile: -ldflags="-s -w" -trimpath, CGO_ENABLED=0 for build (=1 for test recipes)
- Dockerfile + Dockerfile.goreleaser: ENV GOMEMLIMIT=512MiB
- main.go: blank import go.uber.org/automaxprocs (cgroup-aware GOMAXPROCS)
- main.go: PPROF_PORT env var wires net/http/pprof on 127.0.0.1 only with full server timeouts
- README.md: env-var docs + metric-label docs updated; cardinality note

Test coverage push (per package):
- main 51.2% -> 74.7%
- cache 66.3% -> 93.7%
- cache/redis 45.5% -> 98.2%
- tracing 66.7% -> 72.9%
- (cache/memory 91.6%, logging 91.9%, monitoring 77.6%, pkg/pools 100% unchanged)

New test files: coverage_micro_test, coverage_extras_test, server_handlers_test,
api_health_test, admin_dashboard_cluster_test, metrics_aggregator_test, concerns_test,
cache/cache_coverage_test, cache/redis/redis_coverage_test, tracing/tracing_coverage_test.

Bug fix: connection_resilience_test.go TestIntegratedHealthManagement.health_manager_startup
was sync.Once-coupled to InitializeBackendHealth and panicked when another test (e.g. via
parseConfig) had already triggered Once. Use NewBackendHealthManager directly.
This commit is contained in:
2026-04-19 19:49:24 +01:00
parent 65fa936b60
commit c2c75d69c0
36 changed files with 4322 additions and 381 deletions
+13 -9
View File
@@ -198,6 +198,7 @@ You can still use the non-prefixed environment variables in the spirit of the ba
| `MAX_CONNS_PER_HOST` | Maximum connections per host | `1024` |
| `CLIENT_DISABLE_TLS_VERIFY` | Disable TLS verification | `false` |
| `LOG_LEVEL` | The log level | `info` |
| `ENABLE_ALLOCATION_TRACKING` | Enable per-request memory allocation tracking | `false` |
| `BLOCK_SCHEMA_INTROSPECTION`| Blocks the schema introspection | `false` |
| `ALLOWED_INTROSPECTION` | Allow only certain queries in introspection | `` |
| `ENABLE_ACCESS_LOG` | Enable the access log | `false` |
@@ -224,6 +225,7 @@ You can still use the non-prefixed environment variables in the spirit of the ba
| `WEBSOCKET_PONG_TIMEOUT` | WebSocket pong timeout in seconds | `60` |
| `WEBSOCKET_MAX_MESSAGE_SIZE` | Max WebSocket message size in bytes | `524288` (512KB) |
| `ADMIN_DASHBOARD_ENABLE` | Enable admin dashboard UI | `true` |
| `PPROF_PORT` | Localhost-only debug pprof endpoint port (default: disabled). Never expose publicly. | `` |
### Tracing
@@ -1098,16 +1100,18 @@ If you'd like the `/healthz` endpoint to perform actual check for the connectivi
Example metrics produced by the proxy:
The `executed_query` and `timed_query` metrics carry only the `{op_type, cached}` label set. The previous `user_id` and `op_name` labels were removed to bound Prometheus cardinality (per-user and per-operation-name labels caused unbounded series growth).
```
graphql_proxy_timed_query_bucket{cached="false",user_id="-",op_type="mutation",op_name="updateUserDetails",vmrange="1.000e-02...1.136e-02"} 6
graphql_proxy_timed_query_count{op_name="",cached="false",user_id="-",op_type=""} 78
graphql_proxy_timed_query_bucket{op_name="MyQuery",cached="false",user_id="-",op_type="query",vmrange="5.995e+00...6.813e+00"} 1
graphql_proxy_timed_query_sum{op_name="MyQuery",cached="false",user_id="-",op_type="query"} 6
graphql_proxy_timed_query_count{op_name="MyQuery",cached="false",user_id="-",op_type="query"} 1
graphql_proxy_executed_query{user_id="-",op_type="mutation",op_name="updateKnownSpammer",cached="false"} 1486
graphql_proxy_executed_query{user_id="-",op_type="query",op_name="checkIfAdminsNeedRefreshing",cached="false"} 13167
graphql_proxy_executed_query{user_id="1337",op_type="query",op_name="checkIfKnownMedia",cached="false"} 429
graphql_proxy_executed_query{user_id="-",op_type="query",op_name="checkIfSpamAIRequiresUpdate",cached="false"} 8891
graphql_proxy_timed_query_bucket{op_type="mutation",cached="false",vmrange="1.000e-02...1.136e-02"} 6
graphql_proxy_timed_query_count{op_type="",cached="false"} 78
graphql_proxy_timed_query_bucket{op_type="query",cached="false",vmrange="5.995e+00...6.813e+00"} 1
graphql_proxy_timed_query_sum{op_type="query",cached="false"} 6
graphql_proxy_timed_query_count{op_type="query",cached="false"} 1
graphql_proxy_executed_query{op_type="mutation",cached="false"} 1486
graphql_proxy_executed_query{op_type="query",cached="false"} 13167
graphql_proxy_executed_query{op_type="query",cached="false"} 429
graphql_proxy_executed_query{op_type="query",cached="true"} 8891
graphql_proxy_requests_failed 324
graphql_proxy_requests_skipped 0
graphql_proxy_requests_succesful 454823