Add draft of tracing package.

2024-06-20 00:23:00 +01:00
parent 16844e325e
commit 934bddae1b
7 changed files with 190 additions and 7 deletions
+9
@@ -28,6 +28,7 @@ This project is in active use by [telegram-bot.app](https://telegram-bot.app), a
- [Cache operations](#cache-operations)
- [General](#general)
- [Metrics which matter](#metrics-which-matter)
- [Tracing](#tracing)
- [Healthcheck](#healthcheck)
- [Monitoring endpoint](#monitoring-endpoint)
@@ -144,6 +145,8 @@ You can still use the non-prefixed environment variables in the spirit of the ba
| `HASURA_EVENT_CLEANER` | Enable the hasura event cleaner | `false` |
| `HASURA_EVENT_CLEANER_OLDER_THAN` | The interval for the hasura event cleaner (in days) | `1` |
| `HASURA_EVENT_METADATA_DB` | URL to the hasura metadata database | `postgresql://localhost:5432/hasura` |
| `ENABLE_TRACE` | Enables tracing | `false` |
| `TRACER_ENDPOINT` | Tracing endpoint | `localhost:4317` |
### Speed
@@ -293,6 +296,12 @@ With the `PURGE_METRICS_ON_CRAWL` enabled, the `graphql_proxy_requests_failed`,
If you prefer more control over metrics purging, you can enable the `PURGE_METRICS_ON_TIMER` environment variable and set the interval in seconds. This lets you purge the metrics on a regular basis, for example every 90 seconds. It may be a better solution if you have multiple crawlers checking the metrics endpoints and you want to avoid metrics being purged by, for example, a healthcheck.
#### Tracing
Tracing can be enabled by setting `ENABLE_TRACE` to `true` and providing an OTEL-compatible `TRACER_ENDPOINT` value (the default is `localhost:4317`). Once tracing is enabled, you can include the `X-Trace-Span` header in your requests to the proxy.
The value of `X-Trace-Span` should be in the following format: !!#TODO!!
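Before enabling tracing, it can help to confirm that something is listening on the configured endpoint. The snippet below is a minimal sketch and not part of the proxy: it assumes `TRACER_ENDPOINT` has the `host:port` form shown in the default, and the helper names `tracer_endpoint` and `endpoint_reachable` are hypothetical. It only checks that a TCP connection can be opened; it does not verify that the listener actually speaks OTLP.

```python
import os
import socket


def tracer_endpoint(env=os.environ):
    """Return (host, port) from TRACER_ENDPOINT, using the documented default.

    Assumption: the value is always in `host:port` form.
    """
    value = env.get("TRACER_ENDPOINT", "localhost:4317")
    host, _, port = value.rpartition(":")
    return host, int(port)


def endpoint_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = tracer_endpoint()
    state = "reachable" if endpoint_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

Run it in the same environment as the proxy so that it picks up the same `TRACER_ENDPOINT` value.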
#### Healthcheck
If you'd like the `/healthz` endpoint to perform an actual connectivity check against the GraphQL endpoint, set the `HEALTHCHECK_GRAPHQL_URL` environment variable to the exact URL of the GraphQL endpoint. The query executed will be `query { __typename }`, and if the response is not `200 OK`, the healthcheck will fail. Remember that this is the full URL you'd like to check, so it should include the protocol, host and path, for example `http://localhost:8080/v1/graphql`. It is NOT the same as the value of the `HOST_GRAPHQL` environment variable, which should provide only the host, without a path, ending with a slash.
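To verify a candidate `HEALTHCHECK_GRAPHQL_URL` value before configuring it, you can reproduce an equivalent check by hand. The snippet below is a rough sketch under assumptions: it sends the query as a JSON `POST` body, which is the common GraphQL-over-HTTP convention and may differ from what the proxy does internally. The function name `graphql_healthy` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def graphql_healthy(url, timeout=5.0):
    """POST `query { __typename }` to `url`; report healthy only on HTTP 200."""
    body = json.dumps({"query": "query { __typename }"}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections and timeouts all count as unhealthy.
        return False


if __name__ == "__main__":
    # Full URL with protocol, host and path, as described above.
    print(graphql_healthy("http://localhost:8080/v1/graphql"))
```

If this returns `False` for the URL you intend to use, the `/healthz` endpoint is likely to fail with the same value.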