Compare commits

...

7 Commits

Author SHA1 Message Date
lukaszraczylo 3b1af9cafe Add ability to enable / disable access log.
In high frequency environments it can be a little bit noisy.
2023-10-09 08:41:05 +01:00
lukaszraczylo 5f44ce797a fixup! Update README. 2023-10-08 23:45:36 +01:00
lukaszraczylo 1b949583c7 Update README. 2023-10-08 23:43:19 +01:00
lukaszraczylo 16f29488c5 Add license. 2023-10-08 18:45:53 +01:00
lukaszraczylo 5ca37fc9fb Fix README formatting. 2023-10-08 18:44:13 +01:00
lukaszraczylo ed1de61e2e Force minor version for public release. 2023-10-08 18:42:21 +01:00
lukaszraczylo e7b2cc1deb Update readme and make it release ready. 2023-10-08 18:38:55 +01:00
11 changed files with 193 additions and 26 deletions
+2
View File
@@ -0,0 +1,2 @@
github: [ lukaszraczylo ]
custom: [ monzo.me/lukaszraczylo ]
+4 -2
View File
@@ -4,9 +4,11 @@ on:
workflow_dispatch:
push:
paths-ignore:
- '**.md'
- '**/**.md'
- '**/**.yaml'
- 'static/**'
branches:
- "*"
- 'main'
jobs:
shared:
+21
View File
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2023 Lukasz Raczylo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+51 -11
View File
@@ -2,23 +2,63 @@
Creates a passthrough proxy to a graphql endpoint(s), allowing you for analysis of the queries and responses, producing the prometheus metrics at a fraction of the cost - because as we know - $0 is a fair price.
This project is in active use by [telegram-bot.app](https://telegram-bot.app), and was tested with 30k queries per second on a single instance, consuming 10mb of RAM and 0.1% CPU.
![Example of monitoring dashboard](static/monitoring-at-glance.png?raw=true)
You can find the example of the kubernetes manifest in the [example deployment](static/kubernetes-deployment.yaml) file.
### Why this project exists
I wanted to monitor the queries and responses of our graphql endpoint, but we didn't want to pay the price of the graphql server itself ( and I will not point fingers and certain well-known project), as monitoring and basic security features should be a common, free functionality.
### Endpoints
/v1/graphql - the graphql endpoint
/metrics - the prometheus metrics endpoint
/healthz - the healthcheck endpoint
* `:8080/v1/graphql` - the graphql endpoint
* `:9393/metrics` - the prometheus metrics endpoint
* `:8080/healthz` - the healthcheck endpoint
### Features
* MONITORING: Prometheus / VictoriaMetrics metrics
* MONITORING: Extracting user id from JWT token and adding it as a label to the metrics
* MONITORING: Extracting the query name and type and adding it as a label to the metrics
* MONITORING: Calculating the query duration and adding it to the metrics
* SPEED: Caching the queries
* SECURITY: Blocking schema introspection
### Configuration
`MONITORING_PORT` - the port to expose the metrics endpoint on (default: 9393)
`PORT_GRAPHQL` - the port to expose the graphql endpoint on (default: 8080)
`HOST_GRAPHQL` - the host to proxy the graphql endpoint to (default: `localhost/v1/graphql`)
* `MONITORING_PORT` - the port to expose the metrics endpoint on (default: 9393)
* `PORT_GRAPHQL` - the port to expose the graphql endpoint on (default: 8080)
* `HOST_GRAPHQL` - the host to proxy the graphql endpoint to (default: `http://localhost/v1/graphql`)
* `JWT_USER_CLAIM_PATH` - the path to the user claim in the JWT token (default: ``)
* `ENABLE_GLOBAL_CACHE` - enable the cache (default: `false`)
* `CACHE_TTL` - the cache TTL (default: `60s`)
* `LOG_LEVEL` - the log level (default: `info`)
* `BLOCK_SCHEMA_INTROSPECTION` - blocks the schema introspection (default: `false`)
* `ENABLE_ACCESS_LOG` - enable the access log (default: `false`)
`JWT_USER_CLAIM_PATH` - the path to the user claim in the JWT token (default: ``)
### Caching
`ENABLE_CACHE` - enable the cache (default: `false`)
`CACHE_TTL` - the cache TTL (default: `60s`)
Cache engine is enabled in background as it does not use any additional resources.
You can then start using the cache by setting the `ENABLE_GLOBAL_CACHE` environment variable to `true` - which will enable the cache for all queries, without introspection of the query. You can leave the global cache disabled and enable the cache for specific queries by adding the `@cache` directive to the query.
`LOG_LEVEL` - the log level (default: `info`)
### Monitoring endpoint
`BLOCK_SCHEMA_INTROSPECTION` - blocks the schema introspection (default: `false`)
Example metrics produced by the proxy:
```
graphql_proxy_timed_query_bucket{cached="false",user_id="-",op_type="mutation",op_name="updateUserDetails",vmrange="1.000e-02...1.136e-02"} 6
graphql_proxy_timed_query_count{op_name="",cached="false",user_id="-",op_type=""} 78
graphql_proxy_timed_query_bucket{op_name="MyQuery",cached="false",user_id="-",op_type="query",vmrange="5.995e+00...6.813e+00"} 1
graphql_proxy_timed_query_sum{op_name="MyQuery",cached="false",user_id="-",op_type="query"} 6
graphql_proxy_timed_query_count{op_name="MyQuery",cached="false",user_id="-",op_type="query"} 1
graphql_proxy_executed_query{user_id="-",op_type="mutation",op_name="updateKnownSpammer",cached="false"} 1486
graphql_proxy_executed_query{user_id="-",op_type="query",op_name="checkIfAdminsNeedRefreshing",cached="false"} 13167
graphql_proxy_executed_query{user_id="1337",op_type="query",op_name="checkIfKnownMedia",cached="false"} 429
graphql_proxy_executed_query{user_id="-",op_type="query",op_name="checkIfSpamAIRequiresUpdate",cached="false"} 8891
graphql_proxy_requests_failed 324
graphql_proxy_requests_skipped 0
graphql_proxy_requests_succesful 454823
```
+9 -10
View File
@@ -4,7 +4,6 @@ import (
fiber "github.com/gofiber/fiber/v2"
"github.com/graphql-go/graphql/language/ast"
"github.com/graphql-go/graphql/language/parser"
"github.com/k0kubun/pp"
libpack_monitoring "github.com/telegram-bot-app/libpack/monitoring"
)
@@ -31,6 +30,9 @@ var retrospection_queries = []string{
"__directives",
}
// Saving the introspection queries as a map O(1) operation instead of O(n) for a slice.
var retrospectionQuerySet = make(map[string]struct{}, len(retrospection_queries))
func parseGraphQLQuery(c *fiber.Ctx) (operationType, operationName string, cacheRequest bool, should_block bool) {
m := make(map[string]interface{})
err := json.Unmarshal(c.Body(), &m)
@@ -71,15 +73,12 @@ func parseGraphQLQuery(c *fiber.Ctx) (operationType, operationName string, cache
if cfg.Security.BlockIntrospection {
for _, s := range oper.SelectionSet.Selections {
for _, s2 := range s.GetSelectionSet().Selections {
pp.Println(s2.(*ast.Field).Name.Value)
for _, introspection_query := range retrospection_queries {
if s2.(*ast.Field).Name.Value == introspection_query {
cfg.Logger.Warning("Introspection query blocked", m)
cfg.Monitoring.Increment(libpack_monitoring.MetricsSkipped, nil)
c.Status(403).SendString("Introspection queries are not allowed")
should_block = true
return
}
if _, exists := retrospectionQuerySet[s2.(*ast.Field).Name.Value]; exists {
cfg.Logger.Warning("Introspection query blocked", m)
cfg.Monitoring.Increment(libpack_monitoring.MetricsSkipped, nil)
c.Status(403).SendString("Introspection queries are not allowed")
should_block = true
return
}
}
}
+9 -2
View File
@@ -9,19 +9,26 @@ import (
var cfg *config
func init() {
for _, query := range retrospection_queries {
retrospectionQuerySet[query] = struct{}{}
}
}
func parseConfig() {
libpack_config.PKG_NAME = "graphql_proxy"
var c config
c.Server.PortGraphQL = envutil.GetInt("PORT_GRAPHQL", 8080)
c.Server.PortMonitoring = envutil.GetInt("MONITORING_PORT", 9393)
c.Server.HostGraphQL = envutil.Getenv("HOST_GRAPHQL", "localhost/v1/graphql")
c.Server.HostGraphQL = envutil.Getenv("HOST_GRAPHQL", "http://localhost/v1/graphql")
c.Client.JWTUserClaimPath = envutil.Getenv("JWT_USER_CLAIM_PATH", "")
c.Cache.CacheEnable = envutil.GetBool("CACHE_ENABLE", false)
c.Cache.CacheEnable = envutil.GetBool("ENABLE_GLOBAL_CACHE", false)
c.Cache.CacheTTL = envutil.GetInt("CACHE_TTL", 60)
c.Security.BlockIntrospection = envutil.GetBool("BLOCK_SCHEMA_INTROSPECTION", false)
c.Logger = libpack_logging.NewLogger()
c.Client.GQLClient = graphql.NewConnection()
c.Client.GQLClient.SetEndpoint(c.Server.HostGraphQL)
c.Server.AccessLog = envutil.GetBool("ENABLE_ACCESS_LOG", false)
cfg = &c
enableCache() // takes close to no resources, but can be used with dynamic query cache
}
+1
View File
@@ -2,6 +2,7 @@ version: 1
force:
existing: true
strict: false
minor: 1
wording:
patch:
- update
+3 -1
View File
@@ -77,7 +77,9 @@ func processGraphQLRequest(c *fiber.Ctx) error {
}
time_taken := time.Since(t)
cfg.Logger.Info("Request processed", map[string]interface{}{"ip": c.IP(), "user_id": extracted_user_id, "op_type": opType, "op_name": opName, "time": time_taken, "cache": was_cached})
if cfg.Server.AccessLog {
cfg.Logger.Info("Request processed", map[string]interface{}{"ip": c.IP(), "user_id": extracted_user_id, "op_type": opType, "op_name": opName, "time": time_taken, "cache": was_cached})
}
cfg.Monitoring.Increment(libpack_monitoring.MetricsSucceeded, nil)
labels := map[string]string{
+92
View File
@@ -0,0 +1,92 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: hasura-proxy-internal
labels:
app: hasura-proxy-internal
type: support
spec:
replicas: 2
selector:
matchLabels:
app: hasura-proxy-internal
type: support
template:
metadata:
labels:
app: hasura-proxy-internal
type: support
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9393"
prometheus.io/path: "/metrics"
spec:
securityContext:
runAsUser: 65534 # nobody
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/worker
operator: Exists
containers:
- name: graphql-proxy
image: ghcr.io/lukaszraczylo/graphql-monitoring-proxy:latest
imagePullPolicy: Always
resources:
limits:
cpu: "1"
memory: "640Mi"
requests:
cpu: "0.75"
memory: "512Mi"
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
ports:
- name: web
containerPort: 8080
- name: monitoring
containerPort: 9393
env:
- name: PORT_GRAPHQL
value: "8080"
- name: MONITORING_PORT
value: "9393"
- name: HOST_GRAPHQL
value: http://hasura-internal:8080/v1/graphql
- name: ENABLE_GLOBAL_CACHE
value: "true"
- name: CACHE_TTL
value: "10"
- name: BLOCK_SCHEMA_INTROSPECTION
value: "true"
---
apiVersion: v1
kind: Service
metadata:
name: hasura-proxy-internal
labels:
app: hasura-proxy-internal
type: support
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9393"
prometheus.io/path: "/metrics"
spec:
ports:
- name: web
port: 8080
targetPort: 8080
- name: monitoring
port: 9393
targetPort: 9393
selector:
app: hasura-proxy-internal
type: support
type: ClusterIP
Binary file not shown.

After

Width:  |  Height:  |  Size: 336 KiB

+1
View File
@@ -17,6 +17,7 @@ type config struct {
PortGraphQL int
PortMonitoring int
HostGraphQL string
AccessLog bool
}
Client struct {