Files
go-telegram/docs/benchmarks/2026-05-10-comparison.md
T
lukaszraczylo bfb7e9875e feat(client): opt-in fasthttp transport (NewFastHTTPDoer)
Adds an alternative HTTPDoer backed by valyala/fasthttp for high-throughput
bots. Cuts per-call allocs from 102 to 56 in the cross-library bench
(within 8 of telego, which uses fasthttp by default), and per-call bytes
from 11.1 KiB to 6.6 KiB.

  bot := client.New(token,
      client.WithHTTPClient(client.NewFastHTTPDoer()),
  )

Implementation notes:
  - Wraps *fasthttp.Client behind the existing HTTPDoer (Do *http.Request)
    interface, so RetryDoer, custom transports, observability middleware,
    and the 1428 generated tests all keep working as-is.
  - Translates *http.Request -> fasthttp.Request once per call and
    returns a *http.Response whose Body releases the pooled fasthttp
    response on Close (net/http contract).
  - Recognises the bufferReadCloser / readerReadCloser shapes produced
    by buildRequest and passes their underlying bytes straight to
    SetBodyRaw -- no io.ReadAll, no copy.
  - Honours ctx.Deadline via DoDeadline, falls back to WithFastHTTPReadTimeout
    when no deadline is set. fasthttp.ErrTimeout maps to
    context.DeadlineExceeded for errors.Is compatibility.

Default stays net/http: fasthttp is HTTP/1.1 only, doesn't compose with
the http.RoundTripper middleware ecosystem, and most users don't have
the throughput to notice. Bots making thousands of API calls/sec should
opt in.

Multipart/file-upload path remains on net/http per the agreed scope --
the perf bottleneck was JSON-method round-trip, not file uploads.

Time numbers in the report deferred until a quiet-system bench run;
allocs/bytes numbers (which are deterministic per code path) are
already updated.
2026-05-10 23:07:04 +01:00

8.1 KiB
Raw Blame History

Benchmarks vs top 5 Go Telegram libraries

Date: 2026-05-10 Environment: Apple M4 Max · darwin/arm64 · go1.26.2 Methodology: go test -count=10 -bench=. -benchmem, summarised with benchstat (golang.org/x/perf) Source: test/benchmarks/ · raw output: results/raw.txt · benchstat: results/benchstat.txt

Libraries

Lib Module
ours github.com/lukaszraczylo/go-telegram (this repo)
gotba github.com/go-telegram-bot-api/telegram-bot-api/v5
telebot gopkg.in/telebot.v3 (tucnak)
gobot github.com/go-telegram/bot
telego github.com/mymmrac/telego
echotron github.com/NicoNex/echotron/v3

TL;DR

  • Webhook decode (small Update): ours is 1220% faster than every competitor and ties telego for the lowest alloc count (11).
  • Large Update unmarshal (entities + reply markup + photo array): ours is 1734% faster with the lowest ns/op of all six. telego edges us on alloc count (31 vs 34) at the cost of ~17% more time.
  • API call round-trip (mock HTTP server): telego wins on allocs (35.8 µs / 48 allocs) because it uses fasthttp by default. We default to net/http (102 allocs / 39.8 µs); with the opt-in client.NewFastHTTPDoer we drop to 56 allocs / 6.6 KiB — within 8 of telego while keeping *http.Request semantics (RetryDoer, middleware, generated tests).
  • Dispatcher routing (20 handlers, last matches): ours is 2.52.8× faster than telebot and gobot (98 ns vs 271 / 246 ns).

How to read these numbers

  • One machine, single workload, fixtures defined in shared/fixtures.go. Re-run on your hardware before drawing conclusions.
  • Codecs differ across libs (we use goccy/go-json; most competitors use stdlib encoding/json). Codec choice is part of the library's value prop, so we benchmark each library as it ships, not in some artificial common-codec mode.
  • "Equivalent code path" was chosen via each library's idiomatic public API for the same logical operation. The exact code is in the bench files alongside each BenchmarkXxx_<lib> function — read them.

1. Webhook decode — small Update (text message)

Decode shared.SmallUpdateJSON into the library's typed Update struct.

Lib sec/op B/op allocs/op
ours 1.832 µs ±4% 2.180 KiB 11
gotba 2.082 µs ±0% 1.461 KiB 17
telebot 2.194 µs ±1% 1.773 KiB 17
gobot 2.082 µs ±1% 1.789 KiB 16
telego 2.143 µs ±2% 3.058 KiB 11
echotron 2.039 µs ±1% 1.680 KiB 16

Notes. We use slightly more bytes because typed unions and the typed []UpdateType allocate richer Go values. We win on time and tie telego on alloc count.

2. Large Update unmarshal — entities + reply markup + photo array

Decode shared.LargeUpdateJSON (text + 3 entities + 2x3 inline keyboard + 3-size photo array). Stresses each library's union/discriminator decoding.

Lib sec/op B/op allocs/op
ours 6.726 µs ±1% 5.875 KiB 34
gotba 8.066 µs ±1% 3.438 KiB 56
telebot 10.190 µs ±1% 5.594 KiB 60
gobot 8.231 µs ±1% 4.703 KiB 50
telego 7.849 µs ±2% 6.600 KiB 31
echotron 8.123 µs ±1% 4.219 KiB 56

Notes. Despite the typed-union model giving us richer Go values per decode, we still produce them faster than every competitor. telego edges us by 3 allocs but pays 17% more time.

3. API call round-trip — sendMessage against a mock HTTP server

Build params → POST to local httptest.Server returning {"ok":true,"result":Message} → decode response.

Lib sec/op B/op allocs/op
ours (default net/http) 39.83 µs ±4% 11.09 KiB 102
ours (opt-in fasthttp) time TBD on quiet box 6.62 KiB 56
gotba 42.03 µs ±4% 10.97 KiB 125
telebot 43.41 µs ±1% 13.15 KiB 139
gobot 61.19 µs ±1% 13.50 KiB 176
telego (uses fasthttp) 35.84 µs ±1% 6.547 KiB 48
echotron skipped — see below

Notes.

  • The headline alloc gap to telego turned out to be transport choice: telego defaults to fasthttp, which pools requests/responses and skips most of net/http's bookkeeping. Most of the other libs (and us, by default) use net/http.
  • We ship an opt-in fasthttp doer (client.NewFastHTTPDoer). Plug it via client.WithHTTPClient(client.NewFastHTTPDoer()) and per-call allocs drop from 102 to 56 — within 8 of telego despite still going through our *http.Request-based HTTPDoer interface (kept that way so RetryDoer, custom transports, observability middleware, and the 1428 generated tests all keep working).
  • The default stays net/http because fasthttp is HTTP/1.1-only, can't be composed with the RoundTripper middleware ecosystem, and most users don't have the throughput to notice. Bots making thousands of API calls/sec should opt in.
  • Our net/http request path is already minimised: manually-constructed *http.Request with a pre-parsed base URL (cached on *Bot), and request bodies stream-encoded into a pooled *bytes.Buffer via the optional BodyEncoder codec extension. Those skip the url.Parse + *http.Request bookkeeping that http.NewRequestWithContext runs on every call.
  • gobot's higher cost comes from per-call goroutine + channel plumbing in its dispatcher path even when called directly.
  • echotron skip: echotron ships built-in dual-level rate limiting (30 req/s global, 20 req/min per chat) on its unexported lclient field. The setters that disable it (SetGlobalRequestLimit, SetChatRequestLimit) are methods on the unexported type with no public accessor through the API value, so the limiter cannot be bypassed without monkey-patching. A naive run produces ~3 s/op driven entirely by the per-chat token bucket — measuring rate limiting, not the library. We skip rather than publish a misleading number. The rate limiter is a feature of echotron and worth knowing about; it just makes a microbench unfair.

4. Dispatcher routing — 20 handlers, last one matches

Register 20 command handlers (/cmd0/cmd19); feed an update matching /cmd19 so the bench measures worst-case filter chain traversal.

Lib sec/op B/op allocs/op
ours 98.46 ns ±2% 128 B 3
telebot 270.9 ns ±2% 678 B 5
gobot 246.1 ns ±1% 48 B 1

Notes. We dispatch ~2.5× faster than telebot and gobot. gobot's single allocation is impressive but its routing decision is slower. telebot's higher cost reflects its richer per-update Context construction.

Coverage caveats.

  • gotba ships no built-in dispatcher; users route via a manual switch on Update fields. Benchmarking that against framework-based dispatchers would be apples-to-oranges, so it's omitted.
  • telego routes via a buffered channel + goroutine pool inside telegohandler.BotHandler. There is no public sync entry point, so the bench would conflate channel + goroutine overhead with routing cost.
  • echotron uses a chat-ID-keyed Dispatcher that fans out to per-chat Bot instances — a different paradigm (stateful per-chat bot loop), not directly comparable to "match this update against N handlers".

How to reproduce

cd test/benchmarks
go test -count=10 -bench=. -benchmem | tee results/raw.txt
benchstat results/raw.txt > results/benchstat.txt

Install benchstat if missing: go install golang.org/x/perf/cmd/benchstat@latest.

Bench code

All bench source lives under test/benchmarks/ as a separate Go module so competitor dependencies stay out of the root go.mod. The fixtures (the JSON each library decodes, the mock HTTP server) are in shared/fixtures.go — every library decodes the same bytes.