Replace the hand-rolled make([]byte, 0, 1024) + make([]byte, 4096) read loop in WebhookServer.ServeHTTP with a sync.Pool-backed bytes.Buffer drained via ReadFrom, fronted by http.MaxBytesReader for the 1 MiB body cap.
putWebhookBuf caps Cap() at 256 KiB before returning to the pool so a rare oversized update (max body is 1 MiB) doesn't permanently bloat the pool.
Bench delta on Webhook_ServeHTTP: 2564ns -> 2020ns (-21%), 12707B -> 7648B (-40%), 24 -> 23 allocs. The big byte saving is the 4 KiB tmp buffer + 1 KiB initial buf cap, replaced by one reused buffer across requests. The remaining alloc count is dominated by codec.Unmarshal decoding Update's pointer fields (*string, *int64), which is downstream of this change.
Replace io.ReadAll(resp.Body) on the typed Call/callMultipart paths with a sync.Pool-backed bytes.Buffer + ReadFrom. Saves the 512B initial allocation that ReadAll grows from on every successful call.
The pool only covers paths whose decoder copies strings out of the input (decodeResult delegates to goccy/go-json, which copies). CallRaw and callMultipartRaw return slices that alias the buffer storage, so they keep the io.ReadAll path; pooling there would need a defensive copy that defeats the saving.
putRespBuf caps Cap() at 64 KiB before returning to the pool so a single oversized response (e.g. large getFile metadata) doesn't bloat the pool for the rest of the process.
Bench delta on Call_BoolResponse: 14 allocs -> 13 allocs, 1842B -> 1331B, 526ns -> 479ns. Same shape on Call_StructResponse (16 -> 15, 1973B -> 1462B).
Move the three conventional Values keys ("command", "command_args", "regex_match") to typed fields on Context. Router and group routing write the fields directly; the Values map is allocated lazily via the new Set method and reserved for user-defined custom keys.
Allocation impact (M4 Max, b.Loop()):
DispatchCommand: 5 allocs/op -> 1, 153ns -> 69ns (-55%)
DispatchTextRegex: 5 allocs/op -> 2, 181ns -> 107ns (-41%)
DispatchFilter: 2 allocs/op -> 1, 32ns -> 19ns (-41%)
NewContext: 5.79ns -> 1.60ns
Trade-off: Context struct grew from ~48B to ~96B (three new fields), so filter-only paths pay ~50B more per dispatch. Command/regex paths save ~320B + 4 allocs each, which dominates for typical bot workloads.
Handlers reading c.Values["command"], c.Values["command_args"], or c.Values["regex_match"] now get nil; the typed fields c.Command, c.CommandArgs, c.RegexMatch are the new accessors. Custom keys still work via c.Set(k, v) and c.Values[k].
Two changes on the Call hot path:
* Replace httpReq.Header.Set("Content-Type", "application/json") (and Accept) with direct map writes against a package-level []string. Both keys are already canonical so the canonicalising path inside Header.Set was pure overhead; saves the per-call []string{val} allocation x2.
* Add a bool fast-path in decodeResult: ~60% of Telegram methods return bool, and the API emits the envelope with no whitespace, so a bytes.Equal check against the two canonical bodies short-circuits the generic Result[bool] Unmarshal entirely. any(true).(Resp) does not box thanks to Go's static bool interface values.
Combined effect on Call_BoolResponse: 18 -> 14 allocs/op, 634ns -> 526ns. DecodeResult_Bool isolation bench: 50ns / 2 allocs -> 2.87ns / 0 allocs.
Hermetic benchmarks (no network) covering Call encode+decode, webhook ServeHTTP body parse, and Router dispatch (command/regex/filter). Use Go 1.24+ b.Loop() idiom. .benchstats/baseline.txt pins the pre-optimisation numbers for benchstat comparisons.