Two changes that together cut allocs/call from 15 to 13 (client-internal
bench) and per-call CPU from 600ns to 455ns (-24%) on the no-HTTP path:
1. Codec gets an optional BodyEncoder extension (MarshalTo io.Writer).
When present, encodeJSONBody stream-encodes the request directly into
a pooled *bytes.Buffer instead of allocating a two-step Marshal+Reader
pair. DefaultCodec implements it via goccy/go-json.NewEncoder.
2. *Bot caches the parsed base URL on construction. buildRequest skips
net/http.NewRequestWithContext for the common case and constructs
*http.Request manually — clones the URL by value, sets the method
path, and populates ContentLength + GetBody from the body's concrete
type so RetryDoer's body-replay across attempts still works.
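
A minimal sketch of the optional-interface pattern from change 1, not the
library's actual code. The Codec and BodyEncoder shapes, the MarshalTo
signature, and the encodeBody/encodeString helpers are assumptions made
for illustration, and encoding/json stands in for goccy/go-json so the
example has no external dependencies.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// Codec is a hypothetical stand-in for the library's codec interface.
type Codec interface {
	Marshal(v any) ([]byte, error)
}

// BodyEncoder is the optional extension. The method name comes from the
// commit text; the exact signature is an assumption.
type BodyEncoder interface {
	MarshalTo(w io.Writer, v any) error
}

// streamCodec implements both interfaces (encoding/json used here).
type streamCodec struct{}

func (streamCodec) Marshal(v any) ([]byte, error) { return json.Marshal(v) }

// MarshalTo streams into w. json.Encoder appends a trailing newline.
func (streamCodec) MarshalTo(w io.Writer, v any) error {
	return json.NewEncoder(w).Encode(v)
}

// plainCodec implements only Marshal, so it takes the slow path.
type plainCodec struct{}

func (plainCodec) Marshal(v any) ([]byte, error) { return json.Marshal(v) }

var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// encodeBody returns the request body plus a release func that the
// caller invokes once the body is no longer needed.
func encodeBody(c Codec, v any) (io.Reader, func(), error) {
	if enc, ok := c.(BodyEncoder); ok {
		// Fast path: encode straight into a pooled buffer.
		buf := bufPool.Get().(*bytes.Buffer)
		buf.Reset()
		release := func() { bufPool.Put(buf) }
		if err := enc.MarshalTo(buf, v); err != nil {
			release()
			return nil, nil, err
		}
		return buf, release, nil
	}
	// Slow path: Marshal into a fresh slice, then wrap it in a Reader.
	b, err := c.Marshal(v)
	if err != nil {
		return nil, nil, err
	}
	return bytes.NewReader(b), func() {}, nil
}

// encodeString drains the body so the result is easy to inspect.
func encodeString(c Codec, v any) (string, error) {
	body, release, err := encodeBody(c, v)
	if err != nil {
		return "", err
	}
	defer release()
	b, err := io.ReadAll(body)
	return string(b), err
}

func main() {
	payload := map[string]any{"chat_id": 1}
	fast, _ := encodeString(streamCodec{}, payload)
	slow, _ := encodeString(plainCodec{}, payload)
	fmt.Printf("%q %q\n", fast, slow)
}
```

The type assertion keeps BodyEncoder opt-in: a codec that only provides
Marshal still works and simply pays for the extra slice and Reader.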
Cross-library bench (sendMessage round-trip vs httptest.Server): -2
allocs/call (104 -> 102), bytes -1.2%, time within noise (real HTTP
plumbing dominates). The CPU win is visible on synthetic stub-doer
benches and translates to lower GC pressure on sustained-throughput
workloads.
Slow-path fallback preserved for codecs that don't implement BodyEncoder
and for *Bot instances where url.Parse on the configured base failed —
they take the original NewRequestWithContext path.
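
The fast path from change 2 and the fallback above can be sketched
roughly as follows. This is an illustration under assumptions, not the
actual buildRequest: the bot type, its fields, the newBot and demo
helpers, and the example.invalid base URL are all hypothetical, and the
body is a plain []byte rather than the pooled buffer.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// bot is a hypothetical stand-in for *Bot.
type bot struct {
	rawBase string
	base    *url.URL // nil when url.Parse failed at construction
}

func newBot(rawBase string) *bot {
	b := &bot{rawBase: rawBase}
	if u, err := url.Parse(rawBase); err == nil {
		b.base = u // parsed once, reused for every request
	}
	return b
}

func (b *bot) buildRequest(ctx context.Context, method string, payload []byte) (*http.Request, error) {
	if b.base == nil {
		// Slow path: let net/http parse the URL and report any error.
		return http.NewRequestWithContext(ctx, http.MethodPost,
			b.rawBase+"/"+method, bytes.NewReader(payload))
	}
	u := *b.base // clone by value so the cached URL is never mutated
	u.Path = b.base.Path + "/" + method
	req := &http.Request{
		Method:        http.MethodPost,
		URL:           &u,
		Proto:         "HTTP/1.1",
		ProtoMajor:    1,
		ProtoMinor:    1,
		Header:        make(http.Header),
		Host:          u.Host,
		Body:          io.NopCloser(bytes.NewReader(payload)),
		ContentLength: int64(len(payload)),
		// GetBody lets a retrying doer obtain a fresh body per attempt.
		GetBody: func() (io.ReadCloser, error) {
			return io.NopCloser(bytes.NewReader(payload)), nil
		},
	}
	return req.WithContext(ctx), nil
}

// demo builds one request and reads the body twice, the second time
// through GetBody, the way a retry would.
func demo(rawBase string) (string, int64, string, string, error) {
	req, err := newBot(rawBase).buildRequest(context.Background(),
		"sendMessage", []byte(`{"chat_id":1}`))
	if err != nil {
		return "", 0, "", "", err
	}
	first, err := io.ReadAll(req.Body)
	if err != nil {
		return "", 0, "", "", err
	}
	rc, err := req.GetBody()
	if err != nil {
		return "", 0, "", "", err
	}
	second, err := io.ReadAll(rc)
	return req.URL.String(), req.ContentLength, string(first), string(second), err
}

func main() {
	fmt.Println(demo("https://example.invalid/bot123"))
	_, _, _, _, err := demo("http://bad host") // unparseable base
	fmt.Println(err != nil)
}
```

Populating ContentLength and GetBody by hand matters because
NewRequestWithContext only fills them in for the body types it
recognises; a hand-built request gets neither unless the caller sets
them.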
Hermetic benchmarks (no network) cover Call encode+decode, webhook
ServeHTTP body parse, and Router dispatch (command/regex/filter). They
use the Go 1.24+ b.Loop() idiom. .benchstats/baseline.txt pins the
pre-optimisation numbers for benchstat comparisons.
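
For reference, a hermetic benchmark in the b.Loop() style might look
like the sketch below. It requires Go 1.24 or newer. The message type
and the benchEncodeDecode and runBench names are hypothetical, and
encoding/json is used instead of the project's codec. In the repository
this would normally live in a _test.go file as a BenchmarkXxx function;
testing.Benchmark is used here only so the sketch runs as a standalone
program.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"testing"
)

// message is a hypothetical payload for the encode+decode round trip.
type message struct {
	ChatID int64  `json:"chat_id"`
	Text   string `json:"text"`
}

func benchEncodeDecode(b *testing.B) {
	in := message{ChatID: 1, Text: "hello"}
	var buf bytes.Buffer // setup before the loop is not timed by b.Loop
	b.ReportAllocs()
	for b.Loop() {
		buf.Reset()
		if err := json.NewEncoder(&buf).Encode(in); err != nil {
			b.Fatal(err)
		}
		var out message
		if err := json.Unmarshal(buf.Bytes(), &out); err != nil {
			b.Fatal(err)
		}
	}
}

// runBench executes the benchmark outside of "go test".
func runBench() testing.BenchmarkResult {
	testing.Init() // registers testing flags; no effect if already called
	return testing.Benchmark(benchEncodeDecode)
}

func main() {
	r := runBench()
	fmt.Println(r.N, r.NsPerOp(), r.AllocsPerOp())
}
```

With b.Loop() the benchmark function runs once per measurement and the
loop itself decides how many iterations to execute, so no manual
b.ResetTimer() or b.N bookkeeping is needed.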