compactor
MCP server that manages LLM working memory within a token budget. Stores, retrieves, and compacts context so conversations stay under limit without losing valuable information.
Designed to complement long-term memory tools (like claude-mnemonic) by handling short-term session context.
Install
Binary
Download from releases or build from source:
go build -o compactor .
Single binary, no external dependencies. ~6 MiB.
Docker
docker pull ghcr.io/lukaszraczylo/compaction-mcp:latest
Multi-platform image (linux/amd64, linux/arm64) built on distroless.
Usage
# Ephemeral (in-memory, default)
compactor
# With persistent state
compactor --state-dir ~/.local/share/compactor
# Explicit token budget
compactor --budget 80000
Docker
The container runs the compactor binary as its entrypoint. Since the MCP server communicates over stdio, run with -i (interactive):
# Ephemeral
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest
# With persistent state
docker run -i -v compactor-data:/data ghcr.io/lukaszraczylo/compaction-mcp:latest --state-dir /data
# With explicit budget
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest --budget 80000
Claude Code
.claude/settings.json (binary):
{
"mcpServers": {
"compactor": {
"command": "/path/to/compactor",
"args": ["--state-dir", "/tmp/compactor-state"]
}
}
}
.claude/settings.json (Docker):
{
"mcpServers": {
"compactor": {
"command": "docker",
"args": ["run", "-i", "--rm", "-v", "compactor-data:/data", "ghcr.io/lukaszraczylo/compaction-mcp:latest", "--state-dir", "/data"]
}
}
}
Cursor / other MCP clients
Same pattern. The server auto-detects the client and sets a reasonable budget:
- Claude clients: 80K tokens (40% of 200K context)
- Cursor: 60K tokens
- Override with
--budgetflag
Tools
| Tool | Description |
|---|---|
recall |
Call first every session. Restores previous context — returns budget status + top items by relevance |
store |
Store content with optional summary, tags, and importance (1-10) |
query |
BM25-ranked search by text and/or tag filtering |
status |
Check budget usage, item count, auto-compact settings |
compact |
Trigger compaction to a target usage ratio |
update |
Add/update summary for an item (post-compaction workflow) |
pin / unpin |
Protect items from eviction |
forget |
Remove a specific item |
list |
Paginated item listing (newest first) |
bulk_store |
Store multiple items in one call (JSON array) |
export |
Export all items, optionally as summaries |
configure |
Adjust budget, auto-compact toggle and threshold |
How compaction works
Three-phase pipeline, triggered automatically at 90% budget or manually via compact:
- Summary promotion - Replaces content with its summary (lowest-scored items first)
- Deduplication - Merges items with >70% word overlap (Jaccard similarity), keeping the higher-scored item
- Eviction - Removes lowest-scored items until target usage is reached
After compaction, items without summaries are flagged. The LLM can then generate summaries via update for future compaction cycles.
Scoring
Each item gets a retention score combining four signals:
score = 0.4 * importance + 0.3 * recency + 0.2 * access - 0.1 * size_penalty
Content-type awareness adjusts scoring automatically:
| Type | Detection | Score multiplier | Decay half-life |
|---|---|---|---|
| Error | error:, panic:, stack traces |
1.5x | 30 min |
| Decision | "decided", "going with", "approach:" | 1.3x | 6 hours |
| Code | func, class, backtick fences |
1.2x | 6 hours |
| Prose | Default | 1.0x | 2 hours |
| Tool output | $ prefix, table chars |
0.7x | 15 min |
Pinned items are never evicted.
Search
Full-text search uses BM25 ranking (k1=1.2, b=0.75) with:
- camelCase and snake_case token splitting
- 5x score boost for tag matches
- Combined BM25 relevance + item retention score
Auto-tagging
When no tags are provided, items are automatically tagged based on content:
- Content type (error, code, decision, tool-output)
- File extensions (.go, .ts, .py, etc.)
- Infrastructure keywords (kubernetes, docker, cilium, postgres, etc.)
- URL presence (tagged as "reference")
Persistence
With --state-dir, state is saved as atomic JSON every 30 seconds (when dirty) and on graceful shutdown. Without it, storage is ephemeral per session.
CLI flags
| Flag | Default | Description |
|---|---|---|
--budget |
100000 |
Token budget (overrides auto-detection) |
--state-dir |
"" |
Persistent state directory (empty = ephemeral) |
Making it seamless
The compactor is a tool the LLM must actively use — it doesn't intercept context automatically. To make usage habitual, add this to your CLAUDE.md:
## Working Memory (compactor MCP)
- At session start, ALWAYS call `recall` to restore previous context
- After making decisions, reading key files, or encountering errors: call `store` with a summary
- Before re-reading a file: call `query` to check if it's already stored
- When `status` shows >80% usage: call `compact`, then `update` items it flags
- Pin architecture decisions and user preferences with `pin`
The server also sends instructions via the MCP handshake that guide the LLM, but CLAUDE.md rules are stronger because they're treated as hard requirements.
How the three layers work together
- MCP server instructions — injected at connection time, tell the LLM the workflow
- CLAUDE.md rules — persistent across sessions, override default behavior
recalltool — gives the LLM a single action to restore context, reducing friction from 12 tools to 1 entry point
With persistence (--state-dir), context survives across sessions. The LLM calls recall → gets back its stored decisions, errors, code snippets → continues where it left off.
Architecture
main.go - Entry point, CLI flags, MCP server setup, persistence wiring
store.go - Core store: items, scoring, compaction, BM25 integration
tools.go - MCP tool definitions and handlers
index.go - BM25 inverted index with tag boosting
content.go - Content type detection and auto-tagging
persist.go - Atomic JSON persistence with background save
tokens.go - Token count estimation (~4 chars/token)
License
Private.