Files

199 lines
6.5 KiB
Markdown

# compactor
MCP server that manages LLM working memory within a token budget. Stores, retrieves, and compacts context so conversations stay under limit without losing valuable information.
Designed to complement long-term memory tools (like claude-mnemonic) by handling short-term session context.
## Install
### Binary
Download from [releases](https://github.com/lukaszraczylo/compaction-mcp/releases) or build from source:
```sh
go build -o compactor .
```
Single binary, no external dependencies. ~6 MiB.
### Docker
```sh
docker pull ghcr.io/lukaszraczylo/compaction-mcp:latest
```
Multi-platform image (linux/amd64, linux/arm64) built on distroless.
## Usage
```sh
# Ephemeral (in-memory, default)
compactor
# With persistent state
compactor --state-dir ~/.local/share/compactor
# Explicit token budget
compactor --budget 80000
```
### Docker
The container runs the compactor binary as its entrypoint. Since the MCP server communicates over stdio, run with `-i` (interactive):
```sh
# Ephemeral
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest
# With persistent state
docker run -i -v compactor-data:/data ghcr.io/lukaszraczylo/compaction-mcp:latest --state-dir /data
# With explicit budget
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest --budget 80000
```
### Claude Code
`.claude/settings.json` (binary):
```json
{
"mcpServers": {
"compactor": {
"command": "/path/to/compactor",
"args": ["--state-dir", "/tmp/compactor-state"]
}
}
}
```
`.claude/settings.json` (Docker):
```json
{
"mcpServers": {
"compactor": {
"command": "docker",
"args": ["run", "-i", "--rm", "-v", "compactor-data:/data", "ghcr.io/lukaszraczylo/compaction-mcp:latest", "--state-dir", "/data"]
}
}
}
```
### Cursor / other MCP clients
Same pattern. The server auto-detects the client and sets a reasonable budget:
- **Claude** clients: 80K tokens (40% of 200K context)
- **Cursor**: 60K tokens
- Override with `--budget` flag
## Tools
| Tool | Description |
|------|-------------|
| `recall` | **Call first every session.** Restores previous context — returns budget status + top items by relevance |
| `store` | Store content with optional summary, tags, and importance (1-10) |
| `query` | BM25-ranked search by text and/or tag filtering |
| `status` | Check budget usage, item count, auto-compact settings |
| `compact` | Trigger compaction to a target usage ratio |
| `update` | Add/update summary for an item (post-compaction workflow) |
| `pin` / `unpin` | Protect items from eviction |
| `forget` | Remove a specific item |
| `list` | Paginated item listing (newest first) |
| `bulk_store` | Store multiple items in one call (JSON array) |
| `export` | Export all items, optionally as summaries |
| `configure` | Adjust budget, auto-compact toggle and threshold |
## How compaction works
Three-phase pipeline, triggered automatically at 90% budget or manually via `compact`:
1. **Summary promotion** - Replaces content with its summary (lowest-scored items first)
2. **Deduplication** - Merges items with >70% word overlap (Jaccard similarity), keeping the higher-scored item
3. **Eviction** - Removes lowest-scored items until target usage is reached
After compaction, items without summaries are flagged. The LLM can then generate summaries via `update` for future compaction cycles.
## Scoring
Each item gets a retention score combining four signals:
```
score = 0.4 * importance + 0.3 * recency + 0.2 * access - 0.1 * size_penalty
```
**Content-type awareness** adjusts scoring automatically:
| Type | Detection | Score multiplier | Decay half-life |
|------|-----------|-----------------|-----------------|
| Error | `error:`, `panic:`, stack traces | 1.5x | 30 min |
| Decision | "decided", "going with", "approach:" | 1.3x | 6 hours |
| Code | `func`, `class`, backtick fences | 1.2x | 6 hours |
| Prose | Default | 1.0x | 2 hours |
| Tool output | `$ ` prefix, table chars | 0.7x | 15 min |
Pinned items are never evicted.
## Search
Full-text search uses BM25 ranking (k1=1.2, b=0.75) with:
- camelCase and snake_case token splitting
- 5x score boost for tag matches
- Combined BM25 relevance + item retention score
## Auto-tagging
When no tags are provided, items are automatically tagged based on content:
- Content type (error, code, decision, tool-output)
- File extensions (.go, .ts, .py, etc.)
- Infrastructure keywords (kubernetes, docker, cilium, postgres, etc.)
- URL presence (tagged as "reference")
## Persistence
With `--state-dir`, state is saved as atomic JSON every 30 seconds (when dirty) and on graceful shutdown. Without it, storage is ephemeral per session.
## CLI flags
| Flag | Default | Description |
|------|---------|-------------|
| `--budget` | `100000` | Token budget (overrides auto-detection) |
| `--state-dir` | `""` | Persistent state directory (empty = ephemeral) |
## Making it seamless
The compactor is a tool the LLM must actively use — it doesn't intercept context automatically. To make usage habitual, add this to your `CLAUDE.md`:
```markdown
## Working Memory (compactor MCP)
- At session start, ALWAYS call `recall` to restore previous context
- After making decisions, reading key files, or encountering errors: call `store` with a summary
- Before re-reading a file: call `query` to check if it's already stored
- When `status` shows >80% usage: call `compact`, then `update` items it flags
- Pin architecture decisions and user preferences with `pin`
```
The server also sends instructions via the MCP handshake that guide the LLM, but CLAUDE.md rules are stronger because they're treated as hard requirements.
### How the three layers work together
1. **MCP server instructions** — injected at connection time, tell the LLM the workflow
2. **CLAUDE.md rules** — persistent across sessions, override default behavior
3. **`recall` tool** — gives the LLM a single action to restore context, reducing friction from 12 tools to 1 entry point
With persistence (`--state-dir`), context survives across sessions. The LLM calls `recall` → gets back its stored decisions, errors, code snippets → continues where it left off.
## Architecture
```
main.go - Entry point, CLI flags, MCP server setup, persistence wiring
store.go - Core store: items, scoring, compaction, BM25 integration
tools.go - MCP tool definitions and handlers
index.go - BM25 inverted index with tag boosting
content.go - Content type detection and auto-tagging
persist.go - Atomic JSON persistence with background save
tokens.go - Token count estimation (~4 chars/token)
```
## License
Private.