# compactor

MCP server that manages LLM working memory within a token budget. Stores, retrieves, and compacts context so conversations stay under the limit without losing valuable information.

Designed to complement long-term memory tools (like claude-mnemonic) by handling short-term session context.

## Install

### Binary

Download from [releases](https://github.com/lukaszraczylo/compaction-mcp/releases) or build from source:

```sh
go build -o compactor .
```

Single binary, no external dependencies. ~6 MiB.

### Docker

```sh
docker pull ghcr.io/lukaszraczylo/compaction-mcp:latest
```

Multi-platform image (linux/amd64, linux/arm64) built on distroless.

## Usage

```sh
# Ephemeral (in-memory, default)
compactor

# With persistent state
compactor --state-dir ~/.local/share/compactor

# Explicit token budget
compactor --budget 80000
```

### Docker

The container runs the compactor binary as its entrypoint. Since the MCP server communicates over stdio, run with `-i` (interactive):

```sh
# Ephemeral
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest

# With persistent state
docker run -i -v compactor-data:/data ghcr.io/lukaszraczylo/compaction-mcp:latest --state-dir /data

# With explicit budget
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest --budget 80000
```

### Claude Code

`.claude/settings.json` (binary):
```json
{
  "mcpServers": {
    "compactor": {
      "command": "/path/to/compactor",
      "args": ["--state-dir", "/tmp/compactor-state"]
    }
  }
}
```

`.claude/settings.json` (Docker):
```json
{
  "mcpServers": {
    "compactor": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "compactor-data:/data", "ghcr.io/lukaszraczylo/compaction-mcp:latest", "--state-dir", "/data"]
    }
  }
}
```

### Cursor / other MCP clients

Same pattern. The server auto-detects the client and sets a reasonable budget:
- **Claude** clients: 80K tokens (40% of 200K context)
- **Cursor**: 60K tokens
- Override with `--budget` flag

## Tools

| Tool | Description |
|------|-------------|
| `recall` | **Call first every session.** Restores previous context: returns budget status + top items by relevance |
| `store` | Store content with optional summary, tags, and importance (1-10) |
| `query` | BM25-ranked search by text and/or tag filtering |
| `status` | Check budget usage, item count, auto-compact settings |
| `compact` | Trigger compaction to a target usage ratio |
| `update` | Add/update summary for an item (post-compaction workflow) |
| `pin` / `unpin` | Protect items from eviction |
| `forget` | Remove a specific item |
| `list` | Paginated item listing (newest first) |
| `bulk_store` | Store multiple items in one call (JSON array) |
| `export` | Export all items, optionally as summaries |
| `configure` | Adjust budget, auto-compact toggle and threshold |

## How compaction works

Three-phase pipeline, triggered automatically at 90% budget or manually via `compact`:

1. **Summary promotion** - Replaces content with its summary (lowest-scored items first)
2. **Deduplication** - Merges items with >70% word overlap (Jaccard similarity), keeping the higher-scored item
3. **Eviction** - Removes lowest-scored items until target usage is reached

After compaction, items without summaries are flagged. The LLM can then generate summaries via `update` for future compaction cycles.
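The deduplication phase relies on Jaccard similarity over words. The sketch below shows one way that check could look. It is not taken from the project source: the names `wordSet`, `jaccard`, `isDuplicate`, and `dupThreshold` are hypothetical, and lowercasing plus whitespace splitting is an assumed tokenization.

```go
package main

import (
	"fmt"
	"strings"
)

// dupThreshold mirrors the ">70% word overlap" rule from the README.
const dupThreshold = 0.7

// wordSet lowercases text and returns its set of whitespace-separated words.
// (Assumed tokenization; the real server may split words differently.)
func wordSet(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(strings.ToLower(text)) {
		set[w] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the word sets of a and b.
// Two empty texts score 0 so the division is never by zero.
func jaccard(a, b string) float64 {
	sa, sb := wordSet(a), wordSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 0
	}
	shared := 0
	for w := range sa {
		if _, ok := sb[w]; ok {
			shared++
		}
	}
	union := len(sa) + len(sb) - shared
	return float64(shared) / float64(union)
}

// isDuplicate reports whether two items overlap enough to be merged.
func isDuplicate(a, b string) bool {
	return jaccard(a, b) > dupThreshold
}

func main() {
	a := "retry the upload with exponential backoff"
	b := "retry the upload with linear backoff"
	fmt.Printf("similarity=%.2f duplicate=%v\n", jaccard(a, b), isDuplicate(a, b))
}
```

The two sample strings share 5 of their 7 distinct words, so they land just above the 70% threshold. In a full pipeline the lower-scored item of such a pair would be the one merged away.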

## Scoring

Each item gets a retention score combining four signals:

```
score = 0.4 * importance + 0.3 * recency + 0.2 * access - 0.1 * size_penalty
```

**Content-type awareness** adjusts scoring automatically:

| Type | Detection | Score multiplier | Decay half-life |
|------|-----------|-----------------|-----------------|
| Error | `error:`, `panic:`, stack traces | 1.5x | 30 min |
| Decision | "decided", "going with", "approach:" | 1.3x | 6 hours |
| Code | `func`, `class`, backtick fences | 1.2x | 6 hours |
| Prose | Default | 1.0x | 2 hours |
| Tool output | `$ ` prefix, table chars | 0.7x | 15 min |

Pinned items are never evicted.

## Search

Full-text search uses BM25 ranking (k1=1.2, b=0.75) with:
- camelCase and snake_case token splitting
- 5x score boost for tag matches
- Combined BM25 relevance + item retention score
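The token splitting and the BM25 parameters above can be illustrated with a short Go sketch. It is a guess at the general shape, not the project's `index.go`: `tokenize` and `bm25Term` are hypothetical names, the IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))` is an assumption (several BM25 variants exist), and the tag boost and retention-score blending are left out.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// BM25 parameters as stated in the README.
const (
	k1 = 1.2
	b  = 0.75
)

// tokenize splits on non-alphanumeric runes (so snake_case breaks at "_")
// and at lower-to-upper transitions (so camelCase breaks before a capital),
// then lowercases each token.
func tokenize(text string) []string {
	var tokens []string
	var cur []rune
	flush := func() {
		if len(cur) > 0 {
			tokens = append(tokens, strings.ToLower(string(cur)))
			cur = cur[:0]
		}
	}
	var prev rune
	for _, r := range text {
		switch {
		case !unicode.IsLetter(r) && !unicode.IsDigit(r):
			flush()
		case unicode.IsUpper(r) && unicode.IsLower(prev):
			flush()
			cur = append(cur, r)
		default:
			cur = append(cur, r)
		}
		prev = r
	}
	flush()
	return tokens
}

// bm25Term scores one query term against one document.
// tf is the term frequency in the document, docFreq the number of
// documents containing the term, numDocs the collection size.
func bm25Term(tf, docLen, avgDocLen float64, numDocs, docFreq int) float64 {
	if tf == 0 || avgDocLen == 0 {
		return 0
	}
	n, df := float64(numDocs), float64(docFreq)
	idf := math.Log(1 + (n-df+0.5)/(df+0.5))
	norm := k1 * (1 - b + b*docLen/avgDocLen)
	return idf * tf * (k1 + 1) / (tf + norm)
}

func main() {
	fmt.Println(tokenize("getUserName max_retry_count"))
	fmt.Printf("%.3f\n", bm25Term(2, 120, 100, 50, 5))
}
```

Splitting identifiers this way lets a query for `retry` match an item that only mentions `max_retry_count`. The `b` parameter controls how strongly longer documents are penalized for the same term frequency.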

## Auto-tagging

When no tags are provided, items are automatically tagged based on content:
- Content type (error, code, decision, tool-output)
- File extensions (.go, .ts, .py, etc.)
- Infrastructure keywords (kubernetes, docker, cilium, postgres, etc.)
- URL presence (tagged as "reference")
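As an illustration of the rules listed above, the following Go sketch derives tags from file extensions, infrastructure keywords, and URLs. Content-type detection is omitted. Everything here is an assumption made for the example: the function name `autoTags`, the exact tag strings for extensions (`go`, `ts`, `py`), the short keyword list, and the regular expressions.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var (
	// Matches a dot followed by a known extension at a word boundary.
	extPattern = regexp.MustCompile(`\.(go|ts|py)\b`)
	// Matches http and https URLs.
	urlPattern = regexp.MustCompile(`https?://\S+`)
	// A small sample of the keywords the README mentions.
	infraKeywords = []string{"kubernetes", "docker", "cilium", "postgres"}
)

// autoTags returns a sorted, de-duplicated tag list inferred from content.
func autoTags(content string) []string {
	seen := map[string]bool{}
	lower := strings.ToLower(content)

	for _, m := range extPattern.FindAllStringSubmatch(lower, -1) {
		seen[m[1]] = true // m[1] is the extension without the dot
	}
	for _, kw := range infraKeywords {
		if strings.Contains(lower, kw) {
			seen[kw] = true
		}
	}
	if urlPattern.MatchString(content) {
		seen["reference"] = true
	}

	tags := make([]string, 0, len(seen))
	for t := range seen {
		tags = append(tags, t)
	}
	sort.Strings(tags)
	return tags
}

func main() {
	fmt.Println(autoTags("Fixed main.go, see https://example.com/docs for the docker setup"))
}
```

Sorting the result keeps the output deterministic, which makes tag lists easier to compare and test.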

## Persistence

With `--state-dir`, state is saved as atomic JSON every 30 seconds (when dirty) and on graceful shutdown. Without it, storage is ephemeral per session.

## CLI flags

| Flag | Default | Description |
|------|---------|-------------|
| `--budget` | `100000` | Token budget (overrides auto-detection) |
| `--state-dir` | `""` | Persistent state directory (empty = ephemeral) |

## Making it seamless

The compactor is a tool the LLM must actively use; it doesn't intercept context automatically. To make usage habitual, add this to your `CLAUDE.md`:

```markdown
## Working Memory (compactor MCP)
- At session start, ALWAYS call `recall` to restore previous context
- After making decisions, reading key files, or encountering errors: call `store` with a summary
- Before re-reading a file: call `query` to check if it's already stored
- When `status` shows >80% usage: call `compact`, then `update` items it flags
- Pin architecture decisions and user preferences with `pin`
```

The server also sends instructions via the MCP handshake that guide the LLM, but CLAUDE.md rules are stronger because they're treated as hard requirements.

### How the three layers work together

1. **MCP server instructions** - injected at connection time, tell the LLM the workflow
2. **CLAUDE.md rules** - persistent across sessions, override default behavior
3. **`recall` tool** - gives the LLM a single action to restore context, reducing friction from 12 tools to 1 entry point

With persistence (`--state-dir`), context survives across sessions. The LLM calls `recall` → gets back its stored decisions, errors, code snippets → continues where it left off.

## Architecture

```
main.go    - Entry point, CLI flags, MCP server setup, persistence wiring
store.go   - Core store: items, scoring, compaction, BM25 integration
tools.go   - MCP tool definitions and handlers
index.go   - BM25 inverted index with tag boosting
content.go - Content type detection and auto-tagging
persist.go - Atomic JSON persistence with background save
tokens.go  - Token count estimation (~4 chars/token)
```
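The `tokens.go` entry mentions estimating tokens at roughly 4 characters per token. A minimal sketch of such a heuristic follows. Counting bytes rather than runes and rounding up are assumptions, and `estimateTokens` and `charsPerToken` are hypothetical names.

```go
package main

import "fmt"

// charsPerToken is the rough ratio quoted in the README.
const charsPerToken = 4

// estimateTokens approximates the token count of text as
// ceil(len(text) / 4), using the byte length of the string.
func estimateTokens(text string) int {
	return (len(text) + charsPerToken - 1) / charsPerToken
}

func main() {
	note := "decided: going with BM25 for search ranking"
	fmt.Printf("%d bytes -> about %d tokens\n", len(note), estimateTokens(note))
}
```

A heuristic like this avoids shipping a tokenizer, at the cost of accuracy: real token counts vary with the model and the kind of text, so the budget should be read as approximate.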

## License

Private.