compaction-mcp/README.md

# compactor

MCP server that manages LLM working memory within a token budget. Stores, retrieves, and compacts context so conversations stay under limit without losing valuable information.

Designed to complement long-term memory tools (like claude-mnemonic) by handling short-term session context.

## Install

### Binary

Download from [releases](https://github.com/lukaszraczylo/compaction-mcp/releases) or build from source:

```sh
go build -o compactor .
```

Single binary, no external dependencies. ~6 MiB.

### Docker

```sh
docker pull ghcr.io/lukaszraczylo/compaction-mcp:latest
```

Multi-platform image (linux/amd64, linux/arm64) built on distroless.

## Usage

```sh
# Ephemeral (in-memory, default)
compactor

# With persistent state
compactor --state-dir ~/.local/share/compactor

# Explicit token budget
compactor --budget 80000
```

### Docker

The container runs the compactor binary as its entrypoint. Since the MCP server communicates over stdio, run with `-i` (interactive):

```sh
# Ephemeral
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest

# With persistent state
docker run -i -v compactor-data:/data ghcr.io/lukaszraczylo/compaction-mcp:latest --state-dir /data

# With explicit budget
docker run -i ghcr.io/lukaszraczylo/compaction-mcp:latest --budget 80000
```

### Claude Code

`.claude/settings.json` (binary):
```json
{
  "mcpServers": {
    "compactor": {
      "command": "/path/to/compactor",
      "args": ["--state-dir", "/tmp/compactor-state"]
    }
  }
}
```

`.claude/settings.json` (Docker):
```json
{
  "mcpServers": {
    "compactor": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "compactor-data:/data", "ghcr.io/lukaszraczylo/compaction-mcp:latest", "--state-dir", "/data"]
    }
  }
}
```

### Cursor / other MCP clients

Same pattern. The server auto-detects the client and sets a reasonable budget:
- **Claude** clients: 80K tokens (40% of 200K context)
- **Cursor**: 60K tokens
- Override with `--budget` flag

## Tools

| Tool | Description |
|------|-------------|
| `recall` | **Call first every session.** Restores previous context — returns budget status + top items by relevance |
| `store` | Store content with optional summary, tags, and importance (1-10) |
| `query` | BM25-ranked search by text and/or tag filtering |
| `status` | Check budget usage, item count, auto-compact settings |
| `compact` | Trigger compaction to a target usage ratio |
| `update` | Add/update summary for an item (post-compaction workflow) |
| `pin` / `unpin` | Protect items from eviction |
| `forget` | Remove a specific item |
| `list` | Paginated item listing (newest first) |
| `bulk_store` | Store multiple items in one call (JSON array) |
| `export` | Export all items, optionally as summaries |
| `configure` | Adjust budget, auto-compact toggle and threshold |

## How compaction works

Three-phase pipeline, triggered automatically at 90% budget or manually via `compact`:

1. **Summary promotion** - Replaces content with its summary (lowest-scored items first)
2. **Deduplication** - Merges items with >70% word overlap (Jaccard similarity), keeping the higher-scored item
3. **Eviction** - Removes lowest-scored items until target usage is reached

After compaction, items without summaries are flagged. The LLM can then generate summaries via `update` for future compaction cycles.

## Scoring

Each item gets a retention score combining four signals:

```
score = 0.4 * importance + 0.3 * recency + 0.2 * access - 0.1 * size_penalty
```

**Content-type awareness** adjusts scoring automatically:

| Type | Detection | Score multiplier | Decay half-life |
|------|-----------|-----------------|-----------------|
| Error | `error:`, `panic:`, stack traces | 1.5x | 30 min |
| Decision | "decided", "going with", "approach:" | 1.3x | 6 hours |
| Code | `func`, `class`, backtick fences | 1.2x | 6 hours |
| Prose | Default | 1.0x | 2 hours |
| Tool output | `$ ` prefix, table chars | 0.7x | 15 min |

Pinned items are never evicted.

## Search

Full-text search uses BM25 ranking (k1=1.2, b=0.75) with:
- camelCase and snake_case token splitting
- 5x score boost for tag matches
- Combined BM25 relevance + item retention score

## Auto-tagging

When no tags are provided, items are automatically tagged based on content:
- Content type (error, code, decision, tool-output)
- File extensions (.go, .ts, .py, etc.)
- Infrastructure keywords (kubernetes, docker, cilium, postgres, etc.)
- URL presence (tagged as "reference")

## Persistence

With `--state-dir`, state is saved as atomic JSON every 30 seconds (when dirty) and on graceful shutdown. Without it, storage is ephemeral per session.

## CLI flags

| Flag | Default | Description |
|------|---------|-------------|
| `--budget` | `100000` | Token budget (overrides auto-detection) |
| `--state-dir` | `""` | Persistent state directory (empty = ephemeral) |

## Making it seamless

The compactor is a tool the LLM must actively use — it doesn't intercept context automatically. To make usage habitual, add this to your `CLAUDE.md`:

```markdown
## Working Memory (compactor MCP)
- At session start, ALWAYS call `recall` to restore previous context
- After making decisions, reading key files, or encountering errors: call `store` with a summary
- Before re-reading a file: call `query` to check if it's already stored
- When `status` shows >80% usage: call `compact`, then `update` items it flags
- Pin architecture decisions and user preferences with `pin`
```

The server also sends instructions via the MCP handshake that guide the LLM, but CLAUDE.md rules are stronger because they're treated as hard requirements.

### How the three layers work together

1. **MCP server instructions** — injected at connection time, tell the LLM the workflow
2. **CLAUDE.md rules** — persistent across sessions, override default behavior
3. **`recall` tool** — gives the LLM a single action to restore context, reducing friction from 12 tools to 1 entry point

With persistence (`--state-dir`), context survives across sessions. The LLM calls `recall` → gets back its stored decisions, errors, code snippets → continues where it left off.

## Architecture

```
main.go       - Entry point, CLI flags, MCP server setup, persistence wiring
store.go      - Core store: items, scoring, compaction, BM25 integration
tools.go      - MCP tool definitions and handlers
index.go      - BM25 inverted index with tag boosting
content.go    - Content type detection and auto-tagging
persist.go    - Atomic JSON persistence with background save
tokens.go     - Token count estimation (~4 chars/token)
```

## License

Private.