The worker's SQLite WAL could grow unbounded (observed 19MB) and wedge the
DB, hanging Claude Code on every prompt. No checkpoint ever truncated the
WAL (only PASSIVE auto-checkpoint, which cannot reclaim the file), the
connection-scoped pragmas were set via a single Exec so only one pooled
connection received them (e.g. busy_timeout=0 on the rest), and the
maintenance service that would optimize/checkpoint was never wired up.
- Register a sqlite3 ConnectHook driver so all pragmas (busy_timeout,
journal_mode, synchronous, cache_size, foreign_keys, journal_size_limit)
apply to every pooled connection; enable safe connection recycling.
- Add Store.Checkpoint (TRUNCATE), checkpoint-on-Close, and a periodic
size-gated checkpoint loop with configurable interval/threshold.
- Wire up the previously-dead maintenance service; make trigger_maintenance
actually run DB maintenance instead of only recalculating scores.
- Harden the user-prompt hook to honor its deadline and fail open so a
slow worker can never stall a prompt.
- Add regression tests for WAL truncation, checkpoint-on-close, and
per-connection pragmas.
Remove ~170MB of model files from the repository (LFS + committed).
Models are now downloaded at runtime from Hugging Face on first use
and cached to the OS cache directory with progress reporting and retries.
- Add internal/models/download.go: runtime downloader with retry, progress bar, checksums
- Remove go:embed for ONNX models (keep tokenizers embedded)
- Use file-based ONNX session loading instead of byte-slice
- Add scripts/download-models.sh for dev/CI model setup
- Update Makefile with setup-models target
- Update workflow-prepare.sh to download models in CI
- Set lfs: false in all CI workflows
- SHA256: bge=828e14..., cross-encoder=5d3e70...
MCP server (5 fixes):
- Move semaphore acquisition inside goroutine so main loop stays
responsive when all slots are taken
- Add 10s write timeout to sendResponse to prevent pipe deadlock
when Claude Code pauses reading stdout
- Send fallback JSON-RPC error when json.Marshal fails instead of
silently swallowing the error and leaving caller waiting forever
- Silence unknown notification methods (req.ID == nil) instead of
sending unsolicited error responses that may desync the host
- Return MCP isError content for tool failures instead of top-level
JSON-RPC error, matching the MCP specification
Vector/embedding (3 fixes):
- Move EmbedBatchWithContext call before writeMu.Lock in AddDocuments
so ONNX inference runs outside the write lock
- Replace singleflight.Do with DoChan + ctx select in both
getOrComputeEmbedding and UnifiedSearch so callers can bail out
independently when their context expires
- Add activeQueries atomic counter; skip cache warming when user
queries are in-flight; reduce warming timeout from 5s to 2s
Hooks (4 fixes):
- Cap EnsureWorkerRunning to 15s hard deadline with context; reduce
StartupTimeout from 30s to 10s; reduce port-in-use retries
- Fix nil dereference panic in user-prompt hook when initResult is
nil (non-JSON worker response); use comma-ok assertions
- Use package-level hookClient/healthClient with DisableKeepAlives
to prevent FD leaks in short-lived hook processes
- Set SysProcAttr{Setpgid: true} to detach worker from hook process
group, preventing kill-cascade from Claude Code
Worker/DB (3 fixes):
- Replace os.Exit(0) in MCP config watcher with context cancellation
for clean protocol shutdown
- Add 60s context.WithTimeout around ProcessObservation calls in
processAllSessions to prevent hung CLI subprocesses from blocking
the queue processor forever
- Set explicit PRAGMA wal_autocheckpoint=1000 and add PASSIVE WAL
checkpoint to Optimize() to prevent checkpoint stalls
Adds 20+ regression tests across all fix areas.
Root cause: synchronous MCP request processing combined with missing
context propagation to the embedding layer caused indefinite hangs when
ONNX inference was slow or the database was contended.
Changes:
- MCP server: dispatch each request in its own goroutine with semaphore
(cap 10) and WaitGroup for clean shutdown drain
- Embedding: add context-aware mutex acquisition (acquireMutex) so
callers can bail out instead of blocking forever on a stuck ONNX model
- Vector client: propagate context through getOrComputeEmbedding and
replace bare RLock() calls with context-aware acquireRLockWithContext
- Worker handlers: add 15s request-scoped timeouts to all search/context
handlers (handleSearchByPrompt, handleContextInject, handleFileContext,
handleContextCount, handleGetObservations/Summaries/Prompts)
- Worker HTTP server: set WriteTimeout=60s (was 0); SSE endpoint extends
deadline per-request via http.ResponseController
Fixes#45
Root cause: plugin registered as directory source in known_marketplaces.json,
which gets wiped on CLI updates. Now registers in extraKnownMarketplaces
(settings.json) as a GitHub source — same mechanism caveman/context-mode use.
Binaries install to ~/.claude-mnemonic/bin/ instead of the Claude-managed
plugins directory. Thin wrapper scripts in the repo let the marketplace
clone find them. Nothing gets cleaned up when Claude refreshes its cache.
Also fixed along the way:
- ONNX Runtime 1.24.3 → 1.26.0 (API v25 mismatch broke all embedding tests)
- Vector client leaked on DB reinit, processQueue had a race on sessionManager
- reloadConfig called os.Exit(0) bypassing graceful shutdown
- Removed dead QueryRowWithTimeout that leaked contexts
- Added tests for graph/watcher/maintenance/update (all were at 0%)