diff --git a/README.md b/README.md
index fa95ab9..0062b88 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
-[![Tests](https://img.shields.io/badge/tests-138%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
+[![Tests](https://img.shields.io/badge/tests-140%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
 [![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
 [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
 
@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
 Then:
 
 ```sh
-bash ~/.claude/adam/tests/run-tests.sh # expect: 138 passed, 0 failed
+bash ~/.claude/adam/tests/run-tests.sh # expect: 140 passed, 0 failed
 # … start a fresh Claude Code session …
 /reflect # walks the proposal queue
 /reflect --explain # also shows the analyst's clustering trace
@@ -265,11 +265,12 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
 │ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
 │ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
 │ └── adam-archive.mjs # post-apply journal cleanup
- └── tests/run-tests.sh # 138 isolated tests; never touches live state
+ └── tests/run-tests.sh # 140 isolated tests; never touches live state
 ```
 
 ## What's new
 
+- **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
 - **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
 - **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
 - **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
diff --git a/adam/scripts/adam-rollback.mjs b/adam/scripts/adam-rollback.mjs
index 24c1175..146903d 100755
--- a/adam/scripts/adam-rollback.mjs
+++ b/adam/scripts/adam-rollback.mjs
@@ -167,6 +167,23 @@ export function executeRollback(plan, adamRoot, opts = {}) {
     result.actions.push(`nudge failed: ${e.message}`);
   }
 
+  // Remove the ab-tracking entry for this proposal so it stops re-flagging as a
+  // regression on every future /reflect (which would trigger endless not_found
+  // rollback attempts). This is the documented contract for rollback.
+  try {
+    const abPath = join(adamRoot, "ab-tracking.jsonl");
+    if (existsSync(abPath)) {
+      const before = readJsonlSafe(abPath);
+      const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
+      if (kept.length !== before.length) {
+        writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
+        result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
+      }
+    }
+  } catch (e) {
+    result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
+  }
+
   result.status = "rolled_back";
   return result;
 }
diff --git a/adam/tests/run-tests.sh b/adam/tests/run-tests.sh
index 0745f93..f9956c4 100755
--- a/adam/tests/run-tests.sh
+++ b/adam/tests/run-tests.sh
@@ -2109,6 +2109,45 @@ else
 fi
 rm -f "$ROOT/.update-check.json"
 
+# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
+echo "Test 118: rollback purges ab-tracking entry by proposal_id"
+reset_state
+rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
+cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
+---
+id: rb-ab-001
+type: memory
+target: ~/.claude/projects/-Users-nvm/memory/x.md
+confidence: 5
+blast_radius: low
+status: applied
+source_entries:
+  - "2026-05-18T10:00:00Z"
+---
+# Why
+test
+# Rollback
+```bash
+rm -f x
+```
+EOF
+cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
+{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+EOF
+ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
+if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
+  echo " FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
+else
+  echo " PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
+fi
+if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
+  echo " PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
+else
+  echo " FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
+
 echo
 echo "Results: $PASS passed, $FAIL failed"
 [ "$FAIL" = "0" ]
diff --git a/agents/adam.md b/agents/adam.md
index 5a48292..d8ffb24 100644
--- a/agents/adam.md
+++ b/agents/adam.md
@@ -516,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
 4. `blast_radius: high`
-5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "138 passed, 0 failed" (or current pass count). The skill runs this test before applying.
+5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
 6. Change is surgical: ≤30 LOC diff, single file.
 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.