fix(v0.6.4): rollback removes the proposal's ab-tracking entry

adam-rollback.mjs's docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)", but executeRollback() never did. Consequence: a rolled-back proposal kept being re-detected as `regressed` on every subsequent /reflect, which triggered endless `not_found` rollback attempts (the applied file is already gone) and noisy ## Regressions sections. executeRollback now deletes the matching ab-tracking.jsonl row by proposal_id after the move, preserving all unrelated rows. Surfaced by running ADAM's own /reflect loop a second time (two zombie regressions: 2026-05-16-002 and 2026-05-22-001). Tests: 138 -> 140 (rollback purges the entry by id; an unrelated entry is preserved).
2026-06-09 23:19:12 +00:00 · 2026-05-29 13:50:38 +01:00
parent fcddb6bf79
commit c23b09cc09
4 changed files with 61 additions and 4 deletions
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an

 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
-[![Tests](https://img.shields.io/badge/tests-138%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
+[![Tests](https://img.shields.io/badge/tests-140%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
 [![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
 [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()

@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
 Then:

 ```sh
-bash ~/.claude/adam/tests/run-tests.sh   # expect: 138 passed, 0 failed
+bash ~/.claude/adam/tests/run-tests.sh   # expect: 140 passed, 0 failed
 # … start a fresh Claude Code session …
 /reflect                                  # walks the proposal queue
 /reflect --explain                        # also shows the analyst's clustering trace
@@ -265,11 +265,12 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
    │   ├── adam-apply-reinforcement.mjs    # reinforcement proposal apply
    │   ├── adam-upgrade.mjs                # .adam-new file UX (list/diff/accept)
    │   └── adam-archive.mjs                # post-apply journal cleanup
-    └── tests/run-tests.sh            # 138 isolated tests; never touches live state
+    └── tests/run-tests.sh            # 140 isolated tests; never touches live state
 ```

 ## What's new

+- **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
 - **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
 - **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
 - **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
@@ -167,6 +167,23 @@ export function executeRollback(plan, adamRoot, opts = {}) {
    result.actions.push(`nudge failed: ${e.message}`);
  }

+  // Remove the ab-tracking entry for this proposal so it stops re-flagging as a
+  // regression on every future /reflect (which would trigger endless not_found
+  // rollback attempts). This is the documented contract for rollback.
+  try {
+    const abPath = join(adamRoot, "ab-tracking.jsonl");
+    if (existsSync(abPath)) {
+      const before = readJsonlSafe(abPath);
+      const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
+      if (kept.length !== before.length) {
+        writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
+        result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
+      }
+    }
+  } catch (e) {
+    result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
+  }
+
  result.status = "rolled_back";
  return result;
 }
@@ -2109,6 +2109,45 @@ else
 fi
 rm -f "$ROOT/.update-check.json"

+# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
+echo "Test 118: rollback purges ab-tracking entry by proposal_id"
+reset_state
+rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
+cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
+---
+id: rb-ab-001
+type: memory
+target: ~/.claude/projects/-Users-nvm/memory/x.md
+confidence: 5
+blast_radius: low
+status: applied
+source_entries:
+  - "2026-05-18T10:00:00Z"
+---
+# Why
+test
+# Rollback
+```bash
+rm -f x
+```
+EOF
+cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
+{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+EOF
+ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
+if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
+  echo "  FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
+else
+  echo "  PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
+fi
+if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
+  echo "  PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
+else
+  echo "  FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
+
 echo
 echo "Results: $PASS passed, $FAIL failed"
 [ "$FAIL" = "0" ]
@@ -516,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
 4. `blast_radius: high`
-5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "138 passed, 0 failed" (or current pass count). The skill runs this test before applying.
+5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
 6. Change is surgical: ≤30 LOC diff, single file.
 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.