fix(v0.6.4): rollback removes the proposal's ab-tracking entry

adam-rollback.mjs's docstring always claimed it "removes the ab-tracking entry
(so it doesn't re-trigger)", but executeRollback() never did. Consequence: a
rolled-back proposal kept being re-detected as `regressed` on every subsequent
/reflect, which triggered endless `not_found` rollback attempts (the applied
file is already gone) and noisy ## Regressions sections.

executeRollback now deletes the matching ab-tracking.jsonl row by proposal_id
after the move, preserving all unrelated rows. Surfaced by running ADAM's own
/reflect loop a second time (two zombie regressions: 2026-05-16-002 and
2026-05-22-001).

Tests: 138 -> 140 (rollback purges the entry by id; an unrelated entry is
preserved).
This commit is contained in:
2026-05-29 13:50:38 +01:00
parent fcddb6bf79
commit c23b09cc09
4 changed files with 61 additions and 4 deletions
+4 -3
View File
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
[![Tests](https://img.shields.io/badge/tests-138%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
[![Tests](https://img.shields.io/badge/tests-140%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
[![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
Then:
```sh
bash ~/.claude/adam/tests/run-tests.sh # expect: 138 passed, 0 failed
bash ~/.claude/adam/tests/run-tests.sh # expect: 140 passed, 0 failed
# … start a fresh Claude Code session …
/reflect # walks the proposal queue
/reflect --explain # also shows the analyst's clustering trace
@@ -265,11 +265,12 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
│ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
│ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
│ └── adam-archive.mjs # post-apply journal cleanup
└── tests/run-tests.sh # 138 isolated tests; never touches live state
└── tests/run-tests.sh # 140 isolated tests; never touches live state
```
## What's new
- **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
- **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
- **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
- **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
+17
View File
@@ -167,6 +167,23 @@ export function executeRollback(plan, adamRoot, opts = {}) {
result.actions.push(`nudge failed: ${e.message}`);
}
// Remove the ab-tracking entry for this proposal so it stops re-flagging as a
// regression on every future /reflect (which would trigger endless not_found
// rollback attempts). This is the documented contract for rollback.
try {
const abPath = join(adamRoot, "ab-tracking.jsonl");
if (existsSync(abPath)) {
const before = readJsonlSafe(abPath);
const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
if (kept.length !== before.length) {
writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
}
}
} catch (e) {
result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
}
result.status = "rolled_back";
return result;
}
+39
View File
@@ -2109,6 +2109,45 @@ else
fi
rm -f "$ROOT/.update-check.json"
# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
echo "Test 118: rollback purges ab-tracking entry by proposal_id"
reset_state
rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
---
id: rb-ab-001
type: memory
target: ~/.claude/projects/-Users-nvm/memory/x.md
confidence: 5
blast_radius: low
status: applied
source_entries:
- "2026-05-18T10:00:00Z"
---
# Why
test
# Rollback
```bash
rm -f x
```
EOF
cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
EOF
ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
echo " FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
else
echo " PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
fi
if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
echo " PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
else
echo " FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
echo
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" = "0" ]
+1 -1
View File
@@ -516,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
3. `auto_apply_eligible: false`**always**. Harness edits are never auto-applied.
4. `blast_radius: high`
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "138 passed, 0 failed" (or current pass count). The skill runs this test before applying.
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
6. Change is surgical: ≤30 LOC diff, single file.
7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.