diff --git a/README.md b/README.md
index fa95ab9..0062b88 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
-[![Tests](https://img.shields.io/badge/tests-138%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
+[![Tests](https://img.shields.io/badge/tests-140%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
 [![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
 [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
 
@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
 Then:
 
 ```sh
-bash ~/.claude/adam/tests/run-tests.sh   # expect: 138 passed, 0 failed
+bash ~/.claude/adam/tests/run-tests.sh   # expect: 140 passed, 0 failed
 # … start a fresh Claude Code session …
 /reflect                                  # walks the proposal queue
 /reflect --explain                        # also shows the analyst's clustering trace
@@ -265,11 +265,12 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
     │   ├── adam-apply-reinforcement.mjs    # reinforcement proposal apply
     │   ├── adam-upgrade.mjs                # .adam-new file UX (list/diff/accept)
     │   └── adam-archive.mjs                # post-apply journal cleanup
-    └── tests/run-tests.sh            # 138 isolated tests; never touches live state
+    └── tests/run-tests.sh            # 140 isolated tests; never touches live state
 ```
 
 ## What's new
 
+- **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
 - **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
 - **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
 - **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
diff --git a/adam/scripts/adam-rollback.mjs b/adam/scripts/adam-rollback.mjs
index 24c1175..146903d 100755
--- a/adam/scripts/adam-rollback.mjs
+++ b/adam/scripts/adam-rollback.mjs
@@ -167,6 +167,23 @@ export function executeRollback(plan, adamRoot, opts = {}) {
     result.actions.push(`nudge failed: ${e.message}`);
   }
 
+  // Remove every ab-tracking entry for this proposal so it stops re-flagging as a
+  // regression on every future /reflect (which would trigger endless not_found
+  // rollback attempts). This fulfills the contract stated in this file's docstring.
+  try {
+    const abPath = join(adamRoot, "ab-tracking.jsonl");
+    if (existsSync(abPath)) {
+      const before = readJsonlSafe(abPath);
+      const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
+      if (kept.length !== before.length) {
+        writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
+        result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
+      }
+    }
+  } catch (e) {
+    result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
+  }
+
   result.status = "rolled_back";
   return result;
 }
diff --git a/adam/tests/run-tests.sh b/adam/tests/run-tests.sh
index 0745f93..f9956c4 100755
--- a/adam/tests/run-tests.sh
+++ b/adam/tests/run-tests.sh
@@ -2109,6 +2109,45 @@ else
 fi
 rm -f "$ROOT/.update-check.json"
 
+# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
+echo "Test 118: rollback purges ab-tracking entry by proposal_id"
+reset_state
+rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
+cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
+---
+id: rb-ab-001
+type: memory
+target: ~/.claude/projects/-Users-nvm/memory/x.md
+confidence: 5
+blast_radius: low
+status: applied
+source_entries:
+  - "2026-05-18T10:00:00Z"
+---
+# Why
+test
+# Rollback
+```bash
+rm -f x
+```
+EOF
+cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
+{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+EOF
+ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
+if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
+  echo "  FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
+else
+  echo "  PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
+fi
+if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
+  echo "  PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
+else
+  echo "  FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
+
 echo
 echo "Results: $PASS passed, $FAIL failed"
 [ "$FAIL" = "0" ]
diff --git a/agents/adam.md b/agents/adam.md
index 5a48292..d8ffb24 100644
--- a/agents/adam.md
+++ b/agents/adam.md
@@ -516,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
 4. `blast_radius: high`
-5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "138 passed, 0 failed" (or current pass count). The skill runs this test before applying.
+5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
 6. Change is surgical: ≤30 LOC diff, single file.
 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.