feat(v0.6.5): execution-grounded skill-utility report (adam-skill-utility)

Ranks skills by good:bad outcome co-occurrence (Wilson LB + lift vs baseline) over the journal's active_skills payloads — the SkillsInjector (arXiv 2605.29794) execution-grounded utility signal Δ(s), computed from data already collected, no training. - reuses adam-score NEGATIVE_SIGNAL_TYPES + entrySeverity (single source of truth) - registered in install.sh helper-script copy loop - /reflect pre-step surfaces worst below-baseline skills to the USER as advisory (co-occurrence != causation; not fed to the analyst's proposal machinery) - Test 119 added; full suite 141/141 green
fix(v0.6.4): rollback removes the proposal's ab-tracking entry
2026-06-22 02:01:44 +00:00 · 2026-06-02 01:47:40 +01:00 · 2026-05-29 13:50:38 +01:00 · 2026-05-29 13:13:59 +01:00 · 2026-05-29 12:37:10 +01:00
9 changed files with 647 additions and 27 deletions
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an

 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
-[![Tests](https://img.shields.io/badge/tests-132%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
+[![Tests](https://img.shields.io/badge/tests-140%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
 [![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
 [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()

@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
 Then:

 ```sh
-bash ~/.claude/adam/tests/run-tests.sh   # expect: 132 passed, 0 failed
+bash ~/.claude/adam/tests/run-tests.sh   # expect: 140 passed, 0 failed
 # … start a fresh Claude Code session …
 /reflect                                  # walks the proposal queue
 /reflect --explain                        # also shows the analyst's clustering trace
@@ -63,10 +63,27 @@ bash ~/.claude/adam/tests/run-tests.sh   # expect: 132 passed, 0 failed
 Pin a release for reproducibility:

 ```sh
-curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.6.0/install.sh \
-  | VERSION=v0.6.0 bash
+curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.6.3/install.sh \
+  | VERSION=v0.6.3 bash
 ```

+### Staying up to date
+
+`install.sh` records the installed release in `~/.claude/adam/.version`. The
+SessionStart hook (`adam-nudge.mjs`) then checks the latest GitHub release **at
+most once a day** (cached in `~/.claude/adam/.update-check.json`, network call
+hard-capped at 1.5 s, fully best-effort — it never blocks or slows session
+start). When a newer release exists it prints a one-line, **notify-only** prompt:
+
+```
+[adam] update available: v0.6.3 → v0.6.4. Apply: curl -fsSL …/install.sh | bash
+       (re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)
+```
+
+It is deliberately **not** auto-applied: re-running `install.sh` overwrites
+ADAM's own `/reflect`-applied skill edits, so you decide when to take an update.
+Disable the check entirely with `ADAM_NO_UPDATE_CHECK=1` in your environment.
+
 ## How it works

 ```mermaid
@@ -248,11 +265,14 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
    │   ├── adam-apply-reinforcement.mjs    # reinforcement proposal apply
    │   ├── adam-upgrade.mjs                # .adam-new file UX (list/diff/accept)
    │   └── adam-archive.mjs                # post-apply journal cleanup
-    └── tests/run-tests.sh            # 132 isolated tests; never touches live state
+    └── tests/run-tests.sh            # 140 isolated tests; never touches live state
 ```

 ## What's new

+- **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
+- **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
+- **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
 - **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
 - **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114).
 - **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
@@ -3,11 +3,19 @@
 //
 // Reads ~/.claude/adam/ab-tracking.jsonl (one line per auto-apply event,
 // written by adam-self-improvement/SKILL.md), then for each entry old enough
-// (>= --min-age-days; default 7) compares signal counts in the 7-day window
-// BEFORE applied_at against the 7-day window AFTER applied_at across the
+// (>= --min-age-days; default 7) compares the originating signal in the 7-day
+// window BEFORE applied_at against the 7-day window AFTER applied_at across the
 // full journal corpus (active + rotated). Surfaces regressions so /reflect
 // can flag proposals that made things worse.
 //
+// Volume normalization: when the windows contain other (non-originating)
+// activity, the delta is computed on the signal's SHARE of total activity
+// (rate = count / total), not its raw count — so a generally busier journal
+// after apply does not masquerade as a regression. When the signal is the only
+// activity in the windows, it falls back to the raw-count delta. Output carries
+// both `delta_pct` (drives status) and `raw_delta_pct` + `normalized` for
+// transparency.
+//
 // CLI:
 //   adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]
 //
@@ -92,31 +100,60 @@ export function computeDeltas(entries, journal, opts = {}) {

    const preStart = appliedAt - windowDays * DAY_MS;
    const postEnd = appliedAt + windowDays * DAY_MS;
+    // preCount/postCount = originating-signal occurrences; preTotal/postTotal =
+    // ALL journal entries in the window (the activity denominator).
    let preCount = 0;
    let postCount = 0;
+    let preTotal = 0;
+    let postTotal = 0;
    for (const je of journal || []) {
      if (!je || typeof je !== "object") continue;
-      if (!sigSet.has(je.type)) continue;
      const t = tsMs(je);
      if (Number.isNaN(t)) continue;
-      if (t >= preStart && t < appliedAt) preCount++;
-      else if (t >= appliedAt && t < postEnd) postCount++;
+      const inPre = t >= preStart && t < appliedAt;
+      const inPost = t >= appliedAt && t < postEnd;
+      if (!inPre && !inPost) continue;
+      if (inPre) preTotal++; else postTotal++;
+      if (!sigSet.has(je.type)) continue;
+      if (inPre) preCount++; else postCount++;
    }

    let status;
    let deltaPct;
+    let rawDeltaPct = null;
+    let normalized = false;
    if (preCount === 0) {
      status = "no_baseline";
      deltaPct = null;
    } else {
-      deltaPct = ((postCount - preCount) / preCount) * 100;
+      rawDeltaPct = Math.round(((postCount - preCount) / preCount) * 10000) / 100;
+      // Volume normalization: when the windows contain non-originating activity,
+      // compare the signal's SHARE of activity (rate), not its absolute count —
+      // otherwise a generally busier post-window masquerades as a regression.
+      // No background (signal IS the only activity) → fall back to raw delta,
+      // preserving prior behavior.
+      const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
+      if (hasBackground && postTotal > 0) {
+        const preRate = preCount / preTotal;   // preTotal >= preCount > 0
+        const postRate = postCount / postTotal;
+        deltaPct = ((postRate - preRate) / preRate) * 100;
+        normalized = true;
+      } else {
+        deltaPct = ((postCount - preCount) / preCount) * 100;
+      }
      // Round to 2 dp for stable comparison + presentation.
      deltaPct = Math.round(deltaPct * 100) / 100;
      if (deltaPct <= IMPROVED_PCT) status = "improved";
      else if (deltaPct >= REGRESSED_PCT) status = "regressed";
      else status = "neutral";
    }
-    out.push({ ...base, pre_count: preCount, post_count: postCount, delta_pct: deltaPct, status });
+    out.push({
+      ...base,
+      pre_count: preCount, post_count: postCount,
+      pre_total: preTotal, post_total: postTotal,
+      raw_delta_pct: rawDeltaPct, normalized,
+      delta_pct: deltaPct, status,
+    });
  }
  return out;
 }
@@ -167,6 +167,23 @@ export function executeRollback(plan, adamRoot, opts = {}) {
    result.actions.push(`nudge failed: ${e.message}`);
  }

+  // Remove the ab-tracking entry for this proposal so it stops re-flagging as a
+  // regression on every future /reflect (which would trigger endless not_found
+  // rollback attempts). This is the documented contract for rollback.
+  try {
+    const abPath = join(adamRoot, "ab-tracking.jsonl");
+    if (existsSync(abPath)) {
+      const before = readJsonlSafe(abPath);
+      const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
+      if (kept.length !== before.length) {
+        writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
+        result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
+      }
+    }
+  } catch (e) {
+    result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
+  }
+
  result.status = "rolled_back";
  return result;
 }
@@ -0,0 +1,272 @@
+#!/usr/bin/env node
+// adam-skill-utility.mjs — execution-grounded per-skill utility report.
+//
+// Inspired by SkillsInjector (arXiv 2605.29794v1), which shows skill injection
+// should be driven by execution-grounded *utility* Δ(t,s), not surface keyword
+// match — and that some topically-relevant skills actively *lower* success.
+// The paper learns Δ(t,s) from rollout outcomes. We don't train anything: the
+// adam journal already attaches `active_skills` to both positive outcome events
+// (task_completed, clean_recovery, correction_free_streak) and negative ones
+// (dead_end, tool_error_loop, …). So we approximate Δ(s) as a co-occurrence
+// ratio over the data we already collect.
+//
+// CAVEAT (honest): this is CO-OCCURRENCE, not causation. A skill active during
+// a dead_end did not necessarily cause it. Read the report as "which skills
+// correlate with friction", a prompt for review — never as proof.
+//
+// Metric, per skill active during scored events:
+//   pos / neg     — count of positive / negative outcome events it co-occurred with
+//   share         — pos / (pos+neg)
+//   lift          — share − global_baseline   (>0 above baseline, <0 below)
+//   wLB           — Wilson 95% lower bound of the positive proportion; ranks
+//                   *reliably* below-baseline skills to the top (sample-aware)
+//   sevNeg        — severity-weighted negative sum (adam SEVERITY_DIVISORS)
+//   topNeg        — dominant negative event type
+// Rows sorted worst-first (lowest wLB) so harmful/over-eager skills surface.
+//
+// CLI:
+//   adam-skill-utility.mjs [--home <path>] [--input <jsonl-path>]
+//                          [--min <n>] [--days <n>] [--json]
+//   --min   min event count (n) to treat a skill's signal as confident (default 8)
+//   --days  only consider events within the last <n> days (default: all)
+//   --json  emit machine-readable JSON instead of the text table
+//
+// Reuses adam-utils (jsonl IO) and adam-score (canonical NEGATIVE set +
+// severity), so the positive/negative taxonomy stays single-sourced.
+
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+import { homedir } from "node:os";
+import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
+import { NEGATIVE_SIGNAL_TYPES, entrySeverity } from "./adam-score.mjs";
+
+// Positive outcome signals (mirror adam's vocabulary; task_completed is adam's
+// canonical "clean task", the same one adam-score uses for reinforcement).
+export const POSITIVE_SIGNAL_TYPES = new Set([
+  "task_completed",
+  "clean_recovery",
+  "correction_free_streak",
+]);
+
+export const DEFAULT_MIN_SAMPLE = 8;
+
+function round(x) {
+  return Math.round(x * 1000) / 1000;
+}
+
+// Wilson score interval lower bound for a binomial proportion. Sample-aware:
+// a skill with 1 pos / 0 neg does NOT outrank one with 40 pos / 2 neg.
+export function wilsonLower(pos, n, z = 1.96) {
+  if (n <= 0) return 0;
+  const p = pos / n;
+  const z2 = z * z;
+  const denom = 1 + z2 / n;
+  const center = p + z2 / (2 * n);
+  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
+  return (center - margin) / denom;
+}
+
+// computeSkillUtility: pure. entries → { baseline, totalPos, totalNeg, min, skills[] }.
+export function computeSkillUtility(entries, opts = {}) {
+  const min = Number.isFinite(opts.min) ? opts.min : DEFAULT_MIN_SAMPLE;
+  const per = new Map();
+  let totalPos = 0;
+  let totalNeg = 0;
+
+  for (const e of entries || []) {
+    if (!e || typeof e !== "object") continue;
+    const isPos = POSITIVE_SIGNAL_TYPES.has(e.type);
+    const isNeg = NEGATIVE_SIGNAL_TYPES.has(e.type);
+    if (!isPos && !isNeg) continue;
+
+    if (isPos) totalPos++;
+    else totalNeg++;
+    const sev = isNeg ? entrySeverity(e) : 0;
+
+    const skills = Array.isArray(e.active_skills) ? e.active_skills : [];
+    for (const slug of skills) {
+      if (!slug || typeof slug !== "string") continue;
+      if (!per.has(slug)) {
+        per.set(slug, { pos: 0, neg: 0, sevNeg: 0, negTypes: {}, recent_ts: null });
+      }
+      const s = per.get(slug);
+      if (isPos) {
+        s.pos++;
+      } else {
+        s.neg++;
+        s.sevNeg += sev;
+        s.negTypes[e.type] = (s.negTypes[e.type] || 0) + 1;
+      }
+      const ts = typeof e.ts === "string" ? e.ts : null;
+      if (ts && (!s.recent_ts || ts > s.recent_ts)) s.recent_ts = ts;
+    }
+  }
+
+  const scored = totalPos + totalNeg;
+  const baseline = scored ? totalPos / scored : 0;
+
+  const skills = [];
+  for (const [slug, s] of per.entries()) {
+    const n = s.pos + s.neg;
+    const share = n ? s.pos / n : 0;
+    const topNeg = Object.entries(s.negTypes).sort((a, b) => b[1] - a[1])[0];
+    skills.push({
+      skill: slug,
+      n,
+      pos: s.pos,
+      neg: s.neg,
+      share: round(share),
+      lift: round(share - baseline),
+      wLB: round(wilsonLower(s.pos, n)),
+      sevNeg: s.sevNeg,
+      topNeg: topNeg ? topNeg[0] : null,
+      lowSample: n < min,
+      recent_ts: s.recent_ts,
+    });
+  }
+  // Worst-first: lowest Wilson lower bound, then most negatives.
+  skills.sort(
+    (a, b) =>
+      a.wLB - b.wLB ||
+      b.neg - a.neg ||
+      (a.skill < b.skill ? -1 : a.skill > b.skill ? 1 : 0),
+  );
+
+  return { baseline: round(baseline), totalPos, totalNeg, min, skills };
+}
+
+function parseArgs(argv) {
+  const args = { home: null, input: null, min: DEFAULT_MIN_SAMPLE, days: null, json: false, help: false };
+  for (let i = 0; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
+    else if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
+    else if (a === "--min" && i + 1 < argv.length) args.min = Number(argv[++i]);
+    else if (a === "--days" && i + 1 < argv.length) args.days = Number(argv[++i]);
+    else if (a === "--json") args.json = true;
+    else if (a === "--help" || a === "-h") args.help = true;
+  }
+  return args;
+}
+
+function readAllStdin() {
+  try { return readFileSync(0, "utf8"); } catch { return ""; }
+}
+
+function entriesFromText(text) {
+  const out = [];
+  for (const line of (text || "").split("\n")) {
+    if (!line) continue;
+    try { out.push(JSON.parse(line)); } catch { /* skip */ }
+  }
+  return out;
+}
+
+// Same gathering strategy as adam-score.mjs: explicit --input, else piped
+// stdin (e.g. from adam-window.mjs), else the active journal + rotated files.
+function gatherInputEntries(args) {
+  if (args.input) return readJsonlSafe(args.input);
+  if (!process.stdin.isTTY) {
+    const piped = readAllStdin();
+    if (piped && piped.trim()) return entriesFromText(piped);
+  }
+  const home = args.home || join(homedir(), ".claude");
+  const adamRoot = join(home, "adam");
+  const sources = [join(adamRoot, "journal.jsonl"), ...listJsonlFiles(join(adamRoot, "journal"))];
+  const all = [];
+  for (const p of sources) {
+    for (const e of readJsonlSafe(p)) all.push(e);
+  }
+  return all;
+}
+
+function filterByDays(entries, days) {
+  if (!Number.isFinite(days) || days <= 0) return entries;
+  // Anchor the window to the newest ts in the data (avoids Date.now()
+  // nondeterminism and works on historical exports).
+  let maxMs = 0;
+  for (const e of entries) {
+    const ms = e && typeof e.ts === "string" ? Date.parse(e.ts) : NaN;
+    if (Number.isFinite(ms) && ms > maxMs) maxMs = ms;
+  }
+  if (!maxMs) return entries;
+  const cutoff = maxMs - days * 86400000;
+  return entries.filter((e) => {
+    const ms = e && typeof e.ts === "string" ? Date.parse(e.ts) : NaN;
+    return Number.isFinite(ms) ? ms >= cutoff : false;
+  });
+}
+
+function pad(s, w) {
+  s = String(s);
+  return s.length >= w ? s : s + " ".repeat(w - s.length);
+}
+function padL(s, w) {
+  s = String(s);
+  return s.length >= w ? s : " ".repeat(w - s.length) + s;
+}
+
+function renderText(report) {
+  const { baseline, totalPos, totalNeg, min, skills } = report;
+  const lines = [];
+  lines.push("adam skill-utility report — execution-grounded Δ(skill) proxy");
+  lines.push(
+    `baseline positive-rate ${(baseline * 100).toFixed(1)}%  ` +
+      `(${totalPos} positive / ${totalNeg} negative outcome events)  min-sample n≥${min}`,
+  );
+  lines.push("CAVEAT: co-occurrence, not causation. Worst-first. ⚠ = below baseline with n≥min.");
+  lines.push("");
+  const head =
+    pad("skill", 44) + padL("n", 5) + padL("pos", 6) + padL("neg", 6) +
+    padL("share", 8) + padL("lift", 8) + padL("wLB", 7) + padL("sevNeg", 8) +
+    "  " + pad("topNeg", 18) + "flag";
+  lines.push(head);
+  lines.push("-".repeat(head.length));
+  for (const s of skills) {
+    const below = s.lift < 0 && !s.lowSample;
+    const flag = below ? "⚠" : s.lowSample ? "·(low n)" : "";
+    lines.push(
+      pad(s.skill, 44) +
+        padL(s.n, 5) +
+        padL(s.pos, 6) +
+        padL(s.neg, 6) +
+        padL((s.share * 100).toFixed(0) + "%", 8) +
+        padL((s.lift >= 0 ? "+" : "") + (s.lift * 100).toFixed(0) + "%", 8) +
+        padL(s.wLB.toFixed(2), 7) +
+        padL(s.sevNeg, 8) +
+        "  " +
+        pad(s.topNeg || "-", 18) +
+        flag,
+    );
+  }
+  return lines.join("\n");
+}
+
+function main() {
+  const args = parseArgs(process.argv.slice(2));
+  if (args.help) {
+    process.stdout.write(
+      "usage: adam-skill-utility.mjs [--home <path>] [--input <jsonl-path>] " +
+        "[--min <n>] [--days <n>] [--json]\n",
+    );
+    process.exit(0);
+  }
+  try {
+    let entries = gatherInputEntries(args);
+    entries = filterByDays(entries, args.days);
+    const report = computeSkillUtility(entries, { min: args.min });
+    if (args.json) {
+      process.stdout.write(JSON.stringify(report) + "\n");
+    } else {
+      process.stdout.write(renderText(report) + "\n");
+    }
+    process.exit(0);
+  } catch (e) {
+    process.stderr.write(`adam-skill-utility error: ${e.message}\n`);
+    process.exit(1);
+  }
+}
+
+if (import.meta.url === `file://${process.argv[1]}`) {
+  main();
+}
@@ -18,6 +18,7 @@ APPLYREIN="$REAL_HOME/.claude/adam/scripts/adam-apply-reinforcement.mjs"
 UPGRADE="$REAL_HOME/.claude/adam/scripts/adam-upgrade.mjs"
 BATCH="$REAL_HOME/.claude/adam/scripts/adam-batch.mjs"
 ROLLBACK="$REAL_HOME/.claude/adam/scripts/adam-rollback.mjs"
+SKILLUTIL="$REAL_HOME/.claude/adam/scripts/adam-skill-utility.mjs"

 TMP_HOME="$(mktemp -d -t adam-test.XXXXXX)"
 trap 'rm -rf "$TMP_HOME"' EXIT INT TERM
@@ -37,6 +38,7 @@ APPLYREIN_RUN(){ HOME="$TMP_HOME" node "$APPLYREIN" "$@" --home "$TMP_HOME/.clau
 UPGRADE_RUN() { HOME="$TMP_HOME" node "$UPGRADE" "$@"; }
 BATCH_RUN()   { HOME="$TMP_HOME" node "$BATCH" "$@"; }
 ROLLBACK_RUN(){ HOME="$TMP_HOME" node "$ROLLBACK" "$@"; }
+SKILLUTIL_RUN(){ HOME="$TMP_HOME" node "$SKILLUTIL" "$@"; }

 PASS=0
 FAIL=0
@@ -2014,6 +2016,166 @@ done
 assert_grep    "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x byte-identical reads emit retry_loop"
 assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "byte-identical reads NOT double-counted as file_reread (sameToolArgs>=RETRY guard)"

+# --- Test 112: A/B volume normalization — busier journal does NOT fake a regression ---
+echo "Test 112: A/B volume-normalized (raw +200% but flat share → neutral)"
+reset_state
+applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
+cat > "$ROOT/ab-tracking.jsonl" <<EOF
+{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-001","proposal_type":"memory","target_skill":"vol","proposal_fingerprint":"fpV","originating_signals":[{"type":"correction","count":2,"session_ids":["sV"]}],"pre_window_days":7}
+EOF
+> "$ROOT/journal.jsonl"
+# pre window: 2 correction + 8 dead_end (rate 0.2)
+for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+# post window: 6 correction + 24 dead_end (rate 0.2 — share unchanged, raw count +200%)
+for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in $(seq 1 24); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.05)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+out=$(ABMEASURE_RUN --format json 2>/dev/null)
+if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-001");process.exit(e&&e.normalized===true&&e.raw_delta_pct===200&&e.status==="neutral"?0:1)})'; then
+  echo "  PASS: volume growth normalized → neutral (raw +200%)"; PASS=$((PASS+1))
+else
+  echo "  FAIL: volume normalization wrong (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/ab-tracking.jsonl"
+
+# --- Test 113: A/B genuine rate regression still flagged ---
+echo "Test 113: A/B genuine share increase → regressed"
+reset_state
+applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
+cat > "$ROOT/ab-tracking.jsonl" <<EOF
+{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-002","proposal_type":"memory","target_skill":"vol2","proposal_fingerprint":"fpV2","originating_signals":[{"type":"correction","count":2,"session_ids":["sV2"]}],"pre_window_days":7}
+EOF
+> "$ROOT/journal.jsonl"
+# pre: 2 correction + 8 dead_end (rate 0.2)
+for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+# post: 6 correction + 6 dead_end (rate 0.5 — share up → genuine regression)
+for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in 1 2 3 4 5 6; do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.07)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+out=$(ABMEASURE_RUN --format json 2>/dev/null)
+if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-002");process.exit(e&&e.normalized===true&&e.status==="regressed"?0:1)})'; then
+  echo "  PASS: genuine share increase → regressed"; PASS=$((PASS+1))
+else
+  echo "  FAIL: genuine regression missed (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/ab-tracking.jsonl"
+
+# --- Test 114: update notifier nudges from cache when a newer release exists (no network) ---
+echo "Test 114: update notifier — cached newer release prints nudge"
+reset_state
+printf 'v0.6.2\n' > "$ROOT/.version"
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP"}' | NUDGE_RUN 2>/dev/null)
+if echo "$out" | grep -q "update available: v0.6.2 → v9.9.9"; then
+  echo "  PASS: update nudge printed from cache (offline)"; PASS=$((PASS+1))
+else
+  echo "  FAIL: expected update nudge (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/.version" "$ROOT/.update-check.json"
+
+# --- Test 115: update notifier silent when installed is current ---
+echo "Test 115: update notifier — up-to-date is silent"
+reset_state
+printf 'v9.9.9\n' > "$ROOT/.version"
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP2"}' | NUDGE_RUN 2>/dev/null)
+if echo "$out" | grep -q "update available"; then
+  echo "  FAIL: nudged despite being current (got: $out)"; FAIL=$((FAIL+1))
+else
+  echo "  PASS: no nudge when up-to-date"; PASS=$((PASS+1))
+fi
+rm -f "$ROOT/.version" "$ROOT/.update-check.json"
+
+# --- Test 116: ADAM_NO_UPDATE_CHECK disables the notifier ---
+echo "Test 116: ADAM_NO_UPDATE_CHECK opt-out"
+reset_state
+printf 'v0.6.2\n' > "$ROOT/.version"
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP3"}' | HOME="$TMP_HOME" ADAM_NO_UPDATE_CHECK=1 node "$NUDGE" 2>/dev/null)
+if echo "$out" | grep -q "update available"; then
+  echo "  FAIL: notifier ran despite opt-out (got: $out)"; FAIL=$((FAIL+1))
+else
+  echo "  PASS: ADAM_NO_UPDATE_CHECK suppressed the check"; PASS=$((PASS+1))
+fi
+rm -f "$ROOT/.version" "$ROOT/.update-check.json"
+
+# --- Test 117: no .version marker → notifier no-op (no crash) ---
+echo "Test 117: missing .version marker → notifier silent, hook still runs"
+reset_state
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP4"}' | NUDGE_RUN 2>/dev/null)
+if echo "$out" | grep -q "update available"; then
+  echo "  FAIL: nudged without a .version marker (got: $out)"; FAIL=$((FAIL+1))
+else
+  echo "  PASS: no marker → no update nudge"; PASS=$((PASS+1))
+fi
+rm -f "$ROOT/.update-check.json"
+
+# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
+echo "Test 118: rollback purges ab-tracking entry by proposal_id"
+reset_state
+rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
+cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
+---
+id: rb-ab-001
+type: memory
+target: ~/.claude/projects/-Users-nvm/memory/x.md
+confidence: 5
+blast_radius: low
+status: applied
+source_entries:
+  - "2026-05-18T10:00:00Z"
+---
+# Why
+test
+# Rollback
+```bash
+rm -f x
+```
+EOF
+cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
+{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+EOF
+ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
+if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
+  echo "  FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
+else
+  echo "  PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
+fi
+if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
+  echo "  PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
+else
+  echo "  FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
+
+# --- Test 119: adam-skill-utility ranks friction-correlated skills below baseline ---
+echo "Test 119: adam-skill-utility computes per-skill good:bad utility (execution-grounded Δ)"
+reset_state
+SU_INPUT="$TMP_HOME/su-input.jsonl"
+{
+  for i in 1 2 3 4 5; do echo "{\"ts\":\"2026-05-20T0$i:00:00Z\",\"session\":\"sSU\",\"type\":\"task_completed\",\"active_skills\":[\"goodskill\"]}"; done
+  for i in 1 2 3 4 5; do echo "{\"ts\":\"2026-05-20T1$i:00:00Z\",\"session\":\"sSU\",\"type\":\"dead_end\",\"count\":8,\"active_skills\":[\"badskill\"]}"; done
+} > "$SU_INPUT"
+su_out=$(SKILLUTIL_RUN --input "$SU_INPUT" --json --min 3 2>/dev/null)
+su_check=$(echo "$su_out" | node -e '
+let buf=""; process.stdin.on("data",d=>buf+=d).on("end",()=>{
+  try {
+    const p=JSON.parse(buf);
+    const bad=p.skills.find(s=>s.skill==="badskill");
+    const good=p.skills.find(s=>s.skill==="goodskill");
+    const ok = bad && good && bad.lift<0 && good.lift>0 && p.skills[0].skill==="badskill" && bad.neg===5 && good.pos===5;
+    console.log(ok?"ok":"bad:"+JSON.stringify({bad,good,first:p.skills[0]&&p.skills[0].skill}));
+  } catch(e){ console.log("parse-error:"+e.message); }
+});')
+if [ "$su_check" = "ok" ]; then
+  echo "  PASS: badskill below baseline + ranked worst-first, goodskill above"; PASS=$((PASS+1))
+else
+  echo "  FAIL: skill-utility ranking wrong ($su_check)"; FAIL=$((FAIL+1))
+fi
+rm -f "$SU_INPUT"
+
 echo
 echo "Results: $PASS passed, $FAIL failed"
 [ "$FAIL" = "0" ]
@@ -250,10 +250,12 @@ Required structure:

 ```markdown
 ---
-name: <human-readable name, ≤80 chars>
-description: <one-line description used to decide future relevance — be specific, ≤200 chars>
-type: user | feedback | project | reference
-originSessionId: <session_id from journal entries that fed this cluster>
+name: <slug — snake_case, MUST equal the target filename without `.md`, e.g. feedback_go_test_cache>
+description: "<one-line used to decide future relevance — be specific, ≤200 chars>"
+metadata:
+  node_type: memory
+  type: user | feedback | project | reference
+  originSessionId: <session_id from journal entries that fed this cluster>
 ---

 <Body content per type, see CLAUDE.md memory schema:
@@ -263,12 +265,17 @@ originSessionId: <session_id from journal entries that fed this cluster>
  - reference: pointer to external system + what's there.>
 ```

+The frontmatter MUST match the live auto-memory schema exactly: `name` is the
+slug (NOT a prose title), and `node_type`, `type`, `originSessionId` live under
+a `metadata:` block (verify against an existing file in the target memory dir
+before drafting — match its shape).
+
 Constraints:
- Frontmatter fields `name`, `description`, `type` are **required**. Skill enforces this at apply time.
- `originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
+- Top-level `name` + `description` and nested `metadata.node_type` (always `memory`) + `metadata.type` are **required**. Skill enforces this at apply time.
+- `metadata.originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
 - ≤50 LOC of body content. Surgical.
- Slug (used in `target` path filename) must not collide with any existing memory file.
- For `type=feedback` and `type=project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
+- `name`/slug (also the `target` path filename) must not collide with any existing memory file.
+- For `type: feedback` and `type: project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).

 ## Diagnosis drafting protocol (required for every proposal)

@@ -509,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
 4. `blast_radius: high`
-5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "132 passed, 0 failed" (or current pass count). The skill runs this test before applying.
+5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
 6. Change is surgical: ≤30 LOC diff, single file.
 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.

@@ -1,9 +1,17 @@
 #!/usr/bin/env node
-// adam-nudge.mjs — SessionStart hook. Prints two kinds of reminders:
+// adam-nudge.mjs — SessionStart hook. Prints reminders:
 //   1. Pending proposals (≥3 queued in adam/proposals/).
 //   2. Cross-session nudges (entries in adam/active-nudges.json whose
 //      source_session differs from the current session and that haven't
 //      expired or exhausted their max_displays).
+//   3. Pending local-edit upgrades (`.adam-new` sidecars).
+//   4. New-release notice: if a newer GitHub release exists than the installed
+//      `.version`, print a notify-only one-line update prompt. Cached + checked
+//      at most once/day, network call hard-capped at 1.5s, fully best-effort —
+//      never blocks SessionStart. Opt out with ADAM_NO_UPDATE_CHECK=1.
+//      NOTE: notify-only by design — applying an update re-runs install.sh,
+//      which resets ADAM's own /reflect-applied skill edits. The user chooses
+//      when to accept that, so we never auto-install.
 import { readdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
 import { join } from "node:path";
 import { homedir } from "node:os";
@@ -14,7 +22,13 @@ const ADAM_ROOT = join(CLAUDE_ROOT, "adam");
 const PROPOSALS = join(ADAM_ROOT, "proposals");
 const NUDGES_FILE = join(ADAM_ROOT, "active-nudges.json");
 const STATE_FILE = join(ADAM_ROOT, "state.json");
+const VERSION_FILE = join(ADAM_ROOT, ".version");
+const UPDATE_CHECK_FILE = join(ADAM_ROOT, ".update-check.json");
 const THRESHOLD = 3;
+const UPDATE_CHECK_INTERVAL_MS = 24 * 60 * 60 * 1000;
+const UPDATE_FETCH_TIMEOUT_MS = 1500;
+const RELEASES_API = "https://api.github.com/repos/lukaszraczylo/claude-adam/releases/latest";
+const INSTALL_ONELINER = "curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash";

 // Known installable paths (mirrors install.sh copy_file list). Checking a
 // fixed shortlist keeps SessionStart latency under control vs full FS walk.
@@ -118,7 +132,75 @@ function emitPendingUpgrades() {
  } catch { /* never break SessionStart */ }
 }

-function main() {
+// --- update notifier (notify-only; see header note) ---
+
+function readVersion() {
+  try { return readFileSync(VERSION_FILE, "utf8").trim() || null; } catch { return null; }
+}
+
+// Parse "vX.Y.Z" (leading v optional; pre-release/build suffix ignored).
+function parseSemver(s) {
+  if (typeof s !== "string") return null;
+  const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
+  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
+}
+
+// isNewer(a, b): true iff version a is strictly newer than b. Unparseable → false.
+function isNewer(a, b) {
+  const pa = parseSemver(a), pb = parseSemver(b);
+  if (!pa || !pb) return false;
+  for (let i = 0; i < 3; i++) { if (pa[i] !== pb[i]) return pa[i] > pb[i]; }
+  return false;
+}
+
+async function fetchLatestTag() {
+  // Best-effort, hard-capped. Any failure (offline / timeout / rate-limit /
+  // parse / fetch-unavailable) returns null and the caller silently skips.
+  try {
+    if (typeof fetch !== "function") return null;
+    const ctrl = new AbortController();
+    const timer = setTimeout(() => ctrl.abort(), UPDATE_FETCH_TIMEOUT_MS);
+    let tag = null;
+    try {
+      const res = await fetch(RELEASES_API, {
+        signal: ctrl.signal,
+        headers: { "User-Agent": "claude-adam-nudge", "Accept": "application/vnd.github+json" },
+      });
+      if (res && res.ok) {
+        const j = await res.json();
+        if (j && typeof j.tag_name === "string") tag = j.tag_name;
+      }
+    } finally { clearTimeout(timer); }
+    return tag;
+  } catch { return null; }
+}
+
+function printUpdateNudge(latest, installed) {
+  process.stdout.write(
+    `[adam] update available: ${installed} → ${latest}. Apply: ${INSTALL_ONELINER}\n` +
+    `       (re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)\n`
+  );
+}
+
+async function emitUpdateCheck() {
+  if (process.env.ADAM_NO_UPDATE_CHECK) return;      // explicit opt-out
+  const installed = readVersion();
+  if (!installed) return;                            // no marker → nothing to compare
+  const cache = readJson(UPDATE_CHECK_FILE, {}) || {};
+  const now = Date.now();
+  let nudged = false;
+  // Instant nudge from cache (no network).
+  if (cache.latest && isNewer(cache.latest, installed)) { printUpdateNudge(cache.latest, installed); nudged = true; }
+  // Refresh cache at most once/day, best-effort — drives the nudge on the NEXT run.
+  if (!cache.last_check || (now - Number(cache.last_check)) > UPDATE_CHECK_INTERVAL_MS) {
+    const latest = await fetchLatestTag();
+    const next = { last_check: now, latest: latest || cache.latest || null };
+    try { writeFileSync(UPDATE_CHECK_FILE, JSON.stringify(next)); } catch { /* swallow */ }
+    if (latest && !nudged && isNewer(latest, installed)) printUpdateNudge(latest, installed);
+  }
+}
+
+async function main() {
  const stdinSession = readSessionInput();
  const stateSession = (() => {
    const st = readJson(STATE_FILE, null);
@@ -128,7 +210,7 @@ function main() {
  emitProposalReminder();
  emitActiveNudges(currentSession);
  emitPendingUpgrades();
+  await emitUpdateCheck();
 }

-try { main(); } catch { /* never block SessionStart */ }
-process.exit(0);
+main().catch(() => { /* never block SessionStart */ }).finally(() => process.exit(0));
@@ -126,7 +126,8 @@ copy_file "$SRC/adam/scripts/adam-archive.mjs"                       "$DEST/adam
 copy_file "$SRC/adam/scripts/adam-upgrade.mjs"                       "$DEST/adam/scripts/adam-upgrade.mjs"
 # v0.3.3 helper scripts — invoked from SKILL.md / hooks / analyst flow
 for _adam_script in adam-utils adam-window adam-explain adam-nudge-eligibility adam-cooldown \
-                    adam-score adam-ab-measure adam-apply-reinforcement adam-batch adam-rollback; do
+                    adam-score adam-ab-measure adam-apply-reinforcement adam-batch adam-rollback \
+                    adam-skill-utility; do
  copy_file "$SRC/adam/scripts/${_adam_script}.mjs" \
            "$DEST/adam/scripts/${_adam_script}.mjs"
  run "chmod +x \"$DEST/adam/scripts/${_adam_script}.mjs\""
@@ -143,6 +144,20 @@ copy_file "$SRC/adam/tests/fixtures/seed-corrections.jsonl"          "$DEST/adam
 # install marker — used by future runs to detect local mtime drift
 run "touch \"$DEST/adam/.install-marker\""

+# version marker — records the installed release tag for the update notifier
+# (adam-nudge.mjs compares it against the latest GitHub release).
+ADAM_VERSION=""
+if [ -n "$VERSION" ]; then
+  ADAM_VERSION="$VERSION"
+elif [ "$PIPED" = 1 ] && [ -n "${REF:-}" ]; then
+  ADAM_VERSION="$REF"
+else
+  ADAM_VERSION="$(git -C "$SRC" describe --tags --abbrev=0 2>/dev/null || true)"
+fi
+[ -z "$ADAM_VERSION" ] && ADAM_VERSION="unknown"
+run "printf '%s\\n' \"$ADAM_VERSION\" > \"$DEST/adam/.version\""
+log "  version marker: $ADAM_VERSION"
+
 # --------------------------------------------------------------------- settings.json
 SETTINGS="$DEST/settings.json"
 EXAMPLE="$SRC/settings.json.example"
@@ -81,6 +81,14 @@ node ~/.claude/adam/scripts/adam-batch.mjs --input /tmp/adam-windowed-journal.js

 This groups entries by (signal_type, cluster_key) and reports per-batch metadata including `has_context_window` (whether transcript evidence is attached). If the script fails: log stderr, pass `null` to the analyst (graceful degradation — analyst falls back to raw journal clustering).

+**Skill utility** (execution-grounded selection signal, in the spirit of SkillsInjector arXiv 2605.29794 — utility Δ(s), not surface match): compute per-skill good:bad outcome ratios over the windowed journal:
+
+```bash
+node ~/.claude/adam/scripts/adam-skill-utility.mjs --input /tmp/adam-windowed-journal.jsonl --json > /tmp/adam-skill-utility.json 2> /tmp/adam-skill-utility.log
+```
+
+This ranks skills by how often they co-occur with positive (`task_completed`, `clean_recovery`, `correction_free_streak`) vs negative outcome events, surfacing skills below the baseline positive rate (with sufficient sample) — advisory candidates for description disambiguation or archival. **CO-OCCURRENCE, NOT CAUSATION**: display the worst 3 below-baseline skills (`lift < 0`, not low-sample) to the *user* as a one-line advisory before listing proposals (e.g. `skill-utility: chezmoi 9% pos n=85, ghostty-config 14% pos n=50, …`). Do NOT feed this into the analyst's proposal machinery or auto-draft skill-archival from it — the human decides. If the script fails: log stderr, skip (best-effort).
+
 ### 2. Dispatch the analyst (two-stage pipeline)

 MOSS §3.3: "A single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow." The analyst is dispatched in two stages with a validation gate between them.
@@ -300,7 +308,7 @@ Before writing any proposal:
 - For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
 - For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §3 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
 - For `skill_edit` with `auto_apply_eligible: true`: confirm `contradiction_flag` is absent or null in frontmatter. Refuse auto-apply if `contradiction_flag` is set with any non-empty value (treat the agent's flag as a hard veto on auto-apply; user can still manually approve in walk-the-queue if they disagree with the heuristic).
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter containing required fields `name`, `description`, `type`, `originSessionId`. Refuse if frontmatter missing — agent must redraft per the Memory drafting protocol.
+- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter matching the live auto-memory schema — top-level `name` (the slug) + `description`, plus a `metadata:` block with `node_type: memory`, `type`, and `originSessionId`. Cross-check the shape against an existing file in the target memory dir. Refuse if frontmatter is flat (`type:`/`originSessionId:` at top level) or missing the `metadata:` block — agent must redraft per the Memory drafting protocol.
 - For `harness_edit`: confirm `auto_apply_eligible: false` (never auto-apply). Confirm `confidence ≥ 5`. Confirm `# Test verification` section names the test command. Confirm diff is ≤30 LOC and targets a single allowed harness file (see `agents/adam.md` §"Harness self-modification"). Run test suite before AND after applying — revert on any regression.
 - Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.
Author	SHA1	Message	Date
lukaszraczylo	4d1276a73f	feat(v0.6.5): execution-grounded skill-utility report (adam-skill-utility) Ranks skills by good:bad outcome co-occurrence (Wilson LB + lift vs baseline) over the journal's active_skills payloads — the SkillsInjector (arXiv 2605.29794) execution-grounded utility signal Δ(s), computed from data already collected, no training. - reuses adam-score NEGATIVE_SIGNAL_TYPES + entrySeverity (single source of truth) - registered in install.sh helper-script copy loop - /reflect pre-step surfaces worst below-baseline skills to the USER as advisory (co-occurrence != causation; not fed to the analyst's proposal machinery) - Test 119 added; full suite 141/141 green	2026-06-02 01:47:40 +01:00
lukaszraczylo	c23b09cc09	fix(v0.6.4): rollback removes the proposal's ab-tracking entry adam-rollback.mjs's docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)", but executeRollback() never did. Consequence: a rolled-back proposal kept being re-detected as `regressed` on every subsequent /reflect, which triggered endless `not_found` rollback attempts (the applied file is already gone) and noisy ## Regressions sections. executeRollback now deletes the matching ab-tracking.jsonl row by proposal_id after the move, preserving all unrelated rows. Surfaced by running ADAM's own /reflect loop a second time (two zombie regressions: 2026-05-16-002 and 2026-05-22-001). Tests: 138 -> 140 (rollback purges the entry by id; an unrelated entry is preserved).	2026-05-29 13:50:38 +01:00
lukaszraczylo	fcddb6bf79	feat(v0.6.3): release-update notifier (notify-only, SessionStart) Adds a lightweight "new release available" notice without auto-installing — because re-running install.sh overwrites ADAM's own /reflect-applied skill edits, so the user must choose when to take an update. - install.sh writes ~/.claude/adam/.version (the installed release tag) on every install. Derived from $VERSION / piped REF / `git describe --tags`. - adam-nudge.mjs (SessionStart) compares .version against the latest GitHub release at most once/day. Cached in ~/.claude/adam/.update-check.json; the cache drives an instant nudge (no network on the hot path) and is refreshed best-effort with a 1.5s AbortController cap. fetch unavailable / offline / timeout / rate-limit / parse error all degrade to silent no-op. Opt out with ADAM_NO_UPDATE_CHECK=1. main() is now async; never blocks SessionStart. - README: "Staying up to date" section; pin example bumped to v0.6.3. Tests: 134 -> 138. Notifier verified fully offline (cache-driven): nudges when a newer release is cached, silent when current, suppressed by the opt-out env, and no-ops when the .version marker is absent.	2026-05-29 13:13:59 +01:00
lukaszraczylo	d929101af4	fix(v0.6.2): A/B volume normalization + memory frontmatter schema Two issues surfaced by running ADAM's /reflect loop on a large real journal (4015 entries, 119 sessions) — both caused false/broken auto-apply behavior. 1. A/B over-reported regressions (adam-ab-measure.mjs). Regressions were measured on RAW originating-signal counts pre vs post. On a busy, growing journal almost every signal count rises post-apply regardless of whether the proposal helped — so the loop flagged 9 false "regressions" (and would auto-roll-back good proposals). Now the delta is computed on the signal's SHARE of total activity (rate = count / window-total). Falls back to the raw-count delta when the signal is the only activity in the window (preserves prior behavior + all existing A/B tests). Output adds raw_delta_pct, pre_total, post_total, normalized for transparency. 2. Memory frontmatter drift (agents/adam.md, SKILL.md). The drafting protocol emitted flat `type:`/`originSessionId:` with a prose `name`, but the live auto-memory store uses `name` = slug plus a `metadata: {node_type, type, originSessionId}` block. Auto-applied memories could fail to load/categorize. Protocol + apply-time validation now require the live metadata.* schema and cross-checking against an existing file. Tests: 132 -> 134. New: volume growth (raw +200%) with flat activity-share classifies neutral, not regressed; a genuine share increase still classifies regressed.	2026-05-29 12:37:10 +01:00