diff --git a/README.md b/README.md index 040335a..34d100f 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases) -[![Tests](https://img.shields.io/badge/tests-114%20passing-brightgreen.svg)](./adam/tests/run-tests.sh) +[![Tests](https://img.shields.io/badge/tests-126%20passing-brightgreen.svg)](./adam/tests/run-tests.sh) [![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org) [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]() @@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie Then: ```sh -bash ~/.claude/adam/tests/run-tests.sh # expect: 87 passed, 0 failed +bash ~/.claude/adam/tests/run-tests.sh # expect: 126 passed, 0 failed # … start a fresh Claude Code session … /reflect # walks the proposal queue /reflect --explain # also shows the analyst's clustering trace @@ -63,8 +63,8 @@ bash ~/.claude/adam/tests/run-tests.sh # expect: 87 passed, 0 failed Pin a release for reproducibility: ```sh -curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.5.0/install.sh \ - | VERSION=v0.5.0 bash +curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.6.0/install.sh \ + | VERSION=v0.6.0 bash ``` ## How it works @@ -114,7 +114,7 @@ flowchart TB class TRACE trace ``` -The observation layer is a 350-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`. +The observation layer is a ~600-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. 
Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`. The analysis layer is an LLM subagent invoked by `/reflect`. Before the analyst runs, three deterministic pre-processors filter and enrich the journal: `adam-window.mjs` drops stale entries using per-signal sliding windows, `adam-score.mjs` computes per-session urgency dampeners + reinforcement candidates, and `adam-ab-measure.mjs` checks whether previously auto-applied edits actually reduced their originating signal. @@ -247,11 +247,12 @@ Or pass `--explain` to `/reflect` to render the full trace inline. │ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply │ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept) │ └── adam-archive.mjs # post-apply journal cleanup - └── tests/run-tests.sh # 87 isolated tests; never touches live state + └── tests/run-tests.sh # 126 isolated tests; never touches live state ``` ## What's new +- **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; the spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114). - **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3).
Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94). - **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87). - **v0.3.3** — analyst observability, A/B measurement, journal hygiene. ISO-week journal rotation replaces 5MB size-based (fixes silent cluster-straddling under-count); per-signal sliding windows via `adam-window.mjs`; error fingerprint normalisation; correction corpus expanded + weak-token co-occurrence requirement (kills the `"actually, I think..."` false positive); mandatory clustering trace + `adam-explain.mjs`; new `nudge` and `reinforcement` proposal types; per-(skill, fingerprint) cooldown via `adam-cooldown.mjs`; `task_completed` scoring (dampener + reinforcement); A/B effectiveness measurement; upgrade UX overhaul (`adam-upgrade.mjs --list/--diff/--accept`); shared `adam-utils.mjs`. 87 tests (up from 30). diff --git a/adam/scripts/adam-batch.mjs b/adam/scripts/adam-batch.mjs index 46ded2c..66f6e04 100755 --- a/adam/scripts/adam-batch.mjs +++ b/adam/scripts/adam-batch.mjs @@ -4,7 +4,7 @@ // automatically curated batch of production-failure evidence." 
// // Each batch groups entries by (signal_type, cluster_key) where cluster_key -// follows the same clustering rules as agents/adam.md §4: +// follows the same clustering rules as agents/adam.md ## Signal types / ## Process step 4: // correction → tokenized phrase (cross-cwd) // retry_loop → tool // weak_agent → subagent_type diff --git a/adam/scripts/adam-cooldown.mjs b/adam/scripts/adam-cooldown.mjs index b046a60..a94ef78 100755 --- a/adam/scripts/adam-cooldown.mjs +++ b/adam/scripts/adam-cooldown.mjs @@ -4,8 +4,12 @@ // // CLI: // adam-cooldown.mjs --skill --fingerprint [--home ] +// adam-cooldown.mjs --compute --skill --cluster [--diff-file ] +// → prints {"fingerprint":""}; diff body read from --diff-file +// or stdin. This is how proposal_fingerprint is populated (the analyst +// runs it via Bash after drafting a proposal). // -// Output: JSON one-liner with shape +// Output (gate mode): JSON one-liner with shape // { "status": "cool"|"cooldown"|"blacklisted", // "reason": "", // "blocked_by": { "file": "", "days_remaining": } | null } @@ -33,12 +37,15 @@ const DAY_MS = 86400000; export const LEGACY_FINGERPRINT = "legacy"; function parseArgs(argv) { - const args = { home: null, skill: null, fingerprint: null, help: false }; + const args = { home: null, skill: null, fingerprint: null, compute: false, cluster: null, diffFile: null, help: false }; for (let i = 0; i < argv.length; i++) { const a = argv[i]; if (a === "--home" && i + 1 < argv.length) args.home = argv[++i]; else if (a === "--skill" && i + 1 < argv.length) args.skill = argv[++i]; else if (a === "--fingerprint" && i + 1 < argv.length) args.fingerprint = argv[++i]; + else if (a === "--cluster" && i + 1 < argv.length) args.cluster = argv[++i]; + else if (a === "--diff-file" && i + 1 < argv.length) args.diffFile = argv[++i]; + else if (a === "--compute") args.compute = true; else if (a === "--help" || a === "-h") args.help = true; } return args; @@ -158,9 +165,11 @@ export function 
computeProposalFingerprint(proposal) { if (!proposal || typeof proposal !== "object") return LEGACY_FINGERPRINT; const skill = proposal.skill_slug || proposal.target_skill || proposal.skill || ""; const cluster = proposal.signal_cluster_id || proposal.cluster_id || ""; + // normalized_diff_body: whitespace (incl. newlines) collapsed to single + // spaces, then trimmed. Matches agents/adam.md §"Per-(skill, fingerprint) + // cooldown". (No trailing-newline strip needed — \s+ already absorbed them.) const diff = String(proposal.diff_body || proposal.proposed_change || "") .replace(/\s+/g, " ") - .replace(/\n+$/g, "") .trim(); return djb2(`${skill}\n${cluster}\n${diff}`); } @@ -168,7 +177,28 @@ export function computeProposalFingerprint(proposal) { function main() { const args = parseArgs(process.argv.slice(2)); if (args.help) { - process.stdout.write("usage: adam-cooldown.mjs --skill --fingerprint [--home ]\n"); + process.stdout.write( + "usage: adam-cooldown.mjs --skill --fingerprint [--home ]\n" + + " adam-cooldown.mjs --compute --skill --cluster [--diff-file ]\n" + ); + process.exit(0); + } + // --compute: deterministically derive a proposal_fingerprint. The analyst + // invokes this (it has Bash) after drafting a proposal, then writes the + // result into proposal frontmatter so the cooldown gate keys on it. 
+ if (args.compute) { + let diff = ""; + if (args.diffFile) { + try { diff = readFileSync(args.diffFile, "utf8"); } catch { /* empty → still deterministic */ } + } else { + try { diff = readFileSync(0, "utf8"); } catch { /* no stdin */ } + } + const fp = computeProposalFingerprint({ + skill_slug: args.skill || "", + signal_cluster_id: args.cluster || "", + diff_body: diff, + }); + process.stdout.write(JSON.stringify({ fingerprint: fp }) + "\n"); process.exit(0); } if (!args.skill || !args.fingerprint) { diff --git a/adam/scripts/adam-explain.mjs b/adam/scripts/adam-explain.mjs index 37cb1c6..c58fedf 100755 --- a/adam/scripts/adam-explain.mjs +++ b/adam/scripts/adam-explain.mjs @@ -135,6 +135,7 @@ export function parseTrace(text) { considered: clusters.length, emitted, skipped: clusters.length - emitted, + regressions: 0, reasons, }; } diff --git a/adam/scripts/adam-score.mjs b/adam/scripts/adam-score.mjs index d3d7514..2f53465 100755 --- a/adam/scripts/adam-score.mjs +++ b/adam/scripts/adam-score.mjs @@ -23,7 +23,8 @@ // Output: JSON object // { // "sessions": [ -// {"session_id": "...", "negative_count": N, "task_completed_count": M, "dampener": 1.0} +// {"session_id": "...", "negative_count": N, "task_completed_count": M, +// "severity_sum": S, "severity_by_type": {"": N, ...}, "dampener": 1.0} // ], // "reinforcement_candidates": [ // {"skill_slug": "tdd-loop", "count": 3, "recent_ts": "..."} diff --git a/adam/tests/run-tests.sh b/adam/tests/run-tests.sh index c7b2832..d3f0c28 100755 --- a/adam/tests/run-tests.sh +++ b/adam/tests/run-tests.sh @@ -71,6 +71,17 @@ assert_grep() { fi } +assert_no_grep() { + local file="$1" pattern="$2" name="$3" + if grep -qE "$pattern" "$file" 2>/dev/null; then + echo " FAIL: $name (pattern $pattern unexpectedly present in $file)" + FAIL=$((FAIL+1)) + else + echo " PASS: $name" + PASS=$((PASS+1)) + fi +} + # --- Test 1: correction signal --- echo "Test 1: user correction" reset_state @@ -1839,6 +1850,146 @@ else echo " FAIL: 
expected 8 context_window entries (got $cw_len)"; FAIL=$((FAIL+1)) fi +# --- Test 103: silent_drift carries active_skills (its primary cluster key) --- +echo "Test 103: silent_drift emits active_skills (§5b skill-attribution)" +reset_state +echo '{"hook_event_name":"PreToolUse","tool_name":"Skill","tool_input":{"skill":"tdd"},"session_id":"sSK","cwd":"/tmp/x"}' \ + | HOOK_RUN >/dev/null 2>&1 || true +for i in 1 2 3 4 5; do + echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/sk-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSK\",\"cwd\":\"/tmp/x\"}" \ + | HOOK_RUN >/dev/null 2>&1 || true +done +assert_grep "$ROOT/journal.jsonl" '"type":"silent_drift"' "silent_drift emitted after 5 reads with skill active" +assert_grep "$ROOT/journal.jsonl" '"active_skills":\["tdd"\]' "silent_drift carries active_skills cluster key" + +# --- Test 104: retry_loop fires at threshold 3, not below --- +echo "Test 104: retry_loop boundary (2x no fire, 3x fires)" +reset_state +for i in 1 2; do + echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"make"},"session_id":"sRT","cwd":"/tmp/x"}' \ + | HOOK_RUN >/dev/null 2>&1 || true +done +assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "2x same args does NOT emit retry_loop" +echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"make"},"session_id":"sRT","cwd":"/tmp/x"}' \ + | HOOK_RUN >/dev/null 2>&1 || true +assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x same args emits retry_loop" + +# --- Test 105: weak_agent fires at 2 dispatches, not at 1 --- +echo "Test 105: weak_agent boundary (1x no fire, 2x fires)" +reset_state +echo '{"hook_event_name":"PostToolUse","tool_name":"Agent","tool_input":{"subagent_type":"explorer"},"session_id":"sWA","cwd":"/tmp/x"}' \ + | HOOK_RUN >/dev/null 2>&1 || true +assert_no_grep "$ROOT/journal.jsonl" '"type":"weak_agent"' "1x agent dispatch does NOT emit 
weak_agent" +echo '{"hook_event_name":"PostToolUse","tool_name":"Agent","tool_input":{"subagent_type":"explorer"},"session_id":"sWA","cwd":"/tmp/x"}' \ + | HOOK_RUN >/dev/null 2>&1 || true +assert_grep "$ROOT/journal.jsonl" '"type":"weak_agent"' "2x same agent in window emits weak_agent" + +# --- Test 106: adam-cooldown --compute deterministic + input-sensitive --- +echo "Test 106: adam-cooldown --compute fingerprint" +fp1=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k1 2>/dev/null) +fp2=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k1 2>/dev/null) +fp3=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k2 2>/dev/null) +if [ -n "$fp1" ] && [ "$fp1" = "$fp2" ] && echo "$fp1" | grep -q '"fingerprint":'; then + echo " PASS: --compute deterministic for identical inputs"; PASS=$((PASS+1)) +else + echo " FAIL: --compute not deterministic (got '$fp1' vs '$fp2')"; FAIL=$((FAIL+1)) +fi +if [ "$fp1" != "$fp3" ]; then + echo " PASS: --compute sensitive to cluster id"; PASS=$((PASS+1)) +else + echo " FAIL: --compute ignored cluster id (both '$fp1')"; FAIL=$((FAIL+1)) +fi + +# --- Test 107: A/B boundary — exactly -25% delta → improved --- +echo "Test 107: A/B exact -25% boundary (4 pre / 3 post → improved)" +reset_state +applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)') +cat > "$ROOT/ab-tracking.jsonl" < "$ROOT/journal.jsonl" +for i in 1 2 3 4; do + pre_ts=$(node -e "console.log(new Date(Date.now() - (15 + $i*0.3) * 86400000).toISOString())") + echo "{\"ts\":\"$pre_ts\",\"session\":\"sB1\",\"type\":\"correction\",\"phrase\":\"x\"}" >> "$ROOT/journal.jsonl" +done +for i in 1 2 3; do + post_ts=$(node -e "console.log(new Date(Date.now() - (8 + $i*0.3) * 86400000).toISOString())") + echo "{\"ts\":\"$post_ts\",\"session\":\"sB1\",\"type\":\"correction\",\"phrase\":\"y\"}" >> "$ROOT/journal.jsonl" +done +out=$(ABMEASURE_RUN --format json 2>/dev/null) +if echo "$out" | node -e 'let 
b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-b25-001");process.exit(e&&e.pre_count===4&&e.post_count===3&&e.delta_pct===-25&&e.status==="improved"?0:1)})'; then + echo " PASS: -25% boundary classified improved"; PASS=$((PASS+1)) +else + echo " FAIL: -25% boundary misclassified (got: $out)"; FAIL=$((FAIL+1)) +fi +rm -f "$ROOT/ab-tracking.jsonl" + +# --- Test 108: A/B boundary — exactly +25% delta → regressed --- +echo "Test 108: A/B exact +25% boundary (4 pre / 5 post → regressed)" +reset_state +applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)') +cat > "$ROOT/ab-tracking.jsonl" < "$ROOT/journal.jsonl" +for i in 1 2 3 4; do + pre_ts=$(node -e "console.log(new Date(Date.now() - (15 + $i*0.3) * 86400000).toISOString())") + echo "{\"ts\":\"$pre_ts\",\"session\":\"sB2\",\"type\":\"correction\",\"phrase\":\"x\"}" >> "$ROOT/journal.jsonl" +done +for i in 1 2 3 4 5; do + post_ts=$(node -e "console.log(new Date(Date.now() - (8 + $i*0.3) * 86400000).toISOString())") + echo "{\"ts\":\"$post_ts\",\"session\":\"sB2\",\"type\":\"correction\",\"phrase\":\"y\"}" >> "$ROOT/journal.jsonl" +done +out=$(ABMEASURE_RUN --format json 2>/dev/null) +if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-b25-002");process.exit(e&&e.pre_count===4&&e.post_count===5&&e.delta_pct===25&&e.status==="regressed"?0:1)})'; then + echo " PASS: +25% boundary classified regressed"; PASS=$((PASS+1)) +else + echo " FAIL: +25% boundary misclassified (got: $out)"; FAIL=$((FAIL+1)) +fi +rm -f "$ROOT/ab-tracking.jsonl" + +# --- Test 109: cooldown blacklist 30d boundary (day 29 active, day 31 expired) --- +echo "Test 109: blacklist 30d boundary" +reset_state +ts29=$(node -e 'console.log(Date.now() - 29*86400000)') +cat > "$ROOT/rejected/2026-blk-29.md" </dev/null) +if echo "$out29" | grep -q '"status":"blacklisted"'; then + echo " PASS: day-29 
blacklist still active"; PASS=$((PASS+1)) +else + echo " FAIL: day-29 should be blacklisted (got: $out29)"; FAIL=$((FAIL+1)) +fi +rm -f "$ROOT/rejected/2026-blk-29.md" +ts31=$(node -e 'console.log(Date.now() - 31*86400000)') +cat > "$ROOT/rejected/2026-blk-31.md" </dev/null) +if echo "$out31" | grep -q '"status":"cool"'; then + echo " PASS: day-31 blacklist expired → cool"; PASS=$((PASS+1)) +else + echo " FAIL: day-31 should be cool (got: $out31)"; FAIL=$((FAIL+1)) +fi +rm -f "$ROOT/rejected/2026-blk-31.md" + echo echo "Results: $PASS passed, $FAIL failed" [ "$FAIL" = "0" ] diff --git a/agents/adam.md b/agents/adam.md index 25d9c06..a106851 100644 --- a/agents/adam.md +++ b/agents/adam.md @@ -352,10 +352,18 @@ The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not o `proposal_fingerprint` is computed deterministically as `djb2(skill_slug + "\n" + signal_cluster_id + "\n" + normalized_diff_body)` returned as base36, where: - `skill_slug` — target skill basename (or proposed slug for `skill_new`) -- `signal_cluster_id` — the cluster id you assigned in the clustering trace (e.g. `c1`, `tool_error_loop-ECONNREFUSED:5432`) -- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trailing newlines stripped +- `signal_cluster_id` — a **stable** cluster id derived from signal type + key (e.g. `tool_error_loop-ECONNREFUSED:5432`), NOT the ephemeral per-run trace id (`c1`). Stability matters: the same logical proposal must hash identically across `/reflect` runs or the cooldown can never match a prior applied/rejected record. +- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trimmed -Both apply-time and analyst-time checks invoke `adam-cooldown.mjs --skill --fingerprint `. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`. 
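To make `normalized_diff_body` and the base36 encoding concrete, here is a minimal sketch of the fingerprint recipe above. It assumes the classic additive djb2 (seed 5381, `h * 33 + c`) over UTF-16 code units, truncated to an unsigned 32-bit integer. The helper names `djb2Base36`, `normalizeDiffBody` and `proposalFingerprint` are hypothetical. The repo's own `djb2` may use a different variant, so treat this as an illustration only, not a substitute for `adam-cooldown.mjs --compute`.

```javascript
// Sketch of the proposal_fingerprint recipe: djb2(skill + "\n" + cluster + "\n" + body) in base36.
// ASSUMPTION: classic additive djb2 (seed 5381, h = h * 33 + c) over UTF-16 code
// units, kept as an unsigned 32-bit integer. The repo's own djb2 helper may use a
// different variant, so use `adam-cooldown.mjs --compute` for real fingerprints.
function djb2Base36(text) {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    // Math.imul multiplies in 32-bit integer space; >>> 0 reinterprets as unsigned.
    h = (Math.imul(h, 33) + text.charCodeAt(i)) >>> 0;
  }
  return h.toString(36);
}

// Collapse every whitespace run (newlines included) to one space, then trim.
function normalizeDiffBody(body) {
  return String(body).replace(/\s+/g, " ").trim();
}

// Hypothetical helper: same field order as the spec, newline-separated.
function proposalFingerprint(skillSlug, clusterId, diffBody) {
  return djb2Base36(`${skillSlug}\n${clusterId}\n${normalizeDiffBody(diffBody)}`);
}

const stable = "tool_error_loop-ECONNREFUSED:5432";
const a = proposalFingerprint("tdd-loop", stable, "add section X\n");
const b = proposalFingerprint("tdd-loop", stable, "  add   section\nX");
console.log(a === b); // true: whitespace differences are normalized away
```

Reflowing the `# Proposed change` text leaves the fingerprint unchanged, but swapping the cluster id in practice yields a different one. That is why a per-run id such as `c1` would defeat the cooldown match across `/reflect` runs.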
+Do NOT hand-compute the hash (an LLM cannot reproduce djb2 reliably). Run the canonical implementation (`computeProposalFingerprint()` in `adam-cooldown.mjs`) via Bash, then write the result into frontmatter: + +```bash +node ~/.claude/adam/scripts/adam-cooldown.mjs --compute \ + --skill --cluster --diff-file +# → {"fingerprint":""} (diff body may also be piped on stdin) +``` + +Both apply-time and analyst-time *gate* checks then invoke `adam-cooldown.mjs --skill --fingerprint `. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`. Backward compat: proposals from before this rubric version (no `proposal_fingerprint` field) are treated as `fingerprint = "legacy"`. The cooldown script matches legacy applied/rejected records against any query fingerprint for the same skill — i.e. coarse-grained gating until those records age out of their windows (7d / 30d). @@ -373,7 +381,7 @@ The skill (`adam-self-improvement/SKILL.md` §1) runs `adam-score.mjs` immediate ## A/B effectiveness -Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`, `reinforcement`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. Schema: +Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. 
**`reinforcement` is the one exception — it is a positive-only ledger and is intentionally NOT A/B-tracked (see §"`reinforcement` proposals"), to avoid skewing regression detection.** Schema: ```json {"applied_at":,"proposal_id":"","proposal_type":"...","target_skill":"","proposal_fingerprint":"","originating_signals":[{"type":"","count":,"session_ids":[...]}],"pre_window_days":7} @@ -498,7 +506,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions) 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied. 4. `blast_radius: high` -5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "94 passed, 0 failed" (or current pass count). The skill runs this test before applying. +5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "126 passed, 0 failed" (or current pass count). The skill runs this test before applying. 6. Change is surgical: ≤30 LOC diff, single file. 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file. @@ -552,7 +560,7 @@ source_entries: - "" - "..." 
# skill_edit / skill_new — required for cooldown gate (see "Per-(skill, fingerprint) cooldown" below) -proposal_fingerprint: "" +proposal_fingerprint: "" target_skill: "" # A/B effectiveness — required on every proposal; consumed at apply time to seed ab-tracking.jsonl originating_signals: diff --git a/hooks/adam-nudge.mjs b/hooks/adam-nudge.mjs index c7ef40a..bb20592 100755 --- a/hooks/adam-nudge.mjs +++ b/hooks/adam-nudge.mjs @@ -33,6 +33,9 @@ const PENDING_CHECK_PATHS = [ "adam/scripts/adam-score.mjs", "adam/scripts/adam-ab-measure.mjs", "adam/scripts/adam-apply-reinforcement.mjs", + "adam/scripts/adam-utils.mjs", + "adam/scripts/adam-batch.mjs", + "adam/scripts/adam-rollback.mjs", "adam/tests/run-tests.sh", ]; diff --git a/hooks/adam-observe.mjs b/hooks/adam-observe.mjs index 05e1dc9..24e2d84 100755 --- a/hooks/adam-observe.mjs +++ b/hooks/adam-observe.mjs @@ -447,6 +447,10 @@ function main() { const emit = (entry) => { if (STRUGGLE_TYPES.has(entry.type)) { entry.context_window = snapshotContext(state); + // Struggle signals carry the active skill set so the analyst can run + // skill-attribution sub-clustering (agents/adam.md §5b) and so silent_drift + // — whose primary cluster key IS active_skills[0] — clusters correctly. + if (entry.active_skills === undefined) entry.active_skills = activeNames(state, "skill"); struggleEmittedThisTurn = entry.type; } appendJournal(entry); diff --git a/skills/adam-self-improvement/SKILL.md b/skills/adam-self-improvement/SKILL.md index ac23f37..810471d 100644 --- a/skills/adam-self-improvement/SKILL.md +++ b/skills/adam-self-improvement/SKILL.md @@ -215,13 +215,13 @@ For each id that passed verification: 8. Add `last_auto_edit: ` to the proposal frontmatter before moving it. 9. Tell user: "skill `` extended (added lines) — auto-applied via win-evidence gate." - Move proposal to `~/.claude/adam/applied/-.md`. 
-- **A/B tracking append**: as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema: +- **A/B tracking append** (skip for `reinforcement` — positive-only ledger, intentionally not A/B-tracked per `agents/adam.md` §"`reinforcement` proposals"): as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema: ```json { "applied_at": , "proposal_id": "", - "proposal_type": "skill_edit|skill_new|memory|nudge|reinforcement", + "proposal_type": "skill_edit|skill_new|memory|nudge", "target_skill": "", "proposal_fingerprint": "", "originating_signals": [{"type":"","count":,"session_ids":[...]}],
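The tracking rule above can be sketched together with the way the measurement is later classified. Only `skill_edit`, `skill_new`, `memory` and `nudge` get a line, and `reinforcement` is skipped. The ±25% cut-offs come from Tests 107 and 108: a change of exactly -25% counts as `improved`, and exactly +25% counts as `regressed`. The function names, the `neutral` label for everything in between, and the zero-baseline handling are assumptions, not the actual behaviour of `adam-ab-measure.mjs`.

```javascript
// Hypothetical sketch of A/B bookkeeping; not the real adam-ab-measure.mjs.
const AB_TRACKED_TYPES = new Set(["skill_edit", "skill_new", "memory", "nudge"]);

// Returns the JSONL line to append to ab-tracking.jsonl, or null when the
// proposal type is not A/B-tracked (reinforcement is a positive-only ledger).
function abTrackingLine(proposal, appliedAtMs) {
  if (!AB_TRACKED_TYPES.has(proposal.proposal_type)) return null;
  return JSON.stringify({
    applied_at: appliedAtMs,
    proposal_id: proposal.proposal_id,
    proposal_type: proposal.proposal_type,
    target_skill: proposal.target_skill,
    proposal_fingerprint: proposal.proposal_fingerprint,
    originating_signals: proposal.originating_signals ?? [],
    pre_window_days: 7, // value shown in the agents/adam.md schema
  });
}

// ASSUMPTION: delta is measured against the pre-window count, with inclusive
// +/-25% cut-offs (as pinned by Tests 107/108) and no rounding.
function classifyAb(preCount, postCount) {
  if (preCount === 0) return { delta_pct: null, status: "insufficient_data" }; // invented
  const delta_pct = ((postCount - preCount) / preCount) * 100;
  if (delta_pct <= -25) return { delta_pct, status: "improved" };
  if (delta_pct >= 25) return { delta_pct, status: "regressed" };
  return { delta_pct, status: "neutral" }; // invented label
}
```

Keeping `reinforcement` out of `AB_TRACKED_TYPES` mirrors the rationale in `agents/adam.md`: tracking a positive-only ledger would skew regression detection.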