feat(v0.6.0): review hardening — live active_skills clustering, computable fingerprints

Full codebase review (multi-agent, adversarially verified) surfaced several documented-but-dead mechanisms and doc/code drift. Fixes: - adam-observe: struggle signals now emit `active_skills`, so silent_drift's primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire — both were silently dead (no struggle signal carried the field). - adam-cooldown: new `--compute` CLI deterministically derives proposal_fingerprint. The exported computeProposalFingerprint() was never called and the analyst was told to hand-compute a djb2 hash it cannot reproduce. Spec now mandates a *stable* cluster id so fingerprints reproduce across /reflect runs. Removed one dead normalization line. - spec: reinforcement proposals excluded from A/B tracking — agents/adam.md contradicted itself (:376 included, :476 excluded); SKILL.md aligned. - adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set (adam-utils / adam-batch / adam-rollback were missing). - adam-explain: synthesized clustering summary carries `regressions: 0` (structural consistency with parsed summaries). - docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed; adam-score header documents severity_sum/severity_by_type; adam-batch §4 reference corrected. Tests: +12 assertions (114 -> 126), all green. New regression tests cover the active_skills fix and --compute, plus boundary gaps the review flagged: retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d blacklist edge.
2026-07-01 02:55:02 +00:00 · 2026-05-29 01:57:44 +01:00
parent 2d9257922f
commit 4b36d6c09e
10 changed files with 219 additions and 20 deletions
@@ -352,10 +352,18 @@ The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not o
 `proposal_fingerprint` is computed deterministically as `djb2(skill_slug + "\n" + signal_cluster_id + "\n" + normalized_diff_body)` returned as base36, where:

 - `skill_slug` — target skill basename (or proposed slug for `skill_new`)
- `signal_cluster_id` — the cluster id you assigned in the clustering trace (e.g. `c1`, `tool_error_loop-ECONNREFUSED:5432`)
- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trailing newlines stripped
+- `signal_cluster_id` — a **stable** cluster id derived from signal type + key (e.g. `tool_error_loop-ECONNREFUSED:5432`), NOT the ephemeral per-run trace id (`c1`). Stability matters: the same logical proposal must hash identically across `/reflect` runs or the cooldown can never match a prior applied/rejected record.
+- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trimmed

-Both apply-time and analyst-time checks invoke `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`.
+Do NOT hand-compute the hash (an LLM cannot reproduce djb2 reliably). Run the canonical implementation (`computeProposalFingerprint()` in `adam-cooldown.mjs`) via Bash, then write the result into frontmatter:
+
+```bash
+node ~/.claude/adam/scripts/adam-cooldown.mjs --compute \
+  --skill <slug> --cluster <signal_cluster_id> --diff-file <file-with-Proposed-change-body>
+# → {"fingerprint":"<djb2_base36>"}   (diff body may also be piped on stdin)
+```
+
+Both apply-time and analyst-time *gate* checks then invoke `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`.

 Backward compat: proposals from before this rubric version (no `proposal_fingerprint` field) are treated as `fingerprint = "legacy"`. The cooldown script matches legacy applied/rejected records against any query fingerprint for the same skill — i.e. coarse-grained gating until those records age out of their windows (7d / 30d).

@@ -373,7 +381,7 @@ The skill (`adam-self-improvement/SKILL.md` §1) runs `adam-score.mjs` immediate

 ## A/B effectiveness

-Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`, `reinforcement`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. Schema:
+Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. **`reinforcement` is the one exception — it is a positive-only ledger and is intentionally NOT A/B-tracked (see §"`reinforcement` proposals"), to avoid skewing regression detection.** Schema:

 ```json
 {"applied_at":<ms>,"proposal_id":"<id>","proposal_type":"...","target_skill":"<slug>","proposal_fingerprint":"<hash>","originating_signals":[{"type":"<signal>","count":<N>,"session_ids":[...]}],"pre_window_days":7}
@@ -498,7 +506,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
 4. `blast_radius: high`
-5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "94 passed, 0 failed" (or current pass count). The skill runs this test before applying.
+5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "126 passed, 0 failed" (or current pass count). The skill runs this test before applying.
 6. Change is surgical: ≤30 LOC diff, single file.
 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.

@@ -552,7 +560,7 @@ source_entries:
  - "<another ts>"
  - "..."
 # skill_edit / skill_new — required for cooldown gate (see "Per-(skill, fingerprint) cooldown" below)
-proposal_fingerprint: "<djb2_base36 hash — computed via computeProposalFingerprint() in adam-cooldown.mjs>"
+proposal_fingerprint: "<djb2_base36 hash — compute via `adam-cooldown.mjs --compute`; see §Per-(skill, fingerprint) cooldown>"
 target_skill: "<slug — populated for skill_edit (basename of target dir) and skill_new (proposed slug)>"
 # A/B effectiveness — required on every proposal; consumed at apply time to seed ab-tracking.jsonl
 originating_signals: