13 Commits

lukaszraczylo 4d1276a73f feat(v0.6.5): execution-grounded skill-utility report (adam-skill-utility)
Ranks skills by good:bad outcome co-occurrence (Wilson LB + lift vs
baseline) over the journal's active_skills payloads — the SkillsInjector
(arXiv 2605.29794) execution-grounded utility signal Δ(s), computed from
data already collected, no training.

- reuses adam-score NEGATIVE_SIGNAL_TYPES + entrySeverity (single source of truth)
- registered in install.sh helper-script copy loop
- /reflect pre-step surfaces worst below-baseline skills to the USER as
  advisory (co-occurrence != causation; not fed to the analyst's proposal machinery)
- Test 119 added; full suite 141/141 green
2026-06-02 01:47:40 +01:00
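Here is a minimal sketch of the ranking idea. It assumes per-skill `good`/`bad` co-occurrence counts have already been tallied from the journal's `active_skills` payloads, and that a journal-wide bad-outcome rate is available as the baseline. Several choices are illustrative assumptions, not the actual `adam-skill-utility` code:

- the function names and input shape;
- z = 1.96;
- ranking on the Wilson lower bound of the *bad* rate, so thinly-evidenced skills don't float to the top.

```javascript
// Hypothetical sketch, not ADAM's actual code: rank skills by how confidently
// they co-occur with bad outcomes, plus lift against a journal-wide baseline.
function wilsonLowerBound(successes, n, z = 1.96) {
  if (n === 0) return 0; // no evidence at all
  const p = successes / n;
  const z2 = z * z;
  const centre = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return (centre - margin) / (1 + z2 / n);
}

// rows: [{ skill, good, bad }] co-occurrence counts (assumed shape).
// baselineBadRate: bad / (good + bad) over all journal outcomes (assumed input).
function rankWorstSkills(rows, baselineBadRate) {
  return rows
    .map((r) => {
      const n = r.good + r.bad;
      const badRate = n === 0 ? 0 : r.bad / n;
      return {
        skill: r.skill,
        n,
        badLB: wilsonLowerBound(r.bad, n), // conservative estimate of the bad-outcome rate
        lift: baselineBadRate > 0 ? badRate / baselineBadRate : 0, // > 1 means worse than baseline
      };
    })
    .sort((a, b) => b.badLB - a.badLB); // most confidently bad first
}
```

Ranking on the lower bound rather than the raw rate matters for small samples. A skill seen once with one bad outcome no longer outranks a skill with eight bad outcomes out of ten. As the commit notes, this is still co-occurrence, not causation.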
lukaszraczylo c23b09cc09 fix(v0.6.4): rollback removes the proposal's ab-tracking entry
adam-rollback.mjs's docstring always claimed it "removes the ab-tracking entry
(so it doesn't re-trigger)", but executeRollback() never did. Consequence: a
rolled-back proposal kept being re-detected as `regressed` on every subsequent
/reflect, which triggered endless `not_found` rollback attempts (the applied
file is already gone) and noisy ## Regressions sections.

executeRollback now deletes the matching ab-tracking.jsonl row by proposal_id
after the move, preserving all unrelated rows. Surfaced by running ADAM's own
/reflect loop a second time (two zombie regressions: 2026-05-16-002 and
2026-05-22-001).

Tests: 138 -> 140 (rollback purges the entry by id; an unrelated entry is
preserved).
2026-05-29 13:50:38 +01:00
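The fix amounts to a filter over JSONL rows. The sketch below assumes one JSON object per line with a `proposal_id` field. It works on the file's text so it can be exercised without touching disk. The function name is hypothetical and does not describe `executeRollback()`'s real internals.

```javascript
// Hypothetical sketch: drop only the ab-tracking rows whose proposal_id matches,
// keeping every other line verbatim.
function purgeAbTrackingEntry(jsonlText, proposalId) {
  const kept = [];
  let removed = 0;
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines and the trailing newline
    let row;
    try {
      row = JSON.parse(line);
    } catch {
      kept.push(line); // preserve unparseable lines rather than silently losing them
      continue;
    }
    if (row.proposal_id === proposalId) {
      removed++;
    } else {
      kept.push(line);
    }
  }
  return { text: kept.length ? kept.join("\n") + "\n" : "", removed };
}
```

A caller would read `ab-tracking.jsonl`, pass its contents through the function, and write `text` back only when `removed > 0`.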
lukaszraczylo fcddb6bf79 feat(v0.6.3): release-update notifier (notify-only, SessionStart)
Adds a lightweight "new release available" notice without auto-installing —
because re-running install.sh overwrites ADAM's own /reflect-applied skill
edits, so the user must choose when to take an update.

- install.sh writes ~/.claude/adam/.version (the installed release tag) on
  every install. Derived from $VERSION / piped REF / `git describe --tags`.
- adam-nudge.mjs (SessionStart) compares .version against the latest GitHub
  release at most once/day. Cached in ~/.claude/adam/.update-check.json; the
  cache drives an instant nudge (no network on the hot path) and is refreshed
  best-effort with a 1.5s AbortController cap. fetch unavailable / offline /
  timeout / rate-limit / parse error all degrade to silent no-op. Opt out with
  ADAM_NO_UPDATE_CHECK=1. main() is now async; never blocks SessionStart.
- README: "Staying up to date" section; pin example bumped to v0.6.3.

Tests: 134 -> 138. Notifier verified fully offline (cache-driven): nudges when
a newer release is cached, silent when current, suppressed by the opt-out env,
and no-ops when the .version marker is absent.
2026-05-29 13:13:59 +01:00
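The notifier has three moving parts, sketched below under some assumptions:

- The cache stores a `checkedAt` epoch-ms timestamp. The real `.update-check.json` shape isn't documented here.
- Tags look like `v0.6.3`.
- The latest release is read from GitHub's REST endpoint `https://api.github.com/repos/lukaszraczylo/claude-adam/releases/latest`, whose JSON includes `tag_name`.

Helper names are hypothetical.

```javascript
// Hypothetical sketch of a once-a-day, notify-only update check.
const DAY_MS = 24 * 60 * 60 * 1000;

// Compare "v0.6.3"-style tags numerically; true if `latest` is newer than `installed`.
function isNewer(latest, installed) {
  const parse = (t) => t.replace(/^v/, "").split(".").map((n) => parseInt(n, 10) || 0);
  const a = parse(latest);
  const b = parse(installed);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

// The cached tag can drive the nudge immediately; the network is only touched when stale.
function needsRefresh(cache, now = Date.now()) {
  return !cache || typeof cache.checkedAt !== "number" || now - cache.checkedAt >= DAY_MS;
}

// Best-effort fetch capped at 1.5 s; every failure mode resolves to null (silent no-op).
async function fetchLatestTag(url, timeoutMs = 1500) {
  if (typeof fetch !== "function") return null; // fetch unavailable
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) return null; // e.g. rate-limited
    const body = await res.json();
    return typeof body.tag_name === "string" ? body.tag_name : null;
  } catch {
    return null; // offline, timeout/abort, or JSON parse error
  } finally {
    clearTimeout(timer);
  }
}
```

A SessionStart hook built this way would nudge from the cached tag right away. Only when `needsRefresh()` is true would it start `fetchLatestTag()`. A `null` result simply leaves the cache untouched.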
lukaszraczylo d929101af4 fix(v0.6.2): A/B volume normalization + memory frontmatter schema
Two issues surfaced by running ADAM's /reflect loop on a large real journal
(4015 entries, 119 sessions) — both caused false/broken auto-apply behavior.

1. A/B over-reported regressions (adam-ab-measure.mjs).
   Regressions were measured on RAW originating-signal counts pre vs post. On a
   busy, growing journal almost every signal count rises post-apply regardless
   of whether the proposal helped — so the loop flagged 9 false "regressions"
   (and would auto-roll-back good proposals). Now the delta is computed on the
   signal's SHARE of total activity (rate = count / window-total). Falls back to
   the raw-count delta when the signal is the only activity in the window
   (preserves prior behavior + all existing A/B tests). Output adds
   raw_delta_pct, pre_total, post_total, normalized for transparency.

2. Memory frontmatter drift (agents/adam.md, SKILL.md).
   The drafting protocol emitted flat `type:`/`originSessionId:` with a prose
   `name`, but the live auto-memory store uses `name` = slug plus a
   `metadata: {node_type, type, originSessionId}` block. Auto-applied memories
   could fail to load/categorize. Protocol + apply-time validation now require
   the live metadata.* schema and cross-checking against an existing file.

Tests: 132 -> 134. New: volume growth (raw +200%) with flat activity-share
classifies neutral, not regressed; a genuine share increase still classifies
regressed.
2026-05-29 12:37:10 +01:00
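Here is a minimal sketch of the share-normalized classification. The ±25% threshold comes from the boundary tests mentioned under v0.6.0. The following are assumptions:

- treating exactly ±25% as a change;
- falling back to raw counts only when the signal is the sole activity in *both* windows;
- the function name.

The `pending` status (post window not yet complete) is left out.

```javascript
// Hypothetical sketch of share-normalized A/B classification.
const THRESHOLD_PCT = 25;

function pctChange(pre, post) {
  return ((post - pre) / pre) * 100;
}

function classifyAb({ preCount, postCount, preTotal, postTotal }) {
  if (preCount === 0 || preTotal === 0) return { status: "no_baseline" };
  const rawDeltaPct = pctChange(preCount, postCount);
  // If the signal is the only activity, its share is pinned at 1.0: use raw counts instead.
  const onlyActivity = preCount === preTotal && postCount === postTotal;
  const normalized = !onlyActivity && postTotal > 0;
  const deltaPct = normalized
    ? pctChange(preCount / preTotal, postCount / postTotal) // change in share of activity
    : rawDeltaPct;
  let status = "neutral";
  if (deltaPct >= THRESHOLD_PCT) status = "regressed";
  else if (deltaPct <= -THRESHOLD_PCT) status = "improved";
  return {
    status,
    delta_pct: deltaPct,
    raw_delta_pct: rawDeltaPct,
    pre_total: preTotal,
    post_total: postTotal,
    normalized,
  };
}
```

Take the commit's volume-growth case. Raw counts triple, but total activity triples too, so the result shows `raw_delta_pct: 200` alongside a 0% share delta. The status is therefore `neutral`, not `regressed`.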
lukaszraczylo 3a54d7d3e1 feat(v0.6.1): file_reread signal — catch offset-shifted same-file re-reads
Proposed and approved through ADAM's own /reflect harness_edit loop (MOSS §1):
the analyst surfaced 23 tool_error_loop entries across 4 sessions whose context
windows were really redundant re-reads of one file.

retry_loop keys on argsHash of the full tool_input (including offset/limit), so
consecutive Reads of the SAME file at different offsets escaped dedup and leaked
into tool_error_loop fingerprints. The new file_reread signal catches them:
same file Read >=3x in the 10-event window, offset-agnostic (keyed on file
path), guarded by `sameToolArgs < RETRY_THRESHOLD` so byte-identical reads stay
with retry_loop (no double-count).

Fully wired end-to-end (not a half-dead signal):
- adam-observe.mjs: detection + STRUGGLE_TYPES membership (so it carries
  context_window + active_skills like other struggle signals).
- adam-window.mjs: 14-day sliding window (task-local, like retry_loop).
- adam-score.mjs: severity divisor 3.
- adam-batch.mjs: file-basename clustering.
- agents/adam.md + README: signal tables, clustering rules, rubric, windows.

Tests: 126 -> 132 (file_reread fires on 3x offset-shifted reads, not on 2x;
byte-identical reads route to retry_loop not file_reread; carries context_window).
2026-05-29 11:31:50 +01:00
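Here is a sketch of the detection rule as described. It assumes Claude Code's `Read` tool input (`file_path`, `offset`, `limit`). It uses `JSON.stringify` of the input as a stand-in for `retry_loop`'s `argsHash`. Thresholds mirror the text: 3 reads, a 10-event window, and a `RETRY_THRESHOLD` of 3. The event shape and function name are hypothetical.

```javascript
// Hypothetical sketch of offset-agnostic re-read detection over a 10-event window.
const WINDOW = 10;
const RETRY_THRESHOLD = 3; // retry_loop owns 3+ byte-identical calls
const REREAD_THRESHOLD = 3;

function detectFileReread(events) {
  const recent = events.slice(-WINDOW);
  const last = recent[recent.length - 1];
  if (!last || last.tool !== "Read") return null;
  const path = last.input.file_path;
  // Keyed on the file path only, so different offset/limit values still match.
  const sameFile = recent.filter((e) => e.tool === "Read" && e.input.file_path === path);
  // Byte-identical reads (same offset/limit too) are left to retry_loop: no double-count.
  const key = JSON.stringify(last.input);
  const sameToolArgs = sameFile.filter((e) => JSON.stringify(e.input) === key).length;
  if (sameFile.length >= REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
    return { type: "file_reread", file: path, count: sameFile.length };
  }
  return null;
}
```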
lukaszraczylo 4b36d6c09e feat(v0.6.0): review hardening — live active_skills clustering, computable fingerprints
Full codebase review (multi-agent, adversarially verified) surfaced several
documented-but-dead mechanisms and doc/code drift. Fixes:

- adam-observe: struggle signals now emit `active_skills`, so silent_drift's
  primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric
  bonus) actually fire — both were silently dead (no struggle signal carried
  the field).
- adam-cooldown: new `--compute` CLI deterministically derives
  proposal_fingerprint. The exported computeProposalFingerprint() was never
  called and the analyst was told to hand-compute a djb2 hash it cannot
  reproduce. Spec now mandates a *stable* cluster id so fingerprints reproduce
  across /reflect runs. Removed one dead normalization line.
- spec: reinforcement proposals excluded from A/B tracking — agents/adam.md
  contradicted itself (:376 included, :476 excluded); SKILL.md aligned.
- adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set
  (adam-utils / adam-batch / adam-rollback were missing).
- adam-explain: synthesized clustering summary carries `regressions: 0`
  (structural consistency with parsed summaries).
- docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed;
  adam-score header documents severity_sum/severity_by_type; adam-batch §4
  reference corrected.

Tests: +12 assertions (114 -> 126), all green. New regression tests cover the
active_skills fix and --compute, plus boundary gaps the review flagged:
retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d
blacklist edge.
2026-05-29 01:57:44 +01:00
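A deterministic `--compute` path helps because djb2 is easy to state but tedious to reproduce by hand, while code gets it identical on every run. The sketch below is not `computeProposalFingerprint()` itself. Which fields feed the hash (type, target skill, signal type, stable cluster id) and how they are normalized are assumptions.

```javascript
// Hypothetical sketch of a reproducible djb2 proposal fingerprint.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    // hash * 33 + charCode, reduced to an unsigned 32-bit integer
    hash = (Math.imul(hash, 33) + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

function sketchProposalFingerprint({ type, targetSkill, signalType, clusterId }) {
  // Normalize first so cosmetic differences (case, stray spaces) don't change the id.
  const canonical = [type, targetSkill, signalType, clusterId]
    .map((s) => String(s ?? "").trim().toLowerCase())
    .join("|");
  return djb2(canonical).toString(16).padStart(8, "0");
}
```

The result is only stable if every input is stable. That is why the spec now insists on a stable cluster id rather than one regenerated per `/reflect` run.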
lukaszraczylo 2d9257922f docs: update README for v0.5.0 release — MOSS-grounded improvements
2026-05-24 11:19:15 +01:00
lukaszraczylo 440fb52eb1 feat: apply MOSS-grounded self-evolution improvements to ADAM
Implements 7 improvements grounded in MOSS paper (arXiv 2605.22794):

1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs
   captures last 8 events around struggle signals as context_window.

2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed
   journal entries into coherent failure batches by (signal_type, cluster_key).

3. Multi-stage analysis (§3.3): SKILL.md dispatches adam agent in two
   stages (diagnose+plan → implement) with inter-stage validation gate.

4. Pre-apply verification (§3.4): 4-check deterministic gate before
   auto-apply (source entries exist, diagnosis grounded, type-evidence
   match, no conflicting recent proposals).

5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals
   detected by A/B measurement, creates regression nudges.

6. Harness self-modification (§1 Table 1): new harness_edit proposal type
   targeting adam's own scripts with stricter gates (confidence≥5, never
   auto-apply, test-suite-gated).

7. Keypoint matrix evaluation (§4.2): 5 capability dimensions
   (tool_selection, scope_discipline, error_recovery, first_attempt,
   build_reliability) scored per batch for structured evaluation.

Test suite: 94 → 114 tests (20 new), all passing.
2026-05-24 11:15:32 +01:00
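Item 2 (evidence batching) reduces to a group-by. The minimal sketch below assumes journal entries carry `type` and `session_id`, and optionally `file`, `fingerprint`, or `active_skills`. The cluster-key precedence is a hypothetical simplification of `adam-batch.mjs`'s rules: file basename first, then error fingerprint, then first active skill.

```javascript
// Hypothetical sketch: group windowed journal entries by (signal_type, cluster_key).
function clusterKey(entry) {
  if (entry.file) return entry.file.split("/").pop(); // file-basename clustering
  if (entry.fingerprint) return entry.fingerprint; // normalized error fingerprint
  return entry.active_skills?.[0] ?? "unattributed"; // skill attribution
}

function batchEntries(entries) {
  const batches = new Map();
  for (const e of entries) {
    const ck = clusterKey(e);
    const key = `${e.type}::${ck}`;
    if (!batches.has(key)) {
      batches.set(key, { signal_type: e.type, cluster_key: ck, entries: [], sessions: new Set() });
    }
    const b = batches.get(key);
    b.entries.push(e);
    b.sessions.add(e.session_id); // distinct sessions feed the cross-session rubric check
  }
  // Largest batches first, so the analyst sees the strongest evidence first.
  return [...batches.values()].sort((x, y) => y.entries.length - x.entries.length);
}
```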
lukaszraczylo a48c705c0a feat(adam): smarter signals & clustering
- New signal types in hooks/adam-observe.mjs:
  - silent_drift: 5 consecutive read-only PostToolUse without an action tool
  - error_after_recovery: same error fingerprint returns within 5 events of clean_recovery
- Severity-weighted scoring in adam/scripts/adam-score.mjs:
  - SEVERITY_DIVISORS exported per struggle signal type
  - Per-session severity_sum + severity_by_type added to JSON output
- Skill-attribution clustering in agents/adam.md:
  - Sub-cluster struggle signals on active_skills[0]
  - New struggle-driven skill_edit variant (always queues, never auto-applies)
- Rubric updates:
  - +1 for cluster severity-sum >= 10, additional +1 for >= 32
  - +1 for skill-attributed sub-cluster naming an existing skill
  - silent_drift + error_after_recovery added to struggle signal list
- Window: silent_drift 14d, error_after_recovery 30d
- Tests: 94 passing (78-82 new)

Backward compat: entries without count default to severity 1. Existing
win-driven skill_edit gate untouched. No journal migration.
2026-05-13 19:21:59 +01:00
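Here is one way severity could roll up into the rubric bonus. The commit supplies the +1/+1 thresholds at 10 and 32 and the severity-1 default for entries without `count`. The only divisor shown (`file_reread: 3`) comes from v0.6.1. The per-entry formula `max(1, floor(count / divisor))` is an assumption. The names are illustrative and are not the real `SEVERITY_DIVISORS` or `entrySeverity`.

```javascript
// Hypothetical sketch of severity-weighted cluster scoring.
const DIVISORS = { file_reread: 3 }; // other signal types default to divisor 1 here

function severityOf(entry) {
  if (typeof entry.count !== "number") return 1; // backward compat: no count => severity 1
  const divisor = DIVISORS[entry.type] ?? 1;
  return Math.max(1, Math.floor(entry.count / divisor));
}

function severityBonus(clusterEntries) {
  const sum = clusterEntries.reduce((s, e) => s + severityOf(e), 0);
  // Rubric: +1 at a cluster severity-sum >= 10, an additional +1 at >= 32.
  return (sum >= 10 ? 1 : 0) + (sum >= 32 ? 1 : 0);
}
```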
lukaszraczylo a8883aa8b7 fix(logo): explicit light/dark variants + <picture> for GitHub
The prior logo.svg used currentColor, which resolves to black when the
SVG is loaded via <img> on GitHub — making the logo invisible in dark
mode (the GitHub default for many users).

Fix uses GitHub's supported <picture> + prefers-color-scheme media-
source pattern in README:

- assets/logo-light.svg — explicit GitHub light-theme text color #24292f
- assets/logo-dark.svg  — explicit GitHub dark-theme text color #f0f6fc
- assets/logo.svg       — kept with embedded @media + currentColor for
                          standalone use (markmorph notes, anywhere
                          else the SVG is loaded outside <picture>)

README updates the <img> tag to a <picture> with media-conditioned
source so GitHub's renderer picks the right variant per theme.
2026-05-13 02:07:11 +01:00
lukaszraczylo 7ed2aecdfa docs(logo): swap to swaddled-baby design with hands
Replaces the geometric-A-with-observation-dot with a softer, more
on-theme design: a swaddled-baby silhouette (rounded A-shape bundle),
face nestled inside, and the wrap-band extended past the bundle on
both sides as little hands. Maintains currentColor + zero external
assets; reads cleanly down to favicon size.

Ties the visual identity to the 'Story behind Adam' section: the
project is named after the author's son, and now the logo is too.
2026-05-13 02:02:02 +01:00
lukaszraczylo a30f8b1158 docs: replace ASCII pipeline diagram with mermaid flowchart
GitHub renders mermaid natively. Diagram now shows three subgraphs
(Observation → Analysis → Review + apply) with a nested Pre-processors
subgraph inside Analysis. Includes:

- Dotted edge labeled 'user runs /reflect' marking the observe→analyze
  boundary.
- Diamond gate node for auto-apply decision (conf≥4 · low blast ·
  cooldown cool) with explicit yes/no branches.
- Feedback loop: applied/ entries measure back into adam-ab-measure.mjs
  on subsequent reflects.
- Color-coded classDef for stores (blue), processes (orange), and the
  clustering trace artifact (purple).

ASCII art retired — diagram now legible at any zoom on github.com.
2026-05-13 01:54:38 +01:00
lukaszraczylo d3e4350d71 docs: modernize README + add SVG logo + inspiration story
- New 'Story behind Adam' section at the top: the project is named after
  the author's newborn son, whose observe-act-adjust-observe-again
  learning loop is the methodology ADAM applies to LLM sessions.
- New SVG logo at assets/logo.svg: stylized 'A' with a captured
  observation point inside the apex and a feedback crossbar. Uses
  currentColor + gradient so it adapts to light/dark GitHub themes.
- Centered header block with project tagline + 5 badges (License,
  Version, Tests, Node, Platform).
- New 'Highlights' section: 8 emoji-tagged one-liners covering the
  v0.3.3 design pillars (zero LLM cost observation, A/B measurement,
  sliding windows, observability, etc.).
- New 'How it works' ASCII pipeline diagram: observation -> analysis
  pre-processors -> analyst -> review + apply.
- Signals table now includes per-signal sliding window column.
- Rubric section restructured: gates, modifiers (dampener), and
  skill_edit-specific requirements clearly separated.
- New 'Inspecting the analyst's reasoning' section documenting
  adam-explain.mjs + /reflect --explain.
- Layout updated for v0.3.3 state files (active-nudges.json,
  ab-tracking.jsonl, reinforcements.jsonl, last-trace.txt) and all
  9 new helper scripts under adam/scripts/.
- Test count: 27 -> 87.
- Closing line crediting Adam.
2026-05-13 01:50:59 +01:00
18 changed files with 2464 additions and 186 deletions
+281 -128
@@ -1,159 +1,298 @@
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="./assets/logo-dark.svg">
<img src="./assets/logo-light.svg" alt="claude-adam logo" width="128" height="128" />
</picture>
# claude-adam
**A self-improvement layer for [Claude Code](https://claude.com/claude-code).**
Watches the friction in your coding sessions, clusters the signals via an LLM analyst, and proposes targeted improvements — new skills, memory entries, agent edits — that you review and apply.
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
[![Tests](https://img.shields.io/badge/tests-140%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
[![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()

</div>
---
## The story behind Adam

Adam is my newborn son.

Watching him over the last few months — the way he observes the world, tries something, watches what happens, adjusts, and tries again — I realised that the most powerful learning loop in nature is also one of the simplest. No grand theory. No instruction manual. Just relentless feedback and pattern recognition, applied to every waking moment.

LLMs can learn the same way. Give them a hook into the real friction of your work — the corrections, the dead-ends, the moments you say *"no, try again"* — and let them propose improvements grounded in **what actually happened**. Not what they assume might help. What you actually struggled with.

**claude-adam** is that loop, wired into Claude Code. It's named after Adam because the methodology is his.

---
## Highlights
- 🔍 **Zero LLM cost at observation time.** Deterministic regex + counter detection in a Node hook. The analyst only runs when you invoke `/reflect`.
- 📡 **14 signal types.** Friction (`correction`, `tool_error_loop`, `dead_end`, `edit_churn`, …) + reinforcement (`task_completed`, `correction_free_streak`, `clean_recovery`) + meta.
- 🛡️ **Tight auto-apply gates.** Confidence ≥ 4, cross-session evidence, contradiction veto, per-(skill, fingerprint) cooldown. Most things queue for your manual review.
- 📊 **A/B effectiveness measurement.** Every auto-applied edit gets a 7-day pre/post delta on its originating signal's share of total activity, so plain journal growth isn't mistaken for a regression. If a proposed fix made things worse, the next `/reflect` says so.
- **Per-signal sliding windows.** Stale friction doesn't accumulate forever. `dead_end` 7d, `correction` 30d, reinforcement signals 60d.
- 🔬 **Observable.** Every clustering decision (passed / threshold-blocked / window-filtered / contradiction-vetoed) emits a trace. `/reflect --explain` shows it.
- 📦 **Pure Node.** Zero npm dependencies. Runs on macOS and Linux (Alpine smoke-tested).
## Quick start
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash
```
The installer copies files into `~/.claude/`, offers to merge ADAM's hook entries into `~/.claude/settings.json` (with a diff preview and `[y/N]` confirm), and preserves any local edits via `.adam-new` sidecar files. Pass `--yes` to skip prompts, `--dry-run` to preview.
Then:
```sh
bash ~/.claude/adam/tests/run-tests.sh # expect: 140 passed, 0 failed
# … start a fresh Claude Code session …
/reflect # walks the proposal queue
/reflect --explain # also shows the analyst's clustering trace
```
Pin a release for reproducibility:
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.6.3/install.sh \
| VERSION=v0.6.3 bash
```
### Staying up to date
`install.sh` records the installed release in `~/.claude/adam/.version`. The
SessionStart hook (`adam-nudge.mjs`) then checks the latest GitHub release **at
most once a day** (cached in `~/.claude/adam/.update-check.json`, network call
hard-capped at 1.5 s, fully best-effort — it never blocks or slows session
start). When a newer release exists it prints a one-line, **notify-only** prompt:
```
[adam] update available: v0.6.3 → v0.6.4. Apply: curl -fsSL …/install.sh | bash
(re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)
```
It is deliberately **not** auto-applied: re-running `install.sh` overwrites
ADAM's own `/reflect`-applied skill edits, so you decide when to take an update.
Disable the check entirely with `ADAM_NO_UPDATE_CHECK=1` in your environment.
## How it works
```mermaid
flowchart TB
subgraph OBS["Observation (deterministic, in-hook, zero LLM cost)"]
direction LR
EV["Tool event /<br/>user prompt"] --> OBSERVE["adam-observe.mjs<br/><sub>regex · counters · ring buffers</sub>"]
OBSERVE --> JOURNAL[("journal.jsonl<br/><sub>append-only signal log</sub>")]
end
JOURNAL -. user runs <code>/reflect</code> .-> ANALYSIS
subgraph ANALYSIS["Analysis (LLM, only on demand)"]
direction TB
subgraph PRE["Pre-processors (deterministic)"]
direction LR
W["adam-window.mjs<br/><sub>per-signal sliding window</sub>"]
S["adam-score.mjs<br/><sub>task_completed dampener<br/>+ reinforcement candidates</sub>"]
AB["adam-ab-measure.mjs<br/><sub>7d pre/post deltas<br/>on prior auto-applies</sub>"]
end
AGENT["adam subagent<br/><sub>cluster · score · diagnose</sub>"]
PRE --> AGENT
AGENT --> PROPOSALS[("proposals/")]
AGENT --> TRACE[["clustering trace<br/><sub>adam-explain.mjs renders</sub>"]]
end
PROPOSALS --> REVIEW
subgraph REVIEW["Review + apply"]
direction TB
GATE{"auto-apply<br/>gates pass?<br/><sub>conf≥4 · low blast<br/>· cooldown cool</sub>"}
GATE -->|yes| APPLIED[("applied/<br/>+ ab-tracking.jsonl")]
GATE -->|no| QUEUE["walk-the-queue<br/><sub>approve · reject · edit</sub>"]
QUEUE -->|approve| APPLIED
QUEUE -->|reject| REJECTED[("rejected/")]
end
APPLIED -. measures back into .-> AB
classDef store fill:#e8f4fd,stroke:#5b9bd5,stroke-width:2px,color:#1f3a5f
classDef proc fill:#fff4e6,stroke:#e8a33d,stroke-width:1px,color:#5a3d0f
classDef trace fill:#f0e8fd,stroke:#7e5dc0,stroke-width:1px,color:#2f1e60
class JOURNAL,PROPOSALS,APPLIED,REJECTED store
class EV,OBSERVE,W,S,AB,AGENT,QUEUE proc
class TRACE trace
```
The observation layer is a ~600-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`.
The analysis layer is an LLM subagent invoked by `/reflect`. Before the analyst runs, three deterministic pre-processors filter and enrich the journal: `adam-window.mjs` drops entries older than their signal's sliding window, `adam-score.mjs` computes per-session urgency dampeners + reinforcement candidates, and `adam-ab-measure.mjs` checks whether previously auto-applied edits actually reduced their originating signal.
The analyst clusters signals, scores them against a deterministic rubric (see below), and emits proposal markdown files to `~/.claude/adam/proposals/`. Each proposal carries a `# Diagnosis` block (Trigger / Action / Mismatch / Outcome with a verbatim transcript quote), a `# Success criterion`, and the source journal-entry timestamps it clustered.
Auto-apply runs only for low-blast types (memory entries, new skills, ephemeral nudges, reinforcement logs) backed by cross-session evidence; nudges are the one exception, since they are single-session by design. Everything else queues for your manual approve / reject / edit walk.
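The sliding-window step can be sketched as a per-type age filter. Window lengths mirror the Signals table, plus `silent_drift` 14d and `error_after_recovery` 30d from the changelog. Several details are assumptions, not `adam-window.mjs`'s exact behavior:

- entry shape: `type` plus an ISO-8601 `ts`;
- a 30-day default for unknown types;
- dropping unparseable or future timestamps.

```javascript
// Hypothetical sketch of a per-signal sliding-window filter.
const WINDOW_DAYS = {
  correction: 30, retry_loop: 14, weak_agent: 30, tool_error_loop: 30,
  dead_end: 7, edit_churn: 14, file_reread: 14, build_loop: 30,
  subagent_dispatch_pattern: 30, silent_drift: 14, error_after_recovery: 30,
  correction_free_streak: 60, clean_recovery: 60, task_completed: 60,
};
const DAY_MS = 86_400_000;

function filterByWindow(entries, now = Date.now(), defaultDays = 30) {
  return entries.filter((e) => {
    const days = WINDOW_DAYS[e.type] ?? defaultDays;
    const age = now - Date.parse(e.ts); // NaN for unparseable ts, so the entry is dropped
    return age >= 0 && age <= days * DAY_MS;
  });
}
```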
## Signals
| Signal | Trigger | Window* |
|---|---|---|
| `correction` | Strong tokens (`stop`, `wrong`, `undo`, …) OR weak tokens (`no`, `actually`, `wait`) with negation/contrast nearby | 30d |
| `retry_loop` | Same tool + same args called 3× in a 10-event window | 14d |
| `weak_agent` | Same subagent dispatched 2× in last 5 tool calls | 30d |
| `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
| `edit_churn` | Same file edited 4× in a window | 14d |
| `file_reread` | Same file Read ≥3× in the 10-event window, ignoring offset/limit (catches re-reads that escape `retry_loop`'s arg-hash dedup) | 14d |
| `build_loop` | 2× build/test/compile commands fail in same session | 30d |
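| `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
| `silent_drift` | 5 consecutive read-only PostToolUse events without an action tool | 14d |
| `error_after_recovery` | Same error fingerprint returns within 5 events of a `clean_recovery` | 30d |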
| `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
| `clean_recovery` | 3 clean PostToolUse events after a struggle signal — reinforcement input | 60d |
| `task_completed` | 5 tools / 3 kinds / 0 corrections — fed into the urgency dampener + reinforcement candidates | 60d |
\* Per-signal sliding window for `/reflect` analysis. See `SIGNAL_WINDOWS_DAYS` in `adam/scripts/adam-window.mjs`.
Detection is local, regex-based, zero LLM cost. Signals append to `~/.claude/adam/journal.jsonl`.
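Here is a sketch of the strong/weak split for `correction`. The strong and weak token lists are partial, since the table elides the rest with `…`. The contrast word list and the 8-token distance are assumptions chosen for illustration.

```javascript
// Hypothetical sketch of a strong/weak correction matcher.
const STRONG = new Set(["stop", "wrong", "undo"]);
const WEAK = new Set(["no", "actually", "wait"]);
const CONTRAST = new Set(["not", "don't", "dont", "isn't", "instead", "but", "never"]);
const MAX_DISTANCE = 8; // tokens on either side of a weak token

function isCorrection(prompt) {
  const tokens = prompt.toLowerCase().match(/[a-z']+/g) ?? [];
  if (tokens.some((t) => STRONG.has(t))) return true; // strong tokens fire on their own
  return tokens.some((t, i) => {
    if (!WEAK.has(t)) return false;
    // A weak token only counts when negation/contrast appears nearby.
    const lo = Math.max(0, i - MAX_DISTANCE);
    const hi = Math.min(tokens.length - 1, i + MAX_DISTANCE);
    for (let j = lo; j <= hi; j++) {
      if (j !== i && CONTRAST.has(tokens[j])) return true;
    }
    return false;
  });
}
```

This is how the `"actually, I think..."` false positive is avoided. The weak token is present, but nothing nearby negates or contrasts with it.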
## Auto-apply rubric
```
Sum:
+2 Signal repeated ≥ 3× across ≥ 2 sessions (within signal's window)
+2 Struggle signal appearing ≥ 1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥ 2 distinct struggle types in same session)
-1 Type-bias penalty (≥ 3 rejections, applied:rejected < 1:2)
+1 Blast radius low (memory or new isolated skill)
0 Blast radius medium (new agent, new hook, edit existing skill)
-1 Blast radius high (CLAUDE.md, settings hooks, edit agent, deletion)
+1 Surgical (one file, ≤ 50 LOC for non-skill_new; ≤ 80 LOC for skill_new)
-3 Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions)
```
Modifiers applied at scoring time:
- × `dampener` from `adam-score.mjs` (0.5 / 0.75 / 1.0 based on session's `task_completed` count) — sessions that net-succeeded score lower urgency.
`auto_apply_eligible` requires **all** of:
- `confidence ≥ 4`
- `blast_radius == low`
- `type ∈ {memory, skill_new, nudge, reinforcement}` (or `skill_edit` via the win-driven gate)
- `cross_session_evidence == true` (except `nudge`, which is single-session by design)
- `adam-cooldown.mjs` returns `cool` for `(target_skill, proposal_fingerprint)`
- `contradiction_flag` unset
`skill_edit` additionally requires:
- Win-signal evidence (`correction_free_streak` / `clean_recovery` cites target skill)
- Diff is append-only, ≤ 30 LOC, resulting size ≤ 2× original
- No auto-edit to same target in past 7 days (per-fingerprint cooldown)
- No rejection-blacklist on target in past 30 days
- `# Diagnosis` section present + structurally valid
Everything else queues.
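The eligibility list above is a plain conjunction. In the sketch below, the field names on an already-scored proposal object are hypothetical. `confidence` is assumed to be the rubric sum after the dampener has been applied.

```javascript
// Hypothetical sketch of the auto_apply_eligible check.
const AUTO_TYPES = new Set(["memory", "skill_new", "nudge", "reinforcement"]);

// Extra gate for win-driven skill_edit proposals.
function skillEditGateOk(p) {
  return (
    p.winSignalCitesTarget === true &&       // correction_free_streak / clean_recovery cites the skill
    p.appendOnly === true &&
    p.diffLoc <= 30 &&
    p.newBytes <= 2 * p.originalBytes &&     // resulting size <= 2x original
    p.editedInLast7Days === false &&
    p.blacklistedInLast30Days === false &&
    p.hasValidDiagnosis === true
  );
}

function autoApplyEligible(p) {
  const typeOk = AUTO_TYPES.has(p.type) || (p.type === "skill_edit" && skillEditGateOk(p));
  return (
    p.confidence >= 4 &&
    p.blast_radius === "low" &&
    typeOk &&
    (p.cross_session_evidence === true || p.type === "nudge") && // nudges are single-session
    p.cooldown === "cool" &&                 // adam-cooldown.mjs verdict
    !p.contradiction_flag
  );
}
```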
## Lifecycle: from signal to permanent improvement
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. The result:
- `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads `applied/` + `rejected/` frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
Auto-applied proposals additionally append to `~/.claude/adam/ab-tracking.jsonl`. The next time `/reflect` runs (and 7+ days have passed), `adam-ab-measure.mjs` computes a pre/post delta of the originating signal's share of total activity in each window, falling back to the raw count when that signal is the only activity. Status: `improved` / `neutral` / `regressed` / `no_baseline` / `pending`. Regressions surface at the top of the analyst's output so a bad fix doesn't quietly persist.
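The exclusion step can be sketched as a set lookup. The sketch assumes frontmatter from `applied/` and `rejected/` has already been parsed into objects with a `source_entries` array. It also assumes journal entries expose the matching timestamp as `ts`. Both shapes are assumptions.

```javascript
// Hypothetical sketch: skip journal entries already actioned by an applied/rejected proposal.
function buildExcludedSet(actionedFrontmatter) {
  const excluded = new Set();
  for (const fm of actionedFrontmatter) {
    for (const ts of fm.source_entries ?? []) excluded.add(ts);
  }
  return excluded;
}

function activeEntries(journal, actionedFrontmatter) {
  const excluded = buildExcludedSet(actionedFrontmatter);
  return journal.filter((e) => !excluded.has(e.ts));
}
```

Because the set is rebuilt on every run, changing a rule simply re-evaluates whatever remains active. No cursor has to be rewound.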
## Inspecting the analyst's reasoning
Every `/reflect` run also writes the analyst's clustering trace to `~/.claude/adam/last-trace.txt`. The trace records, per cluster: signal type, occurrence count, sessions, which gates passed or failed, and whether the cluster produced a proposal or was skipped (with reason: `threshold` / `cross_session` / `window` / `contradiction` / `other`).
```sh
node ~/.claude/adam/scripts/adam-explain.mjs --mode summary # SUMMARY + per-decision counts
node ~/.claude/adam/scripts/adam-explain.mjs --mode full # verbatim trace + rejection histogram
node ~/.claude/adam/scripts/adam-explain.mjs --mode json # machine-readable
```
Or pass `--explain` to `/reflect` to render the full trace inline.
## What it will not do
- 🚫 No background LLM spend. The analyst runs only when you invoke `/reflect`.
- 🚫 No retroactive transcript mining beyond the journal.
- 🚫 No hard `rm` of any artifact. Deletions are soft (`mv` to `trash/<ts>/`).
- 🚫 No autonomous edits to `CLAUDE.md`, agents, hooks, or `settings.json` — these always queue for review regardless of confidence.
- 🚫 No proposal that matches a previously-rejected idea (≥ 2 token overlap with rejection's `# Why`).
- 🚫 No invented trigger phrases for new skills — every trigger comes from observed user input.
## Layout
```
~/.claude/
├── hooks/
│   ├── adam-observe.mjs              # signal collector (UserPromptSubmit / PreToolUse / PostToolUse)
│   └── adam-nudge.mjs                # SessionStart reminder + pending-upgrade warning
├── agents/adam.md                    # analyst subagent (system prompt + rubric)
├── skills/adam-self-improvement/
│   └── SKILL.md                      # /reflect protocol
├── commands/reflect.md               # /reflect slash command
└── adam/
    ├── journal.jsonl                 # active observations
    ├── journal/                      # rotated weekly (YYYY-Www.jsonl) + actioned-<id>.jsonl
    ├── state.json                    # per-session counters
    ├── usage.json                    # invocation tallies + visibility metrics
    ├── active-nudges.json            # ephemeral SessionStart reminders (auto-expire)
    ├── ab-tracking.jsonl             # one entry per auto-apply, drives effectiveness measurement
    ├── reinforcements.jsonl          # appended on reinforcement proposal apply
    ├── last-trace.txt                # most recent analyst clustering trace
    ├── proposals/                    # queued, awaiting review
    ├── applied/                      # approved + auto-applied archive
    ├── rejected/                     # rejected with reason
    ├── trash/                        # soft-deleted artifacts (recoverable)
    ├── scripts/
    │   ├── adam-utils.mjs            # shared journal-reading + frontmatter parsing
    │   ├── adam-window.mjs           # per-signal sliding-window filter
    │   ├── adam-score.mjs            # urgency dampener + reinforcement candidates
    │   ├── adam-ab-measure.mjs       # 7d pre/post delta per auto-applied edit
    │   ├── adam-cooldown.mjs         # per-(skill, fingerprint) cooldown gate
    │   ├── adam-nudge-eligibility.mjs # dead_end session-count check
    │   ├── adam-explain.mjs          # clustering trace parser/renderer
    │   ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
    │   ├── adam-upgrade.mjs          # .adam-new file UX (list/diff/accept)
    │   └── adam-archive.mjs          # post-apply journal cleanup
    └── tests/run-tests.sh            # 140 isolated tests; never touches live state
```
## Install ## What's new
### One-liner (recommended) - **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
- **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
```sh - **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash - **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
``` - **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114).
- **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
Pin a release for reproducibility: - **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87).
- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. ISO-week journal rotation replaces 5MB size-based (fixes silent cluster-straddling under-count); per-signal sliding windows via `adam-window.mjs`; error fingerprint normalisation; correction corpus expanded + weak-token co-occurrence requirement (kills the `"actually, I think..."` false positive); mandatory clustering trace + `adam-explain.mjs`; new `nudge` and `reinforcement` proposal types; per-(skill, fingerprint) cooldown via `adam-cooldown.mjs`; `task_completed` scoring (dampener + reinforcement); A/B effectiveness measurement; upgrade UX overhaul (`adam-upgrade.mjs --list/--diff/--accept`); shared `adam-utils.mjs`. 87 tests (up from 30).
```sh - **v0.3.2** — `task_completed` signal: post-task skill capture for downstream reinforcement scoring (consumed in v0.3.3).
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.3.1/install.sh \ - **v0.3.1** — code review pass: bug fixes (`errorFingerprint` no longer false-positives on `is_error: false`, archive script handles same-millisecond duplicates correctly, `tool_window` clears on session change, nudge filters proposal filenames by pattern), prose conciseness cuts, hardened `install.sh` with curl one-liner + settings.json merge, `adam-uninstall.sh`, isolated test harness.
| VERSION=v0.3.1 bash - **v0.3.0** — causal diagnosis: every proposal carries a `# Diagnosis` block (Trigger/Action/Mismatch/Outcome with verbatim transcript quote), plus `contradiction_flag` heuristic that vetoes auto-apply on obviously-conflicting `skill_edit` additions.
``` - **v0.2.1** — win signals (`correction_free_streak`, `clean_recovery`) feed `skill_edit` auto-apply under a strict gate (≤ 30 LOC, ≤ 2× byte cap, 7d cooldown, 30d blacklist).
- **v0.2.0** — actioned-entry archival via `adam-archive.mjs`; `cursor` field deprecated.
The installer clones the repo to `/tmp`, copies files into `~/.claude/`, and offers to merge ADAM's hook entries into your `~/.claude/settings.json` (with a diff preview and `[y/N]` confirmation — your existing hooks are preserved). Pass `--yes` to skip the prompt; `--dry-run` to preview without writing.
Requires `git`, `curl`, `jq`, and `node` 18+.
### From a clone
```sh
git clone https://github.com/lukaszraczylo/claude-adam
cd claude-adam
./install.sh
```
### Upgrade-safe
These files are **never overwritten** if they already exist:
- `~/.claude/adam/journal.jsonl` — your observation log
- `~/.claude/adam/state.json` — session counters
- `~/.claude/adam/usage.json` — invocation tallies
If you've locally edited any installed file (e.g. `agents/adam.md`), the installer writes the new version to `<file>.adam-new` and warns you instead of clobbering.
After install: run `bash ~/.claude/adam/tests/run-tests.sh` to verify (expect `27 passed, 0 failed`), start a fresh Claude Code session, then run `/reflect`.
## Requirements ## Requirements
- Claude Code v2.1.0+ (for auto skill hot-reload; older versions need session restart after `skill_new` proposals are applied) - **Claude Code v2.1.0+** — for auto skill hot-reload (older versions need a session restart after `skill_new` proposals).
- Node.js 18+ (for the hook; tested on v22) - **Node.js 18+** — tested on v22, used by the hook + helper scripts. Zero npm dependencies.
- Bash 4+, `git`, `curl`, `jq` (for installer + test harness) - **Bash 4+**, `git`, `curl`, `jq` for the installer + test harness.
### Platform support ### Platform support
Tested on **macOS** (Darwin / BSD coreutils) and **Linux** (Alpine, glibc + musl). The install / uninstall / test scripts are written to be portable: `stat` uses BSD `-f` with GNU `-c` fallback, `mktemp -d -t prefix.XXXXXX` works on both, no GNU-only flags. CI smoke verified `27 passed, 0 failed` under `alpine:latest`. Tested on **macOS** (Darwin / BSD coreutils) and **Linux** (Alpine, glibc + musl). The install / uninstall / test scripts are written to be portable: `stat` uses BSD `-f` with GNU `-c` fallback, `mktemp -d -t prefix.XXXXXX` works on both, no GNU-only flags. CI smoke verified under `alpine:latest`.
## Confidence rubric
```
Sum:
+2 Signal repeated ≥3× across ≥2 sessions
+2 Struggle signal appearing ≥1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥2 distinct struggle types in same session)
-1 Type-bias penalty (≥3 rejections, applied:rejected <1:2)
+1 Blast radius low (memory or new isolated skill)
0 Blast radius medium (new agent, new hook, edit existing skill)
-1 Blast radius high (CLAUDE.md, settings hooks, edit agent, deletion)
+1 Surgical (one file, ≤50 LOC for non-skill_new; ≤80 LOC for skill_new)
-3 Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions)
auto_apply_eligible requires ALL:
confidence ≥ 4
blast_radius == low
type ∈ {memory, skill_new, skill_edit} # skill_edit also passes the win-driven gate
cross_session_evidence == true (single-session-only proposals always queue)
skill_edit additionally requires (v0.2.1+):
win-signal evidence (correction_free_streak / clean_recovery cites target skill)
diff is append-only, ≤30 LOC, resulting size ≤2× original
no auto-edit to same target in past 7 days (cooldown)
no rejection-blacklist on target in past 30 days
contradiction heuristic does not flag (v0.3.0+)
# Diagnosis section present + structurally valid (v0.3.0+)
```
## Lifecycle: how proposals become permanent
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam/scripts/adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. Effects:
- The `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads applied/ + rejected/ frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
## What it will not do
- No background LLM spend. The analyst runs only when you invoke `/reflect`.
- No retroactive transcript mining beyond the journal cursor.
- No hard `rm` of any artifact. Deletions are soft (`mv` to `trash/<ts>/`).
- No autonomous edits to `CLAUDE.md`, agents, hooks, or `settings.json` — these always queue for review regardless of confidence.
- No proposal that matches a previously-rejected idea (≥2 token overlap with rejection's `# Why`).
- No invented trigger phrases for new skills — every trigger comes from observed user input.
## Uninstall ## Uninstall
@@ -175,6 +314,20 @@ rm -rf ~/.claude/skills/adam-self-improvement
 Then remove the four `adam-*` hook entries from `~/.claude/settings.json`.
+## Contributing
+Issues and PRs welcome — especially additional signal types, transcript-aware diagnosis improvements, and platform fixes. Run the test suite before opening a PR:
+```sh
+bash ~/.claude/adam/tests/run-tests.sh
+```
 ## License
 [MIT](LICENSE) — © 2026 Lukasz Raczylo
+---
+<div align="center">
+<sub>Named after my son Adam, who taught me that observation is the start of every interesting thing.</sub>
+</div>
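The confidence rubric in the README diff above is applied by the LLM analyst from its prompt, not by code. As a reading aid, here is a minimal sketch that encodes the rubric and the `auto_apply_eligible` gate literally. Every field name on the proposal object (`occurrences`, `blastRadius`, `winSignalCitesTarget`, and so on) is hypothetical, and the boundary handling for the 7-day cooldown and 30-day blacklist is an assumption. Read literally, the gate also requires `blast_radius == low` for `skill_edit`, and the sketch keeps that reading.

```javascript
// Sketch only: hypothetical field names, one line per rubric row.
function confidence(p) {
  let score = 0;
  if (p.occurrences >= 3 && p.sessions >= 2) score += 2; // repeated >=3x across >=2 sessions
  if (p.struggleInSession) score += 2;                    // single-session struggle (does not stack)
  if (p.endorsement) score += 2;                          // positive endorsement near related action
  if (p.distinctStruggleTypes >= 2) score += 1;           // multi-axis cluster
  // Type-bias penalty: >=3 rejections and applied:rejected below 1:2.
  if (p.rejectedOfType >= 3 && p.appliedOfType / p.rejectedOfType < 0.5) score -= 1;
  score += { low: 1, medium: 0, high: -1 }[p.blastRadius] ?? 0; // blast radius
  const locCap = p.type === "skill_new" ? 80 : 50;        // surgical LOC cap
  if (p.files === 1 && p.loc <= locCap) score += 1;
  if (p.touchesDenyList) score -= 3;
  return score;
}

const AUTO_APPLY_TYPES = new Set(["memory", "skill_new", "skill_edit"]);

function autoApplyEligible(p) {
  const base =
    confidence(p) >= 4 &&
    p.blastRadius === "low" &&
    AUTO_APPLY_TYPES.has(p.type) &&
    p.crossSessionEvidence === true; // single-session-only proposals always queue
  if (!base || p.type !== "skill_edit") return base;
  // skill_edit extras: v0.2.1+ win-driven gate, v0.3.0+ contradiction and diagnosis checks.
  return Boolean(
    p.winSignalCitesTarget &&
    p.appendOnly && p.loc <= 30 && p.newBytes <= 2 * p.oldBytes &&
    p.daysSinceLastAutoEdit >= 7 &&   // assumed boundary for the 7d cooldown
    p.daysSinceLastRejection >= 30 && // assumed boundary for the 30d blacklist
    !p.contradictionFlag &&
    p.hasValidDiagnosis
  );
}
```

A small cross-session `memory` proposal scores exactly 4 (+2 repetition, +1 low blast radius, +1 surgical), which is the minimum the gate accepts; touching the deny-list drops it to 1.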
+44 -7
@@ -3,11 +3,19 @@
 //
 // Reads ~/.claude/adam/ab-tracking.jsonl (one line per auto-apply event,
 // written by adam-self-improvement/SKILL.md), then for each entry old enough
-// (>= --min-age-days; default 7) compares signal counts in the 7-day window
-// BEFORE applied_at against the 7-day window AFTER applied_at across the
+// (>= --min-age-days; default 7) compares the originating signal in the 7-day
+// window BEFORE applied_at against the 7-day window AFTER applied_at across the
 // full journal corpus (active + rotated). Surfaces regressions so /reflect
 // can flag proposals that made things worse.
 //
+// Volume normalization: when the windows contain other (non-originating)
+// activity, the delta is computed on the signal's SHARE of total activity
+// (rate = count / total), not its raw count — so a generally busier journal
+// after apply does not masquerade as a regression. When the signal is the only
+// activity in the windows, it falls back to the raw-count delta. Output carries
+// both `delta_pct` (drives status) and `raw_delta_pct` + `normalized` for
+// transparency.
+//
 // CLI:
 // adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]
 //
@@ -92,31 +100,60 @@ export function computeDeltas(entries, journal, opts = {}) {
     const preStart = appliedAt - windowDays * DAY_MS;
     const postEnd = appliedAt + windowDays * DAY_MS;
+    // preCount/postCount = originating-signal occurrences; preTotal/postTotal =
+    // ALL journal entries in the window (the activity denominator).
     let preCount = 0;
     let postCount = 0;
+    let preTotal = 0;
+    let postTotal = 0;
     for (const je of journal || []) {
       if (!je || typeof je !== "object") continue;
-      if (!sigSet.has(je.type)) continue;
       const t = tsMs(je);
       if (Number.isNaN(t)) continue;
-      if (t >= preStart && t < appliedAt) preCount++;
-      else if (t >= appliedAt && t < postEnd) postCount++;
+      const inPre = t >= preStart && t < appliedAt;
+      const inPost = t >= appliedAt && t < postEnd;
+      if (!inPre && !inPost) continue;
+      if (inPre) preTotal++; else postTotal++;
+      if (!sigSet.has(je.type)) continue;
+      if (inPre) preCount++; else postCount++;
     }
     let status;
     let deltaPct;
+    let rawDeltaPct = null;
+    let normalized = false;
     if (preCount === 0) {
       status = "no_baseline";
       deltaPct = null;
     } else {
-      deltaPct = ((postCount - preCount) / preCount) * 100;
+      rawDeltaPct = Math.round(((postCount - preCount) / preCount) * 10000) / 100;
+      // Volume normalization: when the windows contain non-originating activity,
+      // compare the signal's SHARE of activity (rate), not its absolute count —
+      // otherwise a generally busier post-window masquerades as a regression.
+      // No background (signal IS the only activity) → fall back to raw delta,
+      // preserving prior behavior.
+      const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
+      if (hasBackground && postTotal > 0) {
+        const preRate = preCount / preTotal; // preTotal >= preCount > 0
+        const postRate = postCount / postTotal;
+        deltaPct = ((postRate - preRate) / preRate) * 100;
+        normalized = true;
+      } else {
+        deltaPct = ((postCount - preCount) / preCount) * 100;
+      }
       // Round to 2 dp for stable comparison + presentation.
       deltaPct = Math.round(deltaPct * 100) / 100;
       if (deltaPct <= IMPROVED_PCT) status = "improved";
       else if (deltaPct >= REGRESSED_PCT) status = "regressed";
       else status = "neutral";
     }
-    out.push({ ...base, pre_count: preCount, post_count: postCount, delta_pct: deltaPct, status });
+    out.push({
+      ...base,
+      pre_count: preCount, post_count: postCount,
+      pre_total: preTotal, post_total: postTotal,
+      raw_delta_pct: rawDeltaPct, normalized,
+      delta_pct: deltaPct, status,
+    });
   }
   return out;
 }
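To make the volume normalization in `computeDeltas` concrete, suppose the originating signal fired 10 times among 100 journal entries in the 7 days before apply, and 15 times among 200 entries in the 7 days after. The raw count rose 50%, but the signal's share of activity fell from 10% to 7.5%, a 25% drop. The standalone function below restates the hunk's arithmetic so those numbers can be checked. It leaves out the `improved`/`regressed` classification because `IMPROVED_PCT` and `REGRESSED_PCT` are defined outside this hunk.

```javascript
// Restatement of the delta math in computeDeltas (status thresholds omitted).
function abDelta(preCount, postCount, preTotal, postTotal) {
  if (preCount === 0) return { delta_pct: null, raw_delta_pct: null, normalized: false };
  const round2 = (x) => Math.round(x * 100) / 100;
  const rawDeltaPct = round2(((postCount - preCount) / preCount) * 100);
  // Background = entries in either window that are NOT the originating signal.
  const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
  if (hasBackground && postTotal > 0) {
    const preRate = preCount / preTotal;
    const postRate = postCount / postTotal;
    return { delta_pct: round2(((postRate - preRate) / preRate) * 100), raw_delta_pct: rawDeltaPct, normalized: true };
  }
  // Signal is the only activity: fall back to the raw-count delta.
  return { delta_pct: round2(((postCount - preCount) / preCount) * 100), raw_delta_pct: rawDeltaPct, normalized: false };
}

console.log(abDelta(10, 15, 100, 200)); // delta_pct -25, raw_delta_pct 50, normalized true
console.log(abDelta(10, 15, 10, 15));   // delta_pct 50, raw_delta_pct 50, normalized false
```

The first call is the busier-journal case that v0.6.2 targets: before normalization it would have read as a 50% regression.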
+186
@@ -0,0 +1,186 @@
#!/usr/bin/env node
// adam-batch.mjs — pre-clusters windowed journal entries into coherent failure
// batches before analyst dispatch. Implements MOSS §3.1: "anchored to an
// automatically curated batch of production-failure evidence."
//
// Each batch groups entries by (signal_type, cluster_key) where cluster_key
// follows the same clustering rules as agents/adam.md ## Signal types / ## Process step 4:
// correction → tokenized phrase (cross-cwd)
// retry_loop → tool
// weak_agent → subagent_type
// tool_error_loop→ fp
// dead_end → session
// edit_churn → file basename
// file_reread → file basename
// build_loop → session
// subagent_dispatch_pattern → subagent_type
// silent_drift → active_skills[0]
// error_after_recovery → (recovered_from, original_fp)
// correction_free_streak → active_skills[0]
// clean_recovery → (recovered_from, active_skills[0])
// task_completed → sorted tool_kinds tuple
//
// CLI:
// adam-batch.mjs [--input <jsonl-path>] [--min-entries N] [--min-sessions N]
//
// Output: JSON object with `batches` array and `unbatched` count.
import { readFileSync } from "node:fs";
import { readJsonlSafe } from "./adam-utils.mjs";
const DEFAULT_MIN_ENTRIES = 1;
const DEFAULT_MIN_SESSIONS = 1;
const CORRECTION_STOPWORDS = new Set([
"the", "a", "an", "and", "or", "but", "of", "to", "for", "in", "on",
"with", "use", "when", "where", "what", "why", "how", "this", "that",
"these", "those", "is", "are", "was", "were", "be", "been", "being",
"do", "does", "did", "doing", "has", "have", "had", "your", "you",
"i", "it", "as", "at", "by", "from", "not", "no",
]);
function tokenizePhrase(phrase) {
if (!phrase || typeof phrase !== "string") return "";
return phrase.toLowerCase()
.split(/\s+/)
.map(t => t.replace(/^[^\w']+|[^\w']+$/g, ""))
.filter(t => t && !CORRECTION_STOPWORDS.has(t))
.sort()
.join("|");
}
function clusterKey(entry) {
if (!entry || typeof entry !== "object") return null;
const t = entry.type;
switch (t) {
case "correction":
return tokenizePhrase(entry.phrase) || "unknown";
case "retry_loop":
return entry.tool || "unknown";
case "weak_agent":
case "subagent_dispatch_pattern":
return entry.subagent_type || "unknown";
case "tool_error_loop":
return entry.fp || "unknown";
case "dead_end":
case "build_loop":
return entry.session || "unknown";
case "edit_churn":
case "file_reread":
return entry.file ? entry.file.split("/").pop() : "unknown";
case "silent_drift":
case "correction_free_streak":
return Array.isArray(entry.active_skills) ? (entry.active_skills[0] || "") : "";
case "error_after_recovery":
return `${entry.recovered_from || "?"}:${entry.original_fp || "?"}`;
case "clean_recovery":
return `${entry.recovered_from || "?"}:${Array.isArray(entry.active_skills) ? (entry.active_skills[0] || "") : ""}`;
case "task_completed":
return Array.isArray(entry.tool_kinds) ? entry.tool_kinds.slice().sort().join(",") : "unknown";
default:
return entry.session || "unknown";
}
}
function parseArgs(argv) {
const args = { input: null, minEntries: DEFAULT_MIN_ENTRIES, minSessions: DEFAULT_MIN_SESSIONS, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
else if (a === "--min-entries" && i + 1 < argv.length) {
const n = Number(argv[++i]);
if (!Number.isNaN(n) && n > 0) args.minEntries = n;
}
else if (a === "--min-sessions" && i + 1 < argv.length) {
const n = Number(argv[++i]);
if (!Number.isNaN(n) && n > 0) args.minSessions = n;
}
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
export function buildBatches(entries, opts = {}) {
const minEntries = opts.minEntries || DEFAULT_MIN_ENTRIES;
const minSessions = opts.minSessions || DEFAULT_MIN_SESSIONS;
const map = new Map();
for (const e of entries || []) {
if (!e || typeof e !== "object" || !e.type) continue;
const key = `${e.type}::${clusterKey(e)}`;
if (!map.has(key)) {
map.set(key, {
batch_id: null,
signal_type: e.type,
cluster_key: clusterKey(e),
entries: [],
sessions: new Set(),
cwds: new Set(),
});
}
const batch = map.get(key);
batch.entries.push(e);
if (e.session) batch.sessions.add(e.session);
if (e.cwd) batch.cwds.add(e.cwd);
}
const batches = [];
let unbatched = 0;
let id = 1;
for (const [, batch] of map) {
if (batch.entries.length < minEntries || batch.sessions.size < minSessions) {
unbatched += batch.entries.length;
continue;
}
batch.batch_id = `b${id++}`;
batches.push({
batch_id: batch.batch_id,
signal_type: batch.signal_type,
cluster_key: batch.cluster_key,
entry_count: batch.entries.length,
session_count: batch.sessions.size,
cwd_count: batch.cwds.size,
has_context_window: batch.entries.some(e => Array.isArray(e.context_window) && e.context_window.length > 0),
entries: batch.entries,
});
}
batches.sort((a, b) => b.entry_count - a.entry_count);
return { batches, unbatched, total: (entries || []).length };
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write("usage: adam-batch.mjs [--input <jsonl-path>] [--min-entries N] [--min-sessions N]\n");
process.exit(0);
}
try {
let entries;
if (args.input) {
entries = readJsonlSafe(args.input);
} else if (!process.stdin.isTTY) {
const buf = readFileSync(0, "utf8");
entries = [];
for (const line of buf.split("\n")) {
if (!line) continue;
try { entries.push(JSON.parse(line)); } catch { /* skip */ }
}
} else {
process.stderr.write("adam-batch: no input (use --input or pipe)\n");
process.exit(1);
}
const result = buildBatches(entries, { minEntries: args.minEntries, minSessions: args.minSessions });
process.stdout.write(JSON.stringify(result) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-batch error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
export { clusterKey, tokenizePhrase };
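Per the README's Lifecycle section, every proposal's frontmatter lists the journal timestamps that fed its cluster in `source_entries`, and the next `/reflect` skips entries already claimed by an applied or rejected proposal. Below is a minimal sketch of that exclusion step as it might run before entries reach `buildBatches`. It assumes frontmatter has already been parsed into plain objects and that each journal entry stores its timestamp in a `ts` field. That field name is an assumption, since this diff only shows a `tsMs(je)` accessor.

```javascript
// Drop journal entries whose timestamp is listed in any actioned proposal's
// source_entries. `ts` is an assumed field name; non-object entries are dropped too.
function excludeActioned(journal, actionedFrontmatter) {
  const excluded = new Set();
  for (const fm of actionedFrontmatter) {
    const src = Array.isArray(fm && fm.source_entries) ? fm.source_entries : [];
    for (const ts of src) excluded.add(String(ts));
  }
  return journal.filter((e) => e && typeof e === "object" && !excluded.has(String(e.ts)));
}
```

Keying on timestamps rather than on a moving cursor is what lets a rule change re-evaluate every still-active observation without a manual rewind.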
+34 -4
@@ -4,8 +4,12 @@
 //
 // CLI:
 // adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]
+// adam-cooldown.mjs --compute --skill <slug> --cluster <id> [--diff-file <path>]
+// → prints {"fingerprint":"<djb2_base36>"}; diff body read from --diff-file
+// or stdin. This is how proposal_fingerprint is populated (the analyst
+// runs it via Bash after drafting a proposal).
 //
-// Output: JSON one-liner with shape
+// Output (gate mode): JSON one-liner with shape
 // { "status": "cool"|"cooldown"|"blacklisted",
 // "reason": "<human-readable reason>",
 // "blocked_by": { "file": "<basename>", "days_remaining": <int> } | null }
@@ -33,12 +37,15 @@ const DAY_MS = 86400000;
 export const LEGACY_FINGERPRINT = "legacy";
 function parseArgs(argv) {
-  const args = { home: null, skill: null, fingerprint: null, help: false };
+  const args = { home: null, skill: null, fingerprint: null, compute: false, cluster: null, diffFile: null, help: false };
   for (let i = 0; i < argv.length; i++) {
     const a = argv[i];
     if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
     else if (a === "--skill" && i + 1 < argv.length) args.skill = argv[++i];
     else if (a === "--fingerprint" && i + 1 < argv.length) args.fingerprint = argv[++i];
+    else if (a === "--cluster" && i + 1 < argv.length) args.cluster = argv[++i];
+    else if (a === "--diff-file" && i + 1 < argv.length) args.diffFile = argv[++i];
+    else if (a === "--compute") args.compute = true;
     else if (a === "--help" || a === "-h") args.help = true;
   }
   return args;
@@ -158,9 +165,11 @@ export function computeProposalFingerprint(proposal) {
   if (!proposal || typeof proposal !== "object") return LEGACY_FINGERPRINT;
   const skill = proposal.skill_slug || proposal.target_skill || proposal.skill || "";
   const cluster = proposal.signal_cluster_id || proposal.cluster_id || "";
+  // normalized_diff_body: whitespace (incl. newlines) collapsed to single
+  // spaces, then trimmed. Matches agents/adam.md §"Per-(skill, fingerprint)
+  // cooldown". (No trailing-newline strip needed — \s+ already absorbed them.)
   const diff = String(proposal.diff_body || proposal.proposed_change || "")
     .replace(/\s+/g, " ")
-    .replace(/\n+$/g, "")
     .trim();
   return djb2(`${skill}\n${cluster}\n${diff}`);
 }
@@ -168,7 +177,28 @@ export function computeProposalFingerprint(proposal) {
 function main() {
   const args = parseArgs(process.argv.slice(2));
   if (args.help) {
-    process.stdout.write("usage: adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]\n");
+    process.stdout.write(
+      "usage: adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]\n" +
+      " adam-cooldown.mjs --compute --skill <slug> --cluster <id> [--diff-file <path>]\n"
+    );
+    process.exit(0);
+  }
+  // --compute: deterministically derive a proposal_fingerprint. The analyst
+  // invokes this (it has Bash) after drafting a proposal, then writes the
+  // result into proposal frontmatter so the cooldown gate keys on it.
+  if (args.compute) {
+    let diff = "";
+    if (args.diffFile) {
+      try { diff = readFileSync(args.diffFile, "utf8"); } catch { /* empty → still deterministic */ }
+    } else {
+      try { diff = readFileSync(0, "utf8"); } catch { /* no stdin */ }
+    }
+    const fp = computeProposalFingerprint({
+      skill_slug: args.skill || "",
+      signal_cluster_id: args.cluster || "",
+      diff_body: diff,
+    });
+    process.stdout.write(JSON.stringify({ fingerprint: fp }) + "\n");
     process.exit(0);
   }
   if (!args.skill || !args.fingerprint) {
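`computeProposalFingerprint` hashes `${skill}\n${cluster}\n${diff}` with a `djb2` helper that lies outside these hunks, and `--compute` prints the result in base36. The sketch below uses the classic djb2 recurrence (seed 5381, `h * 33 + c`, wrapped to unsigned 32 bits) over UTF-16 code units. Whether ADAM's helper uses this exact variant is an assumption, so the values below may not match real `proposal_fingerprint` values. What the sketch does illustrate is the normalization: collapsing all whitespace before hashing keeps the fingerprint stable when a diff body is merely reflowed.

```javascript
// Classic djb2 (assumed variant): h = h * 33 + charCode, kept as unsigned 32-bit.
function djb2(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) {
    h = ((h << 5) + h + str.charCodeAt(i)) >>> 0;
  }
  return h.toString(36);
}

// Mirrors the normalization in computeProposalFingerprint: collapse whitespace, trim.
function fingerprint(skill, cluster, diffBody) {
  const diff = String(diffBody || "").replace(/\s+/g, " ").trim();
  return djb2(`${skill}\n${cluster}\n${diff}`);
}

console.log(djb2(""));  // "45h" (5381 in base36)
console.log(djb2("a")); // "3t3a" (5381 * 33 + 97 = 177670)
```

Because `--compute` derives the value from the skill slug, the cluster id, and the normalized diff alone, the v0.6.0 requirement for a *stable* cluster id is what makes the fingerprint reproducible across runs.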
+1
@@ -135,6 +135,7 @@ export function parseTrace(text) {
     considered: clusters.length,
     emitted,
     skipped: clusters.length - emitted,
+    regressions: 0,
     reasons,
   };
 }
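Step 5 in the header of the rollback script that follows promises to remove the ab-tracking entry. The v0.6.4 note in the README diff records that `executeRollback()` originally skipped that step, so rolled-back proposals kept being re-flagged as `regressed`. The fix deletes the matching `ab-tracking.jsonl` row by `proposal_id` and preserves unrelated rows. Its code is not part of this diff. Below is a minimal sketch of the purge over raw JSONL text, assuming each row stores the id in a top-level `proposal_id` field. Keeping malformed lines rather than dropping them is a choice of this sketch.

```javascript
// Remove every JSONL row whose proposal_id equals the rolled-back id; keep all others.
function purgeAbTracking(jsonlText, proposalId) {
  const kept = [];
  let removed = 0;
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    let row = null;
    try { row = JSON.parse(line); } catch { /* malformed: keep the line untouched */ }
    if (row && row.proposal_id === proposalId) { removed++; continue; }
    kept.push(line);
  }
  return { text: kept.length ? kept.join("\n") + "\n" : "", removed };
}
```

In a script, the returned `text` would be written back with `fs.writeFileSync` only after the applied proposal has been moved, which matches the order the v0.6.4 commit message describes ("after the move").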
+242
@@ -0,0 +1,242 @@
#!/usr/bin/env node
// adam-rollback.mjs — auto-reverts proposals that regressed after apply.
//
// Implements MOSS §3.5: "rollback is mandatory because... a candidate that
// passes trial can still regress live."
//
// For each regressed proposal (detected by adam-ab-measure.mjs):
// 1. Reads the applied proposal from applied/
// 2. Parses the `# Rollback` section for undo commands
// 3. Moves proposal from applied/ to proposals/ with `rolled_back: true`
// 4. Creates a regression nudge for next SessionStart
// 5. Removes the ab-tracking entry (so it doesn't re-trigger)
//
// CLI:
// adam-rollback.mjs --proposal-id <id> [--home <path>] [--dry-run]
// adam-rollback.mjs --auto [--home <path>] [--dry-run]
//
// --auto mode: reads ab-measure output, rolls back all regressed proposals.
//
// Output: JSON object with rollback results per proposal.
// Does NOT execute the undo commands itself — outputs them for the skill to
// execute in-context (safety: undo commands may reference files the script
// can't safely modify).
import { readFileSync, writeFileSync, renameSync, readdirSync, existsSync, mkdirSync } from "node:fs";
import { join, basename } from "node:path";
import { homedir } from "node:os";
import { parseFrontmatter, readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
function parseArgs(argv) {
const args = { home: null, proposalId: null, auto: false, dryRun: false, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--proposal-id" && i + 1 < argv.length) args.proposalId = argv[++i];
else if (a === "--auto") args.auto = true;
else if (a === "--dry-run") args.dryRun = true;
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
function findAppliedProposal(appliedDir, proposalId) {
if (!existsSync(appliedDir)) return null;
try {
const files = readdirSync(appliedDir).filter(n => n.endsWith(".md"));
for (const f of files) {
if (f.includes(proposalId)) return join(appliedDir, f);
}
} catch { /* skip */ }
return null;
}
function extractRollbackSection(content) {
const idx = content.indexOf("\n# Rollback\n");
if (idx === -1) return null;
let body = content.slice(idx + "\n# Rollback\n".length);
const nextSection = body.search(/\n# |\n---/);
if (nextSection !== -1) body = body.slice(0, nextSection);
return body.trim() || null;
}
function extractUndoCommands(rollbackSection) {
if (!rollbackSection) return [];
const commands = [];
const lines = rollbackSection.split("\n");
let inCodeBlock = false;
let blockLines = [];
for (const line of lines) {
if (line.startsWith("```")) {
if (inCodeBlock) {
if (blockLines.length) commands.push(blockLines.join("\n"));
blockLines = [];
}
inCodeBlock = !inCodeBlock;
continue;
}
if (inCodeBlock) {
blockLines.push(line);
}
}
return commands;
}
export function planRollback(appliedDir, proposalId) {
const path = findAppliedProposal(appliedDir, proposalId);
if (!path) return { status: "not_found", proposal_id: proposalId };
const content = readFileSync(path, "utf8");
const fm = parseFrontmatter(content);
const rollbackSection = extractRollbackSection(content);
const undoCommands = extractUndoCommands(rollbackSection);
return {
status: "planned",
proposal_id: proposalId,
applied_path: path,
type: fm.type || "unknown",
target: fm.target || null,
target_skill: fm.target_skill || null,
undo_commands: undoCommands,
has_rollback_section: !!rollbackSection,
};
}
export function executeRollback(plan, adamRoot, opts = {}) {
const dryRun = opts.dryRun || false;
const proposalsDir = join(adamRoot, "proposals");
const nudgesPath = join(adamRoot, "active-nudges.json");
const now = Date.now();
if (plan.status !== "planned") return { ...plan, action: "skipped" };
const result = {
proposal_id: plan.proposal_id,
type: plan.type,
target: plan.target,
undo_commands: plan.undo_commands,
actions: [],
};
if (dryRun) {
result.actions.push("dry_run: would move applied → proposals");
if (plan.undo_commands.length) {
result.actions.push(`dry_run: would output ${plan.undo_commands.length} undo command(s)`);
}
result.actions.push("dry_run: would create regression nudge");
result.status = "dry_run";
return result;
}
mkdirSync(proposalsDir, { recursive: true });
const destName = `${basename(plan.applied_path).replace(/\.md$/, "")}-rollback.md`;
const destPath = join(proposalsDir, destName);
let content = readFileSync(plan.applied_path, "utf8");
const rollbackMeta = `\nrolled_back: true\nrolled_back_at: "${new Date(now).toISOString()}"`;
content = content.replace(/^(---\n[\s\S]*?)(---)/m, `$1${rollbackMeta}\n$2`);
try {
writeFileSync(destPath, content);
renameSync(plan.applied_path, plan.applied_path + ".rolled-back");
result.actions.push(`moved ${plan.applied_path} → ${destPath}`);
} catch (e) {
result.status = "move_failed";
result.error = e.message;
return result;
}
try {
let nudges = [];
if (existsSync(nudgesPath)) {
try { nudges = JSON.parse(readFileSync(nudgesPath, "utf8")); } catch { nudges = []; }
}
nudges.push({
kind: "regression_rollback",
message: `adam: rolled back "${plan.proposal_id}" (type: ${plan.type}) — regression detected in A/B measurement. Review with /reflect.`,
created_at: now,
expires_at_ts: now + 7 * 86400000,
max_displays: 3,
displays_used: 0,
source_proposal: plan.proposal_id,
});
writeFileSync(nudgesPath, JSON.stringify(nudges, null, 2));
result.actions.push("regression nudge created");
} catch (e) {
result.actions.push(`nudge failed: ${e.message}`);
}
// Remove the ab-tracking entry for this proposal so it stops re-flagging as a
// regression on every future /reflect (which would trigger endless not_found
// rollback attempts). This is the documented contract for rollback.
try {
const abPath = join(adamRoot, "ab-tracking.jsonl");
if (existsSync(abPath)) {
const before = readJsonlSafe(abPath);
const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
if (kept.length !== before.length) {
writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
}
}
} catch (e) {
result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
}
result.status = "rolled_back";
return result;
}
async function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write(
"usage: adam-rollback.mjs --proposal-id <id> [--home <path>] [--dry-run]\n" +
" adam-rollback.mjs --auto [--home <path>] [--dry-run]\n"
);
process.exit(0);
}
const claudeHome = args.home || join(homedir(), ".claude");
const adamRoot = join(claudeHome, "adam");
const appliedDir = join(adamRoot, "applied");
try {
const results = [];
if (args.auto) {
const abPath = join(adamRoot, "ab-tracking.jsonl");
const entries = readJsonlSafe(abPath);
const { computeDeltas } = await import("./adam-ab-measure.mjs");
const sources = [join(adamRoot, "journal.jsonl"), ...listJsonlFiles(join(adamRoot, "journal"))];
const journalAll = [];
for (const p of sources) for (const e of readJsonlSafe(p)) journalAll.push(e);
const deltas = computeDeltas(entries, journalAll);
const regressed = deltas.filter(d => d.status === "regressed");
for (const d of regressed) {
const plan = planRollback(appliedDir, d.proposal_id);
const result = executeRollback(plan, adamRoot, { dryRun: args.dryRun });
results.push(result);
}
} else if (args.proposalId) {
const plan = planRollback(appliedDir, args.proposalId);
const result = executeRollback(plan, adamRoot, { dryRun: args.dryRun });
results.push(result);
} else {
process.stderr.write("adam-rollback: specify --proposal-id or --auto\n");
process.exit(1);
}
process.stdout.write(JSON.stringify({ rollbacks: results }) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-rollback error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
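The v0.6.4 fix comes down to one filter over `ab-tracking.jsonl`. The sketch below isolates that filter so it can be run standalone.

Assumptions, labeled for clarity:
- `purgeProposalRows` is a hypothetical helper.
- It works on an in-memory JSONL string instead of the file on disk.
- The `baseline` field in the sample rows is invented. Only `proposal_id` comes from the source.
- Malformed lines are assumed to be skipped, as a "safe" reader presumably does.

It keeps every row whose `proposal_id` differs and reports how many rows were dropped. This mirrors the `kept`/`before` comparison in `executeRollback`.

```javascript
// Hypothetical helper: drop every ab-tracking row for one proposal_id.
// Works on JSONL text in memory instead of ab-tracking.jsonl on disk.
function purgeProposalRows(jsonlText, proposalId) {
  const rows = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    try {
      rows.push(JSON.parse(line));
    } catch {
      // Assumption: malformed lines are skipped, as a "safe" JSONL reader would.
    }
  }
  // Keep everything that is not this proposal (null/garbage rows are kept too).
  const kept = rows.filter((e) => !(e && e.proposal_id === proposalId));
  // Same serialisation as executeRollback: one JSON object per line with a
  // trailing newline, or an empty string when nothing is left.
  const text = kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "";
  return { text, removed: rows.length - kept.length };
}

// Sample rows: proposal ids from the v0.6.4 commit message, `baseline` invented.
const before = [
  '{"proposal_id":"2026-05-16-002","baseline":0.4}',
  '{"proposal_id":"2026-05-22-001","baseline":0.5}',
  '{"proposal_id":"2026-05-16-002","baseline":0.6}',
].join("\n") + "\n";

const { text, removed } = purgeProposalRows(before, "2026-05-16-002");
console.log(removed); // 2
console.log(text.trim()); // {"proposal_id":"2026-05-22-001","baseline":0.5}
```

The filter drops every matching row, not just the first one. A proposal that was tracked twice is therefore fully purged, which is what stops it from being re-detected as `regressed` on the next /reflect.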
+38 -3
@@ -23,7 +23,8 @@
// Output: JSON object
// {
// "sessions": [
// {"session_id": "...", "negative_count": N, "task_completed_count": M,
// "severity_sum": S, "severity_by_type": {"<type>": N, ...}, "dampener": 1.0}
// ],
// "reinforcement_candidates": [
// {"skill_slug": "tdd-loop", "count": 3, "recent_ts": "..."}
@@ -43,10 +44,33 @@ export const NEGATIVE_SIGNAL_TYPES = new Set([
"retry_loop",
"build_loop",
"weak_agent",
"silent_drift",
"error_after_recovery",
]);
export const REINFORCEMENT_THRESHOLD = 3;
// Severity divisor per struggle signal type. Severity = max(1, floor(count / divisor)).
// Entries without `count` default to severity 1. Source of truth — referenced by
// agents/adam.md (Confidence rubric → severity-sum bullets).
export const SEVERITY_DIVISORS = {
dead_end: 8,
edit_churn: 4,
tool_error_loop: 3,
retry_loop: 3,
file_reread: 3,
weak_agent: 2,
build_loop: 1,
};
export function entrySeverity(entry) {
if (!entry || typeof entry !== "object") return 1;
const divisor = SEVERITY_DIVISORS[entry.type];
if (!divisor) return 1;
const count = typeof entry.count === "number" && entry.count > 0 ? entry.count : 1;
return Math.max(1, Math.floor(count / divisor));
}
function parseArgs(argv) {
const args = { home: null, input: null, help: false };
for (let i = 0; i < argv.length; i++) {
@@ -84,11 +108,22 @@ export function computeSessionScores(entries) {
const sid = e.session || e.session_id || "";
if (!sid) continue;
if (!bySession.has(sid)) {
bySession.set(sid, {
session_id: sid,
negative_count: 0,
task_completed_count: 0,
severity_sum: 0,
severity_by_type: {},
});
}
const slot = bySession.get(sid);
if (e.type === "task_completed") slot.task_completed_count++;
else if (NEGATIVE_SIGNAL_TYPES.has(e.type)) {
slot.negative_count++;
const sev = entrySeverity(e);
slot.severity_sum += sev;
slot.severity_by_type[e.type] = (slot.severity_by_type[e.type] || 0) + sev;
}
}
const out = [];
for (const slot of bySession.values()) {
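The severity rule is easiest to check by hand against the Test 82 fixture. The sketch below copies `SEVERITY_DIVISORS` and `entrySeverity` from this hunk, then sums severity per type the way `computeSessionScores` does.

It skips the `NEGATIVE_SIGNAL_TYPES` membership check. All three fixture types are assumed to be negative signals, which is what the test's expected sum of 11 implies.

```javascript
// Copied from the adam-score.mjs hunk above.
const SEVERITY_DIVISORS = {
  dead_end: 8,
  edit_churn: 4,
  tool_error_loop: 3,
  retry_loop: 3,
  file_reread: 3,
  weak_agent: 2,
  build_loop: 1,
};

function entrySeverity(entry) {
  if (!entry || typeof entry !== "object") return 1;
  const divisor = SEVERITY_DIVISORS[entry.type];
  if (!divisor) return 1; // types without a divisor (e.g. silent_drift) count as 1
  const count = typeof entry.count === "number" && entry.count > 0 ? entry.count : 1;
  return Math.max(1, Math.floor(count / divisor));
}

// The three struggle entries from the Test 82 fixture, all in one session.
const session = [
  { type: "dead_end", count: 64 },       // floor(64 / 8) = 8
  { type: "edit_churn", count: 8 },      // floor(8 / 4)  = 2
  { type: "tool_error_loop", count: 3 }, // floor(3 / 3)  = 1
];

let sum = 0;
const byType = {};
for (const e of session) {
  const sev = entrySeverity(e);
  sum += sev;
  byType[e.type] = (byType[e.type] || 0) + sev;
}
console.log(sum);    // 11
console.log(byType); // { dead_end: 8, edit_churn: 2, tool_error_loop: 1 }
```

The `Math.max(1, ...)` clamp means a small count never vanishes. For example, a `dead_end` with `count: 5` still contributes 1, not 0.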
+272
@@ -0,0 +1,272 @@
#!/usr/bin/env node
// adam-skill-utility.mjs — execution-grounded per-skill utility report.
//
// Inspired by SkillsInjector (arXiv 2605.29794v1), which shows skill injection
// should be driven by execution-grounded *utility* Δ(t,s), not surface keyword
// match — and that some topically-relevant skills actively *lower* success.
// The paper learns Δ(t,s) from rollout outcomes. We don't train anything: the
// adam journal already attaches `active_skills` to both positive outcome events
// (task_completed, clean_recovery, correction_free_streak) and negative ones
// (dead_end, tool_error_loop, …). So we approximate Δ(s) as a co-occurrence
// ratio over the data we already collect.
//
// CAVEAT (honest): this is CO-OCCURRENCE, not causation. A skill active during
// a dead_end did not necessarily cause it. Read the report as "which skills
// correlate with friction", a prompt for review — never as proof.
//
// Metric, per skill active during scored events:
// pos / neg — count of positive / negative outcome events it co-occurred with
// share — pos / (pos+neg)
// lift — share - global_baseline (>0 above baseline, <0 below)
// wLB — Wilson 95% lower bound of the positive proportion; ranks
// *reliably* below-baseline skills to the top (sample-aware)
// sevNeg — severity-weighted negative sum (adam SEVERITY_DIVISORS)
// topNeg — dominant negative event type
// Rows sorted worst-first (lowest wLB) so harmful/over-eager skills surface.
//
// CLI:
// adam-skill-utility.mjs [--home <path>] [--input <jsonl-path>]
// [--min <n>] [--days <n>] [--json]
// --min min event count (n) to treat a skill's signal as confident (default 8)
// --days only consider events within the last <n> days (default: all)
// --json emit machine-readable JSON instead of the text table
//
// Reuses adam-utils (jsonl IO) and adam-score (canonical NEGATIVE set +
// severity), so the positive/negative taxonomy stays single-sourced.
import { readFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
import { NEGATIVE_SIGNAL_TYPES, entrySeverity } from "./adam-score.mjs";
// Positive outcome signals (mirror adam's vocabulary; task_completed is adam's
// canonical "clean task", the same one adam-score uses for reinforcement).
export const POSITIVE_SIGNAL_TYPES = new Set([
"task_completed",
"clean_recovery",
"correction_free_streak",
]);
export const DEFAULT_MIN_SAMPLE = 8;
function round(x) {
return Math.round(x * 1000) / 1000;
}
// Wilson score interval lower bound for a binomial proportion. Sample-aware:
// a skill with 1 pos / 0 neg does NOT outrank one with 40 pos / 2 neg.
export function wilsonLower(pos, n, z = 1.96) {
if (n <= 0) return 0;
const p = pos / n;
const z2 = z * z;
const denom = 1 + z2 / n;
const center = p + z2 / (2 * n);
const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
return (center - margin) / denom;
}
// computeSkillUtility: pure. entries → { baseline, totalPos, totalNeg, min, skills[] }.
export function computeSkillUtility(entries, opts = {}) {
const min = Number.isFinite(opts.min) ? opts.min : DEFAULT_MIN_SAMPLE;
const per = new Map();
let totalPos = 0;
let totalNeg = 0;
for (const e of entries || []) {
if (!e || typeof e !== "object") continue;
const isPos = POSITIVE_SIGNAL_TYPES.has(e.type);
const isNeg = NEGATIVE_SIGNAL_TYPES.has(e.type);
if (!isPos && !isNeg) continue;
if (isPos) totalPos++;
else totalNeg++;
const sev = isNeg ? entrySeverity(e) : 0;
const skills = Array.isArray(e.active_skills) ? e.active_skills : [];
for (const slug of skills) {
if (!slug || typeof slug !== "string") continue;
if (!per.has(slug)) {
per.set(slug, { pos: 0, neg: 0, sevNeg: 0, negTypes: {}, recent_ts: null });
}
const s = per.get(slug);
if (isPos) {
s.pos++;
} else {
s.neg++;
s.sevNeg += sev;
s.negTypes[e.type] = (s.negTypes[e.type] || 0) + 1;
}
const ts = typeof e.ts === "string" ? e.ts : null;
if (ts && (!s.recent_ts || ts > s.recent_ts)) s.recent_ts = ts;
}
}
const scored = totalPos + totalNeg;
const baseline = scored ? totalPos / scored : 0;
const skills = [];
for (const [slug, s] of per.entries()) {
const n = s.pos + s.neg;
const share = n ? s.pos / n : 0;
const topNeg = Object.entries(s.negTypes).sort((a, b) => b[1] - a[1])[0];
skills.push({
skill: slug,
n,
pos: s.pos,
neg: s.neg,
share: round(share),
lift: round(share - baseline),
wLB: round(wilsonLower(s.pos, n)),
sevNeg: s.sevNeg,
topNeg: topNeg ? topNeg[0] : null,
lowSample: n < min,
recent_ts: s.recent_ts,
});
}
// Worst-first: lowest Wilson lower bound, then most negatives.
skills.sort(
(a, b) =>
a.wLB - b.wLB ||
b.neg - a.neg ||
(a.skill < b.skill ? -1 : a.skill > b.skill ? 1 : 0),
);
return { baseline: round(baseline), totalPos, totalNeg, min, skills };
}
function parseArgs(argv) {
const args = { home: null, input: null, min: DEFAULT_MIN_SAMPLE, days: null, json: false, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
else if (a === "--min" && i + 1 < argv.length) args.min = Number(argv[++i]);
else if (a === "--days" && i + 1 < argv.length) args.days = Number(argv[++i]);
else if (a === "--json") args.json = true;
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
function readAllStdin() {
try { return readFileSync(0, "utf8"); } catch { return ""; }
}
function entriesFromText(text) {
const out = [];
for (const line of (text || "").split("\n")) {
if (!line) continue;
try { out.push(JSON.parse(line)); } catch { /* skip */ }
}
return out;
}
// Same gathering strategy as adam-score.mjs: explicit --input, else piped
// stdin (e.g. from adam-window.mjs), else the active journal + rotated files.
function gatherInputEntries(args) {
if (args.input) return readJsonlSafe(args.input);
if (!process.stdin.isTTY) {
const piped = readAllStdin();
if (piped && piped.trim()) return entriesFromText(piped);
}
const home = args.home || join(homedir(), ".claude");
const adamRoot = join(home, "adam");
const sources = [join(adamRoot, "journal.jsonl"), ...listJsonlFiles(join(adamRoot, "journal"))];
const all = [];
for (const p of sources) {
for (const e of readJsonlSafe(p)) all.push(e);
}
return all;
}
function filterByDays(entries, days) {
if (!Number.isFinite(days) || days <= 0) return entries;
// Anchor the window to the newest ts in the data (avoids Date.now()
// nondeterminism and works on historical exports).
let maxMs = 0;
for (const e of entries) {
const ms = e && typeof e.ts === "string" ? Date.parse(e.ts) : NaN;
if (Number.isFinite(ms) && ms > maxMs) maxMs = ms;
}
if (!maxMs) return entries;
const cutoff = maxMs - days * 86400000;
return entries.filter((e) => {
const ms = e && typeof e.ts === "string" ? Date.parse(e.ts) : NaN;
return Number.isFinite(ms) ? ms >= cutoff : false;
});
}
function pad(s, w) {
s = String(s);
return s.length >= w ? s : s + " ".repeat(w - s.length);
}
function padL(s, w) {
s = String(s);
return s.length >= w ? s : " ".repeat(w - s.length) + s;
}
function renderText(report) {
const { baseline, totalPos, totalNeg, min, skills } = report;
const lines = [];
lines.push("adam skill-utility report — execution-grounded Δ(skill) proxy");
lines.push(
`baseline positive-rate ${(baseline * 100).toFixed(1)}% ` +
`(${totalPos} positive / ${totalNeg} negative outcome events) min-sample n≥${min}`,
);
lines.push("CAVEAT: co-occurrence, not causation. Worst-first. ⚠ = below baseline with n≥min.");
lines.push("");
const head =
pad("skill", 44) + padL("n", 5) + padL("pos", 6) + padL("neg", 6) +
padL("share", 8) + padL("lift", 8) + padL("wLB", 7) + padL("sevNeg", 8) +
" " + pad("topNeg", 18) + "flag";
lines.push(head);
lines.push("-".repeat(head.length));
for (const s of skills) {
const below = s.lift < 0 && !s.lowSample;
const flag = below ? "⚠" : s.lowSample ? "·(low n)" : "";
lines.push(
pad(s.skill, 44) +
padL(s.n, 5) +
padL(s.pos, 6) +
padL(s.neg, 6) +
padL((s.share * 100).toFixed(0) + "%", 8) +
padL((s.lift >= 0 ? "+" : "") + (s.lift * 100).toFixed(0) + "%", 8) +
padL(s.wLB.toFixed(2), 7) +
padL(s.sevNeg, 8) +
" " +
pad(s.topNeg || "-", 18) +
flag,
);
}
return lines.join("\n");
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write(
"usage: adam-skill-utility.mjs [--home <path>] [--input <jsonl-path>] " +
"[--min <n>] [--days <n>] [--json]\n",
);
process.exit(0);
}
try {
let entries = gatherInputEntries(args);
entries = filterByDays(entries, args.days);
const report = computeSkillUtility(entries, { min: args.min });
if (args.json) {
process.stdout.write(JSON.stringify(report) + "\n");
} else {
process.stdout.write(renderText(report) + "\n");
}
process.exit(0);
} catch (e) {
process.stderr.write(`adam-skill-utility error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
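Ranking by the Wilson lower bound gives a different order than ranking by raw share or lift. The sketch below shows this with three skills.

Assumptions:
- The three skills are hypothetical.
- The global baseline of 0.8 is invented. In the real report the baseline is computed from all scored events.

`wilsonLower` is copied from the file above. The sort follows the worst-first comparator in `computeSkillUtility`, minus the alphabetical tie-break.

```javascript
// Copied from wilsonLower() in adam-skill-utility.mjs (z = 1.96 for ~95%).
function wilsonLower(pos, n, z = 1.96) {
  if (n <= 0) return 0;
  const p = pos / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
  return (center - margin) / denom;
}

// Hypothetical skills and an assumed baseline (the script derives it from all events).
const baseline = 0.8;
const skills = [
  { skill: "lucky-once", pos: 1, neg: 0 },
  { skill: "steady", pos: 40, neg: 2 },
  { skill: "over-eager", pos: 10, neg: 30 },
];

const rows = skills.map((s) => {
  const n = s.pos + s.neg;
  const share = s.pos / n;
  return { ...s, n, share, lift: share - baseline, wLB: wilsonLower(s.pos, n) };
});

// Worst-first: lowest Wilson lower bound, then most negatives.
rows.sort((a, b) => a.wLB - b.wLB || b.neg - a.neg);

for (const r of rows) {
  console.log(`${r.skill.padEnd(12)} n=${r.n} lift=${r.lift.toFixed(2)} wLB=${r.wLB.toFixed(3)}`);
}
```

`lucky-once` has the highest share and a positive lift. Yet its single event gives it a wLB of about 0.21, far below `steady` (about 0.84).

With the default `--min 8`, the text report would also flag `lucky-once` as `·(low n)` rather than `⚠`. Only `over-eager` would be flagged as reliably below baseline.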
+3
@@ -29,6 +29,9 @@ export const SIGNAL_WINDOWS_DAYS = {
build_loop: 30,
weak_agent: 30,
subagent_dispatch_pattern: 30,
silent_drift: 14,
file_reread: 14,
error_after_recovery: 30,
correction_free_streak: 60,
clean_recovery: 60,
task_completed: 60,
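The hunk above adds lookback windows for the three new signal types. Assume `SIGNAL_WINDOWS_DAYS[type]` is the number of days an entry of that type stays in scope. The name suggests this, but the consuming code is not shown here.

Under that assumption, a filter over journal entries could look like the sketch below. It is hypothetical throughout:
- `windowFilter` is an invented helper.
- The reference time is fixed for illustration.
- Types with no configured window are kept, which is a guess at the real behavior.

```javascript
// Hypothetical sketch: keep an entry only if it is within its type's window.
// Assumes each entry has an ISO `ts` string and a `type` field.
const DAY_MS = 86400000;
const windows = { silent_drift: 14, file_reread: 14, error_after_recovery: 30, task_completed: 60 };

function windowFilter(entries, windowsDays, nowMs) {
  return entries.filter((e) => {
    const days = windowsDays[e.type];
    if (!Number.isFinite(days)) return true; // assumption: unknown types are kept
    const ms = Date.parse(e.ts);
    return Number.isFinite(ms) && nowMs - ms <= days * DAY_MS;
  });
}

const now = Date.parse("2026-06-01T00:00:00Z");
const kept = windowFilter(
  [
    { type: "silent_drift", ts: "2026-05-10T00:00:00Z" },         // 22 days > 14: dropped
    { type: "error_after_recovery", ts: "2026-05-10T00:00:00Z" }, // 22 days <= 30: kept
    { type: "task_completed", ts: "2026-04-15T00:00:00Z" },       // 47 days <= 60: kept
  ],
  windows,
  now,
);
console.log(kept.map((e) => e.type));
```

The short 14-day window for `silent_drift` and `file_reread` fits how these events are used: they read as recent-behavior signals rather than long-term trends. That reading is an interpretation, not something the hunk states.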
+788
@@ -16,6 +16,9 @@ SCORE="$REAL_HOME/.claude/adam/scripts/adam-score.mjs"
ABMEASURE="$REAL_HOME/.claude/adam/scripts/adam-ab-measure.mjs"
APPLYREIN="$REAL_HOME/.claude/adam/scripts/adam-apply-reinforcement.mjs"
UPGRADE="$REAL_HOME/.claude/adam/scripts/adam-upgrade.mjs"
BATCH="$REAL_HOME/.claude/adam/scripts/adam-batch.mjs"
ROLLBACK="$REAL_HOME/.claude/adam/scripts/adam-rollback.mjs"
SKILLUTIL="$REAL_HOME/.claude/adam/scripts/adam-skill-utility.mjs"
TMP_HOME="$(mktemp -d -t adam-test.XXXXXX)"
trap 'rm -rf "$TMP_HOME"' EXIT INT TERM
@@ -33,6 +36,9 @@ SCORE_RUN() { HOME="$TMP_HOME" node "$SCORE" --home "$TMP_HOME/.claude" "$@";
ABMEASURE_RUN(){ HOME="$TMP_HOME" node "$ABMEASURE" --home "$TMP_HOME/.claude" "$@"; }
APPLYREIN_RUN(){ HOME="$TMP_HOME" node "$APPLYREIN" "$@" --home "$TMP_HOME/.claude"; }
UPGRADE_RUN() { HOME="$TMP_HOME" node "$UPGRADE" "$@"; }
BATCH_RUN() { HOME="$TMP_HOME" node "$BATCH" "$@"; }
ROLLBACK_RUN(){ HOME="$TMP_HOME" node "$ROLLBACK" "$@"; }
SKILLUTIL_RUN(){ HOME="$TMP_HOME" node "$SKILLUTIL" "$@"; }
PASS=0
FAIL=0
@@ -67,6 +73,17 @@ assert_grep() {
fi
}
assert_no_grep() {
local file="$1" pattern="$2" name="$3"
if grep -qE "$pattern" "$file" 2>/dev/null; then
echo " FAIL: $name (pattern $pattern unexpectedly present in $file)"
FAIL=$((FAIL+1))
else
echo " PASS: $name"
PASS=$((PASS+1))
fi
}
# --- Test 1: correction signal ---
echo "Test 1: user correction"
reset_state
@@ -1388,6 +1405,777 @@ else
fi
fi
# --- Test 78: silent_drift fires after 5 consecutive read-only tools ---
echo "Test 78: silent_drift after 5 reads"
reset_state
for i in 1 2 3 4 5; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/r-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSD\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"silent_drift"' "5 consecutive reads emit silent_drift"
assert_grep "$ROOT/journal.jsonl" '"read_count":5' "silent_drift entry records read_count"
# --- Test 79: silent_drift counter resets on action tool ---
echo "Test 79: silent_drift counter resets on action tool"
reset_state
for i in 1 2 3 4; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/r-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSDR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# Action tool — should reset
echo '{"hook_event_name":"PostToolUse","tool_name":"Edit","tool_input":{"file_path":"/tmp/x"},"tool_response":{"content":"ok"},"session_id":"sSDR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2 3 4; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/rb-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSDR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
if grep -qE '"type":"silent_drift"' "$ROOT/journal.jsonl"; then
echo " FAIL: silent_drift fired despite action tool reset"; FAIL=$((FAIL+1))
else
echo " PASS: silent_drift suppressed by intervening action tool"; PASS=$((PASS+1))
fi
# --- Test 80: error_after_recovery fires when same fp returns post-clean_recovery ---
echo "Test 80: error_after_recovery fires when fp returns after recovery"
reset_state
# Build a tool_error_loop with ENOENT
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat missing"},"tool_response":{"is_error":true,"content":"cat: missing: No such file or directory"},"session_id":"sEAR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
# 3 clean tools → clean_recovery
for i in 1 2 3; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/ok-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sEAR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# Same fp returns within window
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat other"},"tool_response":{"is_error":true,"content":"cat: other: No such file or directory"},"session_id":"sEAR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_grep "$ROOT/journal.jsonl" '"type":"error_after_recovery"' "same fp after clean_recovery emits error_after_recovery"
# --- Test 81: error_after_recovery does NOT fire after window expires ---
echo "Test 81: error_after_recovery suppressed beyond window"
reset_state
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat missing"},"tool_response":{"is_error":true,"content":"cat: missing: No such file or directory"},"session_id":"sEARW","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
for i in 1 2 3; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/ok-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sEARW\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# UserPromptSubmit resets tools_since_user + last_errors so the burn reads don't
# trigger a secondary dead_end + clean_recovery cycle (which would create a fresh
# recovery within window and cause error_after_recovery to fire legitimately).
echo '{"hook_event_name":"UserPromptSubmit","prompt":"keep going","session_id":"sEARW","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
# Burn through the 5-event window with 6 clean reads (session_post_count: 6 → 12)
for i in 1 2 3 4 5 6; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/burn-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sEARW\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat other"},"tool_response":{"is_error":true,"content":"cat: other: No such file or directory"},"session_id":"sEARW","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
if grep -qE '"type":"error_after_recovery"' "$ROOT/journal.jsonl"; then
echo " FAIL: error_after_recovery fired outside 5-event window"; FAIL=$((FAIL+1))
else
echo " PASS: error_after_recovery suppressed outside window"; PASS=$((PASS+1))
fi
# --- Test 82: adam-score.mjs reports severity_sum + severity_by_type ---
echo "Test 82: severity-sum reporting in score.mjs"
SEV_TMP="$(mktemp)"
cat > "$SEV_TMP" <<'EOF'
{"ts":"2026-05-12T10:00:00Z","session":"sSEV","type":"dead_end","count":64}
{"ts":"2026-05-12T10:01:00Z","session":"sSEV","type":"edit_churn","count":8}
{"ts":"2026-05-12T10:02:00Z","session":"sSEV","type":"tool_error_loop","count":3,"fp":"ENOENT:abc"}
EOF
out=$(SCORE_RUN --input "$SEV_TMP" 2>/dev/null)
rm -f "$SEV_TMP"
# Expected: dead_end 64/8=8, edit_churn 8/4=2, tool_error_loop 3/3=1 → sum=11
if echo "$out" | grep -q '"severity_sum":11'; then
echo " PASS: severity_sum=11 reported"; PASS=$((PASS+1))
else
echo " FAIL: severity_sum mismatch (got: $out)"; FAIL=$((FAIL+1))
fi
if echo "$out" | grep -q '"dead_end":8'; then
echo " PASS: severity_by_type.dead_end=8"; PASS=$((PASS+1))
else
echo " FAIL: severity_by_type.dead_end missing/wrong (got: $out)"; FAIL=$((FAIL+1))
fi
# ============================================================
# MOSS-grounded tests: context_window, adam-batch, adam-rollback
# ============================================================
# --- Test 83: context_window attached to tool_error_loop struggle signal ---
echo "Test 83: context_window attached to tool_error_loop"
reset_state
# Fire a user prompt first so the context ring has something.
echo '{"hook_event_name":"UserPromptSubmit","prompt":"run the tests","session_id":"sCW1","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"failing-cmd"},"tool_response":{"is_error":true,"content":"Error: command not found: failing-cmd"},"session_id":"sCW1","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"context_window":\[' "tool_error_loop carries context_window"
# --- Test 84: context_window captures preceding user prompt ---
echo "Test 84: context_window captures user prompt text"
# Re-use the journal from test 83
assert_grep "$ROOT/journal.jsonl" '"prompt":"run the tests"' "context_window includes user prompt excerpt"
# --- Test 85: context_window includes tool response excerpts ---
echo "Test 85: context_window includes tool response excerpts"
assert_grep "$ROOT/journal.jsonl" '"response_excerpt"' "context_window entries have response_excerpt"
# --- Test 86: context_window on dead_end signal ---
echo "Test 86: context_window on dead_end signal"
reset_state
echo '{"hook_event_name":"UserPromptSubmit","prompt":"start working","session_id":"sCW2","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2 3 4 5 6 7 8; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Bash\",\"tool_input\":{\"command\":\"step$i\"},\"session_id\":\"sCW2\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
if grep -qE '"type":"dead_end"' "$ROOT/journal.jsonl" && grep -qE '"context_window":\[' "$ROOT/journal.jsonl"; then
echo " PASS: dead_end carries context_window"; PASS=$((PASS+1))
else
echo " FAIL: dead_end missing context_window"; FAIL=$((FAIL+1))
fi
# --- Test 87: context_window NOT on non-struggle signals ---
echo "Test 87: context_window absent from correction_free_streak"
reset_state
for i in 1 2 3 4 5; do
echo "{\"hook_event_name\":\"UserPromptSubmit\",\"prompt\":\"step $i please\",\"session_id\":\"sCW3\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# correction_free_streak should have fired
streak_line=$(grep '"type":"correction_free_streak"' "$ROOT/journal.jsonl" | head -1)
if [ -n "$streak_line" ] && ! echo "$streak_line" | grep -q '"context_window"'; then
echo " PASS: correction_free_streak has no context_window"; PASS=$((PASS+1))
else
echo " FAIL: unexpected context_window on non-struggle signal"; FAIL=$((FAIL+1))
fi
# --- Test 88: adam-batch clusters same signal_type + fp into one batch ---
echo "Test 88: adam-batch clusters same (type, fp) into one batch"
batch_input=$(cat <<'EOF'
{"ts":"2026-05-20T10:00:00Z","type":"tool_error_loop","session":"s1","cwd":"/a","tool":"Bash","fp":"ENOENT:abc","count":3}
{"ts":"2026-05-21T10:00:00Z","type":"tool_error_loop","session":"s2","cwd":"/a","tool":"Bash","fp":"ENOENT:abc","count":4}
{"ts":"2026-05-22T10:00:00Z","type":"tool_error_loop","session":"s3","cwd":"/b","tool":"Bash","fp":"ENOENT:abc","count":3}
EOF
)
out=$(echo "$batch_input" | BATCH_RUN 2>/dev/null)
batch_count=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.batches.length)})')
entry_count=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.batches[0]?j.batches[0].entry_count:0)})')
if [ "$batch_count" = "1" ] && [ "$entry_count" = "3" ]; then
echo " PASS: 3 same-fp entries → 1 batch with entry_count=3"; PASS=$((PASS+1))
else
echo " FAIL: expected 1 batch / 3 entries (got batches=$batch_count entries=$entry_count)"; FAIL=$((FAIL+1))
fi
# --- Test 89: adam-batch creates separate batches for different signal types ---
echo "Test 89: adam-batch separates different signal types"
batch_input=$(cat <<'EOF'
{"ts":"2026-05-20T10:00:00Z","type":"correction","session":"s1","cwd":"/a","phrase":"no wrong"}
{"ts":"2026-05-21T10:00:00Z","type":"tool_error_loop","session":"s1","cwd":"/a","fp":"ENOENT:abc","count":3}
EOF
)
out=$(echo "$batch_input" | BATCH_RUN 2>/dev/null)
batch_count=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.batches.length)})')
if [ "$batch_count" = "2" ]; then
echo " PASS: 2 different types → 2 batches"; PASS=$((PASS+1))
else
echo " FAIL: expected 2 batches (got $batch_count)"; FAIL=$((FAIL+1))
fi
# --- Test 90: adam-batch reports session_count correctly ---
echo "Test 90: adam-batch tracks session_count per batch"
batch_input=$(cat <<'EOF'
{"ts":"2026-05-20T10:00:00Z","type":"correction","session":"s1","cwd":"/a","phrase":"no wrong"}
{"ts":"2026-05-21T10:00:00Z","type":"correction","session":"s2","cwd":"/a","phrase":"no wrong"}
{"ts":"2026-05-22T10:00:00Z","type":"correction","session":"s1","cwd":"/b","phrase":"no wrong"}
EOF
)
out=$(echo "$batch_input" | BATCH_RUN 2>/dev/null)
sessions=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.batches[0]?j.batches[0].session_count:0)})')
if [ "$sessions" = "2" ]; then
echo " PASS: session_count=2 for entries from s1+s2"; PASS=$((PASS+1))
else
echo " FAIL: expected session_count=2 (got $sessions)"; FAIL=$((FAIL+1))
fi
# --- Test 91: adam-batch reports has_context_window ---
echo "Test 91: adam-batch reports has_context_window flag"
batch_input=$(cat <<'EOF'
{"ts":"2026-05-20T10:00:00Z","type":"dead_end","session":"s1","cwd":"/a","count":8,"context_window":[{"event":"user","prompt":"hi","ts":"2026-05-20T09:59:00Z"}]}
EOF
)
out=$(echo "$batch_input" | BATCH_RUN 2>/dev/null)
has_cw=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.batches[0]?j.batches[0].has_context_window:"false")})')
if [ "$has_cw" = "true" ]; then
echo " PASS: has_context_window=true when entries have context_window"; PASS=$((PASS+1))
else
echo " FAIL: expected has_context_window=true (got $has_cw)"; FAIL=$((FAIL+1))
fi
# --- Test 92: adam-batch empty input → no batches ---
echo "Test 92: adam-batch produces empty output on empty input"
out=$(echo '' | BATCH_RUN 2>/dev/null)
batch_count=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{try{const j=JSON.parse(b);console.log(j.batches.length)}catch{console.log("parse-error")}})')
total=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{try{const j=JSON.parse(b);console.log(j.total)}catch{console.log("parse-error")}})')
if [ "$batch_count" = "0" ] && [ "$total" = "0" ]; then
echo " PASS: empty input → 0 batches, total=0"; PASS=$((PASS+1))
else
echo " FAIL: expected 0 batches (got batches=$batch_count total=$total)"; FAIL=$((FAIL+1))
fi
# --- Test 93: adam-rollback --proposal-id moves applied proposal to proposals ---
echo "Test 93: adam-rollback moves applied proposal to proposals/"
reset_state
rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-test-001.md" <<'EOF'
---
id: rb-test-001
type: skill_new
target: ~/.claude/skills/test-skill/SKILL.md
confidence: 5
blast_radius: low
auto_apply_eligible: true
status: applied
source_entries:
- "2026-05-18T10:00:00Z"
---
# Why
test rollback
# Rollback
```bash
rm -rf ~/.claude/skills/test-skill/
```
EOF
out=$(ROLLBACK_RUN --proposal-id rb-test-001 --home "$TMP_HOME/.claude" 2>/dev/null)
if echo "$out" | grep -q '"status":"rolled_back"'; then
rb_ok=1
else
rb_ok=0
fi
# Verify proposal moved to proposals/
if ls "$ROOT/proposals/"*rb-test-001* >/dev/null 2>&1; then
moved_ok=1
else
moved_ok=0
fi
# Verify original file renamed
if [ -f "$ROOT/applied/2026-05-20T00-00-00Z-rb-test-001.md.rolled-back" ]; then
renamed_ok=1
else
renamed_ok=0
fi
if [ "$rb_ok" = "1" ] && [ "$moved_ok" = "1" ] && [ "$renamed_ok" = "1" ]; then
echo " PASS: rollback moved proposal and renamed applied file"; PASS=$((PASS+1))
else
echo " FAIL: rollback incomplete (status=$rb_ok moved=$moved_ok renamed=$renamed_ok out=$out)"; FAIL=$((FAIL+1))
fi
# --- Test 94: adam-rollback creates regression nudge ---
echo "Test 94: adam-rollback creates regression nudge in active-nudges.json"
if [ -f "$ROOT/active-nudges.json" ]; then
nudge_kind=$(node -e "const j=JSON.parse(require('fs').readFileSync('$ROOT/active-nudges.json','utf8'));console.log((j[0]||{}).kind||'')")
if [ "$nudge_kind" = "regression_rollback" ]; then
echo " PASS: regression nudge created with kind=regression_rollback"; PASS=$((PASS+1))
else
echo " FAIL: nudge kind wrong (got $nudge_kind)"; FAIL=$((FAIL+1))
fi
else
echo " FAIL: active-nudges.json not created"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/proposals/"*rb-test* "$ROOT/applied/"*rb-test* "$ROOT/active-nudges.json"
# --- Test 95: adam-rollback rolled_back field in proposal frontmatter ---
echo "Test 95: rolled-back proposal has rolled_back: true in frontmatter"
reset_state
rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-test-002.md" <<'EOF'
---
id: rb-test-002
type: memory
target: ~/.claude/projects/-Users-nvm/memory/test.md
confidence: 4
blast_radius: low
---
# Why
test
# Rollback
delete the memory file
EOF
ROLLBACK_RUN --proposal-id rb-test-002 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
rb_file=$(ls "$ROOT/proposals/"*rb-test-002* 2>/dev/null | head -1)
if [ -n "$rb_file" ] && grep -q 'rolled_back: true' "$rb_file"; then
echo " PASS: rolled-back proposal has rolled_back: true"; PASS=$((PASS+1))
else
echo " FAIL: rolled_back marker missing (file=$rb_file)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/proposals/"*rb-test* "$ROOT/applied/"*rb-test* "$ROOT/active-nudges.json"
# --- Test 96: adam-rollback not_found on missing proposal ---
echo "Test 96: adam-rollback returns not_found for missing proposal"
reset_state
out=$(ROLLBACK_RUN --proposal-id nonexistent-999 --home "$TMP_HOME/.claude" 2>/dev/null)
if echo "$out" | grep -q '"status":"not_found"'; then
echo " PASS: not_found status for missing proposal"; PASS=$((PASS+1))
else
echo " FAIL: expected not_found (got: $out)"; FAIL=$((FAIL+1))
fi
# --- Test 97: adam-rollback --dry-run does not move files ---
echo "Test 97: adam-rollback --dry-run leaves files in place"
reset_state
rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-dry-001.md" <<'EOF'
---
id: rb-dry-001
type: skill_edit
target: ~/.claude/skills/foo/SKILL.md
confidence: 4
---
# Why
test dry run
# Rollback
revert edit
EOF
out=$(ROLLBACK_RUN --proposal-id rb-dry-001 --dry-run --home "$TMP_HOME/.claude" 2>/dev/null)
if echo "$out" | grep -q '"status":"dry_run"' && [ -f "$ROOT/applied/2026-05-20T00-00-00Z-rb-dry-001.md" ]; then
echo " PASS: dry-run did not move files"; PASS=$((PASS+1))
else
echo " FAIL: dry-run moved files or wrong status (out=$out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/applied/2026-05-20T00-00-00Z-rb-dry-001.md"
# --- Test 98: context_window on edit_churn signal ---
echo "Test 98: context_window on edit_churn signal"
reset_state
echo '{"hook_event_name":"UserPromptSubmit","prompt":"fix the tests","session_id":"sCW4","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2 3 4; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Edit","tool_input":{"file_path":"/tmp/churn.py"},"session_id":"sCW4","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
churn_line=$(grep '"type":"edit_churn"' "$ROOT/journal.jsonl" | head -1)
if [ -n "$churn_line" ] && echo "$churn_line" | grep -q '"context_window"'; then
echo " PASS: edit_churn carries context_window"; PASS=$((PASS+1))
else
echo " FAIL: edit_churn missing context_window"; FAIL=$((FAIL+1))
fi
# --- Test 99: context_window on build_loop signal ---
echo "Test 99: context_window on build_loop signal"
reset_state
echo '{"hook_event_name":"UserPromptSubmit","prompt":"run the build","session_id":"sCW5","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"npm run build"},"tool_response":{"is_error":true,"content":"Build failed: TypeError"},"session_id":"sCW5","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
build_line=$(grep '"type":"build_loop"' "$ROOT/journal.jsonl" | head -1)
if [ -n "$build_line" ] && echo "$build_line" | grep -q '"context_window"'; then
echo " PASS: build_loop carries context_window"; PASS=$((PASS+1))
else
echo " FAIL: build_loop missing context_window"; FAIL=$((FAIL+1))
fi
# --- Test 100: adam-batch --min-entries filter ---
echo "Test 100: adam-batch --min-entries filters small batches"
batch_input=$(cat <<'EOF'
{"ts":"2026-05-20T10:00:00Z","type":"correction","session":"s1","cwd":"/a","phrase":"no wrong"}
{"ts":"2026-05-21T10:00:00Z","type":"tool_error_loop","session":"s1","cwd":"/a","fp":"ENOENT:abc","count":3}
{"ts":"2026-05-22T10:00:00Z","type":"tool_error_loop","session":"s2","cwd":"/a","fp":"ENOENT:abc","count":4}
{"ts":"2026-05-23T10:00:00Z","type":"tool_error_loop","session":"s3","cwd":"/a","fp":"ENOENT:abc","count":5}
EOF
)
out=$(echo "$batch_input" | BATCH_RUN --min-entries 3 2>/dev/null)
batch_count=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.batches.length)})')
unbatched=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const j=JSON.parse(b);console.log(j.unbatched)})')
if [ "$batch_count" = "1" ] && [ "$unbatched" = "1" ]; then
echo " PASS: --min-entries=3 keeps 1 batch (3 entries), drops 1 singleton"; PASS=$((PASS+1))
else
echo " FAIL: expected 1 batch + 1 unbatched (got batches=$batch_count unbatched=$unbatched)"; FAIL=$((FAIL+1))
fi
# --- Test 101: adam-rollback extracts undo commands from Rollback section ---
echo "Test 101: adam-rollback extracts undo commands from code blocks"
reset_state
rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-undo-001.md" <<'HEREDOC'
---
id: rb-undo-001
type: skill_new
target: ~/.claude/skills/test-undo/SKILL.md
confidence: 5
blast_radius: low
---
# Why
test
# Rollback
```bash
rm -rf ~/.claude/skills/test-undo/
```
HEREDOC
out=$(ROLLBACK_RUN --proposal-id rb-undo-001 --home "$TMP_HOME/.claude" 2>/dev/null)
undo_count=$(echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{try{const j=JSON.parse(b);console.log((j.rollbacks[0]||{}).undo_commands?j.rollbacks[0].undo_commands.length:0)}catch{console.log("err")}})')
if [ "$undo_count" = "1" ]; then
echo " PASS: extracted 1 undo command from Rollback section"; PASS=$((PASS+1))
else
echo " FAIL: expected 1 undo command (got $undo_count, out=$out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/proposals/"*rb-undo* "$ROOT/applied/"*rb-undo* "$ROOT/active-nudges.json"
# --- Test 102: context_ring size bounded at 8 ---
echo "Test 102: context_ring bounded at CONTEXT_RING_SIZE=8"
reset_state
# Fire 12 PostToolUse events, then a struggle signal
for i in $(seq 1 12); do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/f-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sCR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# Next 3 errors to trigger tool_error_loop with context_window
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"fail"},"tool_response":{"is_error":true,"content":"Error: fail"},"session_id":"sCR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
cw_len=$(grep '"type":"tool_error_loop"' "$ROOT/journal.jsonl" | head -1 | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{try{const j=JSON.parse(b);console.log(j.context_window?j.context_window.length:0)}catch{console.log("err")}})')
if [ "$cw_len" = "8" ]; then
echo " PASS: context_window capped at 8 entries"; PASS=$((PASS+1))
else
echo " FAIL: expected 8 context_window entries (got $cw_len)"; FAIL=$((FAIL+1))
fi
# --- Test 103: silent_drift carries active_skills (its primary cluster key) ---
echo "Test 103: silent_drift emits active_skills (§5b skill-attribution)"
reset_state
echo '{"hook_event_name":"PreToolUse","tool_name":"Skill","tool_input":{"skill":"tdd"},"session_id":"sSK","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2 3 4 5; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/sk-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSK\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"silent_drift"' "silent_drift emitted after 5 reads with skill active"
assert_grep "$ROOT/journal.jsonl" '"active_skills":\["tdd"\]' "silent_drift carries active_skills cluster key"
# --- Test 104: retry_loop fires at threshold 3, not below ---
echo "Test 104: retry_loop boundary (2x no fire, 3x fires)"
reset_state
for i in 1 2; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"make"},"session_id":"sRT","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "2x same args does NOT emit retry_loop"
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"make"},"session_id":"sRT","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x same args emits retry_loop"
# --- Test 105: weak_agent fires at 2 dispatches, not at 1 ---
echo "Test 105: weak_agent boundary (1x no fire, 2x fires)"
reset_state
echo '{"hook_event_name":"PostToolUse","tool_name":"Agent","tool_input":{"subagent_type":"explorer"},"session_id":"sWA","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_no_grep "$ROOT/journal.jsonl" '"type":"weak_agent"' "1x agent dispatch does NOT emit weak_agent"
echo '{"hook_event_name":"PostToolUse","tool_name":"Agent","tool_input":{"subagent_type":"explorer"},"session_id":"sWA","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_grep "$ROOT/journal.jsonl" '"type":"weak_agent"' "2x same agent in window emits weak_agent"
# --- Test 106: adam-cooldown --compute deterministic + input-sensitive ---
echo "Test 106: adam-cooldown --compute fingerprint"
fp1=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k1 2>/dev/null)
fp2=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k1 2>/dev/null)
fp3=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k2 2>/dev/null)
if [ -n "$fp1" ] && [ "$fp1" = "$fp2" ] && echo "$fp1" | grep -q '"fingerprint":'; then
echo " PASS: --compute deterministic for identical inputs"; PASS=$((PASS+1))
else
echo " FAIL: --compute not deterministic (got '$fp1' vs '$fp2')"; FAIL=$((FAIL+1))
fi
if [ "$fp1" != "$fp3" ]; then
echo " PASS: --compute sensitive to cluster id"; PASS=$((PASS+1))
else
echo " FAIL: --compute ignored cluster id (both '$fp1')"; FAIL=$((FAIL+1))
fi
# --- Test 107: A/B boundary — exactly -25% delta → improved ---
echo "Test 107: A/B exact -25% boundary (4 pre / 3 post → improved)"
reset_state
applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
cat > "$ROOT/ab-tracking.jsonl" <<EOF
{"applied_at":$applied_at_ms,"proposal_id":"ab-b25-001","proposal_type":"memory","target_skill":"b1","proposal_fingerprint":"fpB1","originating_signals":[{"type":"correction","count":4,"session_ids":["sB1"]}],"pre_window_days":7}
EOF
> "$ROOT/journal.jsonl"
for i in 1 2 3 4; do
pre_ts=$(node -e "console.log(new Date(Date.now() - (15 + $i*0.3) * 86400000).toISOString())")
echo "{\"ts\":\"$pre_ts\",\"session\":\"sB1\",\"type\":\"correction\",\"phrase\":\"x\"}" >> "$ROOT/journal.jsonl"
done
for i in 1 2 3; do
post_ts=$(node -e "console.log(new Date(Date.now() - (8 + $i*0.3) * 86400000).toISOString())")
echo "{\"ts\":\"$post_ts\",\"session\":\"sB1\",\"type\":\"correction\",\"phrase\":\"y\"}" >> "$ROOT/journal.jsonl"
done
out=$(ABMEASURE_RUN --format json 2>/dev/null)
if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-b25-001");process.exit(e&&e.pre_count===4&&e.post_count===3&&e.delta_pct===-25&&e.status==="improved"?0:1)})'; then
echo " PASS: -25% boundary classified improved"; PASS=$((PASS+1))
else
echo " FAIL: -25% boundary misclassified (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/ab-tracking.jsonl"
# --- Test 108: A/B boundary — exactly +25% delta → regressed ---
echo "Test 108: A/B exact +25% boundary (4 pre / 5 post → regressed)"
reset_state
applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
cat > "$ROOT/ab-tracking.jsonl" <<EOF
{"applied_at":$applied_at_ms,"proposal_id":"ab-b25-002","proposal_type":"memory","target_skill":"b2","proposal_fingerprint":"fpB2","originating_signals":[{"type":"correction","count":4,"session_ids":["sB2"]}],"pre_window_days":7}
EOF
> "$ROOT/journal.jsonl"
for i in 1 2 3 4; do
pre_ts=$(node -e "console.log(new Date(Date.now() - (15 + $i*0.3) * 86400000).toISOString())")
echo "{\"ts\":\"$pre_ts\",\"session\":\"sB2\",\"type\":\"correction\",\"phrase\":\"x\"}" >> "$ROOT/journal.jsonl"
done
for i in 1 2 3 4 5; do
post_ts=$(node -e "console.log(new Date(Date.now() - (8 + $i*0.3) * 86400000).toISOString())")
echo "{\"ts\":\"$post_ts\",\"session\":\"sB2\",\"type\":\"correction\",\"phrase\":\"y\"}" >> "$ROOT/journal.jsonl"
done
out=$(ABMEASURE_RUN --format json 2>/dev/null)
if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-b25-002");process.exit(e&&e.pre_count===4&&e.post_count===5&&e.delta_pct===25&&e.status==="regressed"?0:1)})'; then
echo " PASS: +25% boundary classified regressed"; PASS=$((PASS+1))
else
echo " FAIL: +25% boundary misclassified (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/ab-tracking.jsonl"
# --- Test 109: cooldown blacklist 30d boundary (day 29 active, day 31 expired) ---
echo "Test 109: blacklist 30d boundary"
reset_state
ts29=$(node -e 'console.log(Date.now() - 29*86400000)')
cat > "$ROOT/rejected/2026-blk-29.md" <<EOF
---
id: blk-29
type: skill_edit
target_skill: blkskill
proposal_fingerprint: fpZ
auto_apply_blacklist: true
applied_at: $ts29
---
body
EOF
out29=$(COOLDOWN_RUN --skill blkskill --fingerprint fpZ 2>/dev/null)
if echo "$out29" | grep -q '"status":"blacklisted"'; then
echo " PASS: day-29 blacklist still active"; PASS=$((PASS+1))
else
echo " FAIL: day-29 should be blacklisted (got: $out29)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/rejected/2026-blk-29.md"
ts31=$(node -e 'console.log(Date.now() - 31*86400000)')
cat > "$ROOT/rejected/2026-blk-31.md" <<EOF
---
id: blk-31
type: skill_edit
target_skill: blkskill
proposal_fingerprint: fpZ
auto_apply_blacklist: true
applied_at: $ts31
---
body
EOF
out31=$(COOLDOWN_RUN --skill blkskill --fingerprint fpZ 2>/dev/null)
if echo "$out31" | grep -q '"status":"cool"'; then
echo " PASS: day-31 blacklist expired → cool"; PASS=$((PASS+1))
else
echo " FAIL: day-31 should be cool (got: $out31)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/rejected/2026-blk-31.md"
# --- Test 110: file_reread fires on 3x offset-shifted same-file reads, not 2x ---
echo "Test 110: file_reread (offset-shifted same-file reads escape retry_loop)"
reset_state
for off in 0 100; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/big.go\",\"offset\":$off},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sFR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "2x same-file reads does NOT emit file_reread"
echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/big.go","offset":200},"tool_response":{"content":"ok"},"session_id":"sFR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "3x offset-shifted same-file reads emit file_reread"
assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "offset-shifted reads do NOT emit retry_loop (argsHash differs)"
assert_grep "$ROOT/journal.jsonl" '"type":"file_reread".*"context_window"' "file_reread carries context_window (in STRUGGLE_TYPES)"
# --- Test 111: byte-identical reread is caught by retry_loop, not double-counted as file_reread ---
echo "Test 111: identical reads → retry_loop (file_reread guard avoids double-count)"
reset_state
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/same.go"},"tool_response":{"content":"ok"},"session_id":"sFR2","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x byte-identical reads emit retry_loop"
assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "byte-identical reads NOT double-counted as file_reread (sameToolArgs>=RETRY guard)"
# --- Test 112: A/B volume normalization — busier journal does NOT fake a regression ---
echo "Test 112: A/B volume-normalized (raw +200% but flat share → neutral)"
reset_state
applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
cat > "$ROOT/ab-tracking.jsonl" <<EOF
{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-001","proposal_type":"memory","target_skill":"vol","proposal_fingerprint":"fpV","originating_signals":[{"type":"correction","count":2,"session_ids":["sV"]}],"pre_window_days":7}
EOF
> "$ROOT/journal.jsonl"
# pre window: 2 correction + 8 dead_end (rate 0.2)
for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
# post window: 6 correction + 24 dead_end (rate 0.2 — share unchanged, raw count +200%)
for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in $(seq 1 24); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.05)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
out=$(ABMEASURE_RUN --format json 2>/dev/null)
if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-001");process.exit(e&&e.normalized===true&&e.raw_delta_pct===200&&e.status==="neutral"?0:1)})'; then
echo " PASS: volume growth normalized → neutral (raw +200%)"; PASS=$((PASS+1))
else
echo " FAIL: volume normalization wrong (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/ab-tracking.jsonl"
# --- Test 113: A/B genuine rate regression still flagged ---
echo "Test 113: A/B genuine share increase → regressed"
reset_state
applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
cat > "$ROOT/ab-tracking.jsonl" <<EOF
{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-002","proposal_type":"memory","target_skill":"vol2","proposal_fingerprint":"fpV2","originating_signals":[{"type":"correction","count":2,"session_ids":["sV2"]}],"pre_window_days":7}
EOF
> "$ROOT/journal.jsonl"
# pre: 2 correction + 8 dead_end (rate 0.2)
for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
# post: 6 correction + 6 dead_end (rate 0.5 — share up → genuine regression)
for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in 1 2 3 4 5 6; do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.07)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
out=$(ABMEASURE_RUN --format json 2>/dev/null)
if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-002");process.exit(e&&e.normalized===true&&e.status==="regressed"?0:1)})'; then
echo " PASS: genuine share increase → regressed"; PASS=$((PASS+1))
else
echo " FAIL: genuine regression missed (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/ab-tracking.jsonl"
# --- Test 114: update notifier nudges from cache when a newer release exists (no network) ---
echo "Test 114: update notifier — cached newer release prints nudge"
reset_state
printf 'v0.6.2\n' > "$ROOT/.version"
node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP"}' | NUDGE_RUN 2>/dev/null)
if echo "$out" | grep -q "update available: v0.6.2 → v9.9.9"; then
echo " PASS: update nudge printed from cache (offline)"; PASS=$((PASS+1))
else
echo " FAIL: expected update nudge (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/.version" "$ROOT/.update-check.json"
# --- Test 115: update notifier silent when installed is current ---
echo "Test 115: update notifier — up-to-date is silent"
reset_state
printf 'v9.9.9\n' > "$ROOT/.version"
node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP2"}' | NUDGE_RUN 2>/dev/null)
if echo "$out" | grep -q "update available"; then
echo " FAIL: nudged despite being current (got: $out)"; FAIL=$((FAIL+1))
else
echo " PASS: no nudge when up-to-date"; PASS=$((PASS+1))
fi
rm -f "$ROOT/.version" "$ROOT/.update-check.json"
# --- Test 116: ADAM_NO_UPDATE_CHECK disables the notifier ---
echo "Test 116: ADAM_NO_UPDATE_CHECK opt-out"
reset_state
printf 'v0.6.2\n' > "$ROOT/.version"
node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP3"}' | HOME="$TMP_HOME" ADAM_NO_UPDATE_CHECK=1 node "$NUDGE" 2>/dev/null)
if echo "$out" | grep -q "update available"; then
echo " FAIL: notifier ran despite opt-out (got: $out)"; FAIL=$((FAIL+1))
else
echo " PASS: ADAM_NO_UPDATE_CHECK suppressed the check"; PASS=$((PASS+1))
fi
rm -f "$ROOT/.version" "$ROOT/.update-check.json"
# --- Test 117: no .version marker → notifier no-op (no crash) ---
echo "Test 117: missing .version marker → notifier silent, hook still runs"
reset_state
node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP4"}' | NUDGE_RUN 2>/dev/null)
if echo "$out" | grep -q "update available"; then
echo " FAIL: nudged without a .version marker (got: $out)"; FAIL=$((FAIL+1))
else
echo " PASS: no marker → no update nudge"; PASS=$((PASS+1))
fi
rm -f "$ROOT/.update-check.json"
# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
echo "Test 118: rollback purges ab-tracking entry by proposal_id"
reset_state
rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
---
id: rb-ab-001
type: memory
target: ~/.claude/projects/-Users-nvm/memory/x.md
confidence: 5
blast_radius: low
status: applied
source_entries:
- "2026-05-18T10:00:00Z"
---
# Why
test
# Rollback
```bash
rm -f x
```
EOF
cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
EOF
ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
echo " FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
else
echo " PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
fi
if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
echo " PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
else
echo " FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
# --- Test 119: adam-skill-utility ranks friction-correlated skills below baseline ---
echo "Test 119: adam-skill-utility computes per-skill good:bad utility (execution-grounded Δ)"
reset_state
SU_INPUT="$TMP_HOME/su-input.jsonl"
{
for i in 1 2 3 4 5; do echo "{\"ts\":\"2026-05-20T0$i:00:00Z\",\"session\":\"sSU\",\"type\":\"task_completed\",\"active_skills\":[\"goodskill\"]}"; done
for i in 1 2 3 4 5; do echo "{\"ts\":\"2026-05-20T1$i:00:00Z\",\"session\":\"sSU\",\"type\":\"dead_end\",\"count\":8,\"active_skills\":[\"badskill\"]}"; done
} > "$SU_INPUT"
su_out=$(SKILLUTIL_RUN --input "$SU_INPUT" --json --min 3 2>/dev/null)
su_check=$(echo "$su_out" | node -e '
let buf=""; process.stdin.on("data",d=>buf+=d).on("end",()=>{
try {
const p=JSON.parse(buf);
const bad=p.skills.find(s=>s.skill==="badskill");
const good=p.skills.find(s=>s.skill==="goodskill");
const ok = bad && good && bad.lift<0 && good.lift>0 && p.skills[0].skill==="badskill" && bad.neg===5 && good.pos===5;
console.log(ok?"ok":"bad:"+JSON.stringify({bad,good,first:p.skills[0]&&p.skills[0].skill}));
} catch(e){ console.log("parse-error:"+e.message); }
});')
if [ "$su_check" = "ok" ]; then
echo " PASS: badskill below baseline + ranked worst-first, goodskill above"; PASS=$((PASS+1))
else
echo " FAIL: skill-utility ranking wrong ($su_check)"; FAIL=$((FAIL+1))
fi
rm -f "$SU_INPUT"
echo
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" = "0" ]
@@ -8,6 +8,71 @@ tools: Read, Write, Edit, Grep, Glob, Bash
You analyse Claude Code's own behaviour to propose targeted, surgical improvements. You operate offline (no LLM round-trips outside this run) and produce **files**, not actions. Main-thread Claude reviews and applies changes with the user.
## Stage mode
The skill dispatches you in one of two stages (MOSS-inspired multi-stage pipeline — §3.3: "a single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow"):
- **`stage=diagnose`**: Read batched journal entries, cluster, diagnose root causes, plan fix types. Output diagnoses JSON to `/tmp/adam-diagnoses.json`. Do NOT draft proposals.
- **`stage=implement`**: Read approved diagnoses from `/tmp/adam-diagnoses.json`. Draft full proposal files to `proposals_dir/`. Emit the clustering trace and punch list.
If no `stage` is specified in the dispatch prompt, run **both stages sequentially** within a single pass (backward-compatible with pre-MOSS flow).
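
The sketch below shows the stage-selection rule. It assumes the dispatch prompt carries the stage as a literal `stage=<name>` token. `parseStage` and `stagesToRun` are illustrative names, and rejecting unknown stage names is an assumption, not ADAM's code:

```javascript
// Hypothetical helpers illustrating the stage-mode rule; not ADAM's actual code.
const STAGES = ["diagnose", "implement"];

// Extract `stage=<name>` from a dispatch prompt; undefined when absent.
function parseStage(prompt) {
  const match = /\bstage=([a-z]+)/.exec(prompt);
  return match ? match[1] : undefined;
}

// Absent stage => run both stages in order (pre-MOSS behaviour).
function stagesToRun(stage) {
  if (stage === undefined) return [...STAGES];
  if (STAGES.includes(stage)) return [stage];
  // Assumption: an unknown stage is an error rather than a silent fallback.
  throw new Error(`unknown stage: ${stage}`);
}

console.log(stagesToRun(parseStage("stage=diagnose batches=3"))); // [ 'diagnose' ]
console.log(stagesToRun(parseStage("analyse the latest batches"))); // [ 'diagnose', 'implement' ]
```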
### Diagnose-stage output format
When `stage=diagnose`, write `/tmp/adam-diagnoses.json` containing:
```json
{
"diagnoses": [
{
"cluster_id": "c1",
"signal_type": "correction",
"cluster_key": "wrong|approach",
"count": 5,
"sessions": 3,
"diagnosis": {
"trigger": "...",
"action": "...",
"mismatch": "...",
"outcome": "... `verbatim quote` ..."
},
"plan": {
"type": "memory",
"target": "~/.claude/projects/-Users-nvm/memory/go-test-cache.md",
"scope": "add feedback memory about go test -count=1"
},
"keypoints": {
"tool_selection": 1,
"scope_discipline": 2,
"error_recovery": 0,
"first_attempt": 0,
"build_reliability": 1
},
"gates": {
"threshold": "pass",
"cross_session": "pass",
"window": "in:5/out:0",
"contradiction": "none"
},
"source_entries": ["2026-05-20T10:00:00Z", "2026-05-21T11:00:00Z"],
"context_evidence": ["... excerpts from context_window ..."]
}
],
"skipped": [
{"cluster_id": "c3", "signal_type": "retry_loop", "reason": "threshold", "count": 2}
],
"summary": "considered=4 diagnosed=2 skipped=2"
}
```
The skill validates diagnoses between stages (see SKILL.md §2 "Inter-stage validation").
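
The real inter-stage rules live in SKILL.md §2 and are not reproduced here. The sketch below is a rough structural check that could run between stages. It assumes every field named in the example above is required and that `summary` must agree with the array lengths; both are assumptions, and `validateDiagnoses` is a hypothetical name:

```javascript
// Hypothetical structural check for /tmp/adam-diagnoses.json; the real rules
// are in SKILL.md §2. Returns a list of human-readable problems ([] = valid).
const DIAGNOSIS_FIELDS = ["trigger", "action", "mismatch", "outcome"];

function validateDiagnoses(doc) {
  const errors = [];
  if (!Array.isArray(doc.diagnoses) || !Array.isArray(doc.skipped)) {
    return ["diagnoses and skipped must both be arrays"];
  }

  for (const d of doc.diagnoses) {
    const id = d.cluster_id || "<missing cluster_id>";
    for (const field of DIAGNOSIS_FIELDS) {
      const value = d.diagnosis && d.diagnosis[field];
      if (typeof value !== "string" || value.trim() === "") {
        errors.push(`${id}: diagnosis.${field} is missing or empty`);
      }
    }
    if (!d.plan || !d.plan.type || !d.plan.target) {
      errors.push(`${id}: plan.type and plan.target are required`);
    }
    if (!Array.isArray(d.source_entries) || d.source_entries.length === 0) {
      errors.push(`${id}: source_entries must cite at least one journal ts`);
    }
  }

  // Assumed invariant: summary counts match the arrays they describe.
  const m = /considered=(\d+) diagnosed=(\d+) skipped=(\d+)/.exec(doc.summary || "");
  if (!m) {
    errors.push("summary is not in 'considered=N diagnosed=N skipped=N' form");
  } else {
    const [considered, diagnosed, skipped] = m.slice(1).map(Number);
    if (diagnosed !== doc.diagnoses.length) errors.push(`summary says diagnosed=${diagnosed}, found ${doc.diagnoses.length}`);
    if (skipped !== doc.skipped.length) errors.push(`summary says skipped=${skipped}, found ${doc.skipped.length}`);
    if (considered !== diagnosed + skipped) errors.push("considered != diagnosed + skipped");
  }
  return errors;
}

const doc = {
  diagnoses: [{
    cluster_id: "c1",
    diagnosis: { trigger: "t", action: "a", mismatch: "m", outcome: "o" },
    plan: { type: "memory", target: "~/x.md" },
    source_entries: ["2026-05-20T10:00:00Z"],
  }],
  skipped: [{ cluster_id: "c3", reason: "threshold" }],
  summary: "considered=2 diagnosed=1 skipped=1",
};
console.log(validateDiagnoses(doc)); // []
```

Returning a list of problems instead of throwing on the first one lets the skill report every defect in a single pass.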
## Context window evidence
Journal entries for struggle signals now carry a `context_window` field: an array of up to the last 8 events (user prompts, tool calls, responses) leading up to the friction point. This is ADAM's equivalent of MOSS's "original transcript captured by auto-scan at evidence time" (§3.4).
When drafting diagnoses, **prefer `context_window` evidence over transcript file lookups** when it is present. The `context_window` is already scoped to the friction point and more reliable than file-based transcript pulls. Fall back to `transcripts_root` only when `context_window` is absent (pre-upgrade entries).
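
The sketch below shows a bounded ring feeding `context_window`, assuming plain drop-oldest eviction. The size of 8 matches the `CONTEXT_RING_SIZE=8` bound the test suite checks. `ContextRing`, `attachContext`, and the exact contents of `STRUGGLE_TYPES` are illustrative assumptions; the set lists only the types the tests show carrying `context_window`:

```javascript
// Illustrative bounded ring; not ADAM's hook implementation.
const CONTEXT_RING_SIZE = 8;

// Assumed subset: the struggle types the test suite shows carrying context_window.
const STRUGGLE_TYPES = new Set(["tool_error_loop", "dead_end", "edit_churn", "build_loop", "file_reread"]);

class ContextRing {
  constructor(size = CONTEXT_RING_SIZE) {
    this.size = size;
    this.events = [];
  }

  // Append an event; evict the oldest once the ring exceeds its capacity.
  push(event) {
    this.events.push(event);
    if (this.events.length > this.size) this.events.shift();
  }

  // Copy so later pushes cannot mutate an already-journaled entry.
  snapshot() {
    return this.events.slice();
  }
}

// Struggle signals get the ring's current contents; other signals pass through.
function attachContext(ring, signal) {
  return STRUGGLE_TYPES.has(signal.type)
    ? { ...signal, context_window: ring.snapshot() }
    : { ...signal };
}

const ring = new ContextRing();
for (let i = 1; i <= 12; i++) ring.push({ event: "tool", tool: "Read", file: `/tmp/f-${i}` });
const entry = attachContext(ring, { type: "tool_error_loop", fp: "Error: fail" });
console.log(entry.context_window.length); // 8
console.log(entry.context_window[0].file); // /tmp/f-5
```

Copying in `snapshot()` is a deliberate choice in this sketch. A journal entry keeps the window as it was at the friction point, even though the ring keeps moving.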
## Karpathy constraints (mandatory)
You MUST obey these on every proposal:
@@ -38,6 +103,9 @@ Per-signal windows (single source of truth: `SIGNAL_WINDOWS_DAYS` in `~/.claude/
| `build_loop` | 30 d | build/test failure patterns |
| `weak_agent` | 30 d | subagent quality signal |
| `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
| `silent_drift` | 14 d | exploration-without-action is task-local |
| `file_reread` | 14 d | redundant same-file reads are task-local |
| `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
| `correction_free_streak` | 60 d | wins accumulate slowly |
| `clean_recovery` | 60 d | wins accumulate slowly |
| `task_completed` | 60 d | recipe wins accumulate slowly |
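
The sketch below filters journal entries by these per-signal windows. The day counts come from the rows shown above. The object's shape, the 30-day fallback for unlisted types, and `inWindow` are assumptions, not the actual `SIGNAL_WINDOWS_DAYS` module:

```javascript
// Illustrative only: the authoritative table is SIGNAL_WINDOWS_DAYS in ADAM's helpers.
const SIGNAL_WINDOWS_DAYS = {
  build_loop: 30,
  weak_agent: 30,
  subagent_dispatch_pattern: 30,
  silent_drift: 14,
  file_reread: 14,
  error_after_recovery: 30,
  correction_free_streak: 60,
  clean_recovery: 60,
  task_completed: 60,
};
const FALLBACK_WINDOW_DAYS = 30; // assumption for types not listed above
const DAY_MS = 24 * 60 * 60 * 1000;

// True when the entry's ts lies within its type's window, ending at nowMs.
function inWindow(entry, nowMs = Date.now()) {
  const days = SIGNAL_WINDOWS_DAYS[entry.type] ?? FALLBACK_WINDOW_DAYS;
  const ageMs = nowMs - Date.parse(entry.ts);
  return ageMs >= 0 && ageMs <= days * DAY_MS;
}

const now = Date.parse("2026-06-01T00:00:00Z");
const journal = [
  { ts: "2026-05-12T00:00:00Z", type: "silent_drift" },   // 20 days old, 14 d window
  { ts: "2026-05-12T00:00:00Z", type: "task_completed" }, // 20 days old, 60 d window
  { ts: "2026-05-25T00:00:00Z", type: "file_reread" },    // 7 days old, 14 d window
];
console.log(journal.filter((e) => inWindow(e, now)).map((e) => e.type));
// [ 'task_completed', 'file_reread' ]
```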
@@ -59,6 +127,9 @@ The hook emits these `type` values into the journal:
| `edit_churn` | same file edited 4× in window | file basename |
| `build_loop` | 2 build/test/compile commands fail in session | session |
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
| `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
| `file_reread` | same file Read ≥3× in the 10-tool window, ignoring offset/limit (escapes `retry_loop`'s argsHash dedup) | file basename |
| `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` | | `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) | | `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
| `task_completed` | UserPromptSubmit closes a run of ≥5 tool calls with ≥3 distinct tool kinds and 0 corrections | sorted `tool_kinds` tuple | | `task_completed` | UserPromptSubmit closes a run of ≥5 tool calls with ≥3 distinct tool kinds and 0 corrections | sorted `tool_kinds` tuple |
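Three of the triggers above are easy to misread (what resets `silent_drift`, why `file_reread` ignores offsets, what the `task_completed` key looks like), so here is a hedged sketch of each. The read-only tool set, the event shape (`kind`, `tool`, `input.file_path`), and firing `silent_drift` once per run are all assumptions, not the hook's actual implementation in `adam-observe.mjs`.

```javascript
// Illustrative detectors for three signals in the table above.
// READ_ONLY and the event shape are assumptions, not the hook's real config.
const READ_ONLY = new Set(["Read", "Grep", "Glob"]);

// silent_drift: 5 consecutive read-only PostToolUse events.
// An action tool or a UserPromptSubmit resets the run; fires once when the run hits the threshold.
function makeSilentDriftDetector(threshold = 5) {
  let run = 0;
  return (ev) => {
    if (ev.kind === "UserPromptSubmit") { run = 0; return false; }
    if (ev.kind !== "PostToolUse") return false;
    if (!READ_ONLY.has(ev.tool)) { run = 0; return false; }
    run += 1;
    return run === threshold;
  };
}

// file_reread: same file Read >=3 times in the last 10 tool calls.
// offset/limit are ignored on purpose, so paged reads of one file still count.
function fileRereadBasename(recentTools, threshold = 3) {
  const counts = new Map();
  for (const ev of recentTools.slice(-10)) {
    if (ev.tool !== "Read") continue;
    const n = (counts.get(ev.input.file_path) ?? 0) + 1;
    counts.set(ev.input.file_path, n);
    if (n >= threshold) return ev.input.file_path.split("/").pop(); // cluster key: basename
  }
  return null;
}

// task_completed: >=5 tool calls, >=3 distinct tool kinds, 0 corrections.
// Returns the sorted tool_kinds tuple used as the cluster key.
function taskCompletedKey(run) {
  if (run.corrections > 0 || run.tools.length < 5) return null;
  const kinds = [...new Set(run.tools)].sort();
  return kinds.length >= 3 ? kinds : null;
}

const drift = makeSilentDriftDetector();
const greps = Array.from({ length: 5 }, () => ({ kind: "PostToolUse", tool: "Grep" }));
console.log(greps.map(drift)[4]); // true

const recent = [
  { tool: "Read", input: { file_path: "/repo/src/a.ts", offset: 0 } },
  { tool: "Read", input: { file_path: "/repo/src/a.ts", offset: 200 } },
  { tool: "Bash", input: { command: "ls" } },
  { tool: "Read", input: { file_path: "/repo/src/a.ts", offset: 400 } },
];
console.log(fileRereadBasename(recent)); // a.ts

console.log(JSON.stringify(taskCompletedKey({ tools: ["Read", "Edit", "Bash", "Edit", "Read"], corrections: 0 })));
// ["Bash","Edit","Read"]
```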
@@ -84,10 +155,18 @@ The hook emits these `type` values into the journal:
- `edit_churn`: cluster by file basename pattern (e.g. `*.test.ts`).
- `build_loop`: cluster by `session`.
- `subagent_dispatch_pattern`: cluster by `subagent_type`.
- `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
- `file_reread`: cluster by file basename (same offset-agnostic same-file re-Read pattern).
- `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
- `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
- `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
- `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). A single entry qualifies for a `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with the same tuple; without it, the proposal queues and never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
- Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
- If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
- The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.); sub-clusters do NOT replace it, they supplement it.
6. For each cluster qualifying under the rubric (≥3 occurrences across ≥2 sessions, OR, for struggle types, ≥1 entry within a single session, OR, for `correction`, ≥3 occurrences across ≥2 cwds):
a. If the cluster topic matches a rejected idea via the rejected-ideas fuzzy set (≥2 token overlap with the rejection's `# Why`), skip with reason `"rejected-similar"`.
b. Pull ~20 messages of transcript context from `transcripts_root` to enrich. Never read full transcripts.
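Steps 5, 5b and 6a above can be sketched as plain set and map operations. This is an illustration under assumptions, not ADAM's clustering code: entries are objects with `session`, `type` and `active_skills`, `existingSkills` is a set of skill names found in `skills_root`, and the tokenizer used for the rejected-similar check (lowercase runs of three or more letters, digits or underscores) is invented for the example.

```javascript
// Illustrative only: field names follow the journal entries described above;
// the tokenizer in isRejectedSimilar is an assumption.
const STRUGGLE_TYPES = new Set([
  "tool_error_loop", "dead_end", "weak_agent", "retry_loop", "edit_churn",
  "build_loop", "silent_drift", "file_reread", "error_after_recovery",
]);

// Step 5: sessions that produced >=2 distinct struggle types get multi_axis: true.
function multiAxisSessions(entries) {
  const typesBySession = new Map();
  for (const e of entries) {
    if (!STRUGGLE_TYPES.has(e.type)) continue;
    if (!typesBySession.has(e.session)) typesBySession.set(e.session, new Set());
    typesBySession.get(e.session).add(e.type);
  }
  return new Set([...typesBySession].filter(([, types]) => types.size >= 2).map(([s]) => s));
}

// Step 5b: per-skill sub-clusters keyed on active_skills[0]. The umbrella cluster
// is left untouched; only sub-clusters with >=3 entries naming an existing skill are returned.
function skillSubClusters(cluster, existingSkills) {
  const bySkill = new Map();
  for (const e of cluster) {
    const skill = e.active_skills?.[0];
    if (!skill) continue; // empty active_skills: stays only in the parent cluster
    if (!bySkill.has(skill)) bySkill.set(skill, []);
    bySkill.get(skill).push(e);
  }
  return [...bySkill]
    .filter(([skill, es]) => es.length >= 3 && existingSkills.has(skill))
    .map(([skill, es]) => ({ skill, entries: es }));
}

// Step 6a: "rejected-similar" when >=2 tokens overlap a rejection's # Why text.
function tokens(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9_]{3,}/g) ?? []);
}
function isRejectedSimilar(topic, rejectedWhys) {
  const topicTokens = tokens(topic);
  return rejectedWhys.some((why) => [...tokens(why)].filter((w) => topicTokens.has(w)).length >= 2);
}

const journal = [
  { session: "s1", type: "dead_end", active_skills: ["go-testing"] },
  { session: "s1", type: "file_reread", active_skills: ["go-testing"] },
  { session: "s2", type: "dead_end", active_skills: ["go-testing"] },
  { session: "s2", type: "dead_end", active_skills: [] },
];
console.log(JSON.stringify([...multiAxisSessions(journal)])); // ["s1"]
console.log(skillSubClusters(journal, new Set(["go-testing"])).length); // 1
console.log(isRejectedSimilar("flaky go test cache", ["go test cache already handled upstream"])); // true
```

Returning sub-clusters separately, instead of rewriting the input cluster, mirrors the rule that sub-clusters supplement the umbrella cluster rather than replace it.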
@@ -171,10 +250,12 @@ Required structure:
```markdown
---
name: <slug — snake_case, MUST equal the target filename without `.md`, e.g. feedback_go_test_cache>
description: "<one-line used to decide future relevance — be specific, ≤200 chars>"
metadata:
  node_type: memory
  type: user | feedback | project | reference
  originSessionId: <session_id from journal entries that fed this cluster>
---
<Body content per type, see CLAUDE.md memory schema:
@@ -184,12 +265,17 @@ originSessionId: <session_id from journal entries that fed this cluster>
- reference: pointer to external system + what's there.>
```
The frontmatter MUST match the live auto-memory schema exactly: `name` is the
slug (NOT a prose title), and `node_type`, `type`, `originSessionId` live under
a `metadata:` block (verify against an existing file in the target memory dir
before drafting — match its shape).
Constraints:
- Top-level `name` + `description` and nested `metadata.node_type` (always `memory`) + `metadata.type` are **required**. Skill enforces this at apply time.
- `metadata.originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
- ≤50 LOC of body content. Surgical.
- `name`/slug (also the `target` path filename) must not collide with any existing memory file.
- For `type: feedback` and `type: project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
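The memory constraints above can be checked mechanically once the frontmatter is parsed. A minimal sketch, assuming YAML parsing has already happened elsewhere: `fm` is the parsed frontmatter object, `ctx` is a hypothetical bundle of target path, cluster sessions, existing slugs and body text, and counting body lines stands in for "LOC". It is not the skill's actual apply-time validator.

```javascript
// Checks the memory-proposal constraints above against an already-parsed frontmatter object.
// YAML parsing is out of scope; `fm`, `ctx`, and the error strings are hypothetical.
const MEMORY_TYPES = ["user", "feedback", "project", "reference"];

function memoryProposalErrors(fm, ctx) {
  const errors = [];
  const md = fm.metadata ?? {};
  if (!fm.name) errors.push("missing name");
  else if (!/^[a-z0-9_]+$/.test(fm.name)) errors.push("name must be snake_case");
  if (!fm.description) errors.push("missing description");
  else if (fm.description.length > 200) errors.push("description over 200 chars");
  if (md.node_type !== "memory") errors.push("metadata.node_type must be memory");
  if (!MEMORY_TYPES.includes(md.type)) errors.push("metadata.type invalid");
  if (!ctx.clusterSessions.includes(md.originSessionId)) errors.push("originSessionId not from cluster");
  // name is the slug and must equal the target filename without `.md`
  if (ctx.targetPath.split("/").pop() !== `${fm.name}.md`) errors.push("name must equal target filename");
  if (ctx.existingSlugs.has(fm.name)) errors.push("slug collides with existing memory");
  if ((md.type === "feedback" || md.type === "project") &&
      !(ctx.body.includes("**Why:**") && ctx.body.includes("**How to apply:**"))) {
    errors.push("feedback/project body needs Why and How to apply");
  }
  if (ctx.body.split("\n").length > 50) errors.push("body over 50 lines");
  return errors;
}

const fm = {
  name: "feedback_go_test_cache",
  description: "Go test results can be stale; rerun with -count=1 after fixture edits",
  metadata: { node_type: "memory", type: "feedback", originSessionId: "s1" },
};
const ctx = {
  targetPath: "/home/u/.claude/projects/example/memory/feedback_go_test_cache.md",
  clusterSessions: ["s1", "s2"],
  existingSlugs: new Set(["user_prefers_pnpm"]),
  body: "Rerun with -count=1.\n**Why:** a cached pass hid a failure.\n**How to apply:** after editing test fixtures.",
};
console.log(memoryProposalErrors(fm, ctx)); // []
```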
## Diagnosis drafting protocol (required for every proposal)
@@ -254,6 +340,21 @@ A `skill_edit` proposal sets `auto_apply_eligible: true` ONLY when ALL hold:
If any of (3) through (9) fails: still emit the proposal, but with `auto_apply_eligible: false`; the main thread queues it for review.
## Struggle-driven `skill_edit` eligibility
Skill-attribution sub-clustering (step 5b) produces struggle-driven `skill_edit` candidates: a sub-cluster of ≥3 struggle entries all naming the same `active_skills[0]` that exists in `skills_root`. These proposals are emitted but **ALWAYS queue** (`auto_apply_eligible: false`), regardless of confidence. Negative evidence on a skill is a weaker basis for self-modification than positive evidence (the skill may be active during friction caused by something else), so the human reviews every one.
A struggle-driven `skill_edit` proposal MUST:
1. Set `target` to the matched skill's `SKILL.md` path.
2. Cluster severity-sum ≥ 10 (same threshold as the +1 rubric bullet).
3. Sub-cluster names exactly one skill (no ambiguity across distinct `active_skills[0]` values).
4. `# Proposed change` is an append-only diff adding a `## When struggling` section (naive default body: a checkpoint-or-pause rule appropriate to the dominant signal — e.g. `dead_end` → "After 16 PostToolUse events without UserPromptSubmit, emit a one-line checkpoint summary before continuing.").
5. Frontmatter includes `struggle_evidence: "<ts of one source entry naming this skill>"` and `struggle_signals: [<list of signal types in the sub-cluster>]`. The win-driven `win_evidence` field is omitted.
6. Subject to the same Per-(skill, fingerprint) cooldown as win-driven `skill_edit`.
If gate (2) or (3) fails: skip the sub-cluster (the parent cluster still produces its umbrella proposal). The sub-cluster's `source_entries` overlap with the parent's — the apply pipeline handles dedup via the excluded-timestamps set.
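The struggle-driven gates above reduce to a small function. This sketch assumes the severity-sum has already been computed for the sub-cluster, that entries carry `ts`, `type` and `active_skills`, and that the skill's file lives at `<skills_root>/<skill>/SKILL.md`. The cooldown gate (6) and the diff itself are left out, and the function name is hypothetical.

```javascript
// Sketch of the struggle-driven skill_edit gates. Returns proposal fields or null (skip).
// Severity-sum is computed elsewhere; cooldown (gate 6) is checked separately.
function struggleSkillEdit(sub, { existingSkills, severitySum, skillsRoot }) {
  const skills = new Set(sub.map((e) => e.active_skills?.[0]).filter(Boolean));
  if (sub.length < 3 || skills.size !== 1) return null; // gate 3: exactly one named skill
  const [skill] = skills;
  if (!existingSkills.has(skill)) return null;          // skill must exist in skills_root
  if (severitySum < 10) return null;                    // gate 2
  return {
    type: "skill_edit",
    target: `${skillsRoot}/${skill}/SKILL.md`,          // gate 1
    auto_apply_eligible: false,                         // struggle-driven: always queues
    struggle_evidence: sub[0].ts,                       // one source entry naming this skill
    struggle_signals: [...new Set(sub.map((e) => e.type))].sort(),
  };
}

const sub = [
  { ts: "2026-05-20T10:00:00Z", type: "dead_end", active_skills: ["go-testing"] },
  { ts: "2026-05-21T09:00:00Z", type: "silent_drift", active_skills: ["go-testing"] },
  { ts: "2026-05-22T08:00:00Z", type: "dead_end", active_skills: ["go-testing"] },
];
const opts = { existingSkills: new Set(["go-testing"]), skillsRoot: "~/.claude/skills" };
console.log(JSON.stringify(struggleSkillEdit(sub, { ...opts, severitySum: 12 }).struggle_signals));
// ["dead_end","silent_drift"]
console.log(struggleSkillEdit(sub, { ...opts, severitySum: 9 })); // null
```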
## Per-(skill, fingerprint) cooldown
The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not on target_skill alone. A rejected/applied proposal for skill `X` with fingerprint `A` does NOT block future proposals for skill `X` with fingerprint `B`.
@@ -261,10 +362,18 @@ The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not o
`proposal_fingerprint` is computed deterministically as `djb2(skill_slug + "\n" + signal_cluster_id + "\n" + normalized_diff_body)` returned as base36, where:
- `skill_slug`: target skill basename (or proposed slug for `skill_new`)
- `signal_cluster_id`: a **stable** cluster id derived from signal type + key (e.g. `tool_error_loop-ECONNREFUSED:5432`), NOT the ephemeral per-run trace id (`c1`). Stability matters: the same logical proposal must hash identically across `/reflect` runs or the cooldown can never match a prior applied/rejected record.
- `normalized_diff_body`: the proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trimmed
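To show how the three inputs combine and why normalization makes the hash whitespace-insensitive, here is the classic djb2 (seed 5381, `h * 33 + c`, kept as unsigned 32-bit) rendered in base36. This is an illustration only: the canonical `computeProposalFingerprint()` may differ in seed, add-versus-xor, or truncation, so output from this sketch must never be written into frontmatter.

```javascript
// Classic djb2 over UTF-16 code units, kept as unsigned 32-bit, rendered base36.
// ILLUSTRATIVE ONLY: may not match computeProposalFingerprint() in adam-cooldown.mjs.
function djb2Base36(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) {
    h = ((h << 5) + h + str.charCodeAt(i)) >>> 0; // h * 33 + c, modulo 2^32
  }
  return h.toString(36);
}

// Collapse all whitespace runs to one space, then trim.
function normalizeDiffBody(body) {
  return body.replace(/\s+/g, " ").trim();
}

function illustrativeFingerprint(skillSlug, clusterId, diffBody) {
  return djb2Base36(`${skillSlug}\n${clusterId}\n${normalizeDiffBody(diffBody)}`);
}

const a = illustrativeFingerprint("go-testing", "dead_end-go-testing", "+ ## When struggling\n+ Emit a checkpoint.\n");
const b = illustrativeFingerprint("go-testing", "dead_end-go-testing", "+ ## When struggling   + Emit a checkpoint.");
console.log(a === b); // true: whitespace differences vanish after normalization
```

The same sketch also shows why the cluster id must be stable: swapping `dead_end-go-testing` for a per-run id like `c1` changes the hashed string, so a later run could not match the earlier record.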
Do NOT hand-compute the hash (an LLM cannot reproduce djb2 reliably). Run the canonical implementation (`computeProposalFingerprint()` in `adam-cooldown.mjs`) via Bash, then write the result into frontmatter:
```bash
node ~/.claude/adam/scripts/adam-cooldown.mjs --compute \
--skill <slug> --cluster <signal_cluster_id> --diff-file <file-with-Proposed-change-body>
# → {"fingerprint":"<djb2_base36>"} (diff body may also be piped on stdin)
```
Both apply-time and analyst-time *gate* checks then invoke `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`.
Backward compat: proposals from before this rubric version (no `proposal_fingerprint` field) are treated as `fingerprint = "legacy"`. The cooldown script matches legacy applied/rejected records against any query fingerprint for the same skill — i.e. coarse-grained gating until those records age out of their windows (7d / 30d).
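The lookup described in this section, including the legacy wildcard, can be sketched as a single loop. The record shape (`skill`, `fingerprint`, `outcome`, `at`, `proposal_id`) and the pairing of 7 d with applied and 30 d with rejected records are assumptions made for the example; the `blacklisted` status is omitted because its rules are not described here. The real gate is `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`.

```javascript
// Sketch of the (skill, fingerprint) cooldown lookup with legacy matching.
// Record shape and the applied=7d / rejected=30d pairing are assumptions; "blacklisted" is omitted.
function cooldownStatus(records, skill, fingerprint, now, windowsDays = { applied: 7, rejected: 30 }) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  for (const r of records) {
    if (r.skill !== skill) continue;
    const fp = r.fingerprint ?? "legacy"; // pre-rubric records carry no fingerprint
    const days = windowsDays[r.outcome];
    const active = days !== undefined && now - r.at < days * DAY_MS;
    // Legacy records match ANY query fingerprint for the same skill.
    if (active && (fp === "legacy" || fp === fingerprint)) {
      return { status: "cooldown", proposal_id: r.proposal_id };
    }
  }
  return { status: "cool" };
}

const now = Date.parse("2026-06-01T00:00:00Z");
const records = [
  { skill: "go-testing", fingerprint: "k3x9", outcome: "rejected", at: now - 3 * 86_400_000, proposal_id: "2026-05-29-001" },
  { skill: "pr-review", outcome: "applied", at: now - 2 * 86_400_000, proposal_id: "2026-05-30-002" },
];
console.log(cooldownStatus(records, "go-testing", "k3x9", now).status);    // cooldown
console.log(cooldownStatus(records, "go-testing", "zz01", now).status);    // cool
console.log(cooldownStatus(records, "pr-review", "anything", now).status); // cooldown (legacy record)
```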
@@ -282,7 +391,7 @@ The skill (`adam-self-improvement/SKILL.md` §1) runs `adam-score.mjs` immediate
## A/B effectiveness
Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. **`reinforcement` is the one exception — it is a positive-only ledger and is intentionally NOT A/B-tracked (see §"`reinforcement` proposals"), to avoid skewing regression detection.** Schema:
```json
{"applied_at":<ms>,"proposal_id":"<id>","proposal_type":"...","target_skill":"<slug>","proposal_fingerprint":"<hash>","originating_signals":[{"type":"<signal>","count":<N>,"session_ids":[...]}],"pre_window_days":7}
@@ -299,17 +408,39 @@ After ≥7 days, `~/.claude/adam/scripts/adam-ab-measure.mjs` reads each entry a
The `/reflect` skill runs `adam-ab-measure.mjs --format json` before dispatching this agent, filters to `status == "regressed"`, and passes the list as `ab_regressions` (each object has `proposal_id`, `target_skill`, `proposal_type`, `delta_pct`, `pre_count`, `post_count`).
**When `ab_regressions` is non-empty, you MUST emit a `## Regressions` section at the TOP of your output (above the proposals listing).** One bullet per regressed proposal listing `proposal_id`, `target_skill`, `delta_pct`. The skill auto-rolls back regressed proposals via `adam-rollback.mjs` before dispatching you — this section is your record of what was rolled back and why.
The clustering trace summary (see §"Clustering trace") adds an extra `regressions=<N>` key alongside `considered/emitted/skipped`. When no `ab_regressions` arrive (or list is empty), emit `regressions=0`.
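The tracking and reporting rules above can be sketched as three small helpers. The proposal object shape mirrors the frontmatter fields shown elsewhere in this document; the exact bullet wording in the `## Regressions` section is an assumption, since only the required fields are specified.

```javascript
// A/B-tracked types; reinforcement is deliberately excluded.
const AB_TRACKED = new Set(["skill_edit", "skill_new", "memory", "nudge"]);

// Build one ab-tracking.jsonl line, or null when the type is not A/B-tracked.
function abTrackingLine(proposal, appliedAtMs) {
  if (!AB_TRACKED.has(proposal.type)) return null; // reinforcement is never A/B-tracked
  return JSON.stringify({
    applied_at: appliedAtMs,
    proposal_id: proposal.id,
    proposal_type: proposal.type,
    target_skill: proposal.target_skill,
    proposal_fingerprint: proposal.proposal_fingerprint,
    originating_signals: proposal.originating_signals,
    pre_window_days: 7,
  });
}

// Render the "## Regressions" section; bullet wording is a hypothetical format.
function regressionsSection(abRegressions = []) {
  if (abRegressions.length === 0) return "";
  const bullets = abRegressions.map(
    (r) => `- ${r.proposal_id}: target_skill=${r.target_skill}, delta_pct=${r.delta_pct}`,
  );
  return ["## Regressions", ...bullets].join("\n");
}

// Trace-summary key; a missing list counts as zero.
const traceRegressionsKey = (list) => `regressions=${(list ?? []).length}`;

const regressed = [
  { proposal_id: "2026-06-01-003", target_skill: "go-testing", proposal_type: "skill_edit", delta_pct: 50, pre_count: 4, post_count: 6 },
];
console.log(regressionsSection(regressed));
console.log(traceRegressionsKey(undefined)); // regressions=0
```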
## Keypoint matrix (MOSS §3.3/§4.2)
When running in `stage=diagnose`, you MUST produce a **keypoint matrix** alongside each batch diagnosis. This structured evaluation replaces ad-hoc confidence with per-capability scoring.
Capability dimensions (score each 0-2 per batch: 0=no signal, 1=partial, 2=strong evidence):
| dimension | description | positive signals | negative signals |
|---|---|---|---|
| `tool_selection` | correct tool chosen first try | low `retry_loop` | high `retry_loop`, `weak_agent` |
| `scope_discipline` | stays within requested scope | low `edit_churn`, low `dead_end` | high `edit_churn`, `dead_end`, `silent_drift` |
| `error_recovery` | recovers from errors without user help | `clean_recovery` | `error_after_recovery`, `tool_error_loop` |
| `first_attempt` | succeeds without corrections | `correction_free_streak` | `correction` |
| `build_reliability` | builds/tests pass on first try | `task_completed` with build tools | `build_loop` |
The matrix goes into the diagnosis output as `keypoints: {tool_selection: N, scope_discipline: N, ...}`. The implement stage uses it to:
1. Prioritize proposals targeting the weakest dimensions.
2. Include `keypoint_target: "<dimension>"` in proposal frontmatter.
3. Track dimension trends across `/reflect` runs (persisted in `~/.claude/adam/keypoint-history.jsonl`).
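The table above lists which signals feed each dimension but does not fix the exact 0-2 mapping, so the sketch below uses one possible heuristic, labeled as such: 0 when no related signal fired, 1 ("partial") when any negative signal fired, 2 when only positive evidence exists. "Low X" entries in the table are not signal types and are left out, and `weakestDimensions` is a hypothetical helper for step 1.

```javascript
// Signal types per dimension, taken from the table above ("low X" entries omitted).
const DIMENSIONS = {
  tool_selection:    { pos: [],                         neg: ["retry_loop", "weak_agent"] },
  scope_discipline:  { pos: [],                         neg: ["edit_churn", "dead_end", "silent_drift"] },
  error_recovery:    { pos: ["clean_recovery"],         neg: ["error_after_recovery", "tool_error_loop"] },
  first_attempt:     { pos: ["correction_free_streak"], neg: ["correction"] },
  build_reliability: { pos: ["task_completed"],         neg: ["build_loop"] },
};

const sumOf = (types, counts) => types.reduce((s, t) => s + (counts[t] ?? 0), 0);

// HYPOTHETICAL mapping: 0 = no related signal, 1 = any negative (partial), 2 = positive-only.
function keypointMatrix(counts) {
  const out = {};
  for (const [dim, { pos, neg }] of Object.entries(DIMENSIONS)) {
    const n = sumOf(neg, counts);
    const p = sumOf(pos, counts);
    out[dim] = n > 0 ? 1 : p > 0 ? 2 : 0;
  }
  return out;
}

// Dimensions with negative evidence, most negative evidence first.
function weakestDimensions(counts) {
  const negCount = (dim) => sumOf(DIMENSIONS[dim].neg, counts);
  return Object.keys(DIMENSIONS).filter((d) => negCount(d) > 0).sort((a, b) => negCount(b) - negCount(a));
}

const batchCounts = { dead_end: 2, silent_drift: 1, clean_recovery: 3, build_loop: 1, task_completed: 2 };
console.log(JSON.stringify(keypointMatrix(batchCounts)));
// {"tool_selection":0,"scope_discipline":1,"error_recovery":2,"first_attempt":0,"build_reliability":1}
console.log(JSON.stringify(weakestDimensions(batchCounts))); // ["scope_discipline","build_reliability"]
```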
## Confidence rubric (deterministic — do NOT vibe)
Sum:
- Signal repeated ≥3× across ≥2 sessions: **+2**
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs`: `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, file_reread:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
- Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
- Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
- Diagnosis flags `Mismatch: unclear` (causation could not be reconstructed from transcript context): **-1**
- Blast radius: low **+1**, medium **0**, high **-1** (default per type — see Proposal types table)
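The two severity bullets are the only arithmetic in the rubric, so a worked sketch helps. The divisors are copied from the bullet above. Treating types without a listed divisor (e.g. `silent_drift`) as severity 1 is an assumption, and `severityOf` is an illustrative name, not the `adam-score.mjs` export.

```javascript
// Divisors as listed in the rubric; SEVERITY_DIVISORS in adam-score.mjs is canonical.
const SEVERITY_DIVISORS = {
  dead_end: 8, edit_churn: 4, tool_error_loop: 3, retry_loop: 3,
  file_reread: 3, weak_agent: 2, build_loop: 1,
};

// Per-entry severity: max(1, floor(count / divisor)); missing count -> 1.
// Types without a divisor are assumed to count as 1.
function severityOf(entry) {
  const divisor = SEVERITY_DIVISORS[entry.type];
  if (entry.count === undefined || divisor === undefined) return 1;
  return Math.max(1, Math.floor(entry.count / divisor));
}

// Rubric bonus: +1 at severity-sum >= 10, and another +1 at >= 32 (additive).
function severityBonus(cluster) {
  const sum = cluster.reduce((s, e) => s + severityOf(e), 0);
  return (sum >= 10 ? 1 : 0) + (sum >= 32 ? 1 : 0);
}

const cluster = [
  { type: "dead_end", count: 40 },  // floor(40 / 8) = 5
  { type: "retry_loop", count: 9 }, // floor(9 / 3) = 3
  { type: "silent_drift" },         // no count: 1
];
console.log(severityBonus(cluster));                                         // 0 (sum 9)
console.log(severityBonus([...cluster, { type: "edit_churn", count: 4 }])); // 1 (sum 10)
console.log(severityBonus([{ type: "dead_end", count: 256 }]));             // 2 (sum 32)
```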
@@ -328,13 +459,14 @@ Sum:
|---|---|---|---|
| `memory` | `~/.claude/projects/-Users-nvm/memory/*.md` | low | yes if conf≥4 AND cross_session |
| `skill_new` | new dir under `~/.claude/skills/` | low | yes if conf≥4 AND cross_session |
| `skill_edit` | existing skill file | medium | yes (win-driven only) if win-evidence + LOC + cooldown gates all pass (see "Win-driven skill_edit eligibility"); struggle-driven variant ALWAYS queues (see "Struggle-driven skill_edit eligibility") |
| `nudge` | append to `~/.claude/adam/active-nudges.json` | low | yes when `dead_end_count ≥ 3` in a single session (single-session evidence sufficient; skips cross-session gate). Does NOT modify skills/memories/CLAUDE.md — only seeds a SessionStart reminder for a future session. |
| `reinforcement` | append entry to `~/.claude/adam/reinforcements.jsonl` | low | yes if conf≥4 AND blast_radius=low (same gate as memory). Applies via `adam-apply-reinforcement.mjs`; appends one JSONL entry, no code/memory/skill changes. |
| `agent_new` | new file under `~/.claude/agents/` | medium | no |
| `agent_edit` | existing agent file | medium | no |
| `claude_md_edit` | `~/.claude/CLAUDE.md` | high | no |
| `hook_new` / `hook_edit` | `settings.json` hooks | high | no |
| `harness_edit` | adam's own scripts/agent/hooks (see "Harness self-modification") | high | **never** |
| `deletion` | any skill/agent (soft delete) | high | no |
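The auto-apply column of the table above can be read as one switch over `type`. A hedged sketch: the field names `cross_session`, `driver` and `win_gates_pass` are hypothetical summaries of gates described elsewhere (the win-driven checks are collapsed into one boolean), and `dead_end_count` is taken from the `nudge` row.

```javascript
// Auto-apply eligibility per proposal type, following the table above.
// `cross_session`, `driver`, and `win_gates_pass` are hypothetical field names.
function autoApplyEligible(p) {
  switch (p.type) {
    case "memory":
    case "skill_new":
      return p.confidence >= 4 && p.cross_session === true;
    case "skill_edit":
      // Win-driven only; the struggle-driven variant always queues.
      return p.driver === "win" && p.win_gates_pass === true;
    case "nudge":
      return (p.dead_end_count ?? 0) >= 3; // single-session evidence is enough
    case "reinforcement":
      return p.confidence >= 4 && p.blast_radius === "low";
    default:
      // agent_new, agent_edit, claude_md_edit, hook_new, hook_edit, harness_edit, deletion
      return false;
  }
}

console.log(autoApplyEligible({ type: "memory", confidence: 4, cross_session: true }));          // true
console.log(autoApplyEligible({ type: "skill_edit", driver: "struggle", win_gates_pass: true })); // false
console.log(autoApplyEligible({ type: "harness_edit", confidence: 9 }));                          // false
```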
### `nudge` proposals
@@ -363,6 +495,42 @@ A `reinforcement` proposal is logged when `adam-score.mjs` reports `count >= 3`
Note that `task_completed` alone — without an adjacent negative signal cluster — is NOT a proposal source. It is an urgency *modifier* (see "Scoring: task_completed dampener") and a reinforcement input only.
### `harness_edit` proposals (MOSS §1 Table 1)
MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live in code rather than in any text artifact, an entire class of structural failure is physically unreachable from the text layer." This proposal type extends ADAM's evolution scope to its own harness.
**Allowed targets** (harness files that ADAM may propose edits to):
| target | what it controls |
|---|---|
| `~/.claude/adam/scripts/adam-observe.mjs` | signal detection regexes, thresholds, counters |
| `~/.claude/adam/scripts/adam-score.mjs` | severity divisors, dampener thresholds |
| `~/.claude/adam/scripts/adam-window.mjs` | per-signal sliding window durations |
| `~/.claude/adam/scripts/adam-batch.mjs` | evidence batching logic |
| `~/.claude/agents/adam.md` | this agent's own rubric, clustering, proposal rules |
| `~/.claude/hooks/adam-observe.mjs` | hook integration, event routing |
**Gates (all must hold — stricter than any other type):**
1. `confidence ≥ 5`
2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
3. `auto_apply_eligible: false`, **always**. Harness edits are never auto-applied.
4. `blast_radius: high`
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
6. Change is surgical: ≤30 LOC diff, single file.
7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.
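Gates 1 to 6 above are mechanical and can be pre-checked before review. This sketch assumes the proposal has been summarized into an object whose `occurrences`, `sessions`, `diff_loc` and `files_touched` fields are hypothetical; `test_command` matches the frontmatter field in the proposal template. Gate 7 (a diagnosis naming a specific code path) needs human judgment and is not checked.

```javascript
// Allowed harness targets, copied from the table above.
const HARNESS_TARGETS = new Set([
  "~/.claude/adam/scripts/adam-observe.mjs",
  "~/.claude/adam/scripts/adam-score.mjs",
  "~/.claude/adam/scripts/adam-window.mjs",
  "~/.claude/adam/scripts/adam-batch.mjs",
  "~/.claude/agents/adam.md",
  "~/.claude/hooks/adam-observe.mjs",
]);

// Returns a list of failed gates (empty = gates 1-6 pass). Gate 7 needs human review.
// `occurrences`, `sessions`, `diff_loc`, `files_touched` are hypothetical summary fields.
function harnessEditProblems(p) {
  const problems = [];
  if (!HARNESS_TARGETS.has(p.target)) problems.push("target not an allowed harness file");
  if (!(p.confidence >= 5)) problems.push("confidence < 5");
  if (!(p.occurrences >= 5 && p.sessions >= 3)) problems.push("needs >=5 occurrences across >=3 sessions");
  if (p.auto_apply_eligible !== false) problems.push("auto_apply_eligible must be false");
  if (p.blast_radius !== "high") problems.push("blast_radius must be high");
  if (p.test_command !== "bash ~/.claude/adam/tests/run-tests.sh") problems.push("missing test command");
  if (!(p.diff_loc <= 30 && p.files_touched === 1)) problems.push("diff must be <=30 LOC in one file");
  return problems;
}

const draft = {
  target: "~/.claude/adam/scripts/adam-window.mjs",
  confidence: 5, occurrences: 6, sessions: 3,
  auto_apply_eligible: false, blast_radius: "high",
  test_command: "bash ~/.claude/adam/tests/run-tests.sh",
  diff_loc: 4, files_touched: 1,
};
console.log(harnessEditProblems(draft)); // []
console.log(harnessEditProblems({ ...draft, confidence: 4 })); // [ 'confidence < 5' ]
```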
**When to propose `harness_edit`:**
- Signal detection misses a recurring friction pattern (false negative in adam-observe.mjs)
- A/B measurement shows systematic bias (e.g., windows too short/long in adam-window.mjs)
- Scoring thresholds produce consistently over/under-weighted proposals (adam-score.mjs)
- Batch clustering produces too-coarse or too-fine groupings (adam-batch.mjs)
**When NOT to propose `harness_edit`:**
- The fix is achievable via a text-mutable type (skill, memory, nudge)
- Evidence is from a single session only
- The change would affect test outcomes without clear improvement evidence
## Special handling
### CLAUDE.md edits
@@ -389,7 +557,7 @@ Filename: `proposals_dir/YYYY-MM-DD-NNN-<type>-<slug>.md` (NNN is daily counter
```markdown
---
id: YYYY-MM-DD-NNN
type: skill_new | memory | skill_edit | nudge | reinforcement | agent_new | agent_edit | claude_md_edit | hook_new | hook_edit | harness_edit | deletion
target: <absolute path — for skill_new, the will-be path: ~/.claude/skills/<slug>/SKILL.md>
confidence: <int>
blast_radius: low | medium | high
@@ -402,7 +570,7 @@ source_entries:
- "<another ts>"
- "..."
# skill_edit / skill_new — required for cooldown gate (see "Per-(skill, fingerprint) cooldown")
proposal_fingerprint: "<djb2_base36 hash — compute via `adam-cooldown.mjs --compute`; see §Per-(skill, fingerprint) cooldown>"
target_skill: "<slug — populated for skill_edit (basename of target dir) and skill_new (proposed slug)>"
# A/B effectiveness — required on every proposal; consumed at apply time to seed ab-tracking.jsonl
originating_signals:
@@ -415,6 +583,11 @@ bytes_after: <int>
contradiction_flag: "<one-line summary or null>"
# optional — auto-populated from Diagnosis Mismatch line
diagnosis_summary: "<≤120 chars, single sentence>"
# keypoint matrix — which capability dimension this proposal targets (MOSS §4.2)
keypoint_target: "<tool_selection | scope_discipline | error_recovery | first_attempt | build_reliability>"
# harness_edit only — test command and expected output
test_command: "bash ~/.claude/adam/tests/run-tests.sh"
test_expected: "<N> passed, 0 failed"
---
# Why
@@ -453,7 +626,7 @@ Print a single JSON line to stdout:
## What you must NOT do
- Do not call other agents.
- Do not write to `~/.claude/skills/`, `~/.claude/agents/`, `settings.json`, `CLAUDE.md`, adam scripts, or any existing skill/agent/harness file directly. All changes go through proposal files for main-thread review and apply. This includes `harness_edit` proposals — you draft the diff, the skill applies it after test verification.
- Do not delete files. Deletion proposals describe a soft-move; the main thread executes it.
- Do not write outside `proposals_dir/` and `state_path`.
- Do not invent trigger phrases for `skill_new` — every trigger must come from observed user input.
@@ -0,0 +1,13 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Dark-background variant.</desc>
<g stroke="#f0f6fc">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="#f0f6fc">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

@@ -0,0 +1,13 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Light-background variant.</desc>
<g stroke="#24292f">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="#24292f">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

@@ -0,0 +1,19 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
+  <title>claude-adam</title>
+  <desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Adapts to light/dark via embedded media query + currentColor fallback.</desc>
+  <style>
+    svg { color: #24292f; }
+    @media (prefers-color-scheme: dark) {
+      svg { color: #f0f6fc; }
+    }
+  </style>
+  <g stroke="currentColor">
+    <path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
+    <path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
+    <path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
+  </g>
+  <g fill="currentColor">
+    <circle cx="72" cy="64" r="3.2"/>
+    <circle cx="88" cy="64" r="3.2"/>
+  </g>
+</svg>

+89 -4
@@ -1,9 +1,17 @@
 #!/usr/bin/env node
-// adam-nudge.mjs — SessionStart hook. Prints two kinds of reminders:
+// adam-nudge.mjs — SessionStart hook. Prints reminders:
 // 1. Pending proposals (≥3 queued in adam/proposals/).
 // 2. Cross-session nudges (entries in adam/active-nudges.json whose
 // source_session differs from the current session and that haven't
 // expired or exhausted their max_displays).
+// 3. Pending local-edit upgrades (`.adam-new` sidecars).
+// 4. New-release notice: if a newer GitHub release exists than the installed
+// `.version`, print a notify-only one-line update prompt. Cached + checked
+// at most once/day, network call hard-capped at 1.5s, fully best-effort —
+// never blocks SessionStart. Opt out with ADAM_NO_UPDATE_CHECK=1.
+// NOTE: notify-only by design — applying an update re-runs install.sh,
+// which resets ADAM's own /reflect-applied skill edits. The user chooses
+// when to accept that, so we never auto-install.
 import { readdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
 import { join } from "node:path";
 import { homedir } from "node:os";
@@ -14,7 +22,13 @@ const ADAM_ROOT = join(CLAUDE_ROOT, "adam");
 const PROPOSALS = join(ADAM_ROOT, "proposals");
 const NUDGES_FILE = join(ADAM_ROOT, "active-nudges.json");
 const STATE_FILE = join(ADAM_ROOT, "state.json");
+const VERSION_FILE = join(ADAM_ROOT, ".version");
+const UPDATE_CHECK_FILE = join(ADAM_ROOT, ".update-check.json");
 const THRESHOLD = 3;
+const UPDATE_CHECK_INTERVAL_MS = 24 * 60 * 60 * 1000;
+const UPDATE_FETCH_TIMEOUT_MS = 1500;
+const RELEASES_API = "https://api.github.com/repos/lukaszraczylo/claude-adam/releases/latest";
+const INSTALL_ONELINER = "curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash";
 // Known installable paths (mirrors install.sh copy_file list). Checking a
 // fixed shortlist keeps SessionStart latency under control vs full FS walk.
@@ -33,6 +47,9 @@ const PENDING_CHECK_PATHS = [
   "adam/scripts/adam-score.mjs",
   "adam/scripts/adam-ab-measure.mjs",
   "adam/scripts/adam-apply-reinforcement.mjs",
+  "adam/scripts/adam-utils.mjs",
+  "adam/scripts/adam-batch.mjs",
+  "adam/scripts/adam-rollback.mjs",
   "adam/tests/run-tests.sh",
 ];
@@ -115,7 +132,75 @@ function emitPendingUpgrades() {
   } catch { /* never break SessionStart */ }
 }
-function main() {
+// --- update notifier (notify-only; see header note) ---
+function readVersion() {
+  try { return readFileSync(VERSION_FILE, "utf8").trim() || null; } catch { return null; }
+}
+// Parse "vX.Y.Z" (leading v optional; pre-release/build suffix ignored).
+function parseSemver(s) {
+  if (typeof s !== "string") return null;
+  const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
+  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
+}
+// isNewer(a, b): true iff version a is strictly newer than b. Unparseable → false.
+function isNewer(a, b) {
+  const pa = parseSemver(a), pb = parseSemver(b);
+  if (!pa || !pb) return false;
+  for (let i = 0; i < 3; i++) { if (pa[i] !== pb[i]) return pa[i] > pb[i]; }
+  return false;
+}
+async function fetchLatestTag() {
+  // Best-effort, hard-capped. Any failure (offline / timeout / rate-limit /
+  // parse / fetch-unavailable) returns null and the caller silently skips.
+  try {
+    if (typeof fetch !== "function") return null;
+    const ctrl = new AbortController();
+    const timer = setTimeout(() => ctrl.abort(), UPDATE_FETCH_TIMEOUT_MS);
+    let tag = null;
+    try {
+      const res = await fetch(RELEASES_API, {
+        signal: ctrl.signal,
+        headers: { "User-Agent": "claude-adam-nudge", "Accept": "application/vnd.github+json" },
+      });
+      if (res && res.ok) {
+        const j = await res.json();
+        if (j && typeof j.tag_name === "string") tag = j.tag_name;
+      }
+    } finally { clearTimeout(timer); }
+    return tag;
+  } catch { return null; }
+}
+function printUpdateNudge(latest, installed) {
+  process.stdout.write(
+    `[adam] update available: ${installed} → ${latest}. Apply: ${INSTALL_ONELINER}\n` +
+    ` (re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)\n`
+  );
+}
+async function emitUpdateCheck() {
+  if (process.env.ADAM_NO_UPDATE_CHECK) return; // explicit opt-out
+  const installed = readVersion();
+  if (!installed) return; // no marker → nothing to compare
+  const cache = readJson(UPDATE_CHECK_FILE, {}) || {};
+  const now = Date.now();
+  let nudged = false;
+  // Instant nudge from cache (no network).
+  if (cache.latest && isNewer(cache.latest, installed)) { printUpdateNudge(cache.latest, installed); nudged = true; }
+  // Refresh cache at most once/day, best-effort — drives the nudge on the NEXT run.
+  if (!cache.last_check || (now - Number(cache.last_check)) > UPDATE_CHECK_INTERVAL_MS) {
+    const latest = await fetchLatestTag();
+    const next = { last_check: now, latest: latest || cache.latest || null };
+    try { writeFileSync(UPDATE_CHECK_FILE, JSON.stringify(next)); } catch { /* swallow */ }
+    if (latest && !nudged && isNewer(latest, installed)) printUpdateNudge(latest, installed);
+  }
+}
+async function main() {
   const stdinSession = readSessionInput();
   const stateSession = (() => {
     const st = readJson(STATE_FILE, null);
@@ -125,7 +210,7 @@ function main() {
   emitProposalReminder();
   emitActiveNudges(currentSession);
   emitPendingUpgrades();
+  await emitUpdateCheck();
 }
-try { main(); } catch { /* never block SessionStart */ }
-process.exit(0);
+main().catch(() => { /* never block SessionStart */ }).finally(() => process.exit(0));
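`emitUpdateCheck` makes two separate decisions on each SessionStart: whether the cached tag already justifies a notice, and whether the cache is old enough to spend one capped network call. A minimal sketch of those decisions as a pure function, assuming the same cache shape (`last_check`, `latest`) as `.update-check.json`; `planUpdateCheck` is a hypothetical name, not part of adam-nudge.mjs, and `parseSemver`/`isNewer` are copied from the diff:

```javascript
// Hypothetical pure restatement of emitUpdateCheck's two decisions.
// parseSemver/isNewer are copied from the adam-nudge.mjs diff.
function parseSemver(s) {
  if (typeof s !== "string") return null;
  const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function isNewer(a, b) {
  const pa = parseSemver(a), pb = parseSemver(b);
  if (!pa || !pb) return false; // unparseable on either side: never nudge
  for (let i = 0; i < 3; i++) { if (pa[i] !== pb[i]) return pa[i] > pb[i]; }
  return false;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Given the cached state, the installed tag and "now", decide whether to
// print from cache (no network) and whether the once-a-day refresh is due.
function planUpdateCheck(cache, installed, now) {
  const nudgeFromCache = Boolean(cache.latest && isNewer(cache.latest, installed));
  const shouldFetch = !cache.last_check || now - Number(cache.last_check) > DAY_MS;
  return { nudgeFromCache, shouldFetch };
}

// Cache already knows about v0.6.5 and was refreshed an hour ago.
const now = Date.UTC(2026, 5, 2);
console.log(planUpdateCheck({ latest: "v0.6.5", last_check: now - 3_600_000 }, "v0.6.4", now));
// → { nudgeFromCache: true, shouldFetch: false }
```

Splitting the decision out this way makes the 24-hour gate testable without a network or a real cache file. In the diff, a fetched tag only prints when the cached tag had not already produced a notice, so a user sees at most one line per session.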
+117 -3
@@ -87,6 +87,12 @@ function normalizeErrorText(text) {
 const ERROR_RE = /\b(error|failed|exception|traceback|denied|cannot|unable to|not found|undefined|nullpointer|typeerror|syntaxerror|panic|fatal|enoent|econnrefused|etimedout|eaccess|segfault|crashed|uncaught)\b/i;
 const BUILD_RE = /\b(build|compile|make|gradle|cargo|tsc|webpack|vite|rollup|pytest|jest|mocha|vitest|go\s+test|npm\s+test|yarn\s+test|npm\s+run\s+build|yarn\s+build|ctest|ninja|bazel)\b/i;
 const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit", "NotebookEdit"]);
+const READ_ONLY_TOOLS = new Set([
+  "Read", "Grep", "Glob", "ToolSearch", "WebFetch", "WebSearch",
+  "mcp__filepuff__file_read", "mcp__filepuff__file_search",
+  "mcp__filepuff__find_definition", "mcp__filepuff__find_references",
+  "mcp__filepuff__ast_query", "mcp__filepuff__symbol_at", "mcp__filepuff__ping",
+]);
 const WINDOW_SIZE = 10;
 const RETRY_THRESHOLD = 3;
 const AGENT_RESPAWN_THRESHOLD = 2;
@@ -98,10 +104,20 @@ const BUILD_LOOP_THRESHOLD = 2;
 const SUBAGENT_DISPATCH_THRESHOLD = 3;
 const CORRECTION_FREE_THRESHOLD = 5;
 const CLEAN_RECOVERY_WINDOW = 3;
-const STRUGGLE_TYPES = new Set(["tool_error_loop", "dead_end", "retry_loop"]);
+const SILENT_DRIFT_THRESHOLD = 5;
+const FILE_REREAD_THRESHOLD = 3;
+const ERROR_AFTER_RECOVERY_WINDOW = 5;
+const RECENT_RECOVERIES_MAX = 3;
+const STRUGGLE_TYPES = new Set([
+  "tool_error_loop", "dead_end", "retry_loop", "weak_agent",
+  "edit_churn", "build_loop", "silent_drift", "error_after_recovery",
+  "file_reread",
+]);
 const ACTIVE_SKILLS_LOOKBACK = 10;
 const TASK_TOOL_MIN = 5;
 const TASK_DIVERSITY_MIN = 3;
+const CONTEXT_RING_SIZE = 8;
+const CONTEXT_EXCERPT_LEN = 200;
 const STATE_MAX_BYTES = 1_000_000;
 function safeRead(path, fallback) {
@@ -217,6 +233,20 @@ function activeNames(state, kind) {
   return [...seen];
 }
+function excerpt(text, len) {
+  if (!text || typeof text !== "string") return null;
+  return text.length > len ? text.slice(0, len) + "…" : text;
+}
+function pushContext(state, entry) {
+  state.context_ring.push(entry);
+  if (state.context_ring.length > CONTEXT_RING_SIZE) state.context_ring.shift();
+}
+function snapshotContext(state) {
+  return state.context_ring.length ? state.context_ring.slice() : undefined;
+}
 function errorFingerprint(toolResponse) {
   if (!toolResponse) return null;
   let text = "";
@@ -268,6 +298,8 @@ function resetFrictionCounters(state) {
   state.edit_churn_emitted = {};
   state.build_failure_count = 0;
   state.build_loop_emitted = false;
+  state.silentDriftCounter = 0;
+  state.silentDriftEmitted = false;
 }
 function resetSessionLocal(state) {
@@ -276,7 +308,10 @@ function resetSessionLocal(state) {
   state.subagent_dispatch_emitted = {};
   state.correctionFreeCounter = 0;
   state.recoveryWatch = null;
+  state.recentRecoveries = [];
+  state.session_post_count = 0;
   state.tool_window = [];
+  state.context_ring = [];
   state.task_tool_kinds = {};
   state.task_tool_count = 0;
   state.task_corrections = 0;
@@ -299,6 +334,11 @@ function ensureStateDefaults(state) {
   if (!state.task_tool_kinds || typeof state.task_tool_kinds !== "object") state.task_tool_kinds = {};
   if (typeof state.task_tool_count !== "number") state.task_tool_count = 0;
   if (typeof state.task_corrections !== "number") state.task_corrections = 0;
+  if (typeof state.silentDriftCounter !== "number") state.silentDriftCounter = 0;
+  if (typeof state.silentDriftEmitted !== "boolean") state.silentDriftEmitted = false;
+  if (!Array.isArray(state.recentRecoveries)) state.recentRecoveries = [];
+  if (typeof state.session_post_count !== "number") state.session_post_count = 0;
+  if (!Array.isArray(state.context_ring)) state.context_ring = [];
 }
 function main() {
@@ -323,6 +363,7 @@ function main() {
   if (event === "UserPromptSubmit") {
     const prompt = (input.prompt || "").slice(0, 200);
+    pushContext(state, { event: "user", prompt: excerpt(prompt, CONTEXT_EXCERPT_LEN), ts });
     if (isCorrection(prompt)) {
       const last = state.tool_window[state.tool_window.length - 1] || {};
       appendJournal({
@@ -389,9 +430,31 @@ function main() {
     const argsHash = djb2(JSON.stringify(input.tool_input || {}));
     const file = (input.tool_input && (input.tool_input.file_path || input.tool_input.path)) || null;
+    const toolResponse = input.tool_response;
+    const respExcerpt = (() => {
+      if (!toolResponse) return null;
+      const text = typeof toolResponse === "string" ? toolResponse
+        : typeof toolResponse.content === "string" ? toolResponse.content
+        : null;
+      return excerpt(text, CONTEXT_EXCERPT_LEN);
+    })();
+    pushContext(state, {
+      event: "tool", tool, ts,
+      input_excerpt: excerpt(JSON.stringify(input.tool_input || {}), CONTEXT_EXCERPT_LEN),
+      response_excerpt: respExcerpt,
+      is_error: !!(toolResponse && toolResponse.is_error),
+    });
     let struggleEmittedThisTurn = null;
     const emit = (entry) => {
-      if (STRUGGLE_TYPES.has(entry.type)) struggleEmittedThisTurn = entry.type;
+      if (STRUGGLE_TYPES.has(entry.type)) {
+        entry.context_window = snapshotContext(state);
+        // Struggle signals carry the active skill set so the analyst can run
+        // skill-attribution sub-clustering (agents/adam.md §5b) and so silent_drift
+        // — whose primary cluster key IS active_skills[0] — clusters correctly.
+        if (entry.active_skills === undefined) entry.active_skills = activeNames(state, "skill");
+        struggleEmittedThisTurn = entry.type;
+      }
       appendJournal(entry);
     };
@@ -402,12 +465,34 @@ function main() {
     }
     state.tool_window.push(windowEntry);
     if (state.tool_window.length > WINDOW_SIZE) state.tool_window.shift();
+    state.session_post_count += 1;
     const sameToolArgs = state.tool_window.filter(e => e.tool === tool && e.argsHash === argsHash).length;
     if (sameToolArgs >= RETRY_THRESHOLD) {
       emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
     }
+    // Offset-aware same-file reread: consecutive Reads of the same file_path
+    // (ignoring offset/limit) escape the argsHash-based retry_loop dedup above.
+    // Emit a distinct, actionable signal instead of leaking into tool_error_loop.
+    if (READ_ONLY_TOOLS.has(tool) && file) {
+      const sameFileReads = state.tool_window.filter(e => e.tool === tool && e.file === file).length;
+      if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
+        emit({ ts, session, cwd, type: "file_reread", tool, file, count: sameFileReads });
+      }
+    }
+    if (READ_ONLY_TOOLS.has(tool)) {
+      state.silentDriftCounter += 1;
+      if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
+        emit({ ts, session, cwd, type: "silent_drift", read_count: state.silentDriftCounter, last_tool: tool });
+        state.silentDriftEmitted = true;
+      }
+    } else {
+      state.silentDriftCounter = 0;
+      state.silentDriftEmitted = false;
+    }
     if (tool === "Agent") {
       const subagent = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
       const recent = state.tool_window.slice(-5).filter(e => e.tool === "Agent" && e.subagent === subagent).length;
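The two new read-only detectors can be replayed in isolation. A minimal sketch, assuming a trimmed state object, a window of `{ tool, file, argsHash }` entries, and a three-tool subset of `READ_ONLY_TOOLS`; `observe` is a hypothetical helper that reuses the diff's thresholds (`RETRY_THRESHOLD = 3`, `FILE_REREAD_THRESHOLD = 3`, `SILENT_DRIFT_THRESHOLD = 5`, `WINDOW_SIZE = 10`) and returns the signal types the hook would emit for one tool call:

```javascript
// Hypothetical replay of the retry_loop / file_reread / silent_drift checks.
const WINDOW_SIZE = 10;
const RETRY_THRESHOLD = 3;
const FILE_REREAD_THRESHOLD = 3;
const SILENT_DRIFT_THRESHOLD = 5;
const READ_ONLY_TOOLS = new Set(["Read", "Grep", "Glob"]); // subset, for brevity

function observe(state, { tool, file, argsHash }) {
  const emitted = [];
  state.tool_window.push({ tool, file, argsHash });
  if (state.tool_window.length > WINDOW_SIZE) state.tool_window.shift();
  const sameToolArgs = state.tool_window.filter(e => e.tool === tool && e.argsHash === argsHash).length;
  if (sameToolArgs >= RETRY_THRESHOLD) emitted.push("retry_loop");
  if (READ_ONLY_TOOLS.has(tool) && file) {
    const sameFileReads = state.tool_window.filter(e => e.tool === tool && e.file === file).length;
    // Suppressed whenever retry_loop fired for this call, so the two never double up.
    if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) emitted.push("file_reread");
  }
  if (READ_ONLY_TOOLS.has(tool)) {
    state.silentDriftCounter += 1;
    if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
      emitted.push("silent_drift");
      state.silentDriftEmitted = true; // once per uninterrupted read streak
    }
  } else {
    state.silentDriftCounter = 0; // any non-read tool ends the streak
    state.silentDriftEmitted = false;
  }
  return emitted;
}

// Five paged Reads of the same file (different offsets), then an Edit.
const state = { tool_window: [], silentDriftCounter: 0, silentDriftEmitted: false };
const calls = [0, 100, 200, 300, 400].map(off => ({ tool: "Read", file: "a.js", argsHash: `off${off}` }));
calls.push({ tool: "Edit", file: "a.js", argsHash: "e1" });
calls.forEach((c, i) => console.log(i, observe(state, c)));
```

Here the third paged Read emits `file_reread`, the fifth also emits `silent_drift`, and the `Edit` resets the streak. As in the diff, `file_reread` has no emitted-once flag, so it repeats for every further qualifying Read, while `silent_drift` fires once per streak.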
@@ -423,6 +508,23 @@ function main() {
     const fp = errorFingerprint(input.tool_response);
     if (fp) {
       bumpUsage("payload:tool_response_error_seen");
+      if (state.recentRecoveries.length) {
+        const keep = [];
+        for (const rec of state.recentRecoveries) {
+          const tools_since = state.session_post_count - rec.emitted_at_count;
+          if (tools_since > ERROR_AFTER_RECOVERY_WINDOW) continue;
+          if (Array.isArray(rec.fps) && rec.fps.includes(fp)) {
+            emit({
+              ts, session, cwd, type: "error_after_recovery",
+              recovered_from: rec.recovered_from, original_fp: fp,
+              tools_since_recovery: tools_since,
+            });
+            continue;
+          }
+          keep.push(rec);
+        }
+        state.recentRecoveries = keep;
+      }
       state.last_errors.push({ tool, fp });
       if (state.last_errors.length > ERROR_RING_SIZE) state.last_errors.shift();
       const sameError = state.last_errors.filter(e => e.fp === fp).length;
@@ -468,7 +570,13 @@ function main() {
     state.task_tool_kinds[tool] = (state.task_tool_kinds[tool] || 0) + 1;
     if (struggleEmittedThisTurn) {
-      state.recoveryWatch = { recovered_from: struggleEmittedThisTurn, since_ts: ts, clean_count: 0, window_tools: [] };
+      state.recoveryWatch = {
+        recovered_from: struggleEmittedThisTurn,
+        since_ts: ts,
+        clean_count: 0,
+        window_tools: [],
+        watched_fps: state.last_errors.map(e => e.fp),
+      };
     } else if (state.recoveryWatch) {
       const turnHadError = fp !== null;
       if (turnHadError) {
@@ -485,6 +593,12 @@ function main() {
           active_skills: activeNames(state, "skill"),
           active_agents: activeNames(state, "agent"),
         });
+        state.recentRecoveries.push({
+          recovered_from: state.recoveryWatch.recovered_from,
+          fps: state.recoveryWatch.watched_fps || [],
+          emitted_at_count: state.session_post_count,
+        });
+        if (state.recentRecoveries.length > RECENT_RECOVERIES_MAX) state.recentRecoveries.shift();
         state.recoveryWatch = null;
       }
     }
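The `error_after_recovery` check keeps a short list of recent clean recoveries, each stamped with the session's tool-call count when it was recorded and carrying the error fingerprints that were active when the struggle began. On every new error it sweeps that list once. A minimal standalone sketch of the sweep, assuming fingerprints are plain strings; `sweepRecoveries` is a hypothetical name and the emitted objects carry only the fields shown in the diff:

```javascript
const ERROR_AFTER_RECOVERY_WINDOW = 5;

// Hypothetical restatement of the recentRecoveries sweep from the diff.
// Returns the signals to emit and the recoveries that should stay watched.
function sweepRecoveries(recentRecoveries, sessionPostCount, fp) {
  const signals = [];
  const keep = [];
  for (const rec of recentRecoveries) {
    const toolsSince = sessionPostCount - rec.emitted_at_count;
    if (toolsSince > ERROR_AFTER_RECOVERY_WINDOW) continue; // aged out: drop silently
    if (Array.isArray(rec.fps) && rec.fps.includes(fp)) {
      // The error this recovery "fixed" came back: report it, stop watching.
      signals.push({
        type: "error_after_recovery",
        recovered_from: rec.recovered_from,
        original_fp: fp,
        tools_since_recovery: toolsSince,
      });
      continue;
    }
    keep.push(rec); // unrelated error: keep watching this recovery
  }
  return { signals, keep };
}

const recoveries = [
  { recovered_from: "retry_loop", fps: ["fp-enoent"], emitted_at_count: 10 },
  { recovered_from: "build_loop", fps: ["fp-tsc"], emitted_at_count: 2 },
];
console.log(sweepRecoveries(recoveries, 13, "fp-enoent"));
```

With the session at its 13th tool call, the `retry_loop` recovery is flagged three calls after it was recorded, while the `build_loop` recovery (11 calls old) has aged out and is dropped, so nothing remains on the watch list.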
+16 -1
@@ -126,7 +126,8 @@ copy_file "$SRC/adam/scripts/adam-archive.mjs" "$DEST/adam
 copy_file "$SRC/adam/scripts/adam-upgrade.mjs" "$DEST/adam/scripts/adam-upgrade.mjs"
 # v0.3.3 helper scripts — invoked from SKILL.md / hooks / analyst flow
 for _adam_script in adam-utils adam-window adam-explain adam-nudge-eligibility adam-cooldown \
-  adam-score adam-ab-measure adam-apply-reinforcement; do
+  adam-score adam-ab-measure adam-apply-reinforcement adam-batch adam-rollback \
+  adam-skill-utility; do
   copy_file "$SRC/adam/scripts/${_adam_script}.mjs" \
     "$DEST/adam/scripts/${_adam_script}.mjs"
   run "chmod +x \"$DEST/adam/scripts/${_adam_script}.mjs\""
@@ -143,6 +144,20 @@ copy_file "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/adam
 # install marker — used by future runs to detect local mtime drift
 run "touch \"$DEST/adam/.install-marker\""
+# version marker — records the installed release tag for the update notifier
+# (adam-nudge.mjs compares it against the latest GitHub release).
+ADAM_VERSION=""
+if [ -n "$VERSION" ]; then
+  ADAM_VERSION="$VERSION"
+elif [ "$PIPED" = 1 ] && [ -n "${REF:-}" ]; then
+  ADAM_VERSION="$REF"
+else
+  ADAM_VERSION="$(git -C "$SRC" describe --tags --abbrev=0 2>/dev/null || true)"
+fi
+[ -z "$ADAM_VERSION" ] && ADAM_VERSION="unknown"
+run "printf '%s\\n' \"$ADAM_VERSION\" > \"$DEST/adam/.version\""
+log " version marker: $ADAM_VERSION"
 # --------------------------------------------------------------------- settings.json
 SETTINGS="$DEST/settings.json"
 EXAMPLE="$SRC/settings.json.example"
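The version-marker precedence in install.sh (explicit `$VERSION`, then `$REF` when piped, then `git describe`, then the literal `unknown`) decides what adam-nudge.mjs later compares against. A minimal sketch in the notifier's language, assuming the same precedence; `resolveVersion` and its argument object are hypothetical, and the `git describe` result is passed in rather than executed. It also shows why a non-semver marker such as `unknown` or a branch name like `main` keeps the notifier silent: `parseSemver` rejects it, so `isNewer` can never return true.

```javascript
// Hypothetical mirror of install.sh's ADAM_VERSION precedence.
function resolveVersion({ VERSION = "", PIPED = "", REF = "", gitDescribe = "" }) {
  if (VERSION) return VERSION;              // explicit $VERSION wins
  if (PIPED === "1" && REF) return REF;     // curl | bash with a REF
  return gitDescribe || "unknown";          // local checkout, else placeholder
}

// Same parser as adam-nudge.mjs: leading "v" optional, suffix ignored.
function parseSemver(s) {
  if (typeof s !== "string") return null;
  const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

for (const env of [
  { VERSION: "v0.6.3", gitDescribe: "v0.6.2" },
  { PIPED: "1", REF: "main" },
  { gitDescribe: "v0.6.2" },
  {},
]) {
  const v = resolveVersion(env);
  console.log(v, parseSemver(v) ? "comparable" : "never nudges");
}
```

An `unknown` marker is still non-empty, so `emitUpdateCheck` continues past its "no marker" guard and refreshes the cache once a day; it simply never prints.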
+116 -17
@@ -65,32 +65,94 @@ Filter to `status == "regressed"` before passing to the analyst as
 effectiveness") to surface a `## Regressions` section at the top of its output
 when this list is non-empty. If the script fails: log stderr, pass `[]`.
-### 2. Dispatch the analyst
-Use the Agent tool with `subagent_type: "adam"` and prompt:
+**Auto-rollback** (MOSS §3.5): if any entries have `status == "regressed"`, run the rollback script to auto-revert them before analyst dispatch:
+```bash
+node ~/.claude/adam/scripts/adam-rollback.mjs --auto --home ~/.claude > /tmp/adam-rollback-results.json 2> /tmp/adam-rollback.log
+```
+For each rolled-back proposal, print to user: `adam: rolled back "<proposal_id>" — regression detected (delta: <delta_pct>%)`. The rollback script moves the proposal from `applied/` back to `proposals/` with `rolled_back: true` and creates a regression nudge. If the script fails: log stderr, continue (rollback is best-effort).
+**Evidence batching** (MOSS §3.1): pre-cluster the windowed journal into coherent failure batches:
+```bash
+node ~/.claude/adam/scripts/adam-batch.mjs --input /tmp/adam-windowed-journal.jsonl > /tmp/adam-batches.json 2> /tmp/adam-batch.log
+```
+This groups entries by (signal_type, cluster_key) and reports per-batch metadata including `has_context_window` (whether transcript evidence is attached). If the script fails: log stderr, pass `null` to the analyst (graceful degradation — analyst falls back to raw journal clustering).
+**Skill utility** (execution-grounded selection signal, in the spirit of SkillsInjector arXiv 2605.29794 — utility Δ(s), not surface match): compute per-skill good:bad outcome ratios over the windowed journal:
+```bash
+node ~/.claude/adam/scripts/adam-skill-utility.mjs --input /tmp/adam-windowed-journal.jsonl --json > /tmp/adam-skill-utility.json 2> /tmp/adam-skill-utility.log
+```
+This ranks skills by how often they co-occur with positive (`task_completed`, `clean_recovery`, `correction_free_streak`) vs negative outcome events, surfacing skills below the baseline positive rate (with sufficient sample) — advisory candidates for description disambiguation or archival. **CO-OCCURRENCE, NOT CAUSATION**: display the worst 3 below-baseline skills (`lift < 0`, not low-sample) to the *user* as a one-line advisory before listing proposals (e.g. `skill-utility: chezmoi 9% pos n=85, ghostty-config 14% pos n=50, …`). Do NOT feed this into the analyst's proposal machinery or auto-draft skill-archival from it — the human decides. If the script fails: log stderr, skip (best-effort).
+### 2. Dispatch the analyst (two-stage pipeline)
+MOSS §3.3: "A single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow." The analyst is dispatched in two stages with a validation gate between them.
+**Stage 1 — Diagnose + Plan**: Use the Agent tool with `subagent_type: "adam"` and prompt:
 ```
-Run a single analysis pass.
+stage=diagnose
+Read the batched journal entries, cluster by signal type, diagnose root causes,
+plan fix types, and score the keypoint matrix. Write diagnoses to /tmp/adam-diagnoses.json.
+Do NOT draft proposal files.
 Inputs:
-- windowed_journal_path: /tmp/adam-windowed-journal.jsonl # pre-filtered by adam-window.mjs
-- scores_path: /tmp/adam-scores.json # per-session dampeners + reinforcement candidates
-- ab_regressions_path: /tmp/adam-ab-regressions.json # A/B deltas for prior auto-applied proposals
+- windowed_journal_path: /tmp/adam-windowed-journal.jsonl
+- batches_path: /tmp/adam-batches.json # pre-clustered evidence batches
+- scores_path: /tmp/adam-scores.json
+- ab_regressions_path: /tmp/adam-ab-regressions.json
 - journal_path: ~/.claude/adam/journal.jsonl # raw — fallback only
 - state_path: ~/.claude/adam/state.json
 - usage_path: ~/.claude/adam/usage.json
+- applied_dir: ~/.claude/adam/applied/
+- rejected_dir: ~/.claude/adam/rejected/
+- transcripts_root: ~/.claude/projects/
+- skills_root: ~/.claude/skills/
+Use batches_path for pre-clustered evidence when available. Prefer context_window
+fields in journal entries over transcript file lookups. Write /tmp/adam-diagnoses.json
+per the "Diagnose-stage output format" in your system prompt.
+```
+Wait for return.
+**Inter-stage validation** (§2a): after stage 1 returns, read `/tmp/adam-diagnoses.json` and validate each diagnosis:
+1. Every `source_entries` timestamp exists in the windowed journal (read `/tmp/adam-windowed-journal.jsonl`, check timestamps match).
+2. Every diagnosis has all four fields (`trigger`, `action`, `mismatch`, `outcome`).
+3. The planned `type` is a valid proposal type.
+4. Remove diagnoses that fail validation — log a one-line warning per removal.
+If all diagnoses are removed or the file is missing/empty, print "adam: no valid diagnoses — nothing to implement" and skip to §6.
+**Stage 2 — Implement**: Use the Agent tool with `subagent_type: "adam"` and prompt:
+```
+stage=implement
+Read the validated diagnoses and draft full proposal files.
+Inputs:
+- diagnoses_path: /tmp/adam-diagnoses.json # validated stage-1 output
+- windowed_journal_path: /tmp/adam-windowed-journal.jsonl
+- scores_path: /tmp/adam-scores.json
+- ab_regressions_path: /tmp/adam-ab-regressions.json
+- state_path: ~/.claude/adam/state.json
+- usage_path: ~/.claude/adam/usage.json
 - proposals_dir: ~/.claude/adam/proposals/
 - applied_dir: ~/.claude/adam/applied/
 - rejected_dir: ~/.claude/adam/rejected/
 - transcripts_root: ~/.claude/projects/
 - skills_root: ~/.claude/skills/
-The windowed_journal is already filtered by per-signal age (see
-SIGNAL_WINDOWS_DAYS in adam-window.mjs) AND by actioned-exclusion. Read it as
-your primary input — do not re-apply window math. Fall back to journal_path
-only if windowed_journal_path is missing or empty.
-Follow your system prompt exactly. Emit a single JSON punch list as your final message.
+Draft proposal files to proposals_dir/ for each diagnosis. Score against the
+confidence rubric. Emit the clustering trace and punch list as your final message.
 ```
 Wait for return.
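The skill-utility ranking can be pictured as each skill's positive-outcome rate compared against the window's overall positive rate. A minimal sketch under several assumptions: journal entries look like `{ type, active_skills }`; positives are the three event types named in the Skill utility step; negatives come from a hypothetical `NEGATIVE` set (the real script reuses adam-score's negative signal list, which is not shown here); lift is taken as the plain difference from baseline; the low-sample cut-off `MIN_N` is hypothetical; and a Wilson score lower bound (z = 1.96) is reported as a conservative rate estimate. It illustrates the idea and does not reproduce adam-skill-utility.mjs:

```javascript
// Illustrative skill-utility ranking; NEGATIVE and MIN_N are assumptions.
const POSITIVE = new Set(["task_completed", "clean_recovery", "correction_free_streak"]);
const NEGATIVE = new Set(["correction", "retry_loop", "tool_error_loop", "dead_end"]); // hypothetical subset
const MIN_N = 5; // hypothetical low-sample cut-off

// Wilson score lower bound for k successes out of n trials.
function wilsonLower(k, n, z = 1.96) {
  if (n === 0) return 0;
  const p = k / n;
  const z2 = z * z;
  const centre = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return (centre - margin) / (1 + z2 / n);
}

function skillUtility(entries) {
  const counts = new Map(); // skill -> { pos, neg }
  let pos = 0, neg = 0;
  for (const e of entries) {
    const isPos = POSITIVE.has(e.type);
    const isNeg = NEGATIVE.has(e.type);
    if (!isPos && !isNeg) continue; // neutral events do not count
    if (isPos) pos++; else neg++;
    for (const s of e.active_skills || []) {
      const c = counts.get(s) || { pos: 0, neg: 0 };
      if (isPos) c.pos++; else c.neg++;
      counts.set(s, c);
    }
  }
  const baseline = pos + neg ? pos / (pos + neg) : 0; // positive share of all scored events
  return [...counts]
    .map(([skill, c]) => {
      const n = c.pos + c.neg;
      const rate = c.pos / n;
      return { skill, n, rate, lift: rate - baseline, wilson_lb: wilsonLower(c.pos, n), low_sample: n < MIN_N };
    })
    .sort((a, b) => a.lift - b.lift); // worst first
}

const journal = [
  ...Array(6).fill({ type: "retry_loop", active_skills: ["chezmoi"] }),
  { type: "task_completed", active_skills: ["chezmoi"] },
  ...Array(4).fill({ type: "task_completed", active_skills: ["tdd"] }),
  { type: "correction", active_skills: ["tdd"] },
];
// Advisory: worst 3 below-baseline skills with enough samples.
const advisory = skillUtility(journal).filter(r => r.lift < 0 && !r.low_sample).slice(0, 3);
console.log(advisory.map(r => `${r.skill} ${Math.round(r.rate * 100)}% pos n=${r.n}`).join(", "));
// → chezmoi 14% pos n=7
```

The advisory line mirrors the one-line `skill-utility:` format above and stays a user-facing hint; nothing in the sketch feeds the proposal pipeline, in keeping with the co-occurrence caveat.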
@@ -112,11 +174,29 @@ node ~/.claude/adam/scripts/adam-explain.mjs --mode full # verbatim trace
 node ~/.claude/adam/scripts/adam-explain.mjs --mode json # machine-readable
 ```
-### 3. Auto-apply high-confidence items
+### 3. Pre-apply verification gate (MOSS §3.4)
+MOSS §3.4: "Verification must therefore be runtime, on a production-equivalent environment, and against the same prompts that produced the failure evidence." Before auto-applying, verify each proposal deterministically:
 For each id in `high_confidence`:
 - Read the proposal file from `~/.claude/adam/proposals/<id>-*.md`.
-- Verify in front of the user: print `id`, `target`, `confidence`, `blast_radius`, `cross_session_evidence`, `auto_apply_eligible`.
+- **Verification checks** (all must pass for auto-apply to proceed):
+  1. **Source entries exist**: every timestamp in `source_entries` frontmatter must appear in `/tmp/adam-windowed-journal.jsonl`. If any are missing, the evidence is stale or was already actioned — demote to `queued`.
+  2. **Diagnosis grounded**: the `# Diagnosis` section must have all four fields (Trigger, Action, Mismatch, Outcome) with ≥1 backtick-wrapped quote. If malformed, demote to `queued`.
+  3. **Type-evidence match**: the proposal `type` must match what the evidence supports:
+     - `correction` signals → `memory`, `skill_new`, `skill_edit` (not `nudge`)
+     - `dead_end` signals → `nudge`, `skill_new`, `skill_edit` (not `memory`)
+     - `tool_error_loop` signals → `memory`, `skill_new`, `skill_edit`
+     - `harness_edit` → must cite harness-level evidence (false negative, scoring bias, window miscalibration)
+     If mismatch, demote to `queued`.
+  4. **No conflicting applied proposal**: grep `~/.claude/adam/applied/` for any proposal with the same `target` applied in the last 7 days. If found, demote to `queued` (prevents stacking rapid edits).
+- Print verification result: `verified: <id> (4/4 checks passed)` or `demoted: <id> (failed: <check_name>)`.
+- Demoted proposals are moved from `high_confidence` to `queued` for manual review.
+### 3a. Apply verified high-confidence items
+For each id that passed verification:
+- Print `id`, `target`, `confidence`, `blast_radius`, `cross_session_evidence`, `auto_apply_eligible`.
 - Apply the change:
   - **For `skill_new`**: `mkdir -p ~/.claude/skills/<slug>/`, then `Write` the proposal's `# Proposed change` body to `~/.claude/skills/<slug>/SKILL.md`. After write, print: "skill `<slug>` written to `~/.claude/skills/<slug>/SKILL.md` — activates immediately — Claude Code v2.1.0+ auto-hot-reloads user-level skills, no restart needed."
   - **For `memory`**: `Write` the proposal's `# Proposed change` body (which MUST include the auto-memory frontmatter — see "Memory drafting protocol" in `agents/adam.md`) to the path in `target`. Then update `MEMORY.md` index with a one-line pointer.
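Checks 3 and 4 of the pre-apply gate are mechanical enough to sketch. A minimal version, assuming the allowed-type table copied from check 3 (the `harness_edit` rule depends on human-readable evidence and is left out), a proposal reduced to a hypothetical `{ signal, type, target }` object, and applied proposals as `{ target, applied_at }` with millisecond timestamps; `verifyProposal` and these shapes are illustrative, not part of ADAM:

```javascript
// Allowed proposal types per originating signal (check 3 of the gate).
const ALLOWED_TYPES = {
  correction: ["memory", "skill_new", "skill_edit"],
  dead_end: ["nudge", "skill_new", "skill_edit"],
  tool_error_loop: ["memory", "skill_new", "skill_edit"],
};
const CONFLICT_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // check 4: last 7 days

// Hypothetical helper: returns "verified" or the name of the first failed check.
function verifyProposal(proposal, applied, now) {
  const allowed = ALLOWED_TYPES[proposal.signal];
  // Signals missing from the table are not judged here (assumption).
  if (allowed && !allowed.includes(proposal.type)) return "type_evidence_match";
  const recentSameTarget = applied.some(
    a => a.target === proposal.target && now - a.applied_at <= CONFLICT_WINDOW_MS,
  );
  if (recentSameTarget) return "no_conflicting_applied";
  return "verified";
}

const now = Date.UTC(2026, 5, 2);
const applied = [{ target: "skills/chezmoi/SKILL.md", applied_at: now - 2 * 24 * 60 * 60 * 1000 }];
console.log(verifyProposal({ signal: "correction", type: "nudge", target: "memory/x.md" }, applied, now));
// → type_evidence_match
console.log(verifyProposal({ signal: "dead_end", type: "skill_edit", target: "skills/chezmoi/SKILL.md" }, applied, now));
// → no_conflicting_applied
console.log(verifyProposal({ signal: "dead_end", type: "nudge", target: "skills/ghostty/SKILL.md" }, applied, now));
// → verified
```

A failed check name maps directly onto the `demoted: <id> (failed: <check_name>)` line the gate prints.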
@@ -143,13 +223,13 @@ For each id in `high_confidence`:
8. Add `last_auto_edit: <iso8601 utc now>` to the proposal frontmatter before moving it. 8. Add `last_auto_edit: <iso8601 utc now>` to the proposal frontmatter before moving it.
9. Tell user: "skill `<slug>` extended (added <N> lines) — auto-applied via win-evidence gate." 9. Tell user: "skill `<slug>` extended (added <N> lines) — auto-applied via win-evidence gate."
- Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`. - Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`.
- **A/B tracking append**: as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema: - **A/B tracking append** (skip for `reinforcement` — positive-only ledger, intentionally not A/B-tracked per `agents/adam.md` §"`reinforcement` proposals"): as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema:
```json
{
"applied_at": <unix_ms now>,
"proposal_id": "<id>",
"proposal_type": "skill_edit|skill_new|memory|nudge",
"target_skill": "<slug or target basename>",
"proposal_fingerprint": "<hash>",
"originating_signals": [{"type":"<signal>","count":<N>,"session_ids":[...]}],
@@ -174,6 +254,12 @@ c. On **approve**:
- For `skill_new`: `mkdir -p ~/.claude/skills/<slug>/`, then write `# Proposed change` body to `<slug>/SKILL.md`. Tell user: "skill `<slug>` written — activates immediately (CC v2.1.0+ auto-hot-reload)."
- For `skill_edit`: apply the unified diff in `# Proposed change` to the existing SKILL.md at `target` (append-only — never replace existing content).
- For `memory`: write `# Proposed change` body (must include auto-memory frontmatter) to `target` and update `MEMORY.md` index with a one-line pointer.
- For `harness_edit` (MOSS §1): apply the unified diff to the target harness file, gated by the test suite:
1. Run `bash ~/.claude/adam/tests/run-tests.sh` and record the pass count.
2. Apply the diff via `Edit`.
3. Run `bash ~/.claude/adam/tests/run-tests.sh` again and verify that the pass count is equal or higher and that there are 0 failures.
4. On a test regression: revert the edit, print "harness_edit reverted — test regression detected", and leave the proposal in `proposals/`.
5. If tests pass: tell user "harness edit applied to `<target>` — tests pass (<N> passed)."
- For all others: apply via Write/Edit per the proposal's `# Proposed change`.
- Move proposal to `~/.claude/adam/applied/<ts>-<id>.md`.
- Archive: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/applied/<ts>-<id>.md`.
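The `harness_edit` steps come down to comparing two test runs. Below is a minimal JavaScript sketch of that keep-or-revert decision. It assumes the pass and fail counts have already been extracted from `run-tests.sh`, since this section does not specify that script's output format. `harnessGate` and the `{ passed, failed }` shape are hypothetical names, not ADAM's code.

```javascript
// Hypothetical sketch of the harness_edit test gate. Assumes pass/fail counts
// were already parsed from the before and after runs of run-tests.sh.
function harnessGate(before, after) {
  // Any failure after the edit is a regression, regardless of the pass count.
  if (after.failed > 0) {
    return { action: "revert", reason: `${after.failed} test(s) failing after the edit` };
  }
  // The pass count must be equal or higher than before the edit.
  if (after.passed < before.passed) {
    return { action: "revert", reason: `pass count dropped from ${before.passed} to ${after.passed}` };
  }
  return { action: "keep", reason: `tests pass (${after.passed} passed)` };
}

// Example: one more passing test and no failures, so the edit is kept.
const verdict = harnessGate({ passed: 140, failed: 0 }, { passed: 141, failed: 0 });
console.log(verdict.action); // keep
```

Checking failures before the pass count means that a run with more passes but a new failure is still reverted.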
@@ -191,6 +277,10 @@ End with one block:
```
adam reflect summary:
observations processed: <new>
batches formed: <N>
diagnoses validated: <N>/<total>
rolled back (regression): <N>
verification passed: <N>/<total high_confidence>
auto-applied: <N>
approved: <N>
rejected: <N>
@@ -198,6 +288,14 @@ adam reflect summary:
failed: <N>
```
**Keypoint history**: after all proposals are processed, append one JSON line to `~/.claude/adam/keypoint-history.jsonl` with the aggregate keypoint scores from the diagnose stage:
```json
{"ts":"<iso>","session":"<session_id>","keypoints":{"tool_selection":N,"scope_discipline":N,"error_recovery":N,"first_attempt":N,"build_reliability":N},"proposals_emitted":N,"proposals_applied":N}
```
This builds a longitudinal record of which capabilities are improving across `/reflect` runs.
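Below is a minimal JavaScript sketch of building one keypoint-history line. Field names follow the schema above. The helper name `keypointHistoryLine` and the rule that all five scores must be numbers are assumptions. How the diagnose stage aggregates the scores is not shown here, so the sketch takes them as already computed.

```javascript
// Hypothetical helper: field names follow the keypoint-history schema above;
// the "all five scores must be numeric" rule is an assumption.
const KEYPOINTS = [
  "tool_selection",
  "scope_discipline",
  "error_recovery",
  "first_attempt",
  "build_reliability",
];

function keypointHistoryLine({ sessionId, keypoints, proposalsEmitted, proposalsApplied, now = new Date() }) {
  for (const name of KEYPOINTS) {
    if (typeof keypoints[name] !== "number" || Number.isNaN(keypoints[name])) {
      throw new Error(`missing or non-numeric keypoint: ${name}`);
    }
  }
  const entry = {
    ts: now.toISOString(),
    session: sessionId,
    // Copy only the five known keys, in a fixed order, so every line has the same shape.
    keypoints: Object.fromEntries(KEYPOINTS.map((name) => [name, keypoints[name]])),
    proposals_emitted: proposalsEmitted,
    proposals_applied: proposalsApplied,
  };
  return JSON.stringify(entry) + "\n"; // exactly one JSONL line
}
```

The caller would append the returned string to `keypoint-history.jsonl` (with `~` expanded first), for example with Node's `fs.appendFileSync`, which creates the file if it is absent.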
## Karpathy constraints (you must enforce on each apply)
Before writing any proposal:
@@ -210,7 +308,8 @@ Before writing any proposal:
- For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §3 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
- For `skill_edit` with `auto_apply_eligible: true`: confirm `contradiction_flag` is absent or null in frontmatter. Refuse auto-apply if `contradiction_flag` is set with any non-empty value (treat the agent's flag as a hard veto on auto-apply; user can still manually approve in walk-the-queue if they disagree with the heuristic).
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter matching the live auto-memory schema — top-level `name` (the slug) + `description`, plus a `metadata:` block with `node_type: memory`, `type`, and `originSessionId`. Cross-check the shape against an existing file in the target memory dir. Refuse if frontmatter is flat (`type:`/`originSessionId:` at top level) or missing the `metadata:` block — agent must redraft per the Memory drafting protocol.
- For `harness_edit`: confirm `auto_apply_eligible: false` (never auto-apply). Confirm `confidence ≥ 5`. Confirm `# Test verification` section names the test command. Confirm diff is ≤30 LOC and targets a single allowed harness file (see `agents/adam.md` §"Harness self-modification"). Run test suite before AND after applying — revert on any regression.
- Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.
If any check fails, refuse to apply and ask the user how to proceed.
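Two of the checks above can be run mechanically against the proposal's unified diff: the `skill_edit` append-only rule and the `harness_edit` size and single-file rule. Below is a minimal JavaScript sketch. Several parts are assumptions rather than ADAM's implementation: the function names, the reading of "≤30 LOC" as added plus removed lines, and the allow-list passed in by the caller (the real list lives in `agents/adam.md`).

```javascript
// Hypothetical helpers sketching the diff-shape checks; not ADAM's actual code.
// The parser honors hunk line counts, so a removed line whose own text begins
// with "--" is not mistaken for a "--- a/file" header.
const HUNK_RE = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/;

function summarizeDiff(diffText) {
  const files = new Set();
  let added = 0;
  let removed = 0;
  let oldLeft = 0; // old-side lines still expected in the current hunk
  let newLeft = 0; // new-side lines still expected in the current hunk

  for (const line of diffText.split("\n")) {
    if (oldLeft > 0 || newLeft > 0) {
      if (line.startsWith("+")) {
        added++;
        newLeft--;
      } else if (line.startsWith("-")) {
        removed++;
        oldLeft--;
      } else if (!line.startsWith("\\")) {
        // Context line (counts on both sides); "\ No newline" markers are skipped.
        oldLeft--;
        newLeft--;
      }
      continue;
    }
    const hunk = HUNK_RE.exec(line);
    if (hunk) {
      // An omitted count in "@@ -a +c @@" means one line.
      oldLeft = hunk[1] === undefined ? 1 : Number(hunk[1]);
      newLeft = hunk[2] === undefined ? 1 : Number(hunk[2]);
    } else if (line.startsWith("+++ ")) {
      // Drop a trailing tab-separated timestamp and git's "b/" prefix.
      files.add(line.slice(4).split("\t")[0].replace(/^b\//, ""));
    }
  }
  return { added, removed, files: [...files] };
}

// skill_edit: append-only means no removed lines at all.
function isAppendOnly(diffText) {
  return summarizeDiff(diffText).removed === 0;
}

// harness_edit: at most maxLoc changed lines (assumed: added + removed), in
// exactly one file drawn from a caller-supplied allow-list.
function harnessDiffProblems(diffText, allowedFiles, maxLoc = 30) {
  const { added, removed, files } = summarizeDiff(diffText);
  const problems = [];
  if (added + removed > maxLoc) {
    problems.push(`diff changes ${added + removed} lines (cap ${maxLoc})`);
  }
  if (files.length !== 1) {
    problems.push(`diff targets ${files.length} files (expected 1)`);
  } else if (!allowedFiles.includes(files[0])) {
    problems.push(`${files[0]} is not an allowed harness file`);
  }
  return problems;
}
```

In this sketch, `isAppendOnly` returning `false` would trigger the `skill_edit` refusal, and a non-empty array from `harnessDiffProblems` would trigger the `harness_edit` refusal.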