17 Commits

Author SHA1 Message Date
lukaszraczylo fcddb6bf79 feat(v0.6.3): release-update notifier (notify-only, SessionStart)
Adds a lightweight "new release available" notice without auto-installing —
because re-running install.sh overwrites ADAM's own /reflect-applied skill
edits, so the user must choose when to take an update.

- install.sh writes ~/.claude/adam/.version (the installed release tag) on
  every install. Derived from $VERSION / piped REF / `git describe --tags`.
- adam-nudge.mjs (SessionStart) compares .version against the latest GitHub
  release at most once/day. Cached in ~/.claude/adam/.update-check.json; the
  cache drives an instant nudge (no network on the hot path) and is refreshed
  best-effort with a 1.5s AbortController cap. fetch unavailable / offline /
  timeout / rate-limit / parse error all degrade to silent no-op. Opt out with
  ADAM_NO_UPDATE_CHECK=1. main() is now async; never blocks SessionStart.
- README: "Staying up to date" section; pin example bumped to v0.6.3.

Tests: 134 -> 138. Notifier verified fully offline (cache-driven): nudges when
a newer release is cached, silent when current, suppressed by the opt-out env,
and no-ops when the .version marker is absent.
2026-05-29 13:13:59 +01:00
lukaszraczylo d929101af4 fix(v0.6.2): A/B volume normalization + memory frontmatter schema
Two issues surfaced by running ADAM's /reflect loop on a large real journal
(4015 entries, 119 sessions) — both caused false/broken auto-apply behavior.

1. A/B over-reported regressions (adam-ab-measure.mjs).
   Regressions were measured on RAW originating-signal counts pre vs post. On a
   busy, growing journal almost every signal count rises post-apply regardless
   of whether the proposal helped — so the loop flagged 9 false "regressions"
   (and would auto-roll-back good proposals). Now the delta is computed on the
   signal's SHARE of total activity (rate = count / window-total). Falls back to
   the raw-count delta when the signal is the only activity in the window
   (preserves prior behavior + all existing A/B tests). Output adds
   raw_delta_pct, pre_total, post_total, normalized for transparency.

2. Memory frontmatter drift (agents/adam.md, SKILL.md).
   The drafting protocol emitted flat `type:`/`originSessionId:` with a prose
   `name`, but the live auto-memory store uses `name` = slug plus a
   `metadata: {node_type, type, originSessionId}` block. Auto-applied memories
   could fail to load/categorize. Protocol + apply-time validation now require
   the live metadata.* schema and cross-checking against an existing file.

Tests: 132 -> 134. New: volume growth (raw +200%) with flat activity-share
classifies neutral, not regressed; a genuine share increase still classifies
regressed.
2026-05-29 12:37:10 +01:00
lukaszraczylo 3a54d7d3e1 feat(v0.6.1): file_reread signal — catch offset-shifted same-file re-reads
Proposed and approved through ADAM's own /reflect harness_edit loop (MOSS §1):
the analyst surfaced 23 tool_error_loop entries across 4 sessions whose context
windows were really redundant re-reads of one file.

retry_loop keys on argsHash of the full tool_input (including offset/limit), so
consecutive Reads of the SAME file at different offsets escaped dedup and leaked
into tool_error_loop fingerprints. The new file_reread signal catches them:
same file Read >=3x in the 10-event window, offset-agnostic (keyed on file
path), guarded by `sameToolArgs < RETRY_THRESHOLD` so byte-identical reads stay
with retry_loop (no double-count).

Fully wired end-to-end (not a half-dead signal):
- adam-observe.mjs: detection + STRUGGLE_TYPES membership (so it carries
  context_window + active_skills like other struggle signals).
- adam-window.mjs: 14-day sliding window (task-local, like retry_loop).
- adam-score.mjs: severity divisor 3.
- adam-batch.mjs: file-basename clustering.
- agents/adam.md + README: signal tables, clustering rules, rubric, windows.

Tests: 126 -> 132 (file_reread fires on 3x offset-shifted reads, not on 2x;
byte-identical reads route to retry_loop not file_reread; carries context_window).
2026-05-29 11:31:50 +01:00
lukaszraczylo 4b36d6c09e feat(v0.6.0): review hardening — live active_skills clustering, computable fingerprints
Full codebase review (multi-agent, adversarially verified) surfaced several
documented-but-dead mechanisms and doc/code drift. Fixes:

- adam-observe: struggle signals now emit `active_skills`, so silent_drift's
  primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric
  bonus) actually fire — both were silently dead (no struggle signal carried
  the field).
- adam-cooldown: new `--compute` CLI deterministically derives
  proposal_fingerprint. The exported computeProposalFingerprint() was never
  called and the analyst was told to hand-compute a djb2 hash it cannot
  reproduce. Spec now mandates a *stable* cluster id so fingerprints reproduce
  across /reflect runs. Removed one dead normalization line.
- spec: reinforcement proposals excluded from A/B tracking — agents/adam.md
  contradicted itself (:376 included, :476 excluded); SKILL.md aligned.
- adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set
  (adam-utils / adam-batch / adam-rollback were missing).
- adam-explain: synthesized clustering summary carries `regressions: 0`
  (structural consistency with parsed summaries).
- docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed;
  adam-score header documents severity_sum/severity_by_type; adam-batch §4
  reference corrected.

Tests: +12 assertions (114 -> 126), all green. New regression tests cover the
active_skills fix and --compute, plus boundary gaps the review flagged:
retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d
blacklist edge.
2026-05-29 01:57:44 +01:00
lukaszraczylo 2d9257922f docs: update README for v0.5.0 release — MOSS-grounded improvements 2026-05-24 11:19:15 +01:00
lukaszraczylo 440fb52eb1 feat: apply MOSS-grounded self-evolution improvements to ADAM
Implements 7 improvements grounded in MOSS paper (arXiv 2605.22794):

1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs
   captures last 8 events around struggle signals as context_window.

2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed
   journal entries into coherent failure batches by (signal_type, cluster_key).

3. Multi-stage analysis (§3.3): SKILL.md dispatches adam agent in two
   stages (diagnose+plan → implement) with inter-stage validation gate.

4. Pre-apply verification (§3.4): 4-check deterministic gate before
   auto-apply (source entries exist, diagnosis grounded, type-evidence
   match, no conflicting recent proposals).

5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals
   detected by A/B measurement, creates regression nudges.

6. Harness self-modification (§1 Table 1): new harness_edit proposal type
   targeting adam's own scripts with stricter gates (confidence≥5, never
   auto-apply, test-suite-gated).

7. Keypoint matrix evaluation (§4.2): 5 capability dimensions
   (tool_selection, scope_discipline, error_recovery, first_attempt,
   build_reliability) scored per batch for structured evaluation.

Test suite: 94 → 114 tests (20 new), all passing.
2026-05-24 11:15:32 +01:00
lukaszraczylo a48c705c0a feat(adam): smarter signals & clustering
- New signal types in hooks/adam-observe.mjs:
  - silent_drift: 5 consecutive read-only PostToolUse without an action tool
  - error_after_recovery: same error fingerprint returns within 5 events of clean_recovery
- Severity-weighted scoring in adam/scripts/adam-score.mjs:
  - SEVERITY_DIVISORS exported per struggle signal type
  - Per-session severity_sum + severity_by_type added to JSON output
- Skill-attribution clustering in agents/adam.md:
  - Sub-cluster struggle signals on active_skills[0]
  - New struggle-driven skill_edit variant (always queues, never auto-applies)
- Rubric updates:
  - +1 for cluster severity-sum >= 10, additional +1 for >= 32
  - +1 for skill-attributed sub-cluster naming an existing skill
  - silent_drift + error_after_recovery added to struggle signal list
- Window: silent_drift 14d, error_after_recovery 30d
- Tests: 94 passing (78-82 new)

Backward compat: entries without count default to severity 1. Existing
win-driven skill_edit gate untouched. No journal migration.
2026-05-13 19:21:59 +01:00
lukaszraczylo a8883aa8b7 fix(logo): explicit light/dark variants + <picture> for GitHub
The prior logo.svg used currentColor, which resolves to black when the
SVG is loaded via <img> on GitHub — making the logo invisible in dark
mode (the GitHub default for many users).

Fix uses GitHub's supported <picture> + prefers-color-scheme media-
source pattern in README:

- assets/logo-light.svg — explicit GitHub light-theme text color #24292f
- assets/logo-dark.svg  — explicit GitHub dark-theme text color #f0f6fc
- assets/logo.svg       — kept with embedded @media + currentColor for
                          standalone use (markmorph notes, anywhere
                          else the SVG is loaded outside <picture>)

README updates the <img> tag to a <picture> with media-conditioned
source so GitHub's renderer picks the right variant per theme.
2026-05-13 02:07:11 +01:00
lukaszraczylo 7ed2aecdfa docs(logo): swap to swaddled-baby design with hands
Replaces the geometric-A-with-observation-dot with a softer, more
on-theme design: a swaddled-baby silhouette (rounded A-shape bundle),
face nestled inside, and the wrap-band extended past the bundle on
both sides as little hands. Maintains currentColor + zero external
assets; reads cleanly down to favicon size.

Ties the visual identity to the 'Story behind Adam' section: the
project is named after the author's son, and now the logo is too.
2026-05-13 02:02:02 +01:00
lukaszraczylo a30f8b1158 docs: replace ASCII pipeline diagram with mermaid flowchart
GitHub renders mermaid natively. Diagram now shows three subgraphs
(Observation → Analysis → Review + apply) with a nested Pre-processors
subgraph inside Analysis. Includes:

- Dotted edge labeled 'user runs /reflect' marking the observe→analyze
  boundary.
- Diamond gate node for auto-apply decision (conf≥4 · low blast ·
  cooldown cool) with explicit yes/no branches.
- Feedback loop: applied/ entries measure back into adam-ab-measure.mjs
  on subsequent reflects.
- Color-coded classDef for stores (blue), processes (orange), and the
  clustering trace artifact (purple).

ASCII art retired — diagram now legible at any zoom on github.com.
2026-05-13 01:54:38 +01:00
lukaszraczylo d3e4350d71 docs: modernize README + add SVG logo + inspiration story
- New 'Story behind Adam' section at the top: the project is named after
  the author's newborn son, whose observe-act-adjust-observe-again
  learning loop is the methodology ADAM applies to LLM sessions.
- New SVG logo at assets/logo.svg: stylized 'A' with a captured
  observation point inside the apex and a feedback crossbar. Uses
  currentColor + gradient so it adapts to light/dark GitHub themes.
- Centered header block with project tagline + 5 badges (License,
  Version, Tests, Node, Platform).
- New 'Highlights' section: 8 emoji-tagged one-liners covering the
  v0.3.3 design pillars (zero LLM cost observation, A/B measurement,
  sliding windows, observability, etc.).
- New 'How it works' ASCII pipeline diagram: observation -> analysis
  pre-processors -> analyst -> review + apply.
- Signals table now includes per-signal sliding window column.
- Rubric section restructured: gates, modifiers (dampener), and
  skill_edit-specific requirements clearly separated.
- New 'Inspecting the analyst's reasoning' section documenting
  adam-explain.mjs + /reflect --explain.
- Layout updated for v0.3.3 state files (active-nudges.json,
  ab-tracking.jsonl, reinforcements.jsonl, last-trace.txt) and all
  9 new helper scripts under adam/scripts/.
- Test count: 27 -> 87.
- Closing line crediting Adam.
2026-05-13 01:50:59 +01:00
lukaszraczylo 871592a75b Merge branch 'adam-v0.3.3-fixes' (v0.3.3) 2026-05-13 01:02:40 +01:00
lukaszraczylo 012c40b9ab chore(v0.3.3): analyst observability, A/B measurement, journal hygiene
Storage/window/exclusion split (#7): ISO-week journal rotation with safety
fuse replaces size-based rotation (fixes silent under-counting when clusters
straddle boundaries). Per-signal sliding windows via adam-window.mjs guard
against stale signal accumulation. Legacy YYYY-MM-DD-<ts>.jsonl files remain
readable.

Error fingerprint normalization (#3): adam-observe.mjs extracts canonical
error codes (ENOENT, ECONNREFUSED, etc.) and normalizes paths/timestamps/hex
before hashing. 'Connection refused' and 'ECONNREFUSED' now cluster identically.

Correction corpus expansion (#1): strong tokens (stop, wrong, undo, try again,
different approach, etc.) fire on any occurrence. Weak tokens (no, actually,
wait) require negation/contrast co-occurrence within 8 tokens. Kills the
'actually, I think...' false positive.

Analyst observability (#6): mandatory clustering trace block; adam-explain.mjs
parses to summary/full/json. Cluster decisions now surface rejection reasons
(threshold, contradiction, window). Persisted to ~/.claude/adam/last-trace.txt.

Dead_end nudge proposal type (#2): single-session auto-apply gate (>=3
dead_end events). Action appends to active-nudges.json, surfaced via
adam-nudge.mjs at next SessionStart. Lower blast than skill_edit.

Per-(skill, fingerprint) cooldown (#4): adam-cooldown.mjs replaces coarse
per-skill check. proposal_fingerprint = djb2(skill_slug + cluster_id +
normalized_diff_body). Legacy applied/rejected records gate via 'legacy'
fingerprint fallback through resolveSkill helper (handles target_skill,
skill, or target: <path>).

task_completed scoring integration (#8): adam-score.mjs computes per-session
urgency dampener (3 task_completed -> 0.5) and reinforcement candidates
(skills cited in >=3 clean completions). New 'reinforcement' proposal type
appends to reinforcements.jsonl on apply (no code/memory mutation).

A/B effectiveness measurement (#5): every auto-applied edit appends to
ab-tracking.jsonl. adam-ab-measure.mjs computes 7d pre/post signal-count
delta per entry (improved / neutral / regressed / no_baseline / pending).
Analyst surfaces regressions at top of /reflect output.

Upgrade UX overhaul (#9): adam-upgrade.mjs implements --list/--diff/--accept
/--accept-all. SessionStart nudge prints pending-merge warning when
.adam-new files exist (latency ~20ms via fixed shortlist). install.sh
emits unmissable final-message hint after creating any .adam-new file.

Simplify pass: adam-utils.mjs deduplicates readJsonlSafe / listJsonlFiles /
parseFrontmatter across 8 scripts. Net -46 LOC.

Test coverage: 30 -> 87 tests. Every new feature has feature-validating
assertions (false-case coverage included). T77 statically verifies install.sh
references every adam-*.mjs source script (would have caught the missing
adam-utils inclusion that review #2 surfaced).
2026-05-13 01:02:33 +01:00
lukaszraczylo 7ddda26bb4 feat: task_completed signal — post-task skill capture (v0.3.2)
Adds an 11th signal type emitted when a run of work (between two
UserPromptSubmit events) crosses three quality gates:
  - >=5 tool calls (TASK_TOOL_MIN)
  - >=3 distinct tool kinds (TASK_DIVERSITY_MIN, filters single-tool
    sweeps like "wrote 5 files")
  - 0 correction signals during the run (filters tasks where the user
    pushed back; correction-during-task disqualifies the recipe)

Payload carries tool_count, tool_kinds, active_skills, active_agents
so the agent can cluster by sorted tool-kind tuple and route through
the existing skill-overlap rule (skill_new vs skill_edit).

Importantly: cross_session_evidence is FALSE on first occurrence,
so resulting skill_new proposals always queue for review — they only
auto-apply when the same multi-tool recipe recurs in a second session
(then the existing rubric kicks in). Post-task creation captures novel
patterns while preserving the rule "auto-apply requires cross-session".

Hook adds state fields: task_tool_count, task_tool_kinds, task_corrections.
All reset on UserPromptSubmit boundary and on session change.

Agent gets one new signal-types-table row and one clustering bullet
referencing the existing skill-overlap rule.

3 new tests (30 passed, 0 failed):
  - 5 tools + 5 kinds + 0 corrections fires task_completed
  - 5 tools + 1 kind (Edit only) does NOT fire (diversity gate)
  - 5 tools + 3 kinds + correction-on-closing-prompt does NOT fire
2026-05-10 22:34:33 +01:00
lukaszraczylo 6d8ff37cb2 v0.3.1: code review pass + DX overhaul
Bug fixes (HIGH):
- adam-observe.mjs: errorFingerprint no longer false-positives when
  toolResponse.is_error === false; ERROR_RE only used as fallback when
  is_error is undefined.
- adam-observe.mjs: resetSessionLocal now clears tool_window so retry_loop
  cannot fire on the first tool of a new session by matching prior session.
- adam-archive.mjs: ts dedup uses Map<ts, count> instead of Set<ts>; two
  journal entries sharing a millisecond are no longer both archived when
  only one is referenced in source_entries.
- adam-nudge.mjs: only counts proposal filenames matching
  /^\d{4}-\d{2}-\d{3}-/ pattern; README/notes in proposals/ no longer bump.
- skills/adam-self-improvement/SKILL.md: contradiction_flag veto now applied
  at apply time (carry-over from earlier review).

Test isolation:
- adam/tests/run-tests.sh: ALWAYS runs against an isolated $HOME under
  mktemp -d. Previously truncated live ~/.claude/adam/journal.jsonl on
  every run — destructive on production state.

Conciseness:
- agents/adam.md: -19 LOC (cuts: vestigial cursor sentence, duplicate
  not-do bullets, blast-radius bullet collapse, Inputs paths delegate to
  SKILL.md, win-cluster-vs-struggle-cluster commentary already enforced
  by cluster-key separation, # Overlap section spec compressed).
- skills/adam-self-improvement/SKILL.md: -4 LOC (framing paragraph, dead
  catch-all bullet for non-eligible types).

Auto-prune script DELETED:
- The cumulative-count primitive cannot distinguish "never used" from
  "used before tracking began"; mtime gate is meaningless for installed
  files. Auto-prune deferred to v0.4 with a per-key lastSeen schema.

Cross-platform:
- macOS (BSD coreutils) and Linux (Alpine, glibc + musl) verified.
- All scripts use portable forms (stat -f || stat -c, mktemp -d -t).
- README documents platform support explicitly.

DX overhaul:
- install.sh: hardened — supports `curl | bash` via auto-clone,
  --version=vX.Y.Z pinning, --yes / --dry-run flags, jq-based
  settings.json merge with diff prompt and backup, conservative file
  copy that detects local mtime drift and writes <file>.adam-new
  instead of clobbering, idempotent across re-runs.
- adam-uninstall.sh: NEW. Soft-archives ~/.claude/adam/ to .bak.<ts>/
  by default; --purge to delete; --yes for non-interactive; jq-based
  settings.json cleanup with diff prompt.
- README.md: curl one-liner install + version-pinned variant at top,
  What's New section through v0.3.1, upgrade-safe data files callout,
  uninstaller documentation, platform support note, expanded rubric
  showing skill_edit gate.

Test count: 27 passed, 0 failed (was 27 — no regression).
2026-05-10 21:33:17 +01:00
lukaszraczylo 780401e96a feat: causal diagnosis step on every proposal (v0.3.0)
Closes the gap between categorical signal capture (we saw 3 retries) and
causal proposal drafting (here is why and what to do). Mirrors the NL trace
reflection step Hermes Agent uses before mutating prompts.

Adds # Diagnosis section to every proposal body — four labelled lines:
- Trigger: what the user wanted / context
- Action:  what the assistant did
- Mismatch: how the action diverged
- Outcome: surfacing event with >=1 verbatim transcript quote

Constraints:
- <=5 LOC of prose total
- >=1 backtick-wrapped quote <=80 chars from transcript context window
- Cannot speculate; "Mismatch: unclear" is allowed but takes -1 confidence
- Win clusters use "Mismatch: None" with recovery quote in Outcome

Skill enforces structure at apply time (presence + 4 labelled lines + quote)
for both auto-apply and walk-the-queue paths. No semantic check — humans
judge causal correctness during walk-the-queue.

Adds optional frontmatter field `diagnosis_summary` (<=120 chars from the
Mismatch line) so applied/ and rejected/ are searchable by causal pattern.

New rubric penalty: -1 confidence when Diagnosis flags Mismatch: unclear.
Stops weak-causation proposals from auto-applying (drops below conf>=4).

No hook changes. All 27 tests still pass.

Spec: ~/.claude/docs/superpowers/specs/2026-05-10-adam-causal-diagnosis-design.md
2026-05-10 21:02:36 +01:00
lukaszraczylo 2dc76bf203 feat: lessons-learned loop — win signals + skill_edit auto-apply
Adds two new hook signal types:
- correction_free_streak: 5 consecutive UserPromptSubmits without a correction phrase
- clean_recovery: 3 clean PostToolUse events after a struggle signal
  (tool_error_loop / dead_end / retry_loop)

Both carry active_skills/active_agents payloads computed from a 10-event
activity ring, so ADAM can attribute wins to whichever skill was active
during the streak/recovery.

Promotes skill_edit to auto-apply under a strict gate (all required):
- conf >= 4 + cross-session evidence (existing rules)
- # Why cites a win-signal entry whose active_skills includes target
- diff append-only, +lines <= 30
- resulting SKILL.md size <= 2x current size
- 7-day cooldown per target (last_auto_edit in applied/ frontmatter)
- 30-day blacklist on user rejection (auto_apply_blacklist in rejected/)

Skill enforces the gate at apply time as defense in depth: re-stats target,
re-checks cooldown and blacklist, verifies append-only, reverts and refuses
on byte-cap breach. User-rejected skill_edit proposals automatically write
auto_apply_blacklist: true.

Win signals participate in the existing v0.2.0 source_entries archive
lifecycle, so already-applied evidence does not re-cluster.

Test suite: +5 cases (5 new asserts pass), 27 total passing.

Spec:  ~/.claude/docs/superpowers/specs/2026-05-10-adam-proactive-design.md
Plan:  ~/.claude/docs/superpowers/plans/2026-05-10-adam-proactive.md
2026-05-10 20:51:12 +01:00
24 changed files with 5950 additions and 282 deletions
+297 -83
View File
@@ -1,118 +1,332 @@
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="./assets/logo-dark.svg">
<img src="./assets/logo-light.svg" alt="claude-adam logo" width="128" height="128" />
</picture>
# claude-adam
Self-improvement layer for [Claude Code](https://claude.com/claude-code) that observes friction signals during your sessions and proposes targeted improvements (new skills, memory entries, agent edits) which you can review and apply.
**A self-improvement layer for [Claude Code](https://claude.com/claude-code).**
## What it does
Watches the friction in your coding sessions, clusters the signals via an LLM analyst, and proposes targeted improvements — new skills, memory entries, agent edits — that you review and apply.
A lightweight Node.js hook (`adam-observe.mjs`) runs on `UserPromptSubmit`, `PreToolUse`, and `PostToolUse` events. It detects:
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
[![Tests](https://img.shields.io/badge/tests-138%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
[![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
| Signal | Trigger |
|---|---|
| `correction` | User prompt contains "no", "stop", "wrong", "actually", etc. after a tool call |
| `retry_loop` | Same tool + same args called 3× in a 10-event window |
| `weak_agent` | Same subagent dispatched 2× in last 5 tool calls |
| `tool_error_loop` | Same error fingerprint appears 3× in a 5-event ring |
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them |
| `edit_churn` | Same file edited 4× in a window |
| `build_loop` | 2× build/test/compile commands fail in same session |
| `subagent_dispatch_pattern` | Same subagent dispatched ≥3× cumulatively |
</div>
---
## The story behind Adam
Adam is my newborn son.
Watching him over the last few months — the way he observes the world, tries something, watches what happens, adjusts, and tries again — I realised that the most powerful learning loop in nature is also one of the simplest. No grand theory. No instruction manual. Just relentless feedback and pattern recognition, applied to every waking moment.
LLMs can learn the same way. Give them a hook into the real friction of your work — the corrections, the dead-ends, the moments you say *"no, try again"* — and let them propose improvements grounded in **what actually happened**. Not what they assume might help. What you actually struggled with.
**claude-adam** is that loop, wired into Claude Code. It's named after Adam because the methodology is his.
---
## Highlights
- 🔍 **Zero LLM cost at observation time.** Deterministic regex + counter detection in a Node hook. The analyst only runs when you invoke `/reflect`.
- 📡 **11 signal types.** Friction (`correction`, `tool_error_loop`, `dead_end`, `edit_churn`, …) + reinforcement (`task_completed`, `correction_free_streak`, `clean_recovery`) + meta.
- 🛡️ **Tight auto-apply gates.** Confidence ≥ 4, cross-session evidence, contradiction veto, per-(skill, fingerprint) cooldown. Most things queue for your manual review.
- 📊 **A/B effectiveness measurement.** Every auto-applied edit gets a 7-day pre/post signal-count delta. If a proposed fix made things worse, the next `/reflect` says so.
-**Per-signal sliding windows.** Stale friction doesn't accumulate forever. `dead_end` 7d, `correction` 30d, reinforcement signals 60d.
- 🔬 **Observable.** Every clustering decision (passed / threshold-blocked / window-filtered / contradiction-vetoed) emits a trace. `/reflect --explain` shows it.
- 📦 **Pure Node.** Zero npm dependencies. Runs on macOS and Linux (Alpine smoke-tested).
## Quick start
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash
```
The installer copies files into `~/.claude/`, offers to merge ADAM's hook entries into `~/.claude/settings.json` (with a diff preview and `[y/N]` confirm), and preserves any local edits via `.adam-new` sidecar files. Pass `--yes` to skip prompts, `--dry-run` to preview.
Then:
```sh
bash ~/.claude/adam/tests/run-tests.sh # expect: 138 passed, 0 failed
# … start a fresh Claude Code session …
/reflect # walks the proposal queue
/reflect --explain # also shows the analyst's clustering trace
```
Pin a release for reproducibility:
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.6.3/install.sh \
| VERSION=v0.6.3 bash
```
### Staying up to date
`install.sh` records the installed release in `~/.claude/adam/.version`. The
SessionStart hook (`adam-nudge.mjs`) then checks the latest GitHub release **at
most once a day** (cached in `~/.claude/adam/.update-check.json`, network call
hard-capped at 1.5 s, fully best-effort — it never blocks or slows session
start). When a newer release exists it prints a one-line, **notify-only** prompt:
```
[adam] update available: v0.6.3 → v0.6.4. Apply: curl -fsSL …/install.sh | bash
(re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)
```
It is deliberately **not** auto-applied: re-running `install.sh` overwrites
ADAM's own `/reflect`-applied skill edits, so you decide when to take an update.
Disable the check entirely with `ADAM_NO_UPDATE_CHECK=1` in your environment.
## How it works
```mermaid
flowchart TB
subgraph OBS["Observation (deterministic, in-hook, zero LLM cost)"]
direction LR
EV["Tool event /<br/>user prompt"] --> OBSERVE["adam-observe.mjs<br/><sub>regex · counters · ring buffers</sub>"]
OBSERVE --> JOURNAL[("journal.jsonl<br/><sub>append-only signal log</sub>")]
end
JOURNAL -. user runs <code>/reflect</code> .-> ANALYSIS
subgraph ANALYSIS["Analysis (LLM, only on demand)"]
direction TB
subgraph PRE["Pre-processors (deterministic)"]
direction LR
W["adam-window.mjs<br/><sub>per-signal sliding window</sub>"]
S["adam-score.mjs<br/><sub>task_completed dampener<br/>+ reinforcement candidates</sub>"]
AB["adam-ab-measure.mjs<br/><sub>7d pre/post deltas<br/>on prior auto-applies</sub>"]
end
AGENT["adam subagent<br/><sub>cluster · score · diagnose</sub>"]
PRE --> AGENT
AGENT --> PROPOSALS[("proposals/")]
AGENT --> TRACE[["clustering trace<br/><sub>adam-explain.mjs renders</sub>"]]
end
PROPOSALS --> REVIEW
subgraph REVIEW["Review + apply"]
direction TB
GATE{"auto-apply<br/>gates pass?<br/><sub>conf≥4 · low blast<br/>· cooldown cool</sub>"}
GATE -->|yes| APPLIED[("applied/<br/>+ ab-tracking.jsonl")]
GATE -->|no| QUEUE["walk-the-queue<br/><sub>approve · reject · edit</sub>"]
QUEUE -->|approve| APPLIED
QUEUE -->|reject| REJECTED[("rejected/")]
end
APPLIED -. measures back into .-> AB
classDef store fill:#e8f4fd,stroke:#5b9bd5,stroke-width:2px,color:#1f3a5f
classDef proc fill:#fff4e6,stroke:#e8a33d,stroke-width:1px,color:#5a3d0f
classDef trace fill:#f0e8fd,stroke:#7e5dc0,stroke-width:1px,color:#2f1e60
class JOURNAL,PROPOSALS,APPLIED,REJECTED store
class EV,OBSERVE,W,S,AB,AGENT,QUEUE proc
class TRACE trace
```
The observation layer is a ~600-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`.
The analysis layer is an LLM subagent invoked by `/reflect`. Before the analyst runs, three deterministic pre-processors filter and enrich the journal: `adam-window.mjs` drops stale entries per per-signal age, `adam-score.mjs` computes per-session urgency dampeners + reinforcement candidates, and `adam-ab-measure.mjs` checks whether previously auto-applied edits actually reduced their originating signal.
The analyst clusters signals, scores them against a deterministic rubric (see below), and emits proposal markdown files to `~/.claude/adam/proposals/`. Each proposal carries a `# Diagnosis` block (Trigger / Action / Mismatch / Outcome with a verbatim transcript quote), a `# Success criterion`, and the source journal-entry timestamps it clustered.
Auto-apply runs only for low-blast types (memory entries, new skills, ephemeral nudges, reinforcement logs) backed by cross-session evidence. Everything else queues for your manual approve / reject / edit walk.
## Signals
| Signal | Trigger | Window* |
|---|---|---|
| `correction` | Strong tokens (`stop`, `wrong`, `undo`, …) OR weak tokens (`no`, `actually`, `wait`) with negation/contrast nearby | 30d |
| `retry_loop` | Same tool + same args called 3× in a 10-event window | 14d |
| `weak_agent` | Same subagent dispatched 2× in last 5 tool calls | 30d |
| `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
| `edit_churn` | Same file edited 4× in a window | 14d |
| `file_reread` | Same file Read ≥3× in the 10-event window, ignoring offset/limit (catches re-reads that escape `retry_loop`'s arg-hash dedup) | 14d |
| `build_loop` | 2× build/test/compile commands fail in same session | 30d |
| `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
| `clean_recovery` | 3 clean PostToolUse events after a struggle signal — reinforcement input | 60d |
| `task_completed` | 5 tools / 3 kinds / 0 corrections — fed into the urgency dampener + reinforcement candidates | 60d |
\* Per-signal sliding window for `/reflect` analysis. See `SIGNAL_WINDOWS_DAYS` in `adam/scripts/adam-window.mjs`.
Detection is local, regex-based, zero LLM cost. Signals append to `~/.claude/adam/journal.jsonl`.
When you run `/reflect`, the `adam` subagent reads the journal, clusters signals, scores them against a deterministic rubric, and emits proposal files to `~/.claude/adam/proposals/`. Auto-applied proposals only ship for low-blast types (memory, new skills) backed by cross-session evidence; everything else queues for your manual approve/reject/edit walk.
## Auto-apply rubric
## Why
```
Sum:
+2 Signal repeated ≥ 3× across ≥ 2 sessions (within signal's window)
+2 Struggle signal appearing ≥ 1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥ 2 distinct struggle types in same session)
-1 Type-bias penalty (≥ 3 rejections, applied:rejected < 1:2)
+1 Blast radius low (memory or new isolated skill)
0 Blast radius medium (new agent, new hook, edit existing skill)
-1 Blast radius high (CLAUDE.md, settings hooks, edit agent, deletion)
+1 Surgical (one file, ≤ 50 LOC for non-skill_new; ≤ 80 LOC for skill_new)
-3 Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions)
```
LLM coding sessions reveal repeated friction the moment you stop and look. ADAM looks so you don't have to.
Modifiers applied at scoring time:
- × `dampener` from `adam-score.mjs` (0.5 / 0.75 / 1.0 based on session's `task_completed` count) — sessions that net-succeeded score lower urgency.
`auto_apply_eligible` requires **all** of:
- `confidence ≥ 4`
- `blast_radius == low`
- `type ∈ {memory, skill_new, nudge, reinforcement}` (or `skill_edit` via the win-driven gate)
- `cross_session_evidence == true` (except `nudge`, which is single-session by design)
- `adam-cooldown.mjs` returns `cool` for `(target_skill, proposal_fingerprint)`
- `contradiction_flag` unset
`skill_edit` additionally requires:
- Win-signal evidence (`correction_free_streak` / `clean_recovery` cites target skill)
- Diff is append-only, ≤ 30 LOC, resulting size ≤ 2× original
- No auto-edit to same target in past 7 days (per-fingerprint cooldown)
- No rejection-blacklist on target in past 30 days
- `# Diagnosis` section present + structurally valid
Everything else queues.
## Lifecycle: from signal to permanent improvement
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. The result:
- `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads `applied/` + `rejected/` frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
Auto-applied proposals additionally append to `~/.claude/adam/ab-tracking.jsonl`. The next time `/reflect` runs (and 7+ days have passed), `adam-ab-measure.mjs` computes a pre/post delta of the originating signal count. Status: `improved` / `neutral` / `regressed` / `no_baseline` / `pending`. Regressions surface at the top of the analyst's output so a bad fix doesn't quietly persist.
## Inspecting the analyst's reasoning
Every `/reflect` run also writes the analyst's clustering trace to `~/.claude/adam/last-trace.txt`. The trace records, per cluster: signal type, occurrence count, sessions, which gates passed or failed, and whether the cluster produced a proposal or was skipped (with reason: `threshold` / `cross_session` / `window` / `contradiction` / `other`).
```sh
node ~/.claude/adam/scripts/adam-explain.mjs --mode summary # SUMMARY + per-decision counts
node ~/.claude/adam/scripts/adam-explain.mjs --mode full # verbatim trace + rejection histogram
node ~/.claude/adam/scripts/adam-explain.mjs --mode json # machine-readable
```
Or pass `--explain` to `/reflect` to render the full trace inline.
## What it will not do
- 🚫 No background LLM spend. The analyst runs only when you invoke `/reflect`.
- 🚫 No retroactive transcript mining beyond the journal.
- 🚫 No hard `rm` of any artifact. Deletions are soft (`mv` to `trash/<ts>/`).
- 🚫 No autonomous edits to `CLAUDE.md`, agents, hooks, or `settings.json` — these always queue for review regardless of confidence.
- 🚫 No proposal that matches a previously-rejected idea (≥ 2 token overlap with rejection's `# Why`).
- 🚫 No invented trigger phrases for new skills — every trigger comes from observed user input.
## Layout
```
~/.claude/
├── hooks/
│ ├── adam-observe.mjs # signal collector
│ └── adam-nudge.mjs # SessionStart reminder when ≥3 proposals queued
├── agents/adam.md # analyst subagent (system prompt + rubric)
├── skills/adam-self-improvement/SKILL.md # /reflect protocol
├── commands/reflect.md # /reflect slash command
│ ├── adam-observe.mjs # signal collector (UserPromptSubmit / PreToolUse / PostToolUse)
│ └── adam-nudge.mjs # SessionStart reminder + pending-upgrade warning
├── agents/adam.md # analyst subagent (system prompt + rubric)
├── skills/adam-self-improvement/
│ └── SKILL.md # /reflect protocol
├── commands/reflect.md # /reflect slash command
└── adam/
├── journal.jsonl # append-only signal log (active observations)
├── journal/ # rotated daily logs + actioned-<id>.jsonl per applied/rejected proposal
├── state.json # per-session counters (cursor field is vestigial as of v0.2.0)
├── usage.json # skill/agent invocation tallies + payload visibility counters
├── proposals/ # queued, awaiting review
├── applied/ # approved + auto-applied archive
├── rejected/ # rejected (with reason)
├── trash/ # soft-deleted artifacts (recoverable)
├── scripts/ # adam-archive.mjs (called by skill on apply/reject)
── tests/run-tests.sh # 21 verification tests
├── journal.jsonl # active observations
├── journal/ # rotated weekly (YYYY-Www.jsonl) + actioned-<id>.jsonl
├── state.json # per-session counters
├── usage.json # invocation tallies + visibility metrics
├── active-nudges.json # ephemeral SessionStart reminders (auto-expire)
├── ab-tracking.jsonl # one entry per auto-apply, drives effectiveness measurement
├── reinforcements.jsonl # appended on reinforcement proposal apply
├── last-trace.txt # most recent analyst clustering trace
├── proposals/ # queued, awaiting review
── applied/ # approved + auto-applied archive
├── rejected/ # rejected with reason
├── trash/ # soft-deleted artifacts (recoverable)
├── scripts/
│ ├── adam-utils.mjs # shared journal-reading + frontmatter parsing
│ ├── adam-window.mjs # per-signal sliding-window filter
│ ├── adam-score.mjs # urgency dampener + reinforcement candidates
│ ├── adam-ab-measure.mjs # 7d pre/post delta per auto-applied edit
│ ├── adam-cooldown.mjs # per-(skill, fingerprint) cooldown gate
│ ├── adam-nudge-eligibility.mjs # dead_end session-count check
│ ├── adam-explain.mjs # clustering trace parser/renderer
│ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
│ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
│ └── adam-archive.mjs # post-apply journal cleanup
└── tests/run-tests.sh # 138 isolated tests; never touches live state
```
## Install
## What's new
```sh
./install.sh
```
The script copies files into `~/.claude/`. **It does NOT modify your `settings.json`** — wire the hook entries manually using `settings.json.example` as reference. Merging into existing settings prevents accidental clobber of your other hooks.
After install:
1. Run the test suite: `bash ~/.claude/adam/tests/run-tests.sh` — must show `18 passed, 0 failed`.
2. Add the hook entries from `settings.json.example` to `~/.claude/settings.json` (preserve your existing hooks; ADAM's are additive).
3. Restart Claude Code, or just run `/reflect` to trigger the skill — Claude Code v2.1.0+ auto-hot-reloads user-level skills, no restart needed.
- **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
- **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
- **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
- **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114).
- **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
- **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87).
- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. ISO-week journal rotation replaces 5MB size-based (fixes silent cluster-straddling under-count); per-signal sliding windows via `adam-window.mjs`; error fingerprint normalisation; correction corpus expanded + weak-token co-occurrence requirement (kills the `"actually, I think..."` false positive); mandatory clustering trace + `adam-explain.mjs`; new `nudge` and `reinforcement` proposal types; per-(skill, fingerprint) cooldown via `adam-cooldown.mjs`; `task_completed` scoring (dampener + reinforcement); A/B effectiveness measurement; upgrade UX overhaul (`adam-upgrade.mjs --list/--diff/--accept`); shared `adam-utils.mjs`. 87 tests (up from 30).
- **v0.3.2** — `task_completed` signal: post-task skill capture for downstream reinforcement scoring (consumed in v0.3.3).
- **v0.3.1** — code review pass: bug fixes (`errorFingerprint` no longer false-positives on `is_error: false`, archive script handles same-millisecond duplicates correctly, `tool_window` clears on session change, nudge filters proposal filenames by pattern), prose conciseness cuts, hardened `install.sh` with curl one-liner + settings.json merge, `adam-uninstall.sh`, isolated test harness.
- **v0.3.0** — causal diagnosis: every proposal carries a `# Diagnosis` block (Trigger/Action/Mismatch/Outcome with verbatim transcript quote), plus `contradiction_flag` heuristic that vetoes auto-apply on obviously-conflicting `skill_edit` additions.
- **v0.2.1** — win signals (`correction_free_streak`, `clean_recovery`) feed `skill_edit` auto-apply under a strict gate (≤ 30 LOC, ≤ 2× byte cap, 7d cooldown, 30d blacklist).
- **v0.2.0** — actioned-entry archival via `adam-archive.mjs`; `cursor` field deprecated.
## Requirements
- Claude Code v2.1.0+ (for auto skill hot-reload; older versions need session restart after `skill_new` proposals are applied)
- Node.js 18+ (for the hook; tested on v22)
- Bash (for the test harness)
- **Claude Code v2.1.0+** — for auto skill hot-reload (older versions need a session restart after `skill_new` proposals).
- **Node.js 18+** — tested on v22, used by the hook + helper scripts. Zero npm dependencies.
- **Bash 4+**, `git`, `curl`, `jq` — for the installer + test harness.
## Confidence rubric
### Platform support
```
Sum:
+2 Signal repeated ≥3× across ≥2 sessions
+2 Struggle signal appearing ≥1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥2 distinct struggle types in same session)
-1 Type-bias penalty (≥3 rejections, applied:rejected <1:2)
+1 Blast radius low (memory or new isolated skill)
0 Blast radius medium (new agent, new hook, edit existing skill)
-1 Blast radius high (CLAUDE.md, settings hooks, edit agent, deletion)
+1 Surgical (one file, ≤50 LOC for non-skill_new; ≤80 LOC for skill_new)
-3 Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions)
auto_apply_eligible requires ALL:
confidence ≥ 4
blast_radius == low
type ∈ {memory, skill_new}
cross_session_evidence == true (single-session-only proposals always queue)
```
## Lifecycle: how proposals become permanent
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam/scripts/adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. Effects:
- The `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads applied/ + rejected/ frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
## What it will not do
- No background LLM spend. The analyst runs only when you invoke `/reflect`.
- No retroactive transcript mining beyond the journal cursor.
- No hard `rm` of any artifact. Deletions are soft (`mv` to `trash/<ts>/`).
- No autonomous edits to `CLAUDE.md`, agents, hooks, or `settings.json` — these always queue for review regardless of confidence.
- No proposal that matches a previously-rejected idea (≥2 token overlap with rejection's `# Why`).
- No invented trigger phrases for new skills — every trigger comes from observed user input.
Tested on **macOS** (Darwin / BSD coreutils) and **Linux** (Alpine, glibc + musl). The install / uninstall / test scripts are written to be portable: `stat` uses BSD `-f` with GNU `-c` fallback, `mktemp -d -t prefix.XXXXXX` works on both, no GNU-only flags. CI smoke verified under `alpine:latest`.
## Uninstall
One-shot:
```sh
rm -rf ~/.claude/{hooks/adam-*.mjs,agents/adam.md,skills/adam-self-improvement,commands/reflect.md,adam}
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/adam-uninstall.sh | bash
```
The uninstaller archives `~/.claude/adam/` to `~/.claude/adam.bak.<ts>/` (preserving your journal/proposals data), removes ADAM files, and offers to strip ADAM hook entries from `~/.claude/settings.json` with a diff prompt. Pass `--yes` to skip the prompt; `--purge` to delete the data archive instead of preserving it.
Manual:
```sh
mv ~/.claude/adam ~/.claude/adam.bak.$(date +%s)
rm -f ~/.claude/hooks/adam-*.mjs ~/.claude/agents/adam.md ~/.claude/commands/reflect.md
rm -rf ~/.claude/skills/adam-self-improvement
```
Then remove the four `adam-*` hook entries from `~/.claude/settings.json`.
## Contributing
Issues and PRs welcome — especially additional signal types, transcript-aware diagnosis improvements, and platform fixes. Run the test suite before opening a PR:
```sh
bash ~/.claude/adam/tests/run-tests.sh
```
## License
[MIT](LICENSE) — © 2026 Lukasz Raczylo
---
<div align="center">
<sub>Named after my son Adam, who taught me that observation is the start of every interesting thing.</sub>
</div>
+88
View File
@@ -0,0 +1,88 @@
#!/usr/bin/env bash
# ADAM uninstaller — reverses install.sh.
# Soft-archives ~/.claude/adam/ (your journal/proposals are preserved by default).
# Removes hook entries from settings.json with a diff prompt.
#
# Usage: ./adam-uninstall.sh [--yes] [--purge]
# --purge: also delete ~/.claude/adam/ data (destructive)
set -euo pipefail
DEST="${HOME}/.claude"
ASSUME_YES=0
PURGE=0
BAK=""
for arg in "$@"; do
case "$arg" in
--yes|-y) ASSUME_YES=1 ;;
--purge) PURGE=1 ;;
--help|-h) sed -n '2,8p' "$0"; exit 0 ;;
*) echo "unknown: $arg" >&2; exit 1 ;;
esac
done
log() { printf ' %s\n' "$*"; }
need() { command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; exit 1; }; }
need jq
[ -d "$DEST" ] || { echo "$DEST not found"; exit 1; }
log "removing ADAM files"
rm -f "$DEST/hooks/adam-observe.mjs" "$DEST/hooks/adam-nudge.mjs"
rm -f "$DEST/agents/adam.md" "$DEST/commands/reflect.md"
rm -rf "$DEST/skills/adam-self-improvement"
if [ -d "$DEST/adam" ]; then
if [ "$PURGE" = 1 ]; then
log "purging $DEST/adam (--purge)"
rm -rf "$DEST/adam"
else
BAK="$DEST/adam.bak.$(date +%s)"
log "archiving $DEST/adam -> $BAK"
mv "$DEST/adam" "$BAK"
fi
fi
# settings.json — strip ADAM hook entries
SETTINGS="$DEST/settings.json"
if [ -f "$SETTINGS" ]; then
TMP="$(mktemp -t adam-uninstall.XXXXXX)"
jq '
.hooks //= {}
| .hooks |= with_entries(
.value |= (
map(.hooks |= map(select(
(.command // "") | test("adam-(observe|nudge)\\.mjs") | not
)))
| map(select((.hooks // []) | length > 0))
)
)
| .hooks |= with_entries(select((.value | length) > 0))
' "$SETTINGS" > "$TMP"
if cmp -s "$SETTINGS" "$TMP"; then
log "settings.json already clean"
rm -f "$TMP"
else
log ""
log "settings.json changes:"
diff -u "$SETTINGS" "$TMP" | sed 's/^/ /' || true
log ""
if [ "$ASSUME_YES" = 1 ]; then REPLY=y
else printf ' apply? [y/N] '; read -r REPLY </dev/tty || REPLY=n
fi
case "$REPLY" in
y|Y|yes|YES)
cp "$SETTINGS" "$SETTINGS.adam-bak.$(date +%s)"
mv "$TMP" "$SETTINGS"
log " settings.json cleaned"
;;
*) rm -f "$TMP"; log " skipped — edit settings.json manually" ;;
esac
fi
fi
log ""
log "ADAM uninstalled."
[ "$PURGE" = 0 ] && [ -n "$BAK" ] && [ -d "$BAK" ] && log "data archive: $BAK"
+227
View File
@@ -0,0 +1,227 @@
#!/usr/bin/env node
// adam-ab-measure.mjs — A/B effectiveness measurement on auto-applied edits.
//
// Reads ~/.claude/adam/ab-tracking.jsonl (one line per auto-apply event,
// written by adam-self-improvement/SKILL.md), then for each entry old enough
// (>= --min-age-days; default 7) compares the originating signal in the 7-day
// window BEFORE applied_at against the 7-day window AFTER applied_at across the
// full journal corpus (active + rotated). Surfaces regressions so /reflect
// can flag proposals that made things worse.
//
// Volume normalization: when the windows contain other (non-originating)
// activity, the delta is computed on the signal's SHARE of total activity
// (rate = count / total), not its raw count — so a generally busier journal
// after apply does not masquerade as a regression. When the signal is the only
// activity in the windows, it falls back to the raw-count delta. Output carries
// both `delta_pct` (drives status) and `raw_delta_pct` + `normalized` for
// transparency.
//
// CLI:
// adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]
//
// Output (default `table`): aligned columns sorted regressed-first.
// Output (`json`): array of deltas.
// Empty / missing tracking file → empty output, exit 0.
// Exit 1 only on I/O failure.
import { join } from "node:path";
import { homedir } from "node:os";
import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
const DAY_MS = 86400000;
export const DEFAULT_PRE_WINDOW_DAYS = 7;
export const DEFAULT_MIN_AGE_DAYS = 7;
const REGRESSED_PCT = 25;
const IMPROVED_PCT = -25;
function parseArgs(argv) {
const args = { home: null, format: "table", minAgeDays: DEFAULT_MIN_AGE_DAYS, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--format" && i + 1 < argv.length) args.format = argv[++i];
else if (a === "--min-age-days" && i + 1 < argv.length) {
const n = Number(argv[++i]);
if (!Number.isNaN(n) && n >= 0) args.minAgeDays = n;
}
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
function loadJournalAll(claudeHome) {
const adamRoot = join(claudeHome, "adam");
const sources = [join(adamRoot, "journal.jsonl"), ...listJsonlFiles(join(adamRoot, "journal"))];
const all = [];
for (const p of sources) for (const e of readJsonlSafe(p)) all.push(e);
return all;
}
function tsMs(e) {
if (!e || typeof e.ts !== "string") return NaN;
return Date.parse(e.ts);
}
// computeDeltas: pure function — entries = ab-tracking objects, journal = list
// of journal entries (any source). opts.now is unix ms; opts.minAgeDays is the
// floor for non-pending.
export function computeDeltas(entries, journal, opts = {}) {
const now = typeof opts.now === "number" ? opts.now : Date.now();
const minAgeDays = typeof opts.minAgeDays === "number" ? opts.minAgeDays : DEFAULT_MIN_AGE_DAYS;
const out = [];
for (const e of entries || []) {
if (!e || typeof e !== "object") continue;
const appliedAt = Number(e.applied_at);
if (!appliedAt || Number.isNaN(appliedAt)) continue;
const ageDays = (now - appliedAt) / DAY_MS;
// Symmetric window: same span applied to pre AND post sides. JSONL schema
// field stays `pre_window_days` for backward compat with existing
// ab-tracking.jsonl entries — local name reflects symmetry.
const windowDays = typeof e.pre_window_days === "number" ? e.pre_window_days : DEFAULT_PRE_WINDOW_DAYS;
const signals = Array.isArray(e.originating_signals)
? e.originating_signals.map((s) => (s && typeof s === "object" ? s.type : null)).filter(Boolean)
: [];
const sigSet = new Set(signals);
const base = {
proposal_id: e.proposal_id || "",
proposal_type: e.proposal_type || "",
target_skill: e.target_skill || "",
applied_at: appliedAt,
applied_at_iso: new Date(appliedAt).toISOString(),
signal_types: [...sigSet],
};
if (ageDays < minAgeDays) {
out.push({ ...base, pre_count: null, post_count: null, delta_pct: null, status: "pending" });
continue;
}
const preStart = appliedAt - windowDays * DAY_MS;
const postEnd = appliedAt + windowDays * DAY_MS;
// preCount/postCount = originating-signal occurrences; preTotal/postTotal =
// ALL journal entries in the window (the activity denominator).
let preCount = 0;
let postCount = 0;
let preTotal = 0;
let postTotal = 0;
for (const je of journal || []) {
if (!je || typeof je !== "object") continue;
const t = tsMs(je);
if (Number.isNaN(t)) continue;
const inPre = t >= preStart && t < appliedAt;
const inPost = t >= appliedAt && t < postEnd;
if (!inPre && !inPost) continue;
if (inPre) preTotal++; else postTotal++;
if (!sigSet.has(je.type)) continue;
if (inPre) preCount++; else postCount++;
}
let status;
let deltaPct;
let rawDeltaPct = null;
let normalized = false;
if (preCount === 0) {
status = "no_baseline";
deltaPct = null;
} else {
rawDeltaPct = Math.round(((postCount - preCount) / preCount) * 10000) / 100;
// Volume normalization: when the windows contain non-originating activity,
// compare the signal's SHARE of activity (rate), not its absolute count —
// otherwise a generally busier post-window masquerades as a regression.
// No background (signal IS the only activity) → fall back to raw delta,
// preserving prior behavior.
const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
if (hasBackground && postTotal > 0) {
const preRate = preCount / preTotal; // preTotal >= preCount > 0
const postRate = postCount / postTotal;
deltaPct = ((postRate - preRate) / preRate) * 100;
normalized = true;
} else {
deltaPct = ((postCount - preCount) / preCount) * 100;
}
// Round to 2 dp for stable comparison + presentation.
deltaPct = Math.round(deltaPct * 100) / 100;
if (deltaPct <= IMPROVED_PCT) status = "improved";
else if (deltaPct >= REGRESSED_PCT) status = "regressed";
else status = "neutral";
}
out.push({
...base,
pre_count: preCount, post_count: postCount,
pre_total: preTotal, post_total: postTotal,
raw_delta_pct: rawDeltaPct, normalized,
delta_pct: deltaPct, status,
});
}
return out;
}
const STATUS_ORDER = { regressed: 0, neutral: 1, no_baseline: 2, improved: 3, pending: 4 };
function sortForTable(deltas) {
return [...deltas].sort((a, b) => {
const sa = STATUS_ORDER[a.status] ?? 99;
const sb = STATUS_ORDER[b.status] ?? 99;
if (sa !== sb) return sa - sb;
return a.applied_at - b.applied_at;
});
}
function padRight(s, n) { s = String(s); return s.length >= n ? s : s + " ".repeat(n - s.length); }
export function formatTable(deltas) {
if (!deltas || !deltas.length) return "";
const rows = sortForTable(deltas);
const headers = ["proposal_id", "target", "type", "applied_at(iso)", "pre/post", "delta%", "status"];
const data = rows.map((d) => [
d.proposal_id || "-",
d.target_skill || "-",
d.proposal_type || "-",
d.applied_at_iso || "-",
d.pre_count == null ? "-" : `${d.pre_count}/${d.post_count}`,
d.delta_pct == null ? "-" : `${d.delta_pct.toFixed(2)}`,
d.status || "-",
]);
const widths = headers.map((h, i) => Math.max(h.length, ...data.map((r) => String(r[i]).length)));
const lines = [];
lines.push(headers.map((h, i) => padRight(h, widths[i])).join(" | "));
lines.push(widths.map((w) => "-".repeat(w)).join("-+-"));
for (const r of data) lines.push(r.map((c, i) => padRight(c, widths[i])).join(" | "));
return lines.join("\n");
}
export function formatJson(deltas) {
return JSON.stringify(deltas || []);
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write("usage: adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]\n");
process.exit(0);
}
const claudeHome = args.home || join(homedir(), ".claude");
const trackingPath = join(claudeHome, "adam", "ab-tracking.jsonl");
try {
const entries = readJsonlSafe(trackingPath);
if (!entries.length) {
if (args.format === "json") process.stdout.write("[]\n");
// table mode prints nothing on empty input — exit 0.
process.exit(0);
}
const journal = loadJournalAll(claudeHome);
const deltas = computeDeltas(entries, journal, { minAgeDays: args.minAgeDays });
const out = args.format === "json" ? formatJson(deltas) : formatTable(deltas);
if (out) process.stdout.write(out + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-ab-measure error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
+91
View File
@@ -0,0 +1,91 @@
#!/usr/bin/env node
// adam-apply-reinforcement.mjs — apply-path for `reinforcement` proposals.
//
// Reads a proposal markdown file, validates the apply gate
// (confidence >= 4 AND blast_radius == "low" AND type == "reinforcement"),
// and on success appends one JSON line to ~/.claude/adam/reinforcements.jsonl
// of shape `{ts, skill_slug, count, source_session}`.
//
// CLI: adam-apply-reinforcement.mjs <proposal-path> [--home <path>]
// Output: JSON one-liner on stdout: {"status":"applied"|"gated", "reason":"..."}
// Exit: 0 on apply, 0 on gated, 1 on I/O or parse error.
//
// SKILL.md invokes this in the auto-apply path when the proposal type is
// `reinforcement`. No code/memory/skill modifications.
import { readFileSync, appendFileSync, existsSync, mkdirSync } from "node:fs";
import { join, dirname } from "node:path";
import { homedir } from "node:os";
import { parseFrontmatter } from "./adam-utils.mjs";
// Re-exported for backward compat — callers historically imported it from here.
export { parseFrontmatter };
function parseArgs(argv) {
const args = { home: null, path: null, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--help" || a === "-h") args.help = true;
else if (!args.path && !a.startsWith("--")) args.path = a;
}
return args;
}
export function checkGate(fm) {
if ((fm.type || "") !== "reinforcement") {
return { ok: false, reason: `type != reinforcement (got: ${fm.type || "<none>"})` };
}
const conf = Number(fm.confidence);
if (Number.isNaN(conf) || conf < 4) {
return { ok: false, reason: `confidence < 4 (got: ${fm.confidence ?? "<none>"})` };
}
if ((fm.blast_radius || "").toLowerCase() !== "low") {
return { ok: false, reason: `blast_radius != low (got: ${fm.blast_radius || "<none>"})` };
}
if (!fm.skill_slug) {
return { ok: false, reason: "skill_slug missing in frontmatter" };
}
return { ok: true };
}
export function buildEntry(fm, now = Date.now()) {
return {
ts: now,
skill_slug: String(fm.skill_slug),
count: Number(fm.count) || 0,
source_session: fm.source_session || "",
};
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help || !args.path) {
process.stdout.write("usage: adam-apply-reinforcement.mjs <proposal-path> [--home <path>]\n");
process.exit(args.help ? 0 : 1);
}
const claudeHome = args.home || join(homedir(), ".claude");
const outPath = join(claudeHome, "adam", "reinforcements.jsonl");
try {
const content = readFileSync(args.path, "utf8");
const fm = parseFrontmatter(content);
const gate = checkGate(fm);
if (!gate.ok) {
process.stdout.write(JSON.stringify({ status: "gated", reason: gate.reason }) + "\n");
process.exit(0);
}
const entry = buildEntry(fm);
const dir = dirname(outPath);
if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
appendFileSync(outPath, JSON.stringify(entry) + "\n");
process.stdout.write(JSON.stringify({ status: "applied", path: outPath }) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-apply-reinforcement error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
+8 -40
View File
@@ -8,49 +8,12 @@
import { readFileSync, writeFileSync, appendFileSync, mkdirSync, existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { parseFrontmatter } from "./adam-utils.mjs";
const ROOT = join(homedir(), ".claude", "adam");
const JOURNAL = join(ROOT, "journal.jsonl");
const JOURNAL_DIR = join(ROOT, "journal");
function parseFrontmatter(content) {
const m = content.match(/^---\n([\s\S]*?)\n---/);
if (!m) return {};
const fm = {};
const lines = m[1].split("\n");
let i = 0;
while (i < lines.length) {
const line = lines[i];
const idx = line.indexOf(":");
if (idx === -1) { i++; continue; }
const key = line.slice(0, idx).trim();
const value = line.slice(idx + 1).trim();
if (key === "source_entries") {
const arr = [];
if (value.startsWith("[") && value.endsWith("]")) {
const inner = value.slice(1, -1)
.split(",")
.map(s => s.trim().replace(/^['"]|['"]$/g, ""));
arr.push(...inner.filter(Boolean));
fm[key] = arr;
i++;
continue;
}
i++;
while (i < lines.length && /^\s*-\s+/.test(lines[i])) {
const item = lines[i].replace(/^\s*-\s+/, "").trim().replace(/^['"]|['"]$/g, "");
if (item) arr.push(item);
i++;
}
fm[key] = arr;
continue;
}
fm[key] = value;
i++;
}
return fm;
}
function main() {
const proposalPath = process.argv[2];
if (!proposalPath) {
@@ -81,15 +44,20 @@ function main() {
}
const lines = readFileSync(JOURNAL, "utf8").split("\n").filter(Boolean);
const tsSet = new Set(sourceEntries);
// tsCounts: how many entries with this ts the proposal claims as its own.
// Same-millisecond duplicates: only consume up to the recorded count.
const tsCounts = new Map();
for (const ts of sourceEntries) tsCounts.set(ts, (tsCounts.get(ts) || 0) + 1);
const matched = [];
const remaining = [];
for (const line of lines) {
try {
const e = JSON.parse(line);
if (e.ts && tsSet.has(e.ts)) {
const remainingCount = e.ts ? (tsCounts.get(e.ts) || 0) : 0;
if (remainingCount > 0) {
matched.push(line);
tsCounts.set(e.ts, remainingCount - 1);
} else {
remaining.push(line);
}
+186
View File
@@ -0,0 +1,186 @@
#!/usr/bin/env node
// adam-batch.mjs — pre-clusters windowed journal entries into coherent failure
// batches before analyst dispatch. Implements MOSS §3.1: "anchored to an
// automatically curated batch of production-failure evidence."
//
// Each batch groups entries by (signal_type, cluster_key) where cluster_key
// follows the same clustering rules as agents/adam.md ## Signal types / ## Process step 4:
// correction → tokenized phrase (cross-cwd)
// retry_loop → tool
// weak_agent → subagent_type
// tool_error_loop→ fp
// dead_end → session
// edit_churn → file basename
// file_reread → file basename
// build_loop → session
// subagent_dispatch_pattern → subagent_type
// silent_drift → active_skills[0]
// error_after_recovery → (recovered_from, original_fp)
// correction_free_streak → active_skills[0]
// clean_recovery → (recovered_from, active_skills[0])
// task_completed → sorted tool_kinds tuple
//
// CLI:
// adam-batch.mjs [--input <jsonl-path>] [--min-entries N] [--min-sessions N]
//
// Output: JSON object with `batches` array and `unbatched` count.
import { readFileSync } from "node:fs";
import { readJsonlSafe } from "./adam-utils.mjs";
const DEFAULT_MIN_ENTRIES = 1;
const DEFAULT_MIN_SESSIONS = 1;
const CORRECTION_STOPWORDS = new Set([
"the", "a", "an", "and", "or", "but", "of", "to", "for", "in", "on",
"with", "use", "when", "where", "what", "why", "how", "this", "that",
"these", "those", "is", "are", "was", "were", "be", "been", "being",
"do", "does", "did", "doing", "has", "have", "had", "your", "you",
"i", "it", "as", "at", "by", "from", "not", "no",
]);
function tokenizePhrase(phrase) {
if (!phrase || typeof phrase !== "string") return "";
return phrase.toLowerCase()
.split(/\s+/)
.map(t => t.replace(/^[^\w']+|[^\w']+$/g, ""))
.filter(t => t && !CORRECTION_STOPWORDS.has(t))
.sort()
.join("|");
}
function clusterKey(entry) {
if (!entry || typeof entry !== "object") return null;
const t = entry.type;
switch (t) {
case "correction":
return tokenizePhrase(entry.phrase) || "unknown";
case "retry_loop":
return entry.tool || "unknown";
case "weak_agent":
case "subagent_dispatch_pattern":
return entry.subagent_type || "unknown";
case "tool_error_loop":
return entry.fp || "unknown";
case "dead_end":
case "build_loop":
return entry.session || "unknown";
case "edit_churn":
case "file_reread":
return entry.file ? entry.file.split("/").pop() : "unknown";
case "silent_drift":
case "correction_free_streak":
return Array.isArray(entry.active_skills) ? (entry.active_skills[0] || "") : "";
case "error_after_recovery":
return `${entry.recovered_from || "?"}:${entry.original_fp || "?"}`;
case "clean_recovery":
return `${entry.recovered_from || "?"}:${Array.isArray(entry.active_skills) ? (entry.active_skills[0] || "") : ""}`;
case "task_completed":
return Array.isArray(entry.tool_kinds) ? entry.tool_kinds.slice().sort().join(",") : "unknown";
default:
return entry.session || "unknown";
}
}
function parseArgs(argv) {
const args = { input: null, minEntries: DEFAULT_MIN_ENTRIES, minSessions: DEFAULT_MIN_SESSIONS, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
else if (a === "--min-entries" && i + 1 < argv.length) {
const n = Number(argv[++i]);
if (!Number.isNaN(n) && n > 0) args.minEntries = n;
}
else if (a === "--min-sessions" && i + 1 < argv.length) {
const n = Number(argv[++i]);
if (!Number.isNaN(n) && n > 0) args.minSessions = n;
}
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
export function buildBatches(entries, opts = {}) {
const minEntries = opts.minEntries || DEFAULT_MIN_ENTRIES;
const minSessions = opts.minSessions || DEFAULT_MIN_SESSIONS;
const map = new Map();
for (const e of entries || []) {
if (!e || typeof e !== "object" || !e.type) continue;
const key = `${e.type}::${clusterKey(e)}`;
if (!map.has(key)) {
map.set(key, {
batch_id: null,
signal_type: e.type,
cluster_key: clusterKey(e),
entries: [],
sessions: new Set(),
cwds: new Set(),
});
}
const batch = map.get(key);
batch.entries.push(e);
if (e.session) batch.sessions.add(e.session);
if (e.cwd) batch.cwds.add(e.cwd);
}
const batches = [];
let unbatched = 0;
let id = 1;
for (const [, batch] of map) {
if (batch.entries.length < minEntries || batch.sessions.size < minSessions) {
unbatched += batch.entries.length;
continue;
}
batch.batch_id = `b${id++}`;
batches.push({
batch_id: batch.batch_id,
signal_type: batch.signal_type,
cluster_key: batch.cluster_key,
entry_count: batch.entries.length,
session_count: batch.sessions.size,
cwd_count: batch.cwds.size,
has_context_window: batch.entries.some(e => Array.isArray(e.context_window) && e.context_window.length > 0),
entries: batch.entries,
});
}
batches.sort((a, b) => b.entry_count - a.entry_count);
return { batches, unbatched, total: (entries || []).length };
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write("usage: adam-batch.mjs [--input <jsonl-path>] [--min-entries N] [--min-sessions N]\n");
process.exit(0);
}
try {
let entries;
if (args.input) {
entries = readJsonlSafe(args.input);
} else if (!process.stdin.isTTY) {
const buf = readFileSync(0, "utf8");
entries = [];
for (const line of buf.split("\n")) {
if (!line) continue;
try { entries.push(JSON.parse(line)); } catch { /* skip */ }
}
} else {
process.stderr.write("adam-batch: no input (use --input or pipe)\n");
process.exit(1);
}
const result = buildBatches(entries, { minEntries: args.minEntries, minSessions: args.minSessions });
process.stdout.write(JSON.stringify(result) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-batch error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
export { clusterKey, tokenizePhrase };
+221
View File
@@ -0,0 +1,221 @@
#!/usr/bin/env node
// adam-cooldown.mjs — per-(skill, proposal_fingerprint) cooldown / blacklist
// gate. Replaces the previous coarse per-skill cooldown.
//
// CLI:
// adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]
// adam-cooldown.mjs --compute --skill <slug> --cluster <id> [--diff-file <path>]
// → prints {"fingerprint":"<djb2_base36>"}; diff body read from --diff-file
// or stdin. This is how proposal_fingerprint is populated (the analyst
// runs it via Bash after drafting a proposal).
//
// Output (gate mode): JSON one-liner with shape
// { "status": "cool"|"cooldown"|"blacklisted",
// "reason": "<human-readable reason>",
// "blocked_by": { "file": "<basename>", "days_remaining": <int> } | null }
//
// Rules:
// - applied/*.md with target_skill == <skill> AND
// (proposal_fingerprint == <fingerprint> OR missing/legacy)
// within 7 days of `applied_at` → "cooldown"
// - rejected/*.md with same skill match AND
// auto_apply_blacklist: true within 30 days of applied_at → "blacklisted"
// - else "cool"
//
// Backward compat: proposals without `proposal_fingerprint` field are treated
// as fingerprint == "legacy" so historical applied/rejected records still
// produce coarse-grained gating until they age out of their windows.
import { readFileSync, readdirSync, existsSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { parseFrontmatter } from "./adam-utils.mjs";
export const COOLDOWN_DAYS = 7;
export const BLACKLIST_DAYS = 30;
const DAY_MS = 86400000;
export const LEGACY_FINGERPRINT = "legacy";
function parseArgs(argv) {
const args = { home: null, skill: null, fingerprint: null, compute: false, cluster: null, diffFile: null, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--skill" && i + 1 < argv.length) args.skill = argv[++i];
else if (a === "--fingerprint" && i + 1 < argv.length) args.fingerprint = argv[++i];
else if (a === "--cluster" && i + 1 < argv.length) args.cluster = argv[++i];
else if (a === "--diff-file" && i + 1 < argv.length) args.diffFile = argv[++i];
else if (a === "--compute") args.compute = true;
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
// Pull applied_at as epoch ms. Accept ms-number, ISO string, or fall back to
// the file's mtime so we never crash on legacy records.
function frontmatterTimestampMs(fm, filePath) {
const raw = fm.applied_at;
if (raw) {
const asNum = Number(raw);
if (!Number.isNaN(asNum) && asNum > 0) return asNum;
const asIso = Date.parse(raw);
if (!Number.isNaN(asIso)) return asIso;
}
try { return statSync(filePath).mtimeMs; } catch { return 0; }
}
function fingerprintMatches(recordFp, queryFp) {
// Missing / empty field on legacy records → coarse fallback: any fingerprint
// query matches (so the historical applied/rejected record still gates).
if (!recordFp || recordFp === LEGACY_FINGERPRINT) return true;
return recordFp === queryFp;
}
// Resolve a frontmatter record to its skill slug. Modern records use
// `target_skill`; legacy v0.2.x records used `target` with a full path
// (e.g. `skills/foo/SKILL.md`). Falls back through both before giving up.
function resolveSkill(fm) {
if (fm.target_skill) return fm.target_skill;
if (fm.skill) return fm.skill;
if (fm.target) {
const base = fm.target.split("/").filter(Boolean);
// skills/<slug>/SKILL.md → <slug>; <slug>.md → <slug>; else last segment
if (base.length >= 2 && base[base.length - 1] === "SKILL.md") {
return base[base.length - 2];
}
return base[base.length - 1].replace(/\.md$/, "");
}
return "";
}
function scanDir(dir, predicate) {
if (!existsSync(dir)) return [];
let names;
try { names = readdirSync(dir); } catch { return []; }
const out = [];
for (const name of names) {
if (!name.endsWith(".md")) continue;
const p = join(dir, name);
let content;
try { content = readFileSync(p, "utf8"); } catch { continue; }
const fm = parseFrontmatter(content);
const hit = predicate(fm, p, name);
if (hit) out.push(hit);
}
return out;
}
export function checkCooldown(home, skill, fingerprint, now = Date.now()) {
const adamRoot = join(home, "adam");
const appliedDir = join(adamRoot, "applied");
const rejectedDir = join(adamRoot, "rejected");
// Applied → cooldown
const appliedHits = scanDir(appliedDir, (fm, p, name) => {
if (resolveSkill(fm) !== skill) return null;
if (!fingerprintMatches(fm.proposal_fingerprint, fingerprint)) return null;
const tsMs = frontmatterTimestampMs(fm, p);
if (!tsMs) return null;
const ageDays = (now - tsMs) / DAY_MS;
if (ageDays > COOLDOWN_DAYS) return null;
return { name, daysRemaining: Math.max(0, Math.ceil(COOLDOWN_DAYS - ageDays)) };
});
// Rejected → blacklisted (requires auto_apply_blacklist: true)
const blacklistHits = scanDir(rejectedDir, (fm, p, name) => {
if (resolveSkill(fm) !== skill) return null;
if (!fingerprintMatches(fm.proposal_fingerprint, fingerprint)) return null;
const flag = (fm.auto_apply_blacklist || "").toLowerCase();
if (flag !== "true") return null;
const tsMs = frontmatterTimestampMs(fm, p);
if (!tsMs) return null;
const ageDays = (now - tsMs) / DAY_MS;
if (ageDays > BLACKLIST_DAYS) return null;
return { name, daysRemaining: Math.max(0, Math.ceil(BLACKLIST_DAYS - ageDays)) };
});
if (blacklistHits.length) {
const h = blacklistHits[0];
return {
status: "blacklisted",
reason: `auto_apply_blacklist active on rejected/${h.name}`,
blocked_by: { file: h.name, days_remaining: h.daysRemaining },
};
}
if (appliedHits.length) {
const h = appliedHits[0];
return {
status: "cooldown",
reason: `applied within ${COOLDOWN_DAYS}d (applied/${h.name})`,
blocked_by: { file: h.name, days_remaining: h.daysRemaining },
};
}
return { status: "cool", reason: "no recent applied/rejected match", blocked_by: null };
}
// djb2 hash returned as base36 — deterministic, no deps.
function djb2(s) {
let h = 5381;
for (let i = 0; i < s.length; i++) {
h = (((h << 5) + h) ^ s.charCodeAt(i)) >>> 0; // xor variant, force u32
}
return h.toString(36);
}
export function computeProposalFingerprint(proposal) {
if (!proposal || typeof proposal !== "object") return LEGACY_FINGERPRINT;
const skill = proposal.skill_slug || proposal.target_skill || proposal.skill || "";
const cluster = proposal.signal_cluster_id || proposal.cluster_id || "";
// normalized_diff_body: whitespace (incl. newlines) collapsed to single
// spaces, then trimmed. Matches agents/adam.md §"Per-(skill, fingerprint)
// cooldown". (No trailing-newline strip needed — \s+ already absorbed them.)
const diff = String(proposal.diff_body || proposal.proposed_change || "")
.replace(/\s+/g, " ")
.trim();
return djb2(`${skill}\n${cluster}\n${diff}`);
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write(
"usage: adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]\n" +
" adam-cooldown.mjs --compute --skill <slug> --cluster <id> [--diff-file <path>]\n"
);
process.exit(0);
}
// --compute: deterministically derive a proposal_fingerprint. The analyst
// invokes this (it has Bash) after drafting a proposal, then writes the
// result into proposal frontmatter so the cooldown gate keys on it.
if (args.compute) {
let diff = "";
if (args.diffFile) {
try { diff = readFileSync(args.diffFile, "utf8"); } catch { /* empty → still deterministic */ }
} else {
try { diff = readFileSync(0, "utf8"); } catch { /* no stdin */ }
}
const fp = computeProposalFingerprint({
skill_slug: args.skill || "",
signal_cluster_id: args.cluster || "",
diff_body: diff,
});
process.stdout.write(JSON.stringify({ fingerprint: fp }) + "\n");
process.exit(0);
}
if (!args.skill || !args.fingerprint) {
process.stderr.write("adam-cooldown: --skill and --fingerprint required\n");
process.exit(1);
}
const home = args.home || join(homedir(), ".claude");
try {
const result = checkCooldown(home, args.skill, args.fingerprint);
process.stdout.write(JSON.stringify(result) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-cooldown error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
+239
View File
@@ -0,0 +1,239 @@
#!/usr/bin/env node
// adam-explain.mjs — render the analyst's clustering trace in summary / full / json modes.
//
// The adam agent ALWAYS emits a fenced ```trace block after its proposals. The
// adam-self-improvement skill persists the most recent trace to
// ~/.claude/adam/last-trace.txt. This tool parses and presents it.
//
// Trace line grammar (per agents/adam.md "Clustering trace (always emit)"):
// <cluster_id> | signal=<type> count=<N> sessions=<M> | gates: threshold=<pass|fail:<reason>>, cross_session=<pass|fail>, window=<in:<N>/out:<M>>, contradiction=<none|vetoed:[[memory-name]]> | decision: <proposal_emitted:<type>|skipped:<reason>>
// Trailing summary line:
// SUMMARY: considered=<N> emitted=<M> skipped=<N-M> reasons={threshold:X, contradiction:Y, window:Z, other:W}
//
// Usage: adam-explain.mjs [--input <path>] [--mode summary|full|json] [--home <path>]
import { readFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
function parseArgs(argv) {
const args = { input: null, mode: "summary", home: null };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
else if (a === "--mode" && i + 1 < argv.length) args.mode = argv[++i];
else if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
}
return args;
}
// Extract the inside of a ```trace ... ``` fenced block, or return the input
// verbatim if no fence is present (tolerant mode).
function extractFenced(text) {
const m = text.match(/```trace\s*\n([\s\S]*?)\n```/);
return m ? m[1] : text;
}
// Parse a single cluster line. Returns null when no recognisable structure.
function parseClusterLine(line) {
// Three pipe-separated chunks: id | signal=… count=… sessions=… | gates: … | decision: …
const parts = line.split("|").map((s) => s.trim());
if (parts.length < 4) return null;
const id = parts[0];
if (!id) return null;
const sigChunk = parts[1];
const gatesChunk = parts[2];
const decisionChunk = parts.slice(3).join("|").trim();
const sigM = sigChunk.match(/signal=(\S+)\s+count=(\d+)\s+sessions=(\d+)/);
if (!sigM) return null;
const signal = sigM[1];
const count = Number(sigM[2]);
const sessions = Number(sigM[3]);
if (!gatesChunk.startsWith("gates:")) return null;
const gatesBody = gatesChunk.slice("gates:".length).trim();
const gates = {};
// gates body is a comma-separated list of key=value with possible commas inside values
// we accept simple key=token,key=token,… — split on commas not inside [[ ]]
const tokens = [];
let depth = 0;
let buf = "";
for (const ch of gatesBody) {
if (ch === "[") depth++;
else if (ch === "]") depth--;
if (ch === "," && depth === 0) { tokens.push(buf.trim()); buf = ""; }
else buf += ch;
}
if (buf.trim()) tokens.push(buf.trim());
for (const t of tokens) {
const idx = t.indexOf("=");
if (idx === -1) continue;
gates[t.slice(0, idx).trim()] = t.slice(idx + 1).trim();
}
if (!decisionChunk.startsWith("decision:")) return null;
const decision = decisionChunk.slice("decision:".length).trim();
return { id, signal, count, sessions, gates, decision };
}
function parseSummaryLine(line) {
// SUMMARY: considered=N emitted=M skipped=K [regressions=R] reasons={...}
// `regressions` is optional (added in #5 — A/B measurement). Pre-existing
// traces lacking the token still parse.
const m = line.match(
/^SUMMARY:\s*considered=(\d+)\s+emitted=(\d+)\s+skipped=(\d+)(?:\s+regressions=(\d+))?\s+reasons=\{([^}]*)\}\s*$/
);
if (!m) return null;
const reasons = {};
for (const piece of m[5].split(",")) {
const kv = piece.trim();
if (!kv) continue;
const idx = kv.indexOf(":");
if (idx === -1) continue;
reasons[kv.slice(0, idx).trim()] = Number(kv.slice(idx + 1).trim()) || 0;
}
return {
considered: Number(m[1]),
emitted: Number(m[2]),
skipped: Number(m[3]),
regressions: m[4] != null ? Number(m[4]) : 0,
reasons,
};
}
export function parseTrace(text) {
const body = extractFenced(text || "");
const lines = body.split("\n").map((l) => l.replace(/\s+$/, "")).filter((l) => l.trim().length);
const clusters = [];
let summary = null;
const warnings = [];
for (const line of lines) {
if (line.startsWith("SUMMARY:")) {
const s = parseSummaryLine(line);
if (s) summary = s;
else warnings.push(`malformed summary line: ${line}`);
continue;
}
const c = parseClusterLine(line);
if (c) clusters.push(c);
else warnings.push(`malformed cluster line: ${line}`);
}
// Synthesize a summary from clusters when none provided AND clusters parsed.
if (!summary && clusters.length) {
const reasons = { threshold: 0, contradiction: 0, window: 0, other: 0 };
let emitted = 0;
for (const c of clusters) {
if (c.decision.startsWith("proposal_emitted")) emitted++;
else {
const r = (c.decision.match(/^skipped:(\S+)/) || [])[1] || "other";
reasons[r in reasons ? r : "other"]++;
}
}
summary = {
considered: clusters.length,
emitted,
skipped: clusters.length - emitted,
regressions: 0,
reasons,
};
}
return { clusters, summary, warnings };
}
function countByDecision(clusters) {
const counts = {};
for (const c of clusters) {
const key = c.decision.startsWith("proposal_emitted")
? "proposal_emitted"
: (c.decision.match(/^skipped:(\S+)/)?.[1] || "other");
counts[key] = (counts[key] || 0) + 1;
}
return counts;
}
export function formatSummary(parsed) {
const s = parsed.summary;
if (!s) return "no trace data";
const reasonStr = Object.entries(s.reasons)
.map(([k, v]) => `${k}:${v}`)
.join(", ");
const counts = countByDecision(parsed.clusters);
const breakdown = Object.entries(counts)
.map(([k, v]) => `${k}=${v}`)
.join(" ");
const head = `considered=${s.considered} emitted=${s.emitted} skipped=${s.skipped} reasons={${reasonStr}}`;
return breakdown ? `${head}\nclusters by decision: ${breakdown}` : head;
}
export function formatFull(parsed) {
const lines = [];
for (const c of parsed.clusters) {
const gatesStr = Object.entries(c.gates).map(([k, v]) => `${k}=${v}`).join(", ");
lines.push(`${c.id} | signal=${c.signal} count=${c.count} sessions=${c.sessions} | gates: ${gatesStr} | decision: ${c.decision}`);
}
if (parsed.summary) {
const s = parsed.summary;
const reasonStr = Object.entries(s.reasons).map(([k, v]) => `${k}:${v}`).join(", ");
lines.push(`SUMMARY: considered=${s.considered} emitted=${s.emitted} skipped=${s.skipped} regressions=${s.regressions ?? 0} reasons={${reasonStr}}`);
}
// Histogram footer: only count actual rejection reasons from clusters.
const hist = {};
for (const c of parsed.clusters) {
const m = c.decision.match(/^skipped:(\S+)/);
if (m) hist[m[1]] = (hist[m[1]] || 0) + 1;
}
const histStr = Object.entries(hist).map(([k, v]) => `${k} ${v}`).join(", ");
lines.push(`Rejection reasons: ${histStr || "none"}`);
return lines.join("\n");
}
export function formatJson(parsed) {
return JSON.stringify({
clusters: parsed.clusters,
summary: parsed.summary,
}, null, 2);
}
function main() {
const args = parseArgs(process.argv.slice(2));
const claudeHome = args.home || join(homedir(), ".claude");
const defaultInput = join(claudeHome, "adam", "last-trace.txt");
const inputPath = args.input || defaultInput;
let raw;
try {
if (!existsSync(inputPath)) {
process.stderr.write(`adam-explain: input not found: ${inputPath}\n`);
process.exit(1);
}
raw = readFileSync(inputPath, "utf8");
} catch (e) {
process.stderr.write(`adam-explain: read failed: ${e.message}\n`);
process.exit(1);
}
const parsed = parseTrace(raw);
if (!parsed.clusters.length && !parsed.summary) {
process.stderr.write(`adam-explain: no parseable trace lines in ${inputPath}\n`);
process.exit(1);
}
for (const w of parsed.warnings) {
process.stderr.write(`adam-explain: warn: ${w}\n`);
}
const mode = args.mode || "summary";
let out;
if (mode === "full") out = formatFull(parsed);
else if (mode === "json") out = formatJson(parsed);
else out = formatSummary(parsed);
process.stdout.write(out + "\n");
}
if (import.meta.url === `file://${process.argv[1]}`) {
try { main(); } catch (e) {
process.stderr.write(`adam-explain error: ${e.message}\n`);
process.exit(1);
}
}
+97
View File
@@ -0,0 +1,97 @@
#!/usr/bin/env node
// adam-nudge-eligibility.mjs — checks whether the current session has accrued
// enough `dead_end` entries to warrant emitting a cross-session nudge.
//
// CLI: adam-nudge-eligibility.mjs [--home <path>] [--session <id>]
// --home defaults to $HOME/.claude
// --session if absent, reads state.json `session_id` field
//
// Output (stdout): JSON one-liner
// eligible: {"eligible": true, "session_id": "...", "dead_end_count": N, "last_ts": "..."}
// not: {"eligible": false, "session_id": "...", "dead_end_count": N, "last_ts": "..."|null}
// Exit codes:
// 0 — read succeeded (eligible OR not)
// 1 — read failure / unable to resolve session
//
// Threshold: ≥3 dead_end entries within a single session_id across the active
// journal + all rotated journal/*.jsonl files. Threshold matches the
// "dead-end checkpoint" feedback rule.
import { readFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
export const DEAD_END_THRESHOLD = 3;
function parseArgs(argv) {
const args = { home: null, session: null, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--session" && i + 1 < argv.length) args.session = argv[++i];
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
function resolveSession(home, fallback) {
if (fallback) return fallback;
const statePath = join(home, "adam", "state.json");
if (!existsSync(statePath)) return null;
try {
const st = JSON.parse(readFileSync(statePath, "utf8"));
return st && typeof st.session_id === "string" ? st.session_id : null;
} catch { return null; }
}
export function checkEligibility(home, sessionId) {
const adamRoot = join(home, "adam");
const sources = [
join(adamRoot, "journal.jsonl"),
...listJsonlFiles(join(adamRoot, "journal")),
];
let count = 0;
let lastTs = null;
for (const p of sources) {
for (const e of readJsonlSafe(p)) {
if (!e || e.type !== "dead_end") continue;
if (e.session !== sessionId && e.session_id !== sessionId) continue;
count++;
const ts = typeof e.ts === "string" ? e.ts : null;
if (ts && (!lastTs || ts > lastTs)) lastTs = ts;
}
}
return {
eligible: count >= DEAD_END_THRESHOLD,
session_id: sessionId,
dead_end_count: count,
last_ts: lastTs,
};
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write("usage: adam-nudge-eligibility.mjs [--home <path>] [--session <id>]\n");
process.exit(0);
}
const home = args.home || join(homedir(), ".claude");
const sessionId = resolveSession(home, args.session);
if (!sessionId) {
process.stderr.write("adam-nudge-eligibility: no session_id (pass --session or seed state.json)\n");
process.exit(1);
}
try {
const result = checkEligibility(home, sessionId);
process.stdout.write(JSON.stringify(result) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-nudge-eligibility error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
+225
View File
@@ -0,0 +1,225 @@
#!/usr/bin/env node
// adam-rollback.mjs — auto-reverts proposals that regressed after apply.
//
// Implements MOSS §3.5: "rollback is mandatory because... a candidate that
// passes trial can still regress live."
//
// For each regressed proposal (detected by adam-ab-measure.mjs):
// 1. Reads the applied proposal from applied/
// 2. Parses the `# Rollback` section for undo commands
// 3. Moves proposal from applied/ to proposals/ with `rolled_back: true`
// 4. Creates a regression nudge for next SessionStart
// 5. Removes the ab-tracking entry (so it doesn't re-trigger)
//
// CLI:
// adam-rollback.mjs --proposal-id <id> [--home <path>] [--dry-run]
// adam-rollback.mjs --auto [--home <path>] [--dry-run]
//
// --auto mode: reads ab-measure output, rolls back all regressed proposals.
//
// Output: JSON object with rollback results per proposal.
// Does NOT execute the undo commands itself — outputs them for the skill to
// execute in-context (safety: undo commands may reference files the script
// can't safely modify).
import { readFileSync, writeFileSync, renameSync, readdirSync, existsSync, mkdirSync } from "node:fs";
import { join, basename } from "node:path";
import { homedir } from "node:os";
import { parseFrontmatter, readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
function parseArgs(argv) {
const args = { home: null, proposalId: null, auto: false, dryRun: false, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--proposal-id" && i + 1 < argv.length) args.proposalId = argv[++i];
else if (a === "--auto") args.auto = true;
else if (a === "--dry-run") args.dryRun = true;
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
function findAppliedProposal(appliedDir, proposalId) {
if (!existsSync(appliedDir)) return null;
try {
const files = readdirSync(appliedDir).filter(n => n.endsWith(".md"));
for (const f of files) {
if (f.includes(proposalId)) return join(appliedDir, f);
}
} catch { /* skip */ }
return null;
}
function extractRollbackSection(content) {
const idx = content.indexOf("\n# Rollback\n");
if (idx === -1) return null;
let body = content.slice(idx + "\n# Rollback\n".length);
const nextSection = body.search(/\n# |\n---/);
if (nextSection !== -1) body = body.slice(0, nextSection);
return body.trim() || null;
}
function extractUndoCommands(rollbackSection) {
if (!rollbackSection) return [];
const commands = [];
const lines = rollbackSection.split("\n");
let inCodeBlock = false;
let blockLines = [];
for (const line of lines) {
if (line.startsWith("```")) {
if (inCodeBlock) {
if (blockLines.length) commands.push(blockLines.join("\n"));
blockLines = [];
}
inCodeBlock = !inCodeBlock;
continue;
}
if (inCodeBlock) {
blockLines.push(line);
}
}
return commands;
}
export function planRollback(appliedDir, proposalId) {
const path = findAppliedProposal(appliedDir, proposalId);
if (!path) return { status: "not_found", proposal_id: proposalId };
const content = readFileSync(path, "utf8");
const fm = parseFrontmatter(content);
const rollbackSection = extractRollbackSection(content);
const undoCommands = extractUndoCommands(rollbackSection);
return {
status: "planned",
proposal_id: proposalId,
applied_path: path,
type: fm.type || "unknown",
target: fm.target || null,
target_skill: fm.target_skill || null,
undo_commands: undoCommands,
has_rollback_section: !!rollbackSection,
};
}
export function executeRollback(plan, adamRoot, opts = {}) {
const dryRun = opts.dryRun || false;
const proposalsDir = join(adamRoot, "proposals");
const nudgesPath = join(adamRoot, "active-nudges.json");
const now = Date.now();
if (plan.status !== "planned") return { ...plan, action: "skipped" };
const result = {
proposal_id: plan.proposal_id,
type: plan.type,
target: plan.target,
undo_commands: plan.undo_commands,
actions: [],
};
if (dryRun) {
result.actions.push("dry_run: would move applied → proposals");
if (plan.undo_commands.length) {
result.actions.push(`dry_run: would output ${plan.undo_commands.length} undo command(s)`);
}
result.actions.push("dry_run: would create regression nudge");
result.status = "dry_run";
return result;
}
mkdirSync(proposalsDir, { recursive: true });
const destName = `${basename(plan.applied_path).replace(/\.md$/, "")}-rollback.md`;
const destPath = join(proposalsDir, destName);
let content = readFileSync(plan.applied_path, "utf8");
const rollbackMeta = `\nrolled_back: true\nrolled_back_at: "${new Date(now).toISOString()}"`;
content = content.replace(/^(---\n[\s\S]*?)(---)/m, `$1${rollbackMeta}\n$2`);
try {
writeFileSync(destPath, content);
renameSync(plan.applied_path, plan.applied_path + ".rolled-back");
result.actions.push(`moved ${plan.applied_path}${destPath}`);
} catch (e) {
result.status = "move_failed";
result.error = e.message;
return result;
}
try {
let nudges = [];
if (existsSync(nudgesPath)) {
try { nudges = JSON.parse(readFileSync(nudgesPath, "utf8")); } catch { nudges = []; }
}
nudges.push({
kind: "regression_rollback",
message: `adam: rolled back "${plan.proposal_id}" (type: ${plan.type}) — regression detected in A/B measurement. Review with /reflect.`,
created_at: now,
expires_at_ts: now + 7 * 86400000,
max_displays: 3,
displays_used: 0,
source_proposal: plan.proposal_id,
});
writeFileSync(nudgesPath, JSON.stringify(nudges, null, 2));
result.actions.push("regression nudge created");
} catch (e) {
result.actions.push(`nudge failed: ${e.message}`);
}
result.status = "rolled_back";
return result;
}
async function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write(
"usage: adam-rollback.mjs --proposal-id <id> [--home <path>] [--dry-run]\n" +
" adam-rollback.mjs --auto [--home <path>] [--dry-run]\n"
);
process.exit(0);
}
const claudeHome = args.home || join(homedir(), ".claude");
const adamRoot = join(claudeHome, "adam");
const appliedDir = join(adamRoot, "applied");
try {
const results = [];
if (args.auto) {
const abPath = join(adamRoot, "ab-tracking.jsonl");
const entries = readJsonlSafe(abPath);
const { computeDeltas } = await import("./adam-ab-measure.mjs");
const sources = [join(adamRoot, "journal.jsonl"), ...listJsonlFiles(join(adamRoot, "journal"))];
const journalAll = [];
for (const p of sources) for (const e of readJsonlSafe(p)) journalAll.push(e);
const deltas = computeDeltas(entries, journalAll);
const regressed = deltas.filter(d => d.status === "regressed");
for (const d of regressed) {
const plan = planRollback(appliedDir, d.proposal_id);
const result = executeRollback(plan, adamRoot, { dryRun: args.dryRun });
results.push(result);
}
} else if (args.proposalId) {
const plan = planRollback(appliedDir, args.proposalId);
const result = executeRollback(plan, adamRoot, { dryRun: args.dryRun });
results.push(result);
} else {
process.stderr.write("adam-rollback: specify --proposal-id or --auto\n");
process.exit(1);
}
process.stdout.write(JSON.stringify({ rollbacks: results }) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-rollback error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
+204
View File
@@ -0,0 +1,204 @@
#!/usr/bin/env node
// adam-score.mjs — computes per-session urgency dampeners + reinforcement
// candidates from `task_completed` signals.
//
// Effects:
// 1. Dampener:
// task_completed_count >= 3 → 0.5
// task_completed_count >= 1 → 0.75
// else → 1.0
// Analyst multiplies a cluster's urgency by the dampener of the session
// it originated from.
// 2. Reinforcement candidates: per skill, count of clean task_completed
// events citing it (via `active_skills` payload). Skills with count >= 3
// are surfaced as reinforcement proposal candidates (low blast,
// confidence ≥ 4 required for auto-apply, same gate as memory).
//
// CLI:
// adam-score.mjs [--home <path>] [--input <jsonl-path>]
//
// --input defaults to: stdout of adam-window.mjs (preferred) — if missing,
// falls back to the raw active journal.
//
// Output: JSON object
// {
// "sessions": [
// {"session_id": "...", "negative_count": N, "task_completed_count": M,
// "severity_sum": S, "severity_by_type": {"<type>": N, ...}, "dampener": 1.0}
// ],
// "reinforcement_candidates": [
// {"skill_slug": "tdd-loop", "count": 3, "recent_ts": "..."}
// ]
// }
import { readFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
export const NEGATIVE_SIGNAL_TYPES = new Set([
"correction",
"tool_error_loop",
"dead_end",
"edit_churn",
"retry_loop",
"build_loop",
"weak_agent",
"silent_drift",
"error_after_recovery",
]);
export const REINFORCEMENT_THRESHOLD = 3;
// Severity divisor per struggle signal type. Severity = max(1, floor(count / divisor)).
// Entries without `count` default to severity 1. Source of truth — referenced by
// agents/adam.md (Confidence rubric → severity-sum bullets).
export const SEVERITY_DIVISORS = {
dead_end: 8,
edit_churn: 4,
tool_error_loop: 3,
retry_loop: 3,
file_reread: 3,
weak_agent: 2,
build_loop: 1,
};
export function entrySeverity(entry) {
if (!entry || typeof entry !== "object") return 1;
const divisor = SEVERITY_DIVISORS[entry.type];
if (!divisor) return 1;
const count = typeof entry.count === "number" && entry.count > 0 ? entry.count : 1;
return Math.max(1, Math.floor(count / divisor));
}
function parseArgs(argv) {
const args = { home: null, input: null, help: false };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
else if (a === "--help" || a === "-h") args.help = true;
}
return args;
}
function readAllStdin() {
try { return readFileSync(0, "utf8"); } catch { return ""; }
}
function entriesFromText(text) {
const out = [];
for (const line of (text || "").split("\n")) {
if (!line) continue;
try { out.push(JSON.parse(line)); } catch { /* skip */ }
}
return out;
}
function computeDampener(taskCompletedCount) {
if (taskCompletedCount >= 3) return 0.5;
if (taskCompletedCount >= 1) return 0.75;
return 1.0;
}
export function computeSessionScores(entries) {
const bySession = new Map();
for (const e of entries || []) {
if (!e || typeof e !== "object") continue;
const sid = e.session || e.session_id || "";
if (!sid) continue;
if (!bySession.has(sid)) {
bySession.set(sid, {
session_id: sid,
negative_count: 0,
task_completed_count: 0,
severity_sum: 0,
severity_by_type: {},
});
}
const slot = bySession.get(sid);
if (e.type === "task_completed") slot.task_completed_count++;
else if (NEGATIVE_SIGNAL_TYPES.has(e.type)) {
slot.negative_count++;
const sev = entrySeverity(e);
slot.severity_sum += sev;
slot.severity_by_type[e.type] = (slot.severity_by_type[e.type] || 0) + sev;
}
}
const out = [];
for (const slot of bySession.values()) {
out.push({
...slot,
dampener: computeDampener(slot.task_completed_count),
});
}
// Stable ordering by session_id for deterministic output.
out.sort((a, b) => (a.session_id < b.session_id ? -1 : a.session_id > b.session_id ? 1 : 0));
return out;
}
export function computeReinforcementCandidates(entries) {
const counts = new Map();
for (const e of entries || []) {
if (!e || e.type !== "task_completed") continue;
const skills = Array.isArray(e.active_skills) ? e.active_skills : [];
for (const slug of skills) {
if (!slug || typeof slug !== "string") continue;
if (!counts.has(slug)) counts.set(slug, { count: 0, recent_ts: null });
const slot = counts.get(slug);
slot.count++;
const ts = typeof e.ts === "string" ? e.ts : null;
if (ts && (!slot.recent_ts || ts > slot.recent_ts)) slot.recent_ts = ts;
}
}
const out = [];
for (const [slug, { count, recent_ts }] of counts.entries()) {
if (count < REINFORCEMENT_THRESHOLD) continue;
out.push({ skill_slug: slug, count, recent_ts });
}
out.sort((a, b) => b.count - a.count || (a.skill_slug < b.skill_slug ? -1 : 1));
return out;
}
function gatherInputEntries(args) {
if (args.input) return readJsonlSafe(args.input);
// Honor piped stdin only when it is non-empty AND not a TTY.
if (!process.stdin.isTTY) {
const piped = readAllStdin();
if (piped && piped.trim()) return entriesFromText(piped);
}
// Default fallback: active journal + rotated files.
const home = args.home || join(homedir(), ".claude");
const adamRoot = join(home, "adam");
const sources = [
join(adamRoot, "journal.jsonl"),
...listJsonlFiles(join(adamRoot, "journal")),
];
const all = [];
for (const p of sources) {
for (const e of readJsonlSafe(p)) all.push(e);
}
return all;
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
process.stdout.write("usage: adam-score.mjs [--home <path>] [--input <jsonl-path>]\n");
process.exit(0);
}
try {
const entries = gatherInputEntries(args);
const sessions = computeSessionScores(entries);
const reinforcement_candidates = computeReinforcementCandidates(entries);
process.stdout.write(JSON.stringify({ sessions, reinforcement_candidates }) + "\n");
process.exit(0);
} catch (e) {
process.stderr.write(`adam-score error: ${e.message}\n`);
process.exit(1);
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main();
}
+251
View File
@@ -0,0 +1,251 @@
#!/usr/bin/env node
// adam-upgrade.mjs — review/accept pending `.adam-new` files from install.sh.
//
// install.sh writes <file>.adam-new next to user-modified ADAM files instead
// of clobbering. This tool surfaces those pending merges and lets users
// review the diff + accept atomically.
//
// CLI:
// adam-upgrade.mjs --list [--home <path>]
// adam-upgrade.mjs --diff [<path>] [--home <path>]
// adam-upgrade.mjs --accept <path> [--home <path>]
// adam-upgrade.mjs --accept-all [--home <path>]
// adam-upgrade.mjs --help
import {
readdirSync,
statSync,
existsSync,
renameSync,
readFileSync,
unlinkSync,
} from "node:fs";
import { join, dirname, basename } from "node:path";
import { homedir } from "node:os";
import { spawnSync } from "node:child_process";
const EXCLUDE_DIRS = new Set([
".git",
"node_modules",
"journal",
"trash",
"proposals",
"applied",
"rejected",
]);
// Walk a directory tree, collecting paths to files ending in `.adam-new`.
// Excludes the dirs above defensively (no point in surfacing journal entries).
export function findPending(home) {
const root = home;
const out = [];
function walk(dir) {
let entries;
try {
entries = readdirSync(dir, { withFileTypes: true });
} catch {
return;
}
for (const e of entries) {
const full = join(dir, e.name);
if (e.isDirectory()) {
if (EXCLUDE_DIRS.has(e.name)) continue;
walk(full);
} else if (e.isFile() && e.name.endsWith(".adam-new")) {
out.push(full);
}
}
}
walk(root);
return out.sort();
}
function fileSize(p) {
try { return statSync(p).size; } catch { return 0; }
}
function fileAgeDays(p) {
try {
const mtime = statSync(p).mtimeMs;
return Math.floor((Date.now() - mtime) / 86400000);
} catch { return 0; }
}
// Produce a unified diff between two files. Prefer the system `diff -u` binary
// (universally available, accurate). On systems without `diff`, fall back to a
// naive line-by-line diff prefixed with MISSING:/NEW: so the tool still works.
export function diffPaths(orig, neu) {
const r = spawnSync("diff", ["-u", orig, neu], { encoding: "utf8" });
if (r.error || r.status === null || r.status === 2) {
// diff binary missing or fatal error — naive fallback
let a = [], b = [];
try { a = readFileSync(orig, "utf8").split("\n"); } catch {}
try { b = readFileSync(neu, "utf8").split("\n"); } catch {}
const max = Math.max(a.length, b.length);
const lines = [];
for (let i = 0; i < max; i++) {
const la = a[i], lb = b[i];
if (la === lb) continue;
if (la !== undefined) lines.push(`MISSING: ${la}`);
if (lb !== undefined) lines.push(`NEW: ${lb}`);
}
return lines.join("\n");
}
return r.stdout || "";
}
// Atomic swap: rename orig → orig.adam-prev, rename neu → orig. Overwrites any
// prior .adam-prev backup (safe: a previous accept already promoted it).
export function acceptOne(orig, neu) {
if (!existsSync(neu)) {
throw new Error(`missing pending file: ${neu}`);
}
const prev = `${orig}.adam-prev`;
if (existsSync(orig)) {
if (existsSync(prev)) {
try { unlinkSync(prev); } catch {}
}
renameSync(orig, prev);
}
renameSync(neu, orig);
return { orig, prev };
}
export function acceptAll(home) {
const pending = findPending(home);
const results = [];
for (const neu of pending) {
const orig = neu.replace(/\.adam-new$/, "");
try {
const r = acceptOne(orig, neu);
results.push({ ok: true, ...r });
} catch (err) {
results.push({ ok: false, orig, error: String(err && err.message || err) });
}
}
return results;
}
function parseArgs(argv) {
const args = { cmd: null, target: null, home: null };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === "--list") args.cmd = "list";
else if (a === "--diff") args.cmd = "diff";
else if (a === "--accept") args.cmd = "accept";
else if (a === "--accept-all") args.cmd = "accept-all";
else if (a === "--help" || a === "-h") args.cmd = "help";
else if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
else if (!a.startsWith("--") && args.target == null) args.target = a;
}
return args;
}
function usage() {
process.stdout.write(
"adam-upgrade — review pending `.adam-new` files from install.sh\n" +
"\n" +
"Usage:\n" +
" adam-upgrade.mjs --list [--home <path>]\n" +
" adam-upgrade.mjs --diff [<path>] [--home <path>]\n" +
" adam-upgrade.mjs --accept <path> [--home <path>]\n" +
" adam-upgrade.mjs --accept-all [--home <path>]\n" +
" adam-upgrade.mjs --help\n"
);
}
function resolveHome(args) {
if (args.home) return args.home;
return join(process.env.HOME || homedir(), ".claude");
}
function cmdList(args) {
const home = resolveHome(args);
const pending = findPending(home);
for (const neu of pending) {
const orig = neu.replace(/\.adam-new$/, "");
const origSize = fileSize(orig);
const newSize = fileSize(neu);
const age = fileAgeDays(neu);
process.stdout.write(`${neu} (orig: ${origSize}, new: ${newSize}, age: ${age}d)\n`);
}
process.stderr.write(`${pending.length} pending\n`);
return 0;
}
function cmdDiff(args) {
const home = resolveHome(args);
let targets;
if (args.target) {
// Allow either passing the orig path or the .adam-new path.
const t = args.target;
const orig = t.endsWith(".adam-new") ? t.replace(/\.adam-new$/, "") : t;
targets = [orig];
} else {
targets = findPending(home).map((n) => n.replace(/\.adam-new$/, ""));
}
for (const orig of targets) {
const neu = `${orig}.adam-new`;
process.stdout.write(`=== ${orig} ===\n`);
if (!existsSync(neu)) {
process.stderr.write(`no pending: ${neu}\n`);
continue;
}
process.stdout.write(diffPaths(orig, neu));
process.stdout.write("\n");
}
return 0;
}
function cmdAccept(args) {
if (!args.target) {
process.stderr.write("error: --accept requires a <path>\n");
return 1;
}
const t = args.target;
const orig = t.endsWith(".adam-new") ? t.replace(/\.adam-new$/, "") : t;
const neu = `${orig}.adam-new`;
try {
const r = acceptOne(orig, neu);
process.stdout.write(`accepted: ${r.orig} (backup: ${r.prev})\n`);
return 0;
} catch (err) {
process.stderr.write(`error: ${err.message}\n`);
return 1;
}
}
function cmdAcceptAll(args) {
const home = resolveHome(args);
const results = acceptAll(home);
for (const r of results) {
if (r.ok) {
process.stdout.write(`accepted: ${r.orig} (backup: ${r.prev})\n`);
} else {
process.stderr.write(`error: ${r.orig}: ${r.error}\n`);
}
}
return 0;
}
function main() {
const args = parseArgs(process.argv.slice(2));
if (!args.cmd || args.cmd === "help") { usage(); return 0; }
if (args.cmd === "list") return cmdList(args);
if (args.cmd === "diff") return cmdDiff(args);
if (args.cmd === "accept") return cmdAccept(args);
if (args.cmd === "accept-all") return cmdAcceptAll(args);
usage();
return 1;
}
// Only run main() when invoked as a script (not when imported for tests).
const invokedAsScript = (() => {
try {
const argv1 = process.argv[1] || "";
return argv1.endsWith("adam-upgrade.mjs");
} catch { return true; }
})();
if (invokedAsScript) {
process.exit(main());
}
+92
View File
@@ -0,0 +1,92 @@
// adam-utils.mjs — shared helpers used across adam-* scripts.
//
// Pure library: no shebang, not a CLI. Imported by adam-window.mjs,
// adam-score.mjs, adam-ab-measure.mjs, adam-nudge-eligibility.mjs (jsonl
// helpers) and adam-apply-reinforcement.mjs, adam-archive.mjs,
// adam-cooldown.mjs (parseFrontmatter).
//
// All helpers swallow read/parse failures by design — callers expect to keep
// going on a corrupt line/file rather than abort the whole pipeline.
import { readFileSync, readdirSync, existsSync } from "node:fs";
import { join } from "node:path";
// listJsonlFiles: list *.jsonl files in `dir`. Missing dir or read failure
// returns []. Filenames are joined with `dir` so callers can read directly.
export function listJsonlFiles(dir) {
if (!existsSync(dir)) return [];
try {
return readdirSync(dir)
.filter((n) => n.endsWith(".jsonl"))
.map((n) => join(dir, n));
} catch { return []; }
}
// readJsonlSafe: read a .jsonl file and return an array of parsed objects.
// Missing file, unreadable file, or any malformed line are silently skipped.
export function readJsonlSafe(path) {
if (!existsSync(path)) return [];
let buf;
try { buf = readFileSync(path, "utf8"); } catch { return []; }
const out = [];
for (const line of buf.split("\n")) {
if (!line) continue;
try { out.push(JSON.parse(line)); } catch { /* skip malformed */ }
}
return out;
}
// parseFrontmatter: parse a markdown YAML-ish frontmatter block into a flat
// object. Supports:
// - inline scalars key: value
// - inline arrays key: [a, b, c]
// - block-form arrays key:\n - a\n - b
// Quotes around scalar values are stripped. Comment-only lines (`# ...`) and
// keys with empty inline values that are NOT followed by a block array are
// skipped (preserves prior cooldown.mjs behavior). Missing frontmatter → {}.
export function parseFrontmatter(content) {
const m = content.match(/^---\n([\s\S]*?)\n---/);
if (!m) return {};
const out = {};
const lines = m[1].split("\n");
let i = 0;
while (i < lines.length) {
const line = lines[i];
const idx = line.indexOf(":");
if (idx === -1) { i++; continue; }
const key = line.slice(0, idx).trim();
if (!key || key.startsWith("#")) { i++; continue; }
const rawValue = line.slice(idx + 1).trim();
if (rawValue.startsWith("[") && rawValue.endsWith("]")) {
const inner = rawValue.slice(1, -1)
.split(",")
.map((s) => s.trim().replace(/^['"]|['"]$/g, ""))
.filter(Boolean);
out[key] = inner;
i++;
continue;
}
if (!rawValue) {
// Possible block-form array: look ahead for ` - item` lines.
const arr = [];
let j = i + 1;
while (j < lines.length && /^\s*-\s+/.test(lines[j])) {
const item = lines[j].replace(/^\s*-\s+/, "").trim().replace(/^['"]|['"]$/g, "");
if (item) arr.push(item);
j++;
}
if (arr.length) {
out[key] = arr;
i = j;
continue;
}
// Empty value, no block follow-up: skip (cooldown/apply-reinforcement
// expectation — empty scalars are noise).
i++;
continue;
}
out[key] = rawValue.replace(/^['"]|['"]$/g, "");
i++;
}
return out;
}
+177
View File
@@ -0,0 +1,177 @@
#!/usr/bin/env node
// adam-window.mjs — per-signal sliding-window filter over the ADAM journal.
//
// Reads all journal sources (active journal.jsonl + rotated journal/*.jsonl,
// including both new YYYY-Www.jsonl format and legacy YYYY-MM-DD-<ts>.jsonl
// size-rotated files), applies a per-signal-type age cutoff based on each
// entry's `ts` field, and emits the filtered JSONL stream to stdout.
//
// Exclusion: entries whose `ts` appears in any applied/*.md or rejected/*.md
// proposal frontmatter `source_entries` array are dropped (same semantics the
// adam agent previously enforced manually). Keeps actioned signals out of the
// next /reflect even if they're inside the analysis window.
//
// Usage: adam-window.mjs [--home <path>] default: $HOME/.claude
// Output: filtered JSONL on stdout. One-line summary on stderr.
import { readFileSync, readdirSync, existsSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
// Per-signal sliding window in days. Source of truth — referenced by agents/adam.md.
export const SIGNAL_WINDOWS_DAYS = {
dead_end: 7,
correction: 30,
tool_error_loop: 30,
edit_churn: 14,
retry_loop: 14,
build_loop: 30,
weak_agent: 30,
subagent_dispatch_pattern: 30,
silent_drift: 14,
file_reread: 14,
error_after_recovery: 30,
correction_free_streak: 60,
clean_recovery: 60,
task_completed: 60,
};
// Fallback window for unknown / future signal types.
export const DEFAULT_WINDOW_DAYS = 30;
const DAY_MS = 86400000;
function parseArgs(argv) {
const args = { home: null };
for (let i = 0; i < argv.length; i++) {
if (argv[i] === "--home" && i + 1 < argv.length) {
args.home = argv[++i];
}
}
return args;
}
// Crude single-pass frontmatter source_entries extractor. Mirrors adam-archive.mjs
// parsing: handles both YAML block form and inline-array form. Only pulls the
// `source_entries` key — we don't need anything else for exclusion.
function extractSourceEntries(content) {
const m = content.match(/^---\n([\s\S]*?)\n---/);
if (!m) return [];
const lines = m[1].split("\n");
const out = [];
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
const idx = line.indexOf(":");
if (idx === -1) continue;
const key = line.slice(0, idx).trim();
if (key !== "source_entries") continue;
const value = line.slice(idx + 1).trim();
if (value.startsWith("[") && value.endsWith("]")) {
const inner = value.slice(1, -1).split(",")
.map((s) => s.trim().replace(/^['"]|['"]$/g, ""))
.filter(Boolean);
out.push(...inner);
continue;
}
let j = i + 1;
while (j < lines.length && /^\s*-\s+/.test(lines[j])) {
const item = lines[j].replace(/^\s*-\s+/, "").trim().replace(/^['"]|['"]$/g, "");
if (item) out.push(item);
j++;
}
}
return out;
}
function buildExclusionSet(...dirs) {
const set = new Set();
for (const dir of dirs) {
if (!existsSync(dir)) continue;
let names;
try { names = readdirSync(dir); } catch { continue; }
for (const name of names) {
if (!name.endsWith(".md")) continue;
const p = join(dir, name);
try {
const content = readFileSync(p, "utf8");
for (const ts of extractSourceEntries(content)) set.add(ts);
} catch { /* skip */ }
}
}
return set;
}
function windowDaysFor(type) {
if (Object.prototype.hasOwnProperty.call(SIGNAL_WINDOWS_DAYS, type)) {
return SIGNAL_WINDOWS_DAYS[type];
}
return DEFAULT_WINDOW_DAYS;
}
export function filterEntries(entries, exclusionSet, now = new Date()) {
const nowMs = now.getTime();
const dropped = { stale: {}, excluded: 0, no_ts: 0 };
const kept = [];
for (const e of entries) {
if (!e || typeof e !== "object") continue;
if (!e.ts || typeof e.ts !== "string") {
dropped.no_ts++;
continue;
}
if (exclusionSet.has(e.ts)) {
dropped.excluded++;
continue;
}
const type = e.type || "unknown";
const days = windowDaysFor(type);
const tsMs = Date.parse(e.ts);
if (Number.isNaN(tsMs)) {
dropped.no_ts++;
continue;
}
if (nowMs - tsMs > days * DAY_MS) {
dropped.stale[type] = (dropped.stale[type] || 0) + 1;
continue;
}
kept.push(e);
}
return { kept, dropped };
}
function main() {
const args = parseArgs(process.argv.slice(2));
const claudeHome = args.home || join(homedir(), ".claude");
const adamRoot = join(claudeHome, "adam");
const activeJournal = join(adamRoot, "journal.jsonl");
const journalDir = join(adamRoot, "journal");
const appliedDir = join(adamRoot, "applied");
const rejectedDir = join(adamRoot, "rejected");
const sources = [activeJournal, ...listJsonlFiles(journalDir)];
const all = [];
for (const p of sources) {
for (const e of readJsonlSafe(p)) all.push(e);
}
const exclusion = buildExclusionSet(appliedDir, rejectedDir);
const { kept, dropped } = filterEntries(all, exclusion);
// Stable output: sort by ts ascending so downstream clustering sees chronological order.
kept.sort((a, b) => (a.ts < b.ts ? -1 : a.ts > b.ts ? 1 : 0));
const out = kept.map((e) => JSON.stringify(e)).join("\n");
if (out) process.stdout.write(out + "\n");
const staleParts = Object.entries(dropped.stale).map(([t, n]) => `${t}=${n}`).join(",") || "none";
process.stderr.write(
`windowed: ${all.length} in, ${kept.length} out (stale: ${staleParts}; excluded: ${dropped.excluded}; no_ts: ${dropped.no_ts})\n`
);
}
if (import.meta.url === `file://${process.argv[1]}`) {
try { main(); } catch (e) {
process.stderr.write(`adam-window error: ${e.message}\n`);
process.exit(1);
}
}
+154
View File
@@ -0,0 +1,154 @@
#!/usr/bin/env node
// Test driver for ~/.claude/hooks/adam-observe.mjs.
// Usage: node test-hook.mjs (runs all tests in this file).
// Spawns the hook with synthesized stdin in a tmp HOME, asserts journal contents.
import { spawnSync } from "node:child_process";
import { mkdtempSync, mkdirSync, readFileSync, existsSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { fileURLToPath } from "node:url";
const HOOK = join(fileURLToPath(new URL("../../hooks/adam-observe.mjs", import.meta.url)));
export function newTmpHome() {
const home = mkdtempSync(join(tmpdir(), "adam-test-"));
mkdirSync(join(home, ".claude/adam"), { recursive: true });
return home;
}
export function feed(home, input) {
const r = spawnSync("node", [HOOK], {
input: JSON.stringify(input),
env: { ...process.env, HOME: home },
encoding: "utf8",
timeout: 5000,
});
if (r.status !== 0) throw new Error(`hook exit ${r.status}: ${r.stderr}`);
return r;
}
export function readJournal(home) {
const p = join(home, ".claude/adam/journal.jsonl");
if (!existsSync(p)) return [];
return readFileSync(p, "utf8")
.trim().split("\n").filter(Boolean).map((l) => JSON.parse(l));
}
export function assert(cond, msg) {
if (!cond) { console.error(`FAIL: ${msg}`); process.exit(1); }
console.log(`ok: ${msg}`);
}
export function cleanup(home) { try { rmSync(home, { recursive: true, force: true }); } catch {} }
// Tests below this line — added by subsequent tasks.
function testCorrectionFreeStreak() {
const home = newTmpHome();
try {
for (let i = 0; i < 5; i++) {
feed(home, {
hook_event_name: "UserPromptSubmit",
session_id: "s1",
cwd: "/x",
prompt: `please continue with the work item ${i}`,
});
}
const j = readJournal(home);
const streaks = j.filter(e => e.type === "correction_free_streak");
assert(streaks.length === 1, "exactly one correction_free_streak after 5 clean prompts");
assert(streaks[0].streak === 5, "streak field is 5");
assert(streaks[0].session === "s1", "session id captured");
} finally { cleanup(home); }
}
function testStreakResetsOnSessionChange() {
const home = newTmpHome();
try {
// 4 in s1 (counter=4, no streak yet), then 1 in s2 (counter must reset → 1, no streak)
for (let i = 0; i < 4; i++) feed(home, { hook_event_name: "UserPromptSubmit", session_id: "s1", cwd: "/x", prompt: "ok" });
feed(home, { hook_event_name: "UserPromptSubmit", session_id: "s2", cwd: "/x", prompt: "ok" });
const j = readJournal(home);
assert(j.filter(e => e.type === "correction_free_streak").length === 0, "no streak when session changes mid-streak");
} finally { cleanup(home); }
}
function testCleanRecovery() {
const home = newTmpHome();
try {
// Trigger tool_error_loop: 3 PostToolUse with same error fingerprint.
for (let i = 0; i < 3; i++) {
feed(home, {
hook_event_name: "PostToolUse",
session_id: "s1", cwd: "/x",
tool_name: "Bash",
tool_input: { command: `echo ${i}` },
tool_response: { is_error: true, content: "error: command not found" },
});
}
// Then 3 clean PostToolUse events.
for (let i = 0; i < 3; i++) {
feed(home, {
hook_event_name: "PostToolUse",
session_id: "s1", cwd: "/x",
tool_name: "Read",
tool_input: { file_path: `/tmp/ok-${i}` },
tool_response: { content: "fine" },
});
}
const j = readJournal(home);
const recs = j.filter(e => e.type === "clean_recovery");
assert(recs.length === 1, "one clean_recovery emitted after 3 clean tools post-struggle");
assert(recs[0].recovered_from === "tool_error_loop", "recovered_from set");
} finally { cleanup(home); }
}
function testRecoveryResetsOnError() {
const home = newTmpHome();
try {
for (let i = 0; i < 3; i++) {
feed(home, {
hook_event_name: "PostToolUse", session_id: "s1", cwd: "/x",
tool_name: "Bash",
tool_input: { command: `cmd ${i}` },
tool_response: { is_error: true, content: "error: failed" },
});
}
feed(home, { hook_event_name: "PostToolUse", session_id: "s1", cwd: "/x",
tool_name: "Read", tool_input: { file_path: "/tmp/a" }, tool_response: { content: "ok" } });
feed(home, { hook_event_name: "PostToolUse", session_id: "s1", cwd: "/x",
tool_name: "Read", tool_input: { file_path: "/tmp/b" }, tool_response: { content: "ok" } });
feed(home, { hook_event_name: "PostToolUse", session_id: "s1", cwd: "/x",
tool_name: "Bash", tool_input: { command: "x" }, tool_response: { is_error: true, content: "error: again" } });
feed(home, { hook_event_name: "PostToolUse", session_id: "s1", cwd: "/x",
tool_name: "Read", tool_input: { file_path: "/tmp/c" }, tool_response: { content: "ok" } });
const j = readJournal(home);
assert(j.filter(e => e.type === "clean_recovery").length === 0, "no clean_recovery when error breaks the streak");
} finally { cleanup(home); }
}
function testActiveSkillsPayload() {
const home = newTmpHome();
try {
feed(home, { hook_event_name: "PreToolUse", session_id: "s1", cwd: "/x",
tool_name: "Skill", tool_input: { skill: "my-skill" } });
for (let i = 0; i < 5; i++) {
feed(home, { hook_event_name: "UserPromptSubmit", session_id: "s1", cwd: "/x", prompt: "ok" });
}
const j = readJournal(home);
const s = j.find(e => e.type === "correction_free_streak");
assert(s && Array.isArray(s.active_skills) && s.active_skills.includes("my-skill"),
"correction_free_streak payload includes active skill");
} finally { cleanup(home); }
}
async function main() {
testCorrectionFreeStreak();
testStreakResetsOnSessionChange();
testCleanRecovery();
testRecoveryResetsOnError();
testActiveSkillsPayload();
console.log("all tests passed");
}
if (import.meta.url === `file://${process.argv[1]}`) main();
+1863 -33
View File
File diff suppressed because it is too large Load Diff
+427 -37
View File
@@ -8,6 +8,71 @@ tools: Read, Write, Edit, Grep, Glob, Bash
You analyse Claude Code's own behaviour to propose targeted, surgical improvements. You operate offline (no LLM round-trips outside this run) and produce **files**, not actions. Main-thread Claude reviews and applies changes with the user.
## Stage mode
The skill dispatches you in one of two stages (MOSS-inspired multi-stage pipeline — §3.3: "a single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow"):
- **`stage=diagnose`**: Read batched journal entries, cluster, diagnose root causes, plan fix types. Output diagnoses JSON to `/tmp/adam-diagnoses.json`. Do NOT draft proposals.
- **`stage=implement`**: Read approved diagnoses from `/tmp/adam-diagnoses.json`. Draft full proposal files to `proposals_dir/`. Emit the clustering trace and punch list.
If no `stage` is specified in the dispatch prompt, run **both stages sequentially** within a single pass (backward-compatible with pre-MOSS flow).
### Diagnose-stage output format
When `stage=diagnose`, write `/tmp/adam-diagnoses.json` containing:
```json
{
"diagnoses": [
{
"cluster_id": "c1",
"signal_type": "correction",
"cluster_key": "wrong|approach",
"count": 5,
"sessions": 3,
"diagnosis": {
"trigger": "...",
"action": "...",
"mismatch": "...",
"outcome": "... `verbatim quote` ..."
},
"plan": {
"type": "memory",
"target": "~/.claude/projects/-Users-nvm/memory/go-test-cache.md",
"scope": "add feedback memory about go test -count=1"
},
"keypoints": {
"tool_selection": 1,
"scope_discipline": 2,
"error_recovery": 0,
"first_attempt": 0,
"build_reliability": 1
},
"gates": {
"threshold": "pass",
"cross_session": "pass",
"window": "in:5/out:0",
"contradiction": "none"
},
"source_entries": ["2026-05-20T10:00:00Z", "2026-05-21T11:00:00Z"],
"context_evidence": ["... excerpts from context_window ..."]
}
],
"skipped": [
{"cluster_id": "c3", "signal_type": "retry_loop", "reason": "threshold", "count": 2}
],
"summary": "considered=4 diagnosed=2 skipped=2"
}
```
The skill validates diagnoses between stages (see SKILL.md §2 "Inter-stage validation").
## Context window evidence
Journal entries for struggle signals now carry a `context_window` field — an array of the last 8 events (user prompts, tool calls, responses) surrounding the friction point. This is the ADAM equivalent of MOSS's "original transcript captured by auto-scan at evidence time" (§3.4).
When drafting diagnoses, **prefer `context_window` evidence over transcript file lookups** when it is present. The `context_window` is already scoped to the friction point and more reliable than file-based transcript pulls. Fall back to `transcripts_root` only when `context_window` is absent (pre-upgrade entries).
## Karpathy constraints (mandatory)
You MUST obey these on every proposal:
@@ -18,16 +83,35 @@ You MUST obey these on every proposal:
4. **Verifiable success criterion** — every proposal has a `# Success criterion` section describing a runnable check.
5. **Naive then optimize** — first proposal for a pattern is the boring obvious solution.
## Inputs (passed in dispatch prompt)
## Inputs
- `journal_path`: `~/.claude/adam/journal.jsonl`
- `state_path`: `~/.claude/adam/state.json` (cursor)
- `usage_path`: `~/.claude/adam/usage.json`
- `proposals_dir`: `~/.claude/adam/proposals/`
- `applied_dir`: `~/.claude/adam/applied/`
- `rejected_dir`: `~/.claude/adam/rejected/`
- `transcripts_root`: `~/.claude/projects/`
- `skills_root`: `~/.claude/skills/`
Paths arrive via the dispatch prompt — see `~/.claude/skills/adam-self-improvement/SKILL.md` §2.
## Analysis window
The journal you receive is **pre-filtered** by `~/.claude/adam/scripts/adam-window.mjs` before this agent runs. You do NOT apply window math yourself — every entry in the input stream is already within its signal type's freshness window. The same script also drops entries whose `ts` already appears in `applied/*.md` or `rejected/*.md` frontmatter `source_entries`, so the manual excluded-timestamps computation in the Process section below becomes a no-op when the pre-filter is healthy (still keep the logic — it's the fallback if the pre-filter is bypassed).
Per-signal windows (single source of truth: `SIGNAL_WINDOWS_DAYS` in `~/.claude/adam/scripts/adam-window.mjs`):
| signal | window | rationale |
|---|---|---|
| `dead_end` | 7 d | autonomy friction — fix-or-forget fast |
| `correction` | 30 d | user phrasing patterns drift slowly |
| `tool_error_loop` | 30 d | error fingerprints stable across days |
| `edit_churn` | 14 d | per-file churn is task-local |
| `retry_loop` | 14 d | tool-arg retries are task-local |
| `build_loop` | 30 d | build/test failure patterns |
| `weak_agent` | 30 d | subagent quality signal |
| `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
| `silent_drift` | 14 d | exploration-without-action is task-local |
| `file_reread` | 14 d | redundant same-file reads are task-local |
| `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
| `correction_free_streak` | 60 d | wins accumulate slowly |
| `clean_recovery` | 60 d | wins accumulate slowly |
| `task_completed` | 60 d | recipe wins accumulate slowly |
| (unknown / new types) | 30 d | `DEFAULT_WINDOW_DAYS` fallback |
Cross-session evidence gate: "≥3 occurrences across ≥2 sessions" is now scoped — it means **≥3 occurrences across ≥2 sessions WITHIN the signal's analysis window**. Entries that fall outside the window are not visible to clustering or scoring at all.
## Signal types
@@ -43,6 +127,12 @@ The hook emits these `type` values into the journal:
| `edit_churn` | same file edited 4× in window | file basename |
| `build_loop` | 2 build/test/compile commands fail in session | session |
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
| `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
| `file_reread` | same file Read ≥3× in the 10-tool window, ignoring offset/limit (escapes `retry_loop`'s argsHash dedup) | file basename |
| `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
| `task_completed` | UserPromptSubmit closes a run of ≥5 tool calls with ≥3 distinct tool kinds and 0 corrections | sorted `tool_kinds` tuple |
## Process
@@ -65,10 +155,22 @@ The hook emits these `type` values into the journal:
- `edit_churn`: cluster by file basename pattern (e.g. `*.test.ts`).
- `build_loop`: cluster by `session`.
- `subagent_dispatch_pattern`: cluster by `subagent_type`.
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
- `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
- `file_reread`: cluster by file basename (same offset-agnostic same-file re-Read pattern).
- `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
- `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
- `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
- `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). Single entry qualifies for `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with same tuple — without it, proposal queues, never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
- Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
- If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
- The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.) — sub-clusters do NOT replace it, they supplement it.
6. For each cluster qualifying under the rubric — ≥3 occurrences across ≥2 sessions, OR (for struggle types) ≥1 entry within a single session, OR (for `correction`) ≥3 occurrences across ≥2 cwds:
a. If cluster topic matches a rejected idea via the rejected-ideas fuzzy set (≥2 token overlap with rejection's `# Why`), skip with reason `"rejected-similar"`.
b. Pull ~20 messages of transcript context from `transcripts_root` to enrich. Never read full transcripts.
b1. **Causal diagnosis** (required for every proposal type): from the pulled context, draft a `# Diagnosis` block per the "Diagnosis drafting protocol". Cite ≥1 verbatim transcript quote within the `source_entries` window. If causation cannot be reconstructed, write `Mismatch: unclear` and apply `-1` confidence (rubric penalty). Diagnosis writes the proposal's narrative *before* the proposal body is drafted in step 6e.
c. **Solution synthesis** (when candidate type is `skill_new` AND cluster qualifies): pull additional ~30 messages around friction events (~50 messages total). Extract:
- Concrete trigger phrases the user says verbatim.
- Tools / files involved.
@@ -84,7 +186,7 @@ The hook emits these `type` values into the journal:
g. Apply feedback bias (step 1e) and multi-axis bonus.
h. **Record `source_entries`**: list every journal entry timestamp that fed this cluster. Goes in proposal frontmatter as a YAML block-form array (one `- "<ts>"` per line). The skill consumes this on apply/reject to archive matching entries out of `journal.jsonl` and into `journal/actioned-<id>.jsonl`.
i. Emit proposal file to `proposals_dir/`.
7. Emit punch list to stdout (last message): `{"new":N, "high_confidence":[...], "queued":[...], "skipped":[...]}`. The `cursor` field in `state.json` is vestigial as of v0.2.0 — do not read or write it.
7. Emit punch list to stdout (last message): `{"new":N, "high_confidence":[...], "queued":[...], "skipped":[...]}`.
## Skill overlap rule
@@ -139,10 +241,6 @@ Constraints:
- ≤80 lines of body content. Karpathy "Surgical".
- Slug MUST NOT collide with any existing skill name in `skills_root`.
When the main thread applies a `skill_new` proposal:
1. Creates `~/.claude/skills/<slug>/` directory.
2. Writes the `# Proposed change` body to `<slug>/SKILL.md`.
3. Tells the user: "skill `<slug>` written. Activates immediately on next user turn (CC v2.1.0+ auto-hot-reload)."
## Memory drafting protocol (for `memory` proposals)
@@ -152,10 +250,12 @@ Required structure:
```markdown
---
name: <human-readable name, ≤80 chars>
description: <one-line description used to decide future relevance — be specific, ≤200 chars>
type: user | feedback | project | reference
originSessionId: <session_id from journal entries that fed this cluster>
name: <slug — snake_case, MUST equal the target filename without `.md`, e.g. feedback_go_test_cache>
description: "<one-line used to decide future relevance — be specific, ≤200 chars>"
metadata:
node_type: memory
type: user | feedback | project | reference
originSessionId: <session_id from journal entries that fed this cluster>
---
<Body content per type, see CLAUDE.md memory schema:
@@ -165,46 +265,272 @@ originSessionId: <session_id from journal entries that fed this cluster>
- reference: pointer to external system + what's there.>
```
The frontmatter MUST match the live auto-memory schema exactly: `name` is the
slug (NOT a prose title), and `node_type`, `type`, `originSessionId` live under
a `metadata:` block (verify against an existing file in the target memory dir
before drafting — match its shape).
Constraints:
- Frontmatter fields `name`, `description`, `type` are **required**. Skill enforces this at apply time.
- `originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
- Top-level `name` + `description` and nested `metadata.node_type` (always `memory`) + `metadata.type` are **required**. Skill enforces this at apply time.
- `metadata.originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
- ≤50 LOC of body content. Surgical.
- Slug (used in `target` path filename) must not collide with any existing memory file.
- For `type=feedback` and `type=project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
- `name`/slug (also the `target` path filename) must not collide with any existing memory file.
- For `type: feedback` and `type: project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
## Diagnosis drafting protocol (required for every proposal)
Every proposal's body MUST include a `# Diagnosis` section between `# Why` and `# Assumptions`. It states the causal chain — *trigger → action → mismatch → outcome* — that motivates the proposed change, grounded in transcript evidence.
Required structure (exactly four labelled lines):
```markdown
# Diagnosis
**Trigger:** <what the user wanted / context the assistant was in — 1 sentence>
**Action:** <what the assistant did — 1 sentence, name specific tools/files when relevant>
**Mismatch:** <how the action diverged from the trigger — 1 sentence>
**Outcome:** <what surfaced the mismatch — user correction quote, error message, dead end — must include ≥1 verbatim quote ≤80 chars from transcript, in backticks>
```
Constraints:
1. ≤5 LOC of prose total.
2. ≥1 verbatim transcript quote, max 80 chars, wrapped in backticks.
3. The quote MUST appear within ~20 messages of one of the `source_entries` timestamps (transcript context window already pulled in step 6b).
4. No speculation — if causation is unclear from available context, write `Mismatch: unclear — see Outcome` and the cluster takes a `-1` rubric penalty (see rubric).
5. For win clusters (`correction_free_streak`, `clean_recovery`) where there is no failure: `Mismatch: None` is a valid value. Outcome cites the recovery quote or the silence ("no correction across N prompts" + closest journal `ts`).
Example — struggle cluster:
```markdown
# Diagnosis
**Trigger:** User asked to run Go tests in three different sessions, expected fresh results each time.
**Action:** Assistant ran `go test ./...` without `-count=1` flag.
**Mismatch:** Go's test cache returned stale passes from prior runs; assistant did not invalidate.
**Outcome:** User corrected with `"no use go test -count=1"` (s-aaa, 2026-05-10T10:00).
```
Example — win cluster:
```markdown
# Diagnosis
**Trigger:** Bash commands failed 3× with the same fingerprint; user did not intervene.
**Action:** Assistant switched from Bash to `Read` + `Edit` for the same goal, finished without further error.
**Mismatch:** None — recovery confirms the alternate tool is the right path here.
**Outcome:** Three clean PostToolUse events after the loop (`recovered_from: tool_error_loop`, s-bbb).
```
After drafting the four lines, set proposal frontmatter `diagnosis_summary` to a single sentence ≤120 chars derived from the **Mismatch** line — used for skim/search across `applied/` and `rejected/`.
## Win-driven `skill_edit` eligibility
A `skill_edit` proposal sets `auto_apply_eligible: true` ONLY when ALL hold:
1. `confidence ≥ 4`.
2. `cross_session_evidence == true`.
3. `# Why` cites ≥1 win-signal entry (`clean_recovery` or `correction_free_streak`) whose `active_skills` includes the target skill slug. Record this entry's `ts` in frontmatter field `win_evidence`.
4. Diff is append-only — verify no `-` lines on existing SKILL.md content.
5. Diff `+` lines ≤ 30.
6. Resulting SKILL.md size ≤ 2× current size. Record both byte counts in frontmatter fields `bytes_before`, `bytes_after`.
7. No entry in `applied_dir/` for the same `target` with `last_auto_edit` newer than 7 days ago (cooldown).
8. No entry in `rejected_dir/` for this `target` with `auto_apply_blacklist: true` newer than 30 days ago.
9. **Contradiction check passes.** Tokenize both the existing SKILL.md and the new appended section per the same tokenizer + stopword list as the skill-overlap rule. Search for negation tokens (`never`, `not`, `no`, `don't`, `avoid`, `forbid`, `stop`, `disable`) in the existing content; take a 6-token window around each match. If the new section contains an assertion token (`always`, `must`, `should`, `do`, `enable`, `yes`) whose surrounding 6-token window shares ≥2 content tokens with the existing negation window → flag as contradiction. Repeat in the inverse direction (negations in new section vs assertions in existing). On any flag: set `auto_apply_eligible: false` and add frontmatter field `contradiction_flag: "<one-line summary naming the negation token, the conflicting tokens, and the line in existing content where the negation appears>"`. Heuristic only — false positives queue for review, never silently auto-apply.
If any of (3)(9) fails: still emit the proposal, but `auto_apply_eligible: false` — main thread queues for review.
## Struggle-driven `skill_edit` eligibility
Skill-attribution sub-clustering (step 5b) produces struggle-driven `skill_edit` candidates: a sub-cluster of ≥3 struggle entries all naming the same `active_skills[0]` that exists in `skills_root`. These proposals are emitted but **ALWAYS queue**`auto_apply_eligible: false` regardless of confidence. Negative evidence on a skill is a weaker basis for self-modification than positive evidence (the skill may be active during friction caused by something else), so the human reviews every one.
A struggle-driven `skill_edit` proposal MUST:
1. Set `target` to the matched skill's `SKILL.md` path.
2. Cluster severity-sum ≥ 10 (same threshold as the +1 rubric bullet).
3. Sub-cluster names exactly one skill (no ambiguity across distinct `active_skills[0]` values).
4. `# Proposed change` is an append-only diff adding a `## When struggling` section (naive default body: a checkpoint-or-pause rule appropriate to the dominant signal — e.g. `dead_end` → "After 16 PostToolUse events without UserPromptSubmit, emit a one-line checkpoint summary before continuing.").
5. Frontmatter includes `struggle_evidence: "<ts of one source entry naming this skill>"` and `struggle_signals: [<list of signal types in the sub-cluster>]`. The win-driven `win_evidence` field is omitted.
6. Subject to the same Per-(skill, fingerprint) cooldown as win-driven `skill_edit`.
If gate (2) or (3) fails: skip the sub-cluster (the parent cluster still produces its umbrella proposal). The sub-cluster's `source_entries` overlap with the parent's — the apply pipeline handles dedup via the excluded-timestamps set.
## Per-(skill, fingerprint) cooldown
The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not on target_skill alone. A rejected/applied proposal for skill `X` with fingerprint `A` does NOT block future proposals for skill `X` with fingerprint `B`.
`proposal_fingerprint` is computed deterministically as `djb2(skill_slug + "\n" + signal_cluster_id + "\n" + normalized_diff_body)` returned as base36, where:
- `skill_slug` — target skill basename (or proposed slug for `skill_new`)
- `signal_cluster_id` — a **stable** cluster id derived from signal type + key (e.g. `tool_error_loop-ECONNREFUSED:5432`), NOT the ephemeral per-run trace id (`c1`). Stability matters: the same logical proposal must hash identically across `/reflect` runs or the cooldown can never match a prior applied/rejected record.
- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trimmed
Do NOT hand-compute the hash (an LLM cannot reproduce djb2 reliably). Run the canonical implementation (`computeProposalFingerprint()` in `adam-cooldown.mjs`) via Bash, then write the result into frontmatter:
```bash
node ~/.claude/adam/scripts/adam-cooldown.mjs --compute \
--skill <slug> --cluster <signal_cluster_id> --diff-file <file-with-Proposed-change-body>
# → {"fingerprint":"<djb2_base36>"} (diff body may also be piped on stdin)
```
Both apply-time and analyst-time *gate* checks then invoke `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`.
Backward compat: proposals from before this rubric version (no `proposal_fingerprint` field) are treated as `fingerprint = "legacy"`. The cooldown script matches legacy applied/rejected records against any query fingerprint for the same skill — i.e. coarse-grained gating until those records age out of their windows (7d / 30d).
## Scoring: task_completed dampener
Before scoring each cluster's confidence, multiply the cluster's urgency score by the `dampener` value reported by `adam-score.mjs` for the session the cluster originated from:
- `task_completed_count >= 3` in that session → dampener `0.5`
- `task_completed_count >= 1` in that session → dampener `0.75`
- otherwise → dampener `1.0`
Rationale: sessions that successfully closed several multi-tool tasks alongside the friction signal are noisier proposal sources than sessions that produced only friction. The dampener does not zero out signals; it down-weights urgency so cross-session friction beats single-session friction-with-recoveries.
The skill (`adam-self-improvement/SKILL.md` §1) runs `adam-score.mjs` immediately after `adam-window.mjs` and passes both outputs into the analyst's dispatch prompt.
## A/B effectiveness
Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. **`reinforcement` is the one exception — it is a positive-only ledger and is intentionally NOT A/B-tracked (see §"`reinforcement` proposals"), to avoid skewing regression detection.** Schema:
```json
{"applied_at":<ms>,"proposal_id":"<id>","proposal_type":"...","target_skill":"<slug>","proposal_fingerprint":"<hash>","originating_signals":[{"type":"<signal>","count":<N>,"session_ids":[...]}],"pre_window_days":7}
```
After ≥7 days, `~/.claude/adam/scripts/adam-ab-measure.mjs` reads each entry and compares signal counts in the 7-day window BEFORE `applied_at` against the 7-day window AFTER (raw journal counts — does NOT use `adam-window.mjs` filtering). Status assignment:
- `delta_pct = (post - pre) / pre * 100`
- `pre == 0``no_baseline` (cold start, no measurement possible)
- `delta_pct <= -25``improved`
- `-25 < delta_pct < 25``neutral`
- `delta_pct >= 25``regressed`
- entry younger than 7 days → `pending`
The `/reflect` skill runs `adam-ab-measure.mjs --format json` before dispatching this agent, filters to `status == "regressed"`, and passes the list as `ab_regressions` (each object has `proposal_id`, `target_skill`, `proposal_type`, `delta_pct`, `pre_count`, `post_count`).
**When `ab_regressions` is non-empty, you MUST emit a `## Regressions` section at the TOP of your output (above the proposals listing).** One bullet per regressed proposal listing `proposal_id`, `target_skill`, `delta_pct`. The skill auto-rolls back regressed proposals via `adam-rollback.mjs` before dispatching you — this section is your record of what was rolled back and why.
The clustering trace summary (see §"Clustering trace") adds an extra `regressions=<N>` key alongside `considered/emitted/skipped`. When no `ab_regressions` arrive (or list is empty), emit `regressions=0`.
## Keypoint matrix (MOSS §3.3/§4.2)
When running in `stage=diagnose`, you MUST produce a **keypoint matrix** alongside each batch diagnosis. This structured evaluation replaces ad-hoc confidence with per-capability scoring.
Capability dimensions (score each 02 per batch: 0=no signal, 1=partial, 2=strong evidence):
| dimension | description | positive signals | negative signals |
|---|---|---|---|
| `tool_selection` | correct tool chosen first try | low `retry_loop` | high `retry_loop`, `weak_agent` |
| `scope_discipline` | stays within requested scope | low `edit_churn`, low `dead_end` | high `edit_churn`, `dead_end`, `silent_drift` |
| `error_recovery` | recovers from errors without user help | `clean_recovery` | `error_after_recovery`, `tool_error_loop` |
| `first_attempt` | succeeds without corrections | `correction_free_streak` | `correction` |
| `build_reliability` | builds/tests pass on first try | `task_completed` with build tools | `build_loop` |
The matrix goes into the diagnosis output as `keypoints: {tool_selection: N, scope_discipline: N, ...}`. The implement stage uses it to:
1. Prioritize proposals targeting the weakest dimensions.
2. Include `keypoint_target: "<dimension>"` in proposal frontmatter.
3. Track dimension trends across `/reflect` runs (persisted in `~/.claude/adam/keypoint-history.jsonl`).
## Confidence rubric (deterministic — do NOT vibe)
Sum:
- Signal repeated ≥3× across ≥2 sessions: **+2**
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs``dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, file_reread:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
- Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
- Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
- Blast radius low (memory file or new isolated skill): **+1**
- Blast radius medium (new agent, new hook, edit existing skill): **0**
- Blast radius high (CLAUDE.md, settings.json hooks, edit agent, deletion): **-1**
- Diagnosis flags `Mismatch: unclear` (causation could not be reconstructed from transcript context): **-1**
- Blast radius: low **+1**, medium **0**, high **-1** (default per type — see Proposal types table)
- Surgical (one file, ≤50 LOC for non-skill_new; ≤80 LOC for skill_new): **+1**
- Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions): **-3**
`auto_apply_eligible: true` requires **all** of:
- `confidence ≥ 4`
- `blast_radius == "low"`
- `type ∈ {memory, skill_new}`
- `type ∈ {memory, skill_new, skill_edit}``skill_edit` additionally requires the win-driven gate (see "Win-driven `skill_edit` eligibility")
- `cross_session_evidence == true` — the +2 signal-repetition bonus came from the cross-session bullet (≥3× across ≥2 sessions). **Single-session-only struggle proposals always queue, never auto-apply, regardless of total confidence.** Record as frontmatter field `cross_session_evidence: true|false` on every proposal.
## Proposal types
| Type | Target | Default blast | Auto-apply? |
|---|---|---|---|
| `memory` | `~/.claude/projects/<encoded-home>/memory/*.md` | low | yes if conf≥4 AND cross_session |
| `memory` | `~/.claude/projects/-Users-nvm/memory/*.md` | low | yes if conf≥4 AND cross_session |
| `skill_new` | new dir under `~/.claude/skills/` | low | yes if conf≥4 AND cross_session |
| `skill_edit` | existing skill file | medium | no |
| `skill_edit` | existing skill file | medium | yes (win-driven only) if win-evidence + LOC + cooldown gates all pass (see "Win-driven skill_edit eligibility"); struggle-driven variant ALWAYS queues (see "Struggle-driven skill_edit eligibility") |
| `nudge` | append to `~/.claude/adam/active-nudges.json` | low | yes when `dead_end_count ≥ 3` in a single session (single-session evidence sufficient; skips cross-session gate). Does NOT modify skills/memories/CLAUDE.md — only seeds a SessionStart reminder for a future session. |
| `reinforcement` | append entry to `~/.claude/adam/reinforcements.jsonl` | low | yes if conf≥4 AND blast_radius=low (same gate as memory). Applies via `adam-apply-reinforcement.mjs`; appends one JSONL entry, no code/memory/skill changes. |
| `agent_new` | new file under `~/.claude/agents/` | medium | no |
| `agent_edit` | existing agent file | medium | no |
| `claude_md_edit` | `~/.claude/CLAUDE.md` | high | no |
| `hook_new` / `hook_edit` | `settings.json` hooks | high | no |
| `harness_edit` | adam's own scripts/agent/hooks (see "Harness self-modification") | high | **never** |
| `deletion` | any skill/agent (soft delete) | high | no |
### `nudge` proposals
A `nudge` proposal does NOT modify any persistent rubric/skill/memory artifact. Its sole side-effect is to append an entry to `~/.claude/adam/active-nudges.json` so the next SessionStart hook surfaces a one-line reminder to the user in a *different* session.
Trigger: `adam-nudge-eligibility.mjs --session <id>` returns `eligible: true` (i.e. ≥3 `dead_end` entries inside a single session). Distinguished from `skill_edit` precisely because there is no learning artifact to mutate — the action surfaces a checkpoint reminder, not a behavior change.
`active-nudges.json` entry shape (created by the skill at apply time):
```json
{
"kind": "dead_end_reminder",
"message": "adam: previous session hit 3 dead_ends — consider a checkpoint before continuing.",
"created_at": <ms>,
"expires_at_ts": <ms now + 7 days>,
"max_displays": 3,
"displays_used": 0,
"source_session": "<originating session_id>"
}
```
### `reinforcement` proposals
A `reinforcement` proposal is logged when `adam-score.mjs` reports `count >= 3` clean `task_completed` events citing the same `active_skills[0]` slug. Frontmatter MUST include `skill_slug`, `count`, `source_session`, `confidence`, `blast_radius: low`. Apply gate (`confidence >= 4 AND blast_radius == low`) is identical to the `memory` gate — when both hold, the skill invokes `~/.claude/adam/scripts/adam-apply-reinforcement.mjs <proposal-path>` which appends one JSON line to `~/.claude/adam/reinforcements.jsonl` of shape `{ts, skill_slug, count, source_session}`. No code/memory/skill modifications either side of the gate — reinforcements are a positive-only ledger, separate from `ab-tracking.jsonl` (A/B intentionally does NOT measure positive signals to avoid skewing regression detection).
Note that `task_completed` alone — without an adjacent negative signal cluster — is NOT a proposal source. It is a urgency *modifier* (see "Scoring: task_completed dampener") and a reinforcement input only.
### `harness_edit` proposals (MOSS §1 Table 1)
MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live in code rather than in any text artifact, an entire class of structural failure is physically unreachable from the text layer." This proposal type extends ADAM's evolution scope to its own harness.
**Allowed targets** (harness files that ADAM may propose edits to):
| target | what it controls |
|---|---|
| `~/.claude/adam/scripts/adam-observe.mjs` | signal detection regexes, thresholds, counters |
| `~/.claude/adam/scripts/adam-score.mjs` | severity divisors, dampener thresholds |
| `~/.claude/adam/scripts/adam-window.mjs` | per-signal sliding window durations |
| `~/.claude/adam/scripts/adam-batch.mjs` | evidence batching logic |
| `~/.claude/agents/adam.md` | this agent's own rubric, clustering, proposal rules |
| `~/.claude/hooks/adam-observe.mjs` | hook integration, event routing |
**Gates (all must hold — stricter than any other type):**
1. `confidence ≥ 5`
2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
3. `auto_apply_eligible: false`**always**. Harness edits are never auto-applied.
4. `blast_radius: high`
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "138 passed, 0 failed" (or current pass count). The skill runs this test before applying.
6. Change is surgical: ≤30 LOC diff, single file.
7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.
**When to propose `harness_edit`:**
- Signal detection misses a recurring friction pattern (false negative in adam-observe.mjs)
- A/B measurement shows systematic bias (e.g., windows too short/long in adam-window.mjs)
- Scoring thresholds produce consistently over/under-weighted proposals (adam-score.mjs)
- Batch clustering produces too-coarse or too-fine groupings (adam-batch.mjs)
**When NOT to propose `harness_edit`:**
- The fix is achievable via a text-mutable type (skill, memory, nudge)
- Evidence is from a single session only
- The change would affect test outcomes without clear improvement evidence
## Special handling
### CLAUDE.md edits
@@ -231,7 +557,7 @@ Filename: `proposals_dir/YYYY-MM-DD-NNN-<type>-<slug>.md` (NNN is daily counter
```markdown
---
id: YYYY-MM-DD-NNN
type: skill_new | memory | skill_edit | agent_new | agent_edit | claude_md_edit | hook_new | hook_edit | deletion
type: skill_new | memory | skill_edit | nudge | reinforcement | agent_new | agent_edit | claude_md_edit | hook_new | hook_edit | harness_edit | deletion
target: <absolute path — for skill_new, the will-be path: ~/.claude/skills/<slug>/SKILL.md>
confidence: <int>
blast_radius: low | medium | high
@@ -243,11 +569,33 @@ source_entries:
- "<journal entry ts that fed this cluster>"
- "<another ts>"
- "..."
# skill_edit / skill_new — required for cooldown gate (see "Per-(skill, fingerprint) cooldown" below)
proposal_fingerprint: "<djb2_base36 hash — compute via `adam-cooldown.mjs --compute`; see §Per-(skill, fingerprint) cooldown>"
target_skill: "<slug — populated for skill_edit (basename of target dir) and skill_new (proposed slug)>"
# A/B effectiveness — required on every proposal; consumed at apply time to seed ab-tracking.jsonl
originating_signals:
- {type: "<signal_type>", count: <N>, session_ids: ["<sid>", "..."]}
# skill_edit only — required when auto_apply_eligible: true
win_evidence: "<ts of triggering clean_recovery or correction_free_streak entry>"
bytes_before: <int>
bytes_after: <int>
# skill_edit only — populated when contradiction heuristic flags a conflict (sets auto_apply_eligible: false)
contradiction_flag: "<one-line summary or null>"
# optional — auto-populated from Diagnosis Mismatch line
diagnosis_summary: "<≤120 chars, single sentence>"
# keypoint matrix — which capability dimension this proposal targets (MOSS §4.2)
keypoint_target: "<tool_selection | scope_discipline | error_recovery | first_attempt | build_reliability>"
# harness_edit only — test command and expected output
test_command: "bash ~/.claude/adam/tests/run-tests.sh"
test_expected: "<N> passed, 0 failed"
---
# Why
<observed evidence: session ids, dates, quotes from transcript synthesis>
# Diagnosis
<four labelled lines per "Diagnosis drafting protocol": Trigger / Action / Mismatch / Outcome — Outcome must contain ≥1 backtick-wrapped transcript quote ≤80 chars>
# Assumptions
- <assumption 1>
- <assumption 2>
@@ -258,8 +606,8 @@ source_entries:
<for memory: full memory file body (frontmatter + content)>
<for others: unified diff or full file content; for deletion: soft-delete command>
# Overlap (skill_edit only)
<existing skill id, rule matched (name|description), overlapping tokens>
# Overlap
<conditional — see Skill overlap rule §6: only emitted for `skill_edit` proposals>
# Success criterion
<runnable check>
@@ -277,11 +625,53 @@ Print a single JSON line to stdout:
## What you must NOT do
- Do not read full transcripts — ~20 messages base context per cluster, +30 for skill_new solution synthesis (50 total cap).
- Do not call other agents.
- Do not write to `~/.claude/skills/`, `~/.claude/agents/`, `settings.json`, `CLAUDE.md`, or any existing skill/agent file directly. All changes go through proposal files for main-thread review and apply.
- Do not write to `~/.claude/skills/`, `~/.claude/agents/`, `settings.json`, `CLAUDE.md`, adam scripts, or any existing skill/agent/harness file directly. All changes go through proposal files for main-thread review and apply. This includes `harness_edit` proposals — you draft the diff, the skill applies it after test verification.
- Do not delete files. Deletion proposals describe a soft-move; the main thread executes it.
- Do not write outside `proposals_dir/` and `state_path`.
- Do not propose anything matching a `rejected/` entry (≥2 token overlap with rejection's `# Why`).
- Do not invent trigger phrases for `skill_new` — every trigger must come from observed user input.
- Do not stack the cross-session and single-session repetition bonuses — pick whichever qualifies, never both.
## Clustering trace (always emit)
After your proposals are written and BEFORE the final punch-list JSON line, you MUST emit a fenced code block tagged ` ```trace ` containing one line per cluster considered during this pass. This is mandatory regardless of whether any proposals were emitted, and regardless of any flags. The skill controls whether to SHOW this block to the user; you always produce it.
Line format (one cluster per line, all four pipe-separated chunks required):
```
<cluster_id> | signal=<type> count=<N> sessions=<M> | gates: threshold=<pass|fail:<reason>>, cross_session=<pass|fail>, window=<in:<N>/out:<M>>, contradiction=<none|vetoed:[[memory-name]]> | decision: <proposal_emitted:<type>|skipped:<reason>>
```
Field semantics:
- `cluster_id` — short stable identifier you assign per cluster this pass (e.g. `c1`, `c2`, …, or `<signal>-<short-key>`). Used by humans + adam-explain.mjs.
- `signal=<type>` — the journal signal type (e.g. `correction`, `dead_end`).
- `count=<N>` — number of journal entries that fell into this cluster.
- `sessions=<M>` — distinct session ids contributing.
- `gates:` — four sub-fields, all required:
- `threshold=pass` if the cluster met the "≥3 across ≥2 sessions" (or single-session struggle) rubric gate, else `fail:<short reason>` (e.g. `fail:only_1_session`, `fail:count_below_3`).
- `cross_session=pass|fail` — boolean restatement matching the `cross_session_evidence` rubric field.
- `window=in:<N>/out:<M>` — entries that survived per-signal sliding window vs entries dropped as stale. Pre-filter from `adam-window.mjs` makes `out` usually 0; record what you observed.
- `contradiction=none` for non-skill_edit clusters; for `skill_edit` set `vetoed:[[<memory-or-skill-name>]]` when the contradiction heuristic flagged a conflict, else `none`.
- `decision:` — one of:
- `proposal_emitted:<type>` (e.g. `proposal_emitted:memory`, `proposal_emitted:skill_new`).
- `skipped:<reason>` where reason is a single token from `{threshold, contradiction, window, rejected-similar, type-bias, deletion-criteria, claude-md-scope, overlap, other}`.
After the cluster lines, emit exactly one summary line (this trailing line is REQUIRED — adam-explain.mjs falls back to synthesising it from the cluster lines if you omit it, but you should always write it):
```
SUMMARY: considered=<N> emitted=<M> skipped=<N-M> regressions=<R> reasons={threshold:X, contradiction:Y, window:Z, other:W}
```
`reasons` keys: the same skip-reason tokens used in `decision:`; values are counts; include all four canonical keys (`threshold`, `contradiction`, `window`, `other`) even when zero — `other` is the catch-all for any reason not in the first three. `regressions=<R>` is the count of entries with `status == "regressed"` in the `ab_regressions` input (0 when empty/absent — see §"A/B effectiveness").
Worked example (4 clusters, 2 emitted, 2 skipped):
```trace
c1 | signal=correction count=5 sessions=3 | gates: threshold=pass, cross_session=pass, window=in:5/out:0, contradiction=none | decision: proposal_emitted:memory
c2 | signal=dead_end count=1 sessions=1 | gates: threshold=pass, cross_session=fail, window=in:1/out:0, contradiction=none | decision: proposal_emitted:skill_new
c3 | signal=retry_loop count=2 sessions=1 | gates: threshold=fail:count_below_3, cross_session=fail, window=in:2/out:0, contradiction=none | decision: skipped:threshold
c4 | signal=tool_error_loop count=4 sessions=2 | gates: threshold=pass, cross_session=pass, window=in:4/out:6, contradiction=none | decision: skipped:window
SUMMARY: considered=4 emitted=2 skipped=2 regressions=0 reasons={threshold:1, contradiction:0, window:1, other:0}
```
Clusters that were filtered out entirely BEFORE clustering (e.g. excluded by `applied/*.md` `source_entries`) do not appear here — only clusters that the agent actually considered as candidates. Note: the trace lives entirely in your final assistant message, alongside the punch-list JSON; nothing else writes to disk on the agent side.
+13
View File
@@ -0,0 +1,13 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Dark-background variant.</desc>
<g stroke="#f0f6fc">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="#f0f6fc">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

After

Width:  |  Height:  |  Size: 763 B

+13
View File
@@ -0,0 +1,13 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Light-background variant.</desc>
<g stroke="#24292f">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="#24292f">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

After

Width:  |  Height:  |  Size: 764 B

+19
View File
@@ -0,0 +1,19 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Adapts to light/dark via embedded media query + currentColor fallback.</desc>
<style>
svg { color: #24292f; }
@media (prefers-color-scheme: dark) {
svg { color: #f0f6fc; }
}
</style>
<g stroke="currentColor">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="currentColor">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

After

Width:  |  Height:  |  Size: 946 B

+209 -8
View File
@@ -1,15 +1,216 @@
#!/usr/bin/env node
import { readdirSync } from "node:fs";
// adam-nudge.mjs — SessionStart hook. Prints reminders:
// 1. Pending proposals (≥3 queued in adam/proposals/).
// 2. Cross-session nudges (entries in adam/active-nudges.json whose
// source_session differs from the current session and that haven't
// expired or exhausted their max_displays).
// 3. Pending local-edit upgrades (`.adam-new` sidecars).
// 4. New-release notice: if a newer GitHub release exists than the installed
// `.version`, print a notify-only one-line update prompt. Cached + checked
// at most once/day, network call hard-capped at 1.5s, fully best-effort —
// never blocks SessionStart. Opt out with ADAM_NO_UPDATE_CHECK=1.
// NOTE: notify-only by design — applying an update re-runs install.sh,
// which resets ADAM's own /reflect-applied skill edits. The user chooses
// when to accept that, so we never auto-install.
import { readdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
const PROPOSALS = join(homedir(), ".claude", "adam", "proposals");
const HOME = process.env.HOME || homedir();
const CLAUDE_ROOT = join(HOME, ".claude");
const ADAM_ROOT = join(CLAUDE_ROOT, "adam");
const PROPOSALS = join(ADAM_ROOT, "proposals");
const NUDGES_FILE = join(ADAM_ROOT, "active-nudges.json");
const STATE_FILE = join(ADAM_ROOT, "state.json");
const VERSION_FILE = join(ADAM_ROOT, ".version");
const UPDATE_CHECK_FILE = join(ADAM_ROOT, ".update-check.json");
const THRESHOLD = 3;
const UPDATE_CHECK_INTERVAL_MS = 24 * 60 * 60 * 1000;
const UPDATE_FETCH_TIMEOUT_MS = 1500;
const RELEASES_API = "https://api.github.com/repos/lukaszraczylo/claude-adam/releases/latest";
const INSTALL_ONELINER = "curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash";
try {
const files = readdirSync(PROPOSALS).filter(f => f.endsWith(".md"));
if (files.length >= THRESHOLD) {
process.stdout.write(`adam: ${files.length} proposals queued. Run /reflect to review.\n`);
// Known installable paths (mirrors install.sh copy_file list). Checking a
// fixed shortlist keeps SessionStart latency under control vs full FS walk.
const PENDING_CHECK_PATHS = [
"hooks/adam-observe.mjs",
"hooks/adam-nudge.mjs",
"agents/adam.md",
"skills/adam-self-improvement/SKILL.md",
"commands/reflect.md",
"adam/scripts/adam-archive.mjs",
"adam/scripts/adam-upgrade.mjs",
"adam/scripts/adam-window.mjs",
"adam/scripts/adam-explain.mjs",
"adam/scripts/adam-nudge-eligibility.mjs",
"adam/scripts/adam-cooldown.mjs",
"adam/scripts/adam-score.mjs",
"adam/scripts/adam-ab-measure.mjs",
"adam/scripts/adam-apply-reinforcement.mjs",
"adam/scripts/adam-utils.mjs",
"adam/scripts/adam-batch.mjs",
"adam/scripts/adam-rollback.mjs",
"adam/tests/run-tests.sh",
];
function readJson(path, fallback) {
if (!existsSync(path)) return fallback;
try { return JSON.parse(readFileSync(path, "utf8")); } catch { return fallback; }
}
function readSessionInput() {
// SessionStart payload arrives on stdin; capture session_id if present.
// We don't block on stdin — best-effort, non-blocking.
try {
const buf = readFileSync(0, "utf8");
if (!buf) return null;
const parsed = JSON.parse(buf);
return parsed && typeof parsed.session_id === "string" ? parsed.session_id : null;
} catch { return null; }
}
function emitProposalReminder() {
try {
const PROPOSAL_RE = /^\d{4}-\d{2}-\d{2}-\d{3}-/;
const files = readdirSync(PROPOSALS).filter((f) => PROPOSAL_RE.test(f) && f.endsWith(".md"));
if (files.length >= THRESHOLD) {
process.stdout.write(`adam: ${files.length} proposals queued. Run /reflect to review.\n`);
}
} catch { /* proposals dir absent → silent */ }
}
function emitActiveNudges(currentSession) {
if (!existsSync(NUDGES_FILE)) return;
const raw = readJson(NUDGES_FILE, null);
if (!Array.isArray(raw)) return;
const now = Date.now();
const kept = [];
let mutated = false;
for (const entry of raw) {
if (!entry || typeof entry !== "object") { mutated = true; continue; }
const expires = Number(entry.expires_at_ts || 0);
if (!expires || expires <= now) { mutated = true; continue; }
const sourceSession = entry.source_session || "";
const max = Number(entry.max_displays || 0);
const used = Number(entry.displays_used || 0);
if (max > 0 && used >= max) { mutated = true; continue; }
// Cross-session gate: only print when current session differs.
if (sourceSession && currentSession && sourceSession === currentSession) {
kept.push(entry);
continue;
}
if (typeof entry.message === "string" && entry.message) {
process.stdout.write(entry.message + "\n");
const nextUsed = used + 1;
mutated = true;
if (max > 0 && nextUsed >= max) continue; // drop after exhaustion
kept.push({ ...entry, displays_used: nextUsed });
} else {
kept.push(entry);
}
}
} catch {}
process.exit(0);
if (mutated) {
try { writeFileSync(NUDGES_FILE, JSON.stringify(kept, null, 2)); } catch { /* swallow */ }
}
}
function emitPendingUpgrades() {
// Cheap: stat a fixed shortlist of `.adam-new` candidates. Non-fatal.
try {
let count = 0;
for (const rel of PENDING_CHECK_PATHS) {
const p = join(CLAUDE_ROOT, `${rel}.adam-new`);
try {
if (existsSync(p)) count++;
} catch { /* per-path swallow */ }
}
if (count > 0) {
process.stdout.write(
`[adam] ${count} pending upgrade(s). Review: node ~/.claude/adam/scripts/adam-upgrade.mjs --list\n`
);
}
} catch { /* never break SessionStart */ }
}
// --- update notifier (notify-only; see header note) ---
function readVersion() {
try { return readFileSync(VERSION_FILE, "utf8").trim() || null; } catch { return null; }
}
// Parse "vX.Y.Z" (leading v optional; pre-release/build suffix ignored).
function parseSemver(s) {
if (typeof s !== "string") return null;
const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}
// isNewer(a, b): true iff version a is strictly newer than b. Unparseable → false.
function isNewer(a, b) {
const pa = parseSemver(a), pb = parseSemver(b);
if (!pa || !pb) return false;
for (let i = 0; i < 3; i++) { if (pa[i] !== pb[i]) return pa[i] > pb[i]; }
return false;
}
async function fetchLatestTag() {
// Best-effort, hard-capped. Any failure (offline / timeout / rate-limit /
// parse / fetch-unavailable) returns null and the caller silently skips.
try {
if (typeof fetch !== "function") return null;
const ctrl = new AbortController();
const timer = setTimeout(() => ctrl.abort(), UPDATE_FETCH_TIMEOUT_MS);
let tag = null;
try {
const res = await fetch(RELEASES_API, {
signal: ctrl.signal,
headers: { "User-Agent": "claude-adam-nudge", "Accept": "application/vnd.github+json" },
});
if (res && res.ok) {
const j = await res.json();
if (j && typeof j.tag_name === "string") tag = j.tag_name;
}
} finally { clearTimeout(timer); }
return tag;
} catch { return null; }
}
function printUpdateNudge(latest, installed) {
process.stdout.write(
`[adam] update available: ${installed}${latest}. Apply: ${INSTALL_ONELINER}\n` +
` (re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)\n`
);
}
async function emitUpdateCheck() {
if (process.env.ADAM_NO_UPDATE_CHECK) return; // explicit opt-out
const installed = readVersion();
if (!installed) return; // no marker → nothing to compare
const cache = readJson(UPDATE_CHECK_FILE, {}) || {};
const now = Date.now();
let nudged = false;
// Instant nudge from cache (no network).
if (cache.latest && isNewer(cache.latest, installed)) { printUpdateNudge(cache.latest, installed); nudged = true; }
// Refresh cache at most once/day, best-effort — drives the nudge on the NEXT run.
if (!cache.last_check || (now - Number(cache.last_check)) > UPDATE_CHECK_INTERVAL_MS) {
const latest = await fetchLatestTag();
const next = { last_check: now, latest: latest || cache.latest || null };
try { writeFileSync(UPDATE_CHECK_FILE, JSON.stringify(next)); } catch { /* swallow */ }
if (latest && !nudged && isNewer(latest, installed)) printUpdateNudge(latest, installed);
}
}
async function main() {
const stdinSession = readSessionInput();
const stateSession = (() => {
const st = readJson(STATE_FILE, null);
return st && typeof st.session_id === "string" ? st.session_id : null;
})();
const currentSession = stdinSession || stateSession || "";
emitProposalReminder();
emitActiveNudges(currentSession);
emitPendingUpgrades();
await emitUpdateCheck();
}
main().catch(() => { /* never block SessionStart */ }).finally(() => process.exit(0));
+388 -24
View File
@@ -14,11 +14,85 @@ const JOURNAL = join(ROOT, "journal.jsonl");
const STATE = join(ROOT, "state.json");
const USAGE = join(ROOT, "usage.json");
const JOURNAL_DIR = join(ROOT, "journal");
// Safety fuse only — primary rotation is weekly (ISO Monday 00:00 UTC).
// If active journal exceeds this even mid-week, force-rotate to avoid runaway growth.
// Override via $ADAM_MAX_JOURNAL_BYTES (used by tests).
const MAX_JOURNAL_BYTES = Number(process.env.ADAM_MAX_JOURNAL_BYTES) || 50 * 1024 * 1024;
const CORRECTION_RE = /\b(no|stop|don't|don\'t|wrong|actually|nope|undo|revert)\b/i;
// Strong-correction tokens: any single occurrence in a prompt is a correction.
// Weak tokens (no/actually/wait) require co-occurrence with a negation/contrast
// token within an 8-token window — see isCorrection() below.
const CORRECTION_RE = /\b(stop|don't|don\'t|wrong|nope|undo|revert|incorrect|nevermind|never\s+mind|disregard|redo)\b|that's\s+wrong|hold\s+on|wait\s+wait|try\s+again|different\s+approach|that's\s+not\s+what\s+i\s+meant|not\s+what\s+i\s+wanted|start\s+over|go\s+back/i;
const WEAK_CORRECTION_TOKENS = new Set(["no", "actually", "wait"]);
const NEGATION_RE = /^(not|wrong|but|isn't|isn\'t|didn't|didn\'t|aren't|aren\'t|won't|won\'t|shouldn't|shouldn\'t|don't|don\'t|nope|bad|broken|fail|fails|failed|failing)$/i;
const WEAK_WINDOW = 8;
function isCorrection(text) {
if (!text || typeof text !== "string") return false;
if (CORRECTION_RE.test(text)) return true;
// Weak-token path: token must co-occur with a negation/contrast within WEAK_WINDOW tokens.
const tokens = text.toLowerCase().split(/\s+/).map(t => t.replace(/^[^\w']+|[^\w']+$/g, "")).filter(Boolean);
for (let i = 0; i < tokens.length; i++) {
if (!WEAK_CORRECTION_TOKENS.has(tokens[i])) continue;
const lo = Math.max(0, i - WEAK_WINDOW);
const hi = Math.min(tokens.length - 1, i + WEAK_WINDOW);
for (let j = lo; j <= hi; j++) {
if (j === i) continue;
if (NEGATION_RE.test(tokens[j])) return true;
}
}
return false;
}
// Canonical error codes. Surface text → code mapping below.
const ERROR_CODES = new Set([
"ENOENT", "ECONNREFUSED", "ETIMEDOUT", "EACCES", "EPERM", "EADDRINUSE",
"ENOTFOUND", "EISDIR", "ENOTDIR", "EEXIST", "EMFILE", "EPIPE", "ECONNRESET"
]);
const ERROR_CODE_RE = /\b(ENOENT|ECONNREFUSED|ETIMEDOUT|EACCES|EPERM|EADDRINUSE|ENOTFOUND|EISDIR|ENOTDIR|EEXIST|EMFILE|EPIPE|ECONNRESET)\b/;
// Phrase → code mapping. First match wins; order matters.
const ERROR_PHRASE_MAP = [
[/no such file or directory/i, "ENOENT"],
[/connection refused/i, "ECONNREFUSED"],
[/permission denied/i, "EACCES"],
[/address already in use/i, "EADDRINUSE"],
[/connection reset/i, "ECONNRESET"],
[/operation timed out/i, "ETIMEDOUT"],
[/name resolution|getaddrinfo/i, "ENOTFOUND"],
];
function normalizeErrorText(text) {
if (!text || typeof text !== "string") return "";
let s = text;
// ISO timestamps first (contain digits we'd otherwise strip individually).
s = s.replace(/\d{4}-\d{2}-\d{2}T[\d:.Z+-]+/g, " ");
// Windows paths.
s = s.replace(/[A-Z]:\\[^\s]+/g, " ");
// Absolute POSIX paths.
s = s.replace(/\/[^\s:]+/g, " ");
// Hex addresses.
s = s.replace(/0x[0-9a-f]+/gi, " ");
// Unix epoch (seconds or ms): 10-13 digit runs.
s = s.replace(/\b\d{10,13}\b/g, " ");
// Line/col refs.
s = s.replace(/:\d+(?::\d+)?/g, " ");
// UUIDs.
s = s.replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, " ");
// Large integers (>6 digits) that survived above.
s = s.replace(/\b\d{7,}\b/g, " ");
// Lowercase + collapse whitespace.
s = s.toLowerCase().replace(/\s+/g, " ").trim();
return s.slice(0, 80);
}
const ERROR_RE = /\b(error|failed|exception|traceback|denied|cannot|unable to|not found|undefined|nullpointer|typeerror|syntaxerror|panic|fatal|enoent|econnrefused|etimedout|eaccess|segfault|crashed|uncaught)\b/i;
const BUILD_RE = /\b(build|compile|make|gradle|cargo|tsc|webpack|vite|rollup|pytest|jest|mocha|vitest|go\s+test|npm\s+test|yarn\s+test|npm\s+run\s+build|yarn\s+build|ctest|ninja|bazel)\b/i;
const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit", "NotebookEdit"]);
const READ_ONLY_TOOLS = new Set([
"Read", "Grep", "Glob", "ToolSearch", "WebFetch", "WebSearch",
"mcp__filepuff__file_read", "mcp__filepuff__file_search",
"mcp__filepuff__find_definition", "mcp__filepuff__find_references",
"mcp__filepuff__ast_query", "mcp__filepuff__symbol_at", "mcp__filepuff__ping",
]);
const WINDOW_SIZE = 10;
const RETRY_THRESHOLD = 3;
const AGENT_RESPAWN_THRESHOLD = 2;
@@ -28,6 +102,22 @@ const DEAD_END_THRESHOLD = 8;
const EDIT_CHURN_THRESHOLD = 4;
const BUILD_LOOP_THRESHOLD = 2;
const SUBAGENT_DISPATCH_THRESHOLD = 3;
const CORRECTION_FREE_THRESHOLD = 5;
const CLEAN_RECOVERY_WINDOW = 3;
const SILENT_DRIFT_THRESHOLD = 5;
const FILE_REREAD_THRESHOLD = 3;
const ERROR_AFTER_RECOVERY_WINDOW = 5;
const RECENT_RECOVERIES_MAX = 3;
const STRUGGLE_TYPES = new Set([
"tool_error_loop", "dead_end", "retry_loop", "weak_agent",
"edit_churn", "build_loop", "silent_drift", "error_after_recovery",
"file_reread",
]);
const ACTIVE_SKILLS_LOOKBACK = 10;
const TASK_TOOL_MIN = 5;
const TASK_DIVERSITY_MIN = 3;
const CONTEXT_RING_SIZE = 8;
const CONTEXT_EXCERPT_LEN = 200;
const STATE_MAX_BYTES = 1_000_000;
function safeRead(path, fallback) {
@@ -38,14 +128,69 @@ function safeWrite(path, obj) {
try { writeFileSync(path, JSON.stringify(obj)); } catch {}
}
function rotateIfLarge(path, max) {
// ISO-8601 week: returns { year, week } for a Date (UTC).
// Week 1 = the week containing the first Thursday of the year (Monday-based weeks).
function isoWeek(date) {
const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
// Shift to Thursday in current week (ISO week-numbering year tracks the Thursday).
const day = d.getUTCDay() || 7; // 1..7, Mon=1..Sun=7
d.setUTCDate(d.getUTCDate() + 4 - day);
const isoYear = d.getUTCFullYear();
const yearStart = new Date(Date.UTC(isoYear, 0, 1));
const week = Math.ceil((((d - yearStart) / 86400000) + 1) / 7);
return { year: isoYear, week };
}
function isoWeekTag(date) {
const { year, week } = isoWeek(date);
return `${year}-W${String(week).padStart(2, "0")}`;
}
function firstEntryTs(path) {
try {
if (existsSync(path) && statSync(path).size > max) {
mkdirSync(JOURNAL_DIR, { recursive: true });
const today = new Date().toISOString().slice(0, 10);
const dest = join(JOURNAL_DIR, `${today}-${Date.now()}.jsonl`);
renameSync(path, dest);
const buf = readFileSync(path, "utf8");
const nl = buf.indexOf("\n");
const firstLine = nl === -1 ? buf : buf.slice(0, nl);
if (!firstLine.trim()) return null;
const obj = JSON.parse(firstLine);
return obj && typeof obj.ts === "string" ? obj.ts : null;
} catch { return null; }
}
// Weekly ISO rotation + size safety fuse.
// - If active journal's first entry is in a different ISO week than now, rotate to
// journal/<that-entry's-iso-week>.jsonl and start fresh.
// - If active journal exceeds MAX_JOURNAL_BYTES, force-rotate even mid-week
// using the current ISO week tag (suffixed with timestamp to avoid clobber).
function rotateIfNeeded(path) {
try {
if (!existsSync(path)) return;
const size = statSync(path).size;
if (size === 0) return;
const now = new Date();
const currentTag = isoWeekTag(now);
const firstTs = firstEntryTs(path);
let rotate = false;
let destTag = null;
if (firstTs) {
const firstTag = isoWeekTag(new Date(firstTs));
if (firstTag !== currentTag) {
rotate = true;
destTag = firstTag;
}
}
if (!rotate && size > MAX_JOURNAL_BYTES) {
rotate = true;
destTag = `${currentTag}-${Date.now()}`; // safety-fuse: keep mid-week rotations unique
}
if (!rotate) return;
mkdirSync(JOURNAL_DIR, { recursive: true });
let dest = join(JOURNAL_DIR, `${destTag}.jsonl`);
if (existsSync(dest)) {
// Append-merge collision (rare: two mid-week safety-fuse rotations in same ms).
dest = join(JOURNAL_DIR, `${destTag}-${Date.now()}.jsonl`);
}
renameSync(path, dest);
} catch {}
}
@@ -59,7 +204,7 @@ function readStdin() {
}
function appendJournal(entry) {
rotateIfLarge(JOURNAL, STATE_MAX_BYTES * 5);
rotateIfNeeded(JOURNAL);
try {
appendFileSync(JOURNAL, JSON.stringify(entry) + "\n");
} catch {}
@@ -77,6 +222,31 @@ function readUsage(name) {
return usage[name] || 0;
}
function pushActivity(state, kind, name, ts) {
state.activity_ring.push({ kind, name, ts });
if (state.activity_ring.length > ACTIVE_SKILLS_LOOKBACK) state.activity_ring.shift();
}
function activeNames(state, kind) {
const seen = new Set();
for (const e of state.activity_ring) if (e.kind === kind) seen.add(e.name);
return [...seen];
}
function excerpt(text, len) {
if (!text || typeof text !== "string") return null;
return text.length > len ? text.slice(0, len) + "…" : text;
}
function pushContext(state, entry) {
state.context_ring.push(entry);
if (state.context_ring.length > CONTEXT_RING_SIZE) state.context_ring.shift();
}
function snapshotContext(state) {
return state.context_ring.length ? state.context_ring.slice() : undefined;
}
function errorFingerprint(toolResponse) {
if (!toolResponse) return null;
let text = "";
@@ -90,14 +260,34 @@ function errorFingerprint(toolResponse) {
}
if (!text) return null;
text = text.slice(0, 4000);
const isError = (toolResponse && toolResponse.is_error === true) || ERROR_RE.test(text);
// ERROR_RE fallback covers tools that omit `is_error` entirely (text-only
// responses, third-party tools). Explicit `is_error: false` is honored as-is
// — the regex is NOT used to second-guess a tool that already declared success.
const isError = toolResponse.is_error === true ||
(toolResponse.is_error === undefined && ERROR_RE.test(text));
if (!isError) return null;
const m = text.match(ERROR_RE);
const idx = m && typeof m.index === "number" ? m.index : 0;
const start = Math.max(0, idx - 20);
const slice = text.slice(start, start + 80).toLowerCase().replace(/\s+/g, " ").trim();
if (!slice) return null;
return djb2(slice);
// 1. Try canonical code (literal token first, then phrase mapping).
let code = null;
const codeMatch = text.match(ERROR_CODE_RE);
if (codeMatch && ERROR_CODES.has(codeMatch[1])) {
code = codeMatch[1];
} else {
for (const [re, mapped] of ERROR_PHRASE_MAP) {
if (re.test(text)) { code = mapped; break; }
}
}
// 2. When canonical code matched, the bucket key IS the code — residual
// surface text (ports, hostnames, syscall names) varies across instances
// of the same root cause, so we hash a fixed sentinel for stability.
// When no code matched, normalize residual and hash it for the raw bucket.
if (code) {
return `${code}:${djb2(code)}`;
}
const normalized = normalizeErrorText(text);
if (!normalized) return null;
return `raw:${djb2(normalized)}`;
}
function resetFrictionCounters(state) {
@@ -108,12 +298,23 @@ function resetFrictionCounters(state) {
state.edit_churn_emitted = {};
state.build_failure_count = 0;
state.build_loop_emitted = false;
state.silentDriftCounter = 0;
state.silentDriftEmitted = false;
}
function resetSessionLocal(state) {
resetFrictionCounters(state);
state.session_subagents = {};
state.subagent_dispatch_emitted = {};
state.correctionFreeCounter = 0;
state.recoveryWatch = null;
state.recentRecoveries = [];
state.session_post_count = 0;
state.tool_window = [];
state.context_ring = [];
state.task_tool_kinds = {};
state.task_tool_count = 0;
state.task_corrections = 0;
}
function ensureStateDefaults(state) {
@@ -127,12 +328,27 @@ function ensureStateDefaults(state) {
if (typeof state.build_loop_emitted !== "boolean") state.build_loop_emitted = false;
if (!state.session_subagents || typeof state.session_subagents !== "object") state.session_subagents = {};
if (!state.subagent_dispatch_emitted || typeof state.subagent_dispatch_emitted !== "object") state.subagent_dispatch_emitted = {};
if (typeof state.correctionFreeCounter !== "number") state.correctionFreeCounter = 0;
if (state.recoveryWatch === undefined) state.recoveryWatch = null;
if (!Array.isArray(state.activity_ring)) state.activity_ring = [];
if (!state.task_tool_kinds || typeof state.task_tool_kinds !== "object") state.task_tool_kinds = {};
if (typeof state.task_tool_count !== "number") state.task_tool_count = 0;
if (typeof state.task_corrections !== "number") state.task_corrections = 0;
if (typeof state.silentDriftCounter !== "number") state.silentDriftCounter = 0;
if (typeof state.silentDriftEmitted !== "boolean") state.silentDriftEmitted = false;
if (!Array.isArray(state.recentRecoveries)) state.recentRecoveries = [];
if (typeof state.session_post_count !== "number") state.session_post_count = 0;
if (!Array.isArray(state.context_ring)) state.context_ring = [];
}
function main() {
const input = readStdin();
if (!input || typeof input !== "object") return;
// Weekly rotation check at hook entry — ensures the active journal rolls over
// even if this invocation appends nothing.
rotateIfNeeded(JOURNAL);
const event = input.hook_event_name;
const session = input.session_id || "unknown";
const cwd = input.cwd || process.cwd();
@@ -147,7 +363,8 @@ function main() {
if (event === "UserPromptSubmit") {
const prompt = (input.prompt || "").slice(0, 200);
if (CORRECTION_RE.test(prompt)) {
pushContext(state, { event: "user", prompt: excerpt(prompt, CONTEXT_EXCERPT_LEN), ts });
if (isCorrection(prompt)) {
const last = state.tool_window[state.tool_window.length - 1] || {};
appendJournal({
ts, session, cwd, type: "correction",
@@ -155,16 +372,47 @@ function main() {
prev_tool: last.tool || null,
prev_file: last.file || null,
});
state.correctionFreeCounter = 0;
state.task_corrections += 1;
} else {
state.correctionFreeCounter += 1;
if (state.correctionFreeCounter >= CORRECTION_FREE_THRESHOLD) {
appendJournal({
ts, session, cwd, type: "correction_free_streak",
streak: state.correctionFreeCounter,
active_skills: activeNames(state, "skill"),
active_agents: activeNames(state, "agent"),
});
state.correctionFreeCounter = 0;
}
}
// Evaluate prior task (work between previous UserPromptSubmit and this one).
const taskKinds = Object.keys(state.task_tool_kinds);
if (state.task_tool_count >= TASK_TOOL_MIN &&
taskKinds.length >= TASK_DIVERSITY_MIN &&
state.task_corrections === 0) {
appendJournal({
ts, session, cwd, type: "task_completed",
tool_count: state.task_tool_count,
tool_kinds: taskKinds,
active_skills: activeNames(state, "skill"),
active_agents: activeNames(state, "agent"),
});
}
state.task_tool_kinds = {};
state.task_tool_count = 0;
state.task_corrections = 0;
resetFrictionCounters(state);
} else if (event === "PreToolUse") {
const tool = input.tool_name;
if (tool === "Skill") {
const name = (input.tool_input && (input.tool_input.skill || input.tool_input.skill_name)) || "unknown";
bumpUsage(`skill:${name}`);
pushActivity(state, "skill", name, ts);
} else if (tool === "Agent") {
const name = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
bumpUsage(`agent:${name}`);
pushActivity(state, "agent", name, ts);
state.session_subagents[name] = (state.session_subagents[name] || 0) + 1;
const cumulative = readUsage(`agent:${name}`);
const sessionCount = state.session_subagents[name];
@@ -182,6 +430,34 @@ function main() {
const argsHash = djb2(JSON.stringify(input.tool_input || {}));
const file = (input.tool_input && (input.tool_input.file_path || input.tool_input.path)) || null;
const toolResponse = input.tool_response;
const respExcerpt = (() => {
if (!toolResponse) return null;
const text = typeof toolResponse === "string" ? toolResponse
: typeof toolResponse.content === "string" ? toolResponse.content
: null;
return excerpt(text, CONTEXT_EXCERPT_LEN);
})();
pushContext(state, {
event: "tool", tool, ts,
input_excerpt: excerpt(JSON.stringify(input.tool_input || {}), CONTEXT_EXCERPT_LEN),
response_excerpt: respExcerpt,
is_error: !!(toolResponse && toolResponse.is_error),
});
let struggleEmittedThisTurn = null;
const emit = (entry) => {
if (STRUGGLE_TYPES.has(entry.type)) {
entry.context_window = snapshotContext(state);
// Struggle signals carry the active skill set so the analyst can run
// skill-attribution sub-clustering (agents/adam.md §5b) and so silent_drift
// — whose primary cluster key IS active_skills[0] — clusters correctly.
if (entry.active_skills === undefined) entry.active_skills = activeNames(state, "skill");
struggleEmittedThisTurn = entry.type;
}
appendJournal(entry);
};
const windowEntry = { tool, argsHash, file };
if (tool === "Agent") {
const sub = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
@@ -189,17 +465,39 @@ function main() {
}
state.tool_window.push(windowEntry);
if (state.tool_window.length > WINDOW_SIZE) state.tool_window.shift();
state.session_post_count += 1;
const sameToolArgs = state.tool_window.filter(e => e.tool === tool && e.argsHash === argsHash).length;
if (sameToolArgs >= RETRY_THRESHOLD) {
appendJournal({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
}
// Offset-aware same-file reread: consecutive Reads of the same file_path
// (ignoring offset/limit) escape the argsHash-based retry_loop dedup above.
// Emit a distinct, actionable signal instead of leaking into tool_error_loop.
if (READ_ONLY_TOOLS.has(tool) && file) {
const sameFileReads = state.tool_window.filter(e => e.tool === tool && e.file === file).length;
if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
emit({ ts, session, cwd, type: "file_reread", tool, file, count: sameFileReads });
}
}
if (READ_ONLY_TOOLS.has(tool)) {
state.silentDriftCounter += 1;
if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
emit({ ts, session, cwd, type: "silent_drift", read_count: state.silentDriftCounter, last_tool: tool });
state.silentDriftEmitted = true;
}
} else {
state.silentDriftCounter = 0;
state.silentDriftEmitted = false;
}
if (tool === "Agent") {
const subagent = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
const recent = state.tool_window.slice(-5).filter(e => e.tool === "Agent" && e.subagent === subagent).length;
if (recent >= AGENT_RESPAWN_THRESHOLD) {
appendJournal({ ts, session, cwd, type: "weak_agent", subagent_type: subagent, count: recent });
emit({ ts, session, cwd, type: "weak_agent", subagent_type: subagent, count: recent });
}
}
@@ -210,18 +508,35 @@ function main() {
const fp = errorFingerprint(input.tool_response);
if (fp) {
bumpUsage("payload:tool_response_error_seen");
if (state.recentRecoveries.length) {
const keep = [];
for (const rec of state.recentRecoveries) {
const tools_since = state.session_post_count - rec.emitted_at_count;
if (tools_since > ERROR_AFTER_RECOVERY_WINDOW) continue;
if (Array.isArray(rec.fps) && rec.fps.includes(fp)) {
emit({
ts, session, cwd, type: "error_after_recovery",
recovered_from: rec.recovered_from, original_fp: fp,
tools_since_recovery: tools_since,
});
continue;
}
keep.push(rec);
}
state.recentRecoveries = keep;
}
state.last_errors.push({ tool, fp });
if (state.last_errors.length > ERROR_RING_SIZE) state.last_errors.shift();
const sameError = state.last_errors.filter(e => e.fp === fp).length;
if (sameError >= ERROR_LOOP_THRESHOLD) {
appendJournal({ ts, session, cwd, type: "tool_error_loop", tool, count: sameError, fp });
emit({ ts, session, cwd, type: "tool_error_loop", tool, count: sameError, fp });
}
}
if (file && EDIT_TOOLS.has(tool)) {
state.edit_counts[file] = (state.edit_counts[file] || 0) + 1;
if (state.edit_counts[file] >= EDIT_CHURN_THRESHOLD && !state.edit_churn_emitted[file]) {
appendJournal({ ts, session, cwd, type: "edit_churn", file, count: state.edit_counts[file] });
emit({ ts, session, cwd, type: "edit_churn", file, count: state.edit_counts[file] });
state.edit_churn_emitted[file] = true;
}
const keys = Object.keys(state.edit_counts);
@@ -239,7 +554,7 @@ function main() {
if (isBuildCmd && hasError) {
state.build_failure_count += 1;
if (state.build_failure_count >= BUILD_LOOP_THRESHOLD && !state.build_loop_emitted) {
appendJournal({ ts, session, cwd, type: "build_loop", count: state.build_failure_count, command: cmd.slice(0, 80) });
emit({ ts, session, cwd, type: "build_loop", count: state.build_failure_count, command: cmd.slice(0, 80) });
state.build_loop_emitted = true;
}
}
@@ -247,13 +562,62 @@ function main() {
state.tools_since_user += 1;
if (state.tools_since_user >= DEAD_END_THRESHOLD && !state.dead_end_emitted) {
appendJournal({ ts, session, cwd, type: "dead_end", count: state.tools_since_user });
emit({ ts, session, cwd, type: "dead_end", count: state.tools_since_user });
state.dead_end_emitted = true;
}
state.task_tool_count += 1;
state.task_tool_kinds[tool] = (state.task_tool_kinds[tool] || 0) + 1;
if (struggleEmittedThisTurn) {
state.recoveryWatch = {
recovered_from: struggleEmittedThisTurn,
since_ts: ts,
clean_count: 0,
window_tools: [],
watched_fps: state.last_errors.map(e => e.fp),
};
} else if (state.recoveryWatch) {
const turnHadError = fp !== null;
if (turnHadError) {
state.recoveryWatch = null;
} else {
state.recoveryWatch.clean_count += 1;
state.recoveryWatch.window_tools.push(tool);
if (state.recoveryWatch.window_tools.length > CLEAN_RECOVERY_WINDOW) state.recoveryWatch.window_tools.shift();
if (state.recoveryWatch.clean_count >= CLEAN_RECOVERY_WINDOW) {
appendJournal({
ts, session, cwd, type: "clean_recovery",
recovered_from: state.recoveryWatch.recovered_from,
recovery_window_tools: state.recoveryWatch.window_tools.slice(),
active_skills: activeNames(state, "skill"),
active_agents: activeNames(state, "agent"),
});
state.recentRecoveries.push({
recovered_from: state.recoveryWatch.recovered_from,
fps: state.recoveryWatch.watched_fps || [],
emitted_at_count: state.session_post_count,
});
if (state.recentRecoveries.length > RECENT_RECOVERIES_MAX) state.recentRecoveries.shift();
state.recoveryWatch = null;
}
}
}
}
safeWrite(STATE, state);
}
try { main(); } catch {}
process.exit(0);
// Run main only when executed as a script, not when imported for tests.
// import.meta.url comparison is the standard ESM idiom.
const isMain = (() => {
try {
return import.meta.url === `file://${process.argv[1]}`;
} catch { return true; }
})();
if (isMain) {
try { main(); } catch {}
process.exit(0);
}
export { errorFingerprint, normalizeErrorText, isCorrection };
+239 -39
View File
@@ -1,50 +1,250 @@
#!/usr/bin/env bash
# ADAM installer — pure bash + git + curl + jq.
# Idempotent. Safe for upgrades. Supports `curl | bash` via auto-clone.
#
# Usage:
# ./install.sh # local install from cwd
# curl -fsSL <raw>/install.sh | bash
# VERSION=v0.3.0 ./install.sh # pin a tag
# ./install.sh --yes # skip settings.json prompt
# ./install.sh --dry-run # show actions, write nothing
set -euo pipefail
REPO_GIT="https://github.com/lukaszraczylo/claude-adam.git"
DEST="${HOME}/.claude"
SRC="$(cd "$(dirname "$0")" && pwd)"
ASSUME_YES=0
DRY_RUN=0
VERSION="${VERSION:-${BRANCH:-}}" # env var pin; empty = latest tag
echo "ADAM installer"
echo " source: $SRC"
echo " dest: $DEST"
echo
log() { printf ' %s\n' "$*"; }
warn() { printf ' ! %s\n' "$*" >&2; }
die() { printf ' ! %s\n' "$*" >&2; exit 1; }
run() { if [ "$DRY_RUN" = 1 ]; then printf ' [dry-run] %s\n' "$*"; else eval "$@"; fi; }
if [ ! -d "$DEST" ]; then
echo " ! $DEST does not exist. Is Claude Code installed?"
exit 1
# --------------------------------------------------------------------- args
for arg in "$@"; do
case "$arg" in
--yes|-y) ASSUME_YES=1 ;;
--dry-run|-n) DRY_RUN=1 ;;
--version=*) VERSION="${arg#--version=}" ;;
--help|-h) sed -n '2,12p' "$0"; exit 0 ;;
*) die "unknown arg: $arg (try --help)" ;;
esac
done
# --------------------------------------------------------------------- prereqs
need() { command -v "$1" >/dev/null 2>&1 || die "missing: $1$2"; }
need git "install: brew install git || apt install git"
need curl "install: brew install curl || apt install curl"
need jq "install: brew install jq || apt install jq"
command -v node >/dev/null 2>&1 || warn "node not found — hooks need node 18+; install: brew install node"
# --------------------------------------------------------------------- locate source
# If invoked via `curl | bash`, $0 is bash and there are no local files.
PIPED=0
SCRIPT_PATH="${BASH_SOURCE[0]:-$0}"
if [ ! -f "$SCRIPT_PATH" ] || [ "$SCRIPT_PATH" = "bash" ] || [ "$SCRIPT_PATH" = "-" ]; then
PIPED=1
elif [ ! -d "$(dirname "$SCRIPT_PATH")/hooks" ]; then
PIPED=1
fi
mkdir -p \
"$DEST/hooks" \
"$DEST/agents" \
"$DEST/skills/adam-self-improvement" \
"$DEST/commands" \
"$DEST/adam/proposals" \
"$DEST/adam/applied" \
"$DEST/adam/rejected" \
"$DEST/adam/trash" \
"$DEST/adam/journal" \
"$DEST/adam/scripts" \
"$DEST/adam/tests/fixtures"
CLEANUP_TMP=""
cleanup() { [ -n "$CLEANUP_TMP" ] && rm -rf "$CLEANUP_TMP" 2>/dev/null || true; }
trap cleanup EXIT INT TERM
cp "$SRC/hooks/adam-observe.mjs" "$DEST/hooks/"
cp "$SRC/hooks/adam-nudge.mjs" "$DEST/hooks/"
cp "$SRC/agents/adam.md" "$DEST/agents/"
cp "$SRC/skills/adam-self-improvement/SKILL.md" "$DEST/skills/adam-self-improvement/"
cp "$SRC/commands/reflect.md" "$DEST/commands/"
cp "$SRC/adam/scripts/adam-archive.mjs" "$DEST/adam/scripts/"
cp "$SRC/adam/tests/run-tests.sh" "$DEST/adam/tests/"
cp "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/adam/tests/fixtures/"
if [ "$PIPED" = 1 ]; then
log "running via curl|bash — cloning repo to tmp"
CLEANUP_TMP="$(mktemp -d -t claude-adam-install.XXXXXX)"
REF="$VERSION"
if [ -z "$REF" ]; then
# latest semver tag from remote (no local clone needed)
REF="$(git ls-remote --tags --refs "$REPO_GIT" \
| awk -F/ '{print $NF}' \
| grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' \
| sort -V | tail -1)"
[ -z "$REF" ] && REF="main"
fi
log "fetching $REF"
run "git clone --quiet --depth=1 --branch=\"$REF\" \"$REPO_GIT\" \"$CLEANUP_TMP\""
SRC="$CLEANUP_TMP"
else
SRC="$(cd "$(dirname "$SCRIPT_PATH")" && pwd)"
fi
[ -f "$DEST/adam/journal.jsonl" ] || : > "$DEST/adam/journal.jsonl"
[ -f "$DEST/adam/state.json" ] || echo '{"cursor":0,"tool_window":[]}' > "$DEST/adam/state.json"
[ -f "$DEST/adam/usage.json" ] || echo '{}' > "$DEST/adam/usage.json"
log "ADAM installer"
log " source: $SRC"
log " dest: $DEST"
log " mode: $([ "$DRY_RUN" = 1 ] && echo dry-run || echo apply)$([ "$ASSUME_YES" = 1 ] && echo ' --yes' || true)"
log ""
echo " files installed."
echo
echo " next steps:"
echo " 1. bash $DEST/adam/tests/run-tests.sh # must show: 21 passed, 0 failed"
echo " 2. merge settings.json.example into $DEST/settings.json"
echo " 3. start a fresh Claude Code session, then run /reflect"
echo
echo " ADAM is dormant until you invoke /reflect."
[ -d "$DEST" ] || die "$DEST does not exist. Install Claude Code first: https://claude.com/claude-code"
# --------------------------------------------------------------------- dirs
DIRS=(
"hooks" "agents" "skills/adam-self-improvement" "commands"
"adam/proposals" "adam/applied" "adam/rejected" "adam/trash"
"adam/journal" "adam/scripts" "adam/tests/fixtures"
)
for d in "${DIRS[@]}"; do run "mkdir -p \"$DEST/$d\""; done
# .gitkeep markers so the layout survives `git init` for users who VCS ~/.claude
for d in adam/proposals adam/applied adam/rejected adam/trash adam/journal; do
[ -e "$DEST/$d/.gitkeep" ] || run ": > \"$DEST/$d/.gitkeep\""
done
# --------------------------------------------------------------------- file copy
# Conservative: if dest exists and differs from src AND user-modified after install,
# write to <file>.adam-new and warn instead of clobbering.
copy_file() {
local src="$1" dst="$2"
[ -f "$src" ] || die "missing source file: $src"
if [ -f "$dst" ] && ! cmp -s "$src" "$dst"; then
if [ -f "$DEST/adam/.install-marker" ] \
&& [ "$(stat -f %m "$dst" 2>/dev/null || stat -c %Y "$dst")" \
-gt "$(stat -f %m "$DEST/adam/.install-marker" 2>/dev/null || stat -c %Y "$DEST/adam/.install-marker")" ]; then
warn "modified locally, NOT overwriting: $dst"
warn " new version written to: $dst.adam-new (review and merge manually)"
run "cp \"$src\" \"$dst.adam-new\""
return
fi
fi
run "cp \"$src\" \"$dst\""
log " copied: ${dst#$HOME/}"
}
# Hooks
copy_file "$SRC/hooks/adam-observe.mjs" "$DEST/hooks/adam-observe.mjs"
copy_file "$SRC/hooks/adam-nudge.mjs" "$DEST/hooks/adam-nudge.mjs"
# Agent / skill / command
copy_file "$SRC/agents/adam.md" "$DEST/agents/adam.md"
copy_file "$SRC/skills/adam-self-improvement/SKILL.md" "$DEST/skills/adam-self-improvement/SKILL.md"
copy_file "$SRC/commands/reflect.md" "$DEST/commands/reflect.md"
# Adam internals
copy_file "$SRC/adam/scripts/adam-archive.mjs" "$DEST/adam/scripts/adam-archive.mjs"
copy_file "$SRC/adam/scripts/adam-upgrade.mjs" "$DEST/adam/scripts/adam-upgrade.mjs"
# v0.3.3 helper scripts — invoked from SKILL.md / hooks / analyst flow
for _adam_script in adam-utils adam-window adam-explain adam-nudge-eligibility adam-cooldown \
adam-score adam-ab-measure adam-apply-reinforcement adam-batch adam-rollback; do
copy_file "$SRC/adam/scripts/${_adam_script}.mjs" \
"$DEST/adam/scripts/${_adam_script}.mjs"
run "chmod +x \"$DEST/adam/scripts/${_adam_script}.mjs\""
done
run "chmod +x \"$DEST/adam/scripts/adam-upgrade.mjs\""
copy_file "$SRC/adam/tests/run-tests.sh" "$DEST/adam/tests/run-tests.sh"
copy_file "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/adam/tests/fixtures/seed-corrections.jsonl"
# Preserve user data — never overwrite
[ -f "$DEST/adam/journal.jsonl" ] || run ": > \"$DEST/adam/journal.jsonl\""
[ -f "$DEST/adam/state.json" ] || run "echo '{\"tool_window\":[]}' > \"$DEST/adam/state.json\""
[ -f "$DEST/adam/usage.json" ] || run "echo '{}' > \"$DEST/adam/usage.json\""
# install marker — used by future runs to detect local mtime drift
run "touch \"$DEST/adam/.install-marker\""
# version marker — records the installed release tag for the update notifier
# (adam-nudge.mjs compares it against the latest GitHub release).
ADAM_VERSION=""
if [ -n "$VERSION" ]; then
ADAM_VERSION="$VERSION"
elif [ "$PIPED" = 1 ] && [ -n "${REF:-}" ]; then
ADAM_VERSION="$REF"
else
ADAM_VERSION="$(git -C "$SRC" describe --tags --abbrev=0 2>/dev/null || true)"
fi
[ -z "$ADAM_VERSION" ] && ADAM_VERSION="unknown"
run "printf '%s\\n' \"$ADAM_VERSION\" > \"$DEST/adam/.version\""
log " version marker: $ADAM_VERSION"
# --------------------------------------------------------------------- settings.json
SETTINGS="$DEST/settings.json"
EXAMPLE="$SRC/settings.json.example"
[ -f "$EXAMPLE" ] || die "missing $EXAMPLE"
# Build target settings via jq merge (preserves all user keys/hooks).
TMP_NEW="$(mktemp -t adam-settings.XXXXXX)"
TMP_DIFF="$(mktemp -t adam-settings-diff.XXXXXX)"
cleanup_full() { cleanup; rm -f "$TMP_NEW" "$TMP_DIFF" 2>/dev/null || true; }
trap cleanup_full EXIT INT TERM
if [ -f "$SETTINGS" ]; then
jq --slurpfile add "$EXAMPLE" '
. as $cur
| ($add[0].hooks // {}) as $new
| .hooks = (
($cur.hooks // {}) as $cur_hooks
| reduce ($new | keys[]) as $k ($cur_hooks;
.[$k] = (
((.[$k] // []) + $new[$k])
| unique_by(tojson)
)
)
)
' "$SETTINGS" > "$TMP_NEW"
else
jq 'del(._comment)' "$EXAMPLE" > "$TMP_NEW"
fi
if [ -f "$SETTINGS" ] && cmp -s "$SETTINGS" "$TMP_NEW"; then
log "settings.json already wired — no changes"
else
log ""
log "settings.json changes proposed:"
if [ -f "$SETTINGS" ]; then
diff -u "$SETTINGS" "$TMP_NEW" > "$TMP_DIFF" || true
else
diff -u /dev/null "$TMP_NEW" > "$TMP_DIFF" || true
fi
sed 's/^/ /' "$TMP_DIFF"
log ""
if [ "$ASSUME_YES" = 1 ]; then
REPLY=y
else
printf ' apply settings.json changes? [y/N] '
read -r REPLY </dev/tty || REPLY=n
fi
case "$REPLY" in
y|Y|yes|YES)
[ -f "$SETTINGS" ] && run "cp \"$SETTINGS\" \"$SETTINGS.adam-bak.$(date +%s)\""
run "mv \"$TMP_NEW\" \"$SETTINGS\""
log " settings.json updated (backup at *.adam-bak.<ts> if pre-existing)"
;;
*)
log " skipped — wire entries from $EXAMPLE manually"
;;
esac
fi
# --------------------------------------------------------------------- summary
log ""
log "installed:"
log " hooks/adam-observe.mjs, hooks/adam-nudge.mjs"
log " agents/adam.md"
log " skills/adam-self-improvement/SKILL.md"
log " commands/reflect.md"
log " adam/scripts/adam-archive.mjs"
log " adam/scripts/adam-upgrade.mjs"
log " adam/tests/run-tests.sh"
log ""
log "preserved (if existed):"
log " adam/journal.jsonl, adam/state.json, adam/usage.json"
log ""
log "next:"
log " 1. bash $DEST/adam/tests/run-tests.sh # expect: all passed"
log " 2. start a fresh Claude Code session"
log " 3. run /reflect to invoke the analyst"
log ""
log "ADAM is dormant until you run /reflect."
log "journal: $DEST/adam/journal.jsonl"
log "proposals: $DEST/adam/proposals/"
# --------------------------------------------------------------------- pending merges
# If this upgrade left any `.adam-new` files behind, make the trap unmissable.
PENDING_COUNT=$(find "$DEST" \( -name .git -o -name node_modules -o -path "*/adam/journal" -o -path "*/adam/trash" -o -path "*/adam/proposals" -o -path "*/adam/applied" -o -path "*/adam/rejected" \) -prune -o -type f -name '*.adam-new' -print 2>/dev/null | wc -l | tr -d ' ')
if [ "${PENDING_COUNT:-0}" -gt 0 ]; then
log ""
warn "${PENDING_COUNT} file(s) need merge review."
warn " Review: node ~/.claude/adam/scripts/adam-upgrade.mjs --list"
warn " Accept: node ~/.claude/adam/scripts/adam-upgrade.mjs --accept <path>"
fi
+222 -18
View File
@@ -5,25 +5,136 @@ description: Use when the user types /reflect, asks "what has adam learned", ask
# adam-self-improvement
You are about to drive a review session for ADAM, the self-improvement layer. You operate in the **main thread** with the user present. The `adam` subagent does the heavy analysis; you orchestrate.
## When to invoke
- User types `/reflect`
- User types `/reflect --explain` (same flow, but the analyst's clustering trace is shown to the user — see §2b below)
- User asks: "what has adam learned", "any proposals", "review the queue"
- SessionStart nudge said proposals are pending and user wants to act on it
## Protocol
### 1. Dispatch the analyst
### 0. Parse flags
Use the Agent tool with `subagent_type: "adam"` and prompt:
Check the slash-command argument string for the literal token `--explain`. Set `explain=true` when present; otherwise `explain=false`. Unknown flags: print one-line warning, continue with `explain=false`. This single flag is the only argument `/reflect` currently accepts.
### 1. Pre-filter the journal (window + exclusion) + score
Before dispatching the analyst, run the windowed-journal filter:
```bash
node ~/.claude/adam/scripts/adam-window.mjs --home ~/.claude > /tmp/adam-windowed-journal.jsonl 2> /tmp/adam-windowed-journal.log
```
The script reads the active journal plus all rotated journal files (new
`journal/YYYY-Www.jsonl` weekly format AND legacy
`journal/YYYY-MM-DD-<ts>.jsonl` size-rotated format are both supported), applies
per-signal-type sliding windows (see `SIGNAL_WINDOWS_DAYS` in
`adam-window.mjs`), and drops entries already actioned via
`applied/*.md` / `rejected/*.md` frontmatter `source_entries`.
If `adam-window.mjs` exits non-zero: log the stderr file to the user, fall
through to passing the raw `~/.claude/adam/journal.jsonl` path to the agent
(graceful degradation — the agent's manual excluded-timestamps logic still
filters actioned entries; only the freshness window is lost).
Then run the scoring pre-step on the same windowed journal:
```bash
node ~/.claude/adam/scripts/adam-score.mjs --input /tmp/adam-windowed-journal.jsonl > /tmp/adam-scores.json 2> /tmp/adam-scores.log
```
This produces a per-session `dampener` (0.5 / 0.75 / 1.0 based on
`task_completed_count`) and a `reinforcement_candidates` list (skills cited by
≥3 clean `task_completed` events). The analyst uses both — see
`agents/adam.md` §"Scoring: task_completed dampener". If the score step fails,
log stderr to the user and pass an empty `{"sessions":[],"reinforcement_candidates":[]}`
to the analyst (dampener defaults to 1.0).
Finally, run the A/B measurement pre-step on any previously auto-applied
proposals (see §3 ab-tracking write):
```bash
node ~/.claude/adam/scripts/adam-ab-measure.mjs --home ~/.claude --format json > /tmp/adam-ab-regressions.json 2> /tmp/adam-ab-regressions.log
```
The JSON output is an array of A/B delta objects (`pre_count`, `post_count`,
`delta_pct`, `status` ∈ {`improved`,`neutral`,`regressed`,`no_baseline`,`pending`}).
Filter to `status == "regressed"` before passing to the analyst as
`ab_regressions`. The analyst is required (see `agents/adam.md` §"A/B
effectiveness") to surface a `## Regressions` section at the top of its output
when this list is non-empty. If the script fails: log stderr, pass `[]`.
**Auto-rollback** (MOSS §3.5): if any entries have `status == "regressed"`, run the rollback script to auto-revert them before analyst dispatch:
```bash
node ~/.claude/adam/scripts/adam-rollback.mjs --auto --home ~/.claude > /tmp/adam-rollback-results.json 2> /tmp/adam-rollback.log
```
For each rolled-back proposal, print to user: `adam: rolled back "<proposal_id>" — regression detected (delta: <delta_pct>%)`. The rollback script moves the proposal from `applied/` back to `proposals/` with `rolled_back: true` and creates a regression nudge. If the script fails: log stderr, continue (rollback is best-effort).
**Evidence batching** (MOSS §3.1): pre-cluster the windowed journal into coherent failure batches:
```bash
node ~/.claude/adam/scripts/adam-batch.mjs --input /tmp/adam-windowed-journal.jsonl > /tmp/adam-batches.json 2> /tmp/adam-batch.log
```
This groups entries by (signal_type, cluster_key) and reports per-batch metadata including `has_context_window` (whether transcript evidence is attached). If the script fails: log stderr, pass `null` to the analyst (graceful degradation — analyst falls back to raw journal clustering).
### 2. Dispatch the analyst (two-stage pipeline)
MOSS §3.3: "A single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow." The analyst is dispatched in two stages with a validation gate between them.
**Stage 1 — Diagnose + Plan**: Use the Agent tool with `subagent_type: "adam"` and prompt:
```
Run a single analysis pass.
stage=diagnose
Read the batched journal entries, cluster by signal type, diagnose root causes,
plan fix types, and score the keypoint matrix. Write diagnoses to /tmp/adam-diagnoses.json.
Do NOT draft proposal files.
Inputs:
- journal_path: ~/.claude/adam/journal.jsonl
- windowed_journal_path: /tmp/adam-windowed-journal.jsonl
- batches_path: /tmp/adam-batches.json # pre-clustered evidence batches
- scores_path: /tmp/adam-scores.json
- ab_regressions_path: /tmp/adam-ab-regressions.json
- journal_path: ~/.claude/adam/journal.jsonl # raw — fallback only
- state_path: ~/.claude/adam/state.json
- usage_path: ~/.claude/adam/usage.json
- applied_dir: ~/.claude/adam/applied/
- rejected_dir: ~/.claude/adam/rejected/
- transcripts_root: ~/.claude/projects/
- skills_root: ~/.claude/skills/
Use batches_path for pre-clustered evidence when available. Prefer context_window
fields in journal entries over transcript file lookups. Write /tmp/adam-diagnoses.json
per the "Diagnose-stage output format" in your system prompt.
```
Wait for return.
**Inter-stage validation** (§2a): after stage 1 returns, read `/tmp/adam-diagnoses.json` and validate each diagnosis:
1. Every `source_entries` timestamp exists in the windowed journal (read `/tmp/adam-windowed-journal.jsonl`, check timestamps match).
2. Every diagnosis has all four fields (`trigger`, `action`, `mismatch`, `outcome`).
3. The planned `type` is a valid proposal type.
4. Remove diagnoses that fail validation — log a one-line warning per removal.
If all diagnoses are removed or the file is missing/empty, print "adam: no valid diagnoses — nothing to implement" and skip to §6.
**Stage 2 — Implement**: Use the Agent tool with `subagent_type: "adam"` and prompt:
```
stage=implement
Read the validated diagnoses and draft full proposal files.
Inputs:
- diagnoses_path: /tmp/adam-diagnoses.json # validated stage-1 output
- windowed_journal_path: /tmp/adam-windowed-journal.jsonl
- scores_path: /tmp/adam-scores.json
- ab_regressions_path: /tmp/adam-ab-regressions.json
- state_path: ~/.claude/adam/state.json
- usage_path: ~/.claude/adam/usage.json
- proposals_dir: ~/.claude/adam/proposals/
@@ -32,26 +143,98 @@ Inputs:
- transcripts_root: ~/.claude/projects/
- skills_root: ~/.claude/skills/
Follow your system prompt exactly. Emit a single JSON punch list as your final message.
Draft proposal files to proposals_dir/ for each diagnosis. Score against the
confidence rubric. Emit the clustering trace and punch list as your final message.
```
Wait for return.
### 2. Auto-apply high-confidence items
### 2b. Persist and render the clustering trace
The analyst's final message always contains a fenced ` ```trace ` block (per `agents/adam.md` §"Clustering trace (always emit)") immediately before its punch-list JSON line.
1. Extract the trace block. If it is missing, print a one-line warning to the user (`adam: trace block missing from agent output — proceeding without observability`) and continue; do not block on this.
2. ALWAYS write the trace verbatim (without the surrounding fences) to `~/.claude/adam/last-trace.txt` (overwrite each run). This persists for retrospection via `node ~/.claude/adam/scripts/adam-explain.mjs`.
3. Extract the `SUMMARY:` line from the trace. ALWAYS display it as a one-line status to the user BEFORE the proposals are listed, e.g. `clustering: <SUMMARY line>`. This single-line status is shown in both `--explain` and default modes.
4. If `explain=true` (from §0): ALSO render the full trace block back to the user as a fenced code block (` ```text `` ``` `) under a header `Clustering trace:`. If `explain=false`: SUPPRESS the cluster-line body from the user-visible output (the SUMMARY line is already shown in step 3).
The user can re-render any past trace at any time via:
```bash
node ~/.claude/adam/scripts/adam-explain.mjs --mode summary # SUMMARY + per-decision counts
node ~/.claude/adam/scripts/adam-explain.mjs --mode full # verbatim trace + rejection histogram
node ~/.claude/adam/scripts/adam-explain.mjs --mode json # machine-readable
```
### 3. Pre-apply verification gate (MOSS §3.4)
MOSS §3.4: "Verification must therefore be runtime, on a production-equivalent environment, and against the same prompts that produced the failure evidence." Before auto-applying, verify each proposal deterministically:
For each id in `high_confidence`:
- Read the proposal file from `~/.claude/adam/proposals/<id>-*.md`.
- Verify in front of the user: print `id`, `target`, `confidence`, `blast_radius`, `cross_session_evidence`, `auto_apply_eligible`.
- **Verification checks** (all must pass for auto-apply to proceed):
1. **Source entries exist**: every timestamp in `source_entries` frontmatter must appear in `/tmp/adam-windowed-journal.jsonl`. If any are missing, the evidence is stale or was already actioned — demote to `queued`.
2. **Diagnosis grounded**: the `# Diagnosis` section must have all four fields (Trigger, Action, Mismatch, Outcome) with ≥1 backtick-wrapped quote. If malformed, demote to `queued`.
3. **Type-evidence match**: the proposal `type` must match what the evidence supports:
- `correction` signals → `memory`, `skill_new`, `skill_edit` (not `nudge`)
- `dead_end` signals → `nudge`, `skill_new`, `skill_edit` (not `memory`)
- `tool_error_loop` signals → `memory`, `skill_new`, `skill_edit`
- `harness_edit` → must cite harness-level evidence (false negative, scoring bias, window miscalibration)
If mismatch, demote to `queued`.
4. **No conflicting applied proposal**: grep `~/.claude/adam/applied/` for any proposal with the same `target` applied in the last 7 days. If found, demote to `queued` (prevents stacking rapid edits).
- Print verification result: `verified: <id> (4/4 checks passed)` or `demoted: <id> (failed: <check_name>)`.
- Demoted proposals are moved from `high_confidence` to `queued` for manual review.
### 3a. Apply verified high-confidence items
For each id that passed verification:
- Print `id`, `target`, `confidence`, `blast_radius`, `cross_session_evidence`, `auto_apply_eligible`.
- Apply the change:
- **For `skill_new`**: `mkdir -p ~/.claude/skills/<slug>/`, then `Write` the proposal's `# Proposed change` body to `~/.claude/skills/<slug>/SKILL.md`. After write, print: "skill `<slug>` written to `~/.claude/skills/<slug>/SKILL.md` — activates immediately — Claude Code v2.1.0+ auto-hot-reloads user-level skills, no restart needed."
- **For `memory`**: `Write` the proposal's `# Proposed change` body (which MUST include the auto-memory frontmatter — see "Memory drafting protocol" in `agents/adam.md`) to the path in `target` (under `~/.claude/projects/<encoded-home>/memory/`, where `<encoded-home>` is the user's home dir with `/` replaced by `-`, e.g. `-Users-alice` on macOS). Then update `MEMORY.md` index with a one-line pointer.
- **For other types under auto-apply**: apply via Write/Edit per `# Proposed change`. (Note: only `memory` and `skill_new` qualify for auto-apply per the rubric.)
- **For `memory`**: `Write` the proposal's `# Proposed change` body (which MUST include the auto-memory frontmatter — see "Memory drafting protocol" in `agents/adam.md`) to the path in `target`. Then update `MEMORY.md` index with a one-line pointer.
- **For `nudge`**: low-blast auto-apply path. Single-session evidence is sufficient — skip the cross-session gate. Append a new entry to `~/.claude/adam/active-nudges.json` (create the file with `[]` if absent) with shape `{kind, message, created_at: <now_ms>, expires_at_ts: <now_ms + 7*86400000>, max_displays: 3, displays_used: 0, source_session: <session_id from proposal>}`. Do NOT modify any skill, memory, agent, or CLAUDE.md. Tell user: "nudge queued — surfaces on next SessionStart in a different session (expires in 7 days)."
- **For `reinforcement`**: gated by `confidence >= 4 AND blast_radius == low` (same as memory). Apply by invoking the helper:
```bash
node ~/.claude/adam/scripts/adam-apply-reinforcement.mjs ~/.claude/adam/proposals/<id>-*.md --home ~/.claude
```
The helper reads the proposal frontmatter (`skill_slug`, `count`, `source_session`) and appends one JSON line to `~/.claude/adam/reinforcements.jsonl`. No code/memory/skill modifications. Output: `{"status":"applied"|"gated", ...}` — on `gated` leave proposal in `proposals/` (helper failed its own re-check), on `applied` continue to the archive step. Tell user: "reinforcement logged for `<skill_slug>` (count=<N>) — appended to reinforcements.jsonl."
- **For `skill_edit`**: enforce the apply-time gate before writing.
1. Verify proposal frontmatter has `auto_apply_eligible: true`. If not, abort and queue for review.
2. Read `target` SKILL.md, capture `current_bytes` from a fresh stat — do NOT trust frontmatter `bytes_before`.
3. Verify diff in `# Proposed change`:
- Unified-diff format.
- Zero `-` lines on existing SKILL.md content (additions only).
- Total `+` lines ≤ 30.
If any check fails, print one-line refusal reason, leave proposal in `proposals/`, continue.
4. Cooldown re-check: run `node ~/.claude/adam/scripts/adam-cooldown.mjs --skill <target_skill> --fingerprint <proposal_fingerprint>` (both fields come from proposal frontmatter; missing fingerprint → "legacy"). Refuse if the script returns `status: cooldown` OR `status: blacklisted`. This per-(skill, fingerprint) gate replaces the previous coarse per-skill scan — proposals for the same skill with a different fingerprint are NOT blocked by an older entry.
5. (covered by step 4 — blacklisted status is returned by `adam-cooldown.mjs` when `auto_apply_blacklist: true` is found in `rejected/` within 30 days for the same (skill, fingerprint))
6. Apply via `Edit` tool (append the new section per the diff). Never use `Write` on existing SKILL.md.
7. Re-stat target. If new size exceeds `2 * current_bytes` (captured in step 2), revert via `Edit` (remove the just-appended section) and refuse — print refusal reason.
8. Add `last_auto_edit: <iso8601 utc now>` to the proposal frontmatter before moving it.
9. Tell user: "skill `<slug>` extended (added <N> lines) — auto-applied via win-evidence gate."
- Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`.
- **A/B tracking append** (skip for `reinforcement` — positive-only ledger, intentionally not A/B-tracked per `agents/adam.md` §"`reinforcement` proposals"): as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema:
```json
{
"applied_at": <unix_ms now>,
"proposal_id": "<id>",
"proposal_type": "skill_edit|skill_new|memory|nudge",
"target_skill": "<slug or target basename>",
"proposal_fingerprint": "<hash>",
"originating_signals": [{"type":"<signal>","count":<N>,"session_ids":[...]}],
"pre_window_days": 7
}
```
This entry is consumed by `adam-ab-measure.mjs` on subsequent `/reflect` runs to compute pre/post signal-count deltas. See `agents/adam.md` §"A/B effectiveness". If the append fails (disk-full etc.) log a warning but do NOT abort the apply path — A/B is observability, not a gate.
- **Archive consumed journal entries**: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/applied/<UTC-ts>-<id>.md` — moves entries listed in proposal's `source_entries` from `journal.jsonl` to `journal/actioned-<id>.jsonl` so subsequent `/reflect` runs do not re-cluster them.
Print: `auto-applied N proposals: [ids]`.
### 3. Walk the queue
### 4. Walk the queue
For each id in `queued`:
@@ -63,23 +246,33 @@ c. On **approve**:
- For `skill_new`: `mkdir -p ~/.claude/skills/<slug>/`, then write `# Proposed change` body to `<slug>/SKILL.md`. Tell user: "skill `<slug>` written — activates immediately (CC v2.1.0+ auto-hot-reload)."
- For `skill_edit`: apply the unified diff in `# Proposed change` to the existing SKILL.md at `target` (append-only — never replace existing content).
- For `memory`: write `# Proposed change` body (must include auto-memory frontmatter) to `target` and update `MEMORY.md` index with a one-line pointer.
- For `harness_edit` (MOSS §1): apply the unified diff to the target harness file. **Before applying**:
1. Run `bash ~/.claude/adam/tests/run-tests.sh` — capture pass count.
2. Apply the diff via `Edit`.
3. Run `bash ~/.claude/adam/tests/run-tests.sh` again — verify pass count is equal or higher and 0 failures.
4. If test regression: revert the edit, print "harness_edit reverted — test regression detected", leave proposal in `proposals/`.
5. If tests pass: tell user "harness edit applied to `<target>` — tests pass (<N> passed)."
- For all others: apply via Write/Edit per the proposal's `# Proposed change`.
- Move proposal to `~/.claude/adam/applied/<ts>-<id>.md`.
- Archive: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/applied/<ts>-<id>.md`.
d. On **reject**: ask for reason in one line. Append `# Reason\n<reason>` to proposal body. Move to `~/.claude/adam/rejected/<id>.md`. Archive: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/rejected/<id>.md`.
e. On **edit**: ask the user for the change, edit the proposal in place, then loop back to step 3a for that same id.
d. On **reject**: ask for reason in one line. Append `# Reason\n<reason>` to proposal body. If the proposal `type` is `skill_edit`, ALSO add `auto_apply_blacklist: true` to its frontmatter (so future reflects skip auto-apply on this target for 30 days). Move to `~/.claude/adam/rejected/<id>.md`. Archive: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/rejected/<id>.md`.
e. On **edit**: ask the user for the change, edit the proposal in place, then loop back to step 4a for that same id.
### 4. Handle failures
### 5. Handle failures
If apply fails (file write error, target missing): leave proposal in `proposals/`, append `# Apply error\n<error>` to its body. Tell the user. Do not move it.
### 5. Summary
### 6. Summary
End with one block:
```
adam reflect summary:
observations processed: <new>
batches formed: <N>
diagnoses validated: <N>/<total>
rolled back (regression): <N>
verification passed: <N>/<total high_confidence>
auto-applied: <N>
approved: <N>
rejected: <N>
@@ -87,17 +280,28 @@ adam reflect summary:
failed: <N>
```
**Keypoint history**: after all proposals are processed, append one JSON line to `~/.claude/adam/keypoint-history.jsonl` with the aggregate keypoint scores from the diagnose stage:
```json
{"ts":"<iso>","session":"<session_id>","keypoints":{"tool_selection":N,"scope_discipline":N,"error_recovery":N,"first_attempt":N,"build_reliability":N},"proposals_emitted":N,"proposals_applied":N}
```
This builds a longitudinal record of which capabilities are improving across `/reflect` runs.
## Karpathy constraints (you must enforce on each apply)
Before writing any proposal:
- Confirm `# Assumptions` section is non-empty.
- Confirm `# Diagnosis` section exists and contains all four labelled lines (`Trigger:`, `Action:`, `Mismatch:`, `Outcome:`) AND at least one backtick-wrapped quote ≤80 chars in the Outcome line. Refuse if missing or malformed — agent must redraft per the "Diagnosis drafting protocol" in `agents/adam.md`.
- Confirm `# Success criterion` section is non-empty and runnable.
- Confirm change is ≤50 LOC for non-`skill_new`, or ≤80 LOC for `skill_new` body. If larger, ask the user once: "this proposal is N LOC — proceed?"
- For `claude_md_edit`: confirm 3+ distinct cwds in the `# Why` section.
- For `deletion`: confirm both criteria (a) and (b) from the agent's special handling are documented in the proposal.
- For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists.
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter containing required fields `name`, `description`, `type`, `originSessionId`. Refuse if frontmatter missing — agent must redraft per the Memory drafting protocol.
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §3 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
- For `skill_edit` with `auto_apply_eligible: true`: confirm `contradiction_flag` is absent or null in frontmatter. Refuse auto-apply if `contradiction_flag` is set with any non-empty value (treat the agent's flag as a hard veto on auto-apply; user can still manually approve in walk-the-queue if they disagree with the heuristic).
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter matching the live auto-memory schema — top-level `name` (the slug) + `description`, plus a `metadata:` block with `node_type: memory`, `type`, and `originSessionId`. Cross-check the shape against an existing file in the target memory dir. Refuse if frontmatter is flat (`type:`/`originSessionId:` at top level) or missing the `metadata:` block — agent must redraft per the Memory drafting protocol.
- For `harness_edit`: confirm `auto_apply_eligible: false` (never auto-apply). Confirm `confidence ≥ 5`. Confirm `# Test verification` section names the test command. Confirm diff is ≤30 LOC and targets a single allowed harness file (see `agents/adam.md` §"Harness self-modification"). Run test suite before AND after applying — revert on any regression.
- Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.
If any check fails, refuse to apply and ask the user how to proceed.