Ranks skills by good:bad outcome co-occurrence (Wilson LB + lift vs
baseline) over the journal's active_skills payloads — the SkillsInjector
(arXiv 2605.29794) execution-grounded utility signal Δ(s), computed from
data already collected, no training.
- reuses adam-score NEGATIVE_SIGNAL_TYPES + entrySeverity (single source of truth)
- registered in install.sh helper-script copy loop
- /reflect pre-step surfaces worst below-baseline skills to the USER as
advisory (co-occurrence != causation; not fed to the analyst's proposal machinery)
- Test 119 added; full suite 141/141 green
adam-rollback.mjs's docstring always claimed it "removes the ab-tracking entry
(so it doesn't re-trigger)", but executeRollback() never did. Consequence: a
rolled-back proposal kept being re-detected as `regressed` on every subsequent
/reflect, which triggered endless `not_found` rollback attempts (the applied
file is already gone) and noisy ## Regressions sections.
executeRollback now deletes the matching ab-tracking.jsonl row by proposal_id
after the move, preserving all unrelated rows. Surfaced by running ADAM's own
/reflect loop a second time (two zombie regressions: 2026-05-16-002 and
2026-05-22-001).
Tests: 138 -> 140 (rollback purges the entry by id; an unrelated entry is
preserved).
Adds a lightweight "new release available" notice without auto-installing —
because re-running install.sh overwrites ADAM's own /reflect-applied skill
edits, so the user must choose when to take an update.
- install.sh writes ~/.claude/adam/.version (the installed release tag) on
every install. Derived from $VERSION / piped REF / `git describe --tags`.
- adam-nudge.mjs (SessionStart) compares .version against the latest GitHub
release at most once/day. Cached in ~/.claude/adam/.update-check.json; the
cache drives an instant nudge (no network on the hot path) and is refreshed
best-effort with a 1.5s AbortController cap. fetch unavailable / offline /
timeout / rate-limit / parse error all degrade to silent no-op. Opt out with
ADAM_NO_UPDATE_CHECK=1. main() is now async; never blocks SessionStart.
- README: "Staying up to date" section; pin example bumped to v0.6.3.
Tests: 134 -> 138. Notifier verified fully offline (cache-driven): nudges when
a newer release is cached, silent when current, suppressed by the opt-out env,
and no-ops when the .version marker is absent.
Two issues surfaced by running ADAM's /reflect loop on a large real journal
(4015 entries, 119 sessions) — both caused false/broken auto-apply behavior.
1. A/B over-reported regressions (adam-ab-measure.mjs).
Regressions were measured on RAW originating-signal counts pre vs post. On a
busy, growing journal almost every signal count rises post-apply regardless
of whether the proposal helped — so the loop flagged 9 false "regressions"
(and would auto-roll-back good proposals). Now the delta is computed on the
signal's SHARE of total activity (rate = count / window-total). Falls back to
the raw-count delta when the signal is the only activity in the window
(preserves prior behavior + all existing A/B tests). Output adds
raw_delta_pct, pre_total, post_total, normalized for transparency.
2. Memory frontmatter drift (agents/adam.md, SKILL.md).
The drafting protocol emitted flat `type:`/`originSessionId:` with a prose
`name`, but the live auto-memory store uses `name` = slug plus a
`metadata: {node_type, type, originSessionId}` block. Auto-applied memories
could fail to load/categorize. Protocol + apply-time validation now require
the live metadata.* schema and cross-checking against an existing file.
Tests: 132 -> 134. New: volume growth (raw +200%) with flat activity-share
classifies neutral, not regressed; a genuine share increase still classifies
regressed.
Proposed and approved through ADAM's own /reflect harness_edit loop (MOSS §1):
the analyst surfaced 23 tool_error_loop entries across 4 sessions whose context
windows were really redundant re-reads of one file.
retry_loop keys on argsHash of the full tool_input (including offset/limit), so
consecutive Reads of the SAME file at different offsets escaped dedup and leaked
into tool_error_loop fingerprints. The new file_reread signal catches them:
same file Read >=3x in the 10-event window, offset-agnostic (keyed on file
path), guarded by `sameToolArgs < RETRY_THRESHOLD` so byte-identical reads stay
with retry_loop (no double-count).
Fully wired end-to-end (not a half-dead signal):
- adam-observe.mjs: detection + STRUGGLE_TYPES membership (so it carries
context_window + active_skills like other struggle signals).
- adam-window.mjs: 14-day sliding window (task-local, like retry_loop).
- adam-score.mjs: severity divisor 3.
- adam-batch.mjs: file-basename clustering.
- agents/adam.md + README: signal tables, clustering rules, rubric, windows.
Tests: 126 -> 132 (file_reread fires on 3x offset-shifted reads, not on 2x;
byte-identical reads route to retry_loop not file_reread; carries context_window).
Full codebase review (multi-agent, adversarially verified) surfaced several
documented-but-dead mechanisms and doc/code drift. Fixes:
- adam-observe: struggle signals now emit `active_skills`, so silent_drift's
primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric
bonus) actually fire — both were silently dead (no struggle signal carried
the field).
- adam-cooldown: new `--compute` CLI deterministically derives
proposal_fingerprint. The exported computeProposalFingerprint() was never
called and the analyst was told to hand-compute a djb2 hash it cannot
reproduce. Spec now mandates a *stable* cluster id so fingerprints reproduce
across /reflect runs. Removed one dead normalization line.
- spec: reinforcement proposals excluded from A/B tracking — agents/adam.md
contradicted itself (:376 included, :476 excluded); SKILL.md aligned.
- adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set
(adam-utils / adam-batch / adam-rollback were missing).
- adam-explain: synthesized clustering summary carries `regressions: 0`
(structural consistency with parsed summaries).
- docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed;
adam-score header documents severity_sum/severity_by_type; adam-batch §4
reference corrected.
Tests: +12 assertions (114 -> 126), all green. New regression tests cover the
active_skills fix and --compute, plus boundary gaps the review flagged:
retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d
blacklist edge.
- New signal types in hooks/adam-observe.mjs:
- silent_drift: 5 consecutive read-only PostToolUse without an action tool
- error_after_recovery: same error fingerprint returns within 5 events of clean_recovery
- Severity-weighted scoring in adam/scripts/adam-score.mjs:
- SEVERITY_DIVISORS exported per struggle signal type
- Per-session severity_sum + severity_by_type added to JSON output
- Skill-attribution clustering in agents/adam.md:
- Sub-cluster struggle signals on active_skills[0]
- New struggle-driven skill_edit variant (always queues, never auto-applies)
- Rubric updates:
- +1 for cluster severity-sum >= 10, additional +1 for >= 32
- +1 for skill-attributed sub-cluster naming an existing skill
- silent_drift + error_after_recovery added to struggle signal list
- Window: silent_drift 14d, error_after_recovery 30d
- Tests: 94 passing (78-82 new)
Backward compat: entries without count default to severity 1. Existing
win-driven skill_edit gate untouched. No journal migration.
The prior logo.svg used currentColor, which resolves to black when the
SVG is loaded via <img> on GitHub — making the logo invisible in dark
mode (the GitHub default for many users).
Fix uses GitHub's supported <picture> + prefers-color-scheme media-
source pattern in README:
- assets/logo-light.svg — explicit GitHub light-theme text color #24292f
- assets/logo-dark.svg — explicit GitHub dark-theme text color #f0f6fc
- assets/logo.svg — kept with embedded @media + currentColor for
standalone use (markmorph notes, anywhere
else the SVG is loaded outside <picture>)
README updates the <img> tag to a <picture> with media-conditioned
source so GitHub's renderer picks the right variant per theme.
Replaces the geometric-A-with-observation-dot with a softer, more
on-theme design: a swaddled-baby silhouette (rounded A-shape bundle),
face nestled inside, and the wrap-band extended past the bundle on
both sides as little hands. Maintains currentColor + zero external
assets; reads cleanly down to favicon size.
Ties the visual identity to the 'Story behind Adam' section: the
project is named after the author's son, and now the logo is too.
- New 'Story behind Adam' section at the top: the project is named after
the author's newborn son, whose observe-act-adjust-observe-again
learning loop is the methodology ADAM applies to LLM sessions.
- New SVG logo at assets/logo.svg: stylized 'A' with a captured
observation point inside the apex and a feedback crossbar. Uses
currentColor + gradient so it adapts to light/dark GitHub themes.
- Centered header block with project tagline + 5 badges (License,
Version, Tests, Node, Platform).
- New 'Highlights' section: 8 emoji-tagged one-liners covering the
v0.3.3 design pillars (zero LLM cost observation, A/B measurement,
sliding windows, observability, etc.).
- New 'How it works' ASCII pipeline diagram: observation -> analysis
pre-processors -> analyst -> review + apply.
- Signals table now includes per-signal sliding window column.
- Rubric section restructured: gates, modifiers (dampener), and
skill_edit-specific requirements clearly separated.
- New 'Inspecting the analyst's reasoning' section documenting
adam-explain.mjs + /reflect --explain.
- Layout updated for v0.3.3 state files (active-nudges.json,
ab-tracking.jsonl, reinforcements.jsonl, last-trace.txt) and all
9 new helper scripts under adam/scripts/.
- Test count: 27 -> 87.
- Closing line crediting Adam.
Adds an 11th signal type emitted when a run of work (between two
UserPromptSubmit events) crosses three quality gates:
- >=5 tool calls (TASK_TOOL_MIN)
- >=3 distinct tool kinds (TASK_DIVERSITY_MIN, filters single-tool
sweeps like "wrote 5 files")
- 0 correction signals during the run (filters tasks where the user
pushed back; correction-during-task disqualifies the recipe)
Payload carries tool_count, tool_kinds, active_skills, active_agents
so the agent can cluster by sorted tool-kind tuple and route through
the existing skill-overlap rule (skill_new vs skill_edit).
Importantly: cross_session_evidence is FALSE on first occurrence,
so resulting skill_new proposals always queue for review — they only
auto-apply when the same multi-tool recipe recurs in a second session
(then the existing rubric kicks in). Post-task creation captures novel
patterns while preserving the rule "auto-apply requires cross-session".
Hook adds state fields: task_tool_count, task_tool_kinds, task_corrections.
All reset on UserPromptSubmit boundary and on session change.
Agent gets one new signal-types-table row and one clustering bullet
referencing the existing skill-overlap rule.
3 new tests (30 passed, 0 failed):
- 5 tools + 5 kinds + 0 corrections fires task_completed
- 5 tools + 1 kind (Edit only) does NOT fire (diversity gate)
- 5 tools + 3 kinds + correction-on-closing-prompt does NOT fire
Bug fixes (HIGH):
- adam-observe.mjs: errorFingerprint no longer false-positives when
toolResponse.is_error === false; ERROR_RE only used as fallback when
is_error is undefined.
- adam-observe.mjs: resetSessionLocal now clears tool_window so retry_loop
cannot fire on the first tool of a new session by matching prior session.
- adam-archive.mjs: ts dedup uses Map<ts, count> instead of Set<ts>; two
journal entries sharing a millisecond are no longer both archived when
only one is referenced in source_entries.
- adam-nudge.mjs: only counts proposal filenames matching
/^\d{4}-\d{2}-\d{3}-/ pattern; README/notes in proposals/ no longer bump.
- skills/adam-self-improvement/SKILL.md: contradiction_flag veto now applied
at apply time (carry-over from earlier review).
Test isolation:
- adam/tests/run-tests.sh: ALWAYS runs against an isolated $HOME under
mktemp -d. Previously truncated live ~/.claude/adam/journal.jsonl on
every run — destructive on production state.
Conciseness:
- agents/adam.md: -19 LOC (cuts: vestigial cursor sentence, duplicate
not-do bullets, blast-radius bullet collapse, Inputs paths delegate to
SKILL.md, win-cluster-vs-struggle-cluster commentary already enforced
by cluster-key separation, # Overlap section spec compressed).
- skills/adam-self-improvement/SKILL.md: -4 LOC (framing paragraph, dead
catch-all bullet for non-eligible types).
Auto-prune script DELETED:
- The cumulative-count primitive cannot distinguish "never used" from
"used before tracking began"; mtime gate is meaningless for installed
files. Auto-prune deferred to v0.4 with a per-key lastSeen schema.
Cross-platform:
- macOS (BSD coreutils) and Linux (Alpine, glibc + musl) verified.
- All scripts use portable forms (stat -f || stat -c, mktemp -d -t).
- README documents platform support explicitly.
DX overhaul:
- install.sh: hardened — supports `curl | bash` via auto-clone,
--version=vX.Y.Z pinning, --yes / --dry-run flags, jq-based
settings.json merge with diff prompt and backup, conservative file
copy that detects local mtime drift and writes <file>.adam-new
instead of clobbering, idempotent across re-runs.
- adam-uninstall.sh: NEW. Soft-archives ~/.claude/adam/ to .bak.<ts>/
by default; --purge to delete; --yes for non-interactive; jq-based
settings.json cleanup with diff prompt.
- README.md: curl one-liner install + version-pinned variant at top,
What's New section through v0.3.1, upgrade-safe data files callout,
uninstaller documentation, platform support note, expanded rubric
showing skill_edit gate.
Test count: 27 passed, 0 failed (was 27 — no regression).
Closes the gap between categorical signal capture (we saw 3 retries) and
causal proposal drafting (here is why and what to do). Mirrors the NL trace
reflection step Hermes Agent uses before mutating prompts.
Adds # Diagnosis section to every proposal body — four labelled lines:
- Trigger: what the user wanted / context
- Action: what the assistant did
- Mismatch: how the action diverged
- Outcome: surfacing event with >=1 verbatim transcript quote
Constraints:
- <=5 LOC of prose total
- >=1 backtick-wrapped quote <=80 chars from transcript context window
- Cannot speculate; "Mismatch: unclear" is allowed but takes -1 confidence
- Win clusters use "Mismatch: None" with recovery quote in Outcome
Skill enforces structure at apply time (presence + 4 labelled lines + quote)
for both auto-apply and walk-the-queue paths. No semantic check — humans
judge causal correctness during walk-the-queue.
Adds optional frontmatter field `diagnosis_summary` (<=120 chars from the
Mismatch line) so applied/ and rejected/ are searchable by causal pattern.
New rubric penalty: -1 confidence when Diagnosis flags Mismatch: unclear.
Stops weak-causation proposals from auto-applying (drops below conf>=4).
No hook changes. All 27 tests still pass.
Spec: ~/.claude/docs/superpowers/specs/2026-05-10-adam-causal-diagnosis-design.md
Adds two new hook signal types:
- correction_free_streak: 5 consecutive UserPromptSubmits without a correction phrase
- clean_recovery: 3 clean PostToolUse events after a struggle signal
(tool_error_loop / dead_end / retry_loop)
Both carry active_skills/active_agents payloads computed from a 10-event
activity ring, so ADAM can attribute wins to whichever skill was active
during the streak/recovery.
Promotes skill_edit to auto-apply under a strict gate (all required):
- conf >= 4 + cross-session evidence (existing rules)
- # Why cites a win-signal entry whose active_skills includes target
- diff append-only, +lines <= 30
- resulting SKILL.md size <= 2x current size
- 7-day cooldown per target (last_auto_edit in applied/ frontmatter)
- 30-day blacklist on user rejection (auto_apply_blacklist in rejected/)
Skill enforces the gate at apply time as defense in depth: re-stats target,
re-checks cooldown and blacklist, verifies append-only, reverts and refuses
on byte-cap breach. User-rejected skill_edit proposals automatically write
auto_apply_blacklist: true.
Win signals participate in the existing v0.2.0 source_entries archive
lifecycle, so already-applied evidence does not re-cluster.
Test suite: +5 cases (5 new asserts pass), 27 total passing.
Spec: ~/.claude/docs/superpowers/specs/2026-05-10-adam-proactive-design.md
Plan: ~/.claude/docs/superpowers/plans/2026-05-10-adam-proactive.md
Lifecycle redesign:
- Each proposal records source_entries: [<ts>...] in frontmatter listing
the journal timestamps that fed its cluster.
- After apply/reject, skill calls adam/scripts/adam-archive.mjs which moves
matching entries from journal.jsonl to journal/actioned-<id>.jsonl.
- Agent reads applied/ + rejected/ frontmatter on each /reflect, builds an
excluded-timestamps set, skips any leftover already-actioned entries.
- cursor field in state.json is vestigial; agent ignores it.
Effect: journal stays bounded by active observations. Rule changes
re-evaluate the remainder without manual rewind. Race-safer for parallel
sessions on shared state.json (no cursor write contention).
Memory drafting:
- agents/adam.md adds 'Memory drafting protocol' parallel to Skill drafting.
- Memory proposals MUST contain auto-memory frontmatter (name, description,
type, originSessionId) in '# Proposed change' body.
- Skill enforces frontmatter check at apply time; refuses if missing.
Tests: 18 -> 21. Two new tests for adam-archive happy path + no-op.
Migration: existing applied proposals lack source_entries. Their backing
journal entries archived as a one-time bulk migration; legacy proposals
annotated with migration note.
The hook emits struggle signals only after crossing internal thresholds
(3 retries, 8 tools no-prompt, 4 edits to one file, 2 build failures, etc.).
Each journal entry is therefore meaningful evidence on its own. Old rule
required >=3 entries within single session, which the once-per-thing
emission design rarely produces. New rule: >=1 struggle entry qualifies
for proposal at +2 weight (cross-session bonus does not stack).
Auto-apply still requires cross_session_evidence; single-session-only
proposals always queue for review.
- 8 friction signals via lightweight hook (correction, retry_loop, weak_agent,
tool_error_loop, dead_end, edit_churn, build_loop, subagent_dispatch_pattern)
- Deterministic confidence rubric with cross-session evidence gate
- /reflect skill to dispatch the analyst subagent and walk the queue
- Skill overlap detection (prefer skill_edit over skill_new on collision)
- Solution synthesis from transcript context for new skill drafts
- Soft-delete trash, never hard rm
- 18 tests covering all signals