5 Commits

Author SHA1 Message Date
lukaszraczylo a48c705c0a feat(adam): smarter signals & clustering
- New signal types in hooks/adam-observe.mjs:
  - silent_drift: 5 consecutive read-only PostToolUse without an action tool
  - error_after_recovery: same error fingerprint returns within 5 events of clean_recovery
- Severity-weighted scoring in adam/scripts/adam-score.mjs:
  - SEVERITY_DIVISORS exported per struggle signal type
  - Per-session severity_sum + severity_by_type added to JSON output
- Skill-attribution clustering in agents/adam.md:
  - Sub-cluster struggle signals on active_skills[0]
  - New struggle-driven skill_edit variant (always queues, never auto-applies)
- Rubric updates:
  - +1 for cluster severity-sum >= 10, additional +1 for >= 32
  - +1 for skill-attributed sub-cluster naming an existing skill
  - silent_drift + error_after_recovery added to struggle signal list
- Window: silent_drift 14d, error_after_recovery 30d
- Tests: 94 passing (78-82 new)

Backward compat: entries without count default to severity 1. Existing
win-driven skill_edit gate untouched. No journal migration.
2026-05-13 19:21:59 +01:00
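A minimal sketch of how the two new detectors could behave over a stream of hook events. This is not the code in `hooks/adam-observe.mjs`. Only the thresholds come from the commit message: 5 consecutive read-only PostToolUse events, and a 5-event relapse window after `clean_recovery`. The event shape (`hook`, `tool`, `errorFingerprint`, `cleanRecoveryOf`), the `READ_ONLY_TOOLS` set and `createDetector` are hypothetical.

```javascript
// Hypothetical sketch of silent_drift and error_after_recovery detection.
// Only the thresholds come from the commit message; the event shape and the
// read-only tool list are assumptions, not adam-observe.mjs internals.
const READ_ONLY_TOOLS = new Set(["Read", "Grep", "Glob", "LS"]); // assumed
const DRIFT_THRESHOLD = 5; // consecutive read-only PostToolUse events
const RELAPSE_WINDOW = 5; // events after clean_recovery that still count

function createDetector() {
  let readOnlyStreak = 0;
  let recovered = null; // { fingerprint, eventsSince } after a clean_recovery

  return function onEvent(event) {
    const signals = [];
    if (event.hook !== "PostToolUse") return signals;

    // silent_drift: fires once when the read-only streak reaches the threshold;
    // any non-read-only (action) tool resets the streak.
    if (READ_ONLY_TOOLS.has(event.tool)) {
      readOnlyStreak += 1;
      if (readOnlyStreak === DRIFT_THRESHOLD) {
        signals.push({ type: "silent_drift", count: readOnlyStreak });
      }
    } else {
      readOnlyStreak = 0;
    }

    // error_after_recovery: the fingerprint that clean_recovery cleared
    // reappears within RELAPSE_WINDOW subsequent events.
    if (recovered) {
      recovered.eventsSince += 1;
      if (event.errorFingerprint === recovered.fingerprint) {
        signals.push({ type: "error_after_recovery", fingerprint: recovered.fingerprint });
        recovered = null;
      } else if (recovered.eventsSince >= RELAPSE_WINDOW) {
        recovered = null; // window expired without a relapse
      }
    }
    if (event.cleanRecoveryOf) {
      recovered = { fingerprint: event.cleanRecoveryOf, eventsSince: 0 };
    }
    return signals;
  };
}

const onEvent = createDetector();
for (let i = 0; i < 5; i++) {
  const signals = onEvent({ hook: "PostToolUse", tool: "Grep" });
  if (signals.length > 0) console.log(i + 1, signals[0].type); // 5 silent_drift
}
```

The streak fires once at the threshold rather than on every later read, so a long exploration phase produces a single journal entry.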
lukaszraczylo a8883aa8b7 fix(logo): explicit light/dark variants + <picture> for GitHub
The prior logo.svg used currentColor, which resolves to black when the
SVG is loaded via <img> on GitHub — making the logo invisible in dark
mode (the GitHub default for many users).

Fix uses GitHub's supported <picture> + prefers-color-scheme media-
source pattern in README:

- assets/logo-light.svg — explicit GitHub light-theme text color #24292f
- assets/logo-dark.svg  — explicit GitHub dark-theme text color #f0f6fc
- assets/logo.svg       — kept with embedded @media + currentColor for
                          standalone use (markmorph notes, anywhere
                          else the SVG is loaded outside <picture>)

README updates the <img> tag to a <picture> with media-conditioned
source so GitHub's renderer picks the right variant per theme.
2026-05-13 02:07:11 +01:00
lukaszraczylo 7ed2aecdfa docs(logo): swap to swaddled-baby design with hands
Replaces the geometric-A-with-observation-dot with a softer, more
on-theme design: a swaddled-baby silhouette (rounded A-shape bundle),
face nestled inside, and the wrap-band extended past the bundle on
both sides as little hands. Maintains currentColor + zero external
assets; reads cleanly down to favicon size.

Ties the visual identity to the 'Story behind Adam' section: the
project is named after the author's son, and now the logo is too.
2026-05-13 02:02:02 +01:00
lukaszraczylo a30f8b1158 docs: replace ASCII pipeline diagram with mermaid flowchart
GitHub renders mermaid natively. Diagram now shows three subgraphs
(Observation → Analysis → Review + apply) with a nested Pre-processors
subgraph inside Analysis. Includes:

- Dotted edge labeled 'user runs /reflect' marking the observe→analyze
  boundary.
- Diamond gate node for auto-apply decision (conf≥4 · low blast ·
  cooldown cool) with explicit yes/no branches.
- Feedback loop: applied/ entries measure back into adam-ab-measure.mjs
  on subsequent reflects.
- Color-coded classDef for stores (blue), processes (orange), and the
  clustering trace artifact (purple).

ASCII art retired — diagram now legible at any zoom on github.com.
2026-05-13 01:54:38 +01:00
lukaszraczylo d3e4350d71 docs: modernize README + add SVG logo + inspiration story
- New 'Story behind Adam' section at the top: the project is named after
  the author's newborn son, whose observe-act-adjust-observe-again
  learning loop is the methodology ADAM applies to LLM sessions.
- New SVG logo at assets/logo.svg: stylized 'A' with a captured
  observation point inside the apex and a feedback crossbar. Uses
  currentColor + gradient so it adapts to light/dark GitHub themes.
- Centered header block with project tagline + 5 badges (License,
  Version, Tests, Node, Platform).
- New 'Highlights' section: 8 emoji-tagged one-liners covering the
  v0.3.3 design pillars (zero LLM cost observation, A/B measurement,
  sliding windows, observability, etc.).
- New 'How it works' ASCII pipeline diagram: observation -> analysis
  pre-processors -> analyst -> review + apply.
- Signals table now includes per-signal sliding window column.
- Rubric section restructured: gates, modifiers (dampener), and
  skill_edit-specific requirements clearly separated.
- New 'Inspecting the analyst's reasoning' section documenting
  adam-explain.mjs + /reflect --explain.
- Layout updated for v0.3.3 state files (active-nudges.json,
  ab-tracking.jsonl, reinforcements.jsonl, last-trace.txt) and all
  9 new helper scripts under adam/scripts/.
- Test count: 27 -> 87.
- Closing line crediting Adam.
2026-05-13 01:50:59 +01:00
9 changed files with 528 additions and 134 deletions
+256 -128
@@ -1,159 +1,273 @@
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="./assets/logo-dark.svg">
<img src="./assets/logo-light.svg" alt="claude-adam logo" width="128" height="128" />
</picture>
# claude-adam
-Self-improvement layer for [Claude Code](https://claude.com/claude-code) that observes friction signals during your sessions and proposes targeted improvements (new skills, memory entries, agent edits) which you can review and apply.
-## What's new
-- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. Storage/window/exclusion split: ISO-week journal rotation with safety fuse (replaces size-based, fixes silent under-counting); per-signal sliding windows via new `adam-window.mjs` (`dead_end` 7d, `correction` 30d, reinforcement signals 60d). Error fingerprint normalization — `ECONNREFUSED` and `"Connection refused"` cluster identically. Correction corpus expanded (`wait`, `hold on`, `try again`, `different approach`); weak tokens (`no`, `actually`, `wait`) require negation co-occurrence within 8 tokens to fire — kills the `"actually, I think..."` false positive. Mandatory clustering trace + new `adam-explain.mjs --mode summary|full|json`. New `nudge` proposal type (single-session auto-apply, low blast) for repeated `dead_end`. Per-(skill, fingerprint) cooldown via `adam-cooldown.mjs` (replaces coarse per-skill gate). `task_completed` scoring: urgency dampener + reinforcement candidates. A/B effectiveness measurement on auto-applied edits (`adam-ab-measure.mjs`, 7d pre/post window). Upgrade UX overhaul: `adam-upgrade.mjs --list/--diff/--accept` + SessionStart pending-merge warning. Shared helper module `adam-utils.mjs` deduplicates journal-reading and frontmatter parsing across scripts. 87 tests (up from 30).
-- **v0.3.2** — `task_completed` signal: post-task skill capture for downstream reinforcement scoring (consumed in v0.3.3).
-- **v0.3.1** — code review pass: bug fixes (`errorFingerprint` no longer false-positives on `is_error: false`, archive script handles same-millisecond duplicates correctly, `tool_window` now clears on session change, nudge filters proposal filenames by pattern), prose conciseness cuts, hardened `install.sh` with curl one-liner + settings.json merge, `adam-uninstall.sh`, isolated test harness (no longer pollutes live `~/.claude/adam/` state).
-- **v0.3.0** — causal diagnosis: every proposal carries a `# Diagnosis` block (Trigger/Action/Mismatch/Outcome with verbatim transcript quote) before drafting, plus optional `contradiction_flag` heuristic that vetoes auto-apply on obviously-conflicting `skill_edit` additions.
-- **v0.2.1** — win signals (`correction_free_streak`, `clean_recovery`) feed `skill_edit` auto-apply under a strict gate (≤30 LOC, ≤2× byte cap, 7d cooldown, 30d blacklist on rejection).
-- **v0.2.0** — actioned-entry archival via `adam-archive.mjs`; `cursor` field deprecated.
+**A self-improvement layer for [Claude Code](https://claude.com/claude-code).**
+Watches the friction in your coding sessions, clusters the signals via an LLM analyst, and proposes targeted improvements — new skills, memory entries, agent edits — that you review and apply.
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+[![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
+[![Tests](https://img.shields.io/badge/tests-87%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
+[![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
+[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
-## What it does
-A lightweight Node.js hook (`adam-observe.mjs`) runs on `UserPromptSubmit`, `PreToolUse`, and `PostToolUse` events. It detects:
-| Signal | Trigger |
-|---|---|
-| `correction` | User prompt contains "no", "stop", "wrong", "actually", etc. after a tool call |
-| `retry_loop` | Same tool + same args called 3× in a 10-event window |
-| `weak_agent` | Same subagent dispatched 2× in last 5 tool calls |
-| `tool_error_loop` | Same error fingerprint appears 3× in a 5-event ring |
-| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them |
-| `edit_churn` | Same file edited 4× in a window |
-| `build_loop` | 2× build/test/compile commands fail in same session |
-| `subagent_dispatch_pattern` | Same subagent dispatched ≥3× cumulatively |
-| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) — feeds `skill_edit` reinforcement |
-| `clean_recovery` | 3 clean PostToolUse events after a struggle signal — feeds `skill_edit` reinforcement |
+</div>
+---
+## The story behind Adam
+Adam is my newborn son.
+Watching him over the last few months — the way he observes the world, tries something, watches what happens, adjusts, and tries again — I realised that the most powerful learning loop in nature is also one of the simplest. No grand theory. No instruction manual. Just relentless feedback and pattern recognition, applied to every waking moment.
+LLMs can learn the same way. Give them a hook into the real friction of your work — the corrections, the dead-ends, the moments you say *"no, try again"* — and let them propose improvements grounded in **what actually happened**. Not what they assume might help. What you actually struggled with.
+**claude-adam** is that loop, wired into Claude Code. It's named after Adam because the methodology is his.
+---
## Highlights
- 🔍 **Zero LLM cost at observation time.** Deterministic regex + counter detection in a Node hook. The analyst only runs when you invoke `/reflect`.
- 📡 **11 signal types.** Friction (`correction`, `tool_error_loop`, `dead_end`, `edit_churn`, …) + reinforcement (`task_completed`, `correction_free_streak`, `clean_recovery`) + meta.
- 🛡️ **Tight auto-apply gates.** Confidence ≥ 4, cross-session evidence, contradiction veto, per-(skill, fingerprint) cooldown. Most things queue for your manual review.
- 📊 **A/B effectiveness measurement.** Every auto-applied edit gets a 7-day pre/post signal-count delta. If a proposed fix made things worse, the next `/reflect` says so.
- **Per-signal sliding windows.** Stale friction doesn't accumulate forever. `dead_end` 7d, `correction` 30d, reinforcement signals 60d.
- 🔬 **Observable.** Every clustering decision (passed / threshold-blocked / window-filtered / contradiction-vetoed) emits a trace. `/reflect --explain` shows it.
- 📦 **Pure Node.** Zero npm dependencies. Runs on macOS and Linux (Alpine smoke-tested).
## Quick start
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash
```
The installer copies files into `~/.claude/`, offers to merge ADAM's hook entries into `~/.claude/settings.json` (with a diff preview and `[y/N]` confirm), and preserves any local edits via `.adam-new` sidecar files. Pass `--yes` to skip prompts, `--dry-run` to preview.
Then:
```sh
bash ~/.claude/adam/tests/run-tests.sh # expect: 87 passed, 0 failed
# … start a fresh Claude Code session …
/reflect # walks the proposal queue
/reflect --explain # also shows the analyst's clustering trace
```
Pin a release for reproducibility:
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.3.3/install.sh \
| VERSION=v0.3.3 bash
```
## How it works
```mermaid
flowchart TB
subgraph OBS["Observation (deterministic, in-hook, zero LLM cost)"]
direction LR
EV["Tool event /<br/>user prompt"] --> OBSERVE["adam-observe.mjs<br/><sub>regex · counters · ring buffers</sub>"]
OBSERVE --> JOURNAL[("journal.jsonl<br/><sub>append-only signal log</sub>")]
end
JOURNAL -. user runs <code>/reflect</code> .-> ANALYSIS
subgraph ANALYSIS["Analysis (LLM, only on demand)"]
direction TB
subgraph PRE["Pre-processors (deterministic)"]
direction LR
W["adam-window.mjs<br/><sub>per-signal sliding window</sub>"]
S["adam-score.mjs<br/><sub>task_completed dampener<br/>+ reinforcement candidates</sub>"]
AB["adam-ab-measure.mjs<br/><sub>7d pre/post deltas<br/>on prior auto-applies</sub>"]
end
AGENT["adam subagent<br/><sub>cluster · score · diagnose</sub>"]
PRE --> AGENT
AGENT --> PROPOSALS[("proposals/")]
AGENT --> TRACE[["clustering trace<br/><sub>adam-explain.mjs renders</sub>"]]
end
PROPOSALS --> REVIEW
subgraph REVIEW["Review + apply"]
direction TB
GATE{"auto-apply<br/>gates pass?<br/><sub>conf≥4 · low blast<br/>· cooldown cool</sub>"}
GATE -->|yes| APPLIED[("applied/<br/>+ ab-tracking.jsonl")]
GATE -->|no| QUEUE["walk-the-queue<br/><sub>approve · reject · edit</sub>"]
QUEUE -->|approve| APPLIED
QUEUE -->|reject| REJECTED[("rejected/")]
end
APPLIED -. measures back into .-> AB
classDef store fill:#e8f4fd,stroke:#5b9bd5,stroke-width:2px,color:#1f3a5f
classDef proc fill:#fff4e6,stroke:#e8a33d,stroke-width:1px,color:#5a3d0f
classDef trace fill:#f0e8fd,stroke:#7e5dc0,stroke-width:1px,color:#2f1e60
class JOURNAL,PROPOSALS,APPLIED,REJECTED store
class EV,OBSERVE,W,S,AB,AGENT,QUEUE proc
class TRACE trace
```
The observation layer is a 350-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`.
The analysis layer is an LLM subagent invoked by `/reflect`. Before the analyst runs, three deterministic pre-processors filter and enrich the journal: `adam-window.mjs` drops entries older than their signal type's window, `adam-score.mjs` computes per-session urgency dampeners + reinforcement candidates, and `adam-ab-measure.mjs` checks whether previously auto-applied edits actually reduced their originating signal.
The analyst clusters signals, scores them against a deterministic rubric (see below), and emits proposal markdown files to `~/.claude/adam/proposals/`. Each proposal carries a `# Diagnosis` block (Trigger / Action / Mismatch / Outcome with a verbatim transcript quote), a `# Success criterion`, and the source journal-entry timestamps it clustered.
Auto-apply runs only for low-blast types (memory entries, new skills, ephemeral nudges, reinforcement logs) backed by cross-session evidence. Everything else queues for your manual approve / reject / edit walk.
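The per-signal window pre-processor can be sketched as a plain filter over parsed journal entries. The window lengths below are the ones listed in the Signals table; the `ts` field name, the 30-day fallback for unlisted types, and `withinWindow` itself are assumptions, not the actual `adam-window.mjs` interface.

```javascript
// Sketch of a per-signal sliding-window filter. Window lengths mirror the
// README's Signals table; field names and the fallback are assumptions.
const SIGNAL_WINDOWS_DAYS = {
  correction: 30,
  retry_loop: 14,
  weak_agent: 30,
  tool_error_loop: 30,
  dead_end: 7,
  edit_churn: 14,
  build_loop: 30,
  subagent_dispatch_pattern: 30,
  correction_free_streak: 60,
  clean_recovery: 60,
  task_completed: 60,
};
const DEFAULT_WINDOW_DAYS = 30; // assumed fallback for types not listed above
const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only entries whose age (now - ts) fits inside their type's window.
function withinWindow(entries, now = Date.now()) {
  return entries.filter((entry) => {
    const days = SIGNAL_WINDOWS_DAYS[entry.type] ?? DEFAULT_WINDOW_DAYS;
    const ageMs = now - Date.parse(entry.ts);
    return ageMs <= days * DAY_MS;
  });
}

const now = Date.parse("2026-05-13T00:00:00Z");
const kept = withinWindow(
  [
    { type: "dead_end", ts: "2026-05-01T00:00:00Z" }, // 12 days old, 7d window
    { type: "correction", ts: "2026-05-01T00:00:00Z" }, // 12 days old, 30d window
    { type: "task_completed", ts: "2026-03-20T00:00:00Z" }, // 54 days old, 60d window
  ],
  now,
);
console.log(kept.map((e) => e.type)); // [ 'correction', 'task_completed' ]
```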
## Signals
| Signal | Trigger | Window* |
|---|---|---|
| `correction` | Strong tokens (`stop`, `wrong`, `undo`, …) OR weak tokens (`no`, `actually`, `wait`) with negation/contrast nearby | 30d |
| `retry_loop` | Same tool + same args called 3× in a 10-event window | 14d |
| `weak_agent` | Same subagent dispatched 2× in last 5 tool calls | 30d |
| `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
| `edit_churn` | Same file edited 4× in a window | 14d |
| `build_loop` | 2× build/test/compile commands fail in same session | 30d |
| `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
| `clean_recovery` | 3 clean PostToolUse events after a struggle signal — reinforcement input | 60d |
| `task_completed` | 5 tools / 3 kinds / 0 corrections — fed into the urgency dampener + reinforcement candidates | 60d |
\* Per-signal sliding window for `/reflect` analysis. See `SIGNAL_WINDOWS_DAYS` in `adam/scripts/adam-window.mjs`.
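Two of the detection rules above, the weak-token `correction` check and error fingerprint normalisation, fit in a few lines each. This is a hypothetical sketch, not the code in `adam-observe.mjs`. The strong and weak token lists come from the table (the README elides the rest of the strong list), and the 8-token co-occurrence distance comes from the v0.3.3 notes. The negation/contrast cue list, the alias table, and both function bodies are assumptions.

```javascript
// Hypothetical sketch of two detection rules from the Signals table.
const STRONG_TOKENS = new Set(["stop", "wrong", "undo"]); // README's list continues past these
const WEAK_TOKENS = new Set(["no", "actually", "wait"]);
const NEGATION_CUES = new Set(["not", "don't", "isn't", "never", "instead", "but"]); // assumed
const CO_OCCURRENCE_DISTANCE = 8; // tokens, per the v0.3.3 notes

function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Strong tokens fire on their own; weak tokens need a negation/contrast cue nearby.
function isCorrection(prompt) {
  const tokens = tokenize(prompt);
  return tokens.some((token, i) => {
    if (STRONG_TOKENS.has(token)) return true;
    if (!WEAK_TOKENS.has(token)) return false;
    const from = Math.max(0, i - CO_OCCURRENCE_DISTANCE);
    const to = Math.min(tokens.length - 1, i + CO_OCCURRENCE_DISTANCE);
    for (let j = from; j <= to; j++) {
      if (j !== i && NEGATION_CUES.has(tokens[j])) return true;
    }
    return false;
  });
}

// Map different spellings of the same failure onto one fingerprint (alias table assumed).
const ERROR_ALIASES = [
  [/\beconnrefused\b|connection refused/, "econnrefused"],
  [/\benoent\b|no such file or directory/, "enoent"],
];

function errorFingerprint(message) {
  const text = String(message).toLowerCase();
  for (const [pattern, canonical] of ERROR_ALIASES) {
    if (pattern.test(text)) return canonical;
  }
  // Fallback: collapse numbers and whitespace so ports, PIDs and line numbers don't split clusters.
  return text.replace(/\d+/g, "N").replace(/\s+/g, " ").trim();
}

console.log(isCorrection("actually, I think we should add tests")); // false
console.log(isCorrection("no, that's not what I meant")); // true
console.log(errorFingerprint("connect ECONNREFUSED 127.0.0.1:5432") === errorFingerprint("Connection refused")); // true
```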
Detection is local, regex-based, zero LLM cost. Signals append to `~/.claude/adam/journal.jsonl`.
-When you run `/reflect`, the `adam` subagent reads the journal, clusters signals, scores them against a deterministic rubric, and emits proposal files to `~/.claude/adam/proposals/`. Auto-applied proposals only ship for low-blast types (memory, new skills) backed by cross-session evidence; everything else queues for your manual approve/reject/edit walk.
-## Why
+## Auto-apply rubric
+```
Sum:
+2 Signal repeated ≥ 3× across ≥ 2 sessions (within signal's window)
+2 Struggle signal appearing ≥ 1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥ 2 distinct struggle types in same session)
-1 Type-bias penalty (≥ 3 rejections, applied:rejected < 1:2)
+1 Blast radius low (memory or new isolated skill)
0 Blast radius medium (new agent, new hook, edit existing skill)
-1 Blast radius high (CLAUDE.md, settings hooks, edit agent, deletion)
+1 Surgical (one file, ≤ 50 LOC for non-skill_new; ≤ 80 LOC for skill_new)
-3 Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions)
```
-LLM coding sessions reveal repeated friction the moment you stop and look. ADAM looks so you don't have to.
+Modifiers applied at scoring time:
- × `dampener` from `adam-score.mjs` (0.5 / 0.75 / 1.0 based on session's `task_completed` count) — sessions that net-succeeded score lower urgency.
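As a worked example, the rubric reduces to a points sum followed by the dampener multiplication. This is a hypothetical sketch: the feature names are invented, and the README does not say which `task_completed` counts map to 0.5, 0.75 or 1.0, so the mapping in `dampenerFor` is an assumption. It also assumes the dampener scales the whole sum.

```javascript
// Hypothetical encoding of the auto-apply rubric points listed above.
const RUBRIC_POINTS = {
  repeatedAcrossSessions: 2, // ≥ 3× across ≥ 2 sessions
  singleSessionStruggle: 2, // does not stack
  positiveEndorsement: 2,
  multiAxisCluster: 1,
  typeBiasPenalty: -1,
  surgical: 1,
  touchesDenyList: -3,
};
const BLAST_RADIUS_POINTS = { low: 1, medium: 0, high: -1 };

// Assumed mapping: more completed tasks in the session means lower urgency.
function dampenerFor(taskCompletedCount) {
  if (taskCompletedCount >= 2) return 0.5;
  if (taskCompletedCount === 1) return 0.75;
  return 1.0;
}

function confidence(features, taskCompletedCount) {
  let sum = BLAST_RADIUS_POINTS[features.blastRadius] ?? 0;
  for (const [name, points] of Object.entries(RUBRIC_POINTS)) {
    if (features[name]) sum += points;
  }
  return sum * dampenerFor(taskCompletedCount);
}

const cluster = { repeatedAcrossSessions: true, surgical: true, blastRadius: "low" }; // 2 + 1 + 1
console.log(confidence(cluster, 0)); // 4
console.log(confidence(cluster, 1)); // 3
```

Under this assumed mapping, a single completed task is enough to pull a borderline cluster below the `confidence ≥ 4` gate.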
`auto_apply_eligible` requires **all** of:
- `confidence ≥ 4`
- `blast_radius == low`
- `type ∈ {memory, skill_new, nudge, reinforcement}` (or `skill_edit` via the win-driven gate)
- `cross_session_evidence == true` (except `nudge`, which is single-session by design)
- `adam-cooldown.mjs` returns `cool` for `(target_skill, proposal_fingerprint)`
- `contradiction_flag` unset
`skill_edit` additionally requires:
- Win-signal evidence (`correction_free_streak` / `clean_recovery` cites target skill)
- Diff is append-only, ≤ 30 LOC, resulting size ≤ 2× original
- No auto-edit to same target in past 7 days (per-fingerprint cooldown)
- No rejection-blacklist on target in past 30 days
- `# Diagnosis` section present + structurally valid
Everything else queues.
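Because the eligibility gate is a conjunction, it reads naturally as a series of early returns. This is a hypothetical sketch that applies the bullets literally, in the order listed. The proposal field names (`crossSessionEvidence`, `cooldown`, `diff.addedLoc`, `lastAutoEditAt` and so on) are invented for illustration. The real inputs come from the analyst's output and `adam-cooldown.mjs`.

```javascript
// Hypothetical sketch of auto_apply_eligible; field names are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const AUTO_APPLY_TYPES = new Set(["memory", "skill_new", "nudge", "reinforcement"]);

// Win-driven gate that skill_edit must pass in addition to the common checks.
function skillEditGatePasses(p, now) {
  const recentAutoEdit = p.lastAutoEditAt != null && now - p.lastAutoEditAt < 7 * DAY_MS;
  const recentRejection = p.lastRejectionAt != null && now - p.lastRejectionAt < 30 * DAY_MS;
  return Boolean(
    p.winSignalCitesTarget &&
      p.diff.appendOnly &&
      p.diff.addedLoc <= 30 &&
      p.diff.newBytes <= 2 * p.diff.oldBytes &&
      !recentAutoEdit &&
      !recentRejection &&
      p.hasValidDiagnosis,
  );
}

function autoApplyEligible(p, now = Date.now()) {
  if (p.confidence < 4) return false;
  if (p.blastRadius !== "low") return false;
  const typeOk =
    AUTO_APPLY_TYPES.has(p.type) || (p.type === "skill_edit" && skillEditGatePasses(p, now));
  if (!typeOk) return false;
  if (!p.crossSessionEvidence && p.type !== "nudge") return false; // nudge is single-session by design
  if (p.cooldown !== "cool") return false;
  if (p.contradictionFlag) return false;
  return true;
}

const proposal = { confidence: 5, blastRadius: "low", type: "memory", crossSessionEvidence: true, cooldown: "cool" };
console.log(autoApplyEligible(proposal)); // true
```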
## Lifecycle: from signal to permanent improvement
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. The result:
- `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads `applied/` + `rejected/` frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
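The exclusion step amounts to a set-membership filter. This minimal sketch assumes frontmatter has already been parsed into objects (the README credits `adam-utils.mjs` with frontmatter parsing) and that journal entries carry their timestamp in a `ts` field. `activeEntries` is a hypothetical name.

```javascript
// Sketch: drop journal entries already claimed by an applied/ or rejected/ proposal.
function activeEntries(journal, actionedProposals) {
  const excluded = new Set();
  for (const proposal of actionedProposals) {
    for (const ts of proposal.source_entries ?? []) excluded.add(ts);
  }
  return journal.filter((entry) => !excluded.has(entry.ts));
}

const journal = [
  { ts: "2026-05-10T09:00:00Z", type: "dead_end" },
  { ts: "2026-05-11T14:30:00Z", type: "correction" },
];
const applied = [{ id: "p-001", source_entries: ["2026-05-10T09:00:00Z"] }];
console.log(activeEntries(journal, applied).map((e) => e.type)); // [ 'correction' ]
```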
Auto-applied proposals additionally append to `~/.claude/adam/ab-tracking.jsonl`. The next time `/reflect` runs (and 7+ days have passed), `adam-ab-measure.mjs` computes a pre/post delta of the originating signal count. Status: `improved` / `neutral` / `regressed` / `no_baseline` / `pending`. Regressions surface at the top of the analyst's output so a bad fix doesn't quietly persist.
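A sketch of the pre/post comparison follows. The 7-day windows and the status names come from the paragraph above. The ±20% neutral band, the input shape and `abStatus` itself are assumptions, because the README does not say how `adam-ab-measure.mjs` separates `neutral` from `improved` or `regressed`.

```javascript
// Hypothetical sketch of a 7-day pre/post signal-count comparison.
const DAY_MS = 24 * 60 * 60 * 1000;
const WINDOW_MS = 7 * DAY_MS;
const NEUTRAL_BAND = 0.2; // assumed: changes smaller than ±20% count as neutral

// appliedAt and signalTimes are epoch milliseconds; signalTimes are occurrences
// of the signal type that originally triggered the auto-applied edit.
function abStatus(appliedAt, signalTimes, now) {
  if (now - appliedAt < WINDOW_MS) return { status: "pending" };
  const pre = signalTimes.filter((t) => t >= appliedAt - WINDOW_MS && t < appliedAt).length;
  const post = signalTimes.filter((t) => t >= appliedAt && t < appliedAt + WINDOW_MS).length;
  if (pre === 0) return { status: "no_baseline", pre, post };
  const delta = (post - pre) / pre;
  let status = "neutral";
  if (delta <= -NEUTRAL_BAND) status = "improved";
  else if (delta >= NEUTRAL_BAND) status = "regressed";
  return { status, pre, post, delta };
}

const appliedAt = 10 * DAY_MS;
const seen = [4, 5, 6, 8, 12].map((d) => d * DAY_MS); // four before the edit, one after
console.log(abStatus(appliedAt, seen, 20 * DAY_MS).status); // improved
```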
## Inspecting the analyst's reasoning
Every `/reflect` run also writes the analyst's clustering trace to `~/.claude/adam/last-trace.txt`. The trace records, per cluster: signal type, occurrence count, sessions, which gates passed or failed, and whether the cluster produced a proposal or was skipped (with reason: `threshold` / `cross_session` / `window` / `contradiction` / `other`).
```sh
node ~/.claude/adam/scripts/adam-explain.mjs --mode summary # SUMMARY + per-decision counts
node ~/.claude/adam/scripts/adam-explain.mjs --mode full # verbatim trace + rejection histogram
node ~/.claude/adam/scripts/adam-explain.mjs --mode json # machine-readable
```
Or pass `--explain` to `/reflect` to render the full trace inline.
## What it will not do
- 🚫 No background LLM spend. The analyst runs only when you invoke `/reflect`.
- 🚫 No retroactive transcript mining beyond the journal.
- 🚫 No hard `rm` of any artifact. Deletions are soft (`mv` to `trash/<ts>/`).
- 🚫 No autonomous edits to `CLAUDE.md`, agents, hooks, or `settings.json` — these always queue for review regardless of confidence.
- 🚫 No proposal that matches a previously-rejected idea (≥ 2 token overlap with rejection's `# Why`).
- 🚫 No invented trigger phrases for new skills — every trigger comes from observed user input.
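The rejected-idea veto can be sketched as a token-overlap test against each rejection's `# Why` text. The threshold of 2 comes from the bullet above. The tokenizer, the stopword list and `matchesRejected` are assumptions.

```javascript
// Hypothetical sketch of the "≥ 2 token overlap with a rejection's # Why" veto.
const STOPWORDS = new Set(["the", "and", "for", "with", "that", "this", "into", "from"]); // assumed

// Lowercased word set, ignoring very short words and common stopwords.
function contentTokens(text) {
  const words = text.toLowerCase().match(/[a-z0-9_-]+/g) ?? [];
  return new Set(words.filter((w) => w.length > 2 && !STOPWORDS.has(w)));
}

function matchesRejected(proposalWhy, rejectedWhys, minOverlap = 2) {
  const mine = contentTokens(proposalWhy);
  return rejectedWhys.some((why) => {
    let overlap = 0;
    for (const token of contentTokens(why)) {
      if (mine.has(token)) overlap += 1;
    }
    return overlap >= minOverlap;
  });
}

const rejected = ["Wrap npm install in a retry loop"];
console.log(matchesRejected("Add a retry wrapper around flaky npm install", rejected)); // true
console.log(matchesRejected("Document the deploy checklist", rejected)); // false
```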
## Layout
```
~/.claude/
├── hooks/
-│ ├── adam-observe.mjs # signal collector
-│ └── adam-nudge.mjs # SessionStart reminder when ≥3 proposals queued
+│ ├── adam-observe.mjs # signal collector (UserPromptSubmit / PreToolUse / PostToolUse)
+│ └── adam-nudge.mjs # SessionStart reminder + pending-upgrade warning
├── agents/adam.md # analyst subagent (system prompt + rubric)
-├── skills/adam-self-improvement/SKILL.md # /reflect protocol
-├── commands/reflect.md # /reflect slash command
+├── skills/adam-self-improvement/
+│ └── SKILL.md # /reflect protocol
+├── commands/reflect.md # /reflect slash command
└── adam/
-├── journal.jsonl # append-only signal log (active observations)
-├── journal/ # rotated daily logs + actioned-<id>.jsonl per applied/rejected proposal
+├── journal.jsonl # active observations
+├── journal/ # rotated weekly (YYYY-Www.jsonl) + actioned-<id>.jsonl
├── state.json # per-session counters
-├── usage.json # skill/agent invocation tallies + payload visibility counters
-├── proposals/ # queued, awaiting review
-├── applied/ # approved + auto-applied archive
-├── rejected/ # rejected (with reason)
-├── trash/ # soft-deleted artifacts (recoverable)
-├── scripts/ # adam-archive.mjs (called by skill on apply/reject)
-└── tests/run-tests.sh # 27 verification tests (isolated tmpdir; never touches live state)
+├── usage.json # invocation tallies + visibility metrics
+├── active-nudges.json # ephemeral SessionStart reminders (auto-expire)
+├── ab-tracking.jsonl # one entry per auto-apply, drives effectiveness measurement
+├── reinforcements.jsonl # appended on reinforcement proposal apply
+├── last-trace.txt # most recent analyst clustering trace
+├── proposals/ # queued, awaiting review
+├── applied/ # approved + auto-applied archive
├── rejected/ # rejected with reason
├── trash/ # soft-deleted artifacts (recoverable)
├── scripts/
│ ├── adam-utils.mjs # shared journal-reading + frontmatter parsing
│ ├── adam-window.mjs # per-signal sliding-window filter
│ ├── adam-score.mjs # urgency dampener + reinforcement candidates
│ ├── adam-ab-measure.mjs # 7d pre/post delta per auto-applied edit
│ ├── adam-cooldown.mjs # per-(skill, fingerprint) cooldown gate
│ ├── adam-nudge-eligibility.mjs # dead_end session-count check
│ ├── adam-explain.mjs # clustering trace parser/renderer
│ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
│ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
│ └── adam-archive.mjs # post-apply journal cleanup
└── tests/run-tests.sh # 87 isolated tests; never touches live state
```
-## Install
-### One-liner (recommended)
-```sh
-curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash
-```
+## What's new
+- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. ISO-week journal rotation replaces 5MB size-based (fixes silent cluster-straddling under-count); per-signal sliding windows via `adam-window.mjs`; error fingerprint normalisation; correction corpus expanded + weak-token co-occurrence requirement (kills the `"actually, I think..."` false positive); mandatory clustering trace + `adam-explain.mjs`; new `nudge` and `reinforcement` proposal types; per-(skill, fingerprint) cooldown via `adam-cooldown.mjs`; `task_completed` scoring (dampener + reinforcement); A/B effectiveness measurement; upgrade UX overhaul (`adam-upgrade.mjs --list/--diff/--accept`); shared `adam-utils.mjs`. 87 tests (up from 30).
+- **v0.3.2** — `task_completed` signal: post-task skill capture for downstream reinforcement scoring (consumed in v0.3.3).
+- **v0.3.1** — code review pass: bug fixes (`errorFingerprint` no longer false-positives on `is_error: false`, archive script handles same-millisecond duplicates correctly, `tool_window` clears on session change, nudge filters proposal filenames by pattern), prose conciseness cuts, hardened `install.sh` with curl one-liner + settings.json merge, `adam-uninstall.sh`, isolated test harness.
+- **v0.3.0** — causal diagnosis: every proposal carries a `# Diagnosis` block (Trigger/Action/Mismatch/Outcome with verbatim transcript quote), plus `contradiction_flag` heuristic that vetoes auto-apply on obviously-conflicting `skill_edit` additions.
+- **v0.2.1** — win signals (`correction_free_streak`, `clean_recovery`) feed `skill_edit` auto-apply under a strict gate (≤ 30 LOC, ≤ 2× byte cap, 7d cooldown, 30d blacklist).
+- **v0.2.0** — actioned-entry archival via `adam-archive.mjs`; `cursor` field deprecated.
Pin a release for reproducibility:
```sh
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.3.1/install.sh \
| VERSION=v0.3.1 bash
```
The installer clones the repo to `/tmp`, copies files into `~/.claude/`, and offers to merge ADAM's hook entries into your `~/.claude/settings.json` (with a diff preview and `[y/N]` confirmation — your existing hooks are preserved). Pass `--yes` to skip the prompt; `--dry-run` to preview without writing.
Requires `git`, `curl`, `jq`, and `node` 18+.
### From a clone
```sh
git clone https://github.com/lukaszraczylo/claude-adam
cd claude-adam
./install.sh
```
### Upgrade-safe
These files are **never overwritten** if they already exist:
- `~/.claude/adam/journal.jsonl` — your observation log
- `~/.claude/adam/state.json` — session counters
- `~/.claude/adam/usage.json` — invocation tallies
If you've locally edited any installed file (e.g. `agents/adam.md`), the installer writes the new version to `<file>.adam-new` and warns you instead of clobbering.
After install: run `bash ~/.claude/adam/tests/run-tests.sh` to verify (expect `27 passed, 0 failed`), start a fresh Claude Code session, then run `/reflect`.
## Requirements
-- Claude Code v2.1.0+ (for auto skill hot-reload; older versions need session restart after `skill_new` proposals are applied)
-- Node.js 18+ (for the hook; tested on v22)
-- Bash 4+, `git`, `curl`, `jq` (for installer + test harness)
+- **Claude Code v2.1.0+** — for auto skill hot-reload (older versions need a session restart after `skill_new` proposals).
+- **Node.js 18+** — tested on v22, used by the hook + helper scripts. Zero npm dependencies.
+- **Bash 4+**, `git`, `curl`, `jq` for the installer + test harness.
### Platform support
-Tested on **macOS** (Darwin / BSD coreutils) and **Linux** (Alpine, glibc + musl). The install / uninstall / test scripts are written to be portable: `stat` uses BSD `-f` with GNU `-c` fallback, `mktemp -d -t prefix.XXXXXX` works on both, no GNU-only flags. CI smoke verified `27 passed, 0 failed` under `alpine:latest`.
+Tested on **macOS** (Darwin / BSD coreutils) and **Linux** (Alpine, glibc + musl). The install / uninstall / test scripts are written to be portable: `stat` uses BSD `-f` with GNU `-c` fallback, `mktemp -d -t prefix.XXXXXX` works on both, no GNU-only flags. CI smoke verified under `alpine:latest`.
## Confidence rubric
```
Sum:
+2 Signal repeated ≥3× across ≥2 sessions
+2 Struggle signal appearing ≥1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥2 distinct struggle types in same session)
-1 Type-bias penalty (≥3 rejections, applied:rejected <1:2)
+1 Blast radius low (memory or new isolated skill)
0 Blast radius medium (new agent, new hook, edit existing skill)
-1 Blast radius high (CLAUDE.md, settings hooks, edit agent, deletion)
+1 Surgical (one file, ≤50 LOC for non-skill_new; ≤80 LOC for skill_new)
-3 Touches deny-list (settings.json hooks/permissions, CLAUDE.md, deletions)
auto_apply_eligible requires ALL:
confidence ≥ 4
blast_radius == low
type ∈ {memory, skill_new, skill_edit} # skill_edit also passes the win-driven gate
cross_session_evidence == true (single-session-only proposals always queue)
skill_edit additionally requires (v0.2.1+):
win-signal evidence (correction_free_streak / clean_recovery cites target skill)
diff is append-only, ≤30 LOC, resulting size ≤2× original
no auto-edit to same target in past 7 days (cooldown)
no rejection-blacklist on target in past 30 days
contradiction heuristic does not flag (v0.3.0+)
# Diagnosis section present + structurally valid (v0.3.0+)
```
## Lifecycle: how proposals become permanent
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam/scripts/adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. Effects:
- The `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads applied/ + rejected/ frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
## What it will not do
- No background LLM spend. The analyst runs only when you invoke `/reflect`.
- No retroactive transcript mining beyond the journal cursor.
- No hard `rm` of any artifact. Deletions are soft (`mv` to `trash/<ts>/`).
- No autonomous edits to `CLAUDE.md`, agents, hooks, or `settings.json` — these always queue for review regardless of confidence.
- No proposal that matches a previously-rejected idea (≥2 token overlap with rejection's `# Why`).
- No invented trigger phrases for new skills — every trigger comes from observed user input.
## Uninstall
@@ -175,6 +289,20 @@ rm -rf ~/.claude/skills/adam-self-improvement
Then remove the four `adam-*` hook entries from `~/.claude/settings.json`.
## Contributing
Issues and PRs welcome — especially additional signal types, transcript-aware diagnosis improvements, and platform fixes. Run the test suite before opening a PR:
```sh
bash ~/.claude/adam/tests/run-tests.sh
```
## License
[MIT](LICENSE) — © 2026 Lukasz Raczylo
---
<div align="center">
<sub>Named after my son Adam, who taught me that observation is the start of every interesting thing.</sub>
</div>
+35 -2
@@ -43,10 +43,32 @@ export const NEGATIVE_SIGNAL_TYPES = new Set([
"retry_loop",
"build_loop",
"weak_agent",
"silent_drift",
"error_after_recovery",
]);
export const REINFORCEMENT_THRESHOLD = 3;
// Severity divisor per struggle signal type. Severity = max(1, floor(count / divisor)).
// Entries without `count` default to severity 1. Source of truth — referenced by
// agents/adam.md (Confidence rubric → severity-sum bullets).
export const SEVERITY_DIVISORS = {
dead_end: 8,
edit_churn: 4,
tool_error_loop: 3,
retry_loop: 3,
weak_agent: 2,
build_loop: 1,
};
export function entrySeverity(entry) {
if (!entry || typeof entry !== "object") return 1;
const divisor = SEVERITY_DIVISORS[entry.type];
if (!divisor) return 1;
const count = typeof entry.count === "number" && entry.count > 0 ? entry.count : 1;
return Math.max(1, Math.floor(count / divisor));
}
function parseArgs(argv) {
const args = { home: null, input: null, help: false };
for (let i = 0; i < argv.length; i++) {
@@ -84,11 +106,22 @@ export function computeSessionScores(entries) {
const sid = e.session || e.session_id || "";
if (!sid) continue;
if (!bySession.has(sid)) {
bySession.set(sid, {
session_id: sid,
negative_count: 0,
task_completed_count: 0,
severity_sum: 0,
severity_by_type: {},
});
}
const slot = bySession.get(sid);
if (e.type === "task_completed") slot.task_completed_count++;
else if (NEGATIVE_SIGNAL_TYPES.has(e.type)) {
slot.negative_count++;
const sev = entrySeverity(e);
slot.severity_sum += sev;
slot.severity_by_type[e.type] = (slot.severity_by_type[e.type] || 0) + sev;
}
}
const out = [];
for (const slot of bySession.values()) {
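For illustration, the severity arithmetic in this hunk can be reproduced outside the scorer. The sketch below copies `SEVERITY_DIVISORS` and `entrySeverity` as shown. It then folds entries into per-session `severity_sum` and `severity_by_type` the same way `computeSessionScores` does. The `STRUGGLE` set is an assumption standing in for `NEGATIVE_SIGNAL_TYPES`, whose full contents are outside this diff. `severityBySession` is a hypothetical name.

```javascript
// Sketch only: mirrors SEVERITY_DIVISORS / entrySeverity from adam-score.mjs.
const SEVERITY_DIVISORS = {
  dead_end: 8,
  edit_churn: 4,
  tool_error_loop: 3,
  retry_loop: 3,
  weak_agent: 2,
  build_loop: 1,
};

// Assumption: the eight struggle types named in agents/adam.md; the real
// NEGATIVE_SIGNAL_TYPES set may contain more entries.
const STRUGGLE = new Set([
  "tool_error_loop", "dead_end", "weak_agent", "retry_loop",
  "edit_churn", "build_loop", "silent_drift", "error_after_recovery",
]);

function entrySeverity(entry) {
  if (!entry || typeof entry !== "object") return 1;
  const divisor = SEVERITY_DIVISORS[entry.type];
  if (!divisor) return 1; // e.g. silent_drift, error_after_recovery
  const count = typeof entry.count === "number" && entry.count > 0 ? entry.count : 1;
  return Math.max(1, Math.floor(count / divisor));
}

// Hypothetical helper: group struggle entries by session and accumulate severity.
function severityBySession(entries) {
  const bySession = new Map();
  for (const e of entries) {
    const sid = e.session || e.session_id || "";
    if (!sid || !STRUGGLE.has(e.type)) continue;
    if (!bySession.has(sid)) {
      bySession.set(sid, { session_id: sid, severity_sum: 0, severity_by_type: {} });
    }
    const slot = bySession.get(sid);
    const sev = entrySeverity(e);
    slot.severity_sum += sev;
    slot.severity_by_type[e.type] = (slot.severity_by_type[e.type] || 0) + sev;
  }
  return [...bySession.values()];
}

const [sSEV] = severityBySession([
  { session: "sSEV", type: "dead_end", count: 64 },
  { session: "sSEV", type: "edit_churn", count: 8 },
  { session: "sSEV", type: "tool_error_loop", count: 3 },
]);
console.log(sSEV.severity_sum, sSEV.severity_by_type);
```

On the three Test 82 fixtures, this yields `severity_sum` 11, with `dead_end` 8, `edit_churn` 2 and `tool_error_loop` 1. These are the values the shell test greps for.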
+2
@@ -29,6 +29,8 @@ export const SIGNAL_WINDOWS_DAYS = {
build_loop: 30,
weak_agent: 30,
subagent_dispatch_pattern: 30,
silent_drift: 14,
error_after_recovery: 30,
correction_free_streak: 60,
clean_recovery: 60,
task_completed: 60,
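Here is a minimal sketch of how a per-type window could be applied when reading the journal. It assumes each entry carries an ISO-8601 `ts`. Only the window values visible in this hunk are copied. `withinWindow` and `DEFAULT_WINDOW_DAYS` are hypothetical, and the real fallback for unlisted types is not shown in this diff.

```javascript
// Subset of SIGNAL_WINDOWS_DAYS visible in this diff; other types exist upstream.
const SIGNAL_WINDOWS_DAYS = {
  build_loop: 30,
  weak_agent: 30,
  subagent_dispatch_pattern: 30,
  silent_drift: 14,
  error_after_recovery: 30,
  correction_free_streak: 60,
  clean_recovery: 60,
  task_completed: 60,
};
const DEFAULT_WINDOW_DAYS = 30; // hypothetical fallback, not taken from the source
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: keep an entry only if its ts falls inside its type's window.
function withinWindow(entry, nowMs) {
  const days = SIGNAL_WINDOWS_DAYS[entry.type] ?? DEFAULT_WINDOW_DAYS;
  const tsMs = Date.parse(entry.ts);
  if (Number.isNaN(tsMs)) return false; // unparseable timestamps are ignored
  return nowMs - tsMs <= days * DAY_MS;
}

const now = Date.parse("2026-05-30T00:00:00Z");
const entries = [
  { ts: "2026-05-12T10:00:00Z", type: "silent_drift" },         // about 17.6 days old
  { ts: "2026-05-12T10:00:00Z", type: "error_after_recovery" }, // same age, 30 d window
];
console.log(entries.filter(e => withinWindow(e, now)).map(e => e.type));
```

An entry about 17.6 days old survives the 30-day `error_after_recovery` window. The same entry falls outside the 14-day `silent_drift` window.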
+99
@@ -1388,6 +1388,105 @@ else
fi
fi
# --- Test 78: silent_drift fires after 5 consecutive read-only tools ---
echo "Test 78: silent_drift after 5 reads"
reset_state
for i in 1 2 3 4 5; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/r-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSD\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"silent_drift"' "5 consecutive reads emit silent_drift"
assert_grep "$ROOT/journal.jsonl" '"read_count":5' "silent_drift entry records read_count"
# --- Test 79: silent_drift counter resets on action tool ---
echo "Test 79: silent_drift counter resets on action tool"
reset_state
for i in 1 2 3 4; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/r-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSDR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# Action tool — should reset
echo '{"hook_event_name":"PostToolUse","tool_name":"Edit","tool_input":{"file_path":"/tmp/x"},"tool_response":{"content":"ok"},"session_id":"sSDR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
for i in 1 2 3 4; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/rb-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSDR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
if grep -qE '"type":"silent_drift"' "$ROOT/journal.jsonl"; then
echo " FAIL: silent_drift fired despite action tool reset"; FAIL=$((FAIL+1))
else
echo " PASS: silent_drift suppressed by intervening action tool"; PASS=$((PASS+1))
fi
# --- Test 80: error_after_recovery fires when same fp returns post-clean_recovery ---
echo "Test 80: error_after_recovery fires when fp returns after recovery"
reset_state
# Build a tool_error_loop with ENOENT
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat missing"},"tool_response":{"is_error":true,"content":"cat: missing: No such file or directory"},"session_id":"sEAR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
# 3 clean tools → clean_recovery
for i in 1 2 3; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/ok-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sEAR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# Same fp returns within window
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat other"},"tool_response":{"is_error":true,"content":"cat: other: No such file or directory"},"session_id":"sEAR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_grep "$ROOT/journal.jsonl" '"type":"error_after_recovery"' "same fp after clean_recovery emits error_after_recovery"
# --- Test 81: error_after_recovery does NOT fire after window expires ---
echo "Test 81: error_after_recovery suppressed beyond window"
reset_state
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat missing"},"tool_response":{"is_error":true,"content":"cat: missing: No such file or directory"},"session_id":"sEARW","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
for i in 1 2 3; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/ok-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sEARW\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
# UserPromptSubmit resets tools_since_user + last_errors so the burn reads don't
# trigger a secondary dead_end + clean_recovery cycle (which would create a fresh
# recovery within window and cause error_after_recovery to fire legitimately).
echo '{"hook_event_name":"UserPromptSubmit","prompt":"keep going","session_id":"sEARW","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
# Burn through the 5-event window with 6 clean reads (session_post_count: 6 → 12)
for i in 1 2 3 4 5 6; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/burn-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sEARW\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"cat other"},"tool_response":{"is_error":true,"content":"cat: other: No such file or directory"},"session_id":"sEARW","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
if grep -qE '"type":"error_after_recovery"' "$ROOT/journal.jsonl"; then
echo " FAIL: error_after_recovery fired outside 5-event window"; FAIL=$((FAIL+1))
else
echo " PASS: error_after_recovery suppressed outside window"; PASS=$((PASS+1))
fi
# --- Test 82: adam-score.mjs reports severity_sum + severity_by_type ---
echo "Test 82: severity-sum reporting in score.mjs"
SEV_TMP="$(mktemp)"
cat > "$SEV_TMP" <<'EOF'
{"ts":"2026-05-12T10:00:00Z","session":"sSEV","type":"dead_end","count":64}
{"ts":"2026-05-12T10:01:00Z","session":"sSEV","type":"edit_churn","count":8}
{"ts":"2026-05-12T10:02:00Z","session":"sSEV","type":"tool_error_loop","count":3,"fp":"ENOENT:abc"}
EOF
out=$(SCORE_RUN --input "$SEV_TMP" 2>/dev/null)
rm -f "$SEV_TMP"
# Expected: dead_end 64/8=8, edit_churn 8/4=2, tool_error_loop 3/3=1 → sum=11
if echo "$out" | grep -q '"severity_sum":11'; then
echo " PASS: severity_sum=11 reported"; PASS=$((PASS+1))
else
echo " FAIL: severity_sum mismatch (got: $out)"; FAIL=$((FAIL+1))
fi
if echo "$out" | grep -q '"dead_end":8'; then
echo " PASS: severity_by_type.dead_end=8"; PASS=$((PASS+1))
else
echo " FAIL: severity_by_type.dead_end missing/wrong (got: $out)"; FAIL=$((FAIL+1))
fi
echo
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" = "0" ]
+32 -3
@@ -38,6 +38,8 @@ Per-signal windows (single source of truth: `SIGNAL_WINDOWS_DAYS` in `~/.claude/
| `build_loop` | 30 d | build/test failure patterns |
| `weak_agent` | 30 d | subagent quality signal |
| `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
| `silent_drift` | 14 d | exploration-without-action is task-local |
| `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
| `correction_free_streak` | 60 d | wins accumulate slowly |
| `clean_recovery` | 60 d | wins accumulate slowly |
| `task_completed` | 60 d | recipe wins accumulate slowly |
@@ -59,6 +61,8 @@ The hook emits these `type` values into the journal:
| `edit_churn` | same file edited 4× in window | file basename |
| `build_loop` | 2 build/test/compile commands fail in session | session |
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
| `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
| `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
| `task_completed` | UserPromptSubmit closes a run of ≥5 tool calls with ≥3 distinct tool kinds and 0 corrections | sorted `tool_kinds` tuple |
@@ -84,10 +88,17 @@ The hook emits these `type` values into the journal:
- `edit_churn`: cluster by file basename pattern (e.g. `*.test.ts`).
- `build_loop`: cluster by `session`.
- `subagent_dispatch_pattern`: cluster by `subagent_type`.
- `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
- `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
- `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
- `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
- `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). Single entry qualifies for `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with same tuple — without it, proposal queues, never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
- Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
- If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
- The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.) — sub-clusters do NOT replace it, they supplement it.
6. For each cluster qualifying under the rubric — ≥3 occurrences across ≥2 sessions, OR (for struggle types) ≥1 entry within a single session, OR (for `correction`) ≥3 occurrences across ≥2 cwds:
a. If cluster topic matches a rejected idea via the rejected-ideas fuzzy set (≥2 token overlap with rejection's `# Why`), skip with reason `"rejected-similar"`.
b. Pull ~20 messages of transcript context from `transcripts_root` to enrich. Never read full transcripts.
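Step 5b can be sketched as a pure function over one struggle cluster. The sketch makes two assumptions: entries carry an `active_skills` array, and the skills present in `skills_root` are supplied as a `Set` of names. `subClusterBySkill` and this input shape are hypothetical. As the text specifies, the parent cluster is returned unchanged, and the per-skill sub-clusters are extra candidates alongside it.

```javascript
// Hypothetical sketch of step 5b (skill-attribution sub-clustering).
const SUB_CLUSTER_MIN = 3;

function subClusterBySkill(cluster, existingSkills) {
  const bySkill = new Map();
  for (const e of cluster) {
    const skill = (Array.isArray(e.active_skills) && e.active_skills[0]) || "";
    if (!skill) continue; // no active skill: the entry stays only in the parent cluster
    if (!bySkill.has(skill)) bySkill.set(skill, []);
    bySkill.get(skill).push(e);
  }
  const candidates = [];
  for (const [skill, entries] of bySkill) {
    // >= 3 entries AND the skill exists in skills_root: struggle-driven skill_edit candidate
    if (entries.length >= SUB_CLUSTER_MIN && existingSkills.has(skill)) {
      candidates.push({ skill, entries });
    }
  }
  // The parent cluster is returned as-is: sub-clusters supplement it, they do not replace it.
  return { umbrella: cluster, candidates };
}

const cluster = [
  { ts: "t1", type: "dead_end", active_skills: ["tdd"] },
  { ts: "t2", type: "edit_churn", active_skills: ["tdd"] },
  { ts: "t3", type: "retry_loop", active_skills: ["tdd", "other"] },
  { ts: "t4", type: "dead_end", active_skills: ["review"] },
  { ts: "t5", type: "dead_end", active_skills: ["review"] },
  { ts: "t6", type: "silent_drift", active_skills: [] },
];
const { umbrella, candidates } = subClusterBySkill(cluster, new Set(["tdd", "review"]));
console.log(umbrella.length, candidates.map(c => c.skill));
```

In the sample, only `tdd` reaches three entries. `review` has two, so its entries remain evidence for the umbrella cluster only.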
@@ -254,6 +265,21 @@ A `skill_edit` proposal sets `auto_apply_eligible: true` ONLY when ALL hold:
If any of (3) through (9) fails: still emit the proposal, but set `auto_apply_eligible: false` so the main thread queues it for review.
## Struggle-driven `skill_edit` eligibility
Skill-attribution sub-clustering (step 5b) produces struggle-driven `skill_edit` candidates: a sub-cluster of ≥3 struggle entries all naming the same `active_skills[0]` that exists in `skills_root`. These proposals are emitted but **ALWAYS queue** (`auto_apply_eligible: false`) regardless of confidence. Negative evidence on a skill is a weaker basis for self-modification than positive evidence (the skill may be active during friction caused by something else), so the human reviews every one.
A struggle-driven `skill_edit` proposal MUST:
1. Set `target` to the matched skill's `SKILL.md` path.
2. Cluster severity-sum ≥ 10 (same threshold as the +1 rubric bullet).
3. Sub-cluster names exactly one skill (no ambiguity across distinct `active_skills[0]` values).
4. `# Proposed change` is an append-only diff adding a `## When struggling` section (naive default body: a checkpoint-or-pause rule appropriate to the dominant signal — e.g. `dead_end` → "After 16 PostToolUse events without UserPromptSubmit, emit a one-line checkpoint summary before continuing.").
5. Frontmatter includes `struggle_evidence: "<ts of one source entry naming this skill>"` and `struggle_signals: [<list of signal types in the sub-cluster>]`. The win-driven `win_evidence` field is omitted.
6. Subject to the same Per-(skill, fingerprint) cooldown as win-driven `skill_edit`.
If gate (2) or (3) fails: skip the sub-cluster (the parent cluster still produces its umbrella proposal). The sub-cluster's `source_entries` overlap with the parent's — the apply pipeline handles dedup via the excluded-timestamps set.
## Per-(skill, fingerprint) cooldown
The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not on target_skill alone. A rejected/applied proposal for skill `X` with fingerprint `A` does NOT block future proposals for skill `X` with fingerprint `B`.
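The struggle-driven gates can be expressed as a guard function that returns either a queued proposal stub or `null`. This is a hypothetical sketch, not the agent's implementation. It models the entry-count, single-skill, skill-exists and severity-sum gates, plus the required frontmatter fields. It assumes a `<skills_root>/<skill>/SKILL.md` layout for `target`. It leaves out the `## When struggling` diff body and the cooldown check.

```javascript
// Hypothetical sketch of the struggle-driven skill_edit gates; not the agent's actual code.
const SEVERITY_DIVISORS = { dead_end: 8, edit_churn: 4, tool_error_loop: 3, retry_loop: 3, weak_agent: 2, build_loop: 1 };
const MIN_ENTRIES = 3;
const MIN_SEVERITY_SUM = 10;

// Same rule as entrySeverity: max(1, floor(count / divisor)), default 1.
function severity(e) {
  const d = SEVERITY_DIVISORS[e.type];
  if (!d) return 1;
  const count = typeof e.count === "number" && e.count > 0 ? e.count : 1;
  return Math.max(1, Math.floor(count / d));
}

// Returns a queued proposal stub, or null when any gate fails.
function struggleSkillEdit(subCluster, existingSkills, skillsRoot) {
  if (subCluster.length < MIN_ENTRIES) return null;
  const skills = new Set(subCluster.map(e => (e.active_skills || [])[0] || ""));
  if (skills.size !== 1) return null;                    // gate (3): exactly one skill
  const [skill] = skills;
  if (!skill || !existingSkills.has(skill)) return null; // skill must exist in skills_root
  const sum = subCluster.reduce((acc, e) => acc + severity(e), 0);
  if (sum < MIN_SEVERITY_SUM) return null;               // gate (2): severity-sum >= 10
  return {
    type: "skill_edit",
    target: `${skillsRoot}/${skill}/SKILL.md`,            // assumed directory layout
    auto_apply_eligible: false,                            // struggle-driven: always queues
    struggle_evidence: subCluster[0].ts,
    struggle_signals: [...new Set(subCluster.map(e => e.type))],
  };
}

const sub = [
  { ts: "2026-05-12T10:00:00Z", type: "dead_end", count: 64, active_skills: ["tdd"] },
  { ts: "2026-05-12T10:01:00Z", type: "edit_churn", count: 8, active_skills: ["tdd"] },
  { ts: "2026-05-12T10:02:00Z", type: "silent_drift", active_skills: ["tdd"] },
];
const proposal = struggleSkillEdit(sub, new Set(["tdd"]), "/home/me/.claude/skills");
console.log(proposal);
```

With Test 82-style counts, the severity sum is 8 + 2 + 1 = 11, so the sub-cluster clears the ≥ 10 gate. Dropping the `dead_end` count to 8 lowers the sum to 4, and the sub-cluster is skipped. In both cases `auto_apply_eligible` is never `true`.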
@@ -307,9 +333,12 @@ The clustering trace summary (see §"Clustering trace") adds an extra `regressio
Sum:
- Signal repeated ≥3× across ≥2 sessions: **+2**
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs`: `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, weak_agent:2, build_loop:1`; entries without `count`, and types with no divisor such as `silent_drift` and `error_after_recovery`, count as 1): **+1**
- Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
- Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
- Diagnosis flags `Mismatch: unclear` (causation could not be reconstructed from transcript context): **-1**
- Blast radius: low **+1**, medium **0**, high **-1** (default per type — see Proposal types table)
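If each bullet is treated as a boolean or numeric feature of a cluster, the rubric lines in this hunk reduce to a simple additive score. `partialConfidence` and its field names are hypothetical. Only the bullets visible here are modeled; any rubric lines outside this hunk are not.

```javascript
// Hypothetical sketch: additive score over the rubric bullets shown in this hunk only.
const BLAST_RADIUS_POINTS = { low: 1, medium: 0, high: -1 };

function partialConfidence(f) {
  let score = 0;
  // Cross-session repetition and an in-session struggle entry are worth +2 together, not +4.
  if (f.crossSession || f.struggleInSession) score += 2;
  if (f.endorsement) score += 2;
  if (f.multiAxis) score += 1;
  const sev = typeof f.severitySum === "number" ? f.severitySum : 0;
  if (sev >= 10) score += 1;
  if (sev >= 32) score += 1; // additive with the >= 10 bullet
  if (f.skillAttributed) score += 1;
  if (f.typeBiasPenalty) score -= 1;
  if (f.mismatchUnclear) score -= 1;
  score += BLAST_RADIUS_POINTS[f.blastRadius] ?? 0;
  return score;
}

console.log(partialConfidence({
  struggleInSession: true, severitySum: 32, multiAxis: true,
  skillAttributed: true, blastRadius: "medium",
}));
```

Consider a struggle cluster with severity-sum 32, multi-axis tagging, skill attribution and medium blast radius. It scores 2 + 1 + 1 + 1 + 1 = 6. A cross-session cluster that also contains a struggle entry still gets only +2 for that pair, because the two bonuses do not stack.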
@@ -328,7 +357,7 @@ Sum:
|---|---|---|---|
| `memory` | `~/.claude/projects/-Users-nvm/memory/*.md` | low | yes if conf≥4 AND cross_session |
| `skill_new` | new dir under `~/.claude/skills/` | low | yes if conf≥4 AND cross_session |
| `skill_edit` | existing skill file | medium | yes (win-driven only) if win-evidence + LOC + cooldown gates all pass (see "Win-driven skill_edit eligibility"); struggle-driven variant ALWAYS queues (see "Struggle-driven skill_edit eligibility") |
| `nudge` | append to `~/.claude/adam/active-nudges.json` | low | yes when `dead_end_count ≥ 3` in a single session (single-session evidence sufficient; skips cross-session gate). Does NOT modify skills/memories/CLAUDE.md — only seeds a SessionStart reminder for a future session. |
| `reinforcement` | append entry to `~/.claude/adam/reinforcements.jsonl` | low | yes if conf≥4 AND blast_radius=low (same gate as memory). Applies via `adam-apply-reinforcement.mjs`; appends one JSONL entry, no code/memory/skill changes. |
| `agent_new` | new file under `~/.claude/agents/` | medium | no |
+13
@@ -0,0 +1,13 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Dark-background variant.</desc>
<g stroke="#f0f6fc">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="#f0f6fc">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

+13
@@ -0,0 +1,13 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Light-background variant.</desc>
<g stroke="#24292f">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="#24292f">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

+19
@@ -0,0 +1,19 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 160" width="160" height="160" role="img" aria-label="claude-adam logo">
<title>claude-adam</title>
<desc>A swaddled baby — rounded A-shape bundle with a face inside and small hands extending from the wrap-band. Adapts to light/dark via embedded media query + currentColor fallback.</desc>
<style>
svg { color: #24292f; }
@media (prefers-color-scheme: dark) {
svg { color: #f0f6fc; }
}
</style>
<g stroke="currentColor">
<path d="M 36 134 Q 30 78 80 28 Q 130 78 124 134 Z" fill="none" stroke-width="9" stroke-linejoin="round"/>
<path d="M 16 100 L 44 100 Q 80 115 116 100 L 144 100" fill="none" stroke-width="6" stroke-linecap="round"/>
<path d="M 75 78 Q 80 82 85 78" fill="none" stroke-width="2.5" stroke-linecap="round"/>
</g>
<g fill="currentColor">
<circle cx="72" cy="64" r="3.2"/>
<circle cx="88" cy="64" r="3.2"/>
</g>
</svg>

+59 -1
@@ -87,6 +87,12 @@ function normalizeErrorText(text) {
const ERROR_RE = /\b(error|failed|exception|traceback|denied|cannot|unable to|not found|undefined|nullpointer|typeerror|syntaxerror|panic|fatal|enoent|econnrefused|etimedout|eaccess|segfault|crashed|uncaught)\b/i;
const BUILD_RE = /\b(build|compile|make|gradle|cargo|tsc|webpack|vite|rollup|pytest|jest|mocha|vitest|go\s+test|npm\s+test|yarn\s+test|npm\s+run\s+build|yarn\s+build|ctest|ninja|bazel)\b/i;
const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit", "NotebookEdit"]);
const READ_ONLY_TOOLS = new Set([
"Read", "Grep", "Glob", "ToolSearch", "WebFetch", "WebSearch",
"mcp__filepuff__file_read", "mcp__filepuff__file_search",
"mcp__filepuff__find_definition", "mcp__filepuff__find_references",
"mcp__filepuff__ast_query", "mcp__filepuff__symbol_at", "mcp__filepuff__ping",
]);
const WINDOW_SIZE = 10;
const RETRY_THRESHOLD = 3;
const AGENT_RESPAWN_THRESHOLD = 2;
@@ -98,6 +104,9 @@ const BUILD_LOOP_THRESHOLD = 2;
const SUBAGENT_DISPATCH_THRESHOLD = 3;
const CORRECTION_FREE_THRESHOLD = 5;
const CLEAN_RECOVERY_WINDOW = 3;
const SILENT_DRIFT_THRESHOLD = 5;
const ERROR_AFTER_RECOVERY_WINDOW = 5;
const RECENT_RECOVERIES_MAX = 3;
const STRUGGLE_TYPES = new Set(["tool_error_loop", "dead_end", "retry_loop"]);
const ACTIVE_SKILLS_LOOKBACK = 10;
const TASK_TOOL_MIN = 5;
@@ -268,6 +277,8 @@ function resetFrictionCounters(state) {
state.edit_churn_emitted = {};
state.build_failure_count = 0;
state.build_loop_emitted = false;
state.silentDriftCounter = 0;
state.silentDriftEmitted = false;
}
function resetSessionLocal(state) {
@@ -276,6 +287,8 @@ function resetSessionLocal(state) {
state.subagent_dispatch_emitted = {};
state.correctionFreeCounter = 0;
state.recoveryWatch = null;
state.recentRecoveries = [];
state.session_post_count = 0;
state.tool_window = [];
state.task_tool_kinds = {};
state.task_tool_count = 0;
@@ -299,6 +312,10 @@ function ensureStateDefaults(state) {
if (!state.task_tool_kinds || typeof state.task_tool_kinds !== "object") state.task_tool_kinds = {};
if (typeof state.task_tool_count !== "number") state.task_tool_count = 0;
if (typeof state.task_corrections !== "number") state.task_corrections = 0;
if (typeof state.silentDriftCounter !== "number") state.silentDriftCounter = 0;
if (typeof state.silentDriftEmitted !== "boolean") state.silentDriftEmitted = false;
if (!Array.isArray(state.recentRecoveries)) state.recentRecoveries = [];
if (typeof state.session_post_count !== "number") state.session_post_count = 0;
}
function main() {
@@ -402,12 +419,24 @@ function main() {
}
state.tool_window.push(windowEntry);
if (state.tool_window.length > WINDOW_SIZE) state.tool_window.shift();
state.session_post_count += 1;
const sameToolArgs = state.tool_window.filter(e => e.tool === tool && e.argsHash === argsHash).length;
if (sameToolArgs >= RETRY_THRESHOLD) {
emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
}
if (READ_ONLY_TOOLS.has(tool)) {
state.silentDriftCounter += 1;
if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
emit({ ts, session, cwd, type: "silent_drift", read_count: state.silentDriftCounter, last_tool: tool });
state.silentDriftEmitted = true;
}
} else {
state.silentDriftCounter = 0;
state.silentDriftEmitted = false;
}
if (tool === "Agent") {
const subagent = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
const recent = state.tool_window.slice(-5).filter(e => e.tool === "Agent" && e.subagent === subagent).length;
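The silent_drift branch in this hunk is a small state machine:

- Read-only tools increment a counter.
- The signal fires once when the counter reaches the threshold.
- Any other tool resets both the counter and the emitted flag.

The pure-function sketch below isolates that logic. `observeTool` and `run` are hypothetical names. `READ_ONLY_TOOLS` omits the `mcp__filepuff__*` entries, and the reset on UserPromptSubmit is not modeled.

```javascript
// Hypothetical pure-function sketch of the silent_drift counter in adam-observe.mjs.
// READ_ONLY_TOOLS is trimmed to built-in tools; the hook also lists mcp__filepuff__* tools.
const READ_ONLY_TOOLS = new Set(["Read", "Grep", "Glob", "ToolSearch", "WebFetch", "WebSearch"]);
const SILENT_DRIFT_THRESHOLD = 5;

// Process one PostToolUse tool name; returns a journal-style entry when silent_drift fires.
function observeTool(state, tool) {
  if (READ_ONLY_TOOLS.has(tool)) {
    state.silentDriftCounter += 1;
    if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
      state.silentDriftEmitted = true; // fire once per read-only run
      return { type: "silent_drift", read_count: state.silentDriftCounter, last_tool: tool };
    }
    return null;
  }
  // Any tool outside READ_ONLY_TOOLS resets the counter and re-arms the detector.
  state.silentDriftCounter = 0;
  state.silentDriftEmitted = false;
  return null;
}

// Replay a sequence of tool names from a fresh state and collect emitted entries.
function run(tools) {
  const state = { silentDriftCounter: 0, silentDriftEmitted: false };
  return tools.map(t => observeTool(state, t)).filter(Boolean);
}

console.log(run(["Read", "Read", "Read", "Read", "Read"]).length); // 1 (Test 78 shape)
console.log(run(["Read", "Read", "Read", "Read", "Edit", "Read", "Read", "Read", "Read"]).length); // 0 (Test 79 shape)
```

The emitted flag stays set until an action tool arrives, so a long read-only run produces one entry, not one per read. `Bash` also counts as an action here, since anything outside `READ_ONLY_TOOLS` resets the counter.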
@@ -423,6 +452,23 @@ function main() {
const fp = errorFingerprint(input.tool_response);
if (fp) {
bumpUsage("payload:tool_response_error_seen");
if (state.recentRecoveries.length) {
const keep = [];
for (const rec of state.recentRecoveries) {
const tools_since = state.session_post_count - rec.emitted_at_count;
if (tools_since > ERROR_AFTER_RECOVERY_WINDOW) continue;
if (Array.isArray(rec.fps) && rec.fps.includes(fp)) {
emit({
ts, session, cwd, type: "error_after_recovery",
recovered_from: rec.recovered_from, original_fp: fp,
tools_since_recovery: tools_since,
});
continue;
}
keep.push(rec);
}
state.recentRecoveries = keep;
}
state.last_errors.push({ tool, fp });
if (state.last_errors.length > ERROR_RING_SIZE) state.last_errors.shift();
const sameError = state.last_errors.filter(e => e.fp === fp).length;
@@ -468,7 +514,13 @@ function main() {
state.task_tool_kinds[tool] = (state.task_tool_kinds[tool] || 0) + 1;
if (struggleEmittedThisTurn) {
state.recoveryWatch = {
recovered_from: struggleEmittedThisTurn,
since_ts: ts,
clean_count: 0,
window_tools: [],
watched_fps: state.last_errors.map(e => e.fp),
};
} else if (state.recoveryWatch) {
const turnHadError = fp !== null;
if (turnHadError) {
@@ -485,6 +537,12 @@ function main() {
active_skills: activeNames(state, "skill"),
active_agents: activeNames(state, "agent"),
});
state.recentRecoveries.push({
recovered_from: state.recoveryWatch.recovered_from,
fps: state.recoveryWatch.watched_fps || [],
emitted_at_count: state.session_post_count,
});
if (state.recentRecoveries.length > RECENT_RECOVERIES_MAX) state.recentRecoveries.shift();
state.recoveryWatch = null;
}
}
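The error_after_recovery path combines two pieces of bookkeeping shown in these hunks:

- Each `clean_recovery` pushes a record stamped with `session_post_count`.
- Each later error checks those records against a 5-event window. On a fingerprint match it reports the record and drops it; expired records are dropped without reporting.

The sketch below reproduces that bookkeeping without the rest of the hook. `makeState`, `recordRecovery`, `observePost` and `scenario` are hypothetical names, and the `"ENOENT:cat"` fingerprint string is made up for illustration.

```javascript
// Hypothetical sketch of the error_after_recovery bookkeeping in adam-observe.mjs.
const ERROR_AFTER_RECOVERY_WINDOW = 5;
const RECENT_RECOVERIES_MAX = 3;

function makeState() {
  return { session_post_count: 0, recentRecoveries: [] };
}

// Called when clean_recovery is emitted; fps are the fingerprints that were being watched.
function recordRecovery(state, recoveredFrom, fps) {
  state.recentRecoveries.push({
    recovered_from: recoveredFrom,
    fps,
    emitted_at_count: state.session_post_count,
  });
  if (state.recentRecoveries.length > RECENT_RECOVERIES_MAX) state.recentRecoveries.shift();
}

// Called for every PostToolUse; fp is the error fingerprint, or null for a clean tool.
function observePost(state, fp) {
  state.session_post_count += 1;
  if (!fp) return [];
  const emitted = [];
  const keep = [];
  for (const rec of state.recentRecoveries) {
    const toolsSince = state.session_post_count - rec.emitted_at_count;
    if (toolsSince > ERROR_AFTER_RECOVERY_WINDOW) continue; // expired: drop silently
    if (rec.fps.includes(fp)) {
      emitted.push({
        type: "error_after_recovery",
        recovered_from: rec.recovered_from,
        original_fp: fp,
        tools_since_recovery: toolsSince,
      });
      continue; // matched: report once, then drop
    }
    keep.push(rec);
  }
  state.recentRecoveries = keep;
  return emitted;
}

// Replay the Test 80/81 sequence with `burnReads` clean tools between recovery and the error.
function scenario(burnReads) {
  const fp = "ENOENT:cat"; // made-up fingerprint string
  const s = makeState();
  for (let i = 0; i < 3; i++) observePost(s, fp);   // three errors build a tool_error_loop
  for (let i = 0; i < 3; i++) observePost(s, null); // three clean tools
  recordRecovery(s, "tool_error_loop", [fp]);       // clean_recovery at count 6
  for (let i = 0; i < burnReads; i++) observePost(s, null);
  return observePost(s, fp);
}

console.log(scenario(0).length); // 1 (Test 80 shape)
console.log(scenario(6).length); // 0 (Test 81 shape)
```

`scenario(0)` mirrors Test 80: the error arrives one event after recovery, so it fires. `scenario(6)` mirrors Test 81: the error arrives seven events after recovery, which is beyond the window. The boundary is inclusive, so an error exactly five events after the recovery still fires.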