4 Commits

Author SHA1 Message Date
lukaszraczylo 780401e96a feat: causal diagnosis step on every proposal (v0.3.0)
Closes the gap between categorical signal capture (we saw 3 retries) and
causal proposal drafting (here is why and what to do). Mirrors the NL trace
reflection step Hermes Agent uses before mutating prompts.

Adds # Diagnosis section to every proposal body — four labelled lines:
- Trigger: what the user wanted / context
- Action:  what the assistant did
- Mismatch: how the action diverged
- Outcome: surfacing event with >=1 verbatim transcript quote

Constraints:
- <=5 LOC of prose total
- >=1 backtick-wrapped quote <=80 chars from transcript context window
- Cannot speculate; "Mismatch: unclear" is allowed but takes -1 confidence
- Win clusters use "Mismatch: None" with recovery quote in Outcome

Skill enforces structure at apply time (presence + 4 labelled lines + quote)
for both auto-apply and walk-the-queue paths. No semantic check — humans
judge causal correctness during walk-the-queue.

Adds optional frontmatter field `diagnosis_summary` (<=120 chars from the
Mismatch line) so applied/ and rejected/ are searchable by causal pattern.

New rubric penalty: -1 confidence when Diagnosis flags Mismatch: unclear.
Stops weak-causation proposals from auto-applying (drops below conf>=4).

No hook changes. All 27 tests still pass.

Spec: ~/.claude/docs/superpowers/specs/2026-05-10-adam-causal-diagnosis-design.md
2026-05-10 21:02:36 +01:00
lukaszraczylo 2dc76bf203 feat: lessons-learned loop — win signals + skill_edit auto-apply
Adds two new hook signal types:
- correction_free_streak: 5 consecutive UserPromptSubmits without a correction phrase
- clean_recovery: 3 clean PostToolUse events after a struggle signal
  (tool_error_loop / dead_end / retry_loop)

Both carry active_skills/active_agents payloads computed from a 10-event
activity ring, so ADAM can attribute wins to whichever skill was active
during the streak/recovery.

Promotes skill_edit to auto-apply under a strict gate (all required):
- conf >= 4 + cross-session evidence (existing rules)
- # Why cites a win-signal entry whose active_skills includes target
- diff append-only, +lines <= 30
- resulting SKILL.md size <= 2x current size
- 7-day cooldown per target (last_auto_edit in applied/ frontmatter)
- 30-day blacklist on user rejection (auto_apply_blacklist in rejected/)

Skill enforces the gate at apply time as defense in depth: re-stats target,
re-checks cooldown and blacklist, verifies append-only, reverts and refuses
on byte-cap breach. User-rejected skill_edit proposals automatically write
auto_apply_blacklist: true.

Win signals participate in the existing v0.2.0 source_entries archive
lifecycle, so already-applied evidence does not re-cluster.

Test suite: +5 cases (5 new asserts pass), 27 total passing.

Spec:  ~/.claude/docs/superpowers/specs/2026-05-10-adam-proactive-design.md
Plan:  ~/.claude/docs/superpowers/plans/2026-05-10-adam-proactive.md
2026-05-10 20:51:12 +01:00
lukaszraczylo 7962e85578 v0.2.0: drop cursor, add source_entries lifecycle, mandate memory frontmatter
Lifecycle redesign:
- Each proposal records source_entries: [<ts>...] in frontmatter listing
  the journal timestamps that fed its cluster.
- After apply/reject, skill calls adam/scripts/adam-archive.mjs which moves
  matching entries from journal.jsonl to journal/actioned-<id>.jsonl.
- Agent reads applied/ + rejected/ frontmatter on each /reflect, builds an
  excluded-timestamps set, skips any leftover already-actioned entries.
- cursor field in state.json is vestigial; agent ignores it.

Effect: journal stays bounded by active observations. Rule changes
re-evaluate the remainder without manual rewind. Race-safer for parallel
sessions on shared state.json (no cursor write contention).

Memory drafting:
- agents/adam.md adds 'Memory drafting protocol' parallel to Skill drafting.
- Memory proposals MUST contain auto-memory frontmatter (name, description,
  type, originSessionId) in '# Proposed change' body.
- Skill enforces frontmatter check at apply time; refuses if missing.

Tests: 18 -> 21. Two new tests for adam-archive happy path + no-op.

Migration: existing applied proposals lack source_entries. Their backing
journal entries archived as a one-time bulk migration; legacy proposals
annotated with migration note.
2026-05-10 04:29:49 +01:00
lukaszraczylo 2b91db6bf3 rubric: lower single-session struggle threshold to >=1 entry
The hook emits struggle signals only after crossing internal thresholds
(3 retries, 8 tools no-prompt, 4 edits to one file, 2 build failures, etc.).
Each journal entry is therefore meaningful evidence on its own. Old rule
required >=3 entries within single session, which the once-per-thing
emission design rarely produces. New rule: >=1 struggle entry qualifies
for proposal at +2 weight (cross-session bonus does not stack).

Auto-apply still requires cross_session_evidence; single-session-only
proposals always queue for review.
2026-05-10 03:08:02 +01:00
7 changed files with 502 additions and 44 deletions
+15 -6
View File
@@ -36,15 +36,16 @@ LLM coding sessions reveal repeated friction the moment you stop and look. ADAM
├── skills/adam-self-improvement/SKILL.md # /reflect protocol ├── skills/adam-self-improvement/SKILL.md # /reflect protocol
├── commands/reflect.md # /reflect slash command ├── commands/reflect.md # /reflect slash command
└── adam/ └── adam/
├── journal.jsonl # append-only signal log ├── journal.jsonl # append-only signal log (active observations)
├── journal/ # rotated daily logs (>5 MB threshold) ├── journal/ # rotated daily logs + actioned-<id>.jsonl per applied/rejected proposal
├── state.json # cursor + per-session counters ├── state.json # per-session counters (cursor field is vestigial as of v0.2.0)
├── usage.json # skill/agent invocation tallies ├── usage.json # skill/agent invocation tallies + payload visibility counters
├── proposals/ # queued, awaiting review ├── proposals/ # queued, awaiting review
├── applied/ # approved + auto-applied archive ├── applied/ # approved + auto-applied archive
├── rejected/ # rejected (with reason) ├── rejected/ # rejected (with reason)
├── trash/ # soft-deleted artifacts (recoverable) ├── trash/ # soft-deleted artifacts (recoverable)
── tests/run-tests.sh # 18 verification tests ── scripts/ # adam-archive.mjs (called by skill on apply/reject)
└── tests/run-tests.sh # 21 verification tests
``` ```
## Install ## Install
@@ -71,7 +72,7 @@ After install:
``` ```
Sum: Sum:
+2 Signal repeated ≥3× across ≥2 sessions +2 Signal repeated ≥3× across ≥2 sessions
+2 Struggle signal repeated3× within a single session (does not stack with above) +2 Struggle signal appearing1× within a single session (does not stack)
+2 Transcript contains positive endorsement near related action +2 Transcript contains positive endorsement near related action
+1 Multi-axis cluster (≥2 distinct struggle types in same session) +1 Multi-axis cluster (≥2 distinct struggle types in same session)
-1 Type-bias penalty (≥3 rejections, applied:rejected <1:2) -1 Type-bias penalty (≥3 rejections, applied:rejected <1:2)
@@ -88,6 +89,14 @@ auto_apply_eligible requires ALL:
cross_session_evidence == true (single-session-only proposals always queue) cross_session_evidence == true (single-session-only proposals always queue)
``` ```
## Lifecycle: how proposals become permanent
Every proposal records the journal entry timestamps that fed its cluster (`source_entries` in frontmatter). When you apply or reject a proposal, the skill calls `adam/scripts/adam-archive.mjs` which moves matching entries from `journal.jsonl` to `journal/actioned-<id>.jsonl`. Effects:
- The `journal.jsonl` stays bounded by **active** observations only.
- The next `/reflect` reads applied/ + rejected/ frontmatter, builds an excluded-timestamps set, and skips any leftover journal entries that were already actioned.
- Rule changes (e.g. lowering a threshold) immediately re-evaluate the remaining active observations — no manual cursor rewind needed.
## What it will not do ## What it will not do
- No background LLM spend. The analyst runs only when you invoke `/reflect`. - No background LLM spend. The analyst runs only when you invoke `/reflect`.
+117
View File
@@ -0,0 +1,117 @@
#!/usr/bin/env node
// Usage: adam-archive.mjs <proposal-path>
// Reads `source_entries` from proposal frontmatter, moves matching journal
// entries from journal.jsonl to journal/actioned-<id>.jsonl. Used by the
// adam-self-improvement skill after each apply/reject so subsequent /reflect
// runs do not re-cluster already-actioned signals.
import { readFileSync, writeFileSync, appendFileSync, mkdirSync, existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
const ROOT = join(homedir(), ".claude", "adam");
const JOURNAL = join(ROOT, "journal.jsonl");
const JOURNAL_DIR = join(ROOT, "journal");
function parseFrontmatter(content) {
const m = content.match(/^---\n([\s\S]*?)\n---/);
if (!m) return {};
const fm = {};
const lines = m[1].split("\n");
let i = 0;
while (i < lines.length) {
const line = lines[i];
const idx = line.indexOf(":");
if (idx === -1) { i++; continue; }
const key = line.slice(0, idx).trim();
const value = line.slice(idx + 1).trim();
if (key === "source_entries") {
const arr = [];
if (value.startsWith("[") && value.endsWith("]")) {
const inner = value.slice(1, -1)
.split(",")
.map(s => s.trim().replace(/^['"]|['"]$/g, ""));
arr.push(...inner.filter(Boolean));
fm[key] = arr;
i++;
continue;
}
i++;
while (i < lines.length && /^\s*-\s+/.test(lines[i])) {
const item = lines[i].replace(/^\s*-\s+/, "").trim().replace(/^['"]|['"]$/g, "");
if (item) arr.push(item);
i++;
}
fm[key] = arr;
continue;
}
fm[key] = value;
i++;
}
return fm;
}
function main() {
const proposalPath = process.argv[2];
if (!proposalPath) {
console.error("usage: adam-archive.mjs <proposal-path>");
process.exit(2);
}
let proposal;
try {
proposal = readFileSync(proposalPath, "utf8");
} catch (e) {
console.error(`cannot read ${proposalPath}: ${e.message}`);
process.exit(1);
}
const fm = parseFrontmatter(proposal);
const id = fm.id || "unknown";
const sourceEntries = Array.isArray(fm.source_entries) ? fm.source_entries : [];
if (sourceEntries.length === 0) {
console.log(`${id}: no source_entries in frontmatter — nothing to archive`);
return;
}
if (!existsSync(JOURNAL)) {
console.log(`${id}: journal does not exist at ${JOURNAL}`);
return;
}
const lines = readFileSync(JOURNAL, "utf8").split("\n").filter(Boolean);
const tsSet = new Set(sourceEntries);
const matched = [];
const remaining = [];
for (const line of lines) {
try {
const e = JSON.parse(line);
if (e.ts && tsSet.has(e.ts)) {
matched.push(line);
} else {
remaining.push(line);
}
} catch {
remaining.push(line);
}
}
if (matched.length === 0) {
console.log(`${id}: no matching entries in journal (already archived?)`);
return;
}
mkdirSync(JOURNAL_DIR, { recursive: true });
const archivePath = join(JOURNAL_DIR, `actioned-${id}.jsonl`);
appendFileSync(archivePath, matched.join("\n") + "\n");
writeFileSync(JOURNAL, remaining.length ? remaining.join("\n") + "\n" : "");
console.log(`${id}: archived ${matched.length}/${lines.length} entries → ${archivePath}`);
}
try { main(); } catch (e) {
console.error(`error: ${e.message}`);
process.exit(1);
}
+136
View File
@@ -215,6 +215,142 @@ else
echo " PASS: build_loop correctly ignored non-build command"; PASS=$((PASS+1)) echo " PASS: build_loop correctly ignored non-build command"; PASS=$((PASS+1))
fi fi
# --- Test 17: adam-archive moves matching entries to actioned file ---
echo "Test 17: adam-archive moves matching journal entries"
ARCHIVE="$HOME/.claude/adam/scripts/adam-archive.mjs"
reset_state
rm -f "$ROOT/journal/actioned-test-archive-001.jsonl"
cat > "$ROOT/journal.jsonl" <<EOF
{"ts":"2026-01-01T00:00:00Z","session":"sX","type":"correction"}
{"ts":"2026-01-02T00:00:00Z","session":"sX","type":"correction"}
{"ts":"2026-01-03T00:00:00Z","session":"sX","type":"dead_end"}
EOF
mkdir -p /tmp/adam-test-17
cat > /tmp/adam-test-17/proposal.md <<EOF
---
id: test-archive-001
type: memory
target: /tmp/test
confidence: 5
blast_radius: low
auto_apply_eligible: false
status: applied
source_entries:
- "2026-01-01T00:00:00Z"
- "2026-01-02T00:00:00Z"
---
# Why
test
EOF
node "$ARCHIVE" /tmp/adam-test-17/proposal.md >/dev/null 2>&1 || true
remaining=$(wc -l < "$ROOT/journal.jsonl" | tr -d ' ')
archived=$(wc -l < "$ROOT/journal/actioned-test-archive-001.jsonl" 2>/dev/null | tr -d ' ' || echo 0)
if [ "$remaining" = "1" ] && [ "$archived" = "2" ]; then
echo " PASS: archive moved 2 matching, kept 1 unmatched"; PASS=$((PASS+1))
else
echo " FAIL: expected 1 remaining + 2 archived, got $remaining + $archived"; FAIL=$((FAIL+1))
fi
rm -rf /tmp/adam-test-17 "$ROOT/journal/actioned-test-archive-001.jsonl"
# --- Test 18: adam-archive no-op when source_entries missing ---
echo "Test 18: adam-archive no-op when source_entries missing"
reset_state
echo '{"ts":"2026-01-01T00:00:00Z","type":"correction"}' > "$ROOT/journal.jsonl"
mkdir -p /tmp/adam-test-18
cat > /tmp/adam-test-18/proposal.md <<EOF
---
id: test-noop-002
type: memory
---
# Why
no source_entries
EOF
node "$ARCHIVE" /tmp/adam-test-18/proposal.md >/dev/null 2>&1 || true
if [ -f "$ROOT/journal/actioned-test-noop-002.jsonl" ]; then
echo " FAIL: archive file created when no source_entries"; FAIL=$((FAIL+1))
else
echo " PASS: no archive file created"; PASS=$((PASS+1))
fi
remaining=$(wc -l < "$ROOT/journal.jsonl" | tr -d ' ')
if [ "$remaining" = "1" ]; then
echo " PASS: journal unchanged"; PASS=$((PASS+1))
else
echo " FAIL: journal modified ($remaining lines, expected 1)"; FAIL=$((FAIL+1))
fi
rm -rf /tmp/adam-test-18
# --- Test 19: correction_free_streak fires after 5 clean prompts ---
echo "Test 19: correction_free_streak after 5 clean prompts"
reset_state
for i in 1 2 3 4 5; do
echo "{\"hook_event_name\":\"UserPromptSubmit\",\"prompt\":\"please do step $i\",\"session_id\":\"sCF\",\"cwd\":\"/tmp/x\"}" \
| node "$HOOK" >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"correction_free_streak"' "5 clean prompts logs correction_free_streak"
# --- Test 20: correction phrase resets streak counter ---
echo "Test 20: correction phrase breaks correction_free_streak"
reset_state
for i in 1 2 3 4; do
echo "{\"hook_event_name\":\"UserPromptSubmit\",\"prompt\":\"please do step $i\",\"session_id\":\"sCB\",\"cwd\":\"/tmp/x\"}" \
| node "$HOOK" >/dev/null 2>&1 || true
done
echo '{"hook_event_name":"UserPromptSubmit","prompt":"no, undo that","session_id":"sCB","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
echo '{"hook_event_name":"UserPromptSubmit","prompt":"go on","session_id":"sCB","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
if grep -qE '"type":"correction_free_streak"' "$ROOT/journal.jsonl"; then
echo " FAIL: correction_free_streak fired despite intervening correction"; FAIL=$((FAIL+1))
else
echo " PASS: correction phrase reset the streak counter"; PASS=$((PASS+1))
fi
# --- Test 21: clean_recovery fires after struggle + 3 clean tools ---
echo "Test 21: clean_recovery after struggle + 3 clean tools"
reset_state
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"foo"},"tool_response":{"is_error":true,"content":"Error: command not found: foo"},"session_id":"sR","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
done
for i in 1 2 3; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/ok-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sR\",\"cwd\":\"/tmp/x\"}" \
| node "$HOOK" >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"clean_recovery"' "3 clean tools after struggle logs clean_recovery"
assert_grep "$ROOT/journal.jsonl" '"recovered_from":"tool_error_loop"' "recovered_from set on clean_recovery"
# --- Test 22: clean_recovery resets when error breaks the streak ---
echo "Test 22: clean_recovery suppressed by intervening error"
reset_state
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"foo"},"tool_response":{"is_error":true,"content":"Error: command not found: foo"},"session_id":"sRE","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
done
for i in 1 2; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/ok-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sRE\",\"cwd\":\"/tmp/x\"}" \
| node "$HOOK" >/dev/null 2>&1 || true
done
echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"x"},"tool_response":{"is_error":true,"content":"Error: again"},"session_id":"sRE","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/ok-3"},"tool_response":{"content":"ok"},"session_id":"sRE","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
if grep -qE '"type":"clean_recovery"' "$ROOT/journal.jsonl"; then
echo " FAIL: clean_recovery fired despite intervening error"; FAIL=$((FAIL+1))
else
echo " PASS: clean_recovery suppressed by intervening error"; PASS=$((PASS+1))
fi
# --- Test 23: active_skills payload populated on win signals ---
echo "Test 23: correction_free_streak payload includes active skill"
reset_state
echo '{"hook_event_name":"PreToolUse","tool_name":"Skill","tool_input":{"skill":"caveman"},"session_id":"sAS","cwd":"/tmp/x"}' \
| node "$HOOK" >/dev/null 2>&1 || true
for i in 1 2 3 4 5; do
echo "{\"hook_event_name\":\"UserPromptSubmit\",\"prompt\":\"step $i\",\"session_id\":\"sAS\",\"cwd\":\"/tmp/x\"}" \
| node "$HOOK" >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"active_skills":\["caveman"\]' "active_skills payload includes invoked skill"
echo echo
echo "Results: $PASS passed, $FAIL failed" echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" = "0" ] [ "$FAIL" = "0" ]
+138 -26
View File
@@ -43,20 +43,23 @@ The hook emits these `type` values into the journal:
| `edit_churn` | same file edited 4× in window | file basename | | `edit_churn` | same file edited 4× in window | file basename |
| `build_loop` | 2 build/test/compile commands fail in session | session | | `build_loop` | 2 build/test/compile commands fail in session | session |
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type | | `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
## Process ## Process
1. Read `state.json``cursor` (number of journal lines already processed). 1. **Build feedback context** (run once per `/reflect`):
2. Read `journal.jsonl`. New observations = lines after `cursor`. a. List `rejected_dir/` filenames. Parse each frontmatter `source_entries` (if present), `# Why` and `# Reason` sections.
3. If 0 new lines, emit punch list `{"new":0}` and stop. b. List `applied_dir/` filenames. Parse each frontmatter `type`, `target`, `source_entries`. Tally `applied_by_type[type]`.
4. **Build feedback context** (run once per `/reflect`): c. Compute the **excluded-timestamps set**: union of all `source_entries` arrays across `applied_dir/` + `rejected_dir/`. Journal entries with these `ts` values have already been actioned and MUST NOT be re-clustered.
a. List `rejected_dir/` filenames. Parse each `# Why` and `# Reason` sections. Build a set of rejected ideas (token-tokenized for similarity matching). d. Build the **rejected-ideas set** (token-tokenized `# Why` content) for fuzzy fallback matching when a new cluster topic resembles a rejected one but doesn't share `source_entries` (handles legacy proposals without `source_entries`).
b. List `applied_dir/` filenames. Parse frontmatter `type` and `target`. Tally `applied_by_type[type]` and `applied_by_target[basename(target)]`. e. Compute **type biases**:
c. From these, compute **type biases**:
- Types with applied:rejected ratio >2:1 (over ≥3 total): neutral, no bonus. - Types with applied:rejected ratio >2:1 (over ≥3 total): neutral, no bonus.
- Types with applied:rejected ratio <1:2 (over ≥3 rejections): **-1 confidence penalty**, recorded in proposal `# Why` as "type-bias-penalty: <reason>". - Types with applied:rejected ratio <1:2 (over ≥3 rejections): **-1 confidence penalty**, recorded in proposal `# Why` as "type-bias-penalty: <reason>".
5. Cluster new observations: 2. Read `journal.jsonl`. Filter out entries whose `ts` is in the excluded-timestamps set. The result = **active observations**.
- `correction`: tokenize phrase (drop stopwords, keep content tokens). Phrases sharing ≥2 content tokens collapse into one cluster — regardless of `prev_tool` or `cwd`. Record distinct cwds in cluster (used for CLAUDE.md eligibility). 3. If 0 active observations, emit punch list `{"new":0}` and stop.
4. Cluster active observations:
- `correction`: tokenize phrase (drop stopwords, keep content tokens). Phrases sharing ≥2 content tokens collapse into one cluster — regardless of `prev_tool` or `cwd`. Record distinct cwds (used for CLAUDE.md eligibility).
- `retry_loop`: cluster by `tool`. - `retry_loop`: cluster by `tool`.
- `weak_agent`: cluster by `subagent_type`. - `weak_agent`: cluster by `subagent_type`.
- `tool_error_loop`: cluster by `fp`. - `tool_error_loop`: cluster by `fp`.
@@ -64,26 +67,29 @@ The hook emits these `type` values into the journal:
- `edit_churn`: cluster by file basename pattern (e.g. `*.test.ts`). - `edit_churn`: cluster by file basename pattern (e.g. `*.test.ts`).
- `build_loop`: cluster by `session`. - `build_loop`: cluster by `session`.
- `subagent_dispatch_pattern`: cluster by `subagent_type`. - `subagent_dispatch_pattern`: cluster by `subagent_type`.
6. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring. - `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
7. For each cluster qualifying under the rubric — ≥3× across ≥2 sessions, OR ≥3× within a single session for struggle types, OR (for `correction`) ≥3 occurrences across ≥2 cwds: - `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
a. If cluster topic matches a rejected idea (≥2 token overlap with rejection's `# Why`), skip with reason `"rejected-similar"`. 5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
6. For each cluster qualifying under the rubric — ≥3 occurrences across ≥2 sessions, OR (for struggle types) ≥1 entry within a single session, OR (for `correction`) ≥3 occurrences across ≥2 cwds:
a. If cluster topic matches a rejected idea via the rejected-ideas fuzzy set (≥2 token overlap with rejection's `# Why`), skip with reason `"rejected-similar"`.
b. Pull ~20 messages of transcript context from `transcripts_root` to enrich. Never read full transcripts. b. Pull ~20 messages of transcript context from `transcripts_root` to enrich. Never read full transcripts.
c. **Solution synthesis** (when type would be `skill_new` AND cluster qualifies for proposal): pull additional ~30 messages of transcript window around the friction events (~50 messages total). Extract: b1. **Causal diagnosis** (required for every proposal type): from the pulled context, draft a `# Diagnosis` block per the "Diagnosis drafting protocol". Cite ≥1 verbatim transcript quote within the `source_entries` window. If causation cannot be reconstructed, write `Mismatch: unclear` and apply `-1` confidence (rubric penalty). Diagnosis writes the proposal's narrative *before* the proposal body is drafted in step 6e.
c. **Solution synthesis** (when candidate type is `skill_new` AND cluster qualifies): pull additional ~30 messages around friction events (~50 messages total). Extract:
- Concrete trigger phrases the user says verbatim. - Concrete trigger phrases the user says verbatim.
- Tools / files involved. - Tools / files involved.
- Successful resolution patterns later in transcript (positive endorsement). - Successful resolution patterns later in transcript (positive endorsement).
- Counterexamples (false-positive triggers to exclude). - Counterexamples (false-positive triggers to exclude).
d. **Skill overlap check** (skill_new candidates only): see "Skill overlap rule" below. If overlap qualifies, switch type to `skill_edit` targeting the matched SKILL.md. d. **Skill overlap check** (`skill_new` only): see "Skill overlap rule". If overlap qualifies, switch type to `skill_edit` targeting matched SKILL.md.
e. **Draft full content**: e. **Draft full content**:
- `skill_new`: draft the complete SKILL.md per "Skill drafting protocol" below. `# Proposed change` contains the full file body. - `skill_new`: complete SKILL.md per "Skill drafting protocol".
- `skill_edit`: draft an append-only unified diff per "Skill overlap rule". - `skill_edit`: append-only unified diff per "Skill overlap rule".
- `memory`: draft full memory file content (frontmatter + body). - `memory`: complete memory file per "Memory drafting protocol".
- Other types: per existing rules (unified diff or full content). - Other: per existing rules (unified diff or full content).
f. Score against rubric → `confidence`, `blast_radius`, `cross_session_evidence`, `multi_axis`, `auto_apply_eligible`. f. Score against rubric → `confidence`, `blast_radius`, `cross_session_evidence`, `multi_axis`, `auto_apply_eligible`.
g. Apply feedback bias (step 4c) and multi-axis bonus. g. Apply feedback bias (step 1e) and multi-axis bonus.
h. Emit proposal file to `proposals_dir/`. h. **Record `source_entries`**: list every journal entry timestamp that fed this cluster. Goes in proposal frontmatter as a YAML block-form array (one `- "<ts>"` per line). The skill consumes this on apply/reject to archive matching entries out of `journal.jsonl` and into `journal/actioned-<id>.jsonl`.
8. Update `cursor` in `state.json` to new line count. i. Emit proposal file to `proposals_dir/`.
9. Emit punch list to stdout (last message): `{"new":N, "high_confidence":[...], "queued":[...], "skipped":[...]}`. 7. Emit punch list to stdout (last message): `{"new":N, "high_confidence":[...], "queued":[...], "skipped":[...]}`. The `cursor` field in `state.json` is vestigial as of v0.2.0 — do not read or write it.
## Skill overlap rule ## Skill overlap rule
@@ -143,14 +149,107 @@ When the main thread applies a `skill_new` proposal:
2. Writes the `# Proposed change` body to `<slug>/SKILL.md`. 2. Writes the `# Proposed change` body to `<slug>/SKILL.md`.
3. Tells the user: "skill `<slug>` written. Activates immediately on next user turn (CC v2.1.0+ auto-hot-reload)." 3. Tells the user: "skill `<slug>` written. Activates immediately on next user turn (CC v2.1.0+ auto-hot-reload)."
## Memory drafting protocol (for `memory` proposals)
Every `memory` proposal's `# Proposed change` section MUST contain the COMPLETE memory file body — frontmatter + content — that will be written to the target path under `~/.claude/projects/<encoded-home>/memory/<slug>.md`.
Required structure:
```markdown
---
name: <human-readable name, ≤80 chars>
description: <one-line description used to decide future relevance — be specific, ≤200 chars>
type: user | feedback | project | reference
originSessionId: <session_id from journal entries that fed this cluster>
---
<Body content per type, see CLAUDE.md memory schema:
- feedback: lead with the rule, then **Why:** line, then **How to apply:** line.
- project: lead with fact/decision, then **Why:** and **How to apply:** lines.
- user: brief description of role/preference/knowledge.
- reference: pointer to external system + what's there.>
```
Constraints:
- Frontmatter fields `name`, `description`, `type` are **required**. Skill enforces this at apply time.
- `originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
- ≤50 LOC of body content. Surgical.
- Slug (used in `target` path filename) must not collide with any existing memory file.
- For `type=feedback` and `type=project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
## Diagnosis drafting protocol (required for every proposal)
Every proposal's body MUST include a `# Diagnosis` section between `# Why` and `# Assumptions`. It states the causal chain — *trigger → action → mismatch → outcome* — that motivates the proposed change, grounded in transcript evidence.
Required structure (exactly four labelled lines):
```markdown
# Diagnosis
**Trigger:** <what the user wanted / context the assistant was in — 1 sentence>
**Action:** <what the assistant did — 1 sentence, name specific tools/files when relevant>
**Mismatch:** <how the action diverged from the trigger — 1 sentence>
**Outcome:** <what surfaced the mismatch — user correction quote, error message, dead end — must include ≥1 verbatim quote ≤80 chars from transcript, in backticks>
```
Constraints:
1. ≤5 LOC of prose total.
2. ≥1 verbatim transcript quote, max 80 chars, wrapped in backticks.
3. The quote MUST appear within ~20 messages of one of the `source_entries` timestamps (transcript context window already pulled in step 6b).
4. No speculation — if causation is unclear from available context, write `Mismatch: unclear — see Outcome` and the cluster takes a `-1` rubric penalty (see rubric).
5. For win clusters (`correction_free_streak`, `clean_recovery`) where there is no failure: `Mismatch: None` is a valid value. Outcome cites the recovery quote or the silence ("no correction across N prompts" + closest journal `ts`).
Example — struggle cluster:
```markdown
# Diagnosis
**Trigger:** User asked to run Go tests in three different sessions, expected fresh results each time.
**Action:** Assistant ran `go test ./...` without `-count=1` flag.
**Mismatch:** Go's test cache returned stale passes from prior runs; assistant did not invalidate.
**Outcome:** User corrected with `"no use go test -count=1"` (s-aaa, 2026-05-10T10:00).
```
Example — win cluster:
```markdown
# Diagnosis
**Trigger:** Bash commands failed 3× with the same fingerprint; user did not intervene.
**Action:** Assistant switched from Bash to `Read` + `Edit` for the same goal, finished without further error.
**Mismatch:** None — recovery confirms the alternate tool is the right path here.
**Outcome:** Three clean PostToolUse events after the loop (`recovered_from: tool_error_loop`, s-bbb).
```
After drafting the four lines, set proposal frontmatter `diagnosis_summary` to a single sentence ≤120 chars derived from the **Mismatch** line — used for skim/search across `applied/` and `rejected/`.
## Win-driven `skill_edit` eligibility
A `skill_edit` proposal sets `auto_apply_eligible: true` ONLY when ALL hold:
1. `confidence ≥ 4`.
2. `cross_session_evidence == true`.
3. `# Why` cites ≥1 win-signal entry (`clean_recovery` or `correction_free_streak`) whose `active_skills` includes the target skill slug. Record this entry's `ts` in frontmatter field `win_evidence`.
4. Diff is append-only — verify no `-` lines on existing SKILL.md content.
5. Diff `+` lines ≤ 30.
6. Resulting SKILL.md size ≤ 2× current size. Record both byte counts in frontmatter fields `bytes_before`, `bytes_after`.
7. No entry in `applied_dir/` for the same `target` with `last_auto_edit` newer than 7 days ago (cooldown).
8. No entry in `rejected_dir/` for this `target` with `auto_apply_blacklist: true` newer than 30 days ago.
If any of (3)(8) fails: still emit the proposal, but `auto_apply_eligible: false` — main thread queues for review.
Win clusters do NOT override struggle clusters: a single `clean_recovery` cannot turn a `correction` cluster into a `skill_edit`. Struggle paths and win paths are independent.
## Confidence rubric (deterministic — do NOT vibe) ## Confidence rubric (deterministic — do NOT vibe)
Sum: Sum:
- Signal repeated ≥3× across ≥2 sessions: **+2** - Signal repeated ≥3× across ≥2 sessions: **+2**
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`) repeated3× within a single session: **+2** *(does not stack with the cross-session bonus — pick whichever applies, never both)* - Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`) appearing1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2** - Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1** - Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1** - Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
- Diagnosis flags `Mismatch: unclear` (causation could not be reconstructed from transcript context): **-1**
- Blast radius low (memory file or new isolated skill): **+1** - Blast radius low (memory file or new isolated skill): **+1**
- Blast radius medium (new agent, new hook, edit existing skill): **0** - Blast radius medium (new agent, new hook, edit existing skill): **0**
- Blast radius high (CLAUDE.md, settings.json hooks, edit agent, deletion): **-1** - Blast radius high (CLAUDE.md, settings.json hooks, edit agent, deletion): **-1**
@@ -160,16 +259,16 @@ Sum:
`auto_apply_eligible: true` requires **all** of: `auto_apply_eligible: true` requires **all** of:
- `confidence ≥ 4` - `confidence ≥ 4`
- `blast_radius == "low"` - `blast_radius == "low"`
- `type ∈ {memory, skill_new}` - `type ∈ {memory, skill_new, skill_edit}``skill_edit` additionally requires the win-driven gate (see "Win-driven `skill_edit` eligibility")
- `cross_session_evidence == true` — the +2 signal-repetition bonus came from the cross-session bullet (≥3× across ≥2 sessions). **Single-session-only struggle proposals always queue, never auto-apply, regardless of total confidence.** Record as frontmatter field `cross_session_evidence: true|false` on every proposal. - `cross_session_evidence == true` — the +2 signal-repetition bonus came from the cross-session bullet (≥3× across ≥2 sessions). **Single-session-only struggle proposals always queue, never auto-apply, regardless of total confidence.** Record as frontmatter field `cross_session_evidence: true|false` on every proposal.
## Proposal types ## Proposal types
| Type | Target | Default blast | Auto-apply? | | Type | Target | Default blast | Auto-apply? |
|---|---|---|---| |---|---|---|---|
| `memory` | `~/.claude/projects/<encoded-home>/memory/*.md` | low | yes if conf≥4 AND cross_session | | `memory` | `~/.claude/projects/-Users-nvm/memory/*.md` | low | yes if conf≥4 AND cross_session |
| `skill_new` | new dir under `~/.claude/skills/` | low | yes if conf≥4 AND cross_session | | `skill_new` | new dir under `~/.claude/skills/` | low | yes if conf≥4 AND cross_session |
| `skill_edit` | existing skill file | medium | no | | `skill_edit` | existing skill file | medium | yes if win-evidence + LOC + cooldown gates all pass (see "Win-driven skill_edit eligibility") |
| `agent_new` | new file under `~/.claude/agents/` | medium | no | | `agent_new` | new file under `~/.claude/agents/` | medium | no |
| `agent_edit` | existing agent file | medium | no | | `agent_edit` | existing agent file | medium | no |
| `claude_md_edit` | `~/.claude/CLAUDE.md` | high | no | | `claude_md_edit` | `~/.claude/CLAUDE.md` | high | no |
@@ -210,11 +309,24 @@ cross_session_evidence: true | false
multi_axis: true | false multi_axis: true | false
auto_apply_eligible: true | false auto_apply_eligible: true | false
status: queued status: queued
source_entries:
- "<journal entry ts that fed this cluster>"
- "<another ts>"
- "..."
# skill_edit only — required when auto_apply_eligible: true
win_evidence: "<ts of triggering clean_recovery or correction_free_streak entry>"
bytes_before: <int>
bytes_after: <int>
# optional — auto-populated from Diagnosis Mismatch line
diagnosis_summary: "<≤120 chars, single sentence>"
--- ---
# Why # Why
<observed evidence: session ids, dates, quotes from transcript synthesis> <observed evidence: session ids, dates, quotes from transcript synthesis>
# Diagnosis
<four labelled lines per "Diagnosis drafting protocol": Trigger / Action / Mismatch / Outcome — Outcome must contain ≥1 backtick-wrapped transcript quote ≤80 chars>
# Assumptions # Assumptions
- <assumption 1> - <assumption 1>
- <assumption 2> - <assumption 2>
+69 -6
View File
@@ -28,6 +28,10 @@ const DEAD_END_THRESHOLD = 8;
const EDIT_CHURN_THRESHOLD = 4; const EDIT_CHURN_THRESHOLD = 4;
const BUILD_LOOP_THRESHOLD = 2; const BUILD_LOOP_THRESHOLD = 2;
const SUBAGENT_DISPATCH_THRESHOLD = 3; const SUBAGENT_DISPATCH_THRESHOLD = 3;
const CORRECTION_FREE_THRESHOLD = 5;
const CLEAN_RECOVERY_WINDOW = 3;
const STRUGGLE_TYPES = new Set(["tool_error_loop", "dead_end", "retry_loop"]);
const ACTIVE_SKILLS_LOOKBACK = 10;
const STATE_MAX_BYTES = 1_000_000; const STATE_MAX_BYTES = 1_000_000;
function safeRead(path, fallback) { function safeRead(path, fallback) {
@@ -77,6 +81,17 @@ function readUsage(name) {
return usage[name] || 0; return usage[name] || 0;
} }
function pushActivity(state, kind, name, ts) {
state.activity_ring.push({ kind, name, ts });
if (state.activity_ring.length > ACTIVE_SKILLS_LOOKBACK) state.activity_ring.shift();
}
function activeNames(state, kind) {
const seen = new Set();
for (const e of state.activity_ring) if (e.kind === kind) seen.add(e.name);
return [...seen];
}
function errorFingerprint(toolResponse) { function errorFingerprint(toolResponse) {
if (!toolResponse) return null; if (!toolResponse) return null;
let text = ""; let text = "";
@@ -114,6 +129,8 @@ function resetSessionLocal(state) {
resetFrictionCounters(state); resetFrictionCounters(state);
state.session_subagents = {}; state.session_subagents = {};
state.subagent_dispatch_emitted = {}; state.subagent_dispatch_emitted = {};
state.correctionFreeCounter = 0;
state.recoveryWatch = null;
} }
function ensureStateDefaults(state) { function ensureStateDefaults(state) {
@@ -127,6 +144,9 @@ function ensureStateDefaults(state) {
if (typeof state.build_loop_emitted !== "boolean") state.build_loop_emitted = false; if (typeof state.build_loop_emitted !== "boolean") state.build_loop_emitted = false;
if (!state.session_subagents || typeof state.session_subagents !== "object") state.session_subagents = {}; if (!state.session_subagents || typeof state.session_subagents !== "object") state.session_subagents = {};
if (!state.subagent_dispatch_emitted || typeof state.subagent_dispatch_emitted !== "object") state.subagent_dispatch_emitted = {}; if (!state.subagent_dispatch_emitted || typeof state.subagent_dispatch_emitted !== "object") state.subagent_dispatch_emitted = {};
if (typeof state.correctionFreeCounter !== "number") state.correctionFreeCounter = 0;
if (state.recoveryWatch === undefined) state.recoveryWatch = null;
if (!Array.isArray(state.activity_ring)) state.activity_ring = [];
} }
function main() { function main() {
@@ -155,6 +175,18 @@ function main() {
prev_tool: last.tool || null, prev_tool: last.tool || null,
prev_file: last.file || null, prev_file: last.file || null,
}); });
state.correctionFreeCounter = 0;
} else {
state.correctionFreeCounter += 1;
if (state.correctionFreeCounter >= CORRECTION_FREE_THRESHOLD) {
appendJournal({
ts, session, cwd, type: "correction_free_streak",
streak: state.correctionFreeCounter,
active_skills: activeNames(state, "skill"),
active_agents: activeNames(state, "agent"),
});
state.correctionFreeCounter = 0;
}
} }
resetFrictionCounters(state); resetFrictionCounters(state);
} else if (event === "PreToolUse") { } else if (event === "PreToolUse") {
@@ -162,9 +194,11 @@ function main() {
if (tool === "Skill") { if (tool === "Skill") {
const name = (input.tool_input && (input.tool_input.skill || input.tool_input.skill_name)) || "unknown"; const name = (input.tool_input && (input.tool_input.skill || input.tool_input.skill_name)) || "unknown";
bumpUsage(`skill:${name}`); bumpUsage(`skill:${name}`);
pushActivity(state, "skill", name, ts);
} else if (tool === "Agent") { } else if (tool === "Agent") {
const name = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown"; const name = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
bumpUsage(`agent:${name}`); bumpUsage(`agent:${name}`);
pushActivity(state, "agent", name, ts);
state.session_subagents[name] = (state.session_subagents[name] || 0) + 1; state.session_subagents[name] = (state.session_subagents[name] || 0) + 1;
const cumulative = readUsage(`agent:${name}`); const cumulative = readUsage(`agent:${name}`);
const sessionCount = state.session_subagents[name]; const sessionCount = state.session_subagents[name];
@@ -182,6 +216,12 @@ function main() {
const argsHash = djb2(JSON.stringify(input.tool_input || {})); const argsHash = djb2(JSON.stringify(input.tool_input || {}));
const file = (input.tool_input && (input.tool_input.file_path || input.tool_input.path)) || null; const file = (input.tool_input && (input.tool_input.file_path || input.tool_input.path)) || null;
let struggleEmittedThisTurn = null;
const emit = (entry) => {
if (STRUGGLE_TYPES.has(entry.type)) struggleEmittedThisTurn = entry.type;
appendJournal(entry);
};
const windowEntry = { tool, argsHash, file }; const windowEntry = { tool, argsHash, file };
if (tool === "Agent") { if (tool === "Agent") {
const sub = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown"; const sub = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
@@ -192,14 +232,14 @@ function main() {
const sameToolArgs = state.tool_window.filter(e => e.tool === tool && e.argsHash === argsHash).length; const sameToolArgs = state.tool_window.filter(e => e.tool === tool && e.argsHash === argsHash).length;
if (sameToolArgs >= RETRY_THRESHOLD) { if (sameToolArgs >= RETRY_THRESHOLD) {
appendJournal({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs }); emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
} }
if (tool === "Agent") { if (tool === "Agent") {
const subagent = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown"; const subagent = (input.tool_input && (input.tool_input.subagent_type || input.tool_input.agent)) || "unknown";
const recent = state.tool_window.slice(-5).filter(e => e.tool === "Agent" && e.subagent === subagent).length; const recent = state.tool_window.slice(-5).filter(e => e.tool === "Agent" && e.subagent === subagent).length;
if (recent >= AGENT_RESPAWN_THRESHOLD) { if (recent >= AGENT_RESPAWN_THRESHOLD) {
appendJournal({ ts, session, cwd, type: "weak_agent", subagent_type: subagent, count: recent }); emit({ ts, session, cwd, type: "weak_agent", subagent_type: subagent, count: recent });
} }
} }
@@ -214,14 +254,14 @@ function main() {
if (state.last_errors.length > ERROR_RING_SIZE) state.last_errors.shift(); if (state.last_errors.length > ERROR_RING_SIZE) state.last_errors.shift();
const sameError = state.last_errors.filter(e => e.fp === fp).length; const sameError = state.last_errors.filter(e => e.fp === fp).length;
if (sameError >= ERROR_LOOP_THRESHOLD) { if (sameError >= ERROR_LOOP_THRESHOLD) {
appendJournal({ ts, session, cwd, type: "tool_error_loop", tool, count: sameError, fp }); emit({ ts, session, cwd, type: "tool_error_loop", tool, count: sameError, fp });
} }
} }
if (file && EDIT_TOOLS.has(tool)) { if (file && EDIT_TOOLS.has(tool)) {
state.edit_counts[file] = (state.edit_counts[file] || 0) + 1; state.edit_counts[file] = (state.edit_counts[file] || 0) + 1;
if (state.edit_counts[file] >= EDIT_CHURN_THRESHOLD && !state.edit_churn_emitted[file]) { if (state.edit_counts[file] >= EDIT_CHURN_THRESHOLD && !state.edit_churn_emitted[file]) {
appendJournal({ ts, session, cwd, type: "edit_churn", file, count: state.edit_counts[file] }); emit({ ts, session, cwd, type: "edit_churn", file, count: state.edit_counts[file] });
state.edit_churn_emitted[file] = true; state.edit_churn_emitted[file] = true;
} }
const keys = Object.keys(state.edit_counts); const keys = Object.keys(state.edit_counts);
@@ -239,7 +279,7 @@ function main() {
if (isBuildCmd && hasError) { if (isBuildCmd && hasError) {
state.build_failure_count += 1; state.build_failure_count += 1;
if (state.build_failure_count >= BUILD_LOOP_THRESHOLD && !state.build_loop_emitted) { if (state.build_failure_count >= BUILD_LOOP_THRESHOLD && !state.build_loop_emitted) {
appendJournal({ ts, session, cwd, type: "build_loop", count: state.build_failure_count, command: cmd.slice(0, 80) }); emit({ ts, session, cwd, type: "build_loop", count: state.build_failure_count, command: cmd.slice(0, 80) });
state.build_loop_emitted = true; state.build_loop_emitted = true;
} }
} }
@@ -247,9 +287,32 @@ function main() {
state.tools_since_user += 1; state.tools_since_user += 1;
if (state.tools_since_user >= DEAD_END_THRESHOLD && !state.dead_end_emitted) { if (state.tools_since_user >= DEAD_END_THRESHOLD && !state.dead_end_emitted) {
appendJournal({ ts, session, cwd, type: "dead_end", count: state.tools_since_user }); emit({ ts, session, cwd, type: "dead_end", count: state.tools_since_user });
state.dead_end_emitted = true; state.dead_end_emitted = true;
} }
if (struggleEmittedThisTurn) {
state.recoveryWatch = { recovered_from: struggleEmittedThisTurn, since_ts: ts, clean_count: 0, window_tools: [] };
} else if (state.recoveryWatch) {
const turnHadError = fp !== null;
if (turnHadError) {
state.recoveryWatch = null;
} else {
state.recoveryWatch.clean_count += 1;
state.recoveryWatch.window_tools.push(tool);
if (state.recoveryWatch.window_tools.length > CLEAN_RECOVERY_WINDOW) state.recoveryWatch.window_tools.shift();
if (state.recoveryWatch.clean_count >= CLEAN_RECOVERY_WINDOW) {
appendJournal({
ts, session, cwd, type: "clean_recovery",
recovered_from: state.recoveryWatch.recovered_from,
recovery_window_tools: state.recoveryWatch.window_tools.slice(),
active_skills: activeNames(state, "skill"),
active_agents: activeNames(state, "agent"),
});
state.recoveryWatch = null;
}
}
}
} }
safeWrite(STATE, state); safeWrite(STATE, state);
+3 -1
View File
@@ -24,6 +24,7 @@ mkdir -p \
"$DEST/adam/rejected" \ "$DEST/adam/rejected" \
"$DEST/adam/trash" \ "$DEST/adam/trash" \
"$DEST/adam/journal" \ "$DEST/adam/journal" \
"$DEST/adam/scripts" \
"$DEST/adam/tests/fixtures" "$DEST/adam/tests/fixtures"
cp "$SRC/hooks/adam-observe.mjs" "$DEST/hooks/" cp "$SRC/hooks/adam-observe.mjs" "$DEST/hooks/"
@@ -31,6 +32,7 @@ cp "$SRC/hooks/adam-nudge.mjs" "$DEST/hoo
cp "$SRC/agents/adam.md" "$DEST/agents/" cp "$SRC/agents/adam.md" "$DEST/agents/"
cp "$SRC/skills/adam-self-improvement/SKILL.md" "$DEST/skills/adam-self-improvement/" cp "$SRC/skills/adam-self-improvement/SKILL.md" "$DEST/skills/adam-self-improvement/"
cp "$SRC/commands/reflect.md" "$DEST/commands/" cp "$SRC/commands/reflect.md" "$DEST/commands/"
cp "$SRC/adam/scripts/adam-archive.mjs" "$DEST/adam/scripts/"
cp "$SRC/adam/tests/run-tests.sh" "$DEST/adam/tests/" cp "$SRC/adam/tests/run-tests.sh" "$DEST/adam/tests/"
cp "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/adam/tests/fixtures/" cp "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/adam/tests/fixtures/"
@@ -41,7 +43,7 @@ cp "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/ada
echo " files installed." echo " files installed."
echo echo
echo " next steps:" echo " next steps:"
echo " 1. bash $DEST/adam/tests/run-tests.sh # must show: 18 passed, 0 failed" echo " 1. bash $DEST/adam/tests/run-tests.sh # must show: 21 passed, 0 failed"
echo " 2. merge settings.json.example into $DEST/settings.json" echo " 2. merge settings.json.example into $DEST/settings.json"
echo " 3. start a fresh Claude Code session, then run /reflect" echo " 3. start a fresh Claude Code session, then run /reflect"
echo echo
+24 -5
View File
@@ -44,9 +44,24 @@ For each id in `high_confidence`:
- Verify in front of the user: print `id`, `target`, `confidence`, `blast_radius`, `cross_session_evidence`, `auto_apply_eligible`. - Verify in front of the user: print `id`, `target`, `confidence`, `blast_radius`, `cross_session_evidence`, `auto_apply_eligible`.
- Apply the change: - Apply the change:
- **For `skill_new`**: `mkdir -p ~/.claude/skills/<slug>/`, then `Write` the proposal's `# Proposed change` body to `~/.claude/skills/<slug>/SKILL.md`. After write, print: "skill `<slug>` written to `~/.claude/skills/<slug>/SKILL.md` — activates immediately — Claude Code v2.1.0+ auto-hot-reloads user-level skills, no restart needed." - **For `skill_new`**: `mkdir -p ~/.claude/skills/<slug>/`, then `Write` the proposal's `# Proposed change` body to `~/.claude/skills/<slug>/SKILL.md`. After write, print: "skill `<slug>` written to `~/.claude/skills/<slug>/SKILL.md` — activates immediately — Claude Code v2.1.0+ auto-hot-reloads user-level skills, no restart needed."
- **For `memory`**: `Write` the proposal's `# Proposed change` body to the path in `target` (under `~/.claude/projects/<encoded-home>/memory/`, where `<encoded-home>` is the user's home dir with `/` replaced by `-`, e.g. `-Users-alice` on macOS). Then update `MEMORY.md` index with a one-line pointer. - **For `memory`**: `Write` the proposal's `# Proposed change` body (which MUST include the auto-memory frontmatter — see "Memory drafting protocol" in `agents/adam.md`) to the path in `target`. Then update `MEMORY.md` index with a one-line pointer.
- **For other types under auto-apply**: apply via Write/Edit per `# Proposed change`. (Note: only `memory` and `skill_new` qualify for auto-apply per the rubric.) - **For `skill_edit`**: enforce the apply-time gate before writing.
1. Verify proposal frontmatter has `auto_apply_eligible: true`. If not, abort and queue for review.
2. Read `target` SKILL.md, capture `current_bytes` from a fresh stat — do NOT trust frontmatter `bytes_before`.
3. Verify diff in `# Proposed change`:
- Unified-diff format.
- Zero `-` lines on existing SKILL.md content (additions only).
- Total `+` lines ≤ 30.
If any check fails, print one-line refusal reason, leave proposal in `proposals/`, continue.
4. Cooldown re-check: scan `applied/` frontmatter for `target` matching this and `last_auto_edit` newer than 7 days ago. Refuse if found.
5. Blacklist re-check: scan `rejected/` frontmatter for `target` matching this and `auto_apply_blacklist: true` newer than 30 days ago. Refuse if found.
6. Apply via `Edit` tool (append the new section per the diff). Never use `Write` on existing SKILL.md.
7. Re-stat target. If new size exceeds `2 * current_bytes` (captured in step 2), revert via `Edit` (remove the just-appended section) and refuse — print refusal reason.
8. Add `last_auto_edit: <iso8601 utc now>` to the proposal frontmatter before moving it.
9. Tell user: "skill `<slug>` extended (added <N> lines) — auto-applied via win-evidence gate."
- **For other types under auto-apply**: apply via Write/Edit per `# Proposed change`. (Note: only `memory`, `skill_new`, and `skill_edit` qualify for auto-apply per the rubric.)
- Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`. - Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`.
- **Archive consumed journal entries**: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/applied/<UTC-ts>-<id>.md` — moves entries listed in proposal's `source_entries` from `journal.jsonl` to `journal/actioned-<id>.jsonl` so subsequent `/reflect` runs do not re-cluster them.
Print: `auto-applied N proposals: [ids]`. Print: `auto-applied N proposals: [ids]`.
@@ -61,10 +76,11 @@ c. On **approve**:
- For `deletion`: `mkdir -p ~/.claude/adam/trash/<ts>` then `mv` the artifact into it. Print restoration command. - For `deletion`: `mkdir -p ~/.claude/adam/trash/<ts>` then `mv` the artifact into it. Print restoration command.
- For `skill_new`: `mkdir -p ~/.claude/skills/<slug>/`, then write `# Proposed change` body to `<slug>/SKILL.md`. Tell user: "skill `<slug>` written — activates immediately (CC v2.1.0+ auto-hot-reload)." - For `skill_new`: `mkdir -p ~/.claude/skills/<slug>/`, then write `# Proposed change` body to `<slug>/SKILL.md`. Tell user: "skill `<slug>` written — activates immediately (CC v2.1.0+ auto-hot-reload)."
- For `skill_edit`: apply the unified diff in `# Proposed change` to the existing SKILL.md at `target` (append-only — never replace existing content). - For `skill_edit`: apply the unified diff in `# Proposed change` to the existing SKILL.md at `target` (append-only — never replace existing content).
- For `memory`: write to `target` and update `MEMORY.md` index. - For `memory`: write `# Proposed change` body (must include auto-memory frontmatter) to `target` and update `MEMORY.md` index with a one-line pointer.
- For all others: apply via Write/Edit per the proposal's `# Proposed change`. - For all others: apply via Write/Edit per the proposal's `# Proposed change`.
- Move proposal to `~/.claude/adam/applied/<ts>-<id>.md`. - Move proposal to `~/.claude/adam/applied/<ts>-<id>.md`.
d. On **reject**: ask for reason in one line. Append `# Reason\n<reason>` to proposal body. Move to `~/.claude/adam/rejected/<id>.md`. - Archive: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/applied/<ts>-<id>.md`.
d. On **reject**: ask for reason in one line. Append `# Reason\n<reason>` to proposal body. If the proposal `type` is `skill_edit`, ALSO add `auto_apply_blacklist: true` to its frontmatter (so future reflects skip auto-apply on this target for 30 days). Move to `~/.claude/adam/rejected/<id>.md`. Archive: `node ~/.claude/adam/scripts/adam-archive.mjs ~/.claude/adam/rejected/<id>.md`.
e. On **edit**: ask the user for the change, edit the proposal in place, then loop back to step 3a for that same id. e. On **edit**: ask the user for the change, edit the proposal in place, then loop back to step 3a for that same id.
### 4. Handle failures ### 4. Handle failures
@@ -89,12 +105,15 @@ adam reflect summary:
Before writing any proposal: Before writing any proposal:
- Confirm `# Assumptions` section is non-empty. - Confirm `# Assumptions` section is non-empty.
- Confirm `# Diagnosis` section exists and contains all four labelled lines (`Trigger:`, `Action:`, `Mismatch:`, `Outcome:`) AND at least one backtick-wrapped quote ≤80 chars in the Outcome line. Refuse if missing or malformed — agent must redraft per the "Diagnosis drafting protocol" in `agents/adam.md`.
- Confirm `# Success criterion` section is non-empty and runnable. - Confirm `# Success criterion` section is non-empty and runnable.
- Confirm change is ≤50 LOC for non-`skill_new`, or ≤80 LOC for `skill_new` body. If larger, ask the user once: "this proposal is N LOC — proceed?" - Confirm change is ≤50 LOC for non-`skill_new`, or ≤80 LOC for `skill_new` body. If larger, ask the user once: "this proposal is N LOC — proceed?"
- For `claude_md_edit`: confirm 3+ distinct cwds in the `# Why` section. - For `claude_md_edit`: confirm 3+ distinct cwds in the `# Why` section.
- For `deletion`: confirm both criteria (a) and (b) from the agent's special handling are documented in the proposal. - For `deletion`: confirm both criteria (a) and (b) from the agent's special handling are documented in the proposal.
- For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename. - For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. - For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §2 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter containing required fields `name`, `description`, `type`, `originSessionId`. Refuse if frontmatter missing — agent must redraft per the Memory drafting protocol.
- Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.
If any check fails, refuse to apply and ask the user how to proceed. If any check fails, refuse to apply and ask the user how to proceed.