2 Commits

Author SHA1 Message Date
lukaszraczylo d929101af4 fix(v0.6.2): A/B volume normalization + memory frontmatter schema
Two issues surfaced by running ADAM's /reflect loop on a large real journal
(4015 entries, 119 sessions) — both caused false/broken auto-apply behavior.

1. A/B over-reported regressions (adam-ab-measure.mjs).
   Regressions were measured on RAW originating-signal counts pre vs post. On a
   busy, growing journal almost every signal count rises post-apply regardless
   of whether the proposal helped — so the loop flagged 9 false "regressions"
   (and would auto-roll-back good proposals). Now the delta is computed on the
   signal's SHARE of total activity (rate = count / window-total). Falls back to
   the raw-count delta when the signal is the only activity in the window
   (preserves prior behavior + all existing A/B tests). Output adds
   raw_delta_pct, pre_total, post_total, normalized for transparency.

2. Memory frontmatter drift (agents/adam.md, SKILL.md).
   The drafting protocol emitted flat `type:`/`originSessionId:` with a prose
   `name`, but the live auto-memory store uses `name` = slug plus a
   `metadata: {node_type, type, originSessionId}` block. Auto-applied memories
   could fail to load/categorize. Protocol + apply-time validation now require
   the live metadata.* schema and cross-checking against an existing file.
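
The share-based comparison described in item 1 can be sketched as below. This is a minimal illustration, not the code in `adam-ab-measure.mjs`: the helper name `shareDelta` and its argument shape are assumptions, and the status thresholds are left out because their values are not shown here. The two sample inputs reuse the counts from the new tests (2 of 10 entries before; 6 of 30 or 6 of 12 after).

```javascript
// Hypothetical helper: the name and argument shape are illustrative only.
function shareDelta({ preCount, preTotal, postCount, postTotal }) {
  // No originating signal before the apply: nothing to compare against.
  if (preCount === 0) return { deltaPct: null, rawDeltaPct: null, normalized: false };

  const round2 = (x) => Math.round(x * 100) / 100;
  const rawDeltaPct = round2(((postCount - preCount) / preCount) * 100);

  // Background = journal entries in either window that are NOT the originating signal.
  const background = (preTotal - preCount) + (postTotal - postCount);
  if (background > 0 && postTotal > 0) {
    const preRate = preCount / preTotal;
    const postRate = postCount / postTotal;
    const deltaPct = round2(((postRate - preRate) / preRate) * 100);
    return { deltaPct, rawDeltaPct, normalized: true };
  }
  // The signal is the only activity: fall back to the raw-count delta.
  return { deltaPct: rawDeltaPct, rawDeltaPct, normalized: false };
}

// Busier journal, unchanged share: raw count triples, rate stays at 0.2.
const flat = shareDelta({ preCount: 2, preTotal: 10, postCount: 6, postTotal: 30 });
// Genuine share increase: rate goes from 0.2 to 0.5.
const worse = shareDelta({ preCount: 2, preTotal: 10, postCount: 6, postTotal: 12 });
console.log(flat, worse);
```

In the first case the raw delta is +200% while the share delta is 0%, which is why the raw-count comparison flagged regressions on a growing journal.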

Tests: 132 -> 134. New: volume growth (raw +200%) with flat activity-share
classifies neutral, not regressed; a genuine share increase still classifies
regressed.
2026-05-29 12:37:10 +01:00
lukaszraczylo 3a54d7d3e1 feat(v0.6.1): file_reread signal — catch offset-shifted same-file re-reads
Proposed and approved through ADAM's own /reflect harness_edit loop (MOSS §1):
the analyst surfaced 23 tool_error_loop entries across 4 sessions whose context
windows were really redundant re-reads of one file.

retry_loop keys on argsHash of the full tool_input (including offset/limit), so
consecutive Reads of the SAME file at different offsets escaped dedup and leaked
into tool_error_loop fingerprints. The new file_reread signal catches them:
same file Read >=3x in the 10-event window, offset-agnostic (keyed on file
path), guarded by `sameToolArgs < RETRY_THRESHOLD` so byte-identical reads stay
with retry_loop (no double-count).
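
The detection rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `adam-observe.mjs` implementation: the function name `detectFileReread`, the event shape (`tool`, `file`, `argsHash`), and the value 3 for `RETRY_THRESHOLD` are hypothetical stand-ins.

```javascript
// Assumed constants: the window size and reread threshold follow the description
// above; RETRY_THRESHOLD = 3 is a guess for illustration.
const WINDOW_SIZE = 10;
const FILE_REREAD_THRESHOLD = 3;
const RETRY_THRESHOLD = 3;

// Hypothetical detector: `window` holds prior events, `current` is the new one.
function detectFileReread(window, current) {
  const recent = [...window, current].slice(-WINDOW_SIZE);
  // Keyed on file path only, so offset/limit differences do not matter.
  const sameFileReads = recent.filter(
    (e) => e.tool === current.tool && e.file === current.file
  ).length;
  // Keyed on the full argument hash, as retry_loop is described to do.
  const sameToolArgs = recent.filter(
    (e) => e.tool === current.tool && e.argsHash === current.argsHash
  ).length;
  // Guard: byte-identical reads are left to retry_loop to avoid double-counting.
  if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
    return { type: "file_reread", file: current.file, count: sameFileReads };
  }
  return null;
}

// Stand-in for a real argument hash: file path plus offset.
const read = (file, offset) => ({ tool: "Read", file, argsHash: `${file}@${offset}` });

const shifted = detectFileReread(
  [read("/tmp/big.go", 0), read("/tmp/big.go", 100)],
  read("/tmp/big.go", 200)
);
const identical = detectFileReread(
  [read("/tmp/same.go", 0), read("/tmp/same.go", 0)],
  read("/tmp/same.go", 0)
);
const twice = detectFileReread([read("/tmp/big.go", 0)], read("/tmp/big.go", 100));
console.log(shifted, identical, twice);
```

Only the offset-shifted case yields a signal; the identical reads and the two-read case both return `null`.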

Fully wired end-to-end (not a half-dead signal):
- adam-observe.mjs: detection + STRUGGLE_TYPES membership (so it carries
  context_window + active_skills like other struggle signals).
- adam-window.mjs: 14-day sliding window (task-local, like retry_loop).
- adam-score.mjs: severity divisor 3.
- adam-batch.mjs: file-basename clustering.
- agents/adam.md + README: signal tables, clustering rules, rubric, windows.

Tests: 126 -> 132 (file_reread fires on 3x offset-shifted reads, not on 2x;
byte-identical reads route to retry_loop not file_reread; carries context_window).
2026-05-29 11:31:50 +01:00
9 changed files with 158 additions and 24 deletions
+6 -3
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
[![Tests](https://img.shields.io/badge/tests-126%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
[![Tests](https://img.shields.io/badge/tests-134%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
[![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
Then:
```sh
bash ~/.claude/adam/tests/run-tests.sh # expect: 126 passed, 0 failed
bash ~/.claude/adam/tests/run-tests.sh # expect: 134 passed, 0 failed
# … start a fresh Claude Code session …
/reflect # walks the proposal queue
/reflect --explain # also shows the analyst's clustering trace
@@ -132,6 +132,7 @@ Auto-apply runs only for low-blast types (memory entries, new skills, ephemeral
| `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
| `edit_churn` | Same file edited 4× in a window | 14d |
| `file_reread` | Same file Read ≥3× in the 10-event window, ignoring offset/limit (catches re-reads that escape `retry_loop`'s arg-hash dedup) | 14d |
| `build_loop` | 2× build/test/compile commands fail in same session | 30d |
| `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
@@ -247,11 +248,13 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
│ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
│ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
│ └── adam-archive.mjs # post-apply journal cleanup
└── tests/run-tests.sh # 126 isolated tests; never touches live state
└── tests/run-tests.sh # 134 isolated tests; never touches live state
```
## What's new
- **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
- **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
- **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114).
- **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
- **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87).
+44 -7
@@ -3,11 +3,19 @@
//
// Reads ~/.claude/adam/ab-tracking.jsonl (one line per auto-apply event,
// written by adam-self-improvement/SKILL.md), then for each entry old enough
// (>= --min-age-days; default 7) compares signal counts in the 7-day window
// BEFORE applied_at against the 7-day window AFTER applied_at across the
// (>= --min-age-days; default 7) compares the originating signal in the 7-day
// window BEFORE applied_at against the 7-day window AFTER applied_at across the
// full journal corpus (active + rotated). Surfaces regressions so /reflect
// can flag proposals that made things worse.
//
// Volume normalization: when the windows contain other (non-originating)
// activity, the delta is computed on the signal's SHARE of total activity
// (rate = count / total), not its raw count — so a generally busier journal
// after apply does not masquerade as a regression. When the signal is the only
// activity in the windows, it falls back to the raw-count delta. Output carries
// both `delta_pct` (drives status) and `raw_delta_pct` + `normalized` for
// transparency.
//
// CLI:
// adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]
//
@@ -92,31 +100,60 @@ export function computeDeltas(entries, journal, opts = {}) {
const preStart = appliedAt - windowDays * DAY_MS;
const postEnd = appliedAt + windowDays * DAY_MS;
// preCount/postCount = originating-signal occurrences; preTotal/postTotal =
// ALL journal entries in the window (the activity denominator).
let preCount = 0;
let postCount = 0;
let preTotal = 0;
let postTotal = 0;
for (const je of journal || []) {
if (!je || typeof je !== "object") continue;
if (!sigSet.has(je.type)) continue;
const t = tsMs(je);
if (Number.isNaN(t)) continue;
if (t >= preStart && t < appliedAt) preCount++;
else if (t >= appliedAt && t < postEnd) postCount++;
const inPre = t >= preStart && t < appliedAt;
const inPost = t >= appliedAt && t < postEnd;
if (!inPre && !inPost) continue;
if (inPre) preTotal++; else postTotal++;
if (!sigSet.has(je.type)) continue;
if (inPre) preCount++; else postCount++;
}
let status;
let deltaPct;
let rawDeltaPct = null;
let normalized = false;
if (preCount === 0) {
status = "no_baseline";
deltaPct = null;
} else {
deltaPct = ((postCount - preCount) / preCount) * 100;
rawDeltaPct = Math.round(((postCount - preCount) / preCount) * 10000) / 100;
// Volume normalization: when the windows contain non-originating activity,
// compare the signal's SHARE of activity (rate), not its absolute count —
// otherwise a generally busier post-window masquerades as a regression.
// No background (signal IS the only activity) → fall back to raw delta,
// preserving prior behavior.
const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
if (hasBackground && postTotal > 0) {
const preRate = preCount / preTotal; // preTotal >= preCount > 0
const postRate = postCount / postTotal;
deltaPct = ((postRate - preRate) / preRate) * 100;
normalized = true;
} else {
deltaPct = ((postCount - preCount) / preCount) * 100;
}
// Round to 2 dp for stable comparison + presentation.
deltaPct = Math.round(deltaPct * 100) / 100;
if (deltaPct <= IMPROVED_PCT) status = "improved";
else if (deltaPct >= REGRESSED_PCT) status = "regressed";
else status = "neutral";
}
out.push({ ...base, pre_count: preCount, post_count: postCount, delta_pct: deltaPct, status });
out.push({
...base,
pre_count: preCount, post_count: postCount,
pre_total: preTotal, post_total: postTotal,
raw_delta_pct: rawDeltaPct, normalized,
delta_pct: deltaPct, status,
});
}
return out;
}
+2
@@ -11,6 +11,7 @@
// tool_error_loop→ fp
// dead_end → session
// edit_churn → file basename
// file_reread → file basename
// build_loop → session
// subagent_dispatch_pattern → subagent_type
// silent_drift → active_skills[0]
@@ -65,6 +66,7 @@ function clusterKey(entry) {
case "build_loop":
return entry.session || "unknown";
case "edit_churn":
case "file_reread":
return entry.file ? entry.file.split("/").pop() : "unknown";
case "silent_drift":
case "correction_free_streak":
+1
@@ -58,6 +58,7 @@ export const SEVERITY_DIVISORS = {
edit_churn: 4,
tool_error_loop: 3,
retry_loop: 3,
file_reread: 3,
weak_agent: 2,
build_loop: 1,
};
+1
@@ -30,6 +30,7 @@ export const SIGNAL_WINDOWS_DAYS = {
weak_agent: 30,
subagent_dispatch_pattern: 30,
silent_drift: 14,
file_reread: 14,
error_after_recovery: 30,
correction_free_streak: 60,
clean_recovery: 60,
+68
@@ -1990,6 +1990,74 @@ else
fi
rm -f "$ROOT/rejected/2026-blk-31.md"
# --- Test 110: file_reread fires on 3x offset-shifted same-file reads, not 2x ---
echo "Test 110: file_reread (offset-shifted same-file reads escape retry_loop)"
reset_state
for off in 0 100; do
echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/big.go\",\"offset\":$off},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sFR\",\"cwd\":\"/tmp/x\"}" \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "2x same-file reads does NOT emit file_reread"
echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/big.go","offset":200},"tool_response":{"content":"ok"},"session_id":"sFR","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
assert_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "3x offset-shifted same-file reads emit file_reread"
assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "offset-shifted reads do NOT emit retry_loop (argsHash differs)"
assert_grep "$ROOT/journal.jsonl" '"type":"file_reread".*"context_window"' "file_reread carries context_window (in STRUGGLE_TYPES)"
# --- Test 111: byte-identical reread is caught by retry_loop, not double-counted as file_reread ---
echo "Test 111: identical reads → retry_loop (file_reread guard avoids double-count)"
reset_state
for i in 1 2 3; do
echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/same.go"},"tool_response":{"content":"ok"},"session_id":"sFR2","cwd":"/tmp/x"}' \
| HOOK_RUN >/dev/null 2>&1 || true
done
assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x byte-identical reads emit retry_loop"
assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "byte-identical reads NOT double-counted as file_reread (sameToolArgs>=RETRY guard)"
# --- Test 112: A/B volume normalization — busier journal does NOT fake a regression ---
echo "Test 112: A/B volume-normalized (raw +200% but flat share → neutral)"
reset_state
applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
cat > "$ROOT/ab-tracking.jsonl" <<EOF
{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-001","proposal_type":"memory","target_skill":"vol","proposal_fingerprint":"fpV","originating_signals":[{"type":"correction","count":2,"session_ids":["sV"]}],"pre_window_days":7}
EOF
> "$ROOT/journal.jsonl"
# pre window: 2 correction + 8 dead_end (rate 0.2)
for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
# post window: 6 correction + 24 dead_end (rate 0.2 — share unchanged, raw count +200%)
for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in $(seq 1 24); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.05)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
out=$(ABMEASURE_RUN --format json 2>/dev/null)
if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-001");process.exit(e&&e.normalized===true&&e.raw_delta_pct===200&&e.status==="neutral"?0:1)})'; then
echo " PASS: volume growth normalized → neutral (raw +200%)"; PASS=$((PASS+1))
else
echo " FAIL: volume normalization wrong (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/ab-tracking.jsonl"
# --- Test 113: A/B genuine rate regression still flagged ---
echo "Test 113: A/B genuine share increase → regressed"
reset_state
applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
cat > "$ROOT/ab-tracking.jsonl" <<EOF
{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-002","proposal_type":"memory","target_skill":"vol2","proposal_fingerprint":"fpV2","originating_signals":[{"type":"correction","count":2,"session_ids":["sV2"]}],"pre_window_days":7}
EOF
> "$ROOT/journal.jsonl"
# pre: 2 correction + 8 dead_end (rate 0.2)
for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
# post: 6 correction + 6 dead_end (rate 0.5 — share up → genuine regression)
for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
for i in 1 2 3 4 5 6; do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.07)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
out=$(ABMEASURE_RUN --format json 2>/dev/null)
if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-002");process.exit(e&&e.normalized===true&&e.status==="regressed"?0:1)})'; then
echo " PASS: genuine share increase → regressed"; PASS=$((PASS+1))
else
echo " FAIL: genuine regression missed (got: $out)"; FAIL=$((FAIL+1))
fi
rm -f "$ROOT/ab-tracking.jsonl"
echo
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" = "0" ]
+23 -13
@@ -104,6 +104,7 @@ Per-signal windows (single source of truth: `SIGNAL_WINDOWS_DAYS` in `~/.claude/
| `weak_agent` | 30 d | subagent quality signal |
| `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
| `silent_drift` | 14 d | exploration-without-action is task-local |
| `file_reread` | 14 d | redundant same-file reads are task-local |
| `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
| `correction_free_streak` | 60 d | wins accumulate slowly |
| `clean_recovery` | 60 d | wins accumulate slowly |
@@ -127,6 +128,7 @@ The hook emits these `type` values into the journal:
| `build_loop` | 2 build/test/compile commands fail in session | session |
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
| `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
| `file_reread` | same file Read ≥3× in the 10-tool window, ignoring offset/limit (escapes `retry_loop`'s argsHash dedup) | file basename |
| `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
@@ -154,13 +156,14 @@ The hook emits these `type` values into the journal:
- `build_loop`: cluster by `session`.
- `subagent_dispatch_pattern`: cluster by `subagent_type`.
- `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
- `file_reread`: cluster by file basename (the offset-agnostic same-file re-Read pattern).
- `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
- `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
- `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
- `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). Single entry qualifies for `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with same tuple — without it, proposal queues, never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
- Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
- If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
- The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.) — sub-clusters do NOT replace it, they supplement it.
@@ -247,10 +250,12 @@ Required structure:
```markdown
---
name: <human-readable name, ≤80 chars>
description: <one-line description used to decide future relevance — be specific, ≤200 chars>
type: user | feedback | project | reference
originSessionId: <session_id from journal entries that fed this cluster>
name: <slug — snake_case, MUST equal the target filename without `.md`, e.g. feedback_go_test_cache>
description: "<one-line used to decide future relevance — be specific, ≤200 chars>"
metadata:
node_type: memory
type: user | feedback | project | reference
originSessionId: <session_id from journal entries that fed this cluster>
---
<Body content per type, see CLAUDE.md memory schema:
@@ -260,12 +265,17 @@ originSessionId: <session_id from journal entries that fed this cluster>
- reference: pointer to external system + what's there.>
```
The frontmatter MUST match the live auto-memory schema exactly: `name` is the
slug (NOT a prose title), and `node_type`, `type`, `originSessionId` live under
a `metadata:` block (verify against an existing file in the target memory dir
before drafting — match its shape).
Constraints:
- Frontmatter fields `name`, `description`, `type` are **required**. Skill enforces this at apply time.
- `originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
- Top-level `name` + `description` and nested `metadata.node_type` (always `memory`) + `metadata.type` are **required**. Skill enforces this at apply time.
- `metadata.originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
- ≤50 LOC of body content. Surgical.
- Slug (used in `target` path filename) must not collide with any existing memory file.
- For `type=feedback` and `type=project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
- `name`/slug (also the `target` path filename) must not collide with any existing memory file.
- For `type: feedback` and `type: project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
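
The constraints above could be checked along these lines. This is a hedged sketch, not the skill's actual apply-time validation: `validateMemoryFrontmatter` is a hypothetical name, it assumes the frontmatter has already been parsed into a plain object, and the snake_case regex is an assumption about what counts as a valid slug.

```javascript
// Allowed values for metadata.type, taken from the schema above.
const MEMORY_TYPES = new Set(["user", "feedback", "project", "reference"]);

// Hypothetical validator: returns a list of problems (empty means the shape is OK).
function validateMemoryFrontmatter(fm, filename, sessionIds) {
  if (!fm || typeof fm !== "object") return ["frontmatter missing"];
  const errors = [];

  // Assumed slug rule: lowercase letters, digits, underscores.
  if (typeof fm.name !== "string" || !/^[a-z0-9_]+$/.test(fm.name)) {
    errors.push("name must be a snake_case slug, not a prose title");
  } else if (`${fm.name}.md` !== filename) {
    errors.push("name must equal the target filename without .md");
  }
  if (typeof fm.description !== "string" || fm.description.length === 0 || fm.description.length > 200) {
    errors.push("description must be a non-empty string of at most 200 chars");
  }
  // The old flat shape is rejected outright.
  if ("type" in fm || "originSessionId" in fm) {
    errors.push("type/originSessionId must live under metadata, not at top level");
  }

  const meta = fm.metadata;
  if (!meta || typeof meta !== "object") {
    errors.push("metadata block missing");
    return errors;
  }
  if (meta.node_type !== "memory") errors.push("metadata.node_type must be memory");
  if (!MEMORY_TYPES.has(meta.type)) errors.push("metadata.type is not a known memory type");
  if (!sessionIds.includes(meta.originSessionId)) {
    errors.push("metadata.originSessionId must come from the cluster's journal entries");
  }
  return errors;
}

const good = validateMemoryFrontmatter(
  {
    name: "feedback_go_test_cache",
    description: "Example description",
    metadata: { node_type: "memory", type: "feedback", originSessionId: "s1" },
  },
  "feedback_go_test_cache.md",
  ["s1", "s2"]
);
const flat = validateMemoryFrontmatter(
  { name: "Go test cache", description: "Example description", type: "feedback", originSessionId: "s1" },
  "feedback_go_test_cache.md",
  ["s1"]
);
console.log(good, flat);
```

The second call uses the old flat shape with a prose `name`, so it reports the slug, the top-level fields, and the missing `metadata` block.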
## Diagnosis drafting protocol (required for every proposal)
@@ -425,10 +435,10 @@ The matrix goes into the diagnosis output as `keypoints: {tool_selection: N, sco
Sum:
- Signal repeated ≥3× across ≥2 sessions: **+2**
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs`: `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs`: `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, file_reread:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
- Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
- Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
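
The two severity-sum bullets in the list above can be worked through with a small sketch. The divisor values are copied from the rubric; the function names are hypothetical, and treating a type with no listed divisor as severity 1 is an assumption rather than documented behavior.

```javascript
// Divisors as listed in the rubric (including the new file_reread entry).
const SEVERITY_DIVISORS = {
  dead_end: 8, edit_churn: 4, tool_error_loop: 3, retry_loop: 3,
  file_reread: 3, weak_agent: 2, build_loop: 1,
};

// severity per entry = max(1, floor(count / divisor)); entries without `count` count as 1.
function entrySeverity(entry) {
  const divisor = SEVERITY_DIVISORS[entry.type];
  // Assumption: unknown types fall back to severity 1.
  if (typeof entry.count !== "number" || !divisor) return 1;
  return Math.max(1, Math.floor(entry.count / divisor));
}

// The two thresholds are additive: a sum of 32 or more earns both bonuses.
function severityBonus(cluster) {
  const sum = cluster.reduce((acc, e) => acc + entrySeverity(e), 0);
  return { sum, bonus: (sum >= 10 ? 1 : 0) + (sum >= 32 ? 1 : 0) };
}

const small = severityBonus([
  { type: "file_reread", count: 9 }, // floor(9 / 3) = 3
  { type: "file_reread", count: 3 }, // floor(3 / 3) = 1
  { type: "dead_end", count: 16 },   // floor(16 / 8) = 2
  { type: "edit_churn" },            // no count, so 1
]);
const large = severityBonus(
  Array.from({ length: 4 }, () => ({ type: "file_reread", count: 9 }))
);
console.log(small, large);
```

The first cluster sums to 7 and earns no bonus; the second sums to 12 and earns the first +1 only.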
@@ -506,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
3. `auto_apply_eligible: false`, **always**. Harness edits are never auto-applied.
4. `blast_radius: high`
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "126 passed, 0 failed" (or current pass count). The skill runs this test before applying.
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "134 passed, 0 failed" (or current pass count). The skill runs this test before applying.
6. Change is surgical: ≤30 LOC diff, single file.
7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.
+12
@@ -105,11 +105,13 @@ const SUBAGENT_DISPATCH_THRESHOLD = 3;
const CORRECTION_FREE_THRESHOLD = 5;
const CLEAN_RECOVERY_WINDOW = 3;
const SILENT_DRIFT_THRESHOLD = 5;
const FILE_REREAD_THRESHOLD = 3;
const ERROR_AFTER_RECOVERY_WINDOW = 5;
const RECENT_RECOVERIES_MAX = 3;
const STRUGGLE_TYPES = new Set([
"tool_error_loop", "dead_end", "retry_loop", "weak_agent",
"edit_churn", "build_loop", "silent_drift", "error_after_recovery",
"file_reread",
]);
const ACTIVE_SKILLS_LOOKBACK = 10;
const TASK_TOOL_MIN = 5;
@@ -470,6 +472,16 @@ function main() {
emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
}
// Offset-agnostic same-file reread: consecutive Reads of the same file_path
// (ignoring offset/limit) escape the argsHash-based retry_loop dedup above.
// Emit a distinct, actionable signal instead of leaking into tool_error_loop.
if (READ_ONLY_TOOLS.has(tool) && file) {
const sameFileReads = state.tool_window.filter(e => e.tool === tool && e.file === file).length;
if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
emit({ ts, session, cwd, type: "file_reread", tool, file, count: sameFileReads });
}
}
if (READ_ONLY_TOOLS.has(tool)) {
state.silentDriftCounter += 1;
if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
+1 -1
@@ -300,7 +300,7 @@ Before writing any proposal:
- For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §3 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
- For `skill_edit` with `auto_apply_eligible: true`: confirm `contradiction_flag` is absent or null in frontmatter. Refuse auto-apply if `contradiction_flag` is set with any non-empty value (treat the agent's flag as a hard veto on auto-apply; user can still manually approve in walk-the-queue if they disagree with the heuristic).
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter containing required fields `name`, `description`, `type`, `originSessionId`. Refuse if frontmatter missing — agent must redraft per the Memory drafting protocol.
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter matching the live auto-memory schema — top-level `name` (the slug) + `description`, plus a `metadata:` block with `node_type: memory`, `type`, and `originSessionId`. Cross-check the shape against an existing file in the target memory dir. Refuse if frontmatter is flat (`type:`/`originSessionId:` at top level) or missing the `metadata:` block — agent must redraft per the Memory drafting protocol.
- For `harness_edit`: confirm `auto_apply_eligible: false` (never auto-apply). Confirm `confidence ≥ 5`. Confirm `# Test verification` section names the test command. Confirm diff is ≤30 LOC and targets a single allowed harness file (see `agents/adam.md` §"Harness self-modification"). Run test suite before AND after applying — revert on any regression.
- Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.