From 3a54d7d3e1a1417c57c39cb8f62cda6a282ea40a Mon Sep 17 00:00:00 2001
From: Lukasz Raczylo
Date: Fri, 29 May 2026 11:31:50 +0100
Subject: [PATCH] =?UTF-8?q?feat(v0.6.1):=20file=5Freread=20signal=20?=
 =?UTF-8?q?=E2=80=94=20catch=20offset-shifted=20same-file=20re-reads?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Proposed and approved through ADAM's own /reflect harness_edit loop
(MOSS §1): the analyst surfaced 23 tool_error_loop entries across 4
sessions whose context windows turned out to be redundant re-reads of a
single file. retry_loop keys on the argsHash of the full tool_input
(including offset/limit), so repeated Reads of the SAME file at
different offsets escaped dedup and leaked into tool_error_loop
fingerprints.

The new file_reread signal catches them: the same file Read >=3x in the
10-event window, offset-agnostic (keyed on file path), and guarded by
`sameToolArgs < RETRY_THRESHOLD` so byte-identical reads stay with
retry_loop (no double-count).

Fully wired end-to-end (not a half-dead signal):
- adam-observe.mjs: detection + STRUGGLE_TYPES membership (so it carries
  context_window + active_skills like other struggle signals).
- adam-window.mjs: 14-day sliding window (task-local, like retry_loop).
- adam-score.mjs: severity divisor 3.
- adam-batch.mjs: file-basename clustering.
- agents/adam.md + README: signal tables, clustering rules, rubric,
  windows.

Tests: 126 -> 132 (file_reread fires on 3x offset-shifted reads but not
on 2x; byte-identical reads route to retry_loop, not file_reread;
file_reread carries context_window).
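
A minimal sketch of the dedup gap, for reviewers. It is illustrative
only and not the adam-observe.mjs code: the { tool, input } event shape
and the argsKey(), detect() and read() helpers are hypothetical,
argsKey() merely stands in for argsHash (whose real hashing is not part
of this patch), and RETRY_THRESHOLD = 3 is assumed from the analyst
rubric's "3 same-args retries". The three detect() calls mirror
Tests 110 and 111.

```javascript
// Hypothetical sketch, NOT the adam-observe.mjs implementation: the event
// shape, argsKey(), detect() and read() are illustrative assumptions.
const RETRY_THRESHOLD = 3;       // assumed from the rubric's "3 same-args retries"
const FILE_REREAD_THRESHOLD = 3; // value added to adam-observe.mjs in this patch
const WINDOW_SIZE = 10;          // the "10-event window"

// Stand-in for argsHash: derived from the FULL tool_input, so a different
// offset or limit yields a different key.
function argsKey(input) {
  return JSON.stringify(input);
}

// Returns the signal types that would fire, in event order.
function detect(events) {
  const recent = [];
  const signals = [];
  for (const { tool, input } of events) {
    const entry = { tool, file: input.file_path, args: argsKey(input) };
    recent.push(entry);
    if (recent.length > WINDOW_SIZE) recent.shift();

    const sameToolArgs = recent.filter(e => e.tool === tool && e.args === entry.args).length;
    if (sameToolArgs >= RETRY_THRESHOLD) signals.push("retry_loop");

    // Keyed on the file path only, so offset/limit changes still match.
    if (tool === "Read" && entry.file) {
      const sameFileReads = recent.filter(e => e.tool === tool && e.file === entry.file).length;
      // Guard: byte-identical reads are left to retry_loop (no double-count).
      if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
        signals.push("file_reread");
      }
    }
  }
  return signals;
}

// Cluster key and per-entry severity, following adam-batch.mjs and the
// rubric formula max(1, floor(count / divisor)) with file_reread's divisor 3.
const clusterKey = file => (file ? file.split("/").pop() : "unknown");
const severity = (count, divisor = 3) =>
  count === undefined ? 1 : Math.max(1, Math.floor(count / divisor));

const read = (file_path, extra = {}) => ({ tool: "Read", input: { file_path, ...extra } });
console.log(JSON.stringify(detect([0, 100].map(offset => read("/tmp/big.go", { offset })))));      // []
console.log(JSON.stringify(detect([0, 100, 200].map(offset => read("/tmp/big.go", { offset }))))); // ["file_reread"]
console.log(JSON.stringify(detect([1, 2, 3].map(() => read("/tmp/same.go")))));                   // ["retry_loop"]
console.log(clusterKey("/tmp/big.go"), severity(3), severity(7));                                  // big.go 1 2
```

Keying file_reread on the path alone is what makes it offset-agnostic;
the sameToolArgs guard is what keeps the two signals disjoint.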
---
 README.md                    |  8 +++++---
 adam/scripts/adam-batch.mjs  |  2 ++
 adam/scripts/adam-score.mjs  |  1 +
 adam/scripts/adam-window.mjs |  1 +
 adam/tests/run-tests.sh      | 24 ++++++++++++++++++++++++
 agents/adam.md               | 13 ++++++++-----
 hooks/adam-observe.mjs       | 12 ++++++++++++
 7 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 34d100f..3f43dc6 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Version](https://img.shields.io/github/v/release/lukaszraczylo/claude-adam?label=version&color=blue)](https://github.com/lukaszraczylo/claude-adam/releases)
-[![Tests](https://img.shields.io/badge/tests-126%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
+[![Tests](https://img.shields.io/badge/tests-132%20passing-brightgreen.svg)](./adam/tests/run-tests.sh)
 [![Node](https://img.shields.io/badge/node-22%2B-339933.svg)](https://nodejs.org)
 [![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey.svg)]()
 
@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
 Then:
 
 ```sh
-bash ~/.claude/adam/tests/run-tests.sh # expect: 126 passed, 0 failed
+bash ~/.claude/adam/tests/run-tests.sh # expect: 132 passed, 0 failed
 # … start a fresh Claude Code session …
 /reflect # walks the proposal queue
 /reflect --explain # also shows the analyst's clustering trace
@@ -132,6 +132,7 @@ Auto-apply runs only for low-blast types (memory entries, new skills, ephemeral
 | `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
 | `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
 | `edit_churn` | Same file edited 4× in a window | 14d |
+| `file_reread` | Same file Read ≥3× in the 10-event window, ignoring offset/limit (catches re-reads that escape `retry_loop`'s arg-hash dedup) | 14d |
 | `build_loop` | 2× build/test/compile commands fail in same session | 30d |
 | `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
 | `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
@@ -247,11 +248,12 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
     │   ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
     │   ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
     │   └── adam-archive.mjs # post-apply journal cleanup
-    └── tests/run-tests.sh # 126 isolated tests; never touches live state
+    └── tests/run-tests.sh # 132 isolated tests; never touches live state
 ```
 
 ## What's new
 
+- **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
 - **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114).
 - **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
 - **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87).
diff --git a/adam/scripts/adam-batch.mjs b/adam/scripts/adam-batch.mjs
index 66f6e04..06cd87b 100755
--- a/adam/scripts/adam-batch.mjs
+++ b/adam/scripts/adam-batch.mjs
@@ -11,6 +11,7 @@
 // tool_error_loop→ fp
 // dead_end → session
 // edit_churn → file basename
+// file_reread → file basename
 // build_loop → session
 // subagent_dispatch_pattern → subagent_type
 // silent_drift → active_skills[0]
@@ -65,6 +66,7 @@ function clusterKey(entry) {
     case "build_loop":
       return entry.session || "unknown";
     case "edit_churn":
+    case "file_reread":
       return entry.file ? entry.file.split("/").pop() : "unknown";
     case "silent_drift":
     case "correction_free_streak":
diff --git a/adam/scripts/adam-score.mjs b/adam/scripts/adam-score.mjs
index 2f53465..4a56707 100755
--- a/adam/scripts/adam-score.mjs
+++ b/adam/scripts/adam-score.mjs
@@ -58,6 +58,7 @@ export const SEVERITY_DIVISORS = {
   edit_churn: 4,
   tool_error_loop: 3,
   retry_loop: 3,
+  file_reread: 3,
   weak_agent: 2,
   build_loop: 1,
 };
diff --git a/adam/scripts/adam-window.mjs b/adam/scripts/adam-window.mjs
index 1c5801d..08b37a5 100755
--- a/adam/scripts/adam-window.mjs
+++ b/adam/scripts/adam-window.mjs
@@ -30,6 +30,7 @@ export const SIGNAL_WINDOWS_DAYS = {
   weak_agent: 30,
   subagent_dispatch_pattern: 30,
   silent_drift: 14,
+  file_reread: 14,
   error_after_recovery: 30,
   correction_free_streak: 60,
   clean_recovery: 60,
diff --git a/adam/tests/run-tests.sh b/adam/tests/run-tests.sh
index d3f0c28..3534346 100755
--- a/adam/tests/run-tests.sh
+++ b/adam/tests/run-tests.sh
@@ -1990,6 +1990,30 @@ else
 fi
 rm -f "$ROOT/rejected/2026-blk-31.md"
 
+# --- Test 110: file_reread fires on 3x offset-shifted same-file reads, not 2x ---
+echo "Test 110: file_reread (offset-shifted same-file reads escape retry_loop)"
+reset_state
+for off in 0 100; do
+  echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/big.go\",\"offset\":$off},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sFR\",\"cwd\":\"/tmp/x\"}" \
+    | HOOK_RUN >/dev/null 2>&1 || true
+done
+assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "2x same-file reads do NOT emit file_reread"
+echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/big.go","offset":200},"tool_response":{"content":"ok"},"session_id":"sFR","cwd":"/tmp/x"}' \
+  | HOOK_RUN >/dev/null 2>&1 || true
+assert_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "3x offset-shifted same-file reads emit file_reread"
+assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "offset-shifted reads do NOT emit retry_loop (argsHash differs)"
+assert_grep "$ROOT/journal.jsonl" '"type":"file_reread".*"context_window"' "file_reread carries context_window (in STRUGGLE_TYPES)"
+
+# --- Test 111: byte-identical reread is caught by retry_loop, not double-counted as file_reread ---
+echo "Test 111: identical reads → retry_loop (file_reread guard avoids double-count)"
+reset_state
+for i in 1 2 3; do
+  echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/same.go"},"tool_response":{"content":"ok"},"session_id":"sFR2","cwd":"/tmp/x"}' \
+    | HOOK_RUN >/dev/null 2>&1 || true
+done
+assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x byte-identical reads emit retry_loop"
+assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "byte-identical reads NOT double-counted as file_reread (sameToolArgs>=RETRY guard)"
+
 echo
 echo "Results: $PASS passed, $FAIL failed"
 [ "$FAIL" = "0" ]
diff --git a/agents/adam.md b/agents/adam.md
index a106851..2ff7421 100644
--- a/agents/adam.md
+++ b/agents/adam.md
@@ -104,6 +104,7 @@ Per-signal windows (single source of truth: `SIGNAL_WINDOWS_DAYS` in `~/.claude/
 | `weak_agent` | 30 d | subagent quality signal |
 | `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
 | `silent_drift` | 14 d | exploration-without-action is task-local |
+| `file_reread` | 14 d | redundant same-file reads are task-local |
 | `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
 | `correction_free_streak` | 60 d | wins accumulate slowly |
 | `clean_recovery` | 60 d | wins accumulate slowly |
@@ -127,6 +128,7 @@ The hook emits these `type` values into the journal:
 | `build_loop` | 2 build/test/compile commands fail in session | session |
 | `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
 | `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
+| `file_reread` | same file Read ≥3× in the 10-tool window, ignoring offset/limit (escapes `retry_loop`'s argsHash dedup) | file basename |
 | `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
 | `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
 | `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
@@ -154,13 +156,14 @@ The hook emits these `type` values into the journal:
    - `build_loop`: cluster by `session`.
    - `subagent_dispatch_pattern`: cluster by `subagent_type`.
    - `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
+   - `file_reread`: cluster by file basename (same offset-agnostic same-file re-Read pattern).
    - `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
    - `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
    - `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
    - `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). Single entry qualifies for `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with same tuple — without it, proposal queues, never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
-5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
+5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
 
-5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
+5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
    - Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
    - If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
    - The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.) — sub-clusters do NOT replace it, they supplement it.
@@ -425,10 +428,10 @@ The matrix goes into the diagnosis output as `keypoints: {tool_selection: N, sco
 Sum:
 
 - Signal repeated ≥3× across ≥2 sessions: **+2**
-- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
+- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
 - Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
 - Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
-- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs` — `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
+- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs` — `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, file_reread:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
 - Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
 - Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
 - Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
@@ -506,7 +509,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
 2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
 3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
 4. `blast_radius: high`
-5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "126 passed, 0 failed" (or current pass count). The skill runs this test before applying.
+5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "132 passed, 0 failed" (or current pass count). The skill runs this test before applying.
 6. Change is surgical: ≤30 LOC diff, single file.
 7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.
 
diff --git a/hooks/adam-observe.mjs b/hooks/adam-observe.mjs
index 24e2d84..13e3e9d 100755
--- a/hooks/adam-observe.mjs
+++ b/hooks/adam-observe.mjs
@@ -105,11 +105,13 @@ const SUBAGENT_DISPATCH_THRESHOLD = 3;
 const CORRECTION_FREE_THRESHOLD = 5;
 const CLEAN_RECOVERY_WINDOW = 3;
 const SILENT_DRIFT_THRESHOLD = 5;
+const FILE_REREAD_THRESHOLD = 3;
 const ERROR_AFTER_RECOVERY_WINDOW = 5;
 const RECENT_RECOVERIES_MAX = 3;
 const STRUGGLE_TYPES = new Set([
   "tool_error_loop", "dead_end", "retry_loop", "weak_agent",
   "edit_churn", "build_loop", "silent_drift", "error_after_recovery",
+  "file_reread",
 ]);
 const ACTIVE_SKILLS_LOOKBACK = 10;
 const TASK_TOOL_MIN = 5;
@@ -470,6 +472,16 @@ function main() {
       emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
     }
 
+    // Offset-agnostic same-file reread: repeated Reads of the same file_path
+    // (ignoring offset/limit) escape the argsHash-based retry_loop dedup above.
+    // Emit a distinct, actionable signal instead of leaking into tool_error_loop.
+    if (READ_ONLY_TOOLS.has(tool) && file) {
+      const sameFileReads = state.tool_window.filter(e => e.tool === tool && e.file === file).length;
+      if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
+        emit({ ts, session, cwd, type: "file_reread", tool, file, count: sameFileReads });
+      }
+    }
+
     if (READ_ONLY_TOOLS.has(tool)) {
       state.silentDriftCounter += 1;
       if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {