chore(v0.3.3): analyst observability, A/B measurement, journal hygiene

Storage/window/exclusion split (#7): ISO-week journal rotation with safety
fuse replaces size-based rotation (fixes silent under-counting when clusters
straddle boundaries). Per-signal sliding windows via adam-window.mjs guard
against stale signal accumulation. Legacy YYYY-MM-DD-<ts>.jsonl files remain
readable.

Error fingerprint normalization (#3): adam-observe.mjs extracts canonical
error codes (ENOENT, ECONNREFUSED, etc.) and normalizes paths/timestamps/hex
before hashing. 'Connection refused' and 'ECONNREFUSED' now cluster identically.

Correction corpus expansion (#1): strong tokens (stop, wrong, undo, try again,
different approach, etc.) fire on any occurrence. Weak tokens (no, actually,
wait) require negation/contrast co-occurrence within 8 tokens. Kills the
'actually, I think...' false positive.

Analyst observability (#6): mandatory clustering trace block; adam-explain.mjs
parses to summary/full/json. Cluster decisions now surface rejection reasons
(threshold, contradiction, window). Persisted to ~/.claude/adam/last-trace.txt.

Dead_end nudge proposal type (#2): single-session auto-apply gate (>=3
dead_end events). Action appends to active-nudges.json, surfaced via
adam-nudge.mjs at next SessionStart. Lower blast than skill_edit.

Per-(skill, fingerprint) cooldown (#4): adam-cooldown.mjs replaces coarse
per-skill check. proposal_fingerprint = djb2(skill_slug + cluster_id +
normalized_diff_body). Legacy applied/rejected records gate via 'legacy'
fingerprint fallback through resolveSkill helper (handles target_skill,
skill, or target: <path>).

task_completed scoring integration (#8): adam-score.mjs computes per-session
urgency dampener (3 task_completed -> 0.5) and reinforcement candidates
(skills cited in >=3 clean completions). New 'reinforcement' proposal type
appends to reinforcements.jsonl on apply (no code/memory mutation).

A/B effectiveness measurement (#5): every auto-applied edit appends to
ab-tracking.jsonl. adam-ab-measure.mjs computes 7d pre/post signal-count
delta per entry (improved / neutral / regressed / no_baseline / pending).
Analyst surfaces regressions at top of /reflect output.

Upgrade UX overhaul (#9): adam-upgrade.mjs implements --list/--diff/--accept
/--accept-all. SessionStart nudge prints pending-merge warning when
.adam-new files exist (latency ~20ms via fixed shortlist). install.sh
emits unmissable final-message hint after creating any .adam-new file.

Simplify pass: adam-utils.mjs deduplicates readJsonlSafe / listJsonlFiles /
parseFrontmatter across 8 scripts. Net -46 LOC.

Test coverage: 30 -> 87 tests. Every new feature has feature-validating
assertions (false-case coverage included). T77 statically verifies install.sh
references every adam-*.mjs source script (would have caught the missing
adam-utils inclusion that review #2 surfaced).
This commit is contained in:
2026-05-13 01:02:33 +01:00
parent 7ddda26bb4
commit 012c40b9ab
18 changed files with 3064 additions and 81 deletions
+2
View File
@@ -4,6 +4,8 @@ Self-improvement layer for [Claude Code](https://claude.com/claude-code) that ob
## What's new
- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. Storage/window/exclusion split: ISO-week journal rotation with safety fuse (replaces size-based, fixes silent under-counting); per-signal sliding windows via new `adam-window.mjs` (`dead_end` 7d, `correction` 30d, reinforcement signals 60d). Error fingerprint normalization — `ECONNREFUSED` and `"Connection refused"` cluster identically. Correction corpus expanded (`wait`, `hold on`, `try again`, `different approach`); weak tokens (`no`, `actually`, `wait`) require negation co-occurrence within 8 tokens to fire — kills the `"actually, I think..."` false positive. Mandatory clustering trace + new `adam-explain.mjs --mode summary|full|json`. New `nudge` proposal type (single-session auto-apply, low blast) for repeated `dead_end`. Per-(skill, fingerprint) cooldown via `adam-cooldown.mjs` (replaces coarse per-skill gate). `task_completed` scoring: urgency dampener + reinforcement candidates. A/B effectiveness measurement on auto-applied edits (`adam-ab-measure.mjs`, 7d pre/post window). Upgrade UX overhaul: `adam-upgrade.mjs --list/--diff/--accept` + SessionStart pending-merge warning. Shared helper module `adam-utils.mjs` deduplicates journal-reading and frontmatter parsing across scripts. 87 tests (up from 30).
- **v0.3.2** — `task_completed` signal: post-task skill capture for downstream reinforcement scoring (consumed in v0.3.3).
- **v0.3.1** — code review pass: bug fixes (`errorFingerprint` no longer false-positives on `is_error: false`, archive script handles same-millisecond duplicates correctly, `tool_window` now clears on session change, nudge filters proposal filenames by pattern), prose conciseness cuts, hardened `install.sh` with curl one-liner + settings.json merge, `adam-uninstall.sh`, isolated test harness (no longer pollutes live `~/.claude/adam/` state).
- **v0.3.0** — causal diagnosis: every proposal carries a `# Diagnosis` block (Trigger/Action/Mismatch/Outcome with verbatim transcript quote) before drafting, plus optional `contradiction_flag` heuristic that vetoes auto-apply on obviously-conflicting `skill_edit` additions.
- **v0.2.1** — win signals (`correction_free_streak`, `clean_recovery`) feed `skill_edit` auto-apply under a strict gate (≤30 LOC, ≤2× byte cap, 7d cooldown, 30d blacklist on rejection).