claude-adam

mirror of https://github.com/lukaszraczylo/claude-adam.git synced 2026-06-09 23:19:12 +00:00

Author	SHA1	Message	Date
lukaszraczylo	4b36d6c09e	feat(v0.6.0): review hardening — live active_skills clustering, computable fingerprints Full codebase review (multi-agent, adversarially verified) surfaced several documented-but-dead mechanisms and doc/code drift. Fixes: - adam-observe: struggle signals now emit `active_skills`, so silent_drift's primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire — both were silently dead (no struggle signal carried the field). - adam-cooldown: new `--compute` CLI deterministically derives proposal_fingerprint. The exported computeProposalFingerprint() was never called and the analyst was told to hand-compute a djb2 hash it cannot reproduce. Spec now mandates a stable cluster id so fingerprints reproduce across /reflect runs. Removed one dead normalization line. - spec: reinforcement proposals excluded from A/B tracking — agents/adam.md contradicted itself (:376 included, :476 excluded); SKILL.md aligned. - adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set (adam-utils / adam-batch / adam-rollback were missing). - adam-explain: synthesized clustering summary carries `regressions: 0` (structural consistency with parsed summaries). - docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed; adam-score header documents severity_sum/severity_by_type; adam-batch §4 reference corrected. Tests: +12 assertions (114 -> 126), all green. New regression tests cover the active_skills fix and --compute, plus boundary gaps the review flagged: retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d blacklist edge.	2026-05-29 01:57:44 +01:00
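The v0.6.0 fix adds a `--compute` CLI so the proposal fingerprint is derived deterministically rather than hand-computed. The v0.3.3 commit defines it as `djb2(skill_slug + cluster_id + normalized_diff_body)`. Below is a minimal sketch of that formula. It is not adam's `computeProposalFingerprint()`. The helper names (`djb2`, `normalizeDiffBody`, `proposalFingerprint`), the normalization rules, and the 8-digit hex output are hypothetical and stand in for whatever the real script does.

```javascript
// Hypothetical sketch of djb2(skill_slug + cluster_id + normalized_diff_body).
// Normalization rules and hex output format are assumptions, not adam's code.

// Classic djb2: start at 5381, then hash = hash * 33 + charCode.
// ">>> 0" keeps the value an unsigned 32-bit integer, so results are stable.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * 33 + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Assumed normalization: unify line endings, drop trailing whitespace
// on each line, and trim leading/trailing blank lines.
function normalizeDiffBody(body) {
  return body
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.trimEnd())
    .join("\n")
    .trim();
}

// Fields are concatenated exactly as the formula states (no separator).
function proposalFingerprint(skillSlug, clusterId, diffBody) {
  const input = skillSlug + clusterId + normalizeDiffBody(diffBody);
  return djb2(input).toString(16).padStart(8, "0");
}
```

The fingerprint depends on `cluster_id`. This is why the spec requires a stable cluster id: an id that changed between /reflect runs would produce a new fingerprint and bypass the cooldown. The formula concatenates fields with no separator, so `("ab", "c")` and `("a", "bc")` would collide. A real implementation might add a delimiter, but this sketch follows the stated formula literally.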
lukaszraczylo	440fb52eb1	feat: apply MOSS-grounded self-evolution improvements to ADAM Implements 7 improvements grounded in the MOSS paper (arXiv 2605.22794): 1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs captures last 8 events around struggle signals as context_window. 2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed journal entries into coherent failure batches by (signal_type, cluster_key). 3. Multi-stage analysis (§3.3): SKILL.md dispatches the adam agent in two stages (diagnose+plan → implement) with inter-stage validation gate. 4. Pre-apply verification (§3.4): 4-check deterministic gate before auto-apply (source entries exist, diagnosis grounded, type-evidence match, no conflicting recent proposals). 5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals detected by A/B measurement, creates regression nudges. 6. Harness self-modification (§1 Table 1): new harness_edit proposal type targeting adam's own scripts with stricter gates (confidence≥5, never auto-apply, test-suite-gated). 7. Keypoint matrix evaluation (§4.2): 5 capability dimensions (tool_selection, scope_discipline, error_recovery, first_attempt, build_reliability) scored per batch for structured evaluation. Test suite: 94 → 114 tests (20 new), all passing.	2026-05-24 11:15:32 +01:00
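Transcript capture relies on a `context_ring` buffer that keeps the last 8 events, which are attached to a struggle signal as `context_window`. Below is a minimal sketch of such a fixed-size ring buffer. The class and function names (`ContextRing`, `recordStruggle`) and the event shape are hypothetical. Only the capacity of 8 and the `context_window` field come from the commit.

```javascript
// Hypothetical fixed-size ring buffer in the spirit of context_ring:
// it retains only the most recent `capacity` events.
class ContextRing {
  constructor(capacity = 8) {
    this.capacity = capacity;
    this.buffer = new Array(capacity);
    this.start = 0; // index of the oldest retained event
    this.size = 0;
  }

  push(event) {
    const end = (this.start + this.size) % this.capacity;
    this.buffer[end] = event;
    if (this.size < this.capacity) {
      this.size++;
    } else {
      // Buffer was full: the write replaced the oldest slot, so advance start.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Returns retained events oldest-first without mutating the buffer.
  snapshot() {
    const out = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.buffer[(this.start + i) % this.capacity]);
    }
    return out;
  }
}

// Attach the recent history to a struggle signal (hypothetical helper).
function recordStruggle(ring, signal) {
  return { ...signal, context_window: ring.snapshot() };
}
```

This sketch captures only the events that precede the signal. The commit says events "around" the signal, so the real hook may also include later events. That would require deferring the journal write.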
lukaszraczylo	a48c705c0a	feat(adam): smarter signals & clustering - New signal types in hooks/adam-observe.mjs: - silent_drift: 5 consecutive read-only PostToolUse without an action tool - error_after_recovery: same error fingerprint returns within 5 events of clean_recovery - Severity-weighted scoring in adam/scripts/adam-score.mjs: - SEVERITY_DIVISORS exported per struggle signal type - Per-session severity_sum + severity_by_type added to JSON output - Skill-attribution clustering in agents/adam.md: - Sub-cluster struggle signals on active_skills[0] - New struggle-driven skill_edit variant (always queues, never auto-applies) - Rubric updates: - +1 for cluster severity-sum >= 10, additional +1 for >= 32 - +1 for skill-attributed sub-cluster naming an existing skill - silent_drift + error_after_recovery added to struggle signal list - Window: silent_drift 14d, error_after_recovery 30d - Tests: 94 passing (78-82 new) Backward compat: entries without count default to severity 1. Existing win-driven skill_edit gate untouched. No journal migration.	2026-05-13 19:21:59 +01:00
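The `silent_drift` signal fires after 5 consecutive read-only PostToolUse events with no action tool in between. Below is a minimal sketch of that counter. The read-only tool set (`Read`, `Grep`, `Glob`, `LS`) and the names `createDriftDetector` and `READ_ONLY_TOOLS` are assumptions. The real hook may classify tools differently.

```javascript
// Hypothetical silent_drift detector. Which tools count as read-only
// is an assumption; anything outside the set is treated as an action.
const READ_ONLY_TOOLS = new Set(["Read", "Grep", "Glob", "LS"]);
const SILENT_DRIFT_THRESHOLD = 5;

function createDriftDetector(threshold = SILENT_DRIFT_THRESHOLD) {
  let streak = 0;
  return function onPostToolUse(toolName) {
    if (READ_ONLY_TOOLS.has(toolName)) {
      streak++;
      // Fire exactly once when the streak reaches the threshold.
      return streak === threshold ? { type: "silent_drift" } : null;
    }
    // An action tool breaks the read-only streak.
    streak = 0;
    return null;
  };
}
```

The sketch fires once per streak rather than on every read after the fifth, so a long exploratory stretch yields one signal instead of a flood. This is a design choice of the sketch, and the commit does not specify the behavior.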
lukaszraczylo	012c40b9ab	chore(v0.3.3): analyst observability, A/B measurement, journal hygiene Storage/window/exclusion split (#7): ISO-week journal rotation with safety fuse replaces size-based rotation (fixes silent under-counting when clusters straddle boundaries). Per-signal sliding windows via adam-window.mjs guard against stale signal accumulation. Legacy YYYY-MM-DD-<ts>.jsonl files remain readable. Error fingerprint normalization (#3): adam-observe.mjs extracts canonical error codes (ENOENT, ECONNREFUSED, etc.) and normalizes paths/timestamps/hex before hashing. 'Connection refused' and 'ECONNREFUSED' now cluster identically. Correction corpus expansion (#1): strong tokens (stop, wrong, undo, try again, different approach, etc.) fire on any occurrence. Weak tokens (no, actually, wait) require negation/contrast co-occurrence within 8 tokens. Kills the 'actually, I think...' false positive. Analyst observability (#6): mandatory clustering trace block; adam-explain.mjs parses to summary/full/json. Cluster decisions now surface rejection reasons (threshold, contradiction, window). Persisted to ~/.claude/adam/last-trace.txt. Dead_end nudge proposal type (#2): single-session auto-apply gate (>=3 dead_end events). Action appends to active-nudges.json, surfaced via adam-nudge.mjs at next SessionStart. Lower blast than skill_edit. Per-(skill, fingerprint) cooldown (#4): adam-cooldown.mjs replaces coarse per-skill check. proposal_fingerprint = djb2(skill_slug + cluster_id + normalized_diff_body). Legacy applied/rejected records gate via 'legacy' fingerprint fallback through resolveSkill helper (handles target_skill, skill, or target: <path>). task_completed scoring integration (#8): adam-score.mjs computes per-session urgency dampener (3 task_completed -> 0.5) and reinforcement candidates (skills cited in >=3 clean completions). New 'reinforcement' proposal type appends to reinforcements.jsonl on apply (no code/memory mutation). 
A/B effectiveness measurement (#5): every auto-applied edit appends to ab-tracking.jsonl. adam-ab-measure.mjs computes 7d pre/post signal-count delta per entry (improved / neutral / regressed / no_baseline / pending). Analyst surfaces regressions at the top of /reflect output. Upgrade UX overhaul (#9): adam-upgrade.mjs implements --list/--diff/--accept/--accept-all. SessionStart nudge prints pending-merge warning when .adam-new files exist (latency ~20ms via fixed shortlist). install.sh emits unmissable final-message hint after creating any .adam-new file. Simplify pass: adam-utils.mjs deduplicates readJsonlSafe / listJsonlFiles / parseFrontmatter across 8 scripts. Net -46 LOC. Test coverage: 30 -> 87 tests. Every new feature has feature-validating assertions (false-case coverage included). T77 statically verifies install.sh references every adam-*.mjs source script (would have caught the missing adam-utils inclusion that review #2 surfaced).	2026-05-13 01:02:33 +01:00
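adam-ab-measure.mjs assigns each tracked entry to one of five states based on a 7-day pre/post signal-count delta. The exact rule is not given here. The sketch below assumes the delta is the relative change `(post - pre) / pre` with inclusive ±25% thresholds, a reading inferred from the v0.6.0 "A/B exact +/-25% deltas" boundary tests. The function name `classifyAb` and the entry shape are hypothetical.

```javascript
// Hypothetical A/B classifier. State names come from the commit; the
// relative-delta rule and inclusive ±25% thresholds are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const WINDOW_MS = 7 * DAY_MS;
const THRESHOLD = 0.25;

function classifyAb({ appliedAt, preCount, postCount }, now = Date.now()) {
  if (now - appliedAt < WINDOW_MS) return "pending"; // post window not complete
  if (preCount === 0) return "no_baseline";           // nothing to compare against
  // Struggle signals are bad, so fewer of them (negative delta) is better.
  const delta = (postCount - preCount) / preCount;
  if (delta <= -THRESHOLD) return "improved";
  if (delta >= THRESHOLD) return "regressed";
  return "neutral";
}
```

With a baseline of 4 signals, 3 afterwards is exactly -25% and counts as improved here, and 5 is exactly +25% and counts as regressed. Whether adam treats the boundary as inclusive is precisely what those boundary tests pin down, so verify against the real script.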
lukaszraczylo	6d8ff37cb2	v0.3.1: code review pass + DX overhaul Bug fixes (HIGH): - adam-observe.mjs: errorFingerprint no longer false-positives when toolResponse.is_error === false; ERROR_RE only used as fallback when is_error is undefined. - adam-observe.mjs: resetSessionLocal now clears tool_window so retry_loop cannot fire on the first tool of a new session by matching the prior session. - adam-archive.mjs: ts dedup uses Map<ts, count> instead of Set<ts>; two journal entries sharing a millisecond are no longer both archived when only one is referenced in source_entries. - adam-nudge.mjs: only counts proposal filenames matching /^\d{4}-\d{2}-\d{3}-/ pattern; README/notes in proposals/ no longer bump the count. - skills/adam-self-improvement/SKILL.md: contradiction_flag veto now applied at apply time (carry-over from earlier review). Test isolation: - adam/tests/run-tests.sh: ALWAYS runs against an isolated $HOME under mktemp -d. Previously it truncated the live ~/.claude/adam/journal.jsonl on every run, destroying production state. Conciseness: - agents/adam.md: -19 LOC (cuts: vestigial cursor sentence, duplicate not-do bullets, blast-radius bullet collapse, Inputs paths delegate to SKILL.md, win-cluster-vs-struggle-cluster commentary already enforced by cluster-key separation, # Overlap section spec compressed). - skills/adam-self-improvement/SKILL.md: -4 LOC (framing paragraph, dead catch-all bullet for non-eligible types). Auto-prune script DELETED: - The cumulative-count primitive cannot distinguish "never used" from "used before tracking began"; mtime gate is meaningless for installed files. Auto-prune deferred to v0.4 with a per-key lastSeen schema. Cross-platform: - macOS (BSD coreutils) and Linux (Alpine, glibc + musl) verified. - All scripts use portable forms (stat -f || stat -c, mktemp -d -t). - README documents platform support explicitly.
DX overhaul: - install.sh: hardened; supports `curl | bash` via auto-clone, --version=vX.Y.Z pinning, --yes / --dry-run flags, jq-based settings.json merge with diff prompt and backup, conservative file copy that detects local mtime drift and writes <file>.adam-new instead of clobbering, idempotent across re-runs. - adam-uninstall.sh: NEW. Soft-archives ~/.claude/adam/ to .bak.<ts>/ by default; --purge to delete; --yes for non-interactive; jq-based settings.json cleanup with diff prompt. - README.md: curl one-liner install + version-pinned variant at top, What's New section through v0.3.1, upgrade-safe data files callout, uninstaller documentation, platform support note, expanded rubric showing skill_edit gate. Test count: 27 passed, 0 failed (was 27; no regression).	2026-05-10 21:33:17 +01:00
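The v0.3.1 archive fix replaces `Set<ts>` with `Map<ts, count>`. With a set, two journal entries sharing a millisecond timestamp were both archived even when `source_entries` referenced that timestamp only once. Below is a minimal sketch of the counted version. The function name `splitForArchive` and the `{ ts, ... }` entry shape are hypothetical.

```javascript
// Hypothetical counted dedup: archive at most as many entries per
// timestamp as source_entries references.
function splitForArchive(journalEntries, sourceEntries) {
  // How many references remain for each timestamp.
  const remaining = new Map();
  for (const ts of sourceEntries) {
    remaining.set(ts, (remaining.get(ts) ?? 0) + 1);
  }
  const archived = [];
  const kept = [];
  for (const entry of journalEntries) {
    const n = remaining.get(entry.ts) ?? 0;
    if (n > 0) {
      archived.push(entry);
      remaining.set(entry.ts, n - 1); // consume one reference
    } else {
      kept.push(entry);
    }
  }
  return { archived, kept };
}
```

When timestamps collide, this sketch archives the earliest entries in file order. The commit does not say which of the duplicates adam-archive.mjs picks.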
lukaszraczylo	7962e85578	v0.2.0: drop cursor, add source_entries lifecycle, mandate memory frontmatter Lifecycle redesign: - Each proposal records source_entries: [<ts>...] in frontmatter listing the journal timestamps that fed its cluster. - After apply/reject, skill calls adam/scripts/adam-archive.mjs which moves matching entries from journal.jsonl to journal/actioned-<id>.jsonl. - Agent reads applied/ + rejected/ frontmatter on each /reflect, builds an excluded-timestamps set, skips any leftover already-actioned entries. - cursor field in state.json is vestigial; agent ignores it. Effect: journal stays bounded by active observations. Rule changes re-evaluate the remainder without manual rewind. Race-safer for parallel sessions on shared state.json (no cursor write contention). Memory drafting: - agents/adam.md adds 'Memory drafting protocol' parallel to Skill drafting. - Memory proposals MUST contain auto-memory frontmatter (name, description, type, originSessionId) in '# Proposed change' body. - Skill enforces frontmatter check at apply time; refuses if missing. Tests: 18 -> 21. Two new tests for adam-archive happy path + no-op. Migration: existing applied proposals lack source_entries. Their backing journal entries archived as a one-time bulk migration; legacy proposals annotated with migration note.	2026-05-10 04:29:49 +01:00
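At apply time, a memory proposal is refused unless its proposed body carries auto-memory frontmatter with `name`, `description`, `type`, and `originSessionId`. Below is a minimal sketch of that check. It assumes the text passed in is the body under `# Proposed change` and that it opens with a `---`-delimited block of simple `key: value` lines. It is not a full YAML parser, and the names `missingMemoryFrontmatter` and `REQUIRED_MEMORY_KEYS` are hypothetical.

```javascript
// Hypothetical apply-time frontmatter gate. Required keys come from the
// commit; the line-based "key: value" parsing is a simplifying assumption.
const REQUIRED_MEMORY_KEYS = ["name", "description", "type", "originSessionId"];

// Returns the required keys that are missing or empty ([] means OK).
function missingMemoryFrontmatter(body) {
  const text = body.trimStart();
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/);
  if (!match) return [...REQUIRED_MEMORY_KEYS]; // no frontmatter block at all
  const present = new Set();
  for (const line of match[1].split(/\r?\n/)) {
    // Count a key only if it has a non-empty value.
    const m = line.match(/^([A-Za-z_][\w-]*)\s*:\s*\S/);
    if (m) present.add(m[1]);
  }
  return REQUIRED_MEMORY_KEYS.filter((key) => !present.has(key));
}
```

Returning the list of missing keys, rather than a bare boolean, lets the refusal message name exactly what the proposal lacks.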

6 Commits