claude-adam

lukaszraczylo/claude-adam

mirror of https://github.com/lukaszraczylo/claude-adam.git synced 2026-06-13 00:26:45 +00:00

Commit Graph

Author	SHA1	Message	Date

lukaszraczylo

c23b09cc09

fix(v0.6.4): rollback removes the proposal's ab-tracking entry

adam-rollback.mjs's docstring always claimed it "removes the ab-tracking entry
(so it doesn't re-trigger)", but executeRollback() never did. Consequence: a
rolled-back proposal kept being re-detected as `regressed` on every subsequent
/reflect, which triggered endless `not_found` rollback attempts (the applied
file is already gone) and noisy ## Regressions sections.

executeRollback now deletes the matching ab-tracking.jsonl row by proposal_id
after the move, preserving all unrelated rows. Surfaced by running ADAM's own
/reflect loop a second time (two zombie regressions: 2026-05-16-002 and
2026-05-22-001).
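The purge described above could look roughly like the sketch below. It is not the actual executeRollback() code: the helper name `purgeTrackingEntry` and the `status` field in the usage example are assumptions made for illustration; only the `proposal_id` key and the JSONL format come from the commit message. The function works on the file's text so it can be tested without touching disk.

```javascript
// Hypothetical helper, NOT the real executeRollback() from adam-rollback.mjs.
// Takes the text of a JSONL file and drops every row whose proposal_id
// matches, leaving all other rows byte-for-byte unchanged.
function purgeTrackingEntry(jsonlText, proposalId) {
  const kept = [];
  let removed = 0;
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // ignore blank lines
    let row;
    try {
      row = JSON.parse(line);
    } catch {
      kept.push(line); // keep unparseable rows rather than lose data
      continue;
    }
    const isMatch =
      row !== null && typeof row === "object" && row.proposal_id === proposalId;
    if (isMatch) {
      removed += 1; // drop the rolled-back proposal's row
      continue;
    }
    kept.push(line);
  }
  // Re-join with a trailing newline so later appends stay one row per line.
  const text = kept.length > 0 ? kept.join("\n") + "\n" : "";
  return { text, removed };
}

// Usage: two tracked proposals, one of which has just been rolled back.
const tracking =
  '{"proposal_id":"2026-05-16-002","status":"regressed"}\n' +
  '{"proposal_id":"2026-05-22-001","status":"regressed"}\n';
const result = purgeTrackingEntry(tracking, "2026-05-16-002");
console.log(result.removed); // number of rows dropped
console.log(result.text);    // remaining JSONL content
```

Filtering by id, rather than truncating or rewriting the file wholesale, is what keeps unrelated rows intact, which is the property the second new test checks.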

Tests: 138 -> 140 (rollback purges the entry by id; an unrelated entry is
preserved).

2026-05-29 13:50:38 +01:00

lukaszraczylo

440fb52eb1

feat: apply MOSS-grounded self-evolution improvements to ADAM

Implements 7 improvements grounded in the MOSS paper (arXiv 2605.22794):

1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs
   captures last 8 events around struggle signals as context_window.

2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed
   journal entries into coherent failure batches by (signal_type, cluster_key).

3. Multi-stage analysis (§3.3): SKILL.md dispatches adam agent in two
   stages (diagnose+plan → implement) with inter-stage validation gate.

4. Pre-apply verification (§3.4): 4-check deterministic gate before
   auto-apply (source entries exist, diagnosis grounded, type-evidence
   match, no conflicting recent proposals).

5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals
   detected by A/B measurement, creates regression nudges.

6. Harness self-modification (§1 Table 1): new harness_edit proposal type
   targeting adam's own scripts with stricter gates (confidence≥5, never
   auto-apply, test-suite-gated).

7. Keypoint matrix evaluation (§4.2): 5 capability dimensions
   (tool_selection, scope_discipline, error_recovery, first_attempt,
   build_reliability) scored per batch for structured evaluation.
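
As an illustration of item 1, a fixed-size ring of recent events might be sketched as follows. This is a minimal sketch under assumptions, not the code in adam-observe.mjs: the `ContextRing` class, the `observe` function, and the `isStruggle` predicate are hypothetical names, and the sketch assumes the window consists of the events up to and including the signal itself.

```javascript
// Hypothetical sketch of a context ring; names are illustrative only.
class ContextRing {
  constructor(capacity = 8) {
    this.capacity = capacity;
    this.events = [];
  }

  // Append an event and evict the oldest one once the ring is full.
  push(event) {
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
  }

  // Return a copy so later pushes do not mutate a captured window.
  snapshot() {
    return this.events.slice();
  }
}

// Record every event; when one is a struggle signal, return it with the
// current ring contents attached as context_window. Otherwise return null.
function observe(ring, event, isStruggle) {
  ring.push(event);
  if (!isStruggle(event)) return null;
  return { ...event, context_window: ring.snapshot() };
}

// Usage: ten ordinary events followed by one (assumed) error signal.
const ring = new ContextRing(8);
for (let i = 0; i < 10; i++) {
  observe(ring, { seq: i, kind: "tool_call" }, (e) => e.kind === "error");
}
const captured = observe(ring, { seq: 10, kind: "error" }, (e) => e.kind === "error");
console.log(captured.context_window.length); // at most the ring capacity
```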

Test suite: 94 → 114 tests (20 new), all passing.
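
The evidence batching in item 2 can be pictured as a group-by on the pair (signal_type, cluster_key). The sketch below is an assumption about the shape of that step, not the contents of adam-batch.mjs: the function name `batchEntries` and the `id` field are hypothetical, and the entries are assumed to have been windowed already.

```javascript
// Hypothetical sketch of pre-clustering journal entries into batches.
function batchEntries(entries) {
  const batches = new Map();
  for (const entry of entries) {
    // JSON-encode the pair so ("a", "b|c") and ("a|b", "c") cannot collide.
    const key = JSON.stringify([entry.signal_type, entry.cluster_key]);
    if (!batches.has(key)) {
      batches.set(key, {
        signal_type: entry.signal_type,
        cluster_key: entry.cluster_key,
        entries: [],
      });
    }
    batches.get(key).entries.push(entry);
  }
  // Map preserves insertion order, so batches appear in first-seen order.
  return [...batches.values()];
}

// Usage with made-up journal entries.
const journal = [
  { id: 1, signal_type: "retry", cluster_key: "npm-test" },
  { id: 2, signal_type: "retry", cluster_key: "npm-test" },
  { id: 3, signal_type: "scope_creep", cluster_key: "npm-test" },
];
console.log(batchEntries(journal).length); // one batch per distinct pair
```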

2026-05-24 11:15:32 +01:00

2 Commits