lukaszraczylo
|
440fb52eb1
|
feat: apply MOSS-grounded self-evolution improvements to ADAM

Implements seven improvements grounded in the MOSS paper (arXiv 2605.22794):
1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs
captures the last 8 events around struggle signals as context_window.
2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed
journal entries into coherent failure batches by (signal_type, cluster_key).
3. Multi-stage analysis (§3.3): SKILL.md dispatches the adam agent in two
stages (diagnose+plan → implement) with an inter-stage validation gate.
4. Pre-apply verification (§3.4): a deterministic 4-check gate runs before
auto-apply (source entries exist, diagnosis grounded, type-evidence
match, no conflicting recent proposals).
5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals
detected by A/B measurement and creates regression nudges.
6. Harness self-modification (§1 Table 1): new harness_edit proposal type
targeting adam's own scripts with stricter gates (confidence≥5, never
auto-apply, test-suite-gated).
7. Keypoint matrix evaluation (§4.2): 5 capability dimensions
(tool_selection, scope_discipline, error_recovery, first_attempt,
build_reliability) scored per batch for structured evaluation.
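
The transcript capture in item 1 could look roughly like the sketch below. It is not the code from adam-observe.mjs: the `ContextRing` class, the `observe` helper, the `isStruggle` predicate, and the event shape are all hypothetical, and the sketch only captures events up to and including the signal, whereas the real buffer may also include events that follow it.

```javascript
// Hypothetical sketch of a fixed-size context ring (names are assumptions,
// not taken from adam-observe.mjs).
class ContextRing {
  constructor(capacity = 8) {
    this.capacity = capacity; // 8 matches the window size named in item 1
    this.events = [];
  }

  // Append an event and drop the oldest one once capacity is exceeded.
  push(event) {
    this.events.push(event);
    if (this.events.length > this.capacity) {
      this.events.shift();
    }
  }

  // Return a shallow copy so later pushes do not mutate a stored window.
  snapshot() {
    return this.events.slice();
  }
}

// Record the event; if it is a struggle signal, return a journal entry
// carrying the surrounding events as context_window, otherwise null.
function observe(ring, event, isStruggle) {
  ring.push(event);
  return isStruggle(event)
    ? { ...event, context_window: ring.snapshot() }
    : null;
}
```

A caller would keep one ring per session and write only the non-null results of `observe` to the journal.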
Test suite: 94 → 114 tests (20 new), all passing.
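
The evidence batching in item 2 can be sketched as a grouping pass over journal entries. This is an illustration under assumptions, not the contents of adam-batch.mjs: the `batchEntries` function, the `timestamp` field, and the `windowStartMs` cutoff are hypothetical; only the (signal_type, cluster_key) grouping key comes from the commit message.

```javascript
// Hypothetical sketch: group journal entries inside a time window into
// batches keyed by (signal_type, cluster_key).
function batchEntries(entries, windowStartMs) {
  const batches = new Map();
  for (const entry of entries) {
    // Skip entries older than the window (timestamp field is an assumption).
    if (entry.timestamp < windowStartMs) continue;

    // JSON-encode the pair so distinct pairs can never collide as strings.
    const key = JSON.stringify([entry.signal_type, entry.cluster_key]);
    if (!batches.has(key)) {
      batches.set(key, {
        signal_type: entry.signal_type,
        cluster_key: entry.cluster_key,
        entries: [],
      });
    }
    batches.get(key).entries.push(entry);
  }
  // Map preserves insertion order, so batches appear in first-seen order.
  return [...batches.values()];
}
```

Each returned batch could then be handed to the analysis stage as one unit of evidence instead of one entry at a time.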
|
2026-05-24 11:15:32 +01:00 |
|