claude-adam

lukaszraczylo/claude-adam

Fork 0

mirror of https://github.com/lukaszraczylo/claude-adam.git synced 2026-06-10 23:29:03 +00:00

Commit Graph

Author	SHA1	Message	Date
lukaszraczylo	4b36d6c09e	feat(v0.6.0): review hardening — live active_skills clustering, computable fingerprints Full codebase review (multi-agent, adversarially verified) surfaced several documented-but-dead mechanisms and doc/code drift. Fixes: - adam-observe: struggle signals now emit `active_skills`, so silent_drift's primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire — both were silently dead (no struggle signal carried the field). - adam-cooldown: new `--compute` CLI deterministically derives proposal_fingerprint. The exported computeProposalFingerprint() was never called and the analyst was told to hand-compute a djb2 hash it cannot reproduce. Spec now mandates a stable cluster id so fingerprints reproduce across /reflect runs. Removed one dead normalization line. - spec: reinforcement proposals excluded from A/B tracking — agents/adam.md contradicted itself (:376 included, :476 excluded); SKILL.md aligned. - adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set (adam-utils / adam-batch / adam-rollback were missing). - adam-explain: synthesized clustering summary carries `regressions: 0` (structural consistency with parsed summaries). - docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed; adam-score header documents severity_sum/severity_by_type; adam-batch §4 reference corrected. Tests: +12 assertions (114 -> 126), all green. New regression tests cover the active_skills fix and --compute, plus boundary gaps the review flagged: retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d blacklist edge.	2026-05-29 01:57:44 +01:00
lukaszraczylo	440fb52eb1	feat: apply MOSS-grounded self-evolution improvements to ADAM Implements 7 improvements grounded in MOSS paper (arXiv 2605.22794): 1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs captures last 8 events around struggle signals as context_window. 2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed journal entries into coherent failure batches by (signal_type, cluster_key). 3. Multi-stage analysis (§3.3): SKILL.md dispatches adam agent in two stages (diagnose+plan → implement) with inter-stage validation gate. 4. Pre-apply verification (§3.4): 4-check deterministic gate before auto-apply (source entries exist, diagnosis grounded, type-evidence match, no conflicting recent proposals). 5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals detected by A/B measurement, creates regression nudges. 6. Harness self-modification (§1 Table 1): new harness_edit proposal type targeting adam's own scripts with stricter gates (confidence≥5, never auto-apply, test-suite-gated). 7. Keypoint matrix evaluation (§4.2): 5 capability dimensions (tool_selection, scope_discipline, error_recovery, first_attempt, build_reliability) scored per batch for structured evaluation. Test suite: 94 → 114 tests (20 new), all passing.	2026-05-24 11:15:32 +01:00

Author

SHA1

Message

Date

lukaszraczylo

4b36d6c09e

feat(v0.6.0): review hardening — live active_skills clustering, computable fingerprints

Full codebase review (multi-agent, adversarially verified) surfaced several
documented-but-dead mechanisms and doc/code drift. Fixes:

- adam-observe: struggle signals now emit `active_skills`, so silent_drift's
  primary cluster key AND §5b skill-attribution sub-clustering (+1 rubric
  bonus) actually fire — both were silently dead (no struggle signal carried
  the field).
- adam-cooldown: new `--compute` CLI deterministically derives
  proposal_fingerprint. The exported computeProposalFingerprint() was never
  called and the analyst was told to hand-compute a djb2 hash it cannot
  reproduce. Spec now mandates a *stable* cluster id so fingerprints reproduce
  across /reflect runs. Removed one dead normalization line.
- spec: reinforcement proposals excluded from A/B tracking — agents/adam.md
  contradicted itself (:376 included, :476 excluded); SKILL.md aligned.
- adam-nudge: PENDING_CHECK_PATHS now mirrors the full install set
  (adam-utils / adam-batch / adam-rollback were missing).
- adam-explain: synthesized clustering summary carries `regressions: 0`
  (structural consistency with parsed summaries).
- docs: test-count drift (87/94 -> 126) and "350-line hook" (-> ~600) fixed;
  adam-score header documents severity_sum/severity_by_type; adam-batch §4
  reference corrected.

Tests: +12 assertions (114 -> 126), all green. New regression tests cover the
active_skills fix and --compute, plus boundary gaps the review flagged:
retry_loop/weak_agent thresholds, A/B exact +/-25% deltas, cooldown 30d
blacklist edge.

2026-05-29 01:57:44 +01:00

lukaszraczylo

440fb52eb1

feat: apply MOSS-grounded self-evolution improvements to ADAM

Implements 7 improvements grounded in MOSS paper (arXiv 2605.22794):

1. Transcript capture (§3.4): context_ring buffer in adam-observe.mjs
   captures last 8 events around struggle signals as context_window.

2. Evidence batching (§3.1): new adam-batch.mjs pre-clusters windowed
   journal entries into coherent failure batches by (signal_type, cluster_key).

3. Multi-stage analysis (§3.3): SKILL.md dispatches adam agent in two
   stages (diagnose+plan → implement) with inter-stage validation gate.

4. Pre-apply verification (§3.4): 4-check deterministic gate before
   auto-apply (source entries exist, diagnosis grounded, type-evidence
   match, no conflicting recent proposals).

5. Auto-rollback (§3.5): new adam-rollback.mjs reverts regressed proposals
   detected by A/B measurement, creates regression nudges.

6. Harness self-modification (§1 Table 1): new harness_edit proposal type
   targeting adam's own scripts with stricter gates (confidence≥5, never
   auto-apply, test-suite-gated).

7. Keypoint matrix evaluation (§4.2): 5 capability dimensions
   (tool_selection, scope_discipline, error_recovery, first_attempt,
   build_reliability) scored per batch for structured evaluation.

Test suite: 94 → 114 tests (20 new), all passing.

2026-05-24 11:15:32 +01:00

2 Commits