Mirror of https://github.com/lukaszraczylo/claude-adam.git (synced 2026-06-22 02:01:44 +00:00)
Compare commits
6 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 4d1276a73f | | |
| | c23b09cc09 | | |
| | fcddb6bf79 | | |
| | d929101af4 | | |
| | 3a54d7d3e1 | | |
| | 4b36d6c09e | | |
@@ -13,7 +13,7 @@ Watches the friction in your coding sessions, clusters the signals via an LLM an
|
|||||||
|
|
||||||
[](LICENSE)
|
[](LICENSE)
|
||||||
[](https://github.com/lukaszraczylo/claude-adam/releases)
|
[](https://github.com/lukaszraczylo/claude-adam/releases)
|
||||||
[](./adam/tests/run-tests.sh)
|
[](./adam/tests/run-tests.sh)
|
||||||
[](https://nodejs.org)
|
[](https://nodejs.org)
|
||||||
[]()
|
[]()
|
||||||
|
|
||||||
@@ -54,7 +54,7 @@ The installer copies files into `~/.claude/`, offers to merge ADAM's hook entrie
|
|||||||
Then:
|
Then:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
bash ~/.claude/adam/tests/run-tests.sh # expect: 87 passed, 0 failed
|
bash ~/.claude/adam/tests/run-tests.sh # expect: 140 passed, 0 failed
|
||||||
# … start a fresh Claude Code session …
|
# … start a fresh Claude Code session …
|
||||||
/reflect # walks the proposal queue
|
/reflect # walks the proposal queue
|
||||||
/reflect --explain # also shows the analyst's clustering trace
|
/reflect --explain # also shows the analyst's clustering trace
|
||||||
@@ -63,10 +63,27 @@ bash ~/.claude/adam/tests/run-tests.sh # expect: 87 passed, 0 failed
|
|||||||
Pin a release for reproducibility:
|
Pin a release for reproducibility:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.5.0/install.sh \
|
curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/v0.6.3/install.sh \
|
||||||
| VERSION=v0.5.0 bash
|
| VERSION=v0.6.3 bash
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Staying up to date
|
||||||
|
|
||||||
|
`install.sh` records the installed release in `~/.claude/adam/.version`. The
|
||||||
|
SessionStart hook (`adam-nudge.mjs`) then checks the latest GitHub release **at
|
||||||
|
most once a day** (cached in `~/.claude/adam/.update-check.json`, network call
|
||||||
|
hard-capped at 1.5 s, fully best-effort — it never blocks or slows session
|
||||||
|
start). When a newer release exists it prints a one-line, **notify-only** prompt:
|
||||||
|
|
||||||
|
```
|
||||||
|
[adam] update available: v0.6.3 → v0.6.4. Apply: curl -fsSL …/install.sh | bash
|
||||||
|
(re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)
|
||||||
|
```
|
||||||
|
|
||||||
|
It is deliberately **not** auto-applied: re-running `install.sh` overwrites
|
||||||
|
ADAM's own `/reflect`-applied skill edits, so you decide when to take an update.
|
||||||
|
Disable the check entirely with `ADAM_NO_UPDATE_CHECK=1` in your environment.
|
||||||
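The gating described in "Staying up to date" can be sketched as three small pure functions. This is a minimal illustration, not the code in `adam-nudge.mjs`: the helper names (`parseTag`, `isNewer`, `shouldCheck`, `updateNotice`) and the `checked_at` cache field are assumptions, and the network call with its 1.5 s cap is left out entirely.

```javascript
// Hypothetical helpers; names and cache shape are assumptions, not ADAM's code.
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse "v0.6.3" or "0.6.3" into [0, 6, 3]; returns null when malformed.
function parseTag(tag) {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(String(tag).trim());
  return m ? m.slice(1).map(Number) : null;
}

// True only when `latest` is strictly newer than `installed`.
function isNewer(installed, latest) {
  const a = parseTag(installed);
  const b = parseTag(latest);
  if (!a || !b) return false; // best-effort: unknown versions never nag
  for (let i = 0; i < 3; i++) {
    if (b[i] !== a[i]) return b[i] > a[i];
  }
  return false;
}

// True when there is no cached check or the cached one is at least a day old.
function shouldCheck(cache, nowMs) {
  if (!cache || !Number.isFinite(cache.checked_at)) return true;
  return nowMs - cache.checked_at >= DAY_MS;
}

// Build the one-line notice, or return null when disabled or up to date.
function updateNotice(installed, latest, env = {}) {
  if (env.ADAM_NO_UPDATE_CHECK === "1") return null;
  if (!isNewer(installed, latest)) return null;
  return `[adam] update available: ${installed} → ${latest}`;
}
```

Comparing numeric components rather than strings matters once a component reaches two digits: a plain string comparison would rank `v0.10.0` below `v0.6.9`.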
|
|
||||||
## How it works
|
## How it works
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
@@ -114,7 +131,7 @@ flowchart TB
|
|||||||
class TRACE trace
|
class TRACE trace
|
||||||
```
|
```
|
||||||
|
|
||||||
The observation layer is a 350-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`.
|
The observation layer is a ~600-line Node hook. Pure regex, counters, ring buffers — no LLM in the hot path. Signals append one JSONL line per detection to `~/.claude/adam/journal.jsonl`.
|
||||||
|
|
||||||
The analysis layer is an LLM subagent invoked by `/reflect`. Before the analyst runs, three deterministic pre-processors filter and enrich the journal: `adam-window.mjs` drops stale entries according to each signal's age window, `adam-score.mjs` computes per-session urgency dampeners + reinforcement candidates, and `adam-ab-measure.mjs` checks whether previously auto-applied edits actually reduced their originating signal.
|
The analysis layer is an LLM subagent invoked by `/reflect`. Before the analyst runs, three deterministic pre-processors filter and enrich the journal: `adam-window.mjs` drops stale entries according to each signal's age window, `adam-score.mjs` computes per-session urgency dampeners + reinforcement candidates, and `adam-ab-measure.mjs` checks whether previously auto-applied edits actually reduced their originating signal.
|
||||||
|
|
||||||
@@ -132,6 +149,7 @@ Auto-apply runs only for low-blast types (memory entries, new skills, ephemeral
|
|||||||
| `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
|
| `tool_error_loop` | Same error fingerprint 3× in a 5-event ring (fingerprints normalised — `ECONNREFUSED` and `"Connection refused"` cluster) | 30d |
|
||||||
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
|
| `dead_end` | 8 PostToolUse events without a UserPromptSubmit between them | 7d |
|
||||||
| `edit_churn` | Same file edited 4× in a window | 14d |
|
| `edit_churn` | Same file edited 4× in a window | 14d |
|
||||||
|
| `file_reread` | Same file Read ≥3× in the 10-event window, ignoring offset/limit (catches re-reads that escape `retry_loop`'s arg-hash dedup) | 14d |
|
||||||
| `build_loop` | 2× build/test/compile commands fail in same session | 30d |
|
| `build_loop` | 2× build/test/compile commands fail in same session | 30d |
|
||||||
| `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
|
| `subagent_dispatch_pattern` | Same subagent dispatched ≥ 3× cumulatively | 30d |
|
||||||
| `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
|
| `correction_free_streak` | 5 clean UserPromptSubmits in a row — reinforcement input | 60d |
|
||||||
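The `file_reread` row in the table above might be implemented along these lines. This is a rough sketch under stated assumptions: the event shape (`tool`, `file`, `offset`) and the function name `detectFileReread` are hypothetical, and the guard against double-counting byte-identical reads mentioned in the changelog is omitted for brevity.

```javascript
// Hypothetical sketch of the file_reread rule; the event shape is assumed.
const WINDOW = 10;   // ring size, taken from the table row
const THRESHOLD = 3; // same file Read at least 3 times

// Returns the file paths that trip the rule, in order of detection.
function detectFileReread(events) {
  const ring = [];
  const signals = [];
  for (const ev of events) {
    ring.push(ev);
    if (ring.length > WINDOW) ring.shift(); // keep only the last 10 events
    if (ev.tool !== "Read" || !ev.file) continue;
    // Offset and limit are deliberately ignored: only the path is compared.
    const reads = ring.filter((e) => e.tool === "Read" && e.file === ev.file).length;
    // Fire when the count reaches the threshold, not on every later read.
    if (reads === THRESHOLD) signals.push(ev.file);
  }
  return signals;
}
```

Because only the path is compared, three reads of one file at different offsets count as a cluster, which is the case the row says escapes `retry_loop`'s argument-hash dedup.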
@@ -247,11 +265,16 @@ Or pass `--explain` to `/reflect` to render the full trace inline.
|
|||||||
│ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
|
│ ├── adam-apply-reinforcement.mjs # reinforcement proposal apply
|
||||||
│ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
|
│ ├── adam-upgrade.mjs # .adam-new file UX (list/diff/accept)
|
||||||
│ └── adam-archive.mjs # post-apply journal cleanup
|
│ └── adam-archive.mjs # post-apply journal cleanup
|
||||||
└── tests/run-tests.sh # 87 isolated tests; never touches live state
|
└── tests/run-tests.sh # 140 isolated tests; never touches live state
|
||||||
```
|
```
|
||||||
|
|
||||||
## What's new
|
## What's new
|
||||||
|
|
||||||
|
- **v0.6.4** — rollback now keeps its promise. `adam-rollback.mjs`'s docstring always claimed it "removes the ab-tracking entry (so it doesn't re-trigger)," but `executeRollback()` never did — so a rolled-back proposal kept flagging as `regressed` on every subsequent `/reflect`, triggering endless `not_found` rollback attempts. It now deletes the matching `ab-tracking.jsonl` row by `proposal_id` (preserving unrelated rows). Surfaced by running ADAM's own loop twice. 140 tests (up from 138).
|
||||||
|
- **v0.6.3** — release-update notifier. `install.sh` now writes a `~/.claude/adam/.version` marker; `adam-nudge.mjs` (SessionStart) compares it against the latest GitHub release at most once/day (cached, 1.5 s network cap, best-effort — never blocks) and prints a **notify-only** one-line update prompt. Deliberately not auto-applied: re-running the installer resets ADAM's own `/reflect`-applied skill edits, so you choose when to update. Opt out with `ADAM_NO_UPDATE_CHECK=1`. See "Staying up to date". 138 tests (up from 134).
|
||||||
|
- **v0.6.2** — two fixes surfaced by running ADAM's loop on a large real journal. **(1) A/B volume normalization** (`adam-ab-measure.mjs`): regressions are now measured on the signal's *share* of total activity (rate = count / window-total), not raw count — so a generally busier journal after an apply no longer masquerades as a regression. Falls back to raw delta when the signal is the only activity in the window (preserves prior behavior + tests); output adds `raw_delta_pct`, `pre_total`, `post_total`, `normalized` for transparency. **(2) Memory frontmatter schema** (`agents/adam.md`, `SKILL.md`): the drafting protocol now emits the live auto-memory shape — `name` = slug + a `metadata: {node_type, type, originSessionId}` block — instead of flat `type:`/`originSessionId:`, so auto-applied memories load and categorize correctly. 134 tests (up from 132).
|
||||||
|
- **v0.6.1** — new `file_reread` signal (MOSS §1 harness self-modification, proposed and approved through ADAM's own `/reflect` loop). Consecutive Reads of the same file at different `offset`/`limit` escaped `retry_loop`'s arg-hash dedup and leaked into `tool_error_loop`; `file_reread` now catches them (same file ≥3× in the 10-event window, offset-agnostic, guarded against double-counting byte-identical reads). Fully wired: detection (`adam-observe.mjs`), 14-day window (`adam-window.mjs`), severity divisor 3 (`adam-score.mjs`), file-basename clustering (`adam-batch.mjs`), and the analyst rubric/spec. 132 tests (up from 126).
|
||||||
|
- **v0.6.0** — review hardening. Struggle signals now emit `active_skills`, so `silent_drift`'s primary cluster key and the §5b skill-attribution sub-clustering (+1 rubric bonus) actually fire (both were silently dead). `proposal_fingerprint` is now deterministically computable via `adam-cooldown.mjs --compute` instead of asking the LLM analyst to hand-compute a djb2 hash; spec now mandates a *stable* cluster id so fingerprints reproduce across runs. `reinforcement` proposals are correctly excluded from A/B tracking (the spec previously contradicted itself). `adam-nudge.mjs` pending-upgrade check now mirrors the full install set (`adam-utils`/`adam-batch`/`adam-rollback` were missing). Doc/test-count drift corrected. 126 tests (up from 114).
|
||||||
- **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
|
- **v0.5.0** — MOSS-grounded self-evolution (arXiv 2605.22794). Transcript capture: `context_window` field on struggle signals captures 8 surrounding events for evidence-based diagnosis. Two-stage analysis pipeline: diagnose+plan → inter-stage validation → implement (§3.3). Evidence batching via `adam-batch.mjs`: pre-clusters journal into coherent failure batches (§3.1). Pre-apply verification: 4-check deterministic gate before auto-apply (§3.4). Auto-rollback via `adam-rollback.mjs`: reverts regressed proposals detected by A/B measurement, creates regression nudges (§3.5). Harness self-modification: new `harness_edit` proposal type lets ADAM propose edits to its own scripts with test-suite-gated apply (§1 Table 1). Keypoint matrix: 5 capability dimensions scored per batch for structured evaluation (§4.2). 114 tests (up from 94).
|
||||||
- **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87).
|
- **v0.4.0** — expanded struggle detection: `silent_drift` (5 consecutive read-only tools), `error_after_recovery` (same error fingerprint returns after clean recovery); severity-sum scoring with per-type divisors; extended `STRUGGLE_TYPES` set. 94 tests (up from 87).
|
||||||
- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. ISO-week journal rotation replaces 5MB size-based (fixes silent cluster-straddling under-count); per-signal sliding windows via `adam-window.mjs`; error fingerprint normalisation; correction corpus expanded + weak-token co-occurrence requirement (kills the `"actually, I think..."` false positive); mandatory clustering trace + `adam-explain.mjs`; new `nudge` and `reinforcement` proposal types; per-(skill, fingerprint) cooldown via `adam-cooldown.mjs`; `task_completed` scoring (dampener + reinforcement); A/B effectiveness measurement; upgrade UX overhaul (`adam-upgrade.mjs --list/--diff/--accept`); shared `adam-utils.mjs`. 87 tests (up from 30).
|
- **v0.3.3** — analyst observability, A/B measurement, journal hygiene. ISO-week journal rotation replaces 5MB size-based (fixes silent cluster-straddling under-count); per-signal sliding windows via `adam-window.mjs`; error fingerprint normalisation; correction corpus expanded + weak-token co-occurrence requirement (kills the `"actually, I think..."` false positive); mandatory clustering trace + `adam-explain.mjs`; new `nudge` and `reinforcement` proposal types; per-(skill, fingerprint) cooldown via `adam-cooldown.mjs`; `task_completed` scoring (dampener + reinforcement); A/B effectiveness measurement; upgrade UX overhaul (`adam-upgrade.mjs --list/--diff/--accept`); shared `adam-utils.mjs`. 87 tests (up from 30).
|
||||||
|
|||||||
@@ -3,11 +3,19 @@
|
|||||||
//
|
//
|
||||||
// Reads ~/.claude/adam/ab-tracking.jsonl (one line per auto-apply event,
|
// Reads ~/.claude/adam/ab-tracking.jsonl (one line per auto-apply event,
|
||||||
// written by adam-self-improvement/SKILL.md), then for each entry old enough
|
// written by adam-self-improvement/SKILL.md), then for each entry old enough
|
||||||
// (>= --min-age-days; default 7) compares signal counts in the 7-day window
|
// (>= --min-age-days; default 7) compares the originating signal in the 7-day
|
||||||
// BEFORE applied_at against the 7-day window AFTER applied_at across the
|
// window BEFORE applied_at against the 7-day window AFTER applied_at across the
|
||||||
// full journal corpus (active + rotated). Surfaces regressions so /reflect
|
// full journal corpus (active + rotated). Surfaces regressions so /reflect
|
||||||
// can flag proposals that made things worse.
|
// can flag proposals that made things worse.
|
||||||
//
|
//
|
||||||
|
// Volume normalization: when the windows contain other (non-originating)
|
||||||
|
// activity, the delta is computed on the signal's SHARE of total activity
|
||||||
|
// (rate = count / total), not its raw count — so a generally busier journal
|
||||||
|
// after apply does not masquerade as a regression. When the signal is the only
|
||||||
|
// activity in the windows, it falls back to the raw-count delta. Output carries
|
||||||
|
// both `delta_pct` (drives status) and `raw_delta_pct` + `normalized` for
|
||||||
|
// transparency.
|
||||||
|
//
|
||||||
// CLI:
|
// CLI:
|
||||||
// adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]
|
// adam-ab-measure.mjs [--home <path>] [--format json|table] [--min-age-days N]
|
||||||
//
|
//
|
||||||
@@ -92,31 +100,60 @@ export function computeDeltas(entries, journal, opts = {}) {
|
|||||||
|
|
||||||
const preStart = appliedAt - windowDays * DAY_MS;
|
const preStart = appliedAt - windowDays * DAY_MS;
|
||||||
const postEnd = appliedAt + windowDays * DAY_MS;
|
const postEnd = appliedAt + windowDays * DAY_MS;
|
||||||
|
// preCount/postCount = originating-signal occurrences; preTotal/postTotal =
|
||||||
|
// ALL journal entries in the window (the activity denominator).
|
||||||
let preCount = 0;
|
let preCount = 0;
|
||||||
let postCount = 0;
|
let postCount = 0;
|
||||||
|
let preTotal = 0;
|
||||||
|
let postTotal = 0;
|
||||||
for (const je of journal || []) {
|
for (const je of journal || []) {
|
||||||
if (!je || typeof je !== "object") continue;
|
if (!je || typeof je !== "object") continue;
|
||||||
if (!sigSet.has(je.type)) continue;
|
|
||||||
const t = tsMs(je);
|
const t = tsMs(je);
|
||||||
if (Number.isNaN(t)) continue;
|
if (Number.isNaN(t)) continue;
|
||||||
if (t >= preStart && t < appliedAt) preCount++;
|
const inPre = t >= preStart && t < appliedAt;
|
||||||
else if (t >= appliedAt && t < postEnd) postCount++;
|
const inPost = t >= appliedAt && t < postEnd;
|
||||||
|
if (!inPre && !inPost) continue;
|
||||||
|
if (inPre) preTotal++; else postTotal++;
|
||||||
|
if (!sigSet.has(je.type)) continue;
|
||||||
|
if (inPre) preCount++; else postCount++;
|
||||||
}
|
}
|
||||||
|
|
||||||
let status;
|
let status;
|
||||||
let deltaPct;
|
let deltaPct;
|
||||||
|
let rawDeltaPct = null;
|
||||||
|
let normalized = false;
|
||||||
if (preCount === 0) {
|
if (preCount === 0) {
|
||||||
status = "no_baseline";
|
status = "no_baseline";
|
||||||
deltaPct = null;
|
deltaPct = null;
|
||||||
} else {
|
} else {
|
||||||
deltaPct = ((postCount - preCount) / preCount) * 100;
|
rawDeltaPct = Math.round(((postCount - preCount) / preCount) * 10000) / 100;
|
||||||
|
// Volume normalization: when the windows contain non-originating activity,
|
||||||
|
// compare the signal's SHARE of activity (rate), not its absolute count —
|
||||||
|
// otherwise a generally busier post-window masquerades as a regression.
|
||||||
|
// No background (signal IS the only activity) → fall back to raw delta,
|
||||||
|
// preserving prior behavior.
|
||||||
|
const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
|
||||||
|
if (hasBackground && postTotal > 0) {
|
||||||
|
const preRate = preCount / preTotal; // preTotal >= preCount > 0
|
||||||
|
const postRate = postCount / postTotal;
|
||||||
|
deltaPct = ((postRate - preRate) / preRate) * 100;
|
||||||
|
normalized = true;
|
||||||
|
} else {
|
||||||
|
deltaPct = ((postCount - preCount) / preCount) * 100;
|
||||||
|
}
|
||||||
// Round to 2 dp for stable comparison + presentation.
|
// Round to 2 dp for stable comparison + presentation.
|
||||||
deltaPct = Math.round(deltaPct * 100) / 100;
|
deltaPct = Math.round(deltaPct * 100) / 100;
|
||||||
if (deltaPct <= IMPROVED_PCT) status = "improved";
|
if (deltaPct <= IMPROVED_PCT) status = "improved";
|
||||||
else if (deltaPct >= REGRESSED_PCT) status = "regressed";
|
else if (deltaPct >= REGRESSED_PCT) status = "regressed";
|
||||||
else status = "neutral";
|
else status = "neutral";
|
||||||
}
|
}
|
||||||
out.push({ ...base, pre_count: preCount, post_count: postCount, delta_pct: deltaPct, status });
|
out.push({
|
||||||
|
...base,
|
||||||
|
pre_count: preCount, post_count: postCount,
|
||||||
|
pre_total: preTotal, post_total: postTotal,
|
||||||
|
raw_delta_pct: rawDeltaPct, normalized,
|
||||||
|
delta_pct: deltaPct, status,
|
||||||
|
});
|
||||||
}
|
}
|
||||||
return out;
|
return out;
|
||||||
}
|
}
|
||||||
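To see why the normalization in the hunk above matters, the delta arithmetic can be restated as a standalone function. This is an illustrative sketch: `measureDelta` and `round2` are hypothetical names, the input fields mirror the counters in the diff, and the status thresholds (`IMPROVED_PCT`, `REGRESSED_PCT`) are left out because their values are not shown here.

```javascript
// Illustrative restatement of the delta math; not the repository's function.
function round2(x) {
  return Math.round(x * 100) / 100;
}

function measureDelta({ preCount, postCount, preTotal, postTotal }) {
  // No occurrences before the apply: there is no baseline to compare against.
  if (preCount === 0) {
    return { delta_pct: null, raw_delta_pct: null, normalized: false };
  }
  const rawDeltaPct = round2(((postCount - preCount) / preCount) * 100);
  // Background = any journal activity that is not the originating signal.
  const hasBackground = (preTotal - preCount) + (postTotal - postCount) > 0;
  if (hasBackground && postTotal > 0) {
    const preRate = preCount / preTotal;   // signal's share before the apply
    const postRate = postCount / postTotal; // signal's share after the apply
    return {
      delta_pct: round2(((postRate - preRate) / preRate) * 100),
      raw_delta_pct: rawDeltaPct,
      normalized: true,
    };
  }
  // Signal is the only activity: fall back to the raw-count delta.
  return { delta_pct: rawDeltaPct, raw_delta_pct: rawDeltaPct, normalized: false };
}

// A journal that tripled in volume: 10 of 100 entries before, 15 of 300 after.
const busier = measureDelta({ preCount: 10, postCount: 15, preTotal: 100, postTotal: 300 });
console.log(busier);
```

In this example the raw count rises by 50 %, which would look like a regression, while the signal's share of activity falls from 10 % to 5 %, so the normalized delta is -50 %.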
|
|||||||
@@ -4,13 +4,14 @@
|
|||||||
// automatically curated batch of production-failure evidence."
|
// automatically curated batch of production-failure evidence."
|
||||||
//
|
//
|
||||||
// Each batch groups entries by (signal_type, cluster_key) where cluster_key
|
// Each batch groups entries by (signal_type, cluster_key) where cluster_key
|
||||||
// follows the same clustering rules as agents/adam.md §4:
|
// follows the same clustering rules as agents/adam.md ## Signal types / ## Process step 4:
|
||||||
// correction → tokenized phrase (cross-cwd)
|
// correction → tokenized phrase (cross-cwd)
|
||||||
// retry_loop → tool
|
// retry_loop → tool
|
||||||
// weak_agent → subagent_type
|
// weak_agent → subagent_type
|
||||||
// tool_error_loop→ fp
|
// tool_error_loop→ fp
|
||||||
// dead_end → session
|
// dead_end → session
|
||||||
// edit_churn → file basename
|
// edit_churn → file basename
|
||||||
|
// file_reread → file basename
|
||||||
// build_loop → session
|
// build_loop → session
|
||||||
// subagent_dispatch_pattern → subagent_type
|
// subagent_dispatch_pattern → subagent_type
|
||||||
// silent_drift → active_skills[0]
|
// silent_drift → active_skills[0]
|
||||||
@@ -65,6 +66,7 @@ function clusterKey(entry) {
|
|||||||
case "build_loop":
|
case "build_loop":
|
||||||
return entry.session || "unknown";
|
return entry.session || "unknown";
|
||||||
case "edit_churn":
|
case "edit_churn":
|
||||||
|
case "file_reread":
|
||||||
return entry.file ? entry.file.split("/").pop() : "unknown";
|
return entry.file ? entry.file.split("/").pop() : "unknown";
|
||||||
case "silent_drift":
|
case "silent_drift":
|
||||||
case "correction_free_streak":
|
case "correction_free_streak":
|
||||||
|
|||||||
@@ -4,8 +4,12 @@
|
|||||||
//
|
//
|
||||||
// CLI:
|
// CLI:
|
||||||
// adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]
|
// adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]
|
||||||
|
// adam-cooldown.mjs --compute --skill <slug> --cluster <id> [--diff-file <path>]
|
||||||
|
// → prints {"fingerprint":"<djb2_base36>"}; diff body read from --diff-file
|
||||||
|
// or stdin. This is how proposal_fingerprint is populated (the analyst
|
||||||
|
// runs it via Bash after drafting a proposal).
|
||||||
//
|
//
|
||||||
// Output: JSON one-liner with shape
|
// Output (gate mode): JSON one-liner with shape
|
||||||
// { "status": "cool"|"cooldown"|"blacklisted",
|
// { "status": "cool"|"cooldown"|"blacklisted",
|
||||||
// "reason": "<human-readable reason>",
|
// "reason": "<human-readable reason>",
|
||||||
// "blocked_by": { "file": "<basename>", "days_remaining": <int> } | null }
|
// "blocked_by": { "file": "<basename>", "days_remaining": <int> } | null }
|
||||||
@@ -33,12 +37,15 @@ const DAY_MS = 86400000;
|
|||||||
export const LEGACY_FINGERPRINT = "legacy";
|
export const LEGACY_FINGERPRINT = "legacy";
|
||||||
|
|
||||||
function parseArgs(argv) {
|
function parseArgs(argv) {
|
||||||
const args = { home: null, skill: null, fingerprint: null, help: false };
|
const args = { home: null, skill: null, fingerprint: null, compute: false, cluster: null, diffFile: null, help: false };
|
||||||
for (let i = 0; i < argv.length; i++) {
|
for (let i = 0; i < argv.length; i++) {
|
||||||
const a = argv[i];
|
const a = argv[i];
|
||||||
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
|
if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
|
||||||
else if (a === "--skill" && i + 1 < argv.length) args.skill = argv[++i];
|
else if (a === "--skill" && i + 1 < argv.length) args.skill = argv[++i];
|
||||||
else if (a === "--fingerprint" && i + 1 < argv.length) args.fingerprint = argv[++i];
|
else if (a === "--fingerprint" && i + 1 < argv.length) args.fingerprint = argv[++i];
|
||||||
|
else if (a === "--cluster" && i + 1 < argv.length) args.cluster = argv[++i];
|
||||||
|
else if (a === "--diff-file" && i + 1 < argv.length) args.diffFile = argv[++i];
|
||||||
|
else if (a === "--compute") args.compute = true;
|
||||||
else if (a === "--help" || a === "-h") args.help = true;
|
else if (a === "--help" || a === "-h") args.help = true;
|
||||||
}
|
}
|
||||||
return args;
|
return args;
|
||||||
@@ -158,9 +165,11 @@ export function computeProposalFingerprint(proposal) {
|
|||||||
if (!proposal || typeof proposal !== "object") return LEGACY_FINGERPRINT;
|
if (!proposal || typeof proposal !== "object") return LEGACY_FINGERPRINT;
|
||||||
const skill = proposal.skill_slug || proposal.target_skill || proposal.skill || "";
|
const skill = proposal.skill_slug || proposal.target_skill || proposal.skill || "";
|
||||||
const cluster = proposal.signal_cluster_id || proposal.cluster_id || "";
|
const cluster = proposal.signal_cluster_id || proposal.cluster_id || "";
|
||||||
|
// normalized_diff_body: whitespace (incl. newlines) collapsed to single
|
||||||
|
// spaces, then trimmed. Matches agents/adam.md §"Per-(skill, fingerprint)
|
||||||
|
// cooldown". (No trailing-newline strip needed — \s+ already absorbed them.)
|
||||||
const diff = String(proposal.diff_body || proposal.proposed_change || "")
|
const diff = String(proposal.diff_body || proposal.proposed_change || "")
|
||||||
.replace(/\s+/g, " ")
|
.replace(/\s+/g, " ")
|
||||||
.replace(/\n+$/g, "")
|
|
||||||
.trim();
|
.trim();
|
||||||
return djb2(`${skill}\n${cluster}\n${diff}`);
|
return djb2(`${skill}\n${cluster}\n${diff}`);
|
||||||
}
|
}
|
||||||
@@ -168,7 +177,28 @@ export function computeProposalFingerprint(proposal) {
|
|||||||
function main() {
|
function main() {
|
||||||
const args = parseArgs(process.argv.slice(2));
|
const args = parseArgs(process.argv.slice(2));
|
||||||
if (args.help) {
|
if (args.help) {
|
||||||
process.stdout.write("usage: adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]\n");
|
process.stdout.write(
|
||||||
|
"usage: adam-cooldown.mjs --skill <slug> --fingerprint <hash> [--home <path>]\n" +
|
||||||
|
" adam-cooldown.mjs --compute --skill <slug> --cluster <id> [--diff-file <path>]\n"
|
||||||
|
);
|
||||||
|
process.exit(0);
|
||||||
|
}
|
||||||
|
// --compute: deterministically derive a proposal_fingerprint. The analyst
|
||||||
|
// invokes this (it has Bash) after drafting a proposal, then writes the
|
||||||
|
// result into proposal frontmatter so the cooldown gate keys on it.
|
||||||
|
if (args.compute) {
|
||||||
|
let diff = "";
|
||||||
|
if (args.diffFile) {
|
||||||
|
try { diff = readFileSync(args.diffFile, "utf8"); } catch { /* empty → still deterministic */ }
|
||||||
|
} else {
|
||||||
|
try { diff = readFileSync(0, "utf8"); } catch { /* no stdin */ }
|
||||||
|
}
|
||||||
|
const fp = computeProposalFingerprint({
|
||||||
|
skill_slug: args.skill || "",
|
||||||
|
signal_cluster_id: args.cluster || "",
|
||||||
|
diff_body: diff,
|
||||||
|
});
|
||||||
|
process.stdout.write(JSON.stringify({ fingerprint: fp }) + "\n");
|
||||||
process.exit(0);
|
process.exit(0);
|
||||||
}
|
}
|
||||||
if (!args.skill || !args.fingerprint) {
|
if (!args.skill || !args.fingerprint) {
|
||||||
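The fingerprint derivation behind `--compute` can be illustrated with a short sketch. The repository's own `djb2` helper is not shown in this diff, so the variant below (seed 5381, multiply by 33, fold to unsigned 32 bits, base-36 output) is an assumption and may not reproduce ADAM's actual fingerprints. The names `normalizeDiff` and `fingerprint` are likewise hypothetical.

```javascript
// Common djb2 variant. Assumption: ADAM's helper may differ in seed or folding.
function djb2(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) {
    // (h << 5) + h is h * 33; ">>> 0" folds the sum to an unsigned 32-bit value.
    h = ((h << 5) + h + str.charCodeAt(i)) >>> 0;
  }
  return h.toString(36);
}

// Collapse every whitespace run (newlines included) to one space, then trim.
function normalizeDiff(body) {
  return String(body || "").replace(/\s+/g, " ").trim();
}

// Key the hash on skill, cluster id, and the normalized diff body.
function fingerprint(skill, cluster, diffBody) {
  return djb2(`${skill}\n${cluster}\n${normalizeDiff(diffBody)}`);
}

// Same change, different whitespace: both calls yield the same fingerprint.
const fpA = fingerprint("tdd-loop", "c1", "+ a\n+ b\n");
const fpB = fingerprint("tdd-loop", "c1", "  +   a \n\n+ b");
console.log(fpA === fpB);
```

Normalizing before hashing is what lets the same proposal hash identically across runs even when the analyst formats the diff slightly differently, which appears to be the motivation for requiring a stable cluster id as well.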
|
|||||||
@@ -135,6 +135,7 @@ export function parseTrace(text) {
|
|||||||
considered: clusters.length,
|
considered: clusters.length,
|
||||||
emitted,
|
emitted,
|
||||||
skipped: clusters.length - emitted,
|
skipped: clusters.length - emitted,
|
||||||
|
regressions: 0,
|
||||||
reasons,
|
reasons,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -167,6 +167,23 @@ export function executeRollback(plan, adamRoot, opts = {}) {
|
|||||||
result.actions.push(`nudge failed: ${e.message}`);
|
result.actions.push(`nudge failed: ${e.message}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Remove the ab-tracking entry for this proposal so it stops re-flagging as a
|
||||||
|
// regression on every future /reflect (which would trigger endless not_found
|
||||||
|
// rollback attempts). This is the documented contract for rollback.
|
||||||
|
try {
|
||||||
|
const abPath = join(adamRoot, "ab-tracking.jsonl");
|
||||||
|
if (existsSync(abPath)) {
|
||||||
|
const before = readJsonlSafe(abPath);
|
||||||
|
const kept = before.filter((e) => !(e && e.proposal_id === plan.proposal_id));
|
||||||
|
if (kept.length !== before.length) {
|
||||||
|
writeFileSync(abPath, kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "");
|
||||||
|
result.actions.push(`ab-tracking entry removed (${before.length - kept.length})`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} catch (e) {
|
||||||
|
result.actions.push(`ab-tracking cleanup failed: ${e.message}`);
|
||||||
|
}
|
||||||
|
|
||||||
result.status = "rolled_back";
|
result.status = "rolled_back";
|
||||||
return result;
|
return result;
|
||||||
}
|
}
|
||||||
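The cleanup step added in this hunk is easiest to reason about as a pure text transformation. The sketch below assumes a hypothetical helper, `dropProposalRows`, that works on the JSONL text directly instead of the file; the `signal` field in the sample rows is made up for illustration, and malformed lines are skipped on the assumption that `readJsonlSafe` behaves similarly.

```javascript
// Hypothetical pure helper: filter ab-tracking JSONL text by proposal_id.
function dropProposalRows(jsonlText, proposalId) {
  const rows = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    try {
      rows.push(JSON.parse(line));
    } catch {
      // Malformed line: skipped (assumed to match a "safe" JSONL reader).
    }
  }
  // Keep every row that does not belong to the rolled-back proposal.
  const kept = rows.filter((e) => !(e && e.proposal_id === proposalId));
  // Re-serialize one object per line; an empty result becomes an empty file.
  const text = kept.length ? kept.map((e) => JSON.stringify(e)).join("\n") + "\n" : "";
  return { text, removed: rows.length - kept.length };
}

const sample =
  '{"proposal_id":"p1","signal":"dead_end"}\n' +
  '{"proposal_id":"p2","signal":"edit_churn"}\n';
console.log(dropProposalRows(sample, "p1"));
```

Filtering by `proposal_id` leaves unrelated rows untouched, so a rollback of one proposal does not reset A/B tracking for the others.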
|
|||||||
@@ -23,7 +23,8 @@
|
|||||||
// Output: JSON object
|
// Output: JSON object
|
||||||
// {
|
// {
|
||||||
// "sessions": [
|
// "sessions": [
|
||||||
// {"session_id": "...", "negative_count": N, "task_completed_count": M, "dampener": 1.0}
|
// {"session_id": "...", "negative_count": N, "task_completed_count": M,
|
||||||
|
// "severity_sum": S, "severity_by_type": {"<type>": N, ...}, "dampener": 1.0}
|
||||||
// ],
|
// ],
|
||||||
// "reinforcement_candidates": [
|
// "reinforcement_candidates": [
|
||||||
// {"skill_slug": "tdd-loop", "count": 3, "recent_ts": "..."}
|
// {"skill_slug": "tdd-loop", "count": 3, "recent_ts": "..."}
|
||||||
@@ -57,6 +58,7 @@ export const SEVERITY_DIVISORS = {
|
|||||||
edit_churn: 4,
|
edit_churn: 4,
|
||||||
tool_error_loop: 3,
|
tool_error_loop: 3,
|
||||||
retry_loop: 3,
|
retry_loop: 3,
|
||||||
|
file_reread: 3,
|
||||||
weak_agent: 2,
|
weak_agent: 2,
|
||||||
build_loop: 1,
|
build_loop: 1,
|
||||||
};
|
};
|
||||||
|
|||||||
Executable
+272
@@ -0,0 +1,272 @@
|
|||||||
|
#!/usr/bin/env node
|
||||||
|
// adam-skill-utility.mjs — execution-grounded per-skill utility report.
|
||||||
|
//
|
||||||
|
// Inspired by SkillsInjector (arXiv 2605.29794v1), which shows skill injection
|
||||||
|
// should be driven by execution-grounded *utility* Δ(t,s), not surface keyword
|
||||||
|
// match — and that some topically-relevant skills actively *lower* success.
|
||||||
|
// The paper learns Δ(t,s) from rollout outcomes. We don't train anything: the
|
||||||
|
// adam journal already attaches `active_skills` to both positive outcome events
|
||||||
|
// (task_completed, clean_recovery, correction_free_streak) and negative ones
|
||||||
|
// (dead_end, tool_error_loop, …). So we approximate Δ(s) as a co-occurrence
|
||||||
|
// ratio over the data we already collect.
|
||||||
|
//
|
||||||
|
// CAVEAT (honest): this is CO-OCCURRENCE, not causation. A skill active during
|
||||||
|
// a dead_end did not necessarily cause it. Read the report as "which skills
|
||||||
|
// correlate with friction", a prompt for review — never as proof.
|
||||||
|
//
|
||||||
|
// Metric, per skill active during scored events:
|
||||||
|
// pos / neg — count of positive / negative outcome events it co-occurred with
|
||||||
|
// share — pos / (pos+neg)
|
||||||
|
// lift — share − global_baseline (>0 above baseline, <0 below)
|
||||||
|
// wLB — Wilson 95% lower bound of the positive proportion; ranks
|
||||||
|
// *reliably* below-baseline skills to the top (sample-aware)
|
||||||
|
// sevNeg — severity-weighted negative sum (adam SEVERITY_DIVISORS)
|
||||||
|
// topNeg — dominant negative event type
|
||||||
|
// Rows sorted worst-first (lowest wLB) so harmful/over-eager skills surface.
|
||||||
|
//
|
||||||
|
// CLI:
|
||||||
|
// adam-skill-utility.mjs [--home <path>] [--input <jsonl-path>]
|
||||||
|
// [--min <n>] [--days <n>] [--json]
|
||||||
|
// --min min event count (n) to treat a skill's signal as confident (default 8)
|
||||||
|
// --days only consider events within the last <n> days (default: all)
|
||||||
|
// --json emit machine-readable JSON instead of the text table
|
||||||
|
//
|
||||||
|
// Reuses adam-utils (jsonl IO) and adam-score (canonical NEGATIVE set +
|
||||||
|
// severity), so the positive/negative taxonomy stays single-sourced.
|
||||||
|
|
||||||
|
import { readFileSync } from "node:fs";
|
||||||
|
import { join } from "node:path";
|
||||||
|
import { homedir } from "node:os";
|
||||||
|
import { readJsonlSafe, listJsonlFiles } from "./adam-utils.mjs";
|
||||||
|
import { NEGATIVE_SIGNAL_TYPES, entrySeverity } from "./adam-score.mjs";
|
||||||
|
|
||||||
|
// Positive outcome signals (mirror adam's vocabulary; task_completed is adam's
|
||||||
|
// canonical "clean task", the same one adam-score uses for reinforcement).
|
||||||
|
export const POSITIVE_SIGNAL_TYPES = new Set([
|
||||||
|
"task_completed",
|
||||||
|
"clean_recovery",
|
||||||
|
"correction_free_streak",
|
||||||
|
]);
|
||||||
|
|
||||||
|
export const DEFAULT_MIN_SAMPLE = 8;
|
||||||
|
|
||||||
|
function round(x) {
|
||||||
|
return Math.round(x * 1000) / 1000;
|
||||||
|
}
|
||||||
|
|
||||||
|
+// Wilson score interval lower bound for a binomial proportion. Sample-aware:
+// a skill with 1 pos / 0 neg does NOT outrank one with 40 pos / 2 neg.
+export function wilsonLower(pos, n, z = 1.96) {
+  if (n <= 0) return 0;
+  const p = pos / n;
+  const z2 = z * z;
+  const denom = 1 + z2 / n;
+  const center = p + z2 / (2 * n);
+  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
+  return (center - margin) / denom;
+}
+
+// computeSkillUtility: pure. entries → { baseline, totalPos, totalNeg, min, skills[] }.
+export function computeSkillUtility(entries, opts = {}) {
+  const min = Number.isFinite(opts.min) ? opts.min : DEFAULT_MIN_SAMPLE;
+  const per = new Map();
+  let totalPos = 0;
+  let totalNeg = 0;
+
+  for (const e of entries || []) {
+    if (!e || typeof e !== "object") continue;
+    const isPos = POSITIVE_SIGNAL_TYPES.has(e.type);
+    const isNeg = NEGATIVE_SIGNAL_TYPES.has(e.type);
+    if (!isPos && !isNeg) continue;
+
+    if (isPos) totalPos++;
+    else totalNeg++;
+    const sev = isNeg ? entrySeverity(e) : 0;
+
+    const skills = Array.isArray(e.active_skills) ? e.active_skills : [];
+    for (const slug of skills) {
+      if (!slug || typeof slug !== "string") continue;
+      if (!per.has(slug)) {
+        per.set(slug, { pos: 0, neg: 0, sevNeg: 0, negTypes: {}, recent_ts: null });
+      }
+      const s = per.get(slug);
+      if (isPos) {
+        s.pos++;
+      } else {
+        s.neg++;
+        s.sevNeg += sev;
+        s.negTypes[e.type] = (s.negTypes[e.type] || 0) + 1;
+      }
+      const ts = typeof e.ts === "string" ? e.ts : null;
+      if (ts && (!s.recent_ts || ts > s.recent_ts)) s.recent_ts = ts;
+    }
+  }
+
+  const scored = totalPos + totalNeg;
+  const baseline = scored ? totalPos / scored : 0;
+
+  const skills = [];
+  for (const [slug, s] of per.entries()) {
+    const n = s.pos + s.neg;
+    const share = n ? s.pos / n : 0;
+    const topNeg = Object.entries(s.negTypes).sort((a, b) => b[1] - a[1])[0];
+    skills.push({
+      skill: slug,
+      n,
+      pos: s.pos,
+      neg: s.neg,
+      share: round(share),
+      lift: round(share - baseline),
+      wLB: round(wilsonLower(s.pos, n)),
+      sevNeg: s.sevNeg,
+      topNeg: topNeg ? topNeg[0] : null,
+      lowSample: n < min,
+      recent_ts: s.recent_ts,
+    });
+  }
+  // Worst-first: lowest Wilson lower bound, then most negatives.
+  skills.sort(
+    (a, b) =>
+      a.wLB - b.wLB ||
+      b.neg - a.neg ||
+      (a.skill < b.skill ? -1 : a.skill > b.skill ? 1 : 0),
+  );
+
+  return { baseline: round(baseline), totalPos, totalNeg, min, skills };
+}
+
+function parseArgs(argv) {
+  const args = { home: null, input: null, min: DEFAULT_MIN_SAMPLE, days: null, json: false, help: false };
+  for (let i = 0; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === "--home" && i + 1 < argv.length) args.home = argv[++i];
+    else if (a === "--input" && i + 1 < argv.length) args.input = argv[++i];
+    else if (a === "--min" && i + 1 < argv.length) args.min = Number(argv[++i]);
+    else if (a === "--days" && i + 1 < argv.length) args.days = Number(argv[++i]);
+    else if (a === "--json") args.json = true;
+    else if (a === "--help" || a === "-h") args.help = true;
+  }
+  return args;
+}
+
+function readAllStdin() {
+  try { return readFileSync(0, "utf8"); } catch { return ""; }
+}
+
+function entriesFromText(text) {
+  const out = [];
+  for (const line of (text || "").split("\n")) {
+    if (!line) continue;
+    try { out.push(JSON.parse(line)); } catch { /* skip */ }
+  }
+  return out;
+}
+
+// Same gathering strategy as adam-score.mjs: explicit --input, else piped
+// stdin (e.g. from adam-window.mjs), else the active journal + rotated files.
+function gatherInputEntries(args) {
+  if (args.input) return readJsonlSafe(args.input);
+  if (!process.stdin.isTTY) {
+    const piped = readAllStdin();
+    if (piped && piped.trim()) return entriesFromText(piped);
+  }
+  const home = args.home || join(homedir(), ".claude");
+  const adamRoot = join(home, "adam");
+  const sources = [join(adamRoot, "journal.jsonl"), ...listJsonlFiles(join(adamRoot, "journal"))];
+  const all = [];
+  for (const p of sources) {
+    for (const e of readJsonlSafe(p)) all.push(e);
+  }
+  return all;
+}
+
+function filterByDays(entries, days) {
+  if (!Number.isFinite(days) || days <= 0) return entries;
+  // Anchor the window to the newest ts in the data (avoids Date.now()
+  // nondeterminism and works on historical exports).
+  let maxMs = 0;
+  for (const e of entries) {
+    const ms = e && typeof e.ts === "string" ? Date.parse(e.ts) : NaN;
+    if (Number.isFinite(ms) && ms > maxMs) maxMs = ms;
+  }
+  if (!maxMs) return entries;
+  const cutoff = maxMs - days * 86400000;
+  return entries.filter((e) => {
+    const ms = e && typeof e.ts === "string" ? Date.parse(e.ts) : NaN;
+    return Number.isFinite(ms) ? ms >= cutoff : false;
+  });
+}
+
+function pad(s, w) {
+  s = String(s);
+  return s.length >= w ? s : s + " ".repeat(w - s.length);
+}
+function padL(s, w) {
+  s = String(s);
+  return s.length >= w ? s : " ".repeat(w - s.length) + s;
+}
+
+function renderText(report) {
+  const { baseline, totalPos, totalNeg, min, skills } = report;
+  const lines = [];
+  lines.push("adam skill-utility report — execution-grounded Δ(skill) proxy");
+  lines.push(
+    `baseline positive-rate ${(baseline * 100).toFixed(1)}% ` +
+      `(${totalPos} positive / ${totalNeg} negative outcome events) min-sample n≥${min}`,
+  );
+  lines.push("CAVEAT: co-occurrence, not causation. Worst-first. ⚠ = below baseline with n≥min.");
+  lines.push("");
+  const head =
+    pad("skill", 44) + padL("n", 5) + padL("pos", 6) + padL("neg", 6) +
+    padL("share", 8) + padL("lift", 8) + padL("wLB", 7) + padL("sevNeg", 8) +
+    " " + pad("topNeg", 18) + "flag";
+  lines.push(head);
+  lines.push("-".repeat(head.length));
+  for (const s of skills) {
+    const below = s.lift < 0 && !s.lowSample;
+    const flag = below ? "⚠" : s.lowSample ? "·(low n)" : "";
+    lines.push(
+      pad(s.skill, 44) +
+        padL(s.n, 5) +
+        padL(s.pos, 6) +
+        padL(s.neg, 6) +
+        padL((s.share * 100).toFixed(0) + "%", 8) +
+        padL((s.lift >= 0 ? "+" : "") + (s.lift * 100).toFixed(0) + "%", 8) +
+        padL(s.wLB.toFixed(2), 7) +
+        padL(s.sevNeg, 8) +
+        " " +
+        pad(s.topNeg || "-", 18) +
+        flag,
+    );
+  }
+  return lines.join("\n");
+}
+
+function main() {
+  const args = parseArgs(process.argv.slice(2));
+  if (args.help) {
+    process.stdout.write(
+      "usage: adam-skill-utility.mjs [--home <path>] [--input <jsonl-path>] " +
+        "[--min <n>] [--days <n>] [--json]\n",
+    );
+    process.exit(0);
+  }
+  try {
+    let entries = gatherInputEntries(args);
+    entries = filterByDays(entries, args.days);
+    const report = computeSkillUtility(entries, { min: args.min });
+    if (args.json) {
+      process.stdout.write(JSON.stringify(report) + "\n");
+    } else {
+      process.stdout.write(renderText(report) + "\n");
+    }
+    process.exit(0);
+  } catch (e) {
+    process.stderr.write(`adam-skill-utility error: ${e.message}\n`);
+    process.exit(1);
+  }
+}
+
+if (import.meta.url === `file://${process.argv[1]}`) {
+  main();
+}
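
To make the "sample-aware" claim in the `wilsonLower` comment concrete, here is a minimal sketch that copies the function from the diff and evaluates it on the two cases the comment names (1 positive / 0 negative versus 40 positive / 2 negative). The variable names `tiny` and `proven` are illustrative only and do not appear in the repository.

```javascript
// Copied from the diff so the example runs on its own.
function wilsonLower(pos, n, z = 1.96) {
  if (n <= 0) return 0;
  const p = pos / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
  return (center - margin) / denom;
}

// Hypothetical skills: a single clean event vs. a long, mostly clean record.
const tiny = wilsonLower(1, 1);     // raw share 100%, n = 1
const proven = wilsonLower(40, 42); // raw share ~95%, n = 42

console.log(tiny.toFixed(3), proven.toFixed(3));
console.log("tiny outranks proven?", tiny > proven);
```

With the default z = 1.96 the single-event skill scores roughly 0.21 and the 42-event skill roughly 0.84, so a perfect raw share on one event does not outrank a well-sampled record.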
@@ -30,6 +30,7 @@ export const SIGNAL_WINDOWS_DAYS = {
   weak_agent: 30,
   subagent_dispatch_pattern: 30,
   silent_drift: 14,
+  file_reread: 14,
   error_after_recovery: 30,
   correction_free_streak: 60,
   clean_recovery: 60,
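
The hunk above only adds a `file_reread: 14` entry to the window map; how adam consumes the map is not shown in this diff. As a hedged illustration, the sketch below applies a per-type window measured back from the newest timestamp in the data, borrowing the anchoring idea from `filterByDays`. The helper name `withinSignalWindow`, the sample entries, and the rule of dropping types without a configured window are all assumptions.

```javascript
// Three values copied from the hunk; everything else here is hypothetical.
const SIGNAL_WINDOWS_DAYS = { silent_drift: 14, file_reread: 14, error_after_recovery: 30 };
const DAY_MS = 86400000;

// Keep an entry only if it falls inside the window configured for its type,
// counted back from the newest parseable timestamp in the input.
function withinSignalWindow(entries, windows = SIGNAL_WINDOWS_DAYS) {
  let newest = 0;
  for (const e of entries) {
    const ms = Date.parse(e.ts);
    if (Number.isFinite(ms) && ms > newest) newest = ms;
  }
  return entries.filter((e) => {
    const days = windows[e.type];
    const ms = Date.parse(e.ts);
    // Assumption: unknown types and unparseable timestamps are dropped.
    if (!Number.isFinite(days) || !Number.isFinite(ms)) return false;
    return ms >= newest - days * DAY_MS;
  });
}

const sample = [
  { ts: "2026-05-01T00:00:00Z", type: "file_reread" },          // 19 days before newest
  { ts: "2026-05-01T00:00:00Z", type: "error_after_recovery" }, // 19 days before newest
  { ts: "2026-05-20T00:00:00Z", type: "file_reread" },          // newest entry
];
const kept = withinSignalWindow(sample);
console.log(kept.map((e) => `${e.type}@${e.ts}`));
```

In this sample the 19-day-old `file_reread` falls outside its 14-day window, while the equally old `error_after_recovery` survives because its window is 30 days.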
@@ -18,6 +18,7 @@ APPLYREIN="$REAL_HOME/.claude/adam/scripts/adam-apply-reinforcement.mjs"
 UPGRADE="$REAL_HOME/.claude/adam/scripts/adam-upgrade.mjs"
 BATCH="$REAL_HOME/.claude/adam/scripts/adam-batch.mjs"
 ROLLBACK="$REAL_HOME/.claude/adam/scripts/adam-rollback.mjs"
+SKILLUTIL="$REAL_HOME/.claude/adam/scripts/adam-skill-utility.mjs"
 
 TMP_HOME="$(mktemp -d -t adam-test.XXXXXX)"
 trap 'rm -rf "$TMP_HOME"' EXIT INT TERM
@@ -37,6 +38,7 @@ APPLYREIN_RUN(){ HOME="$TMP_HOME" node "$APPLYREIN" "$@" --home "$TMP_HOME/.clau
 UPGRADE_RUN() { HOME="$TMP_HOME" node "$UPGRADE" "$@"; }
 BATCH_RUN() { HOME="$TMP_HOME" node "$BATCH" "$@"; }
 ROLLBACK_RUN(){ HOME="$TMP_HOME" node "$ROLLBACK" "$@"; }
+SKILLUTIL_RUN(){ HOME="$TMP_HOME" node "$SKILLUTIL" "$@"; }
 
 PASS=0
 FAIL=0
@@ -71,6 +73,17 @@ assert_grep() {
   fi
 }
 
+assert_no_grep() {
+  local file="$1" pattern="$2" name="$3"
+  if grep -qE "$pattern" "$file" 2>/dev/null; then
+    echo " FAIL: $name (pattern $pattern unexpectedly present in $file)"
+    FAIL=$((FAIL+1))
+  else
+    echo " PASS: $name"
+    PASS=$((PASS+1))
+  fi
+}
+
 # --- Test 1: correction signal ---
 echo "Test 1: user correction"
 reset_state
@@ -1839,6 +1852,330 @@ else
   echo " FAIL: expected 8 context_window entries (got $cw_len)"; FAIL=$((FAIL+1))
 fi
+
+# --- Test 103: silent_drift carries active_skills (its primary cluster key) ---
+echo "Test 103: silent_drift emits active_skills (§5b skill-attribution)"
+reset_state
+echo '{"hook_event_name":"PreToolUse","tool_name":"Skill","tool_input":{"skill":"tdd"},"session_id":"sSK","cwd":"/tmp/x"}' \
+  | HOOK_RUN >/dev/null 2>&1 || true
+for i in 1 2 3 4 5; do
+  echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/sk-$i\"},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sSK\",\"cwd\":\"/tmp/x\"}" \
+    | HOOK_RUN >/dev/null 2>&1 || true
+done
+assert_grep "$ROOT/journal.jsonl" '"type":"silent_drift"' "silent_drift emitted after 5 reads with skill active"
+assert_grep "$ROOT/journal.jsonl" '"active_skills":\["tdd"\]' "silent_drift carries active_skills cluster key"
+
+# --- Test 104: retry_loop fires at threshold 3, not below ---
+echo "Test 104: retry_loop boundary (2x no fire, 3x fires)"
+reset_state
+for i in 1 2; do
+  echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"make"},"session_id":"sRT","cwd":"/tmp/x"}' \
+    | HOOK_RUN >/dev/null 2>&1 || true
+done
+assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "2x same args does NOT emit retry_loop"
+echo '{"hook_event_name":"PostToolUse","tool_name":"Bash","tool_input":{"command":"make"},"session_id":"sRT","cwd":"/tmp/x"}' \
+  | HOOK_RUN >/dev/null 2>&1 || true
+assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x same args emits retry_loop"
+
+# --- Test 105: weak_agent fires at 2 dispatches, not at 1 ---
+echo "Test 105: weak_agent boundary (1x no fire, 2x fires)"
+reset_state
+echo '{"hook_event_name":"PostToolUse","tool_name":"Agent","tool_input":{"subagent_type":"explorer"},"session_id":"sWA","cwd":"/tmp/x"}' \
+  | HOOK_RUN >/dev/null 2>&1 || true
+assert_no_grep "$ROOT/journal.jsonl" '"type":"weak_agent"' "1x agent dispatch does NOT emit weak_agent"
+echo '{"hook_event_name":"PostToolUse","tool_name":"Agent","tool_input":{"subagent_type":"explorer"},"session_id":"sWA","cwd":"/tmp/x"}' \
+  | HOOK_RUN >/dev/null 2>&1 || true
+assert_grep "$ROOT/journal.jsonl" '"type":"weak_agent"' "2x same agent in window emits weak_agent"
+
+# --- Test 106: adam-cooldown --compute deterministic + input-sensitive ---
+echo "Test 106: adam-cooldown --compute fingerprint"
+fp1=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k1 2>/dev/null)
+fp2=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k1 2>/dev/null)
+fp3=$(printf 'add section X' | COOLDOWN_RUN --compute --skill foo --cluster k2 2>/dev/null)
+if [ -n "$fp1" ] && [ "$fp1" = "$fp2" ] && echo "$fp1" | grep -q '"fingerprint":'; then
+  echo " PASS: --compute deterministic for identical inputs"; PASS=$((PASS+1))
+else
+  echo " FAIL: --compute not deterministic (got '$fp1' vs '$fp2')"; FAIL=$((FAIL+1))
+fi
+if [ "$fp1" != "$fp3" ]; then
+  echo " PASS: --compute sensitive to cluster id"; PASS=$((PASS+1))
+else
+  echo " FAIL: --compute ignored cluster id (both '$fp1')"; FAIL=$((FAIL+1))
+fi
+
+# --- Test 107: A/B boundary — exactly -25% delta → improved ---
+echo "Test 107: A/B exact -25% boundary (4 pre / 3 post → improved)"
+reset_state
+applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
+cat > "$ROOT/ab-tracking.jsonl" <<EOF
+{"applied_at":$applied_at_ms,"proposal_id":"ab-b25-001","proposal_type":"memory","target_skill":"b1","proposal_fingerprint":"fpB1","originating_signals":[{"type":"correction","count":4,"session_ids":["sB1"]}],"pre_window_days":7}
+EOF
+> "$ROOT/journal.jsonl"
+for i in 1 2 3 4; do
+  pre_ts=$(node -e "console.log(new Date(Date.now() - (15 + $i*0.3) * 86400000).toISOString())")
+  echo "{\"ts\":\"$pre_ts\",\"session\":\"sB1\",\"type\":\"correction\",\"phrase\":\"x\"}" >> "$ROOT/journal.jsonl"
+done
+for i in 1 2 3; do
+  post_ts=$(node -e "console.log(new Date(Date.now() - (8 + $i*0.3) * 86400000).toISOString())")
+  echo "{\"ts\":\"$post_ts\",\"session\":\"sB1\",\"type\":\"correction\",\"phrase\":\"y\"}" >> "$ROOT/journal.jsonl"
+done
+out=$(ABMEASURE_RUN --format json 2>/dev/null)
+if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-b25-001");process.exit(e&&e.pre_count===4&&e.post_count===3&&e.delta_pct===-25&&e.status==="improved"?0:1)})'; then
+  echo " PASS: -25% boundary classified improved"; PASS=$((PASS+1))
+else
+  echo " FAIL: -25% boundary misclassified (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/ab-tracking.jsonl"
+
+# --- Test 108: A/B boundary — exactly +25% delta → regressed ---
+echo "Test 108: A/B exact +25% boundary (4 pre / 5 post → regressed)"
+reset_state
+applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
+cat > "$ROOT/ab-tracking.jsonl" <<EOF
+{"applied_at":$applied_at_ms,"proposal_id":"ab-b25-002","proposal_type":"memory","target_skill":"b2","proposal_fingerprint":"fpB2","originating_signals":[{"type":"correction","count":4,"session_ids":["sB2"]}],"pre_window_days":7}
+EOF
+> "$ROOT/journal.jsonl"
+for i in 1 2 3 4; do
+  pre_ts=$(node -e "console.log(new Date(Date.now() - (15 + $i*0.3) * 86400000).toISOString())")
+  echo "{\"ts\":\"$pre_ts\",\"session\":\"sB2\",\"type\":\"correction\",\"phrase\":\"x\"}" >> "$ROOT/journal.jsonl"
+done
+for i in 1 2 3 4 5; do
+  post_ts=$(node -e "console.log(new Date(Date.now() - (8 + $i*0.3) * 86400000).toISOString())")
+  echo "{\"ts\":\"$post_ts\",\"session\":\"sB2\",\"type\":\"correction\",\"phrase\":\"y\"}" >> "$ROOT/journal.jsonl"
+done
+out=$(ABMEASURE_RUN --format json 2>/dev/null)
+if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-b25-002");process.exit(e&&e.pre_count===4&&e.post_count===5&&e.delta_pct===25&&e.status==="regressed"?0:1)})'; then
+  echo " PASS: +25% boundary classified regressed"; PASS=$((PASS+1))
+else
+  echo " FAIL: +25% boundary misclassified (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/ab-tracking.jsonl"
+
+# --- Test 109: cooldown blacklist 30d boundary (day 29 active, day 31 expired) ---
+echo "Test 109: blacklist 30d boundary"
+reset_state
+ts29=$(node -e 'console.log(Date.now() - 29*86400000)')
+cat > "$ROOT/rejected/2026-blk-29.md" <<EOF
+---
+id: blk-29
+type: skill_edit
+target_skill: blkskill
+proposal_fingerprint: fpZ
+auto_apply_blacklist: true
+applied_at: $ts29
+---
+body
+EOF
+out29=$(COOLDOWN_RUN --skill blkskill --fingerprint fpZ 2>/dev/null)
+if echo "$out29" | grep -q '"status":"blacklisted"'; then
+  echo " PASS: day-29 blacklist still active"; PASS=$((PASS+1))
+else
+  echo " FAIL: day-29 should be blacklisted (got: $out29)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/rejected/2026-blk-29.md"
+ts31=$(node -e 'console.log(Date.now() - 31*86400000)')
+cat > "$ROOT/rejected/2026-blk-31.md" <<EOF
+---
+id: blk-31
+type: skill_edit
+target_skill: blkskill
+proposal_fingerprint: fpZ
+auto_apply_blacklist: true
+applied_at: $ts31
+---
+body
+EOF
+out31=$(COOLDOWN_RUN --skill blkskill --fingerprint fpZ 2>/dev/null)
+if echo "$out31" | grep -q '"status":"cool"'; then
+  echo " PASS: day-31 blacklist expired → cool"; PASS=$((PASS+1))
+else
+  echo " FAIL: day-31 should be cool (got: $out31)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/rejected/2026-blk-31.md"
+
+# --- Test 110: file_reread fires on 3x offset-shifted same-file reads, not 2x ---
+echo "Test 110: file_reread (offset-shifted same-file reads escape retry_loop)"
+reset_state
+for off in 0 100; do
+  echo "{\"hook_event_name\":\"PostToolUse\",\"tool_name\":\"Read\",\"tool_input\":{\"file_path\":\"/tmp/big.go\",\"offset\":$off},\"tool_response\":{\"content\":\"ok\"},\"session_id\":\"sFR\",\"cwd\":\"/tmp/x\"}" \
+    | HOOK_RUN >/dev/null 2>&1 || true
+done
+assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "2x same-file reads does NOT emit file_reread"
+echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/big.go","offset":200},"tool_response":{"content":"ok"},"session_id":"sFR","cwd":"/tmp/x"}' \
+  | HOOK_RUN >/dev/null 2>&1 || true
+assert_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "3x offset-shifted same-file reads emit file_reread"
+assert_no_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "offset-shifted reads do NOT emit retry_loop (argsHash differs)"
+assert_grep "$ROOT/journal.jsonl" '"type":"file_reread".*"context_window"' "file_reread carries context_window (in STRUGGLE_TYPES)"
+
+# --- Test 111: byte-identical reread is caught by retry_loop, not double-counted as file_reread ---
+echo "Test 111: identical reads → retry_loop (file_reread guard avoids double-count)"
+reset_state
+for i in 1 2 3; do
+  echo '{"hook_event_name":"PostToolUse","tool_name":"Read","tool_input":{"file_path":"/tmp/same.go"},"tool_response":{"content":"ok"},"session_id":"sFR2","cwd":"/tmp/x"}' \
+    | HOOK_RUN >/dev/null 2>&1 || true
+done
+assert_grep "$ROOT/journal.jsonl" '"type":"retry_loop"' "3x byte-identical reads emit retry_loop"
+assert_no_grep "$ROOT/journal.jsonl" '"type":"file_reread"' "byte-identical reads NOT double-counted as file_reread (sameToolArgs>=RETRY guard)"
+
+# --- Test 112: A/B volume normalization — busier journal does NOT fake a regression ---
+echo "Test 112: A/B volume-normalized (raw +200% but flat share → neutral)"
+reset_state
+applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
+cat > "$ROOT/ab-tracking.jsonl" <<EOF
+{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-001","proposal_type":"memory","target_skill":"vol","proposal_fingerprint":"fpV","originating_signals":[{"type":"correction","count":2,"session_ids":["sV"]}],"pre_window_days":7}
+EOF
+> "$ROOT/journal.jsonl"
+# pre window: 2 correction + 8 dead_end (rate 0.2)
+for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+# post window: 6 correction + 24 dead_end (rate 0.2 — share unchanged, raw count +200%)
+for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in $(seq 1 24); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.05)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+out=$(ABMEASURE_RUN --format json 2>/dev/null)
+if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-001");process.exit(e&&e.normalized===true&&e.raw_delta_pct===200&&e.status==="neutral"?0:1)})'; then
+  echo " PASS: volume growth normalized → neutral (raw +200%)"; PASS=$((PASS+1))
+else
+  echo " FAIL: volume normalization wrong (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/ab-tracking.jsonl"
+
+# --- Test 113: A/B genuine rate regression still flagged ---
+echo "Test 113: A/B genuine share increase → regressed"
+reset_state
+applied_at_ms=$(node -e 'console.log(Date.now() - 14*86400000)')
+cat > "$ROOT/ab-tracking.jsonl" <<EOF
+{"applied_at":$applied_at_ms,"proposal_id":"ab-vol-002","proposal_type":"memory","target_skill":"vol2","proposal_fingerprint":"fpV2","originating_signals":[{"type":"correction","count":2,"session_ids":["sV2"]}],"pre_window_days":7}
+EOF
+> "$ROOT/journal.jsonl"
+# pre: 2 correction + 8 dead_end (rate 0.2)
+for i in 1 2; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.2)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in 1 2 3 4 5 6 7 8; do ts=$(node -e "console.log(new Date(Date.now()-(15+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+# post: 6 correction + 6 dead_end (rate 0.5 — share up → genuine regression)
+for i in $(seq 1 6); do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.1)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"correction\"}" >> "$ROOT/journal.jsonl"; done
+for i in 1 2 3 4 5 6; do ts=$(node -e "console.log(new Date(Date.now()-(8+$i*0.07)*86400000).toISOString())"); echo "{\"ts\":\"$ts\",\"session\":\"sV2\",\"type\":\"dead_end\"}" >> "$ROOT/journal.jsonl"; done
+out=$(ABMEASURE_RUN --format json 2>/dev/null)
+if echo "$out" | node -e 'let b="";process.stdin.on("data",d=>b+=d).on("end",()=>{const a=JSON.parse(b);const e=a.find(x=>x.proposal_id==="ab-vol-002");process.exit(e&&e.normalized===true&&e.status==="regressed"?0:1)})'; then
+  echo " PASS: genuine share increase → regressed"; PASS=$((PASS+1))
+else
+  echo " FAIL: genuine regression missed (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/ab-tracking.jsonl"
+
+# --- Test 114: update notifier nudges from cache when a newer release exists (no network) ---
+echo "Test 114: update notifier — cached newer release prints nudge"
+reset_state
+printf 'v0.6.2\n' > "$ROOT/.version"
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP"}' | NUDGE_RUN 2>/dev/null)
+if echo "$out" | grep -q "update available: v0.6.2 → v9.9.9"; then
+  echo " PASS: update nudge printed from cache (offline)"; PASS=$((PASS+1))
+else
+  echo " FAIL: expected update nudge (got: $out)"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/.version" "$ROOT/.update-check.json"
+
+# --- Test 115: update notifier silent when installed is current ---
+echo "Test 115: update notifier — up-to-date is silent"
+reset_state
+printf 'v9.9.9\n' > "$ROOT/.version"
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP2"}' | NUDGE_RUN 2>/dev/null)
+if echo "$out" | grep -q "update available"; then
+  echo " FAIL: nudged despite being current (got: $out)"; FAIL=$((FAIL+1))
+else
+  echo " PASS: no nudge when up-to-date"; PASS=$((PASS+1))
+fi
+rm -f "$ROOT/.version" "$ROOT/.update-check.json"
+
+# --- Test 116: ADAM_NO_UPDATE_CHECK disables the notifier ---
+echo "Test 116: ADAM_NO_UPDATE_CHECK opt-out"
+reset_state
+printf 'v0.6.2\n' > "$ROOT/.version"
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP3"}' | HOME="$TMP_HOME" ADAM_NO_UPDATE_CHECK=1 node "$NUDGE" 2>/dev/null)
+if echo "$out" | grep -q "update available"; then
+  echo " FAIL: notifier ran despite opt-out (got: $out)"; FAIL=$((FAIL+1))
+else
+  echo " PASS: ADAM_NO_UPDATE_CHECK suppressed the check"; PASS=$((PASS+1))
+fi
+rm -f "$ROOT/.version" "$ROOT/.update-check.json"
+
+# --- Test 117: no .version marker → notifier no-op (no crash) ---
+echo "Test 117: missing .version marker → notifier silent, hook still runs"
+reset_state
+node -e "require('fs').writeFileSync('$ROOT/.update-check.json', JSON.stringify({last_check: Date.now(), latest: 'v9.9.9'}))"
+out=$(echo '{"hook_event_name":"SessionStart","session_id":"sUP4"}' | NUDGE_RUN 2>/dev/null)
+if echo "$out" | grep -q "update available"; then
+  echo " FAIL: nudged without a .version marker (got: $out)"; FAIL=$((FAIL+1))
+else
+  echo " PASS: no marker → no update nudge"; PASS=$((PASS+1))
+fi
+rm -f "$ROOT/.update-check.json"
+
+# --- Test 118: rollback removes the proposal's ab-tracking entry (stops re-flagging) ---
+echo "Test 118: rollback purges ab-tracking entry by proposal_id"
+reset_state
+rm -f "$ROOT/proposals/"*rollback* "$ROOT/active-nudges.json"
+cat > "$ROOT/applied/2026-05-20T00-00-00Z-rb-ab-001.md" <<'EOF'
+---
+id: rb-ab-001
+type: memory
+target: ~/.claude/projects/-Users-nvm/memory/x.md
+confidence: 5
+blast_radius: low
+status: applied
+source_entries:
+- "2026-05-18T10:00:00Z"
+---
+# Why
+test
+# Rollback
+```bash
+rm -f x
+```
+EOF
+cat > "$ROOT/ab-tracking.jsonl" <<'EOF'
+{"applied_at":1,"proposal_id":"rb-ab-001","proposal_type":"memory","target_skill":"x","proposal_fingerprint":"f1","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+{"applied_at":2,"proposal_id":"keep-me-002","proposal_type":"memory","target_skill":"y","proposal_fingerprint":"f2","originating_signals":[{"type":"correction","count":3}],"pre_window_days":7}
+EOF
+ROLLBACK_RUN --proposal-id rb-ab-001 --home "$TMP_HOME/.claude" >/dev/null 2>&1 || true
+if grep -q '"proposal_id":"rb-ab-001"' "$ROOT/ab-tracking.jsonl"; then
+  echo " FAIL: rolled-back proposal still in ab-tracking.jsonl"; FAIL=$((FAIL+1))
+else
+  echo " PASS: rolled-back proposal removed from ab-tracking.jsonl"; PASS=$((PASS+1))
+fi
+if grep -q '"proposal_id":"keep-me-002"' "$ROOT/ab-tracking.jsonl"; then
+  echo " PASS: unrelated ab-tracking entry preserved"; PASS=$((PASS+1))
+else
+  echo " FAIL: rollback clobbered an unrelated ab-tracking entry"; FAIL=$((FAIL+1))
+fi
+rm -f "$ROOT/proposals/"*rb-ab-001* "$ROOT/applied/"*rb-ab-001* "$ROOT/ab-tracking.jsonl" "$ROOT/active-nudges.json"
+
# --- Test 119: adam-skill-utility ranks friction-correlated skills below baseline ---
|
||||||
|
echo "Test 119: adam-skill-utility computes per-skill good:bad utility (execution-grounded Δ)"
|
||||||
|
reset_state
|
||||||
|
SU_INPUT="$TMP_HOME/su-input.jsonl"
|
||||||
|
{
|
||||||
|
for i in 1 2 3 4 5; do echo "{\"ts\":\"2026-05-20T0$i:00:00Z\",\"session\":\"sSU\",\"type\":\"task_completed\",\"active_skills\":[\"goodskill\"]}"; done
|
||||||
|
for i in 1 2 3 4 5; do echo "{\"ts\":\"2026-05-20T1$i:00:00Z\",\"session\":\"sSU\",\"type\":\"dead_end\",\"count\":8,\"active_skills\":[\"badskill\"]}"; done
|
||||||
|
} > "$SU_INPUT"
|
||||||
|
su_out=$(SKILLUTIL_RUN --input "$SU_INPUT" --json --min 3 2>/dev/null)
|
||||||
|
su_check=$(echo "$su_out" | node -e '
|
||||||
|
let buf=""; process.stdin.on("data",d=>buf+=d).on("end",()=>{
|
||||||
|
try {
|
||||||
|
const p=JSON.parse(buf);
|
||||||
|
const bad=p.skills.find(s=>s.skill==="badskill");
|
||||||
|
const good=p.skills.find(s=>s.skill==="goodskill");
|
||||||
|
const ok = bad && good && bad.lift<0 && good.lift>0 && p.skills[0].skill==="badskill" && bad.neg===5 && good.pos===5;
|
||||||
|
console.log(ok?"ok":"bad:"+JSON.stringify({bad,good,first:p.skills[0]&&p.skills[0].skill}));
|
||||||
|
} catch(e){ console.log("parse-error:"+e.message); }
|
||||||
|
});')
|
||||||
|
if [ "$su_check" = "ok" ]; then
|
||||||
|
echo " PASS: badskill below baseline + ranked worst-first, goodskill above"; PASS=$((PASS+1))
|
||||||
|
else
|
||||||
|
echo " FAIL: skill-utility ranking wrong ($su_check)"; FAIL=$((FAIL+1))
|
||||||
|
fi
|
||||||
|
rm -f "$SU_INPUT"
|
||||||
|
|
echo
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" = "0" ]
||||||
|
|||||||
+36
-18
@@ -104,6 +104,7 @@ Per-signal windows (single source of truth: `SIGNAL_WINDOWS_DAYS` in `~/.claude/
|
|||||||
| `weak_agent` | 30 d | subagent quality signal |
|
| `weak_agent` | 30 d | subagent quality signal |
|
||||||
| `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
|
| `subagent_dispatch_pattern` | 30 d | dispatch routing pattern |
|
||||||
| `silent_drift` | 14 d | exploration-without-action is task-local |
|
| `silent_drift` | 14 d | exploration-without-action is task-local |
|
||||||
|
| `file_reread` | 14 d | redundant same-file reads are task-local |
|
||||||
| `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
|
| `error_after_recovery` | 30 d | recovery-then-same-error patterns persist |
|
||||||
| `correction_free_streak` | 60 d | wins accumulate slowly |
|
| `correction_free_streak` | 60 d | wins accumulate slowly |
|
||||||
| `clean_recovery` | 60 d | wins accumulate slowly |
|
| `clean_recovery` | 60 d | wins accumulate slowly |
|
||||||
@@ -127,6 +128,7 @@ The hook emits these `type` values into the journal:
|
|||||||
| `build_loop` | 2 build/test/compile commands fail in session | session |
|
| `build_loop` | 2 build/test/compile commands fail in session | session |
|
||||||
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
|
| `subagent_dispatch_pattern` | same subagent dispatched ≥3× cumulatively | subagent_type |
|
||||||
| `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
|
| `silent_drift` | 5 consecutive read-only PostToolUse without an action tool (reset on action or UserPromptSubmit) | `active_skills[0]` |
|
||||||
|
| `file_reread` | same file Read ≥3× in the 10-tool window, ignoring offset/limit (escapes `retry_loop`'s argsHash dedup) | file basename |
|
||||||
| `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
|
| `error_after_recovery` | same error fingerprint returns within 5 PostToolUse of a `clean_recovery` | (`recovered_from`, `original_fp`) |
|
||||||
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
|
| `correction_free_streak` | 5 clean UserPromptSubmits in a row (no correction phrase) | `active_skills[0]` |
|
||||||
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
|
| `clean_recovery` | 3 clean PostToolUse events after a `tool_error_loop`/`dead_end`/`retry_loop` | (`recovered_from`, `active_skills[0]`) |
|
||||||
@@ -154,13 +156,14 @@ The hook emits these `type` values into the journal:
|
|||||||
- `build_loop`: cluster by `session`.
|
- `build_loop`: cluster by `session`.
|
||||||
- `subagent_dispatch_pattern`: cluster by `subagent_type`.
|
- `subagent_dispatch_pattern`: cluster by `subagent_type`.
|
||||||
- `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
|
- `silent_drift`: cluster by `active_skills[0]` (empty string when no skill is active).
|
||||||
|
- `file_reread`: cluster by file basename (same offset-agnostic same-file re-Read pattern).
|
||||||
- `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
|
- `error_after_recovery`: cluster by (`recovered_from`, `original_fp`).
|
||||||
- `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
|
- `correction_free_streak`: cluster by `active_skills[0]`. Treat ≥3 streaks across ≥2 sessions naming the same skill as cross-session evidence.
|
||||||
- `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
|
- `clean_recovery`: cluster by (`recovered_from`, `active_skills[0]`). A win cluster qualifies for `skill_edit` only when the named skill exists in `skills_root`.
|
||||||
- `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). Single entry qualifies for `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with same tuple — without it, proposal queues, never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
|
- `task_completed`: cluster by sorted `tool_kinds` tuple (the multi-tool recipe). Single entry qualifies for `skill_new` proposal (drafting protocol applies). Cross-session evidence requires ≥2 entries from distinct sessions with same tuple — without it, proposal queues, never auto-applies. Run the existing skill-overlap rule before drafting: if the recipe matches an existing skill's name/description tokens, route to `skill_edit` instead.
|
||||||
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
|
5. **Multi-axis correlation**: for each session that produced ≥2 distinct struggle types (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`), tag clusters from that session as `multi_axis: true`. This grants +1 confidence at scoring.
|
||||||
|
|
||||||
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
|
5b. **Skill-attribution sub-clustering**: after primary clustering (step 4), for every struggle cluster (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) that contains entries with non-empty `active_skills[0]`:
|
||||||
- Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
|
- Split into per-skill sub-clusters keyed on `active_skills[0]`. Entries with empty `active_skills` stay in the original cluster.
|
||||||
- If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
|
- If a sub-cluster has ≥3 entries AND names a skill that exists in `skills_root`, mark it as a candidate for `skill_edit` (struggle-driven variant; see "Struggle-driven `skill_edit` eligibility"). Otherwise treat the parent cluster normally.
|
||||||
- The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.) — sub-clusters do NOT replace it, they supplement it.
|
- The umbrella cluster (cross-skill) still emits its usual proposal type (memory, etc.) — sub-clusters do NOT replace it, they supplement it.
|
||||||
@@ -247,10 +250,12 @@ Required structure:
|
|||||||
|
|
||||||
```markdown
|
```markdown
|
||||||
---
|
---
|
||||||
name: <human-readable name, ≤80 chars>
|
name: <slug — snake_case, MUST equal the target filename without `.md`, e.g. feedback_go_test_cache>
|
||||||
description: <one-line description used to decide future relevance — be specific, ≤200 chars>
|
description: "<one-line used to decide future relevance — be specific, ≤200 chars>"
|
||||||
type: user | feedback | project | reference
|
metadata:
|
||||||
originSessionId: <session_id from journal entries that fed this cluster>
|
node_type: memory
|
||||||
|
type: user | feedback | project | reference
|
||||||
|
originSessionId: <session_id from journal entries that fed this cluster>
|
||||||
---
|
---
|
||||||
|
|
||||||
<Body content per type, see CLAUDE.md memory schema:
|
<Body content per type, see CLAUDE.md memory schema:
|
||||||
@@ -260,12 +265,17 @@ originSessionId: <session_id from journal entries that fed this cluster>
|
|||||||
- reference: pointer to external system + what's there.>
|
- reference: pointer to external system + what's there.>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The frontmatter MUST match the live auto-memory schema exactly: `name` is the
|
||||||
|
slug (NOT a prose title), and `node_type`, `type`, `originSessionId` live under
|
||||||
|
a `metadata:` block (verify against an existing file in the target memory dir
|
||||||
|
before drafting — match its shape).
|
||||||
|
|
||||||
Constraints:
|
Constraints:
|
||||||
- Frontmatter fields `name`, `description`, `type` are **required**. Skill enforces this at apply time.
|
- Top-level `name` + `description` and nested `metadata.node_type` (always `memory`) + `metadata.type` are **required**. Skill enforces this at apply time.
|
||||||
- `originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
|
- `metadata.originSessionId` is required — must be a `session` value from one of the cluster's journal entries.
|
||||||
- ≤50 LOC of body content. Surgical.
|
- ≤50 LOC of body content. Surgical.
|
||||||
- Slug (used in `target` path filename) must not collide with any existing memory file.
|
- `name`/slug (also the `target` path filename) must not collide with any existing memory file.
|
||||||
- For `type=feedback` and `type=project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
|
- For `type: feedback` and `type: project`, body MUST contain `**Why:**` and `**How to apply:**` lines (CLAUDE.md memory schema).
|
||||||
|
|
||||||
## Diagnosis drafting protocol (required for every proposal)
|
## Diagnosis drafting protocol (required for every proposal)
|
||||||
|
|
||||||
@@ -352,10 +362,18 @@ The cooldown gate is keyed on **(target_skill, proposal_fingerprint)** — not o
|
|||||||
`proposal_fingerprint` is computed deterministically as `djb2(skill_slug + "\n" + signal_cluster_id + "\n" + normalized_diff_body)` returned as base36, where:
|
`proposal_fingerprint` is computed deterministically as `djb2(skill_slug + "\n" + signal_cluster_id + "\n" + normalized_diff_body)` returned as base36, where:
|
||||||
|
|
||||||
- `skill_slug` — target skill basename (or proposed slug for `skill_new`)
|
- `skill_slug` — target skill basename (or proposed slug for `skill_new`)
|
||||||
- `signal_cluster_id` — the cluster id you assigned in the clustering trace (e.g. `c1`, `tool_error_loop-ECONNREFUSED:5432`)
|
- `signal_cluster_id` — a **stable** cluster id derived from signal type + key (e.g. `tool_error_loop-ECONNREFUSED:5432`), NOT the ephemeral per-run trace id (`c1`). Stability matters: the same logical proposal must hash identically across `/reflect` runs or the cooldown can never match a prior applied/rejected record.
|
||||||
- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trailing newlines stripped
|
- `normalized_diff_body` — proposal's `# Proposed change` section with all whitespace collapsed to single spaces and trimmed
|
||||||
|
|
||||||
Both apply-time and analyst-time checks invoke `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`.
|
Do NOT hand-compute the hash (an LLM cannot reproduce djb2 reliably). Run the canonical implementation (`computeProposalFingerprint()` in `adam-cooldown.mjs`) via Bash, then write the result into frontmatter:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
node ~/.claude/adam/scripts/adam-cooldown.mjs --compute \
|
||||||
|
--skill <slug> --cluster <signal_cluster_id> --diff-file <file-with-Proposed-change-body>
|
||||||
|
# → {"fingerprint":"<djb2_base36>"} (diff body may also be piped on stdin)
|
||||||
|
```
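
For intuition only, the composition described above can be sketched in JavaScript. This is an illustrative sketch, not the canonical `computeProposalFingerprint()`: the djb2 seed (5381), the 32-bit unsigned wrap, the helper names `djb2Base36`, `normalizeDiffBody` and `sketchFingerprint`, and the sample slug `go-testing` are all assumptions. Real fingerprints should still come from `adam-cooldown.mjs --compute`.

```javascript
// Illustrative sketch only; the canonical implementation lives in adam-cooldown.mjs.
// Assumed: classic djb2 (seed 5381, hash * 33 + charCode) wrapped to 32 bits.
function djb2Base36(input) {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    // (hash << 5) + hash equals hash * 33; >>> 0 keeps an unsigned 32-bit value.
    hash = (((hash << 5) + hash) + input.charCodeAt(i)) >>> 0;
  }
  return hash.toString(36);
}

// Collapse every whitespace run to a single space and trim both ends.
function normalizeDiffBody(body) {
  return body.replace(/\s+/g, " ").trim();
}

// Join the three parts with "\n", as the rubric describes, then hash.
function sketchFingerprint(skillSlug, signalClusterId, diffBody) {
  return djb2Base36([skillSlug, signalClusterId, normalizeDiffBody(diffBody)].join("\n"));
}

// Hypothetical inputs: same logical proposal, different whitespace.
const a = sketchFingerprint("go-testing", "tool_error_loop-ECONNREFUSED:5432", "- use  -count=1\n");
const b = sketchFingerprint("go-testing", "tool_error_loop-ECONNREFUSED:5432", "  - use -count=1");
console.log(a === b); // true: whitespace-only differences hash identically
```

The point of the sketch is the stability argument: because the cluster id is part of the hashed string, an ephemeral id such as `c1` would change the fingerprint from run to run even when the proposed change is identical.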
|
||||||
|
|
||||||
|
Both apply-time and analyst-time *gate* checks then invoke `adam-cooldown.mjs --skill <slug> --fingerprint <hash>`. The script returns one of `{"status":"cool"}`, `{"status":"cooldown",...}`, or `{"status":"blacklisted",...}`. Auto-apply requires `cool`.
|
||||||
|
|
||||||
Backward compat: proposals from before this rubric version (no `proposal_fingerprint` field) are treated as `fingerprint = "legacy"`. The cooldown script matches legacy applied/rejected records against any query fingerprint for the same skill — i.e. coarse-grained gating until those records age out of their windows (7d / 30d).
|
Backward compat: proposals from before this rubric version (no `proposal_fingerprint` field) are treated as `fingerprint = "legacy"`. The cooldown script matches legacy applied/rejected records against any query fingerprint for the same skill — i.e. coarse-grained gating until those records age out of their windows (7d / 30d).
|
||||||
|
|
||||||
@@ -373,7 +391,7 @@ The skill (`adam-self-improvement/SKILL.md` §1) runs `adam-score.mjs` immediate
|
|||||||
|
|
||||||
## A/B effectiveness
|
## A/B effectiveness
|
||||||
|
|
||||||
Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`, `reinforcement`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. Schema:
|
Every auto-applied edit (`skill_edit`, `skill_new`, `memory`, `nudge`) gets a one-line tracking entry written to `~/.claude/adam/ab-tracking.jsonl` by `adam-self-improvement/SKILL.md` immediately after the proposal is moved to `applied/`. **`reinforcement` is the one exception — it is a positive-only ledger and is intentionally NOT A/B-tracked (see §"`reinforcement` proposals"), to avoid skewing regression detection.** Schema:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{"applied_at":<ms>,"proposal_id":"<id>","proposal_type":"...","target_skill":"<slug>","proposal_fingerprint":"<hash>","originating_signals":[{"type":"<signal>","count":<N>,"session_ids":[...]}],"pre_window_days":7}
|
{"applied_at":<ms>,"proposal_id":"<id>","proposal_type":"...","target_skill":"<slug>","proposal_fingerprint":"<hash>","originating_signals":[{"type":"<signal>","count":<N>,"session_ids":[...]}],"pre_window_days":7}
|
||||||
@@ -417,10 +435,10 @@ The matrix goes into the diagnosis output as `keypoints: {tool_selection: N, sco
|
|||||||
|
|
||||||
Sum:
|
Sum:
|
||||||
- Signal repeated ≥3× across ≥2 sessions: **+2**
|
- Signal repeated ≥3× across ≥2 sessions: **+2**
|
||||||
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
|
- Struggle signal (`tool_error_loop`, `dead_end`, `weak_agent`, `retry_loop`, `edit_churn`, `build_loop`, `silent_drift`, `file_reread`, `error_after_recovery`) appearing ≥1× within a single session: **+2** *(each struggle entry already represents a hook-side threshold crossing — e.g. 8 tools without a prompt, 3 same-args retries, 4 edits to one file, 5 read-only tools in a row, 3 reads of the same file, same-fp error after a recovery. Treat each entry as one piece of evidence. Does not stack with the cross-session bonus.)*
|
||||||
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
|
- Transcript contains positive endorsement (`yes`, `exactly`, `do that`, `keep doing`) within 2 messages of related action: **+2**
|
||||||
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
|
- Multi-axis cluster (≥2 distinct struggle types in same session): **+1**
|
||||||
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs` — `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
|
- Cluster severity-sum ≥ 10 (severity per entry = `max(1, floor(count / divisor))` using `SEVERITY_DIVISORS` from `adam-score.mjs` — `dead_end:8, edit_churn:4, tool_error_loop:3, retry_loop:3, file_reread:3, weak_agent:2, build_loop:1`; entries without `count` count as 1): **+1**
|
||||||
- Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
|
- Cluster severity-sum ≥ 32: **+1** *(additive — a severity-sum of 32 gets +1 from the previous bullet AND +1 here, total +2.)*
|
||||||
- Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
|
- Skill-attributed sub-cluster (≥3 entries naming the same `active_skills[0]` that exists in `skills_root`): **+1**
|
||||||
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
|
- Type-bias penalty from feedback loop (≥3 rejections, applied:rejected ratio <1:2 for this `type`): **-1**
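
The severity-sum bullets above can be worked through with a small sketch. The divisor values are copied from the rubric text; the function names `entrySeverity` and `severityBonus`, the sample cluster, and the fallback divisor of 1 for types missing from the table are assumptions, not taken from `adam-score.mjs`.

```javascript
// Divisors copied from the rubric text; the real table is SEVERITY_DIVISORS in adam-score.mjs.
const SEVERITY_DIVISORS = {
  dead_end: 8, edit_churn: 4, tool_error_loop: 3, retry_loop: 3,
  file_reread: 3, weak_agent: 2, build_loop: 1,
};

// severity per entry = max(1, floor(count / divisor)); entries without `count` count as 1.
// Assumption: signal types missing from the table use a divisor of 1.
function entrySeverity(entry) {
  if (typeof entry.count !== "number") return 1;
  const divisor = SEVERITY_DIVISORS[entry.type] ?? 1;
  return Math.max(1, Math.floor(entry.count / divisor));
}

function severityBonus(cluster) {
  const sum = cluster.reduce((acc, e) => acc + entrySeverity(e), 0);
  // The two thresholds are additive: >= 10 grants +1, >= 32 grants a further +1.
  const bonus = (sum >= 10 ? 1 : 0) + (sum >= 32 ? 1 : 0);
  return { sum, bonus };
}

// Hypothetical cluster for illustration.
const cluster = [
  { type: "dead_end", count: 40 },   // floor(40 / 8) = 5
  { type: "file_reread", count: 9 }, // floor(9 / 3) = 3
  { type: "edit_churn", count: 2 },  // floor(2 / 4) = 0, clamped to 1
  { type: "retry_loop" },            // no count, counts as 1
];
console.log(severityBonus(cluster)); // { sum: 10, bonus: 1 }
```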
|
||||||
@@ -498,7 +516,7 @@ MOSS's core thesis: "routing, hook ordering, state invariants, and dispatch live
|
|||||||
2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
|
2. `cross_session_evidence == true` (≥5 occurrences across ≥3 sessions)
|
||||||
3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
|
3. `auto_apply_eligible: false` — **always**. Harness edits are never auto-applied.
|
||||||
4. `blast_radius: high`
|
4. `blast_radius: high`
|
||||||
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "94 passed, 0 failed" (or current pass count). The skill runs this test before applying.
|
5. Proposal includes a `# Test verification` section with the command `bash ~/.claude/adam/tests/run-tests.sh` and the expected result "140 passed, 0 failed" (or current pass count). The skill runs this test before applying.
|
||||||
6. Change is surgical: ≤30 LOC diff, single file.
|
6. Change is surgical: ≤30 LOC diff, single file.
|
||||||
7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.
|
7. `# Diagnosis` reconstructs the causal chain from harness-level behavior (not from text-artifact behavior). The mismatch must name a specific code path (function, regex, threshold) in the target file.
|
||||||
|
|
||||||
@@ -552,7 +570,7 @@ source_entries:
|
|||||||
- "<another ts>"
|
- "<another ts>"
|
||||||
- "..."
|
- "..."
|
||||||
# skill_edit / skill_new — required for cooldown gate (see "Per-(skill, fingerprint) cooldown" below)
|
# skill_edit / skill_new — required for cooldown gate (see "Per-(skill, fingerprint) cooldown" below)
|
||||||
proposal_fingerprint: "<djb2_base36 hash — computed via computeProposalFingerprint() in adam-cooldown.mjs>"
|
proposal_fingerprint: "<djb2_base36 hash — compute via `adam-cooldown.mjs --compute`; see §Per-(skill, fingerprint) cooldown>"
|
||||||
target_skill: "<slug — populated for skill_edit (basename of target dir) and skill_new (proposed slug)>"
|
target_skill: "<slug — populated for skill_edit (basename of target dir) and skill_new (proposed slug)>"
|
||||||
# A/B effectiveness — required on every proposal; consumed at apply time to seed ab-tracking.jsonl
|
# A/B effectiveness — required on every proposal; consumed at apply time to seed ab-tracking.jsonl
|
||||||
originating_signals:
|
originating_signals:
|
||||||
|
|||||||
+89
-4
@@ -1,9 +1,17 @@
|
|||||||
#!/usr/bin/env node
|
#!/usr/bin/env node
|
||||||
// adam-nudge.mjs — SessionStart hook. Prints two kinds of reminders:
|
// adam-nudge.mjs — SessionStart hook. Prints reminders:
|
||||||
// 1. Pending proposals (≥3 queued in adam/proposals/).
|
// 1. Pending proposals (≥3 queued in adam/proposals/).
|
||||||
// 2. Cross-session nudges (entries in adam/active-nudges.json whose
|
// 2. Cross-session nudges (entries in adam/active-nudges.json whose
|
||||||
// source_session differs from the current session and that haven't
|
// source_session differs from the current session and that haven't
|
||||||
// expired or exhausted their max_displays).
|
// expired or exhausted their max_displays).
|
||||||
|
// 3. Pending local-edit upgrades (`.adam-new` sidecars).
|
||||||
|
// 4. New-release notice: if a newer GitHub release exists than the installed
|
||||||
|
// `.version`, print a notify-only one-line update prompt. Cached + checked
|
||||||
|
// at most once/day, network call hard-capped at 1.5s, fully best-effort —
|
||||||
|
// never blocks SessionStart. Opt out with ADAM_NO_UPDATE_CHECK=1.
|
||||||
|
// NOTE: notify-only by design — applying an update re-runs install.sh,
|
||||||
|
// which resets ADAM's own /reflect-applied skill edits. The user chooses
|
||||||
|
// when to accept that, so we never auto-install.
|
||||||
import { readdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
|
import { readdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
|
||||||
import { join } from "node:path";
|
import { join } from "node:path";
|
||||||
import { homedir } from "node:os";
|
import { homedir } from "node:os";
|
||||||
@@ -14,7 +22,13 @@ const ADAM_ROOT = join(CLAUDE_ROOT, "adam");
|
|||||||
const PROPOSALS = join(ADAM_ROOT, "proposals");
|
const PROPOSALS = join(ADAM_ROOT, "proposals");
|
||||||
const NUDGES_FILE = join(ADAM_ROOT, "active-nudges.json");
|
const NUDGES_FILE = join(ADAM_ROOT, "active-nudges.json");
|
||||||
const STATE_FILE = join(ADAM_ROOT, "state.json");
|
const STATE_FILE = join(ADAM_ROOT, "state.json");
|
||||||
|
const VERSION_FILE = join(ADAM_ROOT, ".version");
|
||||||
|
const UPDATE_CHECK_FILE = join(ADAM_ROOT, ".update-check.json");
|
||||||
const THRESHOLD = 3;
|
const THRESHOLD = 3;
|
||||||
|
const UPDATE_CHECK_INTERVAL_MS = 24 * 60 * 60 * 1000;
|
||||||
|
const UPDATE_FETCH_TIMEOUT_MS = 1500;
|
||||||
|
const RELEASES_API = "https://api.github.com/repos/lukaszraczylo/claude-adam/releases/latest";
|
||||||
|
const INSTALL_ONELINER = "curl -fsSL https://raw.githubusercontent.com/lukaszraczylo/claude-adam/main/install.sh | bash";
|
||||||
|
|
||||||
// Known installable paths (mirrors install.sh copy_file list). Checking a
|
// Known installable paths (mirrors install.sh copy_file list). Checking a
|
||||||
// fixed shortlist keeps SessionStart latency under control vs full FS walk.
|
// fixed shortlist keeps SessionStart latency under control vs full FS walk.
|
||||||
@@ -33,6 +47,9 @@ const PENDING_CHECK_PATHS = [
|
|||||||
"adam/scripts/adam-score.mjs",
|
"adam/scripts/adam-score.mjs",
|
||||||
"adam/scripts/adam-ab-measure.mjs",
|
"adam/scripts/adam-ab-measure.mjs",
|
||||||
"adam/scripts/adam-apply-reinforcement.mjs",
|
"adam/scripts/adam-apply-reinforcement.mjs",
|
||||||
|
"adam/scripts/adam-utils.mjs",
|
||||||
|
"adam/scripts/adam-batch.mjs",
|
||||||
|
"adam/scripts/adam-rollback.mjs",
|
||||||
"adam/tests/run-tests.sh",
|
"adam/tests/run-tests.sh",
|
||||||
];
|
];
|
||||||
|
|
||||||
@@ -115,7 +132,75 @@ function emitPendingUpgrades() {
|
|||||||
} catch { /* never break SessionStart */ }
|
} catch { /* never break SessionStart */ }
|
||||||
}
|
}
|
||||||
|
|
||||||
function main() {
|
// --- update notifier (notify-only; see header note) ---
|
||||||
|
|
function readVersion() {
  try { return readFileSync(VERSION_FILE, "utf8").trim() || null; } catch { return null; }
}

// Parse "vX.Y.Z" (leading v optional; pre-release/build suffix ignored).
function parseSemver(s) {
  if (typeof s !== "string") return null;
  const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// isNewer(a, b): true iff version a is strictly newer than b. Unparseable → false.
function isNewer(a, b) {
  const pa = parseSemver(a), pb = parseSemver(b);
  if (!pa || !pb) return false;
  for (let i = 0; i < 3; i++) { if (pa[i] !== pb[i]) return pa[i] > pb[i]; }
  return false;
}

async function fetchLatestTag() {
  // Best-effort, hard-capped. Any failure (offline / timeout / rate-limit /
  // parse / fetch-unavailable) returns null and the caller silently skips.
  try {
    if (typeof fetch !== "function") return null;
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), UPDATE_FETCH_TIMEOUT_MS);
    let tag = null;
    try {
      const res = await fetch(RELEASES_API, {
        signal: ctrl.signal,
        headers: { "User-Agent": "claude-adam-nudge", "Accept": "application/vnd.github+json" },
      });
      if (res && res.ok) {
        const j = await res.json();
        if (j && typeof j.tag_name === "string") tag = j.tag_name;
      }
    } finally { clearTimeout(timer); }
    return tag;
  } catch { return null; }
}

function printUpdateNudge(latest, installed) {
  process.stdout.write(
    `[adam] update available: ${installed} → ${latest}. Apply: ${INSTALL_ONELINER}\n` +
    ` (re-runs install.sh — resets ADAM's own /reflect-applied skill edits; apply when you're ready)\n`
  );
}

async function emitUpdateCheck() {
  if (process.env.ADAM_NO_UPDATE_CHECK) return; // explicit opt-out
  const installed = readVersion();
  if (!installed) return; // no marker → nothing to compare
  const cache = readJson(UPDATE_CHECK_FILE, {}) || {};
  const now = Date.now();
  let nudged = false;
  // Instant nudge from cache (no network).
  if (cache.latest && isNewer(cache.latest, installed)) { printUpdateNudge(cache.latest, installed); nudged = true; }
  // Refresh cache at most once/day, best-effort — drives the nudge on the NEXT run.
  if (!cache.last_check || (now - Number(cache.last_check)) > UPDATE_CHECK_INTERVAL_MS) {
    const latest = await fetchLatestTag();
    const next = { last_check: now, latest: latest || cache.latest || null };
    try { writeFileSync(UPDATE_CHECK_FILE, JSON.stringify(next)); } catch { /* swallow */ }
    if (latest && !nudged && isNewer(latest, installed)) printUpdateNudge(latest, installed);
  }
}
||||||
|
|
||||||
|
async function main() {
|
||||||
const stdinSession = readSessionInput();
|
const stdinSession = readSessionInput();
|
||||||
const stateSession = (() => {
|
const stateSession = (() => {
|
||||||
const st = readJson(STATE_FILE, null);
|
const st = readJson(STATE_FILE, null);
|
||||||
@@ -125,7 +210,7 @@ function main() {
|
|||||||
emitProposalReminder();
|
emitProposalReminder();
|
||||||
emitActiveNudges(currentSession);
|
emitActiveNudges(currentSession);
|
||||||
emitPendingUpgrades();
|
emitPendingUpgrades();
|
||||||
|
await emitUpdateCheck();
|
||||||
}
|
}
|
||||||
|
|
||||||
try { main(); } catch { /* never block SessionStart */ }
|
main().catch(() => { /* never block SessionStart */ }).finally(() => process.exit(0));
|
||||||
process.exit(0);
|
|
||||||
|
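
The version comparison added in the `adam-nudge.mjs` hunk above can be exercised in isolation. The two functions below are copied from that hunk; the sample tags are illustrative inputs, not real release data.

```javascript
// parseSemver / isNewer as they appear in the adam-nudge.mjs hunk.
function parseSemver(s) {
  if (typeof s !== "string") return null;
  const m = s.trim().replace(/^v/i, "").match(/^(\d+)\.(\d+)\.(\d+)/);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function isNewer(a, b) {
  const pa = parseSemver(a), pb = parseSemver(b);
  if (!pa || !pb) return false;
  for (let i = 0; i < 3; i++) { if (pa[i] !== pb[i]) return pa[i] > pb[i]; }
  return false;
}

console.log(isNewer("v0.6.3", "v0.5.0"));     // true
console.log(isNewer("0.10.0", "v0.9.9"));     // true: compared numerically, not as strings
console.log(isNewer("v0.6.3", "v0.6.3"));     // false: equal is not newer
console.log(isNewer("v0.6.4-rc1", "v0.6.3")); // true: the suffix is ignored
console.log(isNewer("latest", "v0.6.3"));     // false: unparseable input never nudges
```

One consequence of ignoring the suffix is that a pre-release tag such as `v0.6.3-rc1` compares equal to `v0.6.3`, so it would not trigger a nudge in either direction.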
|||||||
@@ -105,11 +105,13 @@ const SUBAGENT_DISPATCH_THRESHOLD = 3;
|
|||||||
const CORRECTION_FREE_THRESHOLD = 5;
|
const CORRECTION_FREE_THRESHOLD = 5;
|
||||||
const CLEAN_RECOVERY_WINDOW = 3;
|
const CLEAN_RECOVERY_WINDOW = 3;
|
||||||
const SILENT_DRIFT_THRESHOLD = 5;
|
const SILENT_DRIFT_THRESHOLD = 5;
|
||||||
|
const FILE_REREAD_THRESHOLD = 3;
|
||||||
const ERROR_AFTER_RECOVERY_WINDOW = 5;
|
const ERROR_AFTER_RECOVERY_WINDOW = 5;
|
||||||
const RECENT_RECOVERIES_MAX = 3;
|
const RECENT_RECOVERIES_MAX = 3;
|
||||||
const STRUGGLE_TYPES = new Set([
|
const STRUGGLE_TYPES = new Set([
|
||||||
"tool_error_loop", "dead_end", "retry_loop", "weak_agent",
|
"tool_error_loop", "dead_end", "retry_loop", "weak_agent",
|
||||||
"edit_churn", "build_loop", "silent_drift", "error_after_recovery",
|
"edit_churn", "build_loop", "silent_drift", "error_after_recovery",
|
||||||
|
"file_reread",
|
||||||
]);
|
]);
|
||||||
const ACTIVE_SKILLS_LOOKBACK = 10;
|
const ACTIVE_SKILLS_LOOKBACK = 10;
|
||||||
const TASK_TOOL_MIN = 5;
|
const TASK_TOOL_MIN = 5;
|
||||||
@@ -447,6 +449,10 @@ function main() {
|
|||||||
const emit = (entry) => {
|
const emit = (entry) => {
|
||||||
if (STRUGGLE_TYPES.has(entry.type)) {
|
if (STRUGGLE_TYPES.has(entry.type)) {
|
||||||
entry.context_window = snapshotContext(state);
|
entry.context_window = snapshotContext(state);
|
||||||
|
// Struggle signals carry the active skill set so the analyst can run
|
||||||
|
// skill-attribution sub-clustering (agents/adam.md §5b) and so silent_drift
|
||||||
|
// — whose primary cluster key IS active_skills[0] — clusters correctly.
|
||||||
|
if (entry.active_skills === undefined) entry.active_skills = activeNames(state, "skill");
|
||||||
struggleEmittedThisTurn = entry.type;
|
struggleEmittedThisTurn = entry.type;
|
||||||
}
|
}
|
||||||
appendJournal(entry);
|
appendJournal(entry);
|
||||||
@@ -466,6 +472,16 @@ function main() {
|
|||||||
emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
|
emit({ ts, session, cwd, type: "retry_loop", tool, count: sameToolArgs });
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Offset-aware same-file reread: consecutive Reads of the same file_path
|
||||||
|
// (ignoring offset/limit) escape the argsHash-based retry_loop dedup above.
|
||||||
|
// Emit a distinct, actionable signal instead of leaking into tool_error_loop.
|
||||||
|
if (READ_ONLY_TOOLS.has(tool) && file) {
|
||||||
|
const sameFileReads = state.tool_window.filter(e => e.tool === tool && e.file === file).length;
|
||||||
|
if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
|
||||||
|
emit({ ts, session, cwd, type: "file_reread", tool, file, count: sameFileReads });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if (READ_ONLY_TOOLS.has(tool)) {
|
if (READ_ONLY_TOOLS.has(tool)) {
|
||||||
state.silentDriftCounter += 1;
|
state.silentDriftCounter += 1;
|
||||||
if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
|
if (state.silentDriftCounter >= SILENT_DRIFT_THRESHOLD && !state.silentDriftEmitted) {
|
||||||
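The `file_reread` check added in the hunk above can be read in isolation as a small pure function. The sketch below is illustrative only: the threshold values, the contents of `READ_ONLY_TOOLS`, and the helper name `detectFileReread` are assumptions, since the hunk shows only how the constants are used, not how they are defined. It also assumes the tool window already contains the current call.

```javascript
// Assumed values: the real constants are defined outside the hunk shown above.
const FILE_REREAD_THRESHOLD = 3;
const RETRY_THRESHOLD = 3;
const READ_ONLY_TOOLS = new Set(["Read", "Grep", "Glob"]);

// Hypothetical helper mirroring the diff's condition.
// toolWindow: recent tool calls as { tool, file } records.
// sameToolArgs: how many recent calls share the exact same arguments hash.
function detectFileReread(toolWindow, current, sameToolArgs) {
  const { tool, file } = current;
  if (!READ_ONLY_TOOLS.has(tool) || !file) return null;

  // Count reads of the same path, ignoring offset/limit differences.
  const sameFileReads = toolWindow.filter(
    (e) => e.tool === tool && e.file === file
  ).length;

  // Only fire when retry_loop would NOT already cover this case.
  if (sameFileReads >= FILE_REREAD_THRESHOLD && sameToolArgs < RETRY_THRESHOLD) {
    return { type: "file_reread", tool, file, count: sameFileReads };
  }
  return null;
}
```

The `sameToolArgs < RETRY_THRESHOLD` guard is what keeps the two signals distinct: identical arguments repeated often enough are reported as `retry_loop`, while the same file read at different offsets is reported as `file_reread`.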
|
|||||||
+16
-1
@@ -126,7 +126,8 @@ copy_file "$SRC/adam/scripts/adam-archive.mjs" "$DEST/adam
|
|||||||
copy_file "$SRC/adam/scripts/adam-upgrade.mjs" "$DEST/adam/scripts/adam-upgrade.mjs"
|
copy_file "$SRC/adam/scripts/adam-upgrade.mjs" "$DEST/adam/scripts/adam-upgrade.mjs"
|
||||||
# v0.3.3 helper scripts — invoked from SKILL.md / hooks / analyst flow
|
# v0.3.3 helper scripts — invoked from SKILL.md / hooks / analyst flow
|
||||||
for _adam_script in adam-utils adam-window adam-explain adam-nudge-eligibility adam-cooldown \
|
for _adam_script in adam-utils adam-window adam-explain adam-nudge-eligibility adam-cooldown \
|
||||||
adam-score adam-ab-measure adam-apply-reinforcement adam-batch adam-rollback; do
|
adam-score adam-ab-measure adam-apply-reinforcement adam-batch adam-rollback \
|
||||||
|
adam-skill-utility; do
|
||||||
copy_file "$SRC/adam/scripts/${_adam_script}.mjs" \
|
copy_file "$SRC/adam/scripts/${_adam_script}.mjs" \
|
||||||
"$DEST/adam/scripts/${_adam_script}.mjs"
|
"$DEST/adam/scripts/${_adam_script}.mjs"
|
||||||
run "chmod +x \"$DEST/adam/scripts/${_adam_script}.mjs\""
|
run "chmod +x \"$DEST/adam/scripts/${_adam_script}.mjs\""
|
||||||
@@ -143,6 +144,20 @@ copy_file "$SRC/adam/tests/fixtures/seed-corrections.jsonl" "$DEST/adam
|
|||||||
# install marker — used by future runs to detect local mtime drift
|
# install marker — used by future runs to detect local mtime drift
|
||||||
run "touch \"$DEST/adam/.install-marker\""
|
run "touch \"$DEST/adam/.install-marker\""
|
||||||
|
|
||||||
|
# version marker — records the installed release tag for the update notifier
|
||||||
|
# (adam-nudge.mjs compares it against the latest GitHub release).
|
||||||
|
ADAM_VERSION=""
|
||||||
|
if [ -n "$VERSION" ]; then
|
||||||
|
ADAM_VERSION="$VERSION"
|
||||||
|
elif [ "$PIPED" = 1 ] && [ -n "${REF:-}" ]; then
|
||||||
|
ADAM_VERSION="$REF"
|
||||||
|
else
|
||||||
|
ADAM_VERSION="$(git -C "$SRC" describe --tags --abbrev=0 2>/dev/null || true)"
|
||||||
|
fi
|
||||||
|
[ -z "$ADAM_VERSION" ] && ADAM_VERSION="unknown"
|
||||||
|
run "printf '%s\\n' \"$ADAM_VERSION\" > \"$DEST/adam/.version\""
|
||||||
|
log " version marker: $ADAM_VERSION"
|
||||||
|
|
||||||
# --------------------------------------------------------------------- settings.json
|
# --------------------------------------------------------------------- settings.json
|
||||||
SETTINGS="$DEST/settings.json"
|
SETTINGS="$DEST/settings.json"
|
||||||
EXAMPLE="$SRC/settings.json.example"
|
EXAMPLE="$SRC/settings.json.example"
|
||||||
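The version-marker block added to `install.sh` in this hunk resolves the recorded tag through a fixed precedence chain. A minimal sketch of that chain, rewritten as a JavaScript function for readability, follows. The function name `resolveVersion` and its parameter names are hypothetical; `gitTag` stands in for the output of `git describe --tags --abbrev=0`, which the script may find empty.

```javascript
// Hypothetical restatement of the install.sh precedence:
//   1. explicit VERSION
//   2. REF, but only when the installer was piped (PIPED=1)
//   3. nearest git tag of the source checkout
//   4. the literal string "unknown"
function resolveVersion({ version = "", piped = false, ref = "", gitTag = "" } = {}) {
  if (version) return version;
  if (piped && ref) return ref;
  return gitTag || "unknown";
}

console.log(resolveVersion({ version: "v0.6.3", piped: true, ref: "main" }));
console.log(resolveVersion({ piped: true, ref: "v0.6.2" }));
console.log(resolveVersion({}));
```

Note that `REF` is ignored for a non-piped install even when it is set, so a local checkout always reports its own tag or `unknown`.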
|
|||||||
@@ -81,6 +81,14 @@ node ~/.claude/adam/scripts/adam-batch.mjs --input /tmp/adam-windowed-journal.js
|
|||||||
|
|
||||||
This groups entries by (signal_type, cluster_key) and reports per-batch metadata including `has_context_window` (whether transcript evidence is attached). If the script fails: log stderr, pass `null` to the analyst (graceful degradation — analyst falls back to raw journal clustering).
|
This groups entries by (signal_type, cluster_key) and reports per-batch metadata including `has_context_window` (whether transcript evidence is attached). If the script fails: log stderr, pass `null` to the analyst (graceful degradation — analyst falls back to raw journal clustering).
|
||||||
|
|
||||||
|
**Skill utility** (execution-grounded selection signal, in the spirit of SkillsInjector arXiv 2605.29794 — utility Δ(s), not surface match): compute per-skill good:bad outcome ratios over the windowed journal:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
node ~/.claude/adam/scripts/adam-skill-utility.mjs --input /tmp/adam-windowed-journal.jsonl --json > /tmp/adam-skill-utility.json 2> /tmp/adam-skill-utility.log
|
||||||
|
```
|
||||||
|
|
||||||
|
This ranks skills by how often they co-occur with positive (`task_completed`, `clean_recovery`, `correction_free_streak`) vs negative outcome events, surfacing skills below the baseline positive rate (with sufficient sample) — advisory candidates for description disambiguation or archival. **CO-OCCURRENCE, NOT CAUSATION**: display the worst 3 below-baseline skills (`lift < 0`, not low-sample) to the *user* as a one-line advisory before listing proposals (e.g. `skill-utility: chezmoi 9% pos n=85, ghostty-config 14% pos n=50, …`). Do NOT feed this into the analyst's proposal machinery or auto-draft skill-archival from it — the human decides. If the script fails: log stderr, skip (best-effort).
|
||||||
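The ranking described above can be sketched as follows. This is not the implementation of `adam-skill-utility.mjs`, which is not shown in this diff; it is a minimal illustration under stated assumptions: each journal entry carries `type` and an `active_skills` array, the set of negative outcome types is passed in by the caller, the baseline is the positive rate over all outcome events, and the `minSample` cut-off of 20 is invented for the example.

```javascript
// Positive outcome types named in the text above.
const POSITIVE = new Set(["task_completed", "clean_recovery", "correction_free_streak"]);

// Hypothetical helper: per-skill positive rate and lift against the baseline.
function skillUtility(entries, negativeTypes, minSample = 20) {
  const perSkill = new Map();
  let pos = 0;
  let total = 0;

  for (const e of entries) {
    const isPos = POSITIVE.has(e.type);
    if (!isPos && !negativeTypes.has(e.type)) continue; // not an outcome event
    total += 1;
    if (isPos) pos += 1;
    for (const skill of e.active_skills || []) {
      const c = perSkill.get(skill) || { pos: 0, n: 0 };
      c.n += 1;
      if (isPos) c.pos += 1;
      perSkill.set(skill, c);
    }
  }

  const baseline = total ? pos / total : 0;
  return [...perSkill]
    .map(([skill, c]) => ({
      skill,
      n: c.n,
      posRate: c.pos / c.n,
      lift: c.pos / c.n - baseline, // co-occurrence only, not causation
      lowSample: c.n < minSample,
    }))
    .sort((a, b) => a.lift - b.lift); // worst first
}

// Advisory list: the worst k skills that are below baseline with enough samples.
function worstSkills(rows, k = 3) {
  return rows.filter((r) => r.lift < 0 && !r.lowSample).slice(0, k);
}
```

Keeping `worstSkills` separate from the scoring makes the advisory-only intent explicit: the output is meant for a one-line display to the user, not as input to proposal drafting.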
|
|
||||||
### 2. Dispatch the analyst (two-stage pipeline)
|
### 2. Dispatch the analyst (two-stage pipeline)
|
||||||
|
|
||||||
MOSS §3.3: "A single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow." The analyst is dispatched in two stages with a validation gate between them.
|
MOSS §3.3: "A single prompt asked to diagnose, plan, implement, verify, and decide overloads context and produces lower-quality output than a sequenced flow." The analyst is dispatched in two stages with a validation gate between them.
|
||||||
@@ -215,13 +223,13 @@ For each id that passed verification:
|
|||||||
8. Add `last_auto_edit: <iso8601 utc now>` to the proposal frontmatter before moving it.
|
8. Add `last_auto_edit: <iso8601 utc now>` to the proposal frontmatter before moving it.
|
||||||
9. Tell user: "skill `<slug>` extended (added <N> lines) — auto-applied via win-evidence gate."
|
9. Tell user: "skill `<slug>` extended (added <N> lines) — auto-applied via win-evidence gate."
|
||||||
- Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`.
|
- Move proposal to `~/.claude/adam/applied/<UTC-ts>-<id>.md`.
|
||||||
- **A/B tracking append**: as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema:
|
- **A/B tracking append** (skip for `reinforcement` — positive-only ledger, intentionally not A/B-tracked per `agents/adam.md` §"`reinforcement` proposals"): as a separate atomic step right after the move, append one JSON line to `~/.claude/adam/ab-tracking.jsonl` (create with empty contents if absent). Read fields from the proposal's frontmatter (`proposal_fingerprint`, `originating_signals` — both populated per `agents/adam.md`; `originating_signals` is a list of `{type, count, session_ids}` objects). Schema:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"applied_at": <unix_ms now>,
|
"applied_at": <unix_ms now>,
|
||||||
"proposal_id": "<id>",
|
"proposal_id": "<id>",
|
||||||
"proposal_type": "skill_edit|skill_new|memory|nudge|reinforcement",
|
"proposal_type": "skill_edit|skill_new|memory|nudge",
|
||||||
"target_skill": "<slug or target basename>",
|
"target_skill": "<slug or target basename>",
|
||||||
"proposal_fingerprint": "<hash>",
|
"proposal_fingerprint": "<hash>",
|
||||||
"originating_signals": [{"type":"<signal>","count":<N>,"session_ids":[...]}],
|
"originating_signals": [{"type":"<signal>","count":<N>,"session_ids":[...]}],
|
||||||
@@ -300,7 +308,7 @@ Before writing any proposal:
|
|||||||
- For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
|
- For `skill_new`: confirm the slug doesn't collide with any existing skill in `~/.claude/skills/`. If it does, refuse and ask user to rename.
|
||||||
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §3 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
|
- For `skill_edit`: confirm the diff is append-only (no `-` lines that remove existing content) and that target SKILL.md exists. When auto-applying, ALSO re-verify the eligibility gate steps in §3 (cooldown, blacklist, byte cap) before any `Edit` call — never trust frontmatter alone.
|
||||||
- For `skill_edit` with `auto_apply_eligible: true`: confirm `contradiction_flag` is absent or null in frontmatter. Refuse auto-apply if `contradiction_flag` is set with any non-empty value (treat the agent's flag as a hard veto on auto-apply; user can still manually approve in walk-the-queue if they disagree with the heuristic).
|
- For `skill_edit` with `auto_apply_eligible: true`: confirm `contradiction_flag` is absent or null in frontmatter. Refuse auto-apply if `contradiction_flag` is set with any non-empty value (treat the agent's flag as a hard veto on auto-apply; user can still manually approve in walk-the-queue if they disagree with the heuristic).
|
||||||
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter containing required fields `name`, `description`, `type`, `originSessionId`. Refuse if frontmatter missing — agent must redraft per the Memory drafting protocol.
|
- For `memory`: confirm `# Proposed change` body starts with `---` frontmatter matching the live auto-memory schema — top-level `name` (the slug) + `description`, plus a `metadata:` block with `node_type: memory`, `type`, and `originSessionId`. Cross-check the shape against an existing file in the target memory dir. Refuse if frontmatter is flat (`type:`/`originSessionId:` at top level) or missing the `metadata:` block — agent must redraft per the Memory drafting protocol.
|
||||||
- For `harness_edit`: confirm `auto_apply_eligible: false` (never auto-apply). Confirm `confidence ≥ 5`. Confirm `# Test verification` section names the test command. Confirm diff is ≤30 LOC and targets a single allowed harness file (see `agents/adam.md` §"Harness self-modification"). Run test suite before AND after applying — revert on any regression.
|
- For `harness_edit`: confirm `auto_apply_eligible: false` (never auto-apply). Confirm `confidence ≥ 5`. Confirm `# Test verification` section names the test command. Confirm diff is ≤30 LOC and targets a single allowed harness file (see `agents/adam.md` §"Harness self-modification"). Run test suite before AND after applying — revert on any regression.
|
||||||
- Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.
|
- Confirm `source_entries` is present in proposal frontmatter as a non-empty list (used for archive). Warn (do not refuse) if missing — legacy proposals from before v0.2.0 won't have it.
|
||||||
|
|
||||||
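The revised `memory` check above (top-level `name` and `description`, plus a `metadata:` block holding `node_type: memory`, `type`, and `originSessionId`) can be sketched as a small validator. The function name `validateMemoryFrontmatter` is hypothetical, and the sketch assumes the YAML frontmatter has already been parsed into a plain object by some other step.

```javascript
// Hypothetical validator for the nested memory frontmatter shape described above.
// Returns a list of problems; an empty list means the shape is acceptable.
function validateMemoryFrontmatter(fm) {
  if (!fm || typeof fm !== "object") return ["frontmatter missing"];

  const errors = [];
  const nonEmpty = (v) => typeof v === "string" && v.trim() !== "";

  if (!nonEmpty(fm.name)) errors.push("missing top-level name");
  if (!nonEmpty(fm.description)) errors.push("missing top-level description");

  // Flat legacy shape: these fields must live under metadata, not at top level.
  if ("type" in fm || "originSessionId" in fm) {
    errors.push("flat frontmatter: type/originSessionId must be under metadata");
  }

  const md = fm.metadata;
  if (!md || typeof md !== "object") {
    errors.push("missing metadata block");
  } else {
    if (md.node_type !== "memory") errors.push("metadata.node_type must be memory");
    if (!nonEmpty(md.type)) errors.push("missing metadata.type");
    if (!nonEmpty(md.originSessionId)) errors.push("missing metadata.originSessionId");
  }
  return errors;
}
```

A non-empty result corresponds to the "refuse, agent must redraft" branch of the rule; the cross-check against an existing file in the target memory directory is a separate step that this sketch does not cover.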
|
|||||||