claw-code/ROADMAP.md

# ROADMAP.md

# Clawable Coding Harness Roadmap

## Goal

Turn claw-code into the most **clawable** coding harness:
- no human-first terminal assumptions
- no fragile prompt injection timing
- no opaque session state
- no hidden plugin or MCP failures
- no manual babysitting for routine recovery

This roadmap assumes the primary users are **claws wired through hooks, plugins, sessions, and channel events**.

## Definition of "clawable"

A clawable harness is:
- deterministic to start
- machine-readable in state and failure modes
- recoverable without a human watching the terminal
- branch/test/worktree aware
- plugin/MCP lifecycle aware
- event-first, not log-first
- capable of autonomous next-step execution

## Current Pain Points

### 1. Session boot is fragile
- trust prompts can block TUI startup
- prompts can land in the shell instead of the coding agent
- "session exists" does not mean "session is ready"

### 2. Truth is split across layers
- tmux state
- clawhip event stream
- git/worktree state
- test state
- gateway/plugin/MCP runtime state

### 3. Events are too log-shaped
- claws currently infer too much from noisy text
- important states are not normalized into machine-readable events

### 4. Recovery loops are too manual
- restart worker
- accept trust prompt
- re-inject prompt
- detect stale branch
- retry failed startup
- classify infra vs code failures manually

### 5. Branch freshness is not enforced enough
- side branches can miss already-landed main fixes
- broad test failures can be stale-branch noise instead of real regressions

### 6. Plugin/MCP failures are under-classified
- startup failures, handshake failures, config errors, partial startup, and degraded mode are not exposed cleanly enough

### 7. Human UX still leaks into claw workflows
- too much depends on terminal/TUI behavior instead of explicit agent state transitions and control APIs

## Product Principles

1. **State machine first** — every worker has explicit lifecycle states.
2. **Events over scraped prose** — channel output should be derived from typed events.
3. **Recovery before escalation** — known failure modes should auto-heal once before asking for help.
4. **Branch freshness before blame** — detect stale branches before treating red tests as new regressions.
5. **Partial success is first-class** — e.g. MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting.
6. **Terminal is transport, not truth** — tmux/TUI may remain implementation details, but orchestration state must live above them.
7. **Policy is executable** — merge, retry, rebase, stale cleanup, and escalation rules should be machine-enforced.

## Roadmap

## Phase 1 — Reliable Worker Boot

### 1. Ready-handshake lifecycle for coding workers
Add explicit states:
- `spawning`
- `trust_required`
- `ready_for_prompt`
- `prompt_accepted`
- `running`
- `blocked`
- `finished`
- `failed`

Acceptance:
- prompts are never sent before `ready_for_prompt`
- trust prompt state is detectable and emitted
- shell misdelivery becomes detectable as a first-class failure state

### 2. Trust prompt resolver
Add allowlisted auto-trust behavior for known repos/worktrees.

Acceptance:
- trusted repos auto-clear trust prompts
- events emitted for `trust_required` and `trust_resolved`
- non-allowlisted repos remain gated

### 3. Structured session control API
Provide machine control above tmux:
- create worker
- await ready
- send task
- fetch state
- fetch last error
- restart worker
- terminate worker

Acceptance:
- a claw can operate a coding worker without raw send-keys as the primary control plane

## Phase 2 — Event-Native Clawhip Integration

### 4. Canonical lane event schema
Define typed events such as:
- `lane.started`
- `lane.ready`
- `lane.prompt_misdelivery`
- `lane.blocked`
- `lane.red`
- `lane.green`
- `lane.commit.created`
- `lane.pr.opened`
- `lane.merge.ready`
- `lane.finished`
- `lane.failed`
- `branch.stale_against_main`

Acceptance:
- clawhip consumes typed lane events
- Discord summaries are rendered from structured events instead of pane scraping alone

### 5. Failure taxonomy
Normalize failure classes:
- `prompt_delivery`
- `trust_gate`
- `branch_divergence`
- `compile`
- `test`
- `plugin_startup`
- `mcp_startup`
- `mcp_handshake`
- `gateway_routing`
- `tool_runtime`
- `infra`

Acceptance:
- blockers are machine-classified
- dashboards and retry policies can branch on failure type

### 6. Actionable summary compression
Collapse noisy event streams into:
- current phase
- last successful checkpoint
- current blocker
- recommended next recovery action

Acceptance:
- channel status updates stay short and machine-grounded
- claws stop inferring state from raw build spam

## Phase 3 — Branch/Test Awareness and Auto-Recovery

### 7. Stale-branch detection before broad verification
Before broad test runs, compare current branch to `main` and detect if known fixes are missing.

Acceptance:
- emit `branch.stale_against_main`
- suggest or auto-run rebase/merge-forward according to policy
- avoid misclassifying stale-branch failures as new regressions

### 8. Recovery recipes for common failures
Encode known automatic recoveries for:
- trust prompt unresolved
- prompt delivered to shell
- stale branch
- compile red after cross-crate refactor
- MCP startup handshake failure
- partial plugin startup

Acceptance:
- one automatic recovery attempt occurs before escalation
- the attempted recovery is itself emitted as structured event data

### 9. Green-ness contract
Workers should distinguish:
- targeted tests green
- package green
- workspace green
- merge-ready green

Acceptance:
- no more ambiguous "tests passed" messaging
- merge policy can require the correct green level for the lane type
- a single hung test must not mask other failures: enforce per-test
  timeouts in CI (`cargo test --workspace`) so a 6-minute hang in one
  crate cannot prevent downstream crates from running their suites
- when a CI job fails because of a hang, the worker must report it as
  `test.hung` rather than a generic failure, so triage doesn't conflate
  it with a normal `assertion failed`
- recorded pinpoint (2026-04-08): `be561bf` swapped the local
  byte-estimate preflight for a `count_tokens` round-trip and silently
  returned `Ok(())` on any error, so `send_message_blocks_oversized_*`
  hung for ~6 minutes per attempt; the resulting workspace job crash
  hid 6 *separate* pre-existing CLI regressions (compact flag
  discarded, piped stdin vs permission prompter, legacy session layout,
  help/prompt assertions, mock harness count) that only became
  diagnosable after `8c6dfe5` + `5851f2d` restored the fast-fail path

## Phase 4 — Claws-First Task Execution

### 10. Typed task packet format
Define a structured task packet with fields like:
- objective
- scope
- repo/worktree
- branch policy
- acceptance tests
- commit policy
- reporting contract
- escalation policy

Acceptance:
- claws can dispatch work without relying on long natural-language prompt blobs alone
- task packets can be logged, retried, and transformed safely

### 11. Policy engine for autonomous coding
Encode automation rules such as:
- if green + scoped diff + review passed -> merge to dev
- if stale branch -> merge-forward before broad tests
- if startup blocked -> recover once, then escalate
- if lane completed -> emit closeout and cleanup session

Acceptance:
- doctrine moves from chat instructions into executable rules

### 12. Claw-native dashboards / lane board
Expose a machine-readable board of:
- repos
- active claws
- worktrees
- branch freshness
- red/green state
- current blocker
- merge readiness
- last meaningful event

Acceptance:
- claws can query status directly
- human-facing views become a rendering layer, not the source of truth

## Phase 5 — Plugin and MCP Lifecycle Maturity

### 13. First-class plugin/MCP lifecycle contract
Each plugin/MCP integration should expose:
- config validation contract
- startup healthcheck
- discovery result
- degraded-mode behavior
- shutdown/cleanup contract

Acceptance:
- partial-startup and per-server failures are reported structurally
- successful servers remain usable even when one server fails

### 14. MCP end-to-end lifecycle parity
Close gaps from:
- config load
- server registration
- spawn/connect
- initialize handshake
- tool/resource discovery
- invocation path
- error surfacing
- shutdown/cleanup

Acceptance:
- parity harness and runtime tests cover healthy and degraded startup cases
- broken servers are surfaced as structured failures, not opaque warnings

## Immediate Backlog (from current real pain)

Priority order: P0 = blocks CI/green state, P1 = blocks integration wiring, P2 = clawability hardening, P3 = swarm-efficiency improvements.

**P0 — Fix first (CI reliability)**
1. Isolate `render_diff_report` tests into tmpdir — **done**: `render_diff_report_for()` tests run in temp git repos instead of the live working tree, and targeted `cargo test -p rusty-claude-cli render_diff_report -- --nocapture` now stays green during branch/worktree activity
2. Expand GitHub CI from single-crate coverage to workspace-grade verification — **done**: `.github/workflows/rust-ci.yml` now runs `cargo test --workspace` plus fmt/clippy at the workspace level
3. Add release-grade binary workflow — **done**: `.github/workflows/release.yml` now builds tagged Rust release artifacts for the CLI
4. Add container-first test/run docs — **done**: `Containerfile` + `docs/container.md` document the canonical Docker/Podman workflow for build, bind-mount, and `cargo test --workspace` usage
5. Surface `doctor` / preflight diagnostics in onboarding docs and help — **done**: README + USAGE now put `claw doctor` / `/doctor` in the first-run path and point at the built-in preflight report
6. Automate branding/source-of-truth residue checks in CI — **done**: `.github/scripts/check_doc_source_of_truth.py` and the `doc-source-of-truth` CI job now block stale repo/org/invite residue in tracked docs and metadata
7. Eliminate warning spam from first-run help/build path — **done**: current `cargo run -q -p rusty-claude-cli -- --help` renders clean help output without a warning wall before the product surface
8. Promote `doctor` from slash-only to top-level CLI entrypoint — **done**: `claw doctor` is now a local shell entrypoint with regression coverage for direct help and health-report output
9. Make machine-readable status commands actually machine-readable — **done**: `claw --output-format json status` and `claw --output-format json sandbox` now emit structured JSON snapshots instead of prose tables
10. Unify legacy config/skill namespaces in user-facing output — **done**: skills/help JSON/text output now present `.claw` as the canonical namespace and collapse legacy roots behind `.claw`-shaped source ids/labels
11. Honor JSON output on inventory commands like `skills` and `mcp` — **done**: direct CLI inventory commands now honor `--output-format json` with structured payloads for both skills and MCP inventory
12. Audit `--output-format` contract across the whole CLI surface — **done**: direct CLI commands now honor deterministic JSON/text handling across help/version/status/sandbox/agents/mcp/skills/bootstrap-plan/system-prompt/init/doctor, with regression coverage in `output_format_contract.rs` and resumed `/status` JSON coverage

**P1 — Next (integration wiring, unblocks verification)**
2. Add cross-module integration tests — **done**: 12 integration tests covering worker→recovery→policy, stale_branch→policy, green_contract→policy, reconciliation flows
3. Wire lane-completion emitter — **done**: `lane_completion` module with `detect_lane_completion()` auto-sets `LaneContext::completed` from session-finished + tests-green + push-complete → policy closeout
4. Wire `SummaryCompressor` into the lane event pipeline — **done**: `compress_summary_text()` feeds into `LaneEvent::Finished` detail field in `tools/src/lib.rs`

**P2 — Clawability hardening (original backlog)**
5. Worker readiness handshake + trust resolution — **done**: `WorkerStatus` state machine with `Spawning` → `TrustRequired` → `ReadyForPrompt` → `PromptAccepted` → `Running` lifecycle, `trust_auto_resolve` + `trust_gate_cleared` gating
6. Prompt misdelivery detection and recovery — **done**: `prompt_delivery_attempts` counter, `PromptMisdelivery` event detection, `auto_recover_prompt_misdelivery` + `replay_prompt` recovery arm
7. Canonical lane event schema in clawhip — **done**: `LaneEvent` enum with `Started/Blocked/Failed/Finished` variants, `LaneEvent::new()` typed constructor, `tools/src/lib.rs` integration
8. Failure taxonomy + blocker normalization — **done**: `WorkerFailureKind` enum (`TrustGate/PromptDelivery/Protocol/Provider`), `FailureScenario::from_worker_failure_kind()` bridge to recovery recipes
9. Stale-branch detection before workspace tests — **done**: `stale_branch.rs` module with freshness detection, behind/ahead metrics, policy integration
10. MCP structured degraded-startup reporting — **done**: `McpManager` degraded-startup reporting (+183 lines in `mcp_stdio.rs`), failed server classification (startup/handshake/config/partial), structured `failed_servers` + `recovery_recommendations` in tool output
11. Structured task packet format — **done**: `task_packet.rs` module with `TaskPacket` struct, validation, serialization, `TaskScope` resolution (workspace/module/single-file/custom), integrated into `tools/src/lib.rs`
12. Lane board / machine-readable status API — **done**: Lane completion hardening + `LaneContext::completed` auto-detection + MCP degraded reporting surface machine-readable state
13. **Session completion failure classification** — **done**: `WorkerFailureKind::Provider` + `observe_completion()` + recovery recipe bridge landed
14. **Config merge validation gap** — **done**: `config.rs` hook validation before deep-merge (+56 lines), malformed entries fail with source-path context instead of merged parse errors
15. **MCP manager discovery flaky test** — **done**: `manager_discovery_report_keeps_healthy_servers_when_one_server_fails` now runs as a normal workspace test again after repeated stable passes, so degraded-startup coverage is no longer hidden behind `#[ignore]`

16. **Commit provenance / worktree-aware push events** — **done**: `LaneCommitProvenance` now carries branch/worktree/canonical-commit/supersession metadata in lane events, and `dedupe_superseded_commit_events()` is applied before agent manifests are written so superseded commit events collapse to the latest canonical lineage
17. **Orphaned module integration audit** — **done**: `runtime` now keeps `session_control` and `trust_resolver` behind `#[cfg(test)]` until they are wired into a real non-test execution path, so normal builds no longer advertise dead clawability surface area.
18. **Context-window preflight gap** — **done**: provider request sizing now emits `context_window_blocked` before oversized requests leave the process, using a model-context registry instead of the old naive max-token heuristic.
19. **Subcommand help falls through into runtime/API path** — **done**: `claw doctor --help`, `claw status --help`, `claw sandbox --help`, and nested `mcp`/`skills` help are now intercepted locally without runtime/provider startup, with regression tests covering the direct CLI paths.
20. **Session state classification gap (working vs blocked vs finished vs truly stale)** — **done**: agent manifests now derive machine states such as `working`, `blocked_background_job`, `blocked_merge_conflict`, `degraded_mcp`, `interrupted_transport`, `finished_pending_report`, and `finished_cleanable`, and terminal-state persistence records commit provenance plus derived state so downstream monitoring can distinguish quiet progress from truly idle sessions.
21. **Resumed `/status` JSON parity gap** — **done**: resolved by the broader "Resumed local-command JSON parity gap" work tracked as #26 below. Re-verified on `main` HEAD `8dc6580` — `cargo test --release -p rusty-claude-cli resumed_status_command_emits_structured_json_when_requested` passes cleanly (1 passed, 0 failed), so resumed `/status --output-format json` now goes through the same structured renderer as the fresh CLI path. The original failure (`expected value at line 1 column 1` because resumed dispatch fell back to prose) no longer reproduces.
22. **Opaque failure surface for session/runtime crashes** — **done**: `safe_failure_class()` in `error.rs` classifies all API errors into 8 user-safe classes (`provider_auth`, `provider_internal`, `provider_retry_exhausted`, `provider_rate_limit`, `provider_transport`, `provider_error`, `context_window`, `runtime_io`). `format_user_visible_api_error` in `main.rs` attaches session ID + request trace ID to every user-visible error. Coverage in `opaque_provider_wrapper_surfaces_failure_class_session_and_trace` and 3 related tests.
23. **`doctor --output-format json` check-level structure gap** — **done**: `claw doctor --output-format json` now keeps the human-readable `message`/`report` while also emitting structured per-check diagnostics (`name`, `status`, `summary`, `details`, plus typed fields like workspace paths and sandbox fallback data), with regression coverage in `output_format_contract.rs`.
24. **Plugin lifecycle init/shutdown test flakes under workspace-parallel execution** — dogfooding surfaced that `build_runtime_runs_plugin_lifecycle_init_and_shutdown` can fail under `cargo test --workspace` while passing in isolation because sibling tests race on tempdir-backed shell init script paths. This is test brittleness rather than a code-path regression, but it still destabilizes CI confidence and wastes diagnosis cycles. **Action:** isolate temp resources per test robustly (unique dirs + no shared cwd assumptions), audit cleanup timing, and add a regression guard so the plugin lifecycle test remains stable under parallel workspace execution.
25. **`plugins::hooks::collects_and_runs_hooks_from_enabled_plugins` flaked on Linux CI, root cause was a stdin-write race not missing exec bit** — **done at `172a2ad` on 2026-04-08**. Dogfooding reproduced this four times on `main` (CI runs [24120271422](https://github.com/ultraworkers/claw-code/actions/runs/24120271422), [24120538408](https://github.com/ultraworkers/claw-code/actions/runs/24120538408), [24121392171](https://github.com/ultraworkers/claw-code/actions/runs/24121392171), [24121776826](https://github.com/ultraworkers/claw-code/actions/runs/24121776826)), escalating from first-attempt-flake to deterministic-red on the third push. Failure mode was `PostToolUse hook .../hooks/post.sh failed to start for "Read": Broken pipe (os error 32)` surfacing from `HookRunResult`. **Initial diagnosis was wrong.** The first theory (documented in earlier revisions of this entry and in the root-cause note on commit `79da4b8`) was that `write_hook_plugin` in `rust/crates/plugins/src/hooks.rs` was writing the generated `.sh` files without the execute bit and `Command::new(path).spawn()` was racing on fork/exec. An initial chmod-only fix at `4f7b674` was shipped against that theory and **still failed CI on run `24121776826`** with the same `Broken pipe` symptom, falsifying the chmod-only hypothesis. **Actual root cause.** `CommandWithStdin::output_with_stdin` in `rust/crates/plugins/src/hooks.rs` was unconditionally propagating `write_all` errors on the child's stdin pipe, including `std::io::ErrorKind::BrokenPipe`. The test hook scripts run in microseconds (`#!/bin/sh` + a single `printf`), so the child exits and closes its stdin before the parent finishes writing the ~200-byte JSON hook payload. On Linux the pipe raises `EPIPE` immediately; on macOS the pipe happens to buffer the small payload before the child exits, which is why the race only surfaced on ubuntu CI runners. The parent's `write_all` returned `Err(BrokenPipe)`, `output_with_stdin` returned that as a hook failure, and `run_command` classified the hook as "failed to start" even though the child had already run to completion and printed the expected message to stdout. **Fix (commit `172a2ad`, force-pushed over `4f7b674`).** Three parts: (1) **actual fix** — `output_with_stdin` now matches the `write_all` result and swallows `BrokenPipe` specifically, while propagating all other write errors unchanged; after a `BrokenPipe` swallow the code still calls `wait_with_output()` so stdout/stderr/exit code are still captured from the cleanly-exited child. (2) **hygiene hardening** — a new `make_executable` helper sets mode `0o755` on each generated `.sh` via `std::os::unix::fs::PermissionsExt` under `#[cfg(unix)]`. This is defense-in-depth for future non-sh hook runners, not the bug that was biting CI. (3) **regression guard** — new `generated_hook_scripts_are_executable` test under `#[cfg(unix)]` asserts each generated `.sh` file has at least one execute bit set (`mode & 0o111 != 0`) so future tweaks cannot silently regress the hygiene change. **Verification.** `cargo test --release -p plugins` 35 passing, fmt clean, clippy `-D warnings` clean; CI run [24121999385](https://github.com/ultraworkers/claw-code/actions/runs/24121999385) went green on first attempt on `main` for the hotfix commit. **Meta-lesson.** `Broken pipe (os error 32)` from a child-process spawn path is ambiguous between "could not exec" and "exec'd and exited before the parent finished writing stdin." The first theory cargo-culted the "could not exec" reading because the ROADMAP scaffolding anchored on the exec-bit guess; falsification came from empirical CI, not from code inspection. Record the pattern: when a pipe error surfaces on fork/exec, instrument what `wait_with_output()` actually reports on the child before attributing the failure to a permissions or path issue.
26. **Resumed local-command JSON parity gap** — **done**: direct `claw --output-format json` already had structured renderers for `sandbox`, `mcp`, `skills`, `version`, and `init`, but resumed `claw --output-format json --resume <session> /…` paths still fell back to prose because resumed slash dispatch only emitted JSON for `/status`. Resumed `/sandbox`, `/mcp`, `/skills`, `/version`, and `/init` now reuse the same JSON envelopes as their direct CLI counterparts, with regression coverage in `rust/crates/rusty-claude-cli/tests/resume_slash_commands.rs` and `rust/crates/rusty-claude-cli/tests/output_format_contract.rs`.
27. **`dev/rust` `cargo test -p rusty-claude-cli` reads host `~/.claude/plugins/installed/` from real `$HOME` and fails parse-time on any half-installed user plugin** — dogfooding on 2026-04-08 (filed from gaebal-gajae's clawhip bullet at message `1491322807026454579` after the provider-matrix branch QA surfaced it) reproduced 11 deterministic failures on clean `dev/rust` HEAD of the form `panicked at crates/rusty-claude-cli/src/main.rs:3953:31: args should parse: "hook path \`/Users/yeongyu/.claude/plugins/installed/sample-hooks-bundled/./hooks/pre.sh\` does not exist; hook path \`...\post.sh\` does not exist"` covering `parses_prompt_subcommand`, `parses_permission_mode_flag`, `defaults_to_repl_when_no_args`, `parses_resume_flag_with_slash_command`, `parses_system_prompt_options`, `parses_bare_prompt_and_json_output_flag`, `rejects_unknown_allowed_tools`, `parses_resume_flag_with_multiple_slash_commands`, `resolves_model_aliases_in_args`, `parses_allowed_tools_flags_with_aliases_and_lists`, `parses_login_and_logout_subcommands`. **Same failures do NOT reproduce on `main`** (re-verified with `cargo test --release -p rusty-claude-cli` against `main` HEAD `79da4b8`, all 156 tests pass). **Root cause is two-layered.** First, on `dev/rust` `parse_args` eagerly walks user-installed plugin manifests under `~/.claude/plugins/installed/` and validates that every declared hook script exists on disk before returning a `CliAction`, so any half-installed plugin in the developer's real `$HOME` (in this case `~/.claude/plugins/installed/sample-hooks-bundled/` whose `.claude-plugin` manifest references `./hooks/pre.sh` and `./hooks/post.sh` but whose `hooks/` subdirectory was deleted) makes argv parsing itself fail. Second, the test harness on `dev/rust` does not redirect `$HOME` or `XDG_CONFIG_HOME` to a fixture for the duration of the test — there is no `env_lock`-style guard equivalent to the one `main` already uses (`grep -n env_lock rust/crates/rusty-claude-cli/src/main.rs` returns 0 hits on `dev/rust` and 30+ hits on `main`). Together those two gaps mean `dev/rust` `cargo test -p rusty-claude-cli` is non-deterministic on every clean clone whose owner happens to have any non-pristine plugin in `~/.claude/`. **Action (two parts).** (a) Backport the `env_lock`-based test isolation pattern from `main` into `dev/rust`'s `rusty-claude-cli` test module so each test runs against a temp `$HOME`/`XDG_CONFIG_HOME` and cannot read host plugin state. (b) Decouple `parse_args` from filesystem hook validation on `dev/rust` (the same decoupling already on `main`, where hook validation happens later in the lifecycle than argv parsing) so even outside tests a partially installed user plugin cannot break basic CLI invocation. **Branch scope.** This is a `dev/rust` catchup against `main`, not a `main` regression. Tracking it here so the dev/rust merge train picks it up before the next dev/rust release rather than rediscovering it in CI.

28. **Auth-provider truth: error copy fails real users at the env-var-vs-header layer** — dogfooded live on 2026-04-08 in #claw-code (Sisyphus Labs guild), two separate new users hit adjacent failure modes within minutes of each other that both trace back to the same root: the `MissingApiKey` / 401 error surface does not teach users how the auth inputs map to HTTP semantics, so a user who sets a "reasonable-looking" env var still hits a hard error with no signpost. **Case 1 (varleg, Norway).** Wanted to use OpenRouter via the OpenAI-compat path. Found a comparison table claiming "provider-agnostic (Claude, OpenAI, local models)" and assumed it Just Worked. Set `OPENAI_API_KEY` to an OpenRouter `sk-or-v1-...` key and a model name without an `openai/` prefix; claw's provider detection fell through to Anthropic first because `ANTHROPIC_API_KEY` was still in the environment. Unsetting `ANTHROPIC_API_KEY` got them `ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY is not set` instead of a useful hint that the OpenAI path was right there. Fix delivered live as a channel reply: use `main` branch (not `dev/rust`), export `OPENAI_BASE_URL=https://openrouter.ai/api/v1` alongside `OPENAI_API_KEY`, and prefix the model name with `openai/` so the prefix router wins over env-var presence. **Case 2 (stanley078852).** Had set `ANTHROPIC_AUTH_TOKEN="sk-ant-..."` and was getting 401 `Invalid bearer token` from Anthropic. Root cause: `sk-ant-` keys are `x-api-key`-header keys, not bearer tokens. `ANTHROPIC_API_KEY` path in `anthropic.rs` sends the value as `x-api-key`; `ANTHROPIC_AUTH_TOKEN` path sends it as `Authorization: Bearer` (for OAuth access tokens from `claw login`). Setting an `sk-ant-` key in the wrong env var makes claw send it as `Bearer sk-ant-...` which Anthropic rejects at the edge with 401 before it ever reaches the completions endpoint. The error text propagated all the way to the user (`api returned 401 Unauthorized (authentication_error) ... Invalid bearer token`) with zero signal that the problem was env-var choice, not key validity. Fix delivered live as a channel reply: move the `sk-ant-...` key to `ANTHROPIC_API_KEY` and unset `ANTHROPIC_AUTH_TOKEN`. **Pattern.** Both cases are failures at the *auth-intent translation* layer: the user chose an env var that made syntactic sense to them (`OPENAI_API_KEY` for OpenAI, `ANTHROPIC_AUTH_TOKEN` for Anthropic auth) but the actual wire-format routing requires a more specific choice. The error messages surface the HTTP-layer symptom (401, missing-key) without bridging back to "which env var should you have used and why." **Action.** Three concrete improvements, scoped for a single `main`-side PR: (a) In `ApiError::MissingCredentials` Display, when the Anthropic path is the one being reported but `OPENAI_API_KEY`, `XAI_API_KEY`, or `DASHSCOPE_API_KEY` are present in the environment, extend the message with "— but I see `$OTHER_KEY` set; if you meant to use that provider, prefix your model name with `openai/`, `grok`, or `qwen/` respectively so prefix routing selects it." (b) In the 401-from-Anthropic error path in `anthropic.rs`, when the failing auth source is `BearerToken` AND the bearer token starts with `sk-ant-`, append "— looks like you put an `sk-ant-*` API key in `ANTHROPIC_AUTH_TOKEN`, which is the Bearer-header path. Move it to `ANTHROPIC_API_KEY` instead (that env var maps to `x-api-key`, which is the correct header for `sk-ant-*` keys)." Same treatment for OAuth access tokens landing in `ANTHROPIC_API_KEY` (symmetric mis-assignment). (c) In `rust/README.md` on `main` and the matrix section on `dev/rust`, add a short "Which env var goes where" paragraph mapping `sk-ant-*` → `ANTHROPIC_API_KEY` and OAuth access token → `ANTHROPIC_AUTH_TOKEN`, with the one-line explanation of `x-api-key` vs `Authorization: Bearer`. **Verification path.** Both improvements can be tested with unit tests against `ApiError::fmt` output (the prefix-routing hint) and with a targeted integration test that feeds an `sk-ant-*`-shaped token into `BearerToken` and asserts the fmt output surfac
29. **CLI provider dispatch is hardcoded to Anthropic, ignoring prefix routing** — **done at `8dc6580` on 2026-04-08**. Changed `AnthropicRuntimeClient.client` from concrete `AnthropicClient` to `ApiProviderClient` (the api crate’s `ProviderClient` enum), which dispatches to Anthropic / xAI / OpenAi at construction time based on `detect_provider_kind(&resolved_model)`. 1 file, +59 −7, all 182 rusty-claude-cli tests pass, CI green at run `24125825431`. Users can now run `claw --model openai/gpt-4.1-mini prompt "hello"` with only `OPENAI_API_KEY` set and it routes correctly. **Original filing below for the trace record.** Dogfooded live on 2026-04-08 within hours of ROADMAP #28 landing. Users in #claw-code (nicma at `1491342350960562277`, Jengro at `1491345009021030533`) followed the exact "use main, set OPENAI_API_KEY and OPENAI_BASE_URL, unset ANTHROPIC_*, prefix the model with `openai/`" checklist from the #28 error-copy improvements AND STILL hit `error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API`. **Reproduction on `main` HEAD `ff1df4c`:** `unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN; export OPENAI_API_KEY=sk-...; export OPENAI_BASE_URL=https://api.openai.com/v1; claw --model openai/gpt-4 prompt 'test'` → reproduces the error deterministically. **Root cause (traced).** `rust/crates/rusty-claude-cli/src/main.rs` at `build_runtime_with_plugin_state` (line ~6221) unconditionally builds `AnthropicRuntimeClient::new(session_id, model, ...)` without consulting `providers::detect_provider_kind(&model)`. `BuiltRuntime` at line ~2855 is statically typed as `ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor>`, so even if the dispatch logic existed there would be nowhere to slot an alternative client. `providers/mod.rs::metadata_for_model` correctly identifies `openai/gpt-4` as `ProviderKind::OpenAi` at the metadata layer — the routing decision is *computed* correctly, it's just *never used* to pick a runtime client. The result is that the CLI is structurally single-provider (Anthropic only) even though the `api` crate's `openai_compat.rs`, `XAI_ENV_VARS`, `DASHSCOPE_ENV_VARS`, and `send_message_streaming` all exist and are exercised by unit tests inside the `api` crate. The provider matrix in `rust/README.md` is misleading because it describes the api-crate capabilities, not the CLI's actual dispatch behaviour. **Why #28 didn't catch this.** ROADMAP #28 focused on the `MissingCredentials` error *message* (adding hints when adjacent provider env vars are set, or when a bearer token starts with `sk-ant-*`). None of its tests exercised the `build_runtime` code path — they were all unit tests against `ApiError::fmt` output. The routing bug survives #28 because the `Display` improvements fire AFTER the hardcoded Anthropic client has already been constructed and failed. You need the CLI to dispatch to a different client in the first place for the new hints to even surface at the right moment. **Action (single focused commit).** (1) New `OpenAiCompatRuntimeClient` struct in `rust/crates/rusty-claude-cli/src/main.rs` mirroring `AnthropicRuntimeClient` but delegating to `openai_compat::send_message_streaming`. One client type handles OpenAI, xAI, DashScope, and any OpenAI-compat endpoint — they differ only in base URL and auth env var, both of which come from the `ProviderMetadata` returned by `metadata_for_model`. (2) New enum `DynamicApiClient { Anthropic(AnthropicRuntimeClient), OpenAiCompat(OpenAiCompatRuntimeClient) }` that implements `runtime::ApiClient` by matching on the variant and delegating. (3) Retype `BuiltRuntime` from `ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor>` to `ConversationRuntime<DynamicApiClient, CliToolExecutor>`, update the Deref/DerefMut/new spots. (4) In `build_runtime_with_plugin_state`, call `detect_provider_kind(&model)` and construct either variant of `DynamicApiClient`. Prefix routing wins over env-var presence (that's the whole point). (5) Integration test using a mock OpenAI-compat server (re

41. **Phantom completions root cause: global session store has no per-worktree isolation** —

    **Root cause.** The session store under `~/.local/share/opencode` is global to the host. Every `opencode serve` instance — including the parallel lane workers spawned per worktree — reads and writes the same on-disk session directory. Sessions are keyed only by id and timestamp, not by the workspace they were created in, so there is no structural barrier between a session created in worktree `/tmp/b4-phantom-diag` and one created in `/tmp/b4-omc-flat`. Whichever serve instance picks up a given session id can drive it from whatever CWD that serve happens to be running in.

    **Impact.** Parallel lanes silently cross wires. A lane reports a clean run — file edits, builds, tests — and the orchestrator marks the lane green, but the writes were applied against another worktree's CWD because a sibling `opencode serve` won the session race. The originating worktree shows no diff, the *other* worktree gains unexplained edits, and downstream consumers (clawhip lane events, PR pushes, merge gates) treat the empty originator as a successful no-op. These are the "phantom completions" we keep chasing: success messaging without any landed changes in the lane that claimed them, plus stray edits in unrelated lanes whose own runs never touched those files. Because the report path is happy, retries and recovery recipes never fire, so the lane silently wedges until a human notices the diff is empty.

    **Proposed fix.** Bind every session to its workspace root + branch at creation time and refuse to drive it from any other CWD.

    - At session creation, capture the canonical workspace root (resolved git worktree path) and the active branch and persist them on the session record.
    - On every load (`opencode serve`, slash-command resume, lane recovery), validate that the current process CWD matches the persisted workspace root before any tool with side effects (file_ops, bash, git) is allowed to run. Mismatches surface as a typed `WorkspaceMismatch` failure class instead of silently writing to the wrong tree.
    - Namespace the on-disk session path under the workspace fingerprint (e.g. `<session_store>/<workspace_hash>/<session_id>`) so two parallel `opencode serve` instances physically cannot collide on the same session id.
    - Forks inherit the parent's workspace root by default; an explicit re-bind is required to move a session to a new worktree, and that re-bind is itself recorded as a structured event so the orchestrator can audit cross-worktree handoffs.
    - Surface a `branch.workspace_mismatch` lane event so clawhip stops counting wrong-CWD writes as lane completions.

    **Status.** A `workspace_root` field has been added to `Session` in `rust/crates/runtime/src/session.rs` (with builder, accessor, JSON + JSONL round-trip, fork inheritance, and given/when/then test coverage in `persists_workspace_root_round_trip_and_forks_inherit_it`). The CWD validation, the namespaced on-disk path, and the `branch.workspace_mismatch` lane event are still outstanding and tracked under this item.

**P3 — Swarm efficiency**
13. Swarm branch-lock protocol — **done**: `branch_lock::detect_branch_lock_collisions()` now detects same-branch/same-scope and nested-module collisions before parallel lanes drift into duplicate implementation
14. Commit provenance / worktree-aware push events — **done**: lane event provenance now includes branch/worktree/superseded/canonical lineage metadata, and manifest persistence de-dupes superseded commit events before downstream consumers render them

## Suggested Session Split

### Session A — worker boot protocol
Focus:
- trust prompt detection
- ready-for-prompt handshake
- prompt misdelivery detection

### Session B — clawhip lane events
Focus:
- canonical lane event schema
- failure taxonomy
- summary compression

### Session C — branch/test intelligence
Focus:
- stale-branch detection
- green-level contract
- recovery recipes

### Session D — MCP lifecycle hardening
Focus:
- startup/handshake reliability
- structured failed server reporting
- degraded-mode runtime behavior
- lifecycle tests/harness coverage

### Session E — typed task packets + policy engine
Focus:
- structured task format
- retry/merge/escalation rules
- autonomous lane closure behavior

## MVP Success Criteria

We should consider claw-code materially more clawable when:
- a claw can start a worker and know with certainty when it is ready
- claws no longer accidentally type tasks into the shell
- stale-branch failures are identified before they waste debugging time
- clawhip reports machine states, not just tmux prose
- MCP/plugin startup failures are classified and surfaced cleanly
- a coding lane can self-recover from common startup and branch issues without human babysitting

## Short Version

claw-code should evolve from:
- a CLI a human can also drive

to:
- a **claw-native execution runtime**
- an **event-native orchestration substrate**
- a **plugin/hook-first autonomous coding harness**

## Deployment Architecture Gap (filed from dogfood 2026-04-08)

### WorkerState is in the runtime; /state is NOT in opencode serve

**Root cause discovered during batch 8 dogfood.**

`worker_boot.rs` has a solid `WorkerStatus` state machine (`Spawning → TrustRequired → ReadyForPrompt → Running → Finished/Failed`). It is exported from `runtime/src/lib.rs` as a public API. But claw-code is a **plugin** loaded inside the `opencode` binary — it cannot add HTTP routes to `opencode serve`. The HTTP server is 100% owned by the upstream opencode process (v1.3.15).

**Impact:** There is no way to `curl localhost:4710/state` and get back a JSON `WorkerStatus`. Any such endpoint would require either:
1. Upstreaming a `/state` route into opencode's HTTP server (requires a PR to sst/opencode), or
2. Writing a sidecar HTTP process that queries the `WorkerRegistry` in-process (possible but fragile), or
3. Writing `WorkerStatus` to a well-known file path (`.claw/worker-state.json`) that an external observer can poll.

**Recommended path:** Option 3 — emit `WorkerStatus` transitions to `.claw/worker-state.json` on every state change. This is purely within claw-code's plugin scope, requires no upstream changes, and gives clawhip a file it can poll to distinguish a truly stalled worker from a quiet-but-progressing one.

**Action item:** Wire `WorkerRegistry::transition()` to atomically write `.claw/worker-state.json` on every state transition. Add a `claw state` CLI subcommand that reads and prints this file. Add regression test.

**Prior session note:** A previous session summary claimed commit `0984cca` landed a `/state` HTTP endpoint via axum. This was incorrect — no such commit exists on main, axum is not a dependency, and the HTTP server is not ours. The actual work that exists: `worker_boot.rs` with `WorkerStatus` enum + `WorkerRegistry`, fully wired into `runtime/src/lib.rs` as public exports.

## Startup Friction Gap: No Default trusted_roots in Settings (filed 2026-04-08)

### Every lane starts with manual trust babysitting unless caller explicitly passes roots

**Root cause discovered during direct dogfood of WorkerCreate tool.**

`WorkerCreate` accepts a `trusted_roots: Vec<String>` parameter. If the caller omits it (or passes `[]`), every new worker immediately enters `TrustRequired` and stalls — requiring manual intervention to advance to `ReadyForPrompt`. There is no mechanism to configure a default allowlist in `settings.json` or `.claw/settings.json`.

**Impact:** Batch tooling (clawhip, lane orchestrators) must pass `trusted_roots` explicitly on every `WorkerCreate` call. If a batch script forgets the field, all workers in that batch stall silently at `trust_required`. This was the root cause of several "batch 8 lanes not advancing" incidents.

**Recommended fix:**
1. Add a `trusted_roots` field to `RuntimeConfig` (or a nested `[trust]` table), loaded via `ConfigLoader`.
2. In `WorkerRegistry::spawn_worker()`, merge config-level `trusted_roots` with any per-call overrides.
3. Default: empty list (safest). Users opt in by adding their repo paths to settings.
4. Update `config_validate` schema with the new field.

**Action item:** Wire `RuntimeConfig::trusted_roots()` → `WorkerRegistry::spawn_worker()` default. Cover with test: config with `trusted_roots = ["/tmp"]` → spawning worker in `/tmp/x` auto-resolves trust without caller passing the field.

## Observability Transport Decision (filed 2026-04-08)

### Canonical state surface: CLI/file-based. HTTP endpoint deferred.

**Decision:** `claw state` reading `.claw/worker-state.json` is the **blessed observability contract** for clawhip and downstream tooling. This is not a stepping-stone — it is the supported surface. Build against it.

**Rationale:**
- claw-code is a plugin running inside the opencode binary. It cannot add HTTP routes to `opencode serve` — that server belongs to upstream sst/opencode.
- The file-based surface is fully within plugin scope: `emit_state_file()` in `worker_boot.rs` writes atomically on every `WorkerStatus` transition.
- `claw state --output-format json` gives clawhip everything it needs: `status`, `is_ready`, `seconds_since_update`, `trust_gate_cleared`, `last_event`, `updated_at`.
- Polling a local file has lower latency and fewer failure modes than an HTTP round-trip to a sidecar.
- An HTTP state endpoint would require either (a) upstreaming a route to sst/opencode — a multi-week PR cycle with no guarantee of acceptance — or (b) a sidecar process that queries `WorkerRegistry` in-process, which is fragile and adds an extra failure domain.

**What downstream tooling (clawhip) should do:**
1. After `WorkerCreate`, poll `.claw/worker-state.json` (or run `claw state --output-format json`) in the worker's CWD at whatever interval makes sense (e.g. 5s).
2. Trust `seconds_since_update > 60` in `trust_required` status as the stall signal.
3. Call `WorkerResolveTrust` tool to unblock, or `WorkerRestart` to reset.

**HTTP endpoint tracking:** Not scheduled. If a concrete use case emerges that file polling cannot serve (e.g. remote workers over a network boundary), open a new issue to upstream a `/worker/state` route to sst/opencode at that time. Until then: file/CLI is canonical.

## Provider Routing: Model-Name Prefix Must Win Over Env-Var Presence (fixed 2026-04-08, `0530c50`)

### `openai/gpt-4.1-mini` was silently misrouted to Anthropic when ANTHROPIC_API_KEY was set

**Root cause:** `metadata_for_model` returned `None` for any model not matching `claude` or `grok` prefix.
`detect_provider_kind` then fell through to auth-sniffer order: first `has_auth_from_env_or_saved()` (Anthropic), then `OPENAI_API_KEY`, then `XAI_API_KEY`.

If `ANTHROPIC_API_KEY` was present in the environment (e.g. user has both Anthropic and OpenRouter configured), any unknown model — including explicitly namespaced ones like `openai/gpt-4.1-mini` — was silently routed to the Anthropic client, which then failed with `missing Anthropic credentials` or a confusing 402/auth error rather than routing to OpenAI-compatible.

**Fix:** Added explicit prefix checks in `metadata_for_model`:
- `openai/` prefix → `ProviderKind::OpenAi`
- `gpt-` prefix → `ProviderKind::OpenAi`

Model name prefix now wins unconditionally over env-var presence. Regression test locked in: `providers::tests::openai_namespaced_model_routes_to_openai_not_anthropic`.

**Lesson:** Auth-sniffer fallback order is fragile. Any new provider added in the future should be registered in `metadata_for_model` via a model-name prefix, not left to env-var order. This is the canonical extension point.
-												docs: add clawable harness roadmap

											
										
										
											2026-04-03 14:46:06 +00:00
+								# ROADMAP.md
 								# Clawable Coding Harness Roadmap
 								## Goal
 								Turn claw-code into the most **clawable** coding harness:
 								- no human-first terminal assumptions
 								- no fragile prompt injection timing
 								- no opaque session state
 								- no hidden plugin or MCP failures
 								- no manual babysitting for routine recovery
 								This roadmap assumes the primary users are **claws wired through hooks, plugins, sessions, and channel events**.
 								## Definition of "clawable"
 								A clawable harness is:
 								- deterministic to start
 								- machine-readable in state and failure modes
 								- recoverable without a human watching the terminal
 								- branch/test/worktree aware
 								- plugin/MCP lifecycle aware
 								- event-first, not log-first
 								- capable of autonomous next-step execution
 								## Current Pain Points
 								### 1. Session boot is fragile
 								- trust prompts can block TUI startup
 								- prompts can land in the shell instead of the coding agent
 								- "session exists" does not mean "session is ready"
 								### 2. Truth is split across layers
 								- tmux state
 								- clawhip event stream
 								- git/worktree state
 								- test state
 								- gateway/plugin/MCP runtime state
 								### 3. Events are too log-shaped
 								- claws currently infer too much from noisy text
 								- important states are not normalized into machine-readable events
 								### 4. Recovery loops are too manual
 								- restart worker
 								- accept trust prompt
 								- re-inject prompt
 								- detect stale branch
 								- retry failed startup
 								- classify infra vs code failures manually
 								### 5. Branch freshness is not enforced enough
 								- side branches can miss already-landed main fixes
 								- broad test failures can be stale-branch noise instead of real regressions
 								### 6. Plugin/MCP failures are under-classified
 								- startup failures, handshake failures, config errors, partial startup, and degraded mode are not exposed cleanly enough
 								### 7. Human UX still leaks into claw workflows
 								- too much depends on terminal/TUI behavior instead of explicit agent state transitions and control APIs
 								## Product Principles
 . **State machine first** — every worker has explicit lifecycle states.
 . **Events over scraped prose** — channel output should be derived from typed events.
 . **Recovery before escalation** — known failure modes should auto-heal once before asking for help.
 . **Branch freshness before blame** — detect stale branches before treating red tests as new regressions.
 . **Partial success is first-class** — e.g. MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting.
 . **Terminal is transport, not truth** — tmux/TUI may remain implementation details, but orchestration state must live above them.
 . **Policy is executable** — merge, retry, rebase, stale cleanup, and escalation rules should be machine-enforced.
 								## Roadmap
 								## Phase 1 — Reliable Worker Boot
 								### 1. Ready-handshake lifecycle for coding workers
 								Add explicit states:
 								- `spawning`
 								- `trust_required`
 								- `ready_for_prompt`
 								- `prompt_accepted`
 								- `running`
 								- `blocked`
 								- `finished`
 								- `failed`
 								Acceptance:
 								- prompts are never sent before `ready_for_prompt`
 								- trust prompt state is detectable and emitted
 								- shell misdelivery becomes detectable as a first-class failure state
 								### 2. Trust prompt resolver
 								Add allowlisted auto-trust behavior for known repos/worktrees.
 								Acceptance:
 								- trusted repos auto-clear trust prompts
 								- events emitted for `trust_required` and `trust_resolved`
 								- non-allowlisted repos remain gated
 								### 3. Structured session control API
 								Provide machine control above tmux:
 								- create worker
 								- await ready
 								- send task
 								- fetch state
 								- fetch last error
 								- restart worker
 								- terminate worker
 								Acceptance:
 								- a claw can operate a coding worker without raw send-keys as the primary control plane
 								## Phase 2 — Event-Native Clawhip Integration
 								### 4. Canonical lane event schema
 								Define typed events such as:
 								- `lane.started`
 								- `lane.ready`
 								- `lane.prompt_misdelivery`
 								- `lane.blocked`
 								- `lane.red`
 								- `lane.green`
 								- `lane.commit.created`
 								- `lane.pr.opened`
 								- `lane.merge.ready`
 								- `lane.finished`
 								- `lane.failed`
 								- `branch.stale_against_main`
 								Acceptance:
 								- clawhip consumes typed lane events
 								- Discord summaries are rendered from structured events instead of pane scraping alone
 								### 5. Failure taxonomy
 								Normalize failure classes:
 								- `prompt_delivery`
 								- `trust_gate`
 								- `branch_divergence`
 								- `compile`
 								- `test`
 								- `plugin_startup`
 								- `mcp_startup`
 								- `mcp_handshake`
 								- `gateway_routing`
 								- `tool_runtime`
 								- `infra`
 								Acceptance:
 								- blockers are machine-classified
 								- dashboards and retry policies can branch on failure type
 								### 6. Actionable summary compression
 								Collapse noisy event streams into:
 								- current phase
 								- last successful checkpoint
 								- current blocker
 								- recommended next recovery action
 								Acceptance:
 								- channel status updates stay short and machine-grounded
 								- claws stop inferring state from raw build spam
 								## Phase 3 — Branch/Test Awareness and Auto-Recovery
 								### 7. Stale-branch detection before broad verification
 								Before broad test runs, compare current branch to `main` and detect if known fixes are missing.
 								Acceptance:
 								- emit `branch.stale_against_main`
 								- suggest or auto-run rebase/merge-forward according to policy
 								- avoid misclassifying stale-branch failures as new regressions
 								### 8. Recovery recipes for common failures
 								Encode known automatic recoveries for:
 								- trust prompt unresolved
 								- prompt delivered to shell
 								- stale branch
 								- compile red after cross-crate refactor
 								- MCP startup handshake failure
 								- partial plugin startup
 								Acceptance:
 								- one automatic recovery attempt occurs before escalation
 								- the attempted recovery is itself emitted as structured event data
 								### 9. Green-ness contract
 								Workers should distinguish:
 								- targeted tests green
 								- package green
 								- workspace green
 								- merge-ready green
 								Acceptance:
 								- no more ambiguous "tests passed" messaging
 								- merge policy can require the correct green level for the lane type
-												docs(roadmap): record cascade-masking pinpoint under green-ness contract (#9)

Concrete follow-up captured from today's dogfood session:

A single hung test (oversized-request preflight, 6 minutes per attempt
after `be561bf` silently swallowed count_tokens errors) crashed the
`cargo test --workspace` job before downstream crates could run, hiding
6 separate pre-existing CLI regressions until `8c6dfe5` + `5851f2d`
restored the fast-fail path.

Two new acceptance criteria for #9:
- per-test timeouts in CI so one hang cannot mask other failures
- distinguish `test.hung` from generic test failures in worker reports

											
										
										
											2026-04-08 15:03:30 +09:00
+								- a single hung test must not mask other failures: enforce per-test
 								  timeouts in CI (`cargo test --workspace`) so a 6-minute hang in one
 								  crate cannot prevent downstream crates from running their suites
 								- when a CI job fails because of a hang, the worker must report it as
 								  `test.hung` rather than a generic failure, so triage doesn't conflate
 								  it with a normal `assertion failed`
 								- recorded pinpoint (2026-04-08): `be561bf` swapped the local
 								  byte-estimate preflight for a `count_tokens` round-trip and silently
 								  returned `Ok(())` on any error, so `send_message_blocks_oversized_*`
 								  hung for ~6 minutes per attempt; the resulting workspace job crash
 								  hid 6 *separate* pre-existing CLI regressions (compact flag
 								  discarded, piped stdin vs permission prompter, legacy session layout,
 								  help/prompt assertions, mock harness count) that only became
 								  diagnosable after `8c6dfe5` + `5851f2d` restored the fast-fail path
-												docs: add clawable harness roadmap

											
										
										
											2026-04-03 14:46:06 +00:00
 								## Phase 4 — Claws-First Task Execution
 								### 10. Typed task packet format
 								Define a structured task packet with fields like:
 								- objective
 								- scope
 								- repo/worktree
 								- branch policy
 								- acceptance tests
 								- commit policy
 								- reporting contract
 								- escalation policy
 								Acceptance:
 								- claws can dispatch work without relying on long natural-language prompt blobs alone
 								- task packets can be logged, retried, and transformed safely
 								### 11. Policy engine for autonomous coding
 								Encode automation rules such as:
 								- if green + scoped diff + review passed -> merge to dev
 								- if stale branch -> merge-forward before broad tests
 								- if startup blocked -> recover once, then escalate
 								- if lane completed -> emit closeout and cleanup session
 								Acceptance:
 								- doctrine moves from chat instructions into executable rules
 								### 12. Claw-native dashboards / lane board
 								Expose a machine-readable board of:
 								- repos
 								- active claws
 								- worktrees
 								- branch freshness
 								- red/green state
 								- current blocker
 								- merge readiness
 								- last meaningful event
 								Acceptance:
 								- claws can query status directly
 								- human-facing views become a rendering layer, not the source of truth
 								## Phase 5 — Plugin and MCP Lifecycle Maturity
 								### 13. First-class plugin/MCP lifecycle contract
 								Each plugin/MCP integration should expose:
 								- config validation contract
 								- startup healthcheck
 								- discovery result
 								- degraded-mode behavior
 								- shutdown/cleanup contract
 								Acceptance:
 								- partial-startup and per-server failures are reported structurally
 								- successful servers remain usable even when one server fails
 								### 14. MCP end-to-end lifecycle parity
 								Close gaps from:
 								- config load
 								- server registration
 								- spawn/connect
 								- initialize handshake
 								- tool/resource discovery
 								- invocation path
 								- error surfacing
 								- shutdown/cleanup
 								Acceptance:
 								- parity harness and runtime tests cover healthy and degraded startup cases
 								- broken servers are surfaced as structured failures, not opaque warnings
 								## Immediate Backlog (from current real pain)
-												docs(roadmap): prioritize backlog — P0/P1/P2/P3 ordering with wiring items first

											
										
										
											2026-04-04 04:31:38 +09:00
+								Priority order: P0 = blocks CI/green state, P1 = blocks integration wiring, P2 = clawability hardening, P3 = swarm-efficiency improvements.
 								**P0 — Fix first (CI reliability)**
-												Close the clawability backlog with deterministic CLI output and lane lineage

Finish the remaining roadmap work by making direct CLI JSON output deterministic across the non-interactive surface, restoring the degraded-startup MCP test as a real workspace test, and adding branch-lock plus commit-lineage primitives so downstream lane consumers can distinguish superseded worktree commits from canonical lineage.

Constraint: Keep the user-facing config namespace centered on .claw while preserving legacy fallback discovery for compatibility
Constraint: Verification needed to stay clean-room and reproducible from the checked-in workspace alone
Rejected: Leave the output-format contract implied by ad-hoc smoke runs only | too easy for direct CLI regressions to slip back into prose-only output
Rejected: Keep commit provenance as free-form detail text | downstream consumers need structured branch/worktree/supersession metadata
Confidence: medium
Scope-risk: moderate
Directive: Extend the JSON contract through the same direct CLI entrypoints instead of adding one-off serializers on parallel code paths
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test --workspace
Tested: cd rust && cargo clippy -p commands -p tools -p rusty-claude-cli --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still reports unrelated pre-existing runtime lint debt outside this change set

											
										
										
											2026-04-05 18:40:33 +00:00
+. Isolate `render_diff_report` tests into tmpdir — **done**: `render_diff_report_for()` tests run in temp git repos instead of the live working tree, and targeted `cargo test -p rusty-claude-cli render_diff_report -- --nocapture` now stays green during branch/worktree activity
 . Expand GitHub CI from single-crate coverage to workspace-grade verification — **done**: `.github/workflows/rust-ci.yml` now runs `cargo test --workspace` plus fmt/clippy at the workspace level
 . Add release-grade binary workflow — **done**: `.github/workflows/release.yml` now builds tagged Rust release artifacts for the CLI
 . Add container-first test/run docs — **done**: `Containerfile` + `docs/container.md` document the canonical Docker/Podman workflow for build, bind-mount, and `cargo test --workspace` usage
 . Surface `doctor` / preflight diagnostics in onboarding docs and help — **done**: README + USAGE now put `claw doctor` / `/doctor` in the first-run path and point at the built-in preflight report
 . Automate branding/source-of-truth residue checks in CI — **done**: `.github/scripts/check_doc_source_of_truth.py` and the `doc-source-of-truth` CI job now block stale repo/org/invite residue in tracked docs and metadata
 . Eliminate warning spam from first-run help/build path — **done**: current `cargo run -q -p rusty-claude-cli -- --help` renders clean help output without a warning wall before the product surface
 . Promote `doctor` from slash-only to top-level CLI entrypoint — **done**: `claw doctor` is now a local shell entrypoint with regression coverage for direct help and health-report output
 . Make machine-readable status commands actually machine-readable — **done**: `claw --output-format json status` and `claw --output-format json sandbox` now emit structured JSON snapshots instead of prose tables
 . Unify legacy config/skill namespaces in user-facing output — **done**: skills/help JSON/text output now present `.claw` as the canonical namespace and collapse legacy roots behind `.claw`-shaped source ids/labels
 . Honor JSON output on inventory commands like `skills` and `mcp` — **done**: direct CLI inventory commands now honor `--output-format json` with structured payloads for both skills and MCP inventory
 . Audit `--output-format` contract across the whole CLI surface — **done**: direct CLI commands now honor deterministic JSON/text handling across help/version/status/sandbox/agents/mcp/skills/bootstrap-plan/system-prompt/init/doctor, with regression coverage in `output_format_contract.rs` and resumed `/status` JSON coverage
-												docs(roadmap): prioritize backlog — P0/P1/P2/P3 ordering with wiring items first

											
										
										
											2026-04-04 04:31:38 +09:00
 								**P1 — Next (integration wiring, unblocks verification)**
-												docs(ROADMAP): mark P1.2 and P1.4 as done

- P1.2: Cross-module integration tests — 12 tests landed
- P1.4: SummaryCompressor wiring — compress_summary_text() feeds
  into LaneEvent::Finished detail field

Both verified in codebase. P1.3 (lane-completion emitter) remains open.

											
										
										
											2026-04-04 21:38:05 +09:00
+. Add cross-module integration tests — **done**: 12 integration tests covering worker→recovery→policy, stale_branch→policy, green_contract→policy, reconciliation flows
-												feat(tools): add lane_completion module (P1.3)

Implement automatic lane completion detection:
- detect_lane_completion(): checks session-finished + tests-green + pushed
- evaluate_completed_lane(): triggers CloseoutLane + CleanupSession actions
- 6 tests covering all conditions

Bridges the gap where LaneContext::completed was a passive bool
that nothing automatically set. Now completion is auto-detected.

ROADMAP P1.3 marked done.

											
										
										
											2026-04-04 22:05:49 +09:00
+. Wire lane-completion emitter — **done**: `lane_completion` module with `detect_lane_completion()` auto-sets `LaneContext::completed` from session-finished + tests-green + push-complete → policy closeout
-												docs(ROADMAP): mark P1.2 and P1.4 as done

- P1.2: Cross-module integration tests — 12 tests landed
- P1.4: SummaryCompressor wiring — compress_summary_text() feeds
  into LaneEvent::Finished detail field

Both verified in codebase. P1.3 (lane-completion emitter) remains open.

											
										
										
											2026-04-04 21:38:05 +09:00
+. Wire `SummaryCompressor` into the lane event pipeline — **done**: `compress_summary_text()` feeds into `LaneEvent::Finished` detail field in `tools/src/lib.rs`
-												docs(roadmap): prioritize backlog — P0/P1/P2/P3 ordering with wiring items first

											
										
										
											2026-04-04 04:31:38 +09:00
 								**P2 — Clawability hardening (original backlog)**
-												docs: mark P2.5 and P2.6 complete in ROADMAP

Worker boot recovery hardening landed:
- P2.5: Worker readiness handshake + trust resolution (state machine)
- P2.6: Prompt misdelivery detection and recovery (replay arm)

[source: direct_development]

											
										
										
											2026-04-04 23:51:48 +09:00
+. Worker readiness handshake + trust resolution — **done**: `WorkerStatus` state machine with `Spawning` → `TrustRequired` → `ReadyForPrompt` → `PromptAccepted` → `Running` lifecycle, `trust_auto_resolve` + `trust_gate_cleared` gating
 . Prompt misdelivery detection and recovery — **done**: `prompt_delivery_attempts` counter, `PromptMisdelivery` event detection, `auto_recover_prompt_misdelivery` + `replay_prompt` recovery arm
-												docs: mark P2 backlog items complete in ROADMAP

Updated ROADMAP to reflect shipped P2 items:
- P2.7: Canonical lane event schema in clawhip
- P2.8: Failure taxonomy + blocker normalization
- P2.9: Stale-branch detection before workspace tests
- P2.10: MCP structured degraded-startup reporting
- P2.12: Lane board / machine-readable status API

Remaining P2: P2.11 (task packets - in progress), P2.14 (config merge), P2.15 (flaky test)

											
										
										
											2026-04-04 23:52:11 +09:00
+. Canonical lane event schema in clawhip — **done**: `LaneEvent` enum with `Started/Blocked/Failed/Finished` variants, `LaneEvent::new()` typed constructor, `tools/src/lib.rs` integration
 . Failure taxonomy + blocker normalization — **done**: `WorkerFailureKind` enum (`TrustGate/PromptDelivery/Protocol/Provider`), `FailureScenario::from_worker_failure_kind()` bridge to recovery recipes
 . Stale-branch detection before workspace tests — **done**: `stale_branch.rs` module with freshness detection, behind/ahead metrics, policy integration
 . MCP structured degraded-startup reporting — **done**: `McpManager` degraded-startup reporting (+183 lines in `mcp_stdio.rs`), failed server classification (startup/handshake/config/partial), structured `failed_servers` + `recovery_recommendations` in tool output
-												docs: mark P2.11 complete in ROADMAP

Structured task packet format shipped at dbfc9d5:
- TaskPacket struct with validation and serialization
- TaskScope resolution (workspace/module/single-file/custom)
- Integration into tools/src/lib.rs
- task_registry.rs coordination for runtime task tracking

											
										
										
											2026-04-05 00:11:58 +09:00
+. Structured task packet format — **done**: `task_packet.rs` module with `TaskPacket` struct, validation, serialization, `TaskScope` resolution (workspace/module/single-file/custom), integrated into `tools/src/lib.rs`
-												docs: mark P2 backlog items complete in ROADMAP

Updated ROADMAP to reflect shipped P2 items:
- P2.7: Canonical lane event schema in clawhip
- P2.8: Failure taxonomy + blocker normalization
- P2.9: Stale-branch detection before workspace tests
- P2.10: MCP structured degraded-startup reporting
- P2.12: Lane board / machine-readable status API

Remaining P2: P2.11 (task packets - in progress), P2.14 (config merge), P2.15 (flaky test)

											
										
										
											2026-04-04 23:52:11 +09:00
+. Lane board / machine-readable status API — **done**: Lane completion hardening + `LaneContext::completed` auto-detection + MCP degraded reporting surface machine-readable state
-												docs(ROADMAP): update P2 backlog with completion status and new gap

- P2.13: Mark session completion failure classification as done
  (WorkerFailureKind::Provider + observe_completion() + recovery bridge)
- P2.14: Add config merge validation gap (active bug being fixed in
  clawcode-issue-9507-claw-help-hooks-merge lane)

The config merge bug: deep_merge_objects() can produce non-string
values in hooks arrays, which fail validation in optional_string_array()
at claw --help time with 'field PreToolUse must contain only strings'.

											
										
										
											2026-04-04 21:33:01 +09:00
+. **Session completion failure classification** — **done**: `WorkerFailureKind::Provider` + `observe_completion()` + recovery recipe bridge landed
-												docs: mark P2.14 complete in ROADMAP

Config merge validation gap fixed at 5bee22b:
- Hook validation before deep-merge in config.rs
- Source-path context for malformed entries
- Prevents non-string hook arrays from poisoning runtime

											
										
										
											2026-04-05 00:16:07 +09:00
+. **Config merge validation gap** — **done**: `config.rs` hook validation before deep-merge (+56 lines), malformed entries fail with source-path context instead of merged parse errors
-												Close the clawability backlog with deterministic CLI output and lane lineage

Finish the remaining roadmap work by making direct CLI JSON output deterministic across the non-interactive surface, restoring the degraded-startup MCP test as a real workspace test, and adding branch-lock plus commit-lineage primitives so downstream lane consumers can distinguish superseded worktree commits from canonical lineage.

Constraint: Keep the user-facing config namespace centered on .claw while preserving legacy fallback discovery for compatibility
Constraint: Verification needed to stay clean-room and reproducible from the checked-in workspace alone
Rejected: Leave the output-format contract implied by ad-hoc smoke runs only | too easy for direct CLI regressions to slip back into prose-only output
Rejected: Keep commit provenance as free-form detail text | downstream consumers need structured branch/worktree/supersession metadata
Confidence: medium
Scope-risk: moderate
Directive: Extend the JSON contract through the same direct CLI entrypoints instead of adding one-off serializers on parallel code paths
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test --workspace
Tested: cd rust && cargo clippy -p commands -p tools -p rusty-claude-cli --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still reports unrelated pre-existing runtime lint debt outside this change set

											
										
										
											2026-04-05 18:40:33 +00:00
+. **MCP manager discovery flaky test** — **done**: `manager_discovery_report_keeps_healthy_servers_when_one_server_fails` now runs as a normal workspace test again after repeated stable passes, so degraded-startup coverage is no longer hidden behind `#[ignore]`
-												docs: add P2.16 orphaned module integration audit pinpoint

session_control is pub exported but has zero consumers workspace-wide.
trust_resolver types are re-exported but never instantiated outside
unit tests. These implement core clawability contracts that are
structurally dead — built but not wired into the actual execution path.

											
										
										
											2026-04-05 04:33:59 +00:00
-												Close the clawability backlog with deterministic CLI output and lane lineage

Finish the remaining roadmap work by making direct CLI JSON output deterministic across the non-interactive surface, restoring the degraded-startup MCP test as a real workspace test, and adding branch-lock plus commit-lineage primitives so downstream lane consumers can distinguish superseded worktree commits from canonical lineage.

Constraint: Keep the user-facing config namespace centered on .claw while preserving legacy fallback discovery for compatibility
Constraint: Verification needed to stay clean-room and reproducible from the checked-in workspace alone
Rejected: Leave the output-format contract implied by ad-hoc smoke runs only | too easy for direct CLI regressions to slip back into prose-only output
Rejected: Keep commit provenance as free-form detail text | downstream consumers need structured branch/worktree/supersession metadata
Confidence: medium
Scope-risk: moderate
Directive: Extend the JSON contract through the same direct CLI entrypoints instead of adding one-off serializers on parallel code paths
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test --workspace
Tested: cd rust && cargo clippy -p commands -p tools -p rusty-claude-cli --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still reports unrelated pre-existing runtime lint debt outside this change set

											
										
										
											2026-04-05 18:40:33 +00:00
+. **Commit provenance / worktree-aware push events** — **done**: `LaneCommitProvenance` now carries branch/worktree/canonical-commit/supersession metadata in lane events, and `dedupe_superseded_commit_events()` is applied before agent manifests are written so superseded commit events collapse to the latest canonical lineage
 . **Orphaned module integration audit** — **done**: `runtime` now keeps `session_control` and `trust_resolver` behind `#[cfg(test)]` until they are wired into a real non-test execution path, so normal builds no longer advertise dead clawability surface area.
 . **Context-window preflight gap** — **done**: provider request sizing now emits `context_window_blocked` before oversized requests leave the process, using a model-context registry instead of the old naive max-token heuristic.
 . **Subcommand help falls through into runtime/API path** — **done**: `claw doctor --help`, `claw status --help`, `claw sandbox --help`, and nested `mcp`/`skills` help are now intercepted locally without runtime/provider startup, with regression tests covering the direct CLI paths.
-												Classify quiet agent states before they look stale

Persist derived machine states for agent manifests so downstream monitors can distinguish working, blocked, degraded, and finished-cleanable lanes without inferring everything from prose. This also records commit provenance in terminal-state manifests and marks the new session-state classification roadmap item as done.

Constraint: Keep the change scoped to manifest persistence and tests without introducing a new monitoring service layer
Rejected: Leave state classification as downstream text scraping only | repeated dogfood runs showed quiet/finished lanes being misreported as stale
Confidence: medium
Scope-risk: narrow
Directive: Reuse derived_state + commit provenance from manifests before adding any new stale-session heuristics elsewhere
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test -q -p tools
Tested: cd rust && cargo clippy -p tools --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still fails on unrelated pre-existing runtime lint debt

											
										
										
											2026-04-05 18:46:53 +00:00
+. **Session state classification gap (working vs blocked vs finished vs truly stale)** — **done**: agent manifests now derive machine states such as `working`, `blocked_background_job`, `blocked_merge_conflict`, `degraded_mcp`, `interrupted_transport`, `finished_pending_report`, and `finished_cleanable`, and terminal-state persistence records commit provenance plus derived state so downstream monitoring can distinguish quiet progress from truly idle sessions.
-												docs(roadmap): mark #21 and #29 as done

#21 (Resumed /status JSON parity gap): resolved by the broader
Resumed local-command JSON parity gap work tracked as #26. Re-verified
on main HEAD 8dc6580 — the regression test passes.

#29 (CLI provider dispatch hardcoded to Anthropic): landed at 8dc6580.
ApiProviderClient dispatch now routes correctly based on
detect_provider_kind. Original filing preserved as trace record.

											
										
										
											2026-04-08 17:43:47 +09:00
+. **Resumed `/status` JSON parity gap** — **done**: resolved by the broader "Resumed local-command JSON parity gap" work tracked as #26 below. Re-verified on `main` HEAD `8dc6580` — `cargo test --release -p rusty-claude-cli resumed_status_command_emits_structured_json_when_requested` passes cleanly (1 passed, 0 failed), so resumed `/status --output-format json` now goes through the same structured renderer as the fresh CLI path. The original failure (`expected value at line 1 column 1` because resumed dispatch fell back to prose) no longer reproduces.
-												style: cargo fmt — fix CI formatting failures

Pre-existing formatting issues in anthropic.rs surfaced by CI cargo fmt check.
No functional changes.

											
										
										
											2026-04-08 11:21:13 +09:00
+. **Opaque failure surface for session/runtime crashes** — **done**: `safe_failure_class()` in `error.rs` classifies all API errors into 8 user-safe classes (`provider_auth`, `provider_internal`, `provider_retry_exhausted`, `provider_rate_limit`, `provider_transport`, `provider_error`, `context_window`, `runtime_io`). `format_user_visible_api_error` in `main.rs` attaches session ID + request trace ID to every user-visible error. Coverage in `opaque_provider_wrapper_surfaces_failure_class_session_and_trace` and 3 related tests.
-												Emit structured doctor JSON diagnostics

											
										
										
											2026-04-06 01:42:46 +00:00
+. **`doctor --output-format json` check-level structure gap** — **done**: `claw doctor --output-format json` now keeps the human-readable `message`/`report` while also emitting structured per-check diagnostics (`name`, `status`, `summary`, `details`, plus typed fields like workspace paths and sandbox fallback data), with regression coverage in `output_format_contract.rs`.
-												docs: record plugin lifecycle test flake

											
										
										
											2026-04-06 01:15:30 +00:00
+. **Plugin lifecycle init/shutdown test flakes under workspace-parallel execution** — dogfooding surfaced that `build_runtime_runs_plugin_lifecycle_init_and_shutdown` can fail under `cargo test --workspace` while passing in isolation because sibling tests race on tempdir-backed shell init script paths. This is test brittleness rather than a code-path regression, but it still destabilizes CI confidence and wastes diagnosis cycles. **Action:** isolate temp resources per test robustly (unique dirs + no shared cwd assumptions), audit cleanup timing, and add a regression guard so the plugin lifecycle test remains stable under parallel workspace execution.
-												docs(roadmap): correct #25 root cause — BrokenPipe tolerance, not chmod

The original ROADMAP #25 entry claimed the root cause was missing
exec bits on generated hook scripts. That was wrong — a chmod-only
fix (4f7b674) still failed CI. The actual bug was output_with_stdin
unconditionally propagating BrokenPipe from write_all when the child
exits before the parent finishes writing stdin.

Updated per gaebal-gajae's direction: actual fix, hygiene hardening,
and regression guard are now clearly separated. Added a meta-lesson
about Broken pipe ambiguity in fork/exec paths so future investigators
don't cargo-cult the same wrong first theory.

											
										
										
											2026-04-08 15:53:26 +09:00
+. **`plugins::hooks::collects_and_runs_hooks_from_enabled_plugins` flaked on Linux CI, root cause was a stdin-write race not missing exec bit** — **done at `172a2ad` on 2026-04-08**. Dogfooding reproduced this four times on `main` (CI runs [24120271422](https://github.com/ultraworkers/claw-code/actions/runs/24120271422), [24120538408](https://github.com/ultraworkers/claw-code/actions/runs/24120538408), [24121392171](https://github.com/ultraworkers/claw-code/actions/runs/24121392171), [24121776826](https://github.com/ultraworkers/claw-code/actions/runs/24121776826)), escalating from first-attempt-flake to deterministic-red on the third push. Failure mode was `PostToolUse hook .../hooks/post.sh failed to start for "Read": Broken pipe (os error 32)` surfacing from `HookRunResult`. **Initial diagnosis was wrong.** The first theory (documented in earlier revisions of this entry and in the root-cause note on commit `79da4b8`) was that `write_hook_plugin` in `rust/crates/plugins/src/hooks.rs` was writing the generated `.sh` files without the execute bit and `Command::new(path).spawn()` was racing on fork/exec. An initial chmod-only fix at `4f7b674` was shipped against that theory and **still failed CI on run `24121776826`** with the same `Broken pipe` symptom, falsifying the chmod-only hypothesis. **Actual root cause.** `CommandWithStdin::output_with_stdin` in `rust/crates/plugins/src/hooks.rs` was unconditionally propagating `write_all` errors on the child's stdin pipe, including `std::io::ErrorKind::BrokenPipe`. The test hook scripts run in microseconds (`#!/bin/sh` + a single `printf`), so the child exits and closes its stdin before the parent finishes writing the ~200-byte JSON hook payload. On Linux the pipe raises `EPIPE` immediately; on macOS the pipe happens to buffer the small payload before the child exits, which is why the race only surfaced on ubuntu CI runners. The parent's `write_all` returned `Err(BrokenPipe)`, `output_with_stdin` returned that as a hook failure, and `run_command` classified the hook as "failed to start" even though the child had already run to completion and printed the expected message to stdout. **Fix (commit `172a2ad`, force-pushed over `4f7b674`).** Three parts: (1) **actual fix** — `output_with_stdin` now matches the `write_all` result and swallows `BrokenPipe` specifically, while propagating all other write errors unchanged; after a `BrokenPipe` swallow the code still calls `wait_with_output()` so stdout/stderr/exit code are still captured from the cleanly-exited child. (2) **hygiene hardening** — a new `make_executable` helper sets mode `0o755` on each generated `.sh` via `std::os::unix::fs::PermissionsExt` under `#[cfg(unix)]`. This is defense-in-depth for future non-sh hook runners, not the bug that was biting CI. (3) **regression guard** — new `generated_hook_scripts_are_executable` test under `#[cfg(unix)]` asserts each generated `.sh` file has at least one execute bit set (`mode & 0o111 != 0`) so future tweaks cannot silently regress the hygiene change. **Verification.** `cargo test --release -p plugins` 35 passing, fmt clean, clippy `-D warnings` clean; CI run [24121999385](https://github.com/ultraworkers/claw-code/actions/runs/24121999385) went green on first attempt on `main` for the hotfix commit. **Meta-lesson.** `Broken pipe (os error 32)` from a child-process spawn path is ambiguous between "could not exec" and "exec'd and exited before the parent finished writing stdin." The first theory cargo-culted the "could not exec" reading because the ROADMAP scaffolding anchored on the exec-bit guess; falsification came from empirical CI, not from code inspection. Record the pattern: when a pipe error surfaces on fork/exec, instrument what `wait_with_output()` actually reports on the child before attributing the failure to a permissions or path issue.
-												Clarify the resumed JSON parity audit record

The audit fix already landed, but the roadmap entry was split across two separate done items for /sandbox and inventory even though the underlying defect was one resumed-local-command JSON parity surface. Consolidating the note makes the machine-readable gap precise and keeps the backlog trail aligned with the actual fix scope.

Constraint: Preserve the existing issue ordering and backlog context around issues 23-24
Rejected: Leave the split entries as-is | obscures that one parity bug covered the same resumed JSON dispatch path
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Record future parity audits as one backlog item per underlying contract gap, not per individual command symptom
Tested: Existing green verification from HEAD remains applicable; docs-only wording update
Not-tested: No additional code-path verification required for this wording-only change

											
										
										
											2026-04-06 01:45:26 +00:00
+. **Resumed local-command JSON parity gap** — **done**: direct `claw --output-format json` already had structured renderers for `sandbox`, `mcp`, `skills`, `version`, and `init`, but resumed `claw --output-format json --resume <session> /…` paths still fell back to prose because resumed slash dispatch only emitted JSON for `/status`. Resumed `/sandbox`, `/mcp`, `/skills`, `/version`, and `/init` now reuse the same JSON envelopes as their direct CLI counterparts, with regression coverage in `rust/crates/rusty-claude-cli/tests/resume_slash_commands.rs` and `rust/crates/rusty-claude-cli/tests/output_format_contract.rs`.
-												docs(roadmap): file dev/rust plugin-validation host-home leak as backlog #27

Filing per gaebal-gajae's status summary at message 1491322807026454579
in #clawcode-building-in-public, with corrected scope after re-running
`cargo test -p rusty-claude-cli` against main HEAD (79da4b8): the 11
deterministic failures only reproduce on dev/rust, not main, so this is
a dev/rust catchup item rather than a main regression.

Two-layered root cause documented:
1. dev/rust `parse_args` eagerly validates user plugin hook scripts
   exist on disk before returning a CliAction
2. dev/rust test harness does not redirect $HOME/XDG_CONFIG_HOME to a
   fixture (no `env_lock` equivalent — main has 30+ env_lock hits, dev
   has zero)

Together they make dev/rust `cargo test -p rusty-claude-cli` fail on
any clean clone whose owner has a half-installed user plugin in
~/.claude/plugins/installed/. main has both the env_lock test isolation
AND the parse_args/hook-validation decoupling already; dev/rust is just
behind on the merge train.

Action block in #27 spells out backporting env_lock + the parse_args
decoupling so the next dev/rust release picks this up.

											
										
										
											2026-04-08 15:30:04 +09:00
+. **`dev/rust` `cargo test -p rusty-claude-cli` reads host `~/.claude/plugins/installed/` from real `$HOME` and fails parse-time on any half-installed user plugin** — dogfooding on 2026-04-08 (filed from gaebal-gajae's clawhip bullet at message `1491322807026454579` after the provider-matrix branch QA surfaced it) reproduced 11 deterministic failures on clean `dev/rust` HEAD of the form `panicked at crates/rusty-claude-cli/src/main.rs:3953:31: args should parse: "hook path \`/Users/yeongyu/.claude/plugins/installed/sample-hooks-bundled/./hooks/pre.sh\` does not exist; hook path \`...\post.sh\` does not exist"` covering `parses_prompt_subcommand`, `parses_permission_mode_flag`, `defaults_to_repl_when_no_args`, `parses_resume_flag_with_slash_command`, `parses_system_prompt_options`, `parses_bare_prompt_and_json_output_flag`, `rejects_unknown_allowed_tools`, `parses_resume_flag_with_multiple_slash_commands`, `resolves_model_aliases_in_args`, `parses_allowed_tools_flags_with_aliases_and_lists`, `parses_login_and_logout_subcommands`. **Same failures do NOT reproduce on `main`** (re-verified with `cargo test --release -p rusty-claude-cli` against `main` HEAD `79da4b8`, all 156 tests pass). **Root cause is two-layered.** First, on `dev/rust` `parse_args` eagerly walks user-installed plugin manifests under `~/.claude/plugins/installed/` and validates that every declared hook script exists on disk before returning a `CliAction`, so any half-installed plugin in the developer's real `$HOME` (in this case `~/.claude/plugins/installed/sample-hooks-bundled/` whose `.claude-plugin` manifest references `./hooks/pre.sh` and `./hooks/post.sh` but whose `hooks/` subdirectory was deleted) makes argv parsing itself fail. Second, the test harness on `dev/rust` does not redirect `$HOME` or `XDG_CONFIG_HOME` to a fixture for the duration of the test — there is no `env_lock`-style guard equivalent to the one `main` already uses (`grep -n env_lock rust/crates/rusty-claude-cli/src/main.rs` returns 0 hits on `dev/rust` and 30+ hits on `main`). Together those two gaps mean `dev/rust` `cargo test -p rusty-claude-cli` is non-deterministic on every clean clone whose owner happens to have any non-pristine plugin in `~/.claude/`. **Action (two parts).** (a) Backport the `env_lock`-based test isolation pattern from `main` into `dev/rust`'s `rusty-claude-cli` test module so each test runs against a temp `$HOME`/`XDG_CONFIG_HOME` and cannot read host plugin state. (b) Decouple `parse_args` from filesystem hook validation on `dev/rust` (the same decoupling already on `main`, where hook validation happens later in the lifecycle than argv parsing) so even outside tests a partially installed user plugin cannot break basic CLI invocation. **Branch scope.** This is a `dev/rust` catchup against `main`, not a `main` regression. Tracking it here so the dev/rust merge train picks it up before the next dev/rust release rather than rediscovering it in CI.
-												fix(runtime): document phantom completion root cause + add workspace_root to session (#41)

Global session store causes cross-worktree confusion in parallel lanes.
Added workspace_root field to session metadata and documented root cause
in ROADMAP.md.

											
										
										
											2026-04-07 14:21:53 +09:00
-												docs(roadmap): file CLI provider dispatch bug as #29, mark #28 as partial

#28 error-copy improvements landed on ff1df4c but real users (nicma,
Jengro) hit `error: missing Anthropic credentials` within hours when
using `--model openai/gpt-4` with OPENAI_API_KEY set and all
ANTHROPIC_* env vars unset on main.

Traced root cause in build_runtime_with_plugin_state at line ~6244:
AnthropicRuntimeClient::new() is hardcoded. BuiltRuntime is
statically typed as ConversationRuntime<AnthropicRuntimeClient, ...>.
providers::detect_provider_kind() computes the right routing at the
metadata layer but the runtime client is never dispatched.

Files #29 with the detailed trace + a focused action plan:
DynamicApiClient enum wrapping Anthropic + OpenAiCompat variants,
retype BuiltRuntime, dispatch in build_runtime based on
detect_provider_kind, integration test with mock OpenAI-compat
server.

#28 is marked partial — the error-copy improvements are real and
stayed in, but the routing gap they were meant to cover is the
actual bug and needs #29 to land.

											
										
										
											2026-04-08 17:01:14 +09:00
+. **Auth-provider truth: error copy fails real users at the env-var-vs-header layer** — dogfooded live on 2026-04-08 in #claw-code (Sisyphus Labs guild), two separate new users hit adjacent failure modes within minutes of each other that both trace back to the same root: the `MissingApiKey` / 401 error surface does not teach users how the auth inputs map to HTTP semantics, so a user who sets a "reasonable-looking" env var still hits a hard error with no signpost. **Case 1 (varleg, Norway).** Wanted to use OpenRouter via the OpenAI-compat path. Found a comparison table claiming "provider-agnostic (Claude, OpenAI, local models)" and assumed it Just Worked. Set `OPENAI_API_KEY` to an OpenRouter `sk-or-v1-...` key and a model name without an `openai/` prefix; claw's provider detection fell through to Anthropic first because `ANTHROPIC_API_KEY` was still in the environment. Unsetting `ANTHROPIC_API_KEY` got them `ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY is not set` instead of a useful hint that the OpenAI path was right there. Fix delivered live as a channel reply: use `main` branch (not `dev/rust`), export `OPENAI_BASE_URL=https://openrouter.ai/api/v1` alongside `OPENAI_API_KEY`, and prefix the model name with `openai/` so the prefix router wins over env-var presence. **Case 2 (stanley078852).** Had set `ANTHROPIC_AUTH_TOKEN="sk-ant-..."` and was getting 401 `Invalid bearer token` from Anthropic. Root cause: `sk-ant-` keys are `x-api-key`-header keys, not bearer tokens. `ANTHROPIC_API_KEY` path in `anthropic.rs` sends the value as `x-api-key`; `ANTHROPIC_AUTH_TOKEN` path sends it as `Authorization: Bearer` (for OAuth access tokens from `claw login`). Setting an `sk-ant-` key in the wrong env var makes claw send it as `Bearer sk-ant-...` which Anthropic rejects at the edge with 401 before it ever reaches the completions endpoint. The error text propagated all the way to the user (`api returned 401 Unauthorized (authentication_error) ... Invalid bearer token`) with zero signal that the problem was env-var choice, not key validity. Fix delivered live as a channel reply: move the `sk-ant-...` key to `ANTHROPIC_API_KEY` and unset `ANTHROPIC_AUTH_TOKEN`. **Pattern.** Both cases are failures at the *auth-intent translation* layer: the user chose an env var that made syntactic sense to them (`OPENAI_API_KEY` for OpenAI, `ANTHROPIC_AUTH_TOKEN` for Anthropic auth) but the actual wire-format routing requires a more specific choice. The error messages surface the HTTP-layer symptom (401, missing-key) without bridging back to "which env var should you have used and why." **Action.** Three concrete improvements, scoped for a single `main`-side PR: (a) In `ApiError::MissingCredentials` Display, when the Anthropic path is the one being reported but `OPENAI_API_KEY`, `XAI_API_KEY`, or `DASHSCOPE_API_KEY` are present in the environment, extend the message with "— but I see `$OTHER_KEY` set; if you meant to use that provider, prefix your model name with `openai/`, `grok`, or `qwen/` respectively so prefix routing selects it." (b) In the 401-from-Anthropic error path in `anthropic.rs`, when the failing auth source is `BearerToken` AND the bearer token starts with `sk-ant-`, append "— looks like you put an `sk-ant-*` API key in `ANTHROPIC_AUTH_TOKEN`, which is the Bearer-header path. Move it to `ANTHROPIC_API_KEY` instead (that env var maps to `x-api-key`, which is the correct header for `sk-ant-*` keys)." Same treatment for OAuth access tokens landing in `ANTHROPIC_API_KEY` (symmetric mis-assignment). (c) In `rust/README.md` on `main` and the matrix section on `dev/rust`, add a short "Which env var goes where" paragraph mapping `sk-ant-*` → `ANTHROPIC_API_KEY` and OAuth access token → `ANTHROPIC_AUTH_TOKEN`, with the one-line explanation of `x-api-key` vs `Authorization: Bearer`. **Verification path.** Both improvements can be tested with unit tests against `ApiError::fmt` output (the prefix-routing hint) and with a targeted integration test that feeds an `sk-ant-*`-shaped token into `BearerToken` and asserts the fmt output surfac
-												docs(roadmap): mark #21 and #29 as done

#21 (Resumed /status JSON parity gap): resolved by the broader
Resumed local-command JSON parity gap work tracked as #26. Re-verified
on main HEAD 8dc6580 — the regression test passes.

#29 (CLI provider dispatch hardcoded to Anthropic): landed at 8dc6580.
ApiProviderClient dispatch now routes correctly based on
detect_provider_kind. Original filing preserved as trace record.

											
										
										
											2026-04-08 17:43:47 +09:00
+. **CLI provider dispatch is hardcoded to Anthropic, ignoring prefix routing** — **done at `8dc6580` on 2026-04-08**. Changed `AnthropicRuntimeClient.client` from concrete `AnthropicClient` to `ApiProviderClient` (the api crate’s `ProviderClient` enum), which dispatches to Anthropic / xAI / OpenAi at construction time based on `detect_provider_kind(&resolved_model)`. 1 file, +59 −7, all 182 rusty-claude-cli tests pass, CI green at run `24125825431`. Users can now run `claw --model openai/gpt-4.1-mini prompt "hello"` with only `OPENAI_API_KEY` set and it routes correctly. **Original filing below for the trace record.** Dogfooded live on 2026-04-08 within hours of ROADMAP #28 landing. Users in #claw-code (nicma at `1491342350960562277`, Jengro at `1491345009021030533`) followed the exact "use main, set OPENAI_API_KEY and OPENAI_BASE_URL, unset ANTHROPIC_*, prefix the model with `openai/`" checklist from the #28 error-copy improvements AND STILL hit `error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API`. **Reproduction on `main` HEAD `ff1df4c`:** `unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN; export OPENAI_API_KEY=sk-...; export OPENAI_BASE_URL=https://api.openai.com/v1; claw --model openai/gpt-4 prompt 'test'` → reproduces the error deterministically. **Root cause (traced).** `rust/crates/rusty-claude-cli/src/main.rs` at `build_runtime_with_plugin_state` (line ~6221) unconditionally builds `AnthropicRuntimeClient::new(session_id, model, ...)` without consulting `providers::detect_provider_kind(&model)`. `BuiltRuntime` at line ~2855 is statically typed as `ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor>`, so even if the dispatch logic existed there would be nowhere to slot an alternative client. `providers/mod.rs::metadata_for_model` correctly identifies `openai/gpt-4` as `ProviderKind::OpenAi` at the metadata layer — the routing decision is *computed* correctly, it's just *never used* to pick a runtime client. The result is that the CLI is structurally single-provider (Anthropic only) even though the `api` crate's `openai_compat.rs`, `XAI_ENV_VARS`, `DASHSCOPE_ENV_VARS`, and `send_message_streaming` all exist and are exercised by unit tests inside the `api` crate. The provider matrix in `rust/README.md` is misleading because it describes the api-crate capabilities, not the CLI's actual dispatch behaviour. **Why #28 didn't catch this.** ROADMAP #28 focused on the `MissingCredentials` error *message* (adding hints when adjacent provider env vars are set, or when a bearer token starts with `sk-ant-*`). None of its tests exercised the `build_runtime` code path — they were all unit tests against `ApiError::fmt` output. The routing bug survives #28 because the `Display` improvements fire AFTER the hardcoded Anthropic client has already been constructed and failed. You need the CLI to dispatch to a different client in the first place for the new hints to even surface at the right moment. **Action (single focused commit).** (1) New `OpenAiCompatRuntimeClient` struct in `rust/crates/rusty-claude-cli/src/main.rs` mirroring `AnthropicRuntimeClient` but delegating to `openai_compat::send_message_streaming`. One client type handles OpenAI, xAI, DashScope, and any OpenAI-compat endpoint — they differ only in base URL and auth env var, both of which come from the `ProviderMetadata` returned by `metadata_for_model`. (2) New enum `DynamicApiClient { Anthropic(AnthropicRuntimeClient), OpenAiCompat(OpenAiCompatRuntimeClient) }` that implements `runtime::ApiClient` by matching on the variant and delegating. (3) Retype `BuiltRuntime` from `ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor>` to `ConversationRuntime<DynamicApiClient, CliToolExecutor>`, update the Deref/DerefMut/new spots. (4) In `build_runtime_with_plugin_state`, call `detect_provider_kind(&model)` and construct either variant of `DynamicApiClient`. Prefix routing wins over env-var presence (that's the whole point). (5) Integration test using a mock OpenAI-compat server (re
-												docs(roadmap): file auth-provider truth pinpoint as backlog #28

Filed from live #claw-code dogfood on 2026-04-08 where two real users
hit adjacent auth confusion within minutes:

- varleg set OPENAI_API_KEY for OpenRouter but prefix routing didn't
  win because the model name wasn't prefixed with openai/; unsetting
  ANTHROPIC_API_KEY then hit MissingApiKey with no hint that the
  OpenAI path was already configured
- stanley078852 put an sk-ant-* key in ANTHROPIC_AUTH_TOKEN instead
  of ANTHROPIC_API_KEY, causing claw to send it as
  Authorization: Bearer sk-ant-..., which Anthropic rejects at the
  edge with 401 Invalid bearer token

Both fixes delivered live in #claw-code as direct replies, but the
pattern is structural: the error surface doesn't bridge HTTP-layer
symptoms back to env-var choice.

Action block spells out a single main-side PR with three
improvements: (a) MissingCredentials hint when an adjacent
provider's env var is already set, (b) 401-on-Anthropic hint when
bearer token starts with sk-ant-, (c) 'which env var goes where'
paragraph in both README matrices mapping sk-ant-* -> x-api-key and
OAuth access token -> Authorization: Bearer.

All three improvements are unit-testable against ApiError::fmt
output with no HTTP calls required.

											
										
										
											2026-04-08 15:58:46 +09:00
-												fix(runtime): document phantom completion root cause + add workspace_root to session (#41)

Global session store causes cross-worktree confusion in parallel lanes.
Added workspace_root field to session metadata and documented root cause
in ROADMAP.md.

											
										
										
											2026-04-07 14:21:53 +09:00
+. **Phantom completions root cause: global session store has no per-worktree isolation** —
 								    **Root cause.** The session store under `~/.local/share/opencode` is global to the host. Every `opencode serve` instance — including the parallel lane workers spawned per worktree — reads and writes the same on-disk session directory. Sessions are keyed only by id and timestamp, not by the workspace they were created in, so there is no structural barrier between a session created in worktree `/tmp/b4-phantom-diag` and one created in `/tmp/b4-omc-flat`. Whichever serve instance picks up a given session id can drive it from whatever CWD that serve happens to be running in.
 								    **Impact.** Parallel lanes silently cross wires. A lane reports a clean run — file edits, builds, tests — and the orchestrator marks the lane green, but the writes were applied against another worktree's CWD because a sibling `opencode serve` won the session race. The originating worktree shows no diff, the *other* worktree gains unexplained edits, and downstream consumers (clawhip lane events, PR pushes, merge gates) treat the empty originator as a successful no-op. These are the "phantom completions" we keep chasing: success messaging without any landed changes in the lane that claimed them, plus stray edits in unrelated lanes whose own runs never touched those files. Because the report path is happy, retries and recovery recipes never fire, so the lane silently wedges until a human notices the diff is empty.
 								    **Proposed fix.** Bind every session to its workspace root + branch at creation time and refuse to drive it from any other CWD.
 								    - At session creation, capture the canonical workspace root (resolved git worktree path) and the active branch and persist them on the session record.
 								    - On every load (`opencode serve`, slash-command resume, lane recovery), validate that the current process CWD matches the persisted workspace root before any tool with side effects (file_ops, bash, git) is allowed to run. Mismatches surface as a typed `WorkspaceMismatch` failure class instead of silently writing to the wrong tree.
 								    - Namespace the on-disk session path under the workspace fingerprint (e.g. `<session_store>/<workspace_hash>/<session_id>`) so two parallel `opencode serve` instances physically cannot collide on the same session id.
 								    - Forks inherit the parent's workspace root by default; an explicit re-bind is required to move a session to a new worktree, and that re-bind is itself recorded as a structured event so the orchestrator can audit cross-worktree handoffs.
 								    - Surface a `branch.workspace_mismatch` lane event so clawhip stops counting wrong-CWD writes as lane completions.
 								    **Status.** A `workspace_root` field has been added to `Session` in `rust/crates/runtime/src/session.rs` (with builder, accessor, JSON + JSONL round-trip, fork inheritance, and given/when/then test coverage in `persists_workspace_root_round_trip_and_forks_inherit_it`). The CWD validation, the namespaced on-disk path, and the `branch.workspace_mismatch` lane event are still outstanding and tracked under this item.
-												docs(roadmap): prioritize backlog — P0/P1/P2/P3 ordering with wiring items first

											
										
										
											2026-04-04 04:31:38 +09:00
+								**P3 — Swarm efficiency**
-												Close the clawability backlog with deterministic CLI output and lane lineage

Finish the remaining roadmap work by making direct CLI JSON output deterministic across the non-interactive surface, restoring the degraded-startup MCP test as a real workspace test, and adding branch-lock plus commit-lineage primitives so downstream lane consumers can distinguish superseded worktree commits from canonical lineage.

Constraint: Keep the user-facing config namespace centered on .claw while preserving legacy fallback discovery for compatibility
Constraint: Verification needed to stay clean-room and reproducible from the checked-in workspace alone
Rejected: Leave the output-format contract implied by ad-hoc smoke runs only | too easy for direct CLI regressions to slip back into prose-only output
Rejected: Keep commit provenance as free-form detail text | downstream consumers need structured branch/worktree/supersession metadata
Confidence: medium
Scope-risk: moderate
Directive: Extend the JSON contract through the same direct CLI entrypoints instead of adding one-off serializers on parallel code paths
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test --workspace
Tested: cd rust && cargo clippy -p commands -p tools -p rusty-claude-cli --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still reports unrelated pre-existing runtime lint debt outside this change set

											
										
										
											2026-04-05 18:40:33 +00:00
+. Swarm branch-lock protocol — **done**: `branch_lock::detect_branch_lock_collisions()` now detects same-branch/same-scope and nested-module collisions before parallel lanes drift into duplicate implementation
 . Commit provenance / worktree-aware push events — **done**: lane event provenance now includes branch/worktree/superseded/canonical lineage metadata, and manifest persistence de-dupes superseded commit events before downstream consumers render them
-												docs: add clawable harness roadmap

											
										
										
											2026-04-03 14:46:06 +00:00
 								## Suggested Session Split
 								### Session A — worker boot protocol
 								Focus:
 								- trust prompt detection
 								- ready-for-prompt handshake
 								- prompt misdelivery detection
 								### Session B — clawhip lane events
 								Focus:
 								- canonical lane event schema
 								- failure taxonomy
 								- summary compression
 								### Session C — branch/test intelligence
 								Focus:
 								- stale-branch detection
 								- green-level contract
 								- recovery recipes
 								### Session D — MCP lifecycle hardening
 								Focus:
 								- startup/handshake reliability
 								- structured failed server reporting
 								- degraded-mode runtime behavior
 								- lifecycle tests/harness coverage
 								### Session E — typed task packets + policy engine
 								Focus:
 								- structured task format
 								- retry/merge/escalation rules
 								- autonomous lane closure behavior
 								## MVP Success Criteria
 								We should consider claw-code materially more clawable when:
 								- a claw can start a worker and know with certainty when it is ready
 								- claws no longer accidentally type tasks into the shell
 								- stale-branch failures are identified before they waste debugging time
 								- clawhip reports machine states, not just tmux prose
 								- MCP/plugin startup failures are classified and surfaced cleanly
 								- a coding lane can self-recover from common startup and branch issues without human babysitting
 								## Short Version
 								claw-code should evolve from:
 								- a CLI a human can also drive
 								to:
 								- a **claw-native execution runtime**
 								- an **event-native orchestration substrate**
 								- a **plugin/hook-first autonomous coding harness**
-												docs(roadmap): document WorkerState deployment architecture gap

WorkerStatus state machine exists in worker_boot.rs and is exported
from runtime/src/lib.rs. But claw-code is a plugin — it cannot add
HTTP routes to opencode serve (upstream binary, not ours).

/state HTTP endpoint via axum was never implemented. Prior session
summary claiming commit 0984cca was incorrect.

Recommended path: write WorkerStatus transitions to
.claw/worker-state.json on each transition (file-based observability,
no upstream changes required). Wire WorkerRegistry::transition() to
atomic file writes + add  CLI subcommand.

											
										
										
											2026-04-08 00:07:06 +09:00
 								## Deployment Architecture Gap (filed from dogfood 2026-04-08)
 								### WorkerState is in the runtime; /state is NOT in opencode serve
 								**Root cause discovered during batch 8 dogfood.**
 								`worker_boot.rs` has a solid `WorkerStatus` state machine (`Spawning → TrustRequired → ReadyForPrompt → Running → Finished/Failed`). It is exported from `runtime/src/lib.rs` as a public API. But claw-code is a **plugin** loaded inside the `opencode` binary — it cannot add HTTP routes to `opencode serve`. The HTTP server is 100% owned by the upstream opencode process (v1.3.15).
 								**Impact:** There is no way to `curl localhost:4710/state` and get back a JSON `WorkerStatus`. Any such endpoint would require either:
 . Upstreaming a `/state` route into opencode's HTTP server (requires a PR to sst/opencode), or
 . Writing a sidecar HTTP process that queries the `WorkerRegistry` in-process (possible but fragile), or
 . Writing `WorkerStatus` to a well-known file path (`.claw/worker-state.json`) that an external observer can poll.
 								**Recommended path:** Option 3 — emit `WorkerStatus` transitions to `.claw/worker-state.json` on every state change. This is purely within claw-code's plugin scope, requires no upstream changes, and gives clawhip a file it can poll to distinguish a truly stalled worker from a quiet-but-progressing one.
 								**Action item:** Wire `WorkerRegistry::transition()` to atomically write `.claw/worker-state.json` on every state transition. Add a `claw state` CLI subcommand that reads and prints this file. Add regression test.
 								**Prior session note:** A previous session summary claimed commit `0984cca` landed a `/state` HTTP endpoint via axum. This was incorrect — no such commit exists on main, axum is not a dependency, and the HTTP server is not ours. The actual work that exists: `worker_boot.rs` with `WorkerStatus` enum + `WorkerRegistry`, fully wired into `runtime/src/lib.rs` as public exports.
-												docs(roadmap): file startup-friction gap — no default trusted_roots in settings

WorkerCreate requires trusted_roots per-call; no config-level default.
Any batch that forgets the field stalls all workers at trust_required.
Root cause of several 'batch lanes not advancing' incidents.

Recommended fix: wire RuntimeConfig::trusted_roots() as default into
WorkerRegistry::spawn_worker(), with per-call overrides. Update
config_validate schema to include the new field.

											
										
										
											2026-04-08 02:02:48 +09:00
 								## Startup Friction Gap: No Default trusted_roots in Settings (filed 2026-04-08)
 								### Every lane starts with manual trust babysitting unless caller explicitly passes roots
 								**Root cause discovered during direct dogfood of WorkerCreate tool.**
 								`WorkerCreate` accepts a `trusted_roots: Vec<String>` parameter. If the caller omits it (or passes `[]`), every new worker immediately enters `TrustRequired` and stalls — requiring manual intervention to advance to `ReadyForPrompt`. There is no mechanism to configure a default allowlist in `settings.json` or `.claw/settings.json`.
 								**Impact:** Batch tooling (clawhip, lane orchestrators) must pass `trusted_roots` explicitly on every `WorkerCreate` call. If a batch script forgets the field, all workers in that batch stall silently at `trust_required`. This was the root cause of several "batch 8 lanes not advancing" incidents.
 								**Recommended fix:**
 . Add a `trusted_roots` field to `RuntimeConfig` (or a nested `[trust]` table), loaded via `ConfigLoader`.
 . In `WorkerRegistry::spawn_worker()`, merge config-level `trusted_roots` with any per-call overrides.
 . Default: empty list (safest). Users opt in by adding their repo paths to settings.
 . Update `config_validate` schema with the new field.
 								**Action item:** Wire `RuntimeConfig::trusted_roots()` → `WorkerRegistry::spawn_worker()` default. Cover with test: config with `trusted_roots = ["/tmp"]` → spawning worker in `/tmp/x` auto-resolves trust without caller passing the field.
-												docs(roadmap): settle observability transport — CLI/file is canonical, HTTP deferred

Closes the ambiguity gaebal-gajae flagged: downstream tooling was left
guessing which integration surface to build against.

Decision: claw state + .claw/worker-state.json is the blessed contract.
HTTP endpoint not scheduled. Rationale documented:
- plugin scope constraint (can't add routes to opencode serve)
- file polling has lower latency and fewer failure modes than HTTP
- HTTP would require upstreaming to sst/opencode or a fragile sidecar

Clawhip integration contract documented:
- poll .claw/worker-state.json after WorkerCreate
- seconds_since_update > 60 in trust_required = stall signal
- WorkerResolveTrust to unblock, WorkerRestart to reset

											
										
										
											2026-04-08 03:34:31 +09:00
 								## Observability Transport Decision (filed 2026-04-08)
 								### Canonical state surface: CLI/file-based. HTTP endpoint deferred.
 								**Decision:** `claw state` reading `.claw/worker-state.json` is the **blessed observability contract** for clawhip and downstream tooling. This is not a stepping-stone — it is the supported surface. Build against it.
 								**Rationale:**
 								- claw-code is a plugin running inside the opencode binary. It cannot add HTTP routes to `opencode serve` — that server belongs to upstream sst/opencode.
 								- The file-based surface is fully within plugin scope: `emit_state_file()` in `worker_boot.rs` writes atomically on every `WorkerStatus` transition.
 								- `claw state --output-format json` gives clawhip everything it needs: `status`, `is_ready`, `seconds_since_update`, `trust_gate_cleared`, `last_event`, `updated_at`.
 								- Polling a local file has lower latency and fewer failure modes than an HTTP round-trip to a sidecar.
 								- An HTTP state endpoint would require either (a) upstreaming a route to sst/opencode — a multi-week PR cycle with no guarantee of acceptance — or (b) a sidecar process that queries `WorkerRegistry` in-process, which is fragile and adds an extra failure domain.
 								**What downstream tooling (clawhip) should do:**
 . After `WorkerCreate`, poll `.claw/worker-state.json` (or run `claw state --output-format json`) in the worker's CWD at whatever interval makes sense (e.g. 5s).
 . Trust `seconds_since_update > 60` in `trust_required` status as the stall signal.
 . Call `WorkerResolveTrust` tool to unblock, or `WorkerRestart` to reset.
 								**HTTP endpoint tracking:** Not scheduled. If a concrete use case emerges that file polling cannot serve (e.g. remote workers over a network boundary), open a new issue to upstream a `/worker/state` route to sst/opencode at that time. Until then: file/CLI is canonical.
-												docs(roadmap): document provider routing fix and auth-sniffer fragility lesson

Filed: openai/ prefix model misrouting (fixed in 0530c50).
Documents root cause, fix, and the architectural lesson:
  - metadata_for_model is the canonical extension point for new providers
  - auth-sniffer fallback order must never override explicit model-name prefix
  - regression test locked in to guard this invariant

											
										
										
											2026-04-08 05:35:12 +09:00
 								## Provider Routing: Model-Name Prefix Must Win Over Env-Var Presence (fixed 2026-04-08, `0530c50`)
 								### `openai/gpt-4.1-mini` was silently misrouted to Anthropic when ANTHROPIC_API_KEY was set
 								**Root cause:** `metadata_for_model` returned `None` for any model not matching `claude` or `grok` prefix.
 								`detect_provider_kind` then fell through to auth-sniffer order: first `has_auth_from_env_or_saved()` (Anthropic), then `OPENAI_API_KEY`, then `XAI_API_KEY`.
 								If `ANTHROPIC_API_KEY` was present in the environment (e.g. user has both Anthropic and OpenRouter configured), any unknown model — including explicitly namespaced ones like `openai/gpt-4.1-mini` — was silently routed to the Anthropic client, which then failed with `missing Anthropic credentials` or a confusing 402/auth error rather than routing to OpenAI-compatible.
 								**Fix:** Added explicit prefix checks in `metadata_for_model`:
 								- `openai/` prefix → `ProviderKind::OpenAi`
 								- `gpt-` prefix → `ProviderKind::OpenAi`
 								Model name prefix now wins unconditionally over env-var presence. Regression test locked in: `providers::tests::openai_namespaced_model_routes_to_openai_not_anthropic`.
 								**Lesson:** Auth-sniffer fallback order is fragile. Any new provider added in the future should be registered in `metadata_for_model` via a model-name prefix, not left to env-var order. This is the canonical extension point.