# CLAUDE.md — Python Reference Implementation

**This file guides work on `src/` and `tests/`, the Python reference harness for the claw-code protocol.**

The production CLI lives in `rust/`; this directory (`src/`, `tests/`, `.py` files) is a **protocol validation and dogfood surface**.

## What this Python harness does

**Machine-first orchestration layer** — proves that the claw-code JSON protocol is:
- Deterministic and recoverable (every output is reproducible)
- Self-describing (SCHEMAS.md documents every field)
- Clawable (external agents can build ONE error handler for all commands)

## Stack
- **Language:** Python 3.13+
- **Dependencies:** minimal (no frameworks; standard library plus attrs/dataclasses)
- **Test runner:** pytest
- **Protocol contract:** SCHEMAS.md (machine-readable JSON envelope)

## Quick start

```bash
# 1. Create and activate a virtual environment (skip if one is already active)
python3 -m venv .venv && source .venv/bin/activate
# (dependencies minimal; standard library mostly)

# 2. Run tests
python3 -m pytest tests/ -q

# 3. Try a command
python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
```

## Verification workflow

```bash
# Unit tests (fast)
python3 -m pytest tests/ -q 2>&1 | tail -3

# Type checking (optional but recommended)
python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
```

## Repository shape

- **`src/`** — Python reference harness implementing SCHEMAS.md protocol
  - `main.py` — CLI entry point; all 14 clawable commands
  - `query_engine.py` — core TurnResult / QueryEngineConfig
  - `runtime.py` — PortRuntime; turn loop + cancellation (#164 Stage A/B)
  - `session_store.py` — session persistence
  - `transcript.py` — turn transcript assembly
  - `commands.py`, `tools.py` — simulated command/tool trees
  - `models.py` — PermissionDenial, UsageSummary, etc.

- **`tests/`** — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
  - `test_cli_parity_audit.py` — proves all 14 clawable commands accept --output-format
  - `test_json_envelope_field_consistency.py` — validates SCHEMAS.md contract
  - `test_cancel_observed_field.py` — #164 Stage B: cancellation observability + safe-to-reuse semantics
  - `test_run_turn_loop_*.py` — turn loop behavior (timeout, cancellation, continuation, permissions)
  - `test_submit_message_*.py` — budget, cancellation contracts
  - `test_*_cli.py` — command-specific JSON output validation

- **`SCHEMAS.md`** — canonical JSON contract
  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
  - Error envelope shape
  - Not-found envelope shape
  - Per-command success schemas (14 commands documented)
  - Turn Result fields (including cancel_observed as of #164 Stage B)

- **`.gitignore`** — excludes `.port_sessions/` (dogfood-run state)

## Key concepts

### Clawable surface (14 commands)

Every clawable command **must**:
1. Accept `--output-format {text,json}`
2. Return JSON envelopes matching SCHEMAS.md
3. Use common fields (timestamp, command, exit_code, output_format, schema_version)
4. Exit 0 on success, 1 on error/not-found, 2 on timeout

**Commands:** list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop

**Validation:** `test_cli_parity_audit.py` auto-tests all 14 for --output-format acceptance.
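
As a rough illustration of rule 3, the check below looks for the five common fields in an envelope. It is a minimal sketch, not the harness's own validator: `missing_common_fields` and the sample envelope are hypothetical, and SCHEMAS.md remains the authority on field names and types.

```python
import json

# Common fields as listed in this document; SCHEMAS.md is the authority.
COMMON_FIELDS = ("timestamp", "command", "exit_code", "output_format", "schema_version")


def missing_common_fields(raw_stdout: str) -> list[str]:
    """Return the common fields absent from a JSON envelope, in declared order."""
    envelope = json.loads(raw_stdout)
    return [name for name in COMMON_FIELDS if name not in envelope]


# Hypothetical envelope for illustration only; the values are made up
# and schema_version is deliberately left out.
sample = json.dumps({
    "timestamp": "2026-04-22T10:53:12Z",
    "command": "list-sessions",
    "exit_code": 0,
    "output_format": "json",
})
print(missing_common_fields(sample))  # ['schema_version']
```

An empty list means the envelope carries every common field; per-command fields still need their own checks.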

### OPT_OUT surfaces (12 commands)

Explicitly exempt from --output-format requirement (for now):
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
- List commands with query filters: subsystems, commands, tools
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode

**Future work:** audit OPT_OUT surfaces for JSON promotion (post-#164).

### Protocol layers

- **Coverage (#167–#170):** All clawable commands emit JSON
- **Enforcement (#171):** Parity CI prevents new commands skipping JSON
- **Documentation (#172):** SCHEMAS.md locks field contract
- **Alignment (#173):** Test framework validates docs ↔ code match
- **Field evolution (#164 Stage B):** cancel_observed proves protocol extensibility

## Testing & coverage

### Run full suite
```bash
python3 -m pytest tests/ -q
```

### Run one test file
```bash
python3 -m pytest tests/test_cancel_observed_field.py -v
```

### Run one test
```bash
python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
```

### Check coverage (optional)
```bash
python3 -m pip install coverage  # if not already installed
python3 -m coverage run -m pytest tests/
python3 -m coverage report --skip-covered
```

Target: >90% line coverage for src/ (currently ~85%).

## Common workflows

### Add a new clawable command

1. Add parser in `main.py` (argparse)
2. Add `--output-format` flag
3. Emit JSON envelope using `wrap_json_envelope(data, command_name)`
4. Add command to CLAWABLE_SURFACES in test_cli_parity_audit.py
5. Document in SCHEMAS.md (schema + example)
6. Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
7. Run full suite to confirm parity
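
Steps 1 to 3 might look roughly like the sketch below. Everything here is an assumption made for illustration: the `ping` command is hypothetical, and `wrap_json_envelope` is a stand-in whose body, extra `exit_code` parameter, and placeholder `schema_version` value may not match the real helper in `main.py`.

```python
import argparse
import json
from datetime import datetime, timezone


def wrap_json_envelope(data: dict, command_name: str, exit_code: int = 0) -> dict:
    """Stand-in for the helper named above; the real one may differ."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": command_name,
        "exit_code": exit_code,
        "output_format": "json",
        "schema_version": "1.0",  # placeholder value, not taken from SCHEMAS.md
        **data,
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="harness-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    ping = sub.add_parser("ping")  # hypothetical command name
    ping.add_argument("--output-format", choices=["text", "json"], default="text")
    return parser


def run(argv: list[str]) -> tuple[int, str]:
    """Parse argv and return (exit_code, rendered_output)."""
    args = build_parser().parse_args(argv)
    data = {"reply": "pong"}
    if args.output_format == "json":
        return 0, json.dumps(wrap_json_envelope(data, args.command))
    return 0, data["reply"]


code, out = run(["ping", "--output-format", "json"])
print(code, json.loads(out)["command"])
```

Returning the exit code and rendered text from `run` keeps the command easy to test without spawning a subprocess.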

### Modify TurnResult or protocol fields

1. Update dataclass in `query_engine.py`
2. Update SCHEMAS.md with new field + rationale
3. Write test in `tests/test_json_envelope_field_consistency.py` that validates field presence
4. Update all places that construct TurnResult (grep for `TurnResult(`)
5. Update bootstrap/turn-loop JSON builders in main.py
6. Run `tests/` to ensure no regressions
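
A defaulted field is what keeps step 4 manageable, since existing constructor calls keep working while they are being updated. The sketch below assumes a simplified stand-in class: `TurnResultSketch`, `output`, and `stop_reason` are hypothetical, and only `cancel_observed` comes from this document.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TurnResultSketch:
    """Illustrative stand-in, not the real TurnResult from query_engine.py."""

    output: str                    # hypothetical pre-existing field
    stop_reason: str               # hypothetical pre-existing field
    cancel_observed: bool = False  # newly added field, given a default


# A call site written before the new field existed still constructs cleanly.
old_style = TurnResultSketch(output="hi", stop_reason="completed")
payload = asdict(old_style)
print(payload["cancel_observed"])  # False
```

Because the default lands in the serialized payload, a field-presence test can assert on the key without touching older call sites.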

### Promote an OPT_OUT surface to CLAWABLE

**Prerequisite:** Real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.

Once demand is evidenced:
1. Add --output-format flag to argparse
2. Emit wrap_json_envelope() output in JSON path
3. Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
4. Document in SCHEMAS.md
5. Write test for JSON output
6. Run parity audit to confirm no regressions
7. Update `OPT_OUT_DEMAND_LOG.md` to mark signal as resolved

### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)

1. Open `OPT_OUT_DEMAND_LOG.md`
2. Find the surface's entry under Group A/B/C
3. Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
4. If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md

## Dogfood principles

The Python harness is continuously dogfood-tested:
- Every cycle ships to `main` with detailed commit messages
- New tests are written before/alongside implementation
- Test suite must pass before pushing (zero-regression principle)
- Commits grouped by pinpoint (#159, #160, ..., #174)
- Failure modes classified per exit code: 0=success, 1=error, 2=timeout

## Protocol governance

- **SCHEMAS.md is the source of truth** — any implementation must match field-for-field
- **Tests enforce the contract** — drift is caught by test suite
- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
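
The exit-code mapping above can be sketched as a small dispatcher on the caller's side. This is an illustration under assumptions: `classify` and `run_and_classify` are hypothetical names, treating unknown codes as "escalate" is a choice made here rather than something the protocol states, and the demo child process only imitates a timeout exit. ERROR_HANDLING.md describes the full handler pattern.

```python
import subprocess
import sys

# Mapping taken from the governance bullet above.
ACTIONS = {0: "continue", 1: "escalate", 2: "timeout"}


def classify(returncode: int) -> str:
    """Map an exit code to an action; unknown codes escalate (an assumption)."""
    return ACTIONS.get(returncode, "escalate")


def run_and_classify(argv: list[str]) -> str:
    """Run a command and classify its exit code."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return classify(completed.returncode)


# Demo child process that simply exits with code 2, imitating a timeout.
demo = [sys.executable, "-c", "import sys; sys.exit(2)"]
print(run_and_classify(demo))  # timeout
```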

## Related docs

- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
- **`ROADMAP.md`** — macro roadmap and macro pain points
- **`PHILOSOPHY.md`** — system design intent
- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
+								# CLAUDE.md — Python Reference Implementation
-												feat: default OAuth config for API endpoint, merge UI polish rendering

											
										
										
											2026-04-01 03:20:26 +00:00
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
+								**This file guides work on `src/` and `tests/` — the Python reference harness for claw-code protocol.**
-												feat: default OAuth config for API endpoint, merge UI polish rendering

											
										
										
											2026-04-01 03:20:26 +00:00
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
+								The production CLI lives in `rust/`; this directory (`src/`, `tests/`, `.py` files) is a **protocol validation and dogfood surface**.
-												feat: default OAuth config for API endpoint, merge UI polish rendering

											
										
										
											2026-04-01 03:20:26 +00:00
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
+								## What this Python harness does
 								**Machine-first orchestration layer** — proves that the claw-code JSON protocol is:
 								- Deterministic and recoverable (every output is reproducible)
 								- Self-describing (SCHEMAS.md documents every field)
 								- Clawable (external agents can build ONE error handler for all commands)
 								## Stack
 								- **Language:** Python 3.13+
 								- **Dependencies:** minimal (no frameworks; pure stdlibs + attrs/dataclasses)
 								- **Test runner:** pytest
 								- **Protocol contract:** SCHEMAS.md (machine-readable JSON envelope)
 								## Quick start
 								```bash
 								# 1. Install dependencies (if not already in venv)
 								python3 -m venv .venv && source .venv/bin/activate
 								# (dependencies minimal; standard library mostly)
 								# 2. Run tests
 								python3 -m pytest tests/ -q
 								# 3. Try a command
 								python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
 								```
 								## Verification workflow
 								```bash
 								# Unit tests (fast)
 								python3 -m pytest tests/ -q 2>&1 | tail -3
 								# Type checking (optional but recommended)
 								python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
 								```
-												feat: default OAuth config for API endpoint, merge UI polish rendering

											
										
										
											2026-04-01 03:20:26 +00:00
 								## Repository shape
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
 								- **`src/`** — Python reference harness implementing SCHEMAS.md protocol
 								  - `main.py` — CLI entry point; all 14 clawable commands
 								  - `query_engine.py` — core TurnResult / QueryEngineConfig
 								  - `runtime.py` — PortRuntime; turn loop + cancellation (#164 Stage A/B)
 								  - `session_store.py` — session persistence
 								  - `transcript.py` — turn transcript assembly
 								  - `commands.py`, `tools.py` — simulated command/tool trees
 								  - `models.py` — PermissionDenial, UsageSummary, etc.
 								- **`tests/`** — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
 								  - `test_cli_parity_audit.py` — proves all 14 clawable commands accept --output-format
 								  - `test_json_envelope_field_consistency.py` — validates SCHEMAS.md contract
 								  - `test_cancel_observed_field.py` — #164 Stage B: cancellation observability + safe-to-reuse semantics
 								  - `test_run_turn_loop_*.py` — turn loop behavior (timeout, cancellation, continuation, permissions)
 								  - `test_submit_message_*.py` — budget, cancellation contracts
 								  - `test_*_cli.py` — command-specific JSON output validation
 								- **`SCHEMAS.md`** — canonical JSON contract
 								  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
 								  - Error envelope shape
 								  - Not-found envelope shape
 								  - Per-command success schemas (14 commands documented)
 								  - Turn Result fields (including cancel_observed as of #164 Stage B)
 								- **`.gitignore`** — excludes `.port_sessions/` (dogfood-run state)
 								## Key concepts
 								### Clawable surface (14 commands)
 								Every clawable command **must**:
 . Accept `--output-format {text,json}`
 . Return JSON envelopes matching SCHEMAS.md
 . Use common fields (timestamp, command, exit_code, output_format, schema_version)
 . Exit 0 on success, 1 on error/not-found, 2 on timeout
 								**Commands:** list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop
 								**Validation:** `test_cli_parity_audit.py` auto-tests all 14 for --output-format acceptance.
 								### OPT_OUT surfaces (12 commands)
 								Explicitly exempt from --output-format requirement (for now):
 								- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
 								- List commands with query filters: subsystems, commands, tools
 								- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode
 								**Future work:** audit OPT_OUT surfaces for JSON promotion (post-#164).
 								### Protocol layers
 								**Coverage (#167–#170):** All clawable commands emit JSON
 								**Enforcement (#171):** Parity CI prevents new commands skipping JSON
 								**Documentation (#172):** SCHEMAS.md locks field contract
 								**Alignment (#173):** Test framework validates docs ↔ code match
 								**Field evolution (#164 Stage B):** cancel_observed proves protocol extensibility
 								## Testing & coverage
 								### Run full suite
 								```bash
 								python3 -m pytest tests/ -q
 								```
 								### Run one test file
 								```bash
 								python3 -m pytest tests/test_cancel_observed_field.py -v
 								```
 								### Run one test
 								```bash
 								python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
 								```
 								### Check coverage (optional)
 								```bash
 								python3 -m pip install coverage  # if not already installed
 								python3 -m coverage run -m pytest tests/
 								python3 -m coverage report --skip-covered
 								```
 								Target: >90% line coverage for src/ (currently ~85%).
 								## Common workflows
 								### Add a new clawable command
 . Add parser in `main.py` (argparse)
 . Add `--output-format` flag
 . Emit JSON envelope using `wrap_json_envelope(data, command_name)`
 . Add command to CLAWABLE_SURFACES in test_cli_parity_audit.py
 . Document in SCHEMAS.md (schema + example)
 . Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
 . Run full suite to confirm parity
 								### Modify TurnResult or protocol fields
 . Update dataclass in `query_engine.py`
 . Update SCHEMAS.md with new field + rationale
 . Write test in `tests/test_json_envelope_field_consistency.py` that validates field presence
 . Update all places that construct TurnResult (grep for `TurnResult(`)
 . Update bootstrap/turn-loop JSON builders in main.py
 . Run `tests/` to ensure no regressions
 								### Promote an OPT_OUT surface to CLAWABLE
-												docs: OPT_OUT_DEMAND_LOG.md — evidentiary base for governance decisions

Cycle #21 ships governance infrastructure, not implementation. Maintainership
mode means sometimes the right deliverable is a decision framework, not code.

Problem context:
OPT_OUT_AUDIT.md (cycle #18 bonus) established 'demand-backed audit' as the
next step. But without a structured way to record demand signals, 'demand-backed'
was just a slogan — the next audit cycle would have no evidence to work from.

This commit creates the evidentiary base:

New file: OPT_OUT_DEMAND_LOG.md
- Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C)
- Current state: 0 signals across all surfaces (consistent with audit prediction)
- Signal entry template with required fields:
  - Source (who/what)
  - Use case (concrete orchestration problem)
  - Markdown-alternative-checked (why existing output insufficient)
  - Date
- Promotion thresholds:
  - 2+ independent signals for same surface → file promotion pinpoint
  - 1 signal + existing stable schema → file pinpoint for discussion
  - 0 signals → stays OPT_OUT (rationale preserved)

Decision framework for cycle #22 (audit close):
- If 0 signals total: move to PERMANENTLY_OPT_OUT, close audit
- If 1-2 signals: file individual promotion pinpoints with evidence
- If 3+ signals: reopen audit, question classification itself

Updated files:
- OPT_OUT_AUDIT.md: Added demand log reference in Related section
- CLAUDE.md: Added prerequisites for promotions (must have logged signals),
  added 'File a demand signal' workflow section

Philosophy:
'Prevent speculative expansion' — schema bloat protection discipline.
Every new CLAWABLE surface is a maintenance tax. Evidence requirement keeps
the protocol lean. OPT_OUT surfaces are intentionally not-clawable until
proven otherwise by external demand.

Operational impact:
Next cycles can now:
1. Watch for real claws hitting OPT_OUT surface limits
2. Log signals in structured format (no ad-hoc filing)
3. Run audit at cycle #22 with actual data, not speculation

No code changes. No test changes. Pure governance infrastructure.

Related: #18 cycle (OPT_OUT_AUDIT.md), maintainership phase transition.

											
										
										
											2026-04-22 20:34:35 +09:00
+								**Prerequisite:** Real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.
 								Once demand is evidenced:
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
+. Add --output-format flag to argparse
 . Emit wrap_json_envelope() output in JSON path
 . Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
 . Document in SCHEMAS.md
 . Write test for JSON output
 . Run parity audit to confirm no regressions
-												docs: OPT_OUT_DEMAND_LOG.md — evidentiary base for governance decisions

Cycle #21 ships governance infrastructure, not implementation. Maintainership
mode means sometimes the right deliverable is a decision framework, not code.

Problem context:
OPT_OUT_AUDIT.md (cycle #18 bonus) established 'demand-backed audit' as the
next step. But without a structured way to record demand signals, 'demand-backed'
was just a slogan — the next audit cycle would have no evidence to work from.

This commit creates the evidentiary base:

New file: OPT_OUT_DEMAND_LOG.md
- Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C)
- Current state: 0 signals across all surfaces (consistent with audit prediction)
- Signal entry template with required fields:
  - Source (who/what)
  - Use case (concrete orchestration problem)
  - Markdown-alternative-checked (why existing output insufficient)
  - Date
- Promotion thresholds:
  - 2+ independent signals for same surface → file promotion pinpoint
  - 1 signal + existing stable schema → file pinpoint for discussion
  - 0 signals → stays OPT_OUT (rationale preserved)

Decision framework for cycle #22 (audit close):
- If 0 signals total: move to PERMANENTLY_OPT_OUT, close audit
- If 1-2 signals: file individual promotion pinpoints with evidence
- If 3+ signals: reopen audit, question classification itself

Updated files:
- OPT_OUT_AUDIT.md: Added demand log reference in Related section
- CLAUDE.md: Added prerequisites for promotions (must have logged signals),
  added 'File a demand signal' workflow section

Philosophy:
'Prevent speculative expansion' — schema bloat protection discipline.
Every new CLAWABLE surface is a maintenance tax. Evidence requirement keeps
the protocol lean. OPT_OUT surfaces are intentionally not-clawable until
proven otherwise by external demand.

Operational impact:
Next cycles can now:
1. Watch for real claws hitting OPT_OUT surface limits
2. Log signals in structured format (no ad-hoc filing)
3. Run audit at cycle #22 with actual data, not speculation

No code changes. No test changes. Pure governance infrastructure.

Related: #18 cycle (OPT_OUT_AUDIT.md), maintainership phase transition.

											
										
										
											2026-04-22 20:34:35 +09:00
+. Update `OPT_OUT_DEMAND_LOG.md` to mark signal as resolved
 								### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)
 . Open `OPT_OUT_DEMAND_LOG.md`
 . Find the surface's entry under Group A/B/C
 . Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
 . If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the Python harness purpose without reading
ROADMAP.md or inferring from test names. Protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											
										
										
											2026-04-22 19:53:12 +09:00
 								## Dogfood principles
 								The Python harness is continuously dogfood-tested:
 								- Every cycle ships to `main` with detailed commit messages
 								- New tests are written before/alongside implementation
 								- Test suite must pass before pushing (zero-regression principle)
 								- Commits grouped by pinpoint (#159, #160, ..., #174)
 								- Failure modes classified per exit code: 0=success, 1=error, 2=timeout
 								## Protocol governance
 								- **SCHEMAS.md is the source of truth** — any implementation must match field-for-field
 								- **Tests enforce the contract** — drift is caught by test suite
 								- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
 								- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
 								- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
 								## Related docs
-												docs: ERROR_HANDLING.md — unified error handler pattern for orchestration code

Cycle #22 ships documentation that operationalizes cycles #178–#179.

Problem context:
After #178 (parse-error envelope) and #179 (stderr hygiene + real error message),
claws can now build a unified error handler for all 14 clawable commands.
But there was no guide on how to actually do that. Operators had the pieces;
they didn't have the pattern.

This file changes that.

New file: ERROR_HANDLING.md
- Quick reference: exit codes + envelope shapes (0=success, 1=error, 2=timeout)
- One-handler pattern: ~80 lines of Python showing how to parse error.kind,
  check retryable, and decide recovery strategy
- Four practical recovery patterns:
  - Retry on transient errors (filesystem, timeout)
  - Reuse session after timeout (if cancel_observed=true)
  - Validate command syntax before dispatch (dry-run --help)
  - Log errors for observability
- Error kinds enumeration (parse, session_not_found, filesystem, runtime, timeout)
- Common mistakes to avoid (6 patterns with BAD vs GOOD examples)
- Testing your error handler (unit test examples)

Operational impact:
Orchestration code now has a canonical pattern. Claws can:
- Copy-paste the run_claw_command() function (works for all commands)
- Classify errors uniformly (no special cases per command)
- Decide recovery deterministically (error.kind + retryable + cancel_observed)
- Log/monitor/escalate with confidence

Related cycles:
- #178: Parse-error envelope (commands now emit structured JSON on invalid argv)
- #179: Stderr hygiene + real message (JSON mode silences argparse, carries actual error)
- #164 Stage B: cancel_observed field (callers know if session is safe for reuse)

Updated CLAUDE.md:
- Added ERROR_HANDLING.md to 'Related docs' section
- Now documents the one-handler pattern as a guideline

No code changes. No test changes. Pure documentation.

This completes the documentation trail from protocol (SCHEMAS.md) →
governance (OPT_OUT_AUDIT.md, OPT_OUT_DEMAND_LOG.md) → practice (ERROR_HANDLING.md).

											
										
										
											2026-04-22 20:42:43 +09:00
+								- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
-												docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer

Rewrote CLAUDE.md to accurately describe the Python reference implementation:
- Shifted framing from outdated Rust-focused guidance to protocol-validation focus
- Clarified that src/ and tests/ are a dogfood surface proving the SCHEMAS.md contract
- Added machine-first marketing: deterministic, self-describing, clawable
- Documented all 14 clawable commands (post-#164 Stage B promotion)
- Added OPT_OUT surfaces audit queue (12 commands, future work)
- Included protocol layers: Coverage → Enforcement → Documentation → Alignment
- Added quick-start workflow for Python harness
- Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE)
- Emphasized protocol governance: SCHEMAS.md as source of truth
- Exit codes documented as signals (0=success, 1=error, 2=timeout)

Result: Developers can now understand the purpose of the Python harness without reading
ROADMAP.md or inferring it from test names. The protocol-first mental model is explicit.

Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).

											2026-04-22 19:53:12 +09:00
+								- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
+								- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
 								- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
+								- **`ROADMAP.md`** — macro roadmap and macro pain points
 								- **`PHILOSOPHY.md`** — system design intent
 								- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence