
harden(go-migration): require real cutover evidence#116

Merged: mrjf merged 1 commit into main from codex/rock-solid-go-parity-gate on Jun 9, 2026

@mrjf mrjf commented Jun 9, 2026

harden(go-migration): require real cutover evidence

TL;DR

This PR changes the Go migration completion gate so it can no longer declare success from representative help output, obsolete Python tests, or placeholder mappings. It adds an explicit option parity gate, requires legacy Python tests to map to existing Go-only behavior tests, and wires strict coverage enforcement into the migration workflow. The important result is that the current migration now fails strict completion with concrete evidence instead of reporting “done.”

Important

This PR intentionally proves the migration is not deletion-grade ready yet; the report-mode workflow can still collect evidence without blocking non-crane PRs.

Problem (WHY)

  • The scorer had no first-class gate for CLI option parity, so commands could look present while Python options were still missing.
  • Python test coverage could be marked obsolete or mapped to help/surface tests and still look complete.
  • The Go-only cutover replay did not prove that mapped Go tests existed or that they performed real Go-only behavior.
  • [!] The cutover doc still claimed deletion-grade readiness even though strict checks expose missing behavior.

Why these matter: the migration gate is supposed to transform generated progress into verifiable action, not trust labels or naming conventions. That matches the PROSE principle that “Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action.” It also follows the Agent Skills guidance that agents “pattern-match well against concrete structures” and that validation should “do the work, run a validator ... fix any issues, and repeat until validation passes.”

Approach (WHAT)

| # | Fix | Principle |
|---|-----|-----------|
| 1 | Add option_parity as an explicit scorer ratio gate and require it for deletion-grade readiness. | Deterministic tool execution |
| 2 | Make Python option inventory tests emit counted pass/total data and hard-fail under APM_ENFORCE_COMPLETION_GATES=1. | Concrete structures |
| 3 | Reject obsolete Python tests by default; allow them only in report mode with --allow-obsolete-python-tests. | Validator loop |
| 4 | Require Go cutover coverage mappings to point at existing TestGoCutoverReal... tests. | Real behavior evidence |
| 5 | Add real state/behavior fixtures that currently catch missing config, MCP, marketplace, and runtime behavior. | Regression traps |
| 6 | Update Actions so crane PRs and manual strict runs fail on incomplete coverage, while ordinary PRs still collect evidence. | Low-noise CI |
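
A minimal sketch of how the option parity gate from fixes 1 and 2 could behave, written in Python for brevity and assuming both inventories are already available as sets of `command --option` strings. The helper names (`option_parity_event`, `check_option_parity`) and the sample inventory are hypothetical; only the event fields, the `APM_ENFORCE_COMPLETION_GATES` variable, and the failure message prefix come from this PR.

```python
import json
import os


def option_parity_event(python_options, go_options):
    """Build the counted gate event: how many Python options the Go CLI also exposes."""
    python_options = set(python_options)
    passing = len(python_options & set(go_options))
    return {"crane": "gate", "name": "option_parity",
            "passing": passing, "total": len(python_options)}


def check_option_parity(python_options, go_options, environ=None):
    """Return (event, missing); raise in strict mode when any option is missing."""
    environ = os.environ if environ is None else environ
    event = option_parity_event(python_options, go_options)
    missing = sorted(set(python_options) - set(go_options))
    strict = environ.get("APM_ENFORCE_COMPLETION_GATES") == "1"
    if strict and missing:
        raise AssertionError(
            f"HARD-GATE FAILED: Go help is missing "
            f"{len(missing)}/{event['total']} Python CLI options: {missing}"
        )
    return event, missing


# Hypothetical inventory, for illustration only.
python_opts = {"install --dry-run", "install --verbose", "compile --target"}
go_opts = {"install --verbose", "compile --target"}

# Report mode (variable unset): the event is emitted and nothing fails.
event, missing = check_option_parity(python_opts, go_opts, environ={})
print(json.dumps(event, separators=(",", ":")))
print(missing)
```

Passing `environ={"APM_ENFORCE_COMPLETION_GATES": "1"}` with the same inputs raises instead of returning, which mirrors the report/strict split described in the table.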

Implementation (HOW)

  • .crane/scripts/score.go -- Adds OptionParity to the score schema, parses the new option_parity gate event, and requires it for cutover_ready.
  • .github/workflows/migration-ci.yml -- Exports APM_ENFORCE_PYTHON_BEHAVIOR_CONTRACTS=1 for strict completion runs and only passes report-mode escape hatches outside strict mode.
  • cmd/apm/python_behavior_contracts_test.go -- Counts every Python CLI option from the extracted command inventory, emits option_parity, and fails strict mode with the exact missing options.
  • cmd/apm/go_cutover_coverage_test.go -- Discovers actual Go test functions, rejects stale mapping names, and only counts existing TestGoCutoverReal... mappings as final cutover evidence.
  • cmd/apm/real_behavior_test.go -- Adds real temporary-project fixtures for persisted config, MCP manifests, marketplace mutation/validation, and runtime removal.
  • scripts/ci/python_behavior_contracts.py and tests/parity/test_python_behavior_contracts.py -- Treat python_tests.obsolete as report-only debt and hard-fail strict coverage.
  • Docs and manifests -- Update CUTOVER.md, parity README text, and coverage manifest descriptions so the documented gate matches the enforced gate.
  • Unit tests -- Update scorer and workflow tests to assert the new strict gate wiring.
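
The scorer change can be illustrated with a short Python sketch (the real scorer is `.crane/scripts/score.go`; this is not its code). It assumes gate events arrive as JSON lines mixed into test output, as in the Validation section, and that readiness requires every listed gate to report `passing == total`. The function names and the exact set of required gates are assumptions.

```python
import json

# Gate names taken from the events shown in this PR; the real scorer may track more.
REQUIRED_GATES = ("option_parity", "python_behavior_contracts", "functional", "state_diff")


def parse_gate_events(output):
    """Collect {"crane": "gate", ...} JSON lines from mixed test output."""
    gates = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary test output, not an event
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict) or event.get("crane") != "gate":
            continue
        if all(key in event for key in ("name", "passing", "total")):
            gates[event["name"]] = (int(event["passing"]), int(event["total"]))
    return gates


def cutover_ready(gates):
    """Ready only when every required gate was reported and is fully passing."""
    for name in REQUIRED_GATES:
        if name not in gates:
            return False  # a missing gate is treated as failing, never as passing
        passing, total = gates[name]
        if total <= 0 or passing != total:
            return False
    return True


sample = """\
{"crane":"gate","name":"option_parity","passing":134,"total":273}
HARD-GATE FAILED: Go help is missing 139/273 Python CLI options.
{"crane":"gate","name":"python_behavior_contracts","passing":17204,"total":23771}
{"crane":"gate","name":"functional","passing":20,"total":26}
{"crane":"gate","name":"state_diff","passing":20,"total":26}
"""

gates = parse_gate_events(sample)
print(cutover_ready(gates))  # False: none of the gates is at passing == total
```

Treating an unreported gate as a failure is the design choice that matters here: a scorer that skips absent gates would let a run look complete simply by not emitting the event.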

Diagrams

Legend: The diagram shows how this PR turns collected parity evidence into strict completion gates before the scorer can mark the migration ready.

flowchart LR
    subgraph Evidence[Evidence]
        GoEvents["go test events"]
        PyInventory["Python behavior inventory"]
        Coverage["coverage manifests"]
    end
    subgraph Gates[Strict gates]
        Option["option_parity"]
        Behavior["python_behavior_contracts"]
        Real["functional and state_diff"]
    end
    subgraph Score[Completion score]
        Scorer["score.go"]
        Ready["deletion_grade_ready"]
    end
    PyInventory --> Option
    Coverage --> Behavior
    GoEvents --> Real
    Option --> Scorer
    Behavior --> Scorer
    Real --> Scorer
    Scorer --> Ready
    classDef new stroke-dasharray: 5 5;
    class Option,Behavior,Real new;


Trade-offs

  • Strict failure instead of optimistic completion. Chose to make the current migration fail strict gates; rejected preserving the old green score because it hid missing work.
  • Report-mode escape hatches remain. Chose --allow-obsolete-python-tests only for collection/reporting; rejected using it in strict mode.
  • Go-only behavior prefix is intentionally narrow. Chose TestGoCutoverReal... as the deletion-grade evidence prefix; rejected counting Python-vs-Go completion or help tests as final proof.
  • This PR does not implement the missing Go behavior. It makes the missing work visible and blocked; the next PRs should fix the concrete command gaps.
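
The narrow-prefix rule above can be sketched as follows, in Python for brevity. It assumes test discovery works by scanning Go source text for `func Test...(t *testing.T)` declarations and that coverage mappings are a plain dictionary from Python test ID to Go test name; both are assumptions, and every test name and path in the sample is hypothetical.

```python
import re

# Matches top-level Go test declarations such as: func TestFoo(t *testing.T) {
GO_TEST_RE = re.compile(r"^func\s+(Test\w+)\s*\(\s*\w+\s+\*testing\.T\s*\)", re.MULTILINE)
REAL_PREFIX = "TestGoCutoverReal"


def discover_go_tests(go_source):
    """Return the set of test function names declared in Go source text."""
    return set(GO_TEST_RE.findall(go_source))


def classify_mappings(mappings, existing_tests):
    """Split python-test -> go-test mappings into accepted, stale, and not behavior-backed."""
    result = {"accepted": [], "stale": [], "not_behavior_backed": []}
    for python_test, go_test in sorted(mappings.items()):
        if go_test not in existing_tests:
            result["stale"].append(python_test)  # names a test that does not exist
        elif not go_test.startswith(REAL_PREFIX):
            result["not_behavior_backed"].append(python_test)  # e.g. a help or surface test
        else:
            result["accepted"].append(python_test)
    return result


# Hypothetical Go source and mappings, for illustration only.
go_source = """
package main

func TestGoCutoverRealConfigPersists(t *testing.T) {}
func TestParityHelpOutput(t *testing.T) {}
"""
mappings = {
    "tests/test_config.py::test_set": "TestGoCutoverRealConfigPersists",
    "tests/test_help.py::test_help": "TestParityHelpOutput",
    "tests/test_old.py::test_gone": "TestGoCutoverRealPlaceholder",
}

print(classify_mappings(mappings, discover_go_tests(go_source)))
```

Only the first mapping counts as cutover evidence: the second points at an existing test without the prefix, and the third uses the prefix but names a test that was never written.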

Benefits

  1. migration_score = 1.0 now requires option parity in addition to help/surface parity.
  2. Obsolete Python tests no longer count as completion evidence in strict mode.
  3. Stale or placeholder Go test names no longer satisfy the all-Go cutover replay.
  4. Strict mode now exposes current gaps with counted evidence: 134/273 option parity, 17204/23771 behavior-backed mappings, and 20/26 real behavior fixtures.
  5. The cutover document no longer says the Go port is deletion-grade ready while the strict gate disagrees.
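
Benefit 2 can be made concrete with a small sketch of the obsolete-test rule. It assumes the coverage manifest is a dictionary with a `python_tests.obsolete` list and that the `--allow-obsolete-python-tests` flag arrives as a boolean; the function name and manifest shape are hypothetical, while the `obsolete-python-test-coverage` label is the one printed in the Validation output.

```python
def obsolete_coverage_issues(manifest, strict, allow_obsolete=False):
    """Return issue strings; an empty list means the obsolete-test rule is satisfied."""
    obsolete = manifest.get("python_tests", {}).get("obsolete", [])
    if not obsolete:
        return []
    if allow_obsolete and not strict:
        return []  # report mode: recorded as debt, not as a failure
    # Strict mode ignores the escape hatch; the default report mode rejects as well.
    return [f"obsolete-python-test-coverage: {len(obsolete)}"]


# Hypothetical manifest, for illustration only.
manifest = {"python_tests": {"obsolete": ["tests/test_a.py::test_x", "tests/test_b.py::test_y"]}}

print(obsolete_coverage_issues(manifest, strict=False, allow_obsolete=True))
print(obsolete_coverage_issues(manifest, strict=True, allow_obsolete=True))
```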

Validation

uv run pytest tests/unit/test_crane_score.py tests/unit/test_migration_ci_workflow.py -q:

27 passed in 107.78s (0:01:47)

uv run pytest tests/parity/test_python_behavior_contracts.py -q --tb=short:

2 passed, 136 skipped, 1 xfailed in 23.88s

APM_PYTHON_BIN="$PWD/.venv/bin/apm" go test ./cmd/apm -run '^TestParityPythonOptionsFromSource$' -count=1:

ok  	github.com/githubnext/apm/cmd/apm	6.410s

Expected strict failures proving the gate now blocks false completion

APM_ENFORCE_COMPLETION_GATES=1 APM_PYTHON_BIN="$PWD/.venv/bin/apm" go test ./cmd/apm -run '^TestParityPythonOptionsFromSource$' -count=1:

{"crane":"gate","name":"option_parity","passing":134,"total":273}
HARD-GATE FAILED: Go help is missing 139/273 Python CLI options.

go test ./cmd/apm -run '^TestGoCutover' -count=1:

{"crane":"gate","name":"python_behavior_contracts","passing":17204,"total":23771}
Go cutover coverage is not behavior-backed: 6567/23771 Python tests do not map to a real Go-only cutover behavior test.
{"crane":"gate","name":"functional","passing":20,"total":26}
{"crane":"gate","name":"state_diff","passing":20,"total":26}

APM_ENFORCE_PYTHON_BEHAVIOR_CONTRACTS=1 uv run pytest tests/parity/test_python_behavior_contracts.py::test_python_contract_coverage_manifest_is_complete -q --tb=short:

obsolete-python-test-coverage: 24177
1 failed in 16.66s

Additional checks:

ruff check ...                         All checks passed!
ruff format --check ...                3 files already formatted
git diff --check                       passed

Scenario Evidence

| # | Scenario (user promise) | Principle(s) | Test(s) proving it | Type |
|---|-------------------------|--------------|--------------------|------|
| 1 | Crane cannot mark the Go migration complete while Python CLI options are missing. | DevX, Governed by policy | cmd/apm/python_behavior_contracts_test.go::TestParityPythonOptionsFromSource | integration |
| 2 | Legacy Python tests must be replaced by behavior-backed Go evidence, not obsolete labels or help-only mappings. | Governed by policy, OSS / community-driven | cmd/apm/go_cutover_coverage_test.go::TestGoCutoverPythonTestConversionCoverage; tests/parity/test_python_behavior_contracts.py::test_python_contract_coverage_manifest_is_complete | integration |
| 3 | Real commands must mutate or read real project state before deletion-grade cutover can pass. | Portability by manifest, DevX | cmd/apm/real_behavior_test.go::TestGoCutoverRealFunctionalAndStateDiffContracts | integration |
| 4 | Strict migration CI fails on incomplete evidence, but report-mode CI still produces summaries. | Governed by policy, DevX | tests/unit/test_migration_ci_workflow.py::test_migration_ci_enforces_completion_for_crane_prs_and_explicit_manual_runs | unit |

How to test

  • Run uv run pytest tests/unit/test_crane_score.py tests/unit/test_migration_ci_workflow.py -q and expect all tests to pass.
  • Run APM_PYTHON_BIN="$PWD/.venv/bin/apm" go test ./cmd/apm -run '^TestParityPythonOptionsFromSource$' -count=1 and expect report mode to pass.
  • Run APM_ENFORCE_COMPLETION_GATES=1 APM_PYTHON_BIN="$PWD/.venv/bin/apm" go test ./cmd/apm -run '^TestParityPythonOptionsFromSource$' -count=1 and expect the option_parity hard gate to fail with missing options.
  • Run go test ./cmd/apm -run '^TestGoCutover' -count=1 and expect the cutover gate to fail with behavior-backed coverage and real-command fixture gaps.
  • In Actions, run “Migration Parity and Benchmarks” with enforce_completion=true and expect strict coverage failures until the Go implementation actually closes the gaps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot commented Jun 9, 2026

Migration Benchmark Results

Migration CLI Benchmark

Includes fixture-backed commands that must read, write, execute, or fail against real project state. The installed-project fixture contains apm.yml, apm.lock.yaml, apm_modules packages, local .apm primitives, target directories, deployed prompt files, and sample source files.
The harness checks return-code parity for each command. Detailed stdout/stderr byte counts are kept in the JSON samples, but this is not an output-parity test.

Max allowed Go/Python median ratio: 5.00

| Benchmark | Command | Fixture | Python median | Go median | Go/Python | Result | Return codes |
|-----------|---------|---------|---------------|-----------|-----------|--------|--------------|
| init scaffold | init --yes | empty-project | 0.4992s | 0.0013s | 0.00x | 377.79x faster | {'python': [0], 'go': [0]} |
| targets json | targets --json | installed-project | 0.4711s | 0.0016s | 0.00x | 293.68x faster | {'python': [0], 'go': [0]} |
| script list | list | installed-project | 0.4972s | 0.0017s | 0.00x | 301.19x faster | {'python': [0], 'go': [0]} |
| deps list | deps list | installed-project | 0.4875s | 0.0015s | 0.00x | 329.04x faster | {'python': [0], 'go': [0]} |
| deps tree | deps tree | installed-project | 0.4789s | 0.0015s | 0.00x | 327.19x faster | {'python': [0], 'go': [0]} |
| install local package | install --no-policy ./packages/local-tools | local-install-project | 0.5259s | 0.0018s | 0.00x | 290.91x faster | {'python': [0], 'go': [0]} |
| compile copilot target | compile --target copilot | compilation-project | 0.5099s | 0.0015s | 0.00x | 332.20x faster | {'python': [0], 'go': [0]} |
| pack output | pack --output dist | installed-project | 0.4959s | 0.0017s | 0.00x | 287.55x faster | {'python': [0], 'go': [0]} |
| run script | run stamp | runnable-project | 0.4784s | 0.0025s | 0.01x | 192.66x faster | {'python': [0], 'go': [0]} |
| audit hidden unicode | audit --ci | audit-finding-project | 0.4970s | 0.0017s | 0.00x | 288.90x faster | {'python': [1], 'go': [1]} |
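
A rough sketch of the per-row check the harness describes: compare the Go/Python median ratio against the 5.00 limit and require matching return codes. The function name, result keys, and sample timings are hypothetical; only the threshold and the return-code parity rule come from the report.

```python
from statistics import median

MAX_GO_PYTHON_RATIO = 5.00  # threshold stated in the benchmark report


def evaluate_benchmark(python_samples, go_samples, python_codes, go_codes,
                       max_ratio=MAX_GO_PYTHON_RATIO):
    """Summarize one benchmark row from raw timing samples and return codes."""
    py_med = median(python_samples)
    go_med = median(go_samples)
    ratio = go_med / py_med
    return {
        "python_median": py_med,
        "go_median": go_med,
        "ratio": ratio,
        "within_budget": ratio <= max_ratio,
        # Parity means both CLIs saw the same set of return codes, zero or not.
        "return_codes_match": sorted(set(python_codes)) == sorted(set(go_codes)),
    }


# Hypothetical samples in seconds, for illustration only.
row = evaluate_benchmark(
    python_samples=[0.50, 0.48, 0.52],
    go_samples=[0.002, 0.001, 0.003],
    python_codes=[1, 1, 1],
    go_codes=[1, 1, 1],
)
print(row["within_budget"], row["return_codes_match"])
```

A non-zero code on both sides still counts as parity, which is why the audit row passes with return code 1 for Python and Go alike.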

Workloads

  • init scaffold: Creates a new apm.yml in an otherwise empty project directory.
  • targets json: Reads configured project targets from apm.yml and emits machine output.
  • script list: Reads apm.yml scripts and renders the runnable script inventory.
  • deps list: Scans apm_modules package directories and apm.lock.yaml metadata.
  • deps tree: Builds a dependency tree from apm.lock.yaml and installed package metadata.
  • install local package: Installs a local package and materializes lock/module state.
  • compile copilot target: Discovers local primitives and writes the Copilot target artifact.
  • pack output: Resolves local package contents and writes a distributable artifact.
  • run script: Executes a project script and writes the script's side-effect file.
  • audit hidden unicode: Scans a real installed file and fails on planted hidden Unicode.

@mrjf mrjf merged commit e96b795 into main Jun 9, 2026
13 checks passed
@mrjf mrjf deleted the codex/rock-solid-go-parity-gate branch June 9, 2026 21:40