Skip to content

Flush the stdio subprocess's coverage data before the clean-exit line#2840

Open
maxisbey wants to merge 1 commit into
mainfrom
maxisbey/fix-stdio-subprocess-coverage-race
Open

Flush the stdio subprocess's coverage data before the clean-exit line#2840
maxisbey wants to merge 1 commit into
mainfrom
maxisbey/fix-stdio-subprocess-coverage-race

Conversation

@maxisbey

Copy link
Copy Markdown
Contributor

Makes the stdio interaction test's subprocess flush its coverage data before printing the clean-exit line, so the transport's terminate escalation can no longer silently destroy the data file.

Motivation and Context

checks / test (3.10, locked, windows-latest) failed on #2838 with every test passing but the 100% gate tripping on tests\interaction\transports\_stdio_server.py at 52.17% — exactly the subprocess-only lines (failing job).

Root cause: the child only persisted its coverage data in coverage's atexit hook, which runs after the stdio-echo: clean exit print the test synchronizes on. The failing run's test took 12.40s ≈ 2.4s session + exactly the 10.0s PROCESS_TERMINATION_TIMEOUT the test monkeypatches — the child printed the line (test green), then didn't finish interpreter teardown + the sqlite save within the grace, and the Windows escalation (TerminateJobObject) killed it mid-save. A torn save leaves a valid-but-empty data file that coverage combine merges without a warning, which is why the failing run still combined the same 6 data files as passing runs.

This is the same race #2773 narrowed by raising the grace from 2s to 10s; the wider grace made it rarer, not impossible. Extracting this test's wall time from ~400 recent Windows CI job logs shows a smooth heavy tail — p50 3.3s, p99 ~7s, then 8.8s and 9.0s (passed, barely under the grace) and 12.4s (killed) — so any finite grace loses occasionally as long as the data write happens after the observable sync point.

The fix inverts the ordering instead: main() now stops and saves coverage before printing the clean-exit line. That makes the test's existing stderr assertion the durability barrier — once the parent has seen the line, the data is on disk and a late kill is harmless. A kill landing before the print now fails the stderr assertion loudly instead of tripping the coverage gate one step later.

Details that fall out of the ordering:

  • Nothing measured can execute after the flush (it would be unrecordable by construction), so the trailing cov.save() and print(...) lines carry # pragma: lax no cover.
  • cov.stop() is load-bearing twice: it ends tracing, and it leaves nothing for coverage's atexit re-save to flush — so a kill during interpreter teardown can't corrupt the already-written file (coverage opens it with sqlite journaling off; a torn rewrite would not roll back).
  • The if cov is not None branch is # pragma: no branch: under coverage the instance always exists, and without coverage nothing is measured anyway.
  • The test's monkeypatch comment is updated to describe the new invariant; the 10s grace stays (it now only bounds how long a pre-print straggler gets before a loud failure).

How Has This Been Tested?

  • Reproduced the failure mode deterministically: with a 3s post-print stall injected into the child and a 1s grace, the pre-fix code passes the test while the child's data file goes missing (the CI signature on demand); the fixed code under the identical kill retains the complete data file.
  • coverage run -m pytest tests/interaction/transports/test_stdio.py → both tests pass, _stdio_server.py reports 100.00% (statements + branches) from the combined parent+child data.
  • Full ./scripts/test: 100.00% total, strict-no-cover clean.
  • ~2,500 instrumented spawn/close cycles of this exact flow on windows-latest runners (and ~2,200 on Linux) while investigating: the child's post-print phase is interpreter teardown plus the coverage sqlite save, and the save is the phase that runner load stretches — which is what made the atexit-ordering the wrong side of the sync point.

Breaking Changes

None — test-support change only.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The import coverage at module top makes this test-support file depend on the dev environment even for the docstring's standalone python -m invocation; that's already true of the repo's dev venv everywhere this module is used, so it keeps imports-at-top rather than adding a guarded import plus more pragma machinery.

AI Disclaimer

The stdio interaction test's child only persisted its coverage data in
coverage's atexit hook. On a slow Windows runner the interpreter
teardown plus sqlite save can overrun the transport's termination
grace, and the Job Object kill then silently destroys (or tears) the
data file: every test passes but the 100% gate trips on
_stdio_server.py's subprocess-only lines.

Save the data explicitly before printing the clean-exit line instead.
The test already synchronizes on that line, so once it has been
observed the data is durably on disk and a late kill is harmless. A
kill before the print now fails the stderr assertion loudly instead of
tripping the coverage gate. The trailing lines are excluded (lax no
cover): nothing measured can execute after the flush by construction.
@maxisbey maxisbey marked this pull request as ready for review June 11, 2026 22:07
@maxisbey maxisbey enabled auto-merge (squash) June 11, 2026 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant