fix: prevent discovery cache corruption under concurrent writers #316
Open
akroshg wants to merge 2 commits into
When several processes built against the same ~/.webui/cache/components/
directory concurrently, every writer staged its bytes under the same
{key}.tmp path. The last finisher's rename clobbered the file in place
while another writer was still mid-write, leaving zero-length or
truncated entries that later reads consumed as cache hits.
Stage each write under a unique {key}.{pid}.{counter}.tmp name so
concurrent writers no longer collide, and clean up the temp file when
the final rename fails. Adds test_concurrent_put_does_not_corrupt_cache
(8 threads / 64 KB payload) to guard against regression.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
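The staging scheme described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `DiscoveryCache::put`: the free function `put`, the helper `unique_tmp_path`, the `TMP_COUNTER` static, and the choice to store the entry at `dir/key` are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter (hypothetical name) so two threads in the same
/// process never pick the same temp file name.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: builds `{key}.{pid}.{counter}.tmp` inside `dir`.
fn unique_tmp_path(dir: &Path, key: &str) -> PathBuf {
    let n = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    dir.join(format!("{key}.{}.{n}.tmp", process::id()))
}

/// Hypothetical stand-in for the cache write: stage the bytes under a
/// private temp name, then rename the finished file into place.
fn put(dir: &Path, key: &str, bytes: &[u8]) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let tmp = unique_tmp_path(dir, key);
    let dest = dir.join(key); // assumed entry location, for illustration only

    // Only a fully written temp file is ever renamed over the entry.
    let result = fs::write(&tmp, bytes).and_then(|()| fs::rename(&tmp, &dest));
    if result.is_err() {
        // Best-effort cleanup so a failed write or rename leaves no orphan.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

Because every writer owns its temp path, no writer can rename a file that another writer is still filling; the last rename simply replaces the entry with another complete copy.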
When the platform-specific native addon failed to load (corrupted binary, missing symbol, version skew), packages/webui silently fell back to the CLI path with no indication of *why*, making the failure mode hard to triage and producing a noisy warning on every build call.
Capture the original loader error and include it in the WASM fallback warning, the CLI fallback warning, and the "cannot build" error. Gate the CLI fallback warning behind cliFallbackWarned so it fires once per process instead of per build invocation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mohamedmansour
approved these changes
May 29, 2026
Comment on lines +204 to +209
// many times per second; spamming a multi-line warning on every call
// is noisy and (for some terminals) actually expensive.
cliFallbackWarned = true;
console.warn(
  `[webui] Native addon failed to load; falling back to CLI binary at ${binPath}.` +
    describeAddonError(),
Contributor
Why is this a warning rather than an error? If the native addon failed to load, the whole thing is broken, right? Spamming might actually be better; otherwise we lose that error once other logs come in. A better solution is to improve the error handling so that we crash on startup with a detailed error message.
Summary
Two fixes that surfaced while debugging intermittent build failures observed by an Edge consumer running many parallel WebUI builds against a shared environment.
1. fix(discovery): prevent cache corruption under concurrent writers
DiscoveryCache::put staged every write under {key}.tmp in the shared ~/.webui/cache/components/ directory. When multiple processes resolved the same package concurrently, each writer reused the same temp path, so a late writer's rename could clobber the cache entry while another writer was still mid-write, producing zero-length or truncated cache entries that subsequent reads then consumed as cache hits.
Fix: stage writes under {key}.{pid}.{counter}.tmp so concurrent writers no longer collide, and clean up the temp file when the final rename fails.
Test: new test_concurrent_put_does_not_corrupt_cache (8 threads × 64 KB payload) asserts that the post-storm cache is intact and that no orphan .tmp files are left for the key under test. The straggler check is scoped to our key's prefix so sibling tests writing into the same shared cache cannot make it flake.

2. chore(node): surface native addon load failures in build fallback
When the platform-specific native addon failed to load (corrupted binary, missing symbol, version skew), packages/webui silently fell back to the CLI path with no indication of why, and the warning fired on every build invocation.
Fix: capture the loader error and include it in the WASM fallback warning, the CLI fallback warning, and the "cannot build" error. Gate the CLI fallback warning behind cliFallbackWarned so it fires once per process.

Validation
- cargo fmt --all -- --check: clean
- cargo clippy --workspace -- -D warnings: clean
- cargo test -p microsoft-webui-discovery --lib: 32 passed, including the new race-regression test

Test coverage notes
- cache.rs: covered end-to-end by the new concurrency stress test plus the existing round-trip / invalidation / cache-miss tests.
- index.ts: not covered by a new automated test. The existing integration.test.ts only exercises the happy path. Fault-injecting an addon load failure would require mocking module-private functions or running with a deliberately broken native addon, which is disproportionate effort for ~50 lines of bookkeeping and string formatting. The change is small, obviously correct on inspection, and ships behind a fallback that is itself already exercised in production.
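
For readers who want a feel for the concurrency stress test mentioned under cache.rs, its shape might look something like the sketch below. It is not the real test_concurrent_put_does_not_corrupt_cache: the `storm` function, the compact `put` stand-in, the identical payload for every writer, and the entry path `dir/key` are assumptions for illustration, and the sketch relies on rename replacing an existing file as it does on POSIX file systems.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Compact stand-in for the write under test (hypothetical, not the real API).
fn put(dir: &Path, key: &str, bytes: &[u8]) -> io::Result<()> {
    let n = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let tmp = dir.join(format!("{key}.{}.{n}.tmp", process::id()));
    let result = fs::write(&tmp, bytes).and_then(|()| fs::rename(&tmp, dir.join(key)));
    if result.is_err() {
        let _ = fs::remove_file(&tmp);
    }
    result
}

/// Runs `writers` concurrent puts of the same payload and reports
/// (entry matches payload, number of leftover temp files for `key`).
fn storm(dir: &Path, key: &str, writers: usize, payload_len: usize) -> io::Result<(bool, usize)> {
    fs::create_dir_all(dir)?;
    let payload = vec![0xAB_u8; payload_len];

    // Scoped threads let each writer borrow `dir`, `key`, and `payload`.
    thread::scope(|s| {
        let handles: Vec<_> = (0..writers)
            .map(|_| s.spawn(|| put(dir, key, &payload)))
            .collect();
        for handle in handles {
            handle.join().expect("writer thread panicked")?;
        }
        Ok::<(), io::Error>(())
    })?;

    let intact = fs::read(dir.join(key))? == payload;

    // Only count temp files carrying this key's prefix, so unrelated
    // writers sharing the directory cannot affect the result.
    let prefix = format!("{key}.");
    let mut stragglers = 0;
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name().to_string_lossy().into_owned();
        if name.starts_with(&prefix) && name.ends_with(".tmp") {
            stragglers += 1;
        }
    }
    Ok((intact, stragglers))
}
```

Calling `storm(&dir, "pkg", 8, 64 * 1024)` mirrors the 8 threads × 64 KB configuration from the description; a passing run would report an intact entry and zero stragglers.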