[0.84] Reduce e2e test flakiness #16045
Merged
vmoroz merged 7 commits into microsoft:0.84-stable on Apr 29, 2026
Performance Test Results
Branch: ❌ Regressions Detected: ScrollView
✅ Passed: 150 scenario(s) across 27 suite(s), no regressions
Components: SectionList, FlatList, TouchableOpacity, TouchableHighlight, Pressable, Modal, Image, ActivityIndicator, Switch, Button, TextInput, View, Text
Test files: SectionList.native-perf-test.ts, FlatList.native-perf-test.ts, TouchableHighlight.native-perf-test.ts, TouchableOpacity.native-perf-test.ts, Pressable.native-perf-test.ts, ScrollView.native-perf-test.ts, ActivityIndicator.native-perf-test.ts, TextInput.native-perf-test.ts, Switch.native-perf-test.ts, Button.native-perf-test.ts, Modal.native-perf-test.ts, Image.native-perf-test.ts, View.native-perf-test.ts, Text.native-perf-test.ts
Updated from 775e88c to cc789b7.
The HomeUIA snapshots do not provide much useful test coverage, and cause a lot of churn on the test snapshots. Removing them to make snapshot changes less ignorable.

Resolves the ±1 px height drift on the Appearance / AppState tabs by removing the test that flaked on it.

(cherry picked from commit 08ad79f)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
acoates-ms approved these changes on Apr 29, 2026.
Description
Type of Change
Why
PR validation on `0.84-stable` flakes intermittently on the E2E Fabric jobs (`PR (Tests E2E Test App Fabric X64Hermes)` / `X86Hermes`), with no usable debugging signal: when the step fails, the artifacts contain no crash or hang dump. Re-runs eventually go green without a code change, which masks any real underlying regression.
This PR addresses the problem on three tracks:
1. Make E2E failures debuggable: register Windows Error Reporting (WER) LocalDumps for the test app and `node`, capture full-memory dumps from surviving processes after a failed step, install an in-process unhandled-exception filter as the primary crash-dump mechanism (hosted CI agents route WER through a corporate-server policy that silently ignores per-exe LocalDumps), and bundle matching PDBs and a debugging README into the `Crash dumps - <job>` artifact.
2. Fix the two known flake families that produced no dump (because there was no crash to dump):
   - `HomeUIADump.test.ts` ±1 px height drift on text-rendered `SpriteVisual`s: Composition snaps a near-half-integer DWrite text measurement to either side of the integer boundary on different commits. Hardened the dump path to take multiple readings and return a stable one, and cherry-picked PR Remove HomeUIA snapshots #15921 from `main`, which removed the test entirely (low coverage, high churn).
   - `searchBox` helper timeout ("Unable to enter correct search text into test searchbox") in 8 component-test files plus `RNTesterNavigation.ts`. WinAppDriver's `setValue` falls back to synthesized keystrokes for custom RN `TextInput`s, which append rather than replace, so `waitUntil` retries make the state worse, not better. Clearing the field before each `setValue`, plus a faster retry cadence, resolves it.
3. Make transient install-step failures self-heal: `midgard-yarn-strict` occasionally exits with code 57005 (0xDEAD) right after `[2/2] Fetching packages…`, requiring a manual re-run that masks the underlying transient. Wire ADO's built-in `retryCountOnTaskFailure: 2` on every install / init / lage step so that a single transient failure retries automatically before failing the build.
What
Crash-dump collection mechanism (Phases A, B, D)
`.ado/scripts/SetupLocalDumps.cmd`: made idempotent (`reg add /f`), parameterized on the dump folder, and registers the exe in `AeDebug\AutoExclusionList` so WER wins over the JIT-debugger path.
`.ado/templates/prepare-build-env.yml`: new opt-in `localDumpsExeNames` array parameter (default `[]`, a no-op for existing callers); iterates over it and registers each name with `SetupLocalDumps.cmd`. Also grants `SYSTEM:(OI)(CI)F` and `Users:(OI)(CI)F` ACLs on `CrashDumpRootPath` so the WER service (LocalSystem) and packaged apps can write dumps there.
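For readers unfamiliar with WER LocalDumps, the per-executable registration can be sketched as a list of `reg add` commands. This is a hypothetical TypeScript generator, not the contents of `SetupLocalDumps.cmd`: the `localDumpsCommands` name, the `.exe` suffixing, and the `DumpCount` value are assumptions. The registry paths and the `DumpFolder` / `DumpType` / `DumpCount` value names are the documented WER LocalDumps settings, and `AutoExclusionList` is the documented AeDebug exclusion key.

```typescript
// Hypothetical sketch: NOT the contents of SetupLocalDumps.cmd.
// Builds reg.exe commands that register WER LocalDumps for each executable
// and exclude it from the AeDebug JIT debugger.
const LOCAL_DUMPS_KEY =
  "HKLM\\SOFTWARE\\Microsoft\\Windows\\Windows Error Reporting\\LocalDumps";
const AEDEBUG_EXCLUSION_KEY =
  "HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\AeDebug\\AutoExclusionList";

function localDumpsCommands(exeNames: string[], dumpFolder: string): string[] {
  const commands: string[] = [];
  for (const name of exeNames) {
    // Assumption: names arrive without extension (e.g. "node"), while WER keys
    // and AutoExclusionList values use the image name ("node.exe").
    const exe = name.toLowerCase().endsWith(".exe") ? name : `${name}.exe`;
    const key = `"${LOCAL_DUMPS_KEY}\\${exe}"`;
    // /f overwrites without prompting, so re-running the setup is idempotent.
    commands.push(`reg add ${key} /v DumpFolder /t REG_EXPAND_SZ /d "${dumpFolder}" /f`);
    // DumpType 2 = full dump (0 = custom, 1 = mini).
    commands.push(`reg add ${key} /v DumpType /t REG_DWORD /d 2 /f`);
    // Assumed retention count; WER's default is 10.
    commands.push(`reg add ${key} /v DumpCount /t REG_DWORD /d 10 /f`);
    // A REG_DWORD of 1 named after the exe stops the JIT debugger auto-attaching.
    commands.push(`reg add "${AEDEBUG_EXCLUSION_KEY}" /v ${exe} /t REG_DWORD /d 1 /f`);
  }
  // An empty list (the parameter's default) produces no commands.
  return commands;
}

console.log(localDumpsCommands(["RNTesterApp-Fabric", "node"], "C:\\CrashDumps").join("\n"));
```

Calling it with the default `[]` returns no commands, mirroring the no-op behavior for existing callers of `prepare-build-env.yml`.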
`.ado/jobs/e2e-test.yml`: passes `[RNTesterApp-Fabric, node]` to `prepare-build-env.yml`.

The `Capture dumps of surviving test processes` step runs `procdump64 -ma` against any still-alive `RNTesterApp-Fabric` / `node` process, writing into `$(CrashDumpRootPath)\hang\`. The subfolder is required: files written at the root were observed to disappear during the post-failure `Update snapshots` step on hosted agents.

The `Collect in-process and fallback crash dumps` step copies any in-process minidumps from `%ProgramData%\RNW-E2E-Dumps\` into `$(CrashDumpRootPath)\in-process\`, and scans common WER fallback locations into `$(CrashDumpRootPath)\recovered\`.

The `Bundle symbols and README with crash dumps` step copies all `*.pdb` files from the test app's Release output into `$(CrashDumpRootPath)\symbols\` (mirroring the deploy tree) and writes a `README.md` at the artifact root with WinDbg instructions and `_NT_SYMBOL_PATH` wiring. It is gated on actual `.dmp` / `.mdmp` files existing, because `$(CrashDumpRootPath)` doubles as `MSBUILDDEBUGPATH`: build-time MSBuild failure logs land there too, and those do not need symbols bundled.
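The bundle step's gate ("only when real dumps exist") amounts to a recursive search for `.dmp` / `.mdmp` files under the dump root, ignoring MSBuild failure logs that share the folder. A minimal TypeScript sketch, assuming Node's `fs` API; `hasCrashDumps` is a hypothetical name, and the pipeline's actual check may be written differently:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: true if `root` contains (at any depth) at least one
// .dmp or .mdmp file. Other files, such as MSBuild failure logs written via
// MSBUILDDEBUGPATH, do not count.
function hasCrashDumps(root: string): boolean {
  if (!fs.existsSync(root)) return false;
  const pending: string[] = [root];
  while (pending.length > 0) {
    const dir = pending.pop() as string;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.isDirectory()) {
        pending.push(path.join(dir, entry.name));
      } else if (/\.(dmp|mdmp)$/i.test(entry.name)) {
        return true;
      }
    }
  }
  return false;
}
```

Searching subfolders matters here because the dumps land in `hang\`, `in-process\`, and `recovered\` rather than at the root.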
Two opt-in simulation parameters, both defaulting to `false`, for re-validating the crash and hang capture paths when an agent image change forces a re-check: `simulateCrashForTesting` and `simulateHangForTesting`. The crash path uses a sentinel file at `%ProgramData%\rnw-e2e-simulate-crash.flag`; the hang path uses an env var, `RNW_SIMULATE_HANG=1`, that gates a new `HangSimulationTest.test.ts`.

`packages/e2e-test-app-fabric/windows/RNTesterApp-Fabric/RNTesterApp-Fabric.cpp`:

`InstallInProcessCrashDumpWriter()`: installs a top-level unhandled-exception filter via `SetUnhandledExceptionFilter` that writes `MiniDumpWithFullMemory | WithHandleData | WithThreadInfo | WithUnloadedModules | WithProcessThreadData` to `%ProgramData%\RNW-E2E-Dumps\RNTesterApp-Fabric-<timestamp>-<pid>.dmp`, then returns `EXCEPTION_CONTINUE_SEARCH` so the process still terminates and any downstream handlers run.
`MaybeSimulateCrashForTesting()`: flag-file-gated null-pointer write for crash-path validation.
`HangForTesting` automation command: posts `Sleep(INFINITE)` onto the UI dispatcher, jamming the UI thread on the next work item (a realistic deadlock shape).
`packages/e2e-test-app-fabric/test/HangSimulationTest.test.ts`: opt-in test (auto-skips unless `RNW_SIMULATE_HANG=1`) that drives `HangForTesting` and lets the test step time out, so the post-failure ProcDump capture path has a hung packaged-app process to dump.
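The auto-skip behavior of the opt-in hang test can be expressed as a small env-var gate. In this hypothetical sketch, `selectDescribe` receives Jest's `describe` and `describe.skip` as arguments so the selection logic can run outside Jest; the helper names are assumptions, not the contents of `HangSimulationTest.test.ts`.

```typescript
// Hypothetical gate for an opt-in test suite; the real
// HangSimulationTest.test.ts may be structured differently.
type Env = Record<string, string | undefined>;

function isHangSimulationEnabled(env: Env): boolean {
  // Assumption: only the exact value "1" opts in; unset or any other value skips.
  return env.RNW_SIMULATE_HANG === "1";
}

// Returns `run` when the opt-in env var is set, otherwise `skip`.
// In Jest these would be `describe` and `describe.skip`.
function selectDescribe<T>(env: Env, run: T, skip: T): T {
  return isHangSimulationEnabled(env) ? run : skip;
}
```

In a Jest file this would be used as `const describeIfHang = selectDescribe(process.env, describe, describe.skip);`, so the suite only registers runnable tests when `RNW_SIMULATE_HANG=1`.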
Snapshot dump stabilization
`RNTesterApp-Fabric.cpp`: `DumpVisualTree` now takes up to 3 dumps with 50 ms gaps and returns the first dump that matches its successor (i.e. two consecutive dumps stringify identically). This targets Composition's per-commit rounding non-determinism on text-derived `Visual::Size` values (~24.5 → 24 vs. 25 across commits). No client, test, or snapshot changes; adds up to ~100 ms per `dumpVisualTree` call (two 50 ms gaps in the worst case).

Cherry-picked [0.84] Remove HomeUIA snapshots (#15921) from `main`. Andrew Coates' PR rationale: "The HomeUIA snapshots do not provide much useful test coverage, and cause a lot of churn on the test snapshots. Removing them to make snapshot changes less ignorable." This is belt-and-suspenders alongside the multi-dump fix.
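The multi-dump stabilization is language-agnostic, so it can be sketched in TypeScript even though `DumpVisualTree` itself is C++. This sketch injects the dump and sleep functions (hypothetical parameters) so it runs without a live visual tree; falling back to the last dump when no two consecutive dumps agree is an assumption, since the description does not say what happens in that case.

```typescript
// Hypothetical rendering of "take up to N dumps, return the first one that
// matches its successor". takeDump and sleep are injected for testability.
async function takeStableDump(
  takeDump: () => Promise<string>,
  sleep: (ms: number) => Promise<void>,
  maxDumps = 3,
  gapMs = 50,
): Promise<string> {
  let previous = await takeDump();
  for (let i = 1; i < maxDumps; i++) {
    await sleep(gapMs);
    const current = await takeDump();
    if (current === previous) {
      return previous; // two consecutive dumps stringify identically
    }
    previous = current;
  }
  // Assumption: with no agreeing pair, fall back to the most recent dump.
  return previous;
}
```

With `maxDumps = 3` and `gapMs = 50`, a tree that is already stable costs one 50 ms gap, while the worst case costs two gaps (100 ms) plus the time for three dumps.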
`searchBox` helper flake

The same flake-prone pattern was duplicated across 9 sites, and all of them got a single fix: `await searchBox.clearValue();` before `setValue()` inside the poll callback. Without the clear, retries append to the existing text and the `getText() === input` comparison never converges. Also raised `timeout` from 5000 to 10000 and reduced `interval` from 1500 to 500, for more retries within a longer window.
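To see why clearing matters, here is a self-contained TypeScript simulation. `AppendingSearchBox` is a hypothetical stand-in for a custom RN `TextInput` driven through WinAppDriver, where `setValue` appends keystrokes; `pollUntil` is a simplified stand-in for WebdriverIO's `waitUntil` (it returns `false` instead of throwing on timeout). Neither is the actual test helper.

```typescript
// Hypothetical model of a custom RN TextInput driven through WinAppDriver:
// setValue falls back to synthesized keystrokes, so it APPENDS to the text.
class AppendingSearchBox {
  private text: string;

  constructor(initialText = "") {
    this.text = initialText;
  }

  async clearValue(): Promise<void> {
    this.text = "";
  }

  async setValue(value: string): Promise<void> {
    this.text += value; // keystrokes append rather than replace
  }

  async getText(): Promise<string> {
    return this.text;
  }
}

// Simplified stand-in for waitUntil: re-runs `condition` every `interval` ms
// until it returns true, or gives up once the next attempt would start after
// `timeout` ms. Returns false instead of throwing.
async function pollUntil(
  condition: () => Promise<boolean>,
  options: { timeout: number; interval: number },
): Promise<boolean> {
  const deadline = Date.now() + options.timeout;
  for (;;) {
    if (await condition()) return true;
    if (Date.now() + options.interval > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, options.interval));
  }
}

// Helper shape after the fix: clear, set, then compare, all inside the poll.
async function enterSearchText(
  searchBox: AppendingSearchBox,
  input: string,
  clearFirst: boolean,
  options = { timeout: 10000, interval: 500 },
): Promise<boolean> {
  return pollUntil(async () => {
    if (clearFirst) await searchBox.clearValue();
    await searchBox.setValue(input);
    return (await searchBox.getText()) === input;
  }, options);
}
```

Starting from stale text such as `"Te"`, the variant without `clearValue()` can never converge, because every retry appends another copy of the input; the clear-first variant succeeds on its first attempt. The cadence change matters too: ignoring time spent inside the callback, a 5000 ms window polled every 1500 ms allows roughly 4 attempts, while a 10000 ms window polled every 500 ms allows roughly 20.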
Files: `TextInputComponentTest.test.ts`, `AccessibilityTest.test.ts`, `ButtonComponentTest.test.ts`, `FlatListComponentTest.test.ts` (×2 helpers), `PointerButtonComponentTest.test.ts`, `SwitchComponentTest.test.ts`, `TouchableComponentTest.test.ts`, `ViewComponentTest.test.ts`, `RNTesterNavigation.ts` (inline poll in `goToExample`).

Install / init / lage step retry
`midgard-yarn-strict@1.2.4` is unmaintained (last published 2021), and its bundled `yraf` only auto-retries on `ECONNRESET` / `ESOCKETTIMEDOUT` / `ETIMEDOUT` / `ENOTFOUND`. Other transient failures, including a fetch helper killed mid-flight (the observed mode, exit code 57005 / 0xDEAD), bypass that retry path entirely. PR build 630484 hit this on the `Strict yarn install @rnw-scripts/beachball-config` step right after `[2/2] Fetching packages…`; a manual re-run went green with no code change. ADO supports `retryCountOnTaskFailure` at the step level, so `retryCountOnTaskFailure: 2` was added to every step that fetches from the npm registry:
- `.ado/build-template.yml`: `Strict yarn install @rnw-scripts/beachball-config`, `Build prepare-release and beachball-config`.
- `.ado/prepare-release-bot.yml`: `yarn install` + `Build prepare-release and dependencies`.
- `.ado/templates/strict-yarn-install.yml`, `.ado/templates/yarn-install.yml` (both `ManagedImage` and `HostedImage` branches).
- `.ado/templates/react-native-init-windows.yml`: `creaternwapp.cmd` and `creaternwlib.cmd` init steps (each runs ~6 npm/yarn fetches internally).

Cost when the install passes on the first try: zero. When it flakes, ADO retries the step up to twice before failing the build. The retries are visible in the pipeline UI as explicit attempts, so genuine deterministic failures still surface clearly within ~1 minute instead of being masked by a manual re-run cycle.
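The retry semantics can be modeled in a few lines. This is a simplified TypeScript sketch of a step-level retry policy in the spirit of `retryCountOnTaskFailure` (the step runs once, then up to N more times on failure); `runStepWithRetries` is a hypothetical name, and the model ignores any wait between attempts.

```typescript
// Simplified model of a step-level retry policy: run once, then re-run up to
// `retryCount` more times while the step keeps failing (non-zero exit code).
interface StepOutcome {
  exitCode: number;
  attempts: number;
}

function runStepWithRetries(
  step: (attempt: number) => number, // returns the step's exit code
  retryCount: number,
): StepOutcome {
  let exitCode = 0;
  for (let attempt = 1; attempt <= retryCount + 1; attempt++) {
    exitCode = step(attempt);
    if (exitCode === 0) return { exitCode, attempts: attempt };
  }
  return { exitCode, attempts: retryCount + 1 };
}

// 0xDEAD (57005), the exit code observed from midgard-yarn-strict.
const EXIT_DEAD = 0xdead;
```

A single transient 0xDEAD exit is absorbed on the second attempt, while a deterministic failure still fails the build after three attempts (one run plus two retries).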
Screenshots
N/A — pipeline / native / test changes only.
Testing
The crash-dump mechanism was validated end-to-end via the two opt-in simulation parameters before they were defaulted back to `false`:
- `simulateCrashForTesting=true` → `MaybeSimulateCrashForTesting` reads the sentinel flag and dereferences a null pointer at startup → `InstallInProcessCrashDumpWriter`'s unhandled-exception filter writes full-memory `.dmp` files (~32 MB each) to `%ProgramData%\RNW-E2E-Dumps\` → the diagnostic step copies them into the artifact under `in-process/`.
- `simulateHangForTesting=true` → `HangForTesting` posts `Sleep(INFINITE)` onto the UI dispatcher → the Jest test step times out → the post-failure ProcDump step captures full-memory dumps of the still-alive packaged app (~250 MB each) under `hang/`.

Confirmed that the dumps reach the artifact intact for both X64Hermes and X86Hermes.
The snapshot fix was observed in build 630476 (latest re-run): `HomeUIADump` passed, and all 828 snapshots passed across the suite. The `searchBox` fix has not yet been validated by CI; it targets the failure mode seen in build 630476 (`TextInput triggers onPressIn and updates state text` → "Unable to enter correct search text into test searchbox" at 5095 ms).
The crash-dump artifact format is documented inline in `$(CrashDumpRootPath)\README.md`, written by the bundle step.

Changelog
Should this change be included in the release notes: no
This is internal CI / test infrastructure, with no runtime impact for consumers of `react-native-windows`. The only product-code change is the in-process crash-dump writer in the E2E test app (`RNTesterApp-Fabric`), which is not shipped.