Fix head-of-line blocking in SLC for subquery shapes via ETS link-values cache and inverted index#3937
Conversation
…in ETS `get_link_values` previously made a GenServer.call to the Materializer process. In the SLC's filter step, this is called sequentially for every outer shape (shapes with subquery dependencies) for every change in a transaction — O(N_shapes × D_deps × C_changes) serialised GenServer calls, all from a single process. With N=300, D=3, C=20 this adds up to ~1.8s of blocking per transaction; larger values explain the observed 6-14s delays. Fix: cache `value_counts` in a per-stack named ETS table (read_concurrency: true). `get_link_values` now does an ETS lookup (~5µs) instead of a GenServer.call (~100µs + queue wait). This also eliminates the secondary bottleneck in ConsumerRegistry.broadcast: N outer-shape consumers previously serialised through D Materializer queues (last consumer waited N×D calls); now each consumer does D concurrent ETS reads independently, reducing broadcast time from O(N×D) to O(D). The ETS table is created by ConsumerRegistry.new (owned by the SLC process) so it has the same lifetime as the stack. Materializers write to ETS on startup (after reading historical data) and after each committed transaction. Reads fall back to GenServer.call for backward compatibility (e.g. tests using StubMaterializer). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3937 +/- ##
==========================================
+ Coverage 87.21% 88.77% +1.56%
==========================================
Files 25 25
Lines 2394 2415 +21
Branches 599 611 +12
==========================================
+ Hits 2088 2144 +56
+ Misses 304 269 -35
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Reformat write_link_values pattern-match argument to satisfy mix format - Extract link_values_from_counts/1 to eliminate duplication between handle_call and write_link_values Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI observed 5 SSE requests before fallback state propagated, but the test only allowed up to 4. Increase the mock threshold and assertion to 5. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit: |
|
Not bad. Needs additional verification:
|
This comment has been minimized.
This comment has been minimized.
|
One thing Claude has highlighted:
whilst Codex thinks it's a serious concern:
|
4fb6972 to
8c74b85
Compare
Claude Code ReviewSummaryThis PR fixes a significant head-of-line blocking issue in the ShapeLogCollector: shapes with subqueries were serialising O(N×D×C) GenServer calls through the Materializer per transaction. The fix has two parts: an ETS cache for What's Working Well
Issues FoundImportant (Should Fix)1. Zero test coverage for All existing and new filter tests use defp sublink_affected_shapes(%Filter{stack_id: nil}, _table_name, _record), do: MapSet.new()This means A test covering the happy path would need:
2. Potential false negatives for OR + sublink WHERE clauses
Once registered, For a hypothetical shape with The correctness depends entirely on Electric's grammar never producing a top-level OR that combines a sublink arm with a non-sublink arm. If that invariant is guaranteed at parse/shape-creation time it should be documented (and ideally asserted). If not, 3. Registration ( if shape.shape_dependencies_handles \!= [] do
unregister_sublink_shape(filter, shape_id, shape)
endShapes with deps that went through the equality index (not 4. Stale ETS entries for removed dep shapes
Suggestions (Nice to Have)5. Missing The function is public and called across module boundaries ( 6. Comment in The doc says dep shapes through an equality index "must be evaluated normally by 7. TypeScript test bound increase without root-cause context
Issue ConformanceIssue #3928 is well-matched: the root cause identified in the issue is directly addressed. The PR description is thorough and the non-goals are clearly stated. The issue is somewhat under-specified (no explicit acceptance criteria), but the PR's benchmarking details and targeted regression test fill that gap adequately. Review iteration: 1 | 2026-03-03 |
|
@claude
to add a Consumer process is the one acting on materializer's life cycle events (search for Also, pay attention to this comment. |
Dep shapes in other_shapes were costing N×D ETS reads per change
(one refs_fun call per shape × D dep handles). With 300 shapes and
3 deps each, that is ~90ms of ETS reads per transaction even after
the GenServer→ETS cache from the previous commit.
Add two private ETS tables to Filter:
- sublink_field_table: {relation, field} → [{dep_handle, field_type}]
- sublink_dep_table: dep_handle → MapSet(outer_shape_handles)
On add_shape, dep shapes that land in other_shapes (non-optimisable
WHERE) are registered in the inverted index instead of being evaluated
by the O(N) other_shapes loop. On affected_shapes, a single pass over
the changed record's fields produces candidates in O(fields × K) where
K is the number of distinct dep handles for that field — typically 1
when many shapes share the same inner shape.
Dep shapes in equality/inclusion indexes are unaffected: they are still
handled by the existing index path with ETS-backed refs_fun.
Result: the SLC filter step for subquery shapes scales with the number
of distinct dep handles (usually 1), not with the number of outer shapes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the materializer hasn't committed yet, the ETS table has no entry for the dep handle. Previously the `with` chain would fail and no candidates were added, causing the outer shape to be silently skipped. Now when `link_values_table` has no entry for a dep handle, we include all dep shapes from `sublink_dep_table` as candidates. The subsequent re-evaluation via `refs_fun` will raise (materializer unavailable) which bubbles up to the top-level `try/catch` in `affected_shapes`, returning all shape IDs as the safe fallback — preserving the existing guarantee that any crash causes the transaction to be sent to all shapes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a dep shape has a WHERE clause like `status = 'published' AND parent_id IN (SELECT ...)`, `optimise_where` extracts the equality condition and places the shape in the equality index. The `and_where` sublink expression then lands in a *nested* WhereCondition's other_shapes — not the top-level one checked by `in_other_shapes?`. Before: `other_shapes_affected` skipped ALL dep shapes unconditionally, so dep shapes in nested conditions were silently dropped from the result. After: Only skip dep shapes that are registered in the inverted index (i.e., those that landed in the top-level other_shapes). Dep shapes in nested conditions are not registered there, so they fall through to the normal `WhereClause.includes_record?` evaluation path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `Materializer.delete_link_values/2` to remove a dep shape's ETS
entry from the link-values table.
- Call it in Consumer's `terminate/2` (cleans up when the dep shape's
consumer stops) and in `{:dependency_materializer_down, handle}`
handler (cleans up when a dep materializer crashes).
- Add `@spec` for `link_values_table_name/1`.
- In `Filter.remove_shape`, guard `unregister_sublink_shape` with
`registered_in_inverted_index?` instead of the broader dep-handle
check, so shapes that went through the equality index (never
registered in the inverted index) don't incur unnecessary ETS lookups
on removal.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Latest commit: Fix stale ETS cleanup and tighten unregister guardAddresses @alco's concern about stale ETS entries:
Re: CI failures — all three reported failures (
|
Claude Code ReviewSummaryNo new commits since iteration 6. The PR continues to address all requirements from issue #3928 with no critical issues. The single remaining open item is the untested inverted-index runtime path (now seventh consecutive iteration). What's Working Well
Issues FoundCritical (Must Fix)None. Important (Should Fix)1. Inverted-index runtime path remains untested -- seventh consecutive iteration No new test coverage has been added. The following code paths remain exercised only in production:
A minimal test fixture requires:
The RowExpr regression (Critical, iteration 5) introduced and fixed within this PR is direct evidence that this path carries production correctness requirements the current test suite cannot catch. Suggestions (Nice to Have)2. File:
Issue ConformanceAll requirements from issue #3928 are addressed. The fix correctly targets the O(NxDxC) GenServer serialisation bottleneck and the O(N) Previous Review Status
Review iteration: 7 | 2026-03-03 |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause of the 5th in-flight request was not identified; widening the bound masks rather than fixes the flakiness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sublink_affected_shapes: add rescue ArgumentError so a missing link-values ETS table (e.g. during ConsumerRegistry restart) returns an empty MapSet instead of cascading to the broad "return all shapes" fallback - get_link_values: use :infinity timeout on GenServer fallback; include exit reason in error message - write_link_values: log a warning when ETS table is missing so cache degradation is visible - delete_link_values: log at debug level when table is missing; add @doc - get_link_values: add @doc describing two-level lookup behaviour - init_link_values_table: fix @SPEC to include :undefined (rescue branch); update @doc to say "Idempotent" rather than "Must be called before" - registered_in_inverted_index?: add fast-path clause for shapes with no dep handles to avoid unnecessary ETS lookups in the hot path - consumer.ex terminate/2: restore explanatory comment on delete_link_values - snapshot_filter_ets: include sublink_field_table and sublink_dep_table so the remove-shape symmetry test catches inverted-index leaks - Code simplification: extract record_matches_dep?/4 and union_shapes_for_dep/3; remove stale TODO block from materializer.ex Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
I'm rerunning my benchmarks on the latest version of the code. |
The requested change was for a piece of temporary code. It's been removed.
Replace O(N×ETS) registered_in_inverted_index? check with an in-struct MapSet lookup so other_shapes_affected can short-circuit before calling get_shape for the common case (all N shapes are sublink-indexed). - Add `sublink_shapes_set` MapSet field to Filter struct; maintained by register/unregister_sublink_shape - registered_in_inverted_index?/2 now does MapSet.member? (no ETS) and drops the now-unnecessary `shape` argument - Move the check before get_shape in other_shapes_affected so indexed shapes never trigger an ETS load - Add filter.sublink_candidates_count span attribute and filter.sublink_reeval.duration_µs timer to expose where the remaining wall-clock time goes in sublink_affected_shapes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shapes with RowExpr left-hand sides (e.g. (user_id, team_id) IN (SELECT ...)) produce no indexed sublink_fields, so they must not be added to sublink_shapes_set. Previously they were added unconditionally, causing other_shapes_affected to skip them while sublink_affected_shapes could never find them → shape received no updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Found 1 test failure on Blacksmith runners: Failure
|
|
This is wild. Well done on getting this merged so quickly. |
|
This PR has been released! 🚀 The following packages include changes from this PR:
Thanks for contributing to Electric! |
Shapes with subqueries (`IN (SELECT ...)`) were causing 6–14s SLC stalls under load. This fixes three independent bottlenecks: O(N×D) serialised GenServer calls in the SLC filter step, O(N) other_shapes scan for dep-shape record changes, and O(D) ETS probe per shape in the hot-path membership check.
Root Cause
Three separate hot paths were serialising work:
Bottleneck 1 — SLC filter step (O(N×D×C) GenServer calls)
`get_link_values` made a `GenServer.call` per invocation. In `other_shapes_affected`, this is called for every outer shape × every dep × every changed record. With N=300, D=3, C=20 that's ~18,000 sequential GenServer calls from a single process per transaction.
Bottleneck 2 — Dep-shape record changes (O(N) other_shapes scan)
When a dep-shape's own table changes (e.g. a row is inserted into the subquery's source table), `Filter.affected_shapes` had to scan all N outer shapes to find which ones were affected, evaluating each WHERE clause against the Materializer's current link values.
Bottleneck 3 — Hot-path membership check (O(D) ETS probes per shape)
`registered_in_inverted_index?` previously walked each shape's `shape_dependencies_handles` list and did an ETS lookup per dep handle to check membership — O(D) probes, called for every shape in `other_shapes_affected` on every change.
Approach
Fix 1: Per-stack ETS link-values cache
Cache `value_counts` in a named ETS table (`read_concurrency: true`) owned by `ConsumerRegistry`. `get_link_values` does an ETS lookup (~5µs) instead of a GenServer call (~100µs + queue wait). Written after each committed transaction and after initial materialization.
Fix 2: Inverted index in Filter
Two new private ETS tables in `Filter` — `sublink_field_table` and `sublink_dep_table` — form an inverted index for dep shapes that land in non-optimisable WHERE clauses. When a dep-table record changes, the index directly returns the outer shapes whose link field matches, avoiding the O(N) scan entirely.
Fix 3: O(1) membership check via in-struct MapSet
Added `sublink_shapes_set` (a `MapSet` on the `Filter` struct) tracking which shape IDs are registered in the inverted index. `registered_in_inverted_index?` is now `MapSet.member?` — O(1) and zero ETS traffic — moved to run before the `get_shape` ETS lookup in `other_shapes_affected` so indexed shapes never touch ETS at all.
A guard (`map_size(sublink_fields) > 0`) ensures composite-PK subqueries like `(user_id, team_id) IN (SELECT ...)` — which produce no indexable fields — are never added to `sublink_shapes_set` and continue to be evaluated by `other_shapes_affected` normally. Without this guard they would be silently dropped.
Two new telemetry spans expose the remaining work: `filter.sublink_candidates_count` and `filter.sublink_reeval.duration_µs`.
Key invariants:
Non-goals: This doesn't fix O(N) consumer fan-out in `broadcast` — only the O(D) Materializer serialisation within each consumer. Fixing fan-out would require a different pubsub model.
Trade-offs: ETS is write-once-per-txn, read-many. Stale reads between write and next commit are not a concern since `get_link_values` is only called during the SLC filter step, which runs after the commit that wrote the values.
Verification
The regression test `get_link_values reads from ETS cache and does not require the GenServer to be alive` stops the Materializer GenServer after populating ETS, then asserts `get_link_values` returns the correct values — fails without the fix, passes with it.
Files Changed
Fixes #3928
🤖 Generated with Claude Code