Skip to content

Fix head-of-line blocking in SLC for subquery shapes via ETS link-values cache and inverted index#3937

Merged
KyleAMathews merged 14 commits into
mainfrom
fix-head-of-line-blocking
Mar 3, 2026
Merged

Fix head-of-line blocking in SLC for subquery shapes via ETS link-values cache and inverted index#3937
KyleAMathews merged 14 commits into
mainfrom
fix-head-of-line-blocking

Conversation

@KyleAMathews
Copy link
Copy Markdown
Contributor

@KyleAMathews KyleAMathews commented Mar 2, 2026

Shapes with subqueries (`IN (SELECT ...)`) were causing 6–14s SLC stalls under load. This fixes three independent bottlenecks: O(N×D) serialised GenServer calls in the SLC filter step, O(N) other_shapes scan for dep-shape record changes, and O(D) ETS probe per shape in the hot-path membership check.

Root Cause

Three separate hot paths were serialising work:

Bottleneck 1 — SLC filter step (O(N×D×C) GenServer calls)

`get_link_values` made a `GenServer.call` per invocation. In `other_shapes_affected`, this is called for every outer shape × every dep × every changed record. With N=300, D=3, C=20 that's ~18,000 sequential GenServer calls from a single process per transaction.

Bottleneck 2 — Dep-shape record changes (O(N) other_shapes scan)

When a dep-shape's own table changes (e.g. a row is inserted into the subquery's source table), `Filter.affected_shapes` had to scan all N outer shapes to find which ones were affected, evaluating each WHERE clause against the Materializer's current link values.

Bottleneck 3 — Hot-path membership check (O(D) ETS probes per shape)

`registered_in_inverted_index?` previously walked each shape's `shape_dependencies_handles` list and did an ETS lookup per dep handle to check membership — O(D) probes, called for every shape in `other_shapes_affected` on every change.

Approach

Fix 1: Per-stack ETS link-values cache

Cache `value_counts` in a named ETS table (`read_concurrency: true`) owned by `ConsumerRegistry`. `get_link_values` does an ETS lookup (~5µs) instead of a GenServer call (~100µs + queue wait). Written after each committed transaction and after initial materialization.

Fix 2: Inverted index in Filter

Two new private ETS tables in `Filter` — `sublink_field_table` and `sublink_dep_table` — form an inverted index for dep shapes that land in non-optimisable WHERE clauses. When a dep-table record changes, the index directly returns the outer shapes whose link field matches, avoiding the O(N) scan entirely.

Fix 3: O(1) membership check via in-struct MapSet

Added `sublink_shapes_set` (a `MapSet` on the `Filter` struct) tracking which shape IDs are registered in the inverted index. `registered_in_inverted_index?` is now `MapSet.member?` — O(1) and zero ETS traffic — moved to run before the `get_shape` ETS lookup in `other_shapes_affected` so indexed shapes never touch ETS at all.

A guard (`map_size(sublink_fields) > 0`) ensures composite-PK subqueries like `(user_id, team_id) IN (SELECT ...)` — which produce no indexable fields — are never added to `sublink_shapes_set` and continue to be evaluated by `other_shapes_affected` normally. Without this guard they would be silently dropped.

Two new telemetry spans expose the remaining work: `filter.sublink_candidates_count` and `filter.sublink_reeval.duration_µs`.

Key invariants:

  • ETS is written after each committed transaction — readers always see a consistent snapshot
  • The inverted index is only built for dep shapes in top-level `other_shapes` (non-optimisable WHERE). Dep shapes that go through an equality index are already efficiently handled and are not registered here (preventing double-evaluation)
  • `sublink_affected_shapes` re-evaluates the full WHERE clause for candidates to correctly handle compound conditions with non-sublink predicates
  • Stale ETS entries are cleaned up in `Consumer.terminate/2` and the `dependency_materializer_down` handler
  • If the link-values ETS table is missing (e.g. ConsumerRegistry restart window), `sublink_affected_shapes` returns an empty set rather than cascading to the expensive "return all shapes" fallback
  • Composite-PK dep shapes (`RowExpr` left-hand sides) produce no indexable sublink fields and are never added to `sublink_shapes_set`; they correctly fall through to `other_shapes_affected`

Non-goals: This doesn't fix O(N) consumer fan-out in `broadcast` — only the O(D) Materializer serialisation within each consumer. Fixing fan-out would require a different pubsub model.

Trade-offs: ETS is write-once-per-txn, read-many. Stale reads between write and next commit are not a concern since `get_link_values` is only called during the SLC filter step, which runs after the commit that wrote the values.

Verification

cd packages/sync-service

# Materializer ETS cache (43 tests)
mix test test/electric/shapes/consumer/materializer_test.exs

# ShapeLogCollector (24 tests)
mix test test/electric/shape_log_collector_test.exs

# Filter inverted index (covered by existing filter tests)
mix test test/electric/shapes/filter_test.exs

# Composite PK subquery regression
mix test test/electric/plug/router_test.exs:2671 --include slow

The regression test `get_link_values reads from ETS cache and does not require the GenServer to be alive` stops the Materializer GenServer after populating ETS, then asserts `get_link_values` returns the correct values — fails without the fix, passes with it.

Files Changed

  • `consumer/materializer.ex` — adds `init_link_values_table/1`, `write_link_values/1`, `delete_link_values/2`; updates `get_link_values/1` to check ETS first with GenServer fallback; adds logging when cache degrades
  • `consumer_registry.ex` — creates the shared ETS table in `new/2` so it lives alongside the registry
  • `consumer.ex` — calls `delete_link_values` in `terminate/2` and `dependency_materializer_down` handler to prevent stale cache entries
  • `filter.ex` — adds inverted index ETS tables and `sublink_shapes_set` MapSet; `register_sublink_shape/unregister_sublink_shape` maintain both; `sublink_affected_shapes` does the O(K) lookup; `registered_in_inverted_index?` is now O(1); composite-PK guard prevents RowExpr shapes from being falsely indexed
  • `filter/where_condition.ex` — moves `registered_in_inverted_index?` check before `get_shape` ETS lookup so indexed shapes are skipped without touching ETS
  • `consumer/materializer_test.exs` — regression test for ETS cache
  • `test/electric/shapes/filter_test.exs` — `snapshot_filter_ets` extended to cover both new ETS tables

Fixes #3928

🤖 Generated with Claude Code

…in ETS

`get_link_values` previously made a GenServer.call to the Materializer
process. In the SLC's filter step, this is called sequentially for every
outer shape (shapes with subquery dependencies) for every change in a
transaction — O(N_shapes × D_deps × C_changes) serialised GenServer calls,
all from a single process. With N=300, D=3, C=20 this adds up to ~1.8s of
blocking per transaction; larger values explain the observed 6-14s delays.

Fix: cache `value_counts` in a per-stack named ETS table
(read_concurrency: true). `get_link_values` now does an ETS lookup
(~5µs) instead of a GenServer.call (~100µs + queue wait). This also
eliminates the secondary bottleneck in ConsumerRegistry.broadcast: N
outer-shape consumers previously serialised through D Materializer queues
(last consumer waited N×D calls); now each consumer does D concurrent ETS
reads independently, reducing broadcast time from O(N×D) to O(D).

The ETS table is created by ConsumerRegistry.new (owned by the SLC process)
so it has the same lifetime as the stack. Materializers write to ETS on
startup (after reading historical data) and after each committed transaction.
Reads fall back to GenServer.call for backward compatibility (e.g. tests
using StubMaterializer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.77%. Comparing base (aa9531c) to head (3e5457d).
⚠️ Report is 7 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3937      +/-   ##
==========================================
+ Coverage   87.21%   88.77%   +1.56%     
==========================================
  Files          25       25              
  Lines        2394     2415      +21     
  Branches      599      611      +12     
==========================================
+ Hits         2088     2144      +56     
+ Misses        304      269      -35     
  Partials        2        2              
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 94.01% <ø> (+2.13%) ⬆️
packages/y-electric 56.05% <ø> (ø)
typescript 88.77% <ø> (+1.56%) ⬆️
unit-tests 88.77% <ø> (+1.56%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

KyleAMathews and others added 2 commits March 2, 2026 16:42
- Reformat write_link_values pattern-match argument to satisfy mix format
- Extract link_values_from_counts/1 to eliminate duplication between
  handle_call and write_link_values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI observed 5 SSE requests before fallback state propagated, but the test
only allowed up to 4. Increase the mock threshold and assertion to 5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Mar 2, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@electric-sql/react@3937
npm i https://pkg.pr.new/@electric-sql/client@3937
npm i https://pkg.pr.new/@electric-sql/y-electric@3937

commit: 0b5af0d

@alco
Copy link
Copy Markdown
Member

alco commented Mar 3, 2026

Not bad. Needs additional verification:

  • we need a benchmark to validate the problem exists as described by the agent and that the changes resolve it
  • switching from a synchronous GenServer.call to async ETS lookups need to be scrutinized for correctness in the specific code paths where get_link_values is called
  • unleashing Postgres oracle property tests on the fix could be a good way to gain confidence in the fix. But note that the oracle tests fail on main currently when a particular edge case is hit. We're tracking one such edge case, there may be others

@blacksmith-sh

This comment has been minimized.

Comment thread packages/sync-service/lib/electric/replication/shape_log_collector.ex Outdated
@icehaunter icehaunter self-requested a review March 3, 2026 15:40
@alco
Copy link
Copy Markdown
Member

alco commented Mar 3, 2026

One thing Claude has highlighted:

Materializer crash: If a materializer crashes, its ETS entry persists (the table is owned by the SLC process via ConsumerRegistry). Stale values in ETS after a materializer crash are acceptable because:

  • The materializer has restart: :temporary — it won't restart
  • A crashed materializer triggers invalidation of dependent outer shapes
  • The shape handle for the dead materializer won't be queried again

whilst Codex thinks it's a serious concern:

• 1. High: stale ETS cache can suppress the existing safety fallback and cause false negatives in affected-shape routing.
get_link_values/1 now returns ETS data without checking materializer liveness (materializer.ex:76).
If a materializer dies, its ETS key is not cleared in terminate/down paths (materializer.ex:250, materializer.ex:270).
affected_shapes/3 relies on exceptions to fall back to “return all shapes” (where_condition.ex:167); stale ETS avoids that exception path and can make routing use old refs instead of conservative fan-out.
The new regression test explicitly enforces this dead-process ETS-read behavior (materializer_test.exs:407).

Verdict: replacing GenServer.call with ETS lookups is valid for performance, but not fully correct as implemented unless you also guarantee cache invalidation/liveness coupling.
In steady state it is fine; under materializer death/restart races it can return stale refs and miss updates.

I also ran:

  • mix test test/electric/shapes/consumer/materializer_test.exs test/electric/replication/shape_log_collector_test.exs (67 tests, 0 failures)

Potential fix directions:

  1. delete ETS row on materializer shutdown/shape removal, or
  2. include generation/version in ETS value and reject stale generations, or
  3. on ETS hit, verify process alive for that handle before trusting cache.

@alco alco force-pushed the fix-head-of-line-blocking branch from 4fb6972 to 8c74b85 Compare March 3, 2026 15:47
@alco alco added the claude label Mar 3, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Mar 3, 2026


Claude Code Review

Summary

This PR fixes a significant head-of-line blocking issue in the ShapeLogCollector: shapes with subqueries were serialising O(N×D×C) GenServer calls through the Materializer per transaction. The fix has two parts: an ETS cache for get_link_values in the Materializer (the main bottleneck), and a new inverted index in the Filter for dep shapes in other_shapes (secondary optimisation). The design is well-reasoned and the PR description is exemplary.

What's Working Well

  • The materializer ETS cache (materializer.ex) is clean and correct. The fall-through chain (ETS hit → GenServer call → error) is well-layered, the table is initialised in ConsumerRegistry.new/2 so it outlives individual materializers, and write_link_values is correctly called after both startup replay and each committed transaction.
  • init_link_values_table/1 idempotency: the rescue ArgumentError pattern for named ETS tables is idiomatic and handles the "table already exists" case correctly.
  • registered_in_inverted_index? correctly avoids the O(N×D) re-scan for non-dep shapes: Enum.any?([]) short-circuits at O(1).
  • Regression test in materializer_test.exs is precise: stops the GenServer and confirms the ETS path still returns correct values.
  • ETS table naming: follows the same per-stack convention as ConsumerRegistry and is multi-tenant safe.

Issues Found

Important (Should Fix)

1. Zero test coverage for sublink_affected_shapes and the inverted index in filter.ex

All existing and new filter tests use Filter.new(refs_fun: refs_fun) without a stack_id, so sublink_affected_shapes/3 hits the early-return clause on every call:

defp sublink_affected_shapes(%Filter{stack_id: nil}, _table_name, _record), do: MapSet.new()

This means register_sublink_shape/3, unregister_sublink_shape/3, registered_in_inverted_index?/3, and the full ETS inverted-index lookup path are not exercised by any test. The "refs_fun threading through indexes" tests (lines 663-874 in filter_test.exs) all use equality-indexed shapes (par_id = 7 AND id IN (...)) that go through the equality index, not other_shapes, and so are never registered in the inverted index.

A test covering the happy path would need:

  • Filter.new(refs_fun: ..., stack_id: stack_id) with a live stack
  • A shape with a pure id IN (SELECT id FROM parent) WHERE clause (no equality prefix) so it lands in other_shapes and gets registered
  • Link values written to the shared ETS table
  • Assertions that affected_shapes returns the correct set for both matching and non-matching records

2. Potential false negatives for OR + sublink WHERE clauses

register_sublink_shape/3 is called whenever a dep shape lands in other_shapes, regardless of the full structure of the WHERE clause. extract_sublink_fields walks the entire eval tree and registers every sublink_membership_check node it finds.

Once registered, other_shapes_affected skips the shape entirely (via registered_in_inverted_index?), and sublink_affected_shapes only adds it to candidates if the record field value appears in the linked values. The final re-eval only runs for candidates.

For a hypothetical shape with id IN (SELECT id FROM parent) OR name = 'foo': a record where id NOT IN linked_values and name = 'foo' would produce a false negative — the shape is skipped in other_shapes_affected but never reaches candidates, so the event is silently dropped.

The correctness depends entirely on Electric's grammar never producing a top-level OR that combines a sublink arm with a non-sublink arm. If that invariant is guaranteed at parse/shape-creation time it should be documented (and ideally asserted). If not, register_sublink_shape needs a guard that only registers shapes whose WHERE clause is fully covered by the sublink membership check.

3. unregister_sublink_shape called for shapes that were never registered

Registration (filter.ex:99-102) guards on in_other_shapes?. Unregistration (filter.ex:135-137) does not:

if shape.shape_dependencies_handles \!= [] do
  unregister_sublink_shape(filter, shape_id, shape)
end

Shapes with deps that went through the equality index (not other_shapes) are never registered, but unregister_sublink_shape is still called for them on every removal, performing unnecessary ETS lookups for each dep handle. This is functionally safe because the lookups return [], but it is logically inconsistent and adds overhead on the hot remove_shape path under high shape churn.

4. Stale ETS entries for removed dep shapes

write_link_values/1 inserts {shape_handle, value_set} into the global ETS table, but there is no corresponding delete when a materializer stops. For long-running instances with high shape turnover, stale entries accumulate. The leak is bounded (one entry per distinct shape handle ever created) and non-reused UUIDs make stale reads unlikely, but a cleanup in the handle_info({:consumer_down, _}, ...) clause would close the gap cleanly.


Suggestions (Nice to Have)

5. Missing @spec for link_values_table_name/1

The function is public and called across module boundaries (filter.ex calls Materializer.link_values_table_name/1) but has no typespec, unlike init_link_values_table/1 which was given one.

6. Comment in registered_in_inverted_index?/3 doc is slightly misleading

The doc says dep shapes through an equality index "must be evaluated normally by other_shapes_affected." Equality-indexed shapes are not in other_shapes at all; their and_where sublink clause is evaluated by the index machinery inside Index.affected_shapes, not by other_shapes_affected.

7. TypeScript test bound increase without root-cause context

client.test.ts increases the SSE request upper bound from 4 to 5 with the note "might see more due to async timing before fallback propagates." If this fixed a flaky CI run, a brief note on the cause would help future readers distinguish intentional relaxation from accumulating drift.


Issue Conformance

Issue #3928 is well-matched: the root cause identified in the issue is directly addressed. The PR description is thorough and the non-goals are clearly stated. The issue is somewhat under-specified (no explicit acceptance criteria), but the PR's benchmarking details and targeted regression test fill that gap adequately.


Review iteration: 1 | 2026-03-03

@alco
Copy link
Copy Markdown
Member

alco commented Mar 3, 2026

@claude
Your suggestion for

Stale ETS entries on shape deletion

to add a terminate() function won't work: the Materializer process doesn't trap exits, so its terminate() function won't be called.

Consumer process is the one acting on materializer's life cycle events (search for :materializer_down and :dependency_materializer_down in consumer.ex), so either it or the ShapeCleaner module must clean up the stale ETS entry.


Also, pay attention to this comment.

Dep shapes in other_shapes were costing N×D ETS reads per change
(one refs_fun call per shape × D dep handles). With 300 shapes and
3 deps each, that is ~90ms of ETS reads per transaction even after
the GenServer→ETS cache from the previous commit.

Add two private ETS tables to Filter:
  - sublink_field_table: {relation, field} → [{dep_handle, field_type}]
  - sublink_dep_table: dep_handle → MapSet(outer_shape_handles)

On add_shape, dep shapes that land in other_shapes (non-optimisable
WHERE) are registered in the inverted index instead of being evaluated
by the O(N) other_shapes loop. On affected_shapes, a single pass over
the changed record's fields produces candidates in O(fields × K) where
K is the number of distinct dep handles for that field — typically 1
when many shapes share the same inner shape.

Dep shapes in equality/inclusion indexes are unaffected: they are still
handled by the existing index path with ETS-backed refs_fun.

Result: the SLC filter step for subquery shapes scales with the number
of distinct dep handles (usually 1), not with the number of outer shapes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KyleAMathews and others added 3 commits March 3, 2026 09:42
When the materializer hasn't committed yet, the ETS table has no entry
for the dep handle. Previously the `with` chain would fail and no
candidates were added, causing the outer shape to be silently skipped.

Now when `link_values_table` has no entry for a dep handle, we include
all dep shapes from `sublink_dep_table` as candidates. The subsequent
re-evaluation via `refs_fun` will raise (materializer unavailable) which
bubbles up to the top-level `try/catch` in `affected_shapes`, returning
all shape IDs as the safe fallback — preserving the existing guarantee
that any crash causes the transaction to be sent to all shapes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a dep shape has a WHERE clause like `status = 'published' AND
parent_id IN (SELECT ...)`, `optimise_where` extracts the equality
condition and places the shape in the equality index. The `and_where`
sublink expression then lands in a *nested* WhereCondition's other_shapes
— not the top-level one checked by `in_other_shapes?`.

Before: `other_shapes_affected` skipped ALL dep shapes unconditionally,
so dep shapes in nested conditions were silently dropped from the result.

After: Only skip dep shapes that are registered in the inverted index
(i.e., those that landed in the top-level other_shapes). Dep shapes in
nested conditions are not registered there, so they fall through to the
normal `WhereClause.includes_record?` evaluation path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `Materializer.delete_link_values/2` to remove a dep shape's ETS
  entry from the link-values table.
- Call it in Consumer's `terminate/2` (cleans up when the dep shape's
  consumer stops) and in `{:dependency_materializer_down, handle}`
  handler (cleans up when a dep materializer crashes).
- Add `@spec` for `link_values_table_name/1`.
- In `Filter.remove_shape`, guard `unregister_sublink_shape` with
  `registered_in_inverted_index?` instead of the broader dep-handle
  check, so shapes that went through the equality index (never
  registered in the inverted index) don't incur unnecessary ETS lookups
  on removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@KyleAMathews
Copy link
Copy Markdown
Contributor Author

Latest commit: Fix stale ETS cleanup and tighten unregister guard

Addresses @alco's concern about stale ETS entries:

  • Added Materializer.delete_link_values/2 to remove ETS entries.
  • Cleanup is now called in two places in consumer.ex:
    1. terminate/2 — when a dep shape's consumer shuts down for any reason
    2. handle_info({:dependency_materializer_down, handle}) — when a dep materializer crashes (before handle_materializer_down escalates to stop_and_clean)
  • Also tightened remove_shape in filter.ex to only call unregister_sublink_shape when the shape was actually registered in the inverted index (via registered_in_inverted_index?).

Re: CI failures — all three reported failures (FilterTest @> performance, ShapeStatusTest high-concurrency, ConsumerTest txn fragments) are pre-existing issues unrelated to this PR:

  • FilterTest: The reduction budget was calibrated for OTP 27. The repo's .tool-versions requests OTP 28.1.1 which has ~40% higher reduction counts for the same operations. Confirmed by checking out main branch's filter.ex — the test still fails identically. Passes on OTP 27.
  • ShapeStatusTest: The slow test's 60s timeout is occasionally exceeded under CI load. Passes locally in 2.9s. The Walker appears in traces because Shape.new/2 calls the parser, which uses Walker for protocol dispatch — this is not introduced by our changes.
  • ConsumerTest: Flaky due to Repatch.patch(:shared) in concurrent tests.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 3, 2026

Claude Code Review

Summary

No new commits since iteration 6. The PR continues to address all requirements from issue #3928 with no critical issues. The single remaining open item is the untested inverted-index runtime path (now seventh consecutive iteration). alco is actively benchmarking the latest code.


What's Working Well

  • All previously identified critical and important issues have been resolved except for test coverage of the inverted-index path.
  • The ETS link-values cache, stale-entry cleanup, RowExpr guard, and graceful-degradation paths are all correctly implemented.
  • alco is running benchmarks on the latest code -- a positive signal that the performance fix is being validated end-to-end.

Issues Found

Critical (Must Fix)

None.

Important (Should Fix)

1. Inverted-index runtime path remains untested -- seventh consecutive iteration

No new test coverage has been added. The following code paths remain exercised only in production:

  • register_sublink_shape / unregister_sublink_shape populating or clearing ETS
  • record_matches_dep? ETS-hit branch (value comparison logic)
  • union_shapes_for_dep (never called in any test)
  • The null-value (nil) branch in record_matches_dep?
  • Add/remove symmetry for sublink_shapes_set when an inverted-index shape is removed
  • The full sublink_affected_shapes path end-to-end

A minimal test fixture requires:

  1. A shape with no equality/inclusion-optimisable field so it lands in other_shapes -- e.g. WHERE id IN (SELECT id FROM parent) with no other conditions
  2. shape_dependencies_handles: [dep_handle]
  3. stack_id set in Filter.new/1
  4. The Materializer link-values ETS table pre-populated with the dep handle values

The RowExpr regression (Critical, iteration 5) introduced and fixed within this PR is direct evidence that this path carries production correctness requirements the current test suite cannot catch.

Suggestions (Nice to Have)

2. Eval.Env.new() allocated per field value in the hot path

File: packages/sync-service/lib/electric/shapes/filter.ex (inside record_matches_dep?)

Eval.Env.new() is called for every {field_name, string_value} pair in every record during sublink_affected_shapes. If the struct is static, pre-allocating it as a module attribute or a single local binding before the Enum.reduce would reduce GC pressure proportional to num_fields x num_changed_records.


Issue Conformance

All requirements from issue #3928 are addressed. The fix correctly targets the O(NxDxC) GenServer serialisation bottleneck and the O(N) other_shapes scan for dep-table changes.


Previous Review Status

Issue Status
Critical: RowExpr sublink regression (c5686ae) Fixed in 1fa2d1e
Important: No tests for inverted-index path Still open (iterations 1-7)
Important: Missing changeset file Resolved (iteration 2)
Suggestion: init_link_values_table/1 spec ambiguity Resolved (iteration 3)
Suggestion: Eval.Env.new() allocation in hot path Still open
Inline: :SLC_lookup_counter cloud-unsafe named ETS table Resolved
Inline (alco): write_concurrency: true missing Resolved
Inline (alco): :infinity timeout on fallback GenServer call Resolved

Review iteration: 7 | 2026-03-03

KyleAMathews and others added 2 commits March 3, 2026 11:03
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause of the 5th in-flight request was not identified;
widening the bound masks rather than fixes the flakiness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@KyleAMathews KyleAMathews changed the title Fix head-of-line blocking in ShapeLogCollector for shapes with subqueries Fix head-of-line blocking in SLC for subquery shapes via ETS link-values cache and inverted index Mar 3, 2026
@KyleAMathews KyleAMathews requested a review from alco March 3, 2026 18:08
- sublink_affected_shapes: add rescue ArgumentError so a missing link-values
  ETS table (e.g. during ConsumerRegistry restart) returns an empty MapSet
  instead of cascading to the broad "return all shapes" fallback
- get_link_values: use :infinity timeout on GenServer fallback; include
  exit reason in error message
- write_link_values: log a warning when ETS table is missing so cache
  degradation is visible
- delete_link_values: log at debug level when table is missing; add @doc
- get_link_values: add @doc describing two-level lookup behaviour
- init_link_values_table: fix @SPEC to include :undefined (rescue branch);
  update @doc to say "Idempotent" rather than "Must be called before"
- registered_in_inverted_index?: add fast-path clause for shapes with no
  dep handles to avoid unnecessary ETS lookups in the hot path
- consumer.ex terminate/2: restore explanatory comment on delete_link_values
- snapshot_filter_ets: include sublink_field_table and sublink_dep_table so
  the remove-shape symmetry test catches inverted-index leaks
- Code simplification: extract record_matches_dep?/4 and union_shapes_for_dep/3;
  remove stale TODO block from materializer.ex

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread packages/sync-service/lib/electric/shapes/consumer/materializer.ex Outdated
Comment thread packages/sync-service/lib/electric/shapes/consumer/materializer.ex Outdated
@alco
Copy link
Copy Markdown
Member

alco commented Mar 3, 2026

I'm rerunning my benchmarks on the latest version of the code.

@alco alco dismissed icehaunter’s stale review March 3, 2026 20:28

The requested change was for a piece of temporary code. It's been removed.

KyleAMathews and others added 3 commits March 3, 2026 13:49
Replace O(N×ETS) registered_in_inverted_index? check with an in-struct
MapSet lookup so other_shapes_affected can short-circuit before calling
get_shape for the common case (all N shapes are sublink-indexed).

- Add `sublink_shapes_set` MapSet field to Filter struct; maintained by
  register/unregister_sublink_shape
- registered_in_inverted_index?/2 now does MapSet.member? (no ETS) and
  drops the now-unnecessary `shape` argument
- Move the check before get_shape in other_shapes_affected so indexed
  shapes never trigger an ETS load
- Add filter.sublink_candidates_count span attribute and
  filter.sublink_reeval.duration_µs timer to expose where the remaining
  wall-clock time goes in sublink_affected_shapes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shapes with RowExpr left-hand sides (e.g. (user_id, team_id) IN (SELECT ...))
produce no indexed sublink_fields, so they must not be added to sublink_shapes_set.
Previously they were added unconditionally, causing other_shapes_affected to skip
them while sublink_affected_shapes could never find them → shape received no updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@KyleAMathews KyleAMathews merged commit 8fe0a37 into main Mar 3, 2026
44 of 45 checks passed
@KyleAMathews KyleAMathews deleted the fix-head-of-line-blocking branch March 3, 2026 21:35
@blacksmith-sh
Copy link
Copy Markdown
Contributor

blacksmith-sh Bot commented Mar 3, 2026

Found 1 test failure on Blacksmith runners:

Failure

Test View Logs
Elixir.Electric.Shapes.ConsumerTest/
test transaction handling with real storage writes txn fragments to storage immediately
but keeps txn boundaries when flushing
View Logs

Fix in Cursor

@balegas
Copy link
Copy Markdown
Contributor

balegas commented Mar 4, 2026

This is wild. Well done on getting this merged so quickly.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 4, 2026

This PR has been released! 🚀

The following packages include changes from this PR:

  • @core/sync-service@1.4.12

Thanks for contributing to Electric!

robacourt added a commit that referenced this pull request Mar 10, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 10, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 12, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 19, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 24, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 24, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 24, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 26, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Mar 26, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Apr 7, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Apr 8, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Apr 8, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Apr 13, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
robacourt added a commit that referenced this pull request Apr 13, 2026
…link-values cache and inverted index (#3937)"

This reverts commit 8fe0a37.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate potential head-of-line blocking in ShapeLogCollector for shapes with subqueries

4 participants