Working with Postgres WAL format from Elixir#1
Merged
Conversation
VAX-57 Transform Postgres WAL files into internal replication message format
Postgres produces WAL with "logical" level of detail, which can be decoded into logical replication streams. WAL files replication is managed through replication slots. Once some entry is read from the replication slot, the log entries are cleared from the server. In this task we want to read entries from a PG WAL file, decode and transform them into a format of our choice. We shall define the message format of the events that are passed down the pipeline. This task is already partly done in https://github.com/vaxine-io/electric
|
da5afce to
7f206c3
Compare
icehaunter
pushed a commit
that referenced
this pull request
Apr 17, 2023
* browser: wip structuring the web browser adapter. * browser: more wip on browser adapter. * browser: tweak to await return values. * browser: use run not exec in db adapter. * deps: tidy up dependencies. * browser: support opening multiple databases. * browser: flesh out testing the db proxy API. * browser: test the statement API. * browser: worker side `ElectricStatement` proxy wrapper. * browser: test db and statement commit notifications. * browser: actually run all the browser tests. * refactor: rename adapters to drivers. * scripts: rm `./dist` before building.
KyleAMathews
added a commit
that referenced
this pull request
Oct 13, 2025
Based on thorough validation against PostgreSQL documentation: **Issue #1 - Troubleshooting "must be owner" error:** - REMOVED incorrect suggestion to use GRANT ALL PRIVILEGES - PostgreSQL ownership rights cannot be granted via privileges - ALTER TABLE and adding tables to publications require actual ownership **Issue #2 - Quick Start ownership transfer:** - ADDED GRANT CREATE ON SCHEMA public before ownership transfer - PostgreSQL requires new owner to have CREATE privilege on schema - Without this, ALTER TABLE ... OWNER TO will fail **Issue #3 - Publication ownership requirements:** - CLARIFIED that you must own BOTH publication AND each table - Updated Core Permission Requirements table - Per PostgreSQL docs: Adding a table additionally requires owning that table **Issue #4 - AWS wal_level defaults:** - CORRECTED incorrect claim about wal_level defaults - PostgreSQL default is replica (not minimal for RDS) - RDS/Aurora use standard PostgreSQL defaults All fixes validated by research agents against official PostgreSQL docs. Thanks to external reviewer for catching these critical errors.
KyleAMathews
added a commit
that referenced
this pull request
Oct 13, 2025
Based on thorough validation against PlanetScale documentation: **Issue #1 - Default postgres role and REPLICATION:** - FIXED incorrect claim that default roles lack REPLICATION - PlanetScale's default postgres role DOES include REPLICATION attribute - Updated to: "You can use it directly, or create dedicated role for least-privilege" - Source: PlanetScale roles documentation shows CREATE ROLE ... REPLICATION **Issue #2 - Logical replication defaults:** - SOFTENED absolute claim "not enabled by default" - Changed to: "may not be enabled. Verify and configure as needed" - More future-proof: PlanetScale deliberately avoids stating defaults - Moved verification step (SHOW wal_level) to top for clarity **Issue #3 - Connection limit defaults:** - REMOVED undocumented "25 connections" default claim - No public PlanetScale docs state this specific default - Changed to: "Small clusters may start with low max_connections" - Provided sizing guidance: "≥ 3× Electric's pool size" without hard numbers - Example: 20 connections → set max_connections to at least 60 All fixes validated by research agents against official documentation. Thanks to external reviewer for catching these precision issues.
msfstef
added a commit
that referenced
this pull request
Nov 4, 2025
Fixes sentry errors [#1](https://electricsql-04.sentry.io/issues/74727274/) and [#2](https://electricsql-04.sentry.io/issues/74727257) I have marked the `:disk_full` error as retryable, since storage might be freed up automatically or added by the db administrator as a response to this error and should thus not shut down the system. I have marked the `:duplicate_file` for the replication slot specifically as retryable as well, as it is a tmp file for an atomic write that seems like a race. Interesting that this occurred and might be worth looking into if it keeps occurring.
KyleAMathews
pushed a commit
that referenced
this pull request
Nov 13, 2025
Fixes an additional race condition where a stale abort completion could overwrite the state after resume() has been called. Timeline of the bug: 1. Request #1 running with AbortController #1 2. Tab hidden → pause() sets state to 'pause-requested', aborts #1 3. Tab visible → resume() sets state to 'active', starts request #2 4. Old request #1's abort completes, sets state to 'paused' 5. Stream stuck because state is 'paused' but should be 'active' Fix: Only transition to 'paused' if state is still 'pause-requested'. If resume() already changed it to 'active', don't overwrite it. This ensures that old abort completions don't interfere with the new active request started by resume().
6 tasks
alco
added a commit
that referenced
this pull request
Jan 20, 2026
### 🔗 Context When this workflow is called by the Changesets workflow, the `github.event_name` context variable evaluates to `push` (the parent's trigger) rather than `workflow_call`. This caused our previous logic to skip the release flow and default to canary builds. So the most recent [package publishing](https://github.com/electric-sql/electric/actions/runs/21003147814/job/60379128532) only pushed canary images: ``` #1 [internal] pushing docker.io/electricsql/electric:canary #1 0.000 pushing sha256:e372c6ad86713cdbf726bbef83920eb32ecf9034d3794abde8d9fa73805413b1 to docker.io/electricsql/electric:canary #1 DONE 1.9s #1 [internal] pushing docker.io/electricsql/electric-canary:3516b9780 #1 0.000 pushing sha256:e372c6ad86713cdbf726bbef83920eb32ecf9034d3794abde8d9fa73805413b1 to docker.io/electricsql/electric-canary:3516b9780 #1 ... #2 [internal] pushing docker.io/electricsql/electric-canary:latest #2 0.000 pushing sha256:e372c6ad86713cdbf726bbef83920eb32ecf9034d3794abde8d9fa73805413b1 to docker.io/electricsql/electric-canary:latest #2 DONE 1.8s #1 [internal] pushing docker.io/electricsql/electric-canary:3516b9780 #1 DONE 1.8s ``` ### 🛠️ Changes - Updated `derive_build_vars` job to prioritize `inputs.release_tag` and `github.event.release.tag_name` over the event name string. - Refactored shell logic to use more idiomatic `-n` (non-zero string) checks. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Simplified the Docker Hub image sync workflow's release detection logic to prioritize explicit release tags and fallback to commit-based runs, preserving existing behavior for release and manual triggers. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This was referenced Mar 3, 2026
balegas
added a commit
that referenced
this pull request
Mar 8, 2026
- Remove expiry_manager.get_shape_count span: runs every 60s per stack, is a simple ETS count, generated 887K events/day - Remove expiry_manager.get_least_recently_used span: ETS fold, only meaningful as part of the expire_shapes parent span - Remove shape_status.validate_shape_handle span: sub-millisecond ETS hash comparison called 2x per request, was the #1 span by volume at 3.5M events/day These operations are too fast and frequent to warrant individual spans. The parent spans (Plug_shape_get, expiry_manager.expire_shapes) provide sufficient context for debugging. Fixes #3977 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2 tasks
This was referenced Mar 23, 2026
6 tasks
KyleAMathews
added a commit
that referenced
this pull request
Apr 10, 2026
Adds test/pbt-micro.test.ts — a dedicated PBT suite that exercises twelve narrow invariants in the ShapeStream client. Each TARGET's opening comment documents the invariant under test, and the PBTs shook loose eight real bugs that are fixed in this commit: #1 canonicalShapeKey used URLSearchParams.set() for custom params, collapsing duplicate keys (e.g. ?tag=a&tag=b → ?tag=b) so two genuinely distinct shapes shared a cache key. Switched to append(). #2 Shape#process clobbered shouldNotify by assignment in three places, so any sequence where a change message followed a status-change message would silently drop the change's notification. OR-accumulate instead, and track hadData before must-refetch clears state so subscribers still see the reset. #3 SubsetParams GET serialization dropped limit=0 and offset=0 via falsy checks. Switched to explicit !== undefined guards. #4 Shape#requestedSubSnapshots dedup keyed on bigintSafeStringify, which preserves insertion order, so permutation-equivalent params produced different keys and re-execution fired the same snapshot N times. Added canonicalBigintSafeStringify to helpers.ts that recursively sorts object keys. #5 snakeToCamel collapsed runs of underscores into a single camelCase boundary, so user_id and user__id (distinct db columns) decoded to the same app key, corrupting rows with mapped values. snakeToCamel now preserves (n-1) literal underscores for a run of n, and camelToSnake's boundary regex was widened to ([a-z_])([A-Z]) so the round-trip is injective. #6 Shape#reexecuteSnapshots caught and discarded errors from stream.requestSnapshot, silently dropping failed sub-snapshot re-execution on shape rotation. Errors are now collected and the first one is surfaced via #error + #notify. #7 SnapshotTracker populated xmaxSnapshots and snapshotsByDatabaseLsn in addSnapshot but never cleaned them up — neither on removeSnapshot nor on addSnapshot with a repeated mark. A later shouldRejectMessage eviction loop would walk the stale reverse index and wrongly delete the current snapshot, allowing duplicate change messages to slip through. Stored databaseLsn on each entry and added #detachFromReverseIndexes that runs on both add (before overwriting) and remove. #8 Shape#awaitUpToDate never observed the stream's error state, so calling requestSnapshot on a terminally-errored stream would hang forever on the setInterval polling loop. The helper now checks #error / stream.error up front, subscribes to the stream's onError, and settles the internal promise via reject on any terminal error path. Also: - vitest.pbt.config.ts — dedicated config that skips the real-Electric globalSetup and includes both model-based and pbt-micro test files. - bin/pbt-soak.sh — soak runner that loops PBT iterations with fresh seeds, captures counterexamples on failure. - test/pbt-micro.test.ts TARGET 4 (UpToDateTracker) restores real timers in afterEach so TARGET 12's setInterval doesn't hang when the suite runs end-to-end. - SPEC.md cross-references the L6 fetchSnapshotWithRetry PBT from the unconditional-409-cache-buster invariant. - model-based.test.ts gains response builders for update, delete, and mixed-batch 200s with lsn/op_position/txids headers for more realistic change sequences. Verified with 319 unit tests, 44 PBT tests at 500 runs each (and 2000-run soak), 62 column-mapper/snapshot-tracker tests, typecheck clean, eslint clean, static-analysis tests clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5 tasks
msfstef
added a commit
that referenced
this pull request
Apr 14, 2026
…th (#4127) ## Summary During deployments, we observed an onslaught of `ArgumentError` crashes when the stack is trying to get ready: ``` ** (ArgumentError) errors were found at the given arguments: * 1st argument: the table identifier does not refer to an existing ETS table (stdlib) :ets.next_lookup(#Reference<...>, {51460894745528, 0}) pure_file_storage.ex:1148: Electric.ShapeCache.PureFileStorage.read_range_from_ets_cache/5 pure_file_storage.ex:1076: Electric.ShapeCache.PureFileStorage.stream_main_log/3 ``` ### Root cause Each shape's Consumer process owns an unnamed ETS buffer table for in-memory log data. The table's TID (reference) is stored in the stack-wide named ETS table so that HTTP reader processes can look it up via `for_shape/2` and `read_or_initialize_metadata/2`. During stack restarts, Consumer processes terminate (or crash and restart), which destroys their buffer ETS tables. HTTP requests that are in-flight — or arrive during this window — can capture a stale TID and crash when they attempt `:ets.next_lookup` on it. ### Race windows identified 1. **Terminate ordering gap** — Between `:ets.delete(buffer_ets)` and `clean_shape_ets_entry` (which nils the reference in stack ETS), new readers could look up the dead TID from stack ETS. 2. **Already-captured TID** — Even after the reference is nil'd, readers that captured the TID *before* terminate started still hold the stale reference in a local variable. 3. **Startup churn** — During stack initialization many Consumers start simultaneously and some may fail/restart, creating repeated windows where stale TIDs exist in the stack ETS. ### Fix - **`safe_next_lookup/2`**: Wraps `:ets.next_lookup` in a rescue for `ArgumentError`, returning `:ets_dead`. The recursive `read_range_from_ets_cache/5` handles this the same as `:"$end_of_table"` — returning whatever was accumulated so far. The callers at both call sites (pure ETS path and mixed disk+ETS path) already detect empty/partial reads and fall back to reading from disk. The rescue is isolated to this leaf function specifically to **preserve tail call optimization** in the recursive reader. - **Reversed terminate ordering**: `clean_shape_ets_entry` (which nils the `ets_table` field) now runs *before* `:ets.delete(buffer_ets)`, closing race window #1 for new readers — they see `ets_table: nil` and hit the existing nil guard at `read_range_from_ets_cache(nil, _, _)`. Git history confirms the original intent was nil-before-delete (commit 5677df7), and the current ordering was an artifact of successive refactors. ### Data correctness analysis The key invariant that makes this safe: **`terminate` calls `close_all_files` (which flushes buffered data to disk via `IO.binwrite` + `:file.datasync`) *before* touching the ETS table.** After terminate, all data that was in the buffer ETS is on disk. The existing code already handles the semantically identical case of "ETS cleared by a concurrent flush" — comments at lines 1060 and 1079 describe this exact pattern. A deleted table is the same situation: data was in ETS, now it's on disk. Failure scenario analysis: | Scenario | Wrong data? | Wrong order? | Duplicates? | Missing data? | |---|---|---|---|---| | Clean terminate (flush succeeds) | No | No | No | No — chunk index finds flushed data | | Partial ETS read before crash | No | No | No | No — partial data discarded, full range re-read from disk | | Consumer killed without terminate | No | No | No | Yes — tail data lost (inherent to unclean kill, not introduced by this fix) | | No chunk index entry yet | No | No | No | Yes — empty response, client retries | In all cases: data returned is correct and ordered. The only failure mode is returning less data than the absolute latest, which triggers client retry. ## Test plan - [x] New test: reader falls back to disk when ETS table is deleted (pure ETS path) - [x] New test: reader falls back to disk when ETS table is deleted (mixed disk + ETS path) - [x] Both tests fail with `ArgumentError` before the fix, pass after - [x] Existing "ETS read/write race condition" tests still pass - [x] Full `test/electric/shape_cache/` suite passes (181 tests, 0 failures) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 tasks
4 tasks
balegas
added a commit
that referenced
this pull request
May 8, 2026
#1 — `mcp.json` is now explicit opt-in. New `loadProjectMcpConfig` option (boolean | string) on `BuiltinAgentsServer`; defaults to off because stdio servers spawn local commands. The Electron desktop and the `electric-ax` CLI opt in; library embedders get the safe default. #2 — `stop()` tears down MCP resources. Added `Registry.close()` (closes every transport, forgets auth state, emits a final empty snapshot). `BuiltinAgentsServer.stop()` now also disposes the `mcp.json` chokidar watcher and unregisters the `mcp` tool provider. #3 — `RuntimeRegistry.register()` accumulates types per runtime instead of last-write-wins, fixing `/api/runtimes` losing earlier types when entity-type registration POSTs land in parallel. #4 — `applyMerged` is async/await so `mcpRegistry.applyConfig` rejections actually reach the catch (previously voided inside a `.then`, escaping as unhandled rejections). Optional `onConfigError` callback exposed for embedders. #5 — `composeToolsWithProviders` warns when a named MCP server in `mcp.tools(['x'])` is unavailable (unknown or not yet ready). Wildcard sentinels stay silent; missing names dedupe within a single call. #6a — `hashConfig()` includes `timeoutMs` so timeout-only edits no longer skip the reconfigure path and leave the entry's stale. #6b — `mcp.tools()` (no arg) is the canonical "every registered server" form. `mcp.tools('*')` kept for back-compat. Built-ins `horton`/`worker` and the docs use the no-arg form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 13, 2026
7 tasks
This was referenced May 19, 2026
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.