feat(k8s, helm): Enable running OpenShell Gateway with multiple replicas #1021

@TaylorMutch

Problem Statement

When running on Kubernetes, the OpenShell Gateway currently runs as a single-replica StatefulSet. This blocks rolling deployments (upgrades cause a full outage for existing supervisor connections), prevents horizontal scaling under load, and means any Gateway pod failure interrupts all active sandbox connections until the pod restarts.

Proposed Design

Multi-replica support requires changes across four areas, delivered in sequence. SQLite remains the supported backend for single-replica deployments (dev, small, low-cost). PostgreSQL is required for multi-replica.

Blockers to resolve

1. Storage — SQLite cannot support multiple writers

SQLite coordinates writers through file locks on a single database file, and the StatefulSet's ReadWriteOnce PVC prevents pods on different nodes from mounting the same volume. The persistence layer already supports PostgreSQL (crates/openshell-server/src/persistence/postgres.rs). Multi-replica deployments must use PostgreSQL; the Helm chart should document this and warn when replicaCount > 1 is set with a SQLite dbUrl. SQLite deployments remain on the current StatefulSet + PVC path, unchanged.
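The warning rule above can be sketched in Rust, for instance as a startup check that mirrors whatever the chart enforces. This is a minimal sketch, not the chart's or the server's actual logic: the `Backend` enum, both function names, and the assumption that dbUrl carries a `sqlite:` or `postgres:`/`postgresql:` scheme are all hypothetical.

```rust
/// Storage backend inferred from the URL scheme.
/// Assumption: dbUrl uses a `sqlite:` or `postgres:`/`postgresql:` scheme.
#[derive(Debug, PartialEq, Eq)]
enum Backend {
    Sqlite,
    Postgres,
    Unknown,
}

/// Hypothetical helper: classify a database URL by the text before the first ':'.
fn backend_from_url(db_url: &str) -> Backend {
    let scheme = db_url.split(':').next().unwrap_or("").to_ascii_lowercase();
    match scheme.as_str() {
        "sqlite" => Backend::Sqlite,
        "postgres" | "postgresql" => Backend::Postgres,
        _ => Backend::Unknown,
    }
}

/// Hypothetical helper: returns a warning when more than one replica is
/// configured together with a SQLite URL, and `None` otherwise.
fn replica_storage_warning(replica_count: u32, db_url: &str) -> Option<String> {
    if replica_count > 1 && backend_from_url(db_url) == Backend::Sqlite {
        Some(format!(
            "replicaCount={replica_count} requires PostgreSQL, but dbUrl points at SQLite"
        ))
    } else {
        None
    }
}
```

Calling `replica_storage_warning(3, "sqlite:///data/openshell.db")` yields a warning, while a single replica on SQLite or any replica count on PostgreSQL yields `None`.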

2. Reconciliation — concurrent loops cause data races

reconcile_loop() and watch_loop() run independently on every replica (compute/mod.rs:533–542). With multiple replicas this causes double-deletes and conflicting updates on shared sandbox records. Replicas need a coordination mechanism so that exactly one replica reconciles a given sandbox at a time. The design for this is tracked separately.
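Since the coordination design is tracked separately, the following is only one possible shape for it, not the chosen mechanism: rendezvous (highest-random-weight) hashing, which needs no Kubernetes API. The function names are hypothetical, and the sketch assumes every replica sees the same membership list (for example, from heartbeat rows in the shared database). While membership is changing, two replicas could briefly disagree about ownership.

```rust
/// FNV-1a 64-bit hash. The algorithm is fixed, so every replica computes
/// the same value for the same input regardless of process or build.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Rendezvous hashing: score every (replica, sandbox) pair and pick the
/// replica with the highest score. Ties are broken by replica name, so the
/// result does not depend on the order of `replicas`.
fn owner_of<'a>(sandbox_id: &str, replicas: &[&'a str]) -> Option<&'a str> {
    replicas.iter().copied().max_by_key(|replica| {
        let key = format!("{replica}/{sandbox_id}");
        (fnv1a64(key.as_bytes()), *replica)
    })
}

/// Hypothetical guard a reconcile loop could call before touching a sandbox.
fn should_reconcile(self_id: &str, sandbox_id: &str, replicas: &[&str]) -> bool {
    owner_of(sandbox_id, replicas) == Some(self_id)
}
```

With this scheme, removing a replica reassigns only the sandboxes that replica owned; every other sandbox keeps its owner. The same function could take supervisor IDs instead of sandbox IDs.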

3. Supervisor sessions — in-memory state is not shared

The SupervisorSessionRegistry (supervisor_session.rs:70) is per-replica and in-memory. A supervisor reconnecting to a different replica after a pod restart will fail to find its session, breaking SSH relay. Replicas need to either own a stable subset of supervisors or share session state. The design for this is tracked separately.

4. SSH connection limits — not globally enforced

Connection slots are tracked in per-replica Mutex<HashMap> fields (ssh_tunnel.rs:38), so per-token and per-sandbox limits apply per replica rather than globally: with N replicas, a token can hold up to N times its configured limit. Once replica ownership is established these limits can be enforced locally; global enforcement will require the shared database.
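The effect of per-replica accounting can be illustrated with a small simulation. `SlotTracker` and `admitted` are hypothetical stand-ins, not the actual fields on ServerState, and spreading connections round-robin across replicas is an assumption about the load balancer.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for one replica's in-memory slot accounting.
struct SlotTracker {
    limit: usize,
    in_use: Mutex<HashMap<String, usize>>,
}

impl SlotTracker {
    fn new(limit: usize) -> Self {
        Self { limit, in_use: Mutex::new(HashMap::new()) }
    }

    /// Claims one slot for `key`. Returns false when this replica's own
    /// count for `key` has already reached the limit.
    fn try_acquire(&self, key: &str) -> bool {
        let mut in_use = self.in_use.lock().unwrap();
        let count = in_use.entry(key.to_string()).or_insert(0);
        if *count >= self.limit {
            return false;
        }
        *count += 1;
        true
    }

    /// Returns one slot for `key`, never going below zero.
    fn release(&self, key: &str) {
        let mut in_use = self.in_use.lock().unwrap();
        if let Some(count) = in_use.get_mut(key) {
            *count = count.saturating_sub(1);
        }
    }
}

/// Counts how many of `attempts` connections for `key` are admitted when
/// they are spread round-robin across `replicas`.
fn admitted(replicas: &[SlotTracker], key: &str, attempts: usize) -> usize {
    if replicas.is_empty() {
        return 0;
    }
    (0..attempts)
        .filter(|&i| replicas[i % replicas.len()].try_acquire(key))
        .count()
}
```

With a limit of 2 and ten connection attempts for the same token, one tracker admits 2 connections, while three trackers admit 6 because each replica counts only its own slots.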

Phased delivery

| Phase | Change | Unblocks |
| --- | --- | --- |
| 1 | PostgreSQL backend + Deployment (replacing StatefulSet) | everything else |
| 2 | Replica ownership for reconciliation and session routing | safe rolling deploys |
| 3 | Persistent supervisor session state | transparent pod failure recovery |
| 4 | Shared connection limit accounting | correct global per-token limits |

Phases 1 and 2 are the minimum for safe rolling deployments. Phases 3 and 4 are required for full HA.

Alternatives Considered

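ReadWriteMany PVC with SQLite over NFS: Avoids the PostgreSQL dependency, but SQLite over NFS is unreliable. WAL mode relies on shared memory and therefore requires all processes to run on the same host, and file locking on network filesystems is often faulty, so concurrent access can corrupt the database. Not recommended.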

Kubernetes Lease-based leader election for reconciliation: Solves the reconciliation race but ties multi-replica behavior to Kubernetes, breaking Docker and Podman deployments. A deployment-agnostic coordination mechanism is preferred.

Agent Investigation

  • PostgreSQL persistence is fully implemented at crates/openshell-server/src/persistence/postgres.rs — no new persistence code is needed for Phase 1.
  • reconcile_loop() and watch_loop() are spawned unconditionally per replica at compute/mod.rs:533–542. The sync_lock mutex at line 1015 is process-local only.
  • SupervisorSessionRegistry at supervisor_session.rs:70–94 is purely in-memory with no cross-replica sharing.
  • SSH connection slots are tracked in two Mutex<HashMap> fields on ServerState at ssh_tunnel.rs:38–64.
  • The StatefulSet PVC uses accessModes: ["ReadWriteOnce"] (templates/statefulset.yaml:179), physically preventing multi-node pod scheduling with SQLite.

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
