feat: embed agentgateway in openshell-sandbox for inference + MCP #998

@EItanya

Problem Statement

The sandbox today rolls its own LLM proxy: crates/openshell-sandbox/src/l7/inference.rs does pattern matching, crates/openshell-router does request rewriting and streaming, and crates/openshell-sandbox/src/proxy.rs::process_inference_keepalive glues them together. This works for the three providers we ship, but every new capability — guardrails, retries, OTel tracing, budget controls, additional providers, MCP federation, A2A — is custom code we have to write and maintain.

agentgateway is a Rust LLM/MCP/A2A gateway that already implements all of those. Embedding it in-process inside openshell-sandbox lets us delete duplicated logic and pick up those features for free.

Proposed Design

Scope

  • In: replace the inference.local interception path; add mcp.local as a new listener.
  • Out: A2A, the L4 OPA proxy, forward HTTP proxy, secret resolver, and any openshell-server change.
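To make the scope concrete, here is a minimal sketch of how the post-OPA path might decide which connections are handed to the embedded gateway. The enum L7Target and the function classify_host are hypothetical names for illustration, not existing OpenShell code; only the hostnames inference.local and mcp.local come from this proposal.

```rust
/// Hypothetical classification of a CONNECT authority after the OPA decision.
#[derive(Debug, PartialEq, Eq)]
enum L7Target {
    /// Handed to the embedded gateway's LLM path.
    Inference,
    /// Handed to the embedded gateway's MCP path.
    Mcp,
    /// Everything else keeps flowing through the existing proxy code.
    PassThrough,
}

/// Classify an authority such as "inference.local:443" (illustrative helper).
fn classify_host(authority: &str) -> L7Target {
    // Strip a trailing ":port" only when the suffix is purely numeric.
    let host = match authority.rsplit_once(':') {
        Some((h, port)) if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) => h,
        _ => authority,
    };
    // Hostnames are case-insensitive, so normalise before matching.
    match host.to_ascii_lowercase().as_str() {
        "inference.local" => L7Target::Inference,
        "mcp.local" => L7Target::Mcp,
        _ => L7Target::PassThrough,
    }
}
```

Keeping the decision in one pure function would also make it easy to gate the new path behind a feature flag during migration.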

Architecture

  • Embedding: link agentgateway as an in-process library dependency. Construct agentgateway::ProxyInputs and call Gateway::proxy_bind(...) directly with the post-OPA TcpStream. No subprocess, no loopback hop.
  • TLS: agentgateway terminates TLS using a per-SNI cert resolver backed by the existing sandbox CA in ProxyTlsState. Cert-resolver hook to be contributed (code drop pending).
  • Config: in-process. The supervisor builds an agentgateway Stores directly from GetInferenceBundle results (and a sandbox-settings-driven MCP target list as a Phase 1 stop-gap) and hot-swaps it on bundle revision changes — replacing the existing spawn_route_refresh loop.
  • Observability: an OCSF adapter consumes agentgateway access logs and emits HttpActivityBuilder events; guardrail blocks dual-emit DetectionFinding. Per-binary identity from resolve_process_identity is propagated via header injection at handoff.
  • What stays: the L4 CONNECT lifecycle, OPA decision, process-identity resolution, denial aggregator, forward HTTP proxy, secret resolver, and openshell-router::system_inference (still used for in-process system calls).
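The hot-swap described under Config could look roughly like the sketch below. It uses only the standard library; RouteSnapshot and HotSwap are hypothetical stand-ins for the state built from a GetInferenceBundle result, not agentgateway's actual Stores type, whose API is not assumed here.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the routing state derived from one bundle revision.
#[derive(Debug)]
struct RouteSnapshot {
    revision: u64,
    routes: Vec<String>,
}

/// Holds the current snapshot and allows replacing it atomically.
struct HotSwap {
    current: RwLock<Arc<RouteSnapshot>>,
}

impl HotSwap {
    fn new(initial: RouteSnapshot) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Read path: clone the Arc so an in-flight request keeps the
    /// snapshot it started with even if a swap happens meanwhile.
    fn load(&self) -> Arc<RouteSnapshot> {
        let guard = self.current.read().expect("snapshot lock poisoned");
        Arc::clone(&*guard)
    }

    /// Install `next` only if its revision is strictly newer.
    /// Returns true when a swap actually happened.
    fn swap_if_newer(&self, next: RouteSnapshot) -> bool {
        let mut guard = self.current.write().expect("snapshot lock poisoned");
        if next.revision <= guard.revision {
            return false;
        }
        *guard = Arc::new(next);
        true
    }
}
```

The revision check makes repeated deliveries of the same bundle a no-op, which is the property the existing refresh loop would need to preserve.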

Benefits

  • Delete duplicated proxy logic in openshell-sandbox/l7/inference.rs, openshell-router::backend, and the route-refresh loop.
  • Gain agentgateway-native features without writing them: guardrails, retries, OTel tracing, budget controls, broader provider coverage.
  • MCP gateway as a new product capability with no custom-built MCP code in OpenShell.
  • Aligns OpenShell's L7 surface with the broader agent-gateway ecosystem; future agentgateway upgrades land features automatically.

What's involved

Upstream / fork work on agentgateway:

  • Cargo features to slim the build (no XDS client, no kube controller, no OIDC by default).
  • Programmatic Stores builder API for in-memory config.
  • Per-SNI cert resolver hook on ServerTLSConfig.
  • Pluggable access-log sink (so OpenShell can adapt logs to OCSF).
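As an illustration of what the per-SNI cert resolver hook would need to support, the sketch below caches one leaf certificate per server name and mints it on first use. All names (LeafCert, SniCertCache, the mint closure) are hypothetical; the real hook on ServerTLSConfig and the signing logic in ProxyTlsState are not shown and may look quite different.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Placeholder for a leaf certificate and key signed by the sandbox CA.
#[derive(Debug)]
struct LeafCert {
    sni: String,
}

/// Caches one leaf certificate per SNI name, minting on first use.
struct SniCertCache<F: Fn(&str) -> LeafCert> {
    mint: F,
    cache: Mutex<HashMap<String, Arc<LeafCert>>>,
}

impl<F: Fn(&str) -> LeafCert> SniCertCache<F> {
    fn new(mint: F) -> Self {
        Self { mint, cache: Mutex::new(HashMap::new()) }
    }

    /// Return the cached certificate for `sni`, minting it if absent.
    fn resolve(&self, sni: &str) -> Arc<LeafCert> {
        // Server names are case-insensitive, so use a normalised key.
        let key = sni.to_ascii_lowercase();
        let mut cache = self.cache.lock().expect("cert cache poisoned");
        if let Some(cert) = cache.get(&key) {
            return Arc::clone(cert);
        }
        // Minting happens under the lock, so each name is minted once.
        let cert = Arc::new((self.mint)(&key));
        cache.insert(key, Arc::clone(&cert));
        cert
    }
}
```

Holding the lock while minting is the simplest way to avoid duplicate certificates; a production version might prefer finer-grained locking if signing is slow.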

OpenShell-side work:

  • New crates/openshell-sandbox/src/agw/ module (config translator, lifetime owner, handoff functions).
  • Wire handle_inference_interception to the agw handoff behind a feature flag for safe migration.
  • Add handle_mcp_interception for mcp.local and a sandbox-settings-driven MCP target list.
  • OCSF access-log adapter.
  • Cut over and shrink openshell-router to system_inference only.
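The OCSF access-log adapter can be pictured as a pure mapping from one access-log record to one or two events. The sketch below is an assumption-laden illustration: AccessLogRecord, OcsfEvent, and adapt are hypothetical, and the real adapter would build events through HttpActivityBuilder and emit DetectionFinding using OpenShell's existing OCSF types rather than this stand-in enum.

```rust
/// Hypothetical shape of one gateway access-log record.
#[derive(Debug, Clone)]
struct AccessLogRecord {
    method: String,
    path: String,
    status: u16,
    /// True when a guardrail rejected the request.
    guardrail_blocked: bool,
}

/// Stand-in for the OCSF events the adapter would emit.
#[derive(Debug, PartialEq, Eq)]
enum OcsfEvent {
    HttpActivity { method: String, path: String, status: u16 },
    DetectionFinding { reason: String },
}

/// Map one access-log record to OCSF events. Every record yields an
/// HTTP activity event; guardrail blocks additionally yield a finding.
fn adapt(record: &AccessLogRecord) -> Vec<OcsfEvent> {
    let mut events = vec![OcsfEvent::HttpActivity {
        method: record.method.clone(),
        path: record.path.clone(),
        status: record.status,
    }];
    if record.guardrail_blocked {
        events.push(OcsfEvent::DetectionFinding {
            reason: format!("guardrail blocked {} {}", record.method, record.path),
        });
    }
    events
}
```

Keeping the mapping free of I/O means the dual-emit rule can be unit-tested without a running gateway.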

Alternatives Considered

  • Sidecar container/process. Breaks the "all egress through the sandbox proxy" model and would need extra netns rules to constrain it. Rejected.
  • Subprocess managed by the supervisor. Workable, but the in-process path is cleaner and avoids a localhost TLS hop given that we want to terminate TLS in agentgateway anyway.
  • Keep building it ourselves. Each new capability (guardrails, MCP, A2A, more providers) is custom code we would have to write and maintain, whereas embedding agentgateway lets us inherit it.
  • Extend openshell-router to cover MCP/guardrails/etc. Same maintenance burden as today, multiplied.

Risks / Open Questions

  • agentgateway is shaped as a binary, not a polished library; the upstream work above is a real chunk of effort. A short spike should validate the embedding approach before we commit to it.
  • The size of the MITM cert-resolver hook work depends on the pending code drop.
  • No GetMcpBundle proto exists; Phase 1 reads MCP targets from sandbox settings as a stop-gap.

Agent Investigation

  • Reviewed architecture/gateway.md, architecture/inference-routing.md, crates/openshell-sandbox/src/proxy.rs, crates/openshell-sandbox/src/l7/, and crates/openshell-router/.
  • Reviewed agentgateway at ~/src/agentgateway/agentgateway — confirmed ProxyInputs::new, Gateway::proxy_bind, in-memory Stores, AI/MCP/A2A backend types, and that HTTPS/TLS/SSH listener protocols exist.
  • Confirmed the L4 OPA proxy, process-identity correlation, and forward HTTP proxy in openshell-sandbox have no agentgateway analogue and must remain.
  • Confirmed openshell-router::system_inference is still needed post-cutover for in-process system calls.
