feat: embed agentgateway in openshell-sandbox for inference + MCP

## Problem Statement

The sandbox today rolls its own LLM proxy: `crates/openshell-sandbox/src/l7/inference.rs` does pattern matching, `crates/openshell-router` does request rewriting and streaming, and `crates/openshell-sandbox/src/proxy.rs::process_inference_keepalive` glues them together. This works for the three providers we ship, but every new capability — guardrails, retries, OTel tracing, budget controls, additional providers, MCP federation, A2A — is custom code we have to write and maintain.

[agentgateway](https://github.com/agentgateway/agentgateway) is a Rust LLM/MCP/A2A gateway that already implements these capabilities. Embedding it in-process inside `openshell-sandbox` lets us delete duplicated logic and pick up those features without building them ourselves.

## Proposed Design

### Scope

- **In:** replace the `inference.local` interception path; add `mcp.local` as a new listener.
- **Out:** A2A, the L4 OPA proxy, forward HTTP proxy, secret resolver, and any `openshell-server` change.

### Architecture

- **Embedding:** link agentgateway as an in-process library dependency. Construct `agentgateway::ProxyInputs` and call `Gateway::proxy_bind(...)` directly with the post-OPA `TcpStream`. No subprocess, no loopback hop.
- **TLS:** agentgateway terminates TLS using a per-SNI cert resolver backed by the existing sandbox CA in `ProxyTlsState`. Cert-resolver hook to be contributed (code drop pending).
- **Config:** in-process. The supervisor builds an agentgateway `Stores` directly from `GetInferenceBundle` results (and a sandbox-settings-driven MCP target list as a Phase 1 stop-gap) and hot-swaps it on bundle revision changes — replacing the existing `spawn_route_refresh` loop.
- **Observability:** an OCSF adapter consumes agentgateway access logs and emits `HttpActivityBuilder` events; guardrail blocks dual-emit `DetectionFinding`. Per-binary identity from `resolve_process_identity` is propagated via header injection at handoff.
- **What stays:** the L4 CONNECT lifecycle, OPA decision, process-identity resolution, denial aggregator, forward HTTP proxy, secret resolver, and `openshell-router::system_inference` (still used for in-process system calls).
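
The **Config** hot-swap described above can be sketched as follows. This is a minimal illustration using only the standard library, not agentgateway's `Stores` API: `BundleSnapshot`, `HotSwapStore`, and the `routes` field are hypothetical stand-ins for whatever the supervisor builds from a `GetInferenceBundle` result.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the config derived from one `GetInferenceBundle` result.
#[derive(Debug, Clone, PartialEq)]
pub struct BundleSnapshot {
    pub revision: u64,
    pub routes: Vec<String>,
}

/// Holds the active snapshot. Readers clone the `Arc`, so an in-flight
/// request keeps the snapshot it started with even if a swap happens.
pub struct HotSwapStore {
    active: RwLock<Arc<BundleSnapshot>>,
}

impl HotSwapStore {
    pub fn new(initial: BundleSnapshot) -> Self {
        Self {
            active: RwLock::new(Arc::new(initial)),
        }
    }

    /// Returns the snapshot that is active at call time.
    pub fn current(&self) -> Arc<BundleSnapshot> {
        let guard = self.active.read().expect("store lock poisoned");
        Arc::clone(&guard)
    }

    /// Installs `next` only when its revision differs from the active one.
    /// Returns `true` if a swap happened.
    pub fn apply(&self, next: BundleSnapshot) -> bool {
        let mut guard = self.active.write().expect("store lock poisoned");
        if guard.revision == next.revision {
            return false;
        }
        *guard = Arc::new(next);
        true
    }
}
```

Comparing revisions before swapping keeps an unchanged bundle from triggering a rebuild, which is the role `spawn_route_refresh` plays today. Whether agentgateway's own store supports an equivalent atomic replace is one of the things the spike would need to confirm.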

### Benefits

- Delete duplicated proxy logic in `crates/openshell-sandbox/src/l7/inference.rs`, `openshell-router::backend`, and the route-refresh loop (`spawn_route_refresh`).
- Gain agentgateway-native features without writing them: guardrails, retries, OTel tracing, budget controls, broader provider coverage.
- MCP gateway as a new product capability with no custom-built MCP code in OpenShell.
- Aligns OpenShell's L7 surface with the broader agent-gateway ecosystem; future agentgateway upgrades land features automatically.

### What's involved

**Upstream / fork work on agentgateway:**

- Cargo features to slim the build (no XDS client, no kube controller, no OIDC by default).
- Programmatic `Stores` builder API for in-memory config.
- Per-SNI cert resolver hook on `ServerTLSConfig`.
- Pluggable access-log sink (so OpenShell can adapt logs to OCSF).
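
To make the access-log sink request concrete, here is one possible shape for the hook and for the OpenShell adapter that would sit behind it. Everything in this sketch is an assumption for discussion: `AccessLogRecord`, `AccessLogSink`, `OcsfEvent`, and `OcsfAdapter` are hypothetical names, not agentgateway or OpenShell types, and the real adapter would build events through `HttpActivityBuilder` rather than a plain enum.

```rust
/// Hypothetical shape of one access-log record handed to the embedder.
#[derive(Debug, Clone)]
pub struct AccessLogRecord {
    pub host: String,
    pub status: u16,
    /// Assumed flag: set when a guardrail blocked the request.
    pub guardrail_blocked: bool,
}

/// Hypothetical sink trait the embedder implements and registers.
pub trait AccessLogSink {
    fn record(&mut self, entry: &AccessLogRecord);
}

/// Simplified stand-ins for the OCSF events named in the proposal.
#[derive(Debug, Clone, PartialEq)]
pub enum OcsfEvent {
    HttpActivity { host: String, status: u16 },
    DetectionFinding { host: String },
}

/// Collects emitted events in memory; a real adapter would forward them.
#[derive(Default)]
pub struct OcsfAdapter {
    pub emitted: Vec<OcsfEvent>,
}

impl AccessLogSink for OcsfAdapter {
    fn record(&mut self, entry: &AccessLogRecord) {
        // Every request produces an HTTP activity event.
        self.emitted.push(OcsfEvent::HttpActivity {
            host: entry.host.clone(),
            status: entry.status,
        });
        // Guardrail blocks dual-emit a detection finding as well.
        if entry.guardrail_blocked {
            self.emitted.push(OcsfEvent::DetectionFinding {
                host: entry.host.clone(),
            });
        }
    }
}
```

A trait-based sink keeps the OCSF dependency entirely on the OpenShell side, so the upstream change stays small and generic.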

**OpenShell-side work:**

- New `crates/openshell-sandbox/src/agw/` module (config translator, lifetime owner, handoff functions).
- Wire `handle_inference_interception` to the agw handoff behind a feature flag for safe migration.
- Add `handle_mcp_interception` for `mcp.local` and a sandbox-settings-driven MCP target list.
- OCSF access-log adapter.
- Cut over and shrink `openshell-router` to `system_inference` only.
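
The flag-gated wiring in the list above might reduce to a small dispatch decision like the one below. This is a sketch under assumptions: `L7Path`, `select_l7_path`, and the `agw_enabled` parameter are hypothetical, the flag is modeled as a runtime boolean although it could equally be a Cargo feature, and `mcp.local` is assumed to be intercepted only when the agw path is enabled.

```rust
/// Which L7 path handles an intercepted connection (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum L7Path {
    /// Existing `handle_inference_interception` behaviour.
    LegacyInference,
    /// Inference traffic handed off to agentgateway.
    AgwInference,
    /// MCP traffic handed off to agentgateway.
    AgwMcp,
    /// Host is not one of the intercepted `.local` names.
    NotIntercepted,
}

/// Picks the L7 path from the CONNECT target host and the migration flag.
pub fn select_l7_path(host: &str, agw_enabled: bool) -> L7Path {
    // Drop an optional `:port` suffix; IPv6 literals are out of scope here.
    let name = host.rsplit_once(':').map_or(host, |(h, _)| h);
    match (name.to_ascii_lowercase().as_str(), agw_enabled) {
        ("inference.local", true) => L7Path::AgwInference,
        ("inference.local", false) => L7Path::LegacyInference,
        ("mcp.local", true) => L7Path::AgwMcp,
        _ => L7Path::NotIntercepted,
    }
}
```

Keeping the decision in one pure function makes the migration easy to test and leaves a single place to delete the legacy arm at cutover.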

## Alternatives Considered

- **Sidecar container/process.** Breaks the "all egress through the sandbox proxy" model and would need extra netns rules to constrain it. Rejected.
- **Subprocess managed by the supervisor.** Workable, but the in-process path is cleaner and avoids a localhost TLS hop given that we want to terminate TLS in agentgateway anyway.
- **Keep building it ourselves.** Each new capability (guardrails, MCP, A2A, more providers) would be custom code we have to write and maintain, even though agentgateway already provides it.
- **Extend `openshell-router` to cover MCP/guardrails/etc.** Same maintenance burden as today, multiplied.

## Risks / Open Questions

- Agentgateway is shaped as a binary, not a polished library; the upstream work above is a real chunk of effort. A short spike should validate the embedding approach before we commit.
- The size of the MITM cert-resolver hook work depends on the pending code drop.
- No `GetMcpBundle` proto exists; Phase 1 reads MCP targets from sandbox settings as a stop-gap.

## Agent Investigation

- Reviewed `architecture/gateway.md`, `architecture/inference-routing.md`, `crates/openshell-sandbox/src/proxy.rs`, `crates/openshell-sandbox/src/l7/`, and `crates/openshell-router/`.
- Reviewed agentgateway at `~/src/agentgateway/agentgateway` — confirmed `ProxyInputs::new`, `Gateway::proxy_bind`, in-memory `Stores`, AI/MCP/A2A backend types, and that HTTPS/TLS/SSH listener protocols exist.
- Confirmed the L4 OPA proxy, process-identity correlation, and forward HTTP proxy in `openshell-sandbox` have no agentgateway analogue and must remain.
- Confirmed `openshell-router::system_inference` is still needed post-cutover for in-process system calls.
