feat(provider): selective credential passthrough#1355
Conversation
Add an opt-in passthrough mechanism for provider credentials that bypass the canonical openshell:resolve:env:* placeholder and L7 proxy substitution. Provider gains a repeated string passthrough_credentials field listing credential keys whose real value should be injected directly into the agent process environment. Required for SDKs that validate credential format in-process (Slack xoxb-*/xapp-*), credentials consumed by in-process crypto (HMAC signing secrets, signature verification), and transports the proxy cannot rewrite (WebSocket payloads after HTTP 101 upgrade). Passthrough drops the "agent never sees the real secret" invariant for the listed keys; documented as a per-credential security trade-off. Server: validation rejects keys not in credentials, invalid env var names, duplicates, and over-cap. resolve_provider_environment returns a ProviderEnvironment struct with the merged env map and the union of passthrough keys; passthrough flag follows the winning value on duplicate keys across providers. Update merge auto-prunes orphaned passthrough entries when their credential is deleted in the same call (proto3 cannot distinguish "field unset" from "set to empty list"), preserving the "recover via update" path. Explicit non-empty incoming list still rejects orphans as a caller bug. Sandbox: SecretResolver::from_provider_env_with_passthrough handles the new mode — passthrough keys go to child env as real values, resolver does not learn them; non-passthrough keys still get the canonical placeholder. CLI: provider create/update accept --passthrough KEY (repeatable); provider get annotates passthrough keys, provider list adds a PASSTHROUGH column. Refs #894 (selective passthrough lands; custom-format placeholders deferred to a follow-up). Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Resolve conflicts from upstream: - Architecture doc consolidation (#1184) deleted sandbox-providers.md; migrate the Selective Passthrough section to architecture/sandbox.md. - Provider credential rotation infrastructure (provider_env_revision + ProviderCredentialState) and selective passthrough share the same proto field slot and the same SecretResolver entry point. Merge them: proto GetSandboxProviderEnvironmentResponse keeps both passthrough_keys and provider_env_revision; SecretResolver::from_provider_env_for_revision takes both passthrough_keys and revision; ProviderCredentialState plumbs passthrough_keys through from_environment / install_environment; ProviderEnvironment carries all three fields end-to-end. - Provider proto struct picked up passthrough_credentials in our branch; add the missing field to test fixtures in run.rs and grpc/sandbox.rs. - provider_create grew a passthrough argument; pass &[] in provider_commands_integration.rs. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com> # Conflicts: # mise.lock
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Resolve three conflicts from upstream credential-rewrite work (#1286): - `secrets.rs`: the inner constructor now accepts both `passthrough_keys` (from this branch) and `include_current_aliases` (from main). Passthrough keys short-circuit into `child_env` directly; the alias-fallback insert runs only for placeholder keys. Threaded `passthrough_keys` through both public wrappers (`from_provider_env_for_revision`, `from_provider_env_for_current_revision`). - `provider_credentials.rs`: `from_environment` and `install_environment` keep this branch's `passthrough_keys` parameter and main's three-tuple return shape (`generation_resolver` + `current_resolver`). - `run.rs`: an additive `service_expose`/`service_list`/... block from main collided with this branch's `#[allow(clippy::too_many_arguments)]` annotation on `provider_create`. Keep both — annotation stays on `provider_create`, services block sits before it. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
🌿 Preview your docs: https://nvidia-preview-pr-1355.docs.buildwithfern.com/openshell |
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
Label |
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
My code review agent GPT-5.5 flagged the following issue, and I agree that there should be a way to clear the passthrough credentials:
|
|
Label |
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…quest proto3 cannot distinguish 'absent' from 'empty list' on a repeated field, so an empty incoming passthrough_credentials list is treated as 'preserve existing'. A bool clear_passthrough_credentials on UpdateProviderRequest gives callers an explicit revoke-all path. The CLI exposes it as `openshell provider update --clear-passthrough`, mutually exclusive with `--passthrough KEY`. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Apply review feedback on PR #1355: - Format provider_update test call to satisfy `cargo fmt --check`. - Update six provider_create test sites and one Provider literal to carry the passthrough column added by the merge from main. - Qualify the "agent never sees real credentials" claim with a pointer to selective passthrough so the default placeholder path stays the obvious choice but the opt-in is discoverable. - Clarify that `--clear-passthrough` only affects sandboxes launched (or relaunched) after the update; live agent processes keep their injected env until restart. - Document the `clear_passthrough_credentials = true` update mode in the sandbox architecture doc. - Replace "honour" with "apply" in the doc comment and MDX page for consistency with the rest of the codebase. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
| })? { | ||
| Some(record) => { | ||
| hasher.update(record.id.as_bytes()); | ||
| hasher.update(record.updated_at_ms.to_le_bytes()); |
There was a problem hiding this comment.
GPT 5.5 code review:
Provider env revisions still rely on
record.updated_at_ms, and the new passthrough state is not hashed directly. The sandbox only refreshes provider credentials whenprovider_env_revisionchanges (crates/openshell-sandbox/src/lib.rs (line 2335)), so two provider writes inside the same millisecond can leave a sandbox using stale provider mode or credentials. That is especially risky for passthrough revocation because the old real-secret exposure may persist until another detectable revision change.Suggested fix: hash
record.resource_versionincompute_provider_env_revision; optionally also hash sortedpassthrough_credentialsexplicitly for defense in depth.
I asked for a more in-depth explanation because I didn't follow this initially myself.
The core issue is that
compute_provider_env_revision()decides whether a running sandbox should refresh provider credentials. The sandbox poll loop compares the latestprovider_env_revisionagainst its current value; if unchanged, it does nothing. So this revision has to change for every provider state change that can affect the sandbox environment.Right now it hashes record.updated_at_ms:
hasher.update(record.id.as_bytes()); hasher.update(record.updated_at_ms.to_le_bytes());That is fragile because
updated_at_msonly has millisecond precision. If two provider updates land in the same millisecond, the persisted provider can change while the computed revision stays the same. For this PR, that matters more than usual because provider updates now control whether a credential is injected as a real secret via passthrough or as a placeholder. A missed revision can mean a sandbox keeps stale credential behavior until some later unrelated config change wakes it up.The best fix is to hash the store’s monotonic
resource_versioninstead of, or in addition to,updated_at_ms:hasher.update(record.id.as_bytes()); hasher.update(record.resource_version.to_le_bytes());I would prefer replacing
updated_at_mswithresource_versionif that field is guaranteed to increment on every provider write, which the SQLite persistence path appears to do. Keeping both is also fine:hasher.update(record.id.as_bytes()); hasher.update(record.resource_version.to_le_bytes()); hasher.update(record.updated_at_ms.to_le_bytes());For defense in depth, I’d also hash the actual fields that affect env resolution, including passthrough:
let mut passthrough_keys: Vec<_> = provider.passthrough_credentials.iter().collect(); passthrough_keys.sort(); for key in passthrough_keys { hasher.update(b"passthrough"); hasher.update(key.as_bytes()); }That makes the revision logic self-documenting: credentials, expiry, and passthrough mode all participate directly. Then add a focused test that performs two provider updates without sleeping and asserts the revision changes after each update. The existing test sleeps for 2ms, which hides exactly the race this should protect against.
|
|
||
| impl ProviderEnvironment { | ||
| #[cfg(test)] | ||
| fn is_empty(&self) -> bool { |
There was a problem hiding this comment.
GPT 5.5 code review.
crates/openshell-server/src/grpc/provider.rs (line 51): CI branch checks are blocked by Clippy becauseProviderEnvironment::is_emptyis unused under test builds. Local reproduction:cargo clippy -p openshell-server --all-targets -- -D warningsfails withmethod is_empty is never used. Suggested fix: remove the helper, use it in tests, or add a narrow#[allow(dead_code)]if it is intentionally retained.
More in-depth explanation:
The PR adds test-only helper methods on
ProviderEnvironment:impl ProviderEnvironment { #[cfg(test)] fn is_empty(&self) -> bool { self.environment.is_empty() } #[cfg(test)] fn get(&self, key: &str) -> Option<&String> { self.environment.get(key) } #[cfg(test)] fn contains_key(&self, key: &str) -> bool { self.environment.contains_key(key) } }
getandcontains_keyare used by tests, butis_emptyis not. That produces adead_codewarning in test builds. CI runs Clippy with-D warnings, so the warning becomes a hard failure:error: method `is_empty` is never used --> crates/openshell-server/src/grpc/provider.rs:51:8 | 51 | fn is_empty(&self) -> bool { | ^^^^^^^^ = note: `-D dead-code` implied by `-D warnings`So this is not a sandbox escape or behavior bug. It is a branch-protection blocker: the Rust checks cannot pass until it is fixed.
The best fix is simply to remove
is_emptyif no test uses it:impl ProviderEnvironment { #[cfg(test)] fn get(&self, key: &str) -> Option<&String> { self.environment.get(key) } #[cfg(test)] fn contains_key(&self, key: &str) -> bool { self.environment.contains_key(key) } }I would avoid adding
#[allow(dead_code)]here. That would make CI green, but it preserves unused API surface for no benefit and weakens the signal of-D warnings. Using the helper in a test is also acceptable if there is a natural assertion like “detached provider env resolves empty,” but adding a test assertion solely to justify the helper is a little backwards.Recommended review comment:
is_emptyis test-only but unused, andcargo clippy -p openshell-server --all-targets -- -D warningsfails becausedead_codeis denied. Please remove this helper unless a test actually needs it. Prefer removing it over adding an allow, since the helper is private and has no behavioral value right now.
Summary
Adds an opt-in passthrough mechanism for provider credentials. The supervisor injects the real value into the agent's environment instead of the canonical
openshell:resolve:env:<KEY>placeholder, and the L7 proxy performs no substitution for that key. Required for SDKs that validate credential format in-process (Slackxoxb-*/xapp-*), credentials consumed by in-process crypto (HMAC signing secrets, signature verification), and transports the proxy cannot rewrite (WebSocket payloads after HTTP 101 upgrade).Related Issue
Fixes #894.
Related: #872, #913, #988.
Groundwork for NVIDIA/NemoClaw#1569, NVIDIA/NemoClaw#2024, NVIDIA/NemoClaw#2031, NVIDIA/NemoClaw#2085, and NVIDIA/NemoClaw#3179.
Changes
Adds one new optional field to
Providerinproto/datamodel.proto:passthrough_credentials:repeated stringlisting credential keys whose real value is injected directly into the agent process. Drops the "agent never sees the real secret" invariant for the listed keys; treated as a per-credential security trade-off.GetSandboxProviderEnvironmentResponsegainspassthrough_keysso the supervisor knows which entries inenvironmentare real values rather than canonical-placeholder targets.Custom-format placeholders (the second mode in earlier drafts of this PR) are deliberately deferred. Passthrough alone covers all three failure classes from #894 (in-process format check, opaque transport, in-process crypto). A custom-placeholder field is purely additive when a future Class A use case justifies it.
Server changes (
crates/openshell-server):grpc/provider.rs: round-trip on Create/Update/Get/List.resolve_provider_environmentreturns aProviderEnvironmentstruct containing the merged env map and the union of passthrough keys; on duplicate keys across providers, the passthrough flag follows the winning value.grpc/validation.rs: validate each entry as a valid env var name, present incredentials, no duplicates, under the entry-count cap.grpc/policy.rs: plumbpassthrough_keysintoGetSandboxProviderEnvironmentResponse.clear_passthrough_credentials=trueonUpdateProviderRequest= replace with the empty list. proto3 cannot distinguish "absent" from "empty list" on a repeated field, so this is the only way to revoke passthrough for every key. Mutually exclusive with a non-emptypassthrough_credentialson the same request;Sandbox changes (
crates/openshell-sandbox):grpc_client.rs:fetch_provider_environmentnow returns aProviderEnvironmentcarrying both the env map and the passthrough key list.secrets.rs: newSecretResolver::from_provider_env_with_passthrough— for passthrough keys puts the real value inchild_envand skips resolver registration; non-passthrough keys keep the existing canonical-placeholder behaviour. Resolver isNonewhen every key is passthrough.lib.rs: thread the passthrough list through the supervisor launch path; OCSF log includes both env_count and passthrough_count.CLI changes (
crates/openshell-cli):provider create/provider update:--passthrough KEY(repeatable; key must also appear in--credentialon create).provider update:--clear-passthrough(bool) wipes the passthrough list; mutually exclusive with--passthrough.provider get: annotates passthrough keys (KEY (passthrough)).provider list: adds aPASSTHROUGHcount column.Architecture docs (
architecture/sandbox.md): new "Selective Passthrough" section covering motivation, configuration, resolution-and-merge semantics, supervisor injection, the auto-prune-on-credential-deletion behaviour, and the security trade-off.User-facing docs (
docs/sandboxes/manage-providers.mdx): how-to with example,<Warning>on the security trade-off, and a "Revoking passthrough" subsection covering--clear-passthrough.Testing
mise run pre-commitpassesChecklist