feat(provider): selective credential passthrough by laitingsheng · Pull Request #1355 · NVIDIA/OpenShell

laitingsheng · 2026-05-13T04:45:47Z

Summary

Adds an opt-in passthrough mechanism for provider credentials. The supervisor injects the real value into the agent's environment instead of the canonical openshell:resolve:env:<KEY> placeholder, and the L7 proxy performs no substitution for that key. Required for SDKs that validate credential format in-process (Slack xoxb-*/xapp-*), credentials consumed by in-process crypto (HMAC signing secrets, signature verification), and transports the proxy cannot rewrite (WebSocket payloads after HTTP 101 upgrade).

Related Issue

Fixes #894.

Related: #872, #913, #988.

Groundwork for NVIDIA/NemoClaw#1569, NVIDIA/NemoClaw#2024, NVIDIA/NemoClaw#2031, NVIDIA/NemoClaw#2085, and NVIDIA/NemoClaw#3179.

Changes

Adds one new optional field to Provider in proto/datamodel.proto:

passthrough_credentials: repeated string listing credential keys whose real value is injected directly into the agent process. Drops the "agent never sees the real secret" invariant for the listed keys; treated as a per-credential security trade-off.

GetSandboxProviderEnvironmentResponse gains passthrough_keys so the supervisor knows which entries in environment are real values rather than canonical-placeholder targets.

Custom-format placeholders (the second mode in earlier drafts of this PR) are deliberately deferred. Passthrough alone covers all three failure classes from #894 (in-process format check, opaque transport, in-process crypto). A custom-placeholder field is purely additive when a future Class A use case justifies it.

Server changes (crates/openshell-server):

grpc/provider.rs: round-trip on Create/Update/Get/List. resolve_provider_environment returns a ProviderEnvironment struct containing the merged env map and the union of passthrough keys; on duplicate keys across providers, the passthrough flag follows the winning value.
grpc/validation.rs: validate each entry as a valid env var name, present in credentials, no duplicates, under the entry-count cap.
grpc/policy.rs: plumb passthrough_keys into GetSandboxProviderEnvironmentResponse.
Update merge has three modes:
- clear_passthrough_credentials=true on UpdateProviderRequest = replace with the empty list. proto3 cannot distinguish "absent" from "empty list" on a repeated field, so this is the only way to revoke passthrough for every key. Mutually exclusive with a non-empty passthrough_credentials on the same request;
- empty incoming + clear=false = preserve existing list, but auto-prune entries whose credential was deleted in the same update — without this, marking a credential passthrough and then deleting that credential locks the provider in an invalid state;
- non-empty incoming = replace verbatim; explicit orphans still rejected as a caller bug.

Sandbox changes (crates/openshell-sandbox):

grpc_client.rs: fetch_provider_environment now returns a ProviderEnvironment carrying both the env map and the passthrough key list.
secrets.rs: new SecretResolver::from_provider_env_with_passthrough — for passthrough keys puts the real value in child_env and skips resolver registration; non-passthrough keys keep the existing canonical-placeholder behaviour. Resolver is None when every key is passthrough.
lib.rs: thread the passthrough list through the supervisor launch path; OCSF log includes both env_count and passthrough_count.

CLI changes (crates/openshell-cli):

provider create/provider update: --passthrough KEY (repeatable; key must also appear in --credential on create).
provider update: --clear-passthrough (bool) wipes the passthrough list; mutually exclusive with --passthrough.
provider get: annotates passthrough keys (KEY (passthrough)).
provider list: adds a PASSTHROUGH count column.

Architecture docs (architecture/sandbox.md): new "Selective Passthrough" section covering motivation, configuration, resolution-and-merge semantics, supervisor injection, the auto-prune-on-credential-deletion behaviour, and the security trade-off.

User-facing docs (docs/sandboxes/manage-providers.mdx): how-to with example, <Warning> on the security trade-off, and a "Revoking passthrough" subsection covering --clear-passthrough.

Testing

mise run pre-commit passes
Unit tests added/updated
E2E tests added/updated (if applicable)

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture docs updated (if applicable)

Add an opt-in passthrough mechanism for provider credentials that bypass the canonical openshell:resolve:env:* placeholder and L7 proxy substitution. Provider gains a repeated string passthrough_credentials field listing credential keys whose real value should be injected directly into the agent process environment. Required for SDKs that validate credential format in-process (Slack xoxb-*/xapp-*), credentials consumed by in-process crypto (HMAC signing secrets, signature verification), and transports the proxy cannot rewrite (WebSocket payloads after HTTP 101 upgrade). Passthrough drops the "agent never sees the real secret" invariant for the listed keys; documented as a per-credential security trade-off. Server: validation rejects keys not in credentials, invalid env var names, duplicates, and over-cap. resolve_provider_environment returns a ProviderEnvironment struct with the merged env map and the union of passthrough keys; passthrough flag follows the winning value on duplicate keys across providers. Update merge auto-prunes orphaned passthrough entries when their credential is deleted in the same call (proto3 cannot distinguish "field unset" from "set to empty list"), preserving the "recover via update" path. Explicit non-empty incoming list still rejects orphans as a caller bug. Sandbox: SecretResolver::from_provider_env_with_passthrough handles the new mode — passthrough keys go to child env as real values, resolver does not learn them; non-passthrough keys still get the canonical placeholder. CLI: provider create/update accept --passthrough KEY (repeatable); provider get annotates passthrough keys, provider list adds a PASSTHROUGH column. Refs #894 (selective passthrough lands; custom-format placeholders deferred to a follow-up). Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Resolve conflicts from upstream: - Architecture doc consolidation (#1184) deleted sandbox-providers.md; migrate the Selective Passthrough section to architecture/sandbox.md. - Provider credential rotation infrastructure (provider_env_revision + ProviderCredentialState) and selective passthrough share the same proto field slot and the same SecretResolver entry point. Merge them: proto GetSandboxProviderEnvironmentResponse keeps both passthrough_keys and provider_env_revision; SecretResolver::from_provider_env_for_revision takes both passthrough_keys and revision; ProviderCredentialState plumbs passthrough_keys through from_environment / install_environment; ProviderEnvironment carries all three fields end-to-end. - Provider proto struct picked up passthrough_credentials in our branch; add the missing field to test fixtures in run.rs and grpc/sandbox.rs. - provider_create grew a passthrough argument; pass &[] in provider_commands_integration.rs. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Signed-off-by: Tinson Lai <tinsonl@nvidia.com> # Conflicts: # mise.lock

copy-pr-bot · 2026-05-13T04:47:21Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Resolve three conflicts from upstream credential-rewrite work (#1286): - `secrets.rs`: the inner constructor now accepts both `passthrough_keys` (from this branch) and `include_current_aliases` (from main). Passthrough keys short-circuit into `child_env` directly; the alias-fallback insert runs only for placeholder keys. Threaded `passthrough_keys` through both public wrappers (`from_provider_env_for_revision`, `from_provider_env_for_current_revision`). - `provider_credentials.rs`: `from_environment` and `install_environment` keep this branch's `passthrough_keys` parameter and main's three-tuple return shape (`generation_resolver` + `current_resolver`). - `run.rs`: an additive `service_expose`/`service_list`/... block from main collided with this branch's `#[allow(clippy::too_many_arguments)]` annotation on `provider_create`. Keep both — annotation stays on `provider_create`, services block sits before it. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-13T12:39:23Z

🌿 Preview your docs: https://nvidia-preview-pr-1355.docs.buildwithfern.com/openshell

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-13T14:33:30Z

Label test:e2e applied for ada8cee. Open the existing run and click Re-run all jobs to execute with the label set. The E2E Gate check on this PR will flip green automatically once the run finishes.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

amfred · 2026-05-26T22:47:23Z

My code review agent GPT-5.5 flagged the following issue, and I agree that there should be a way to clear the passthrough credentials:

[P1] UpdateProvider cannot clear the last passthrough credential
crates/openshell-server/src/grpc/provider.rs:162

The update merge treats an empty incoming passthrough_credentials list as “preserve existing,” and a non-empty list as replacement. Because this is a proto3 repeated field, clients have no presence bit to distinguish “omitted” from “explicitly set to empty.” That means once a provider has exactly one passthrough key, there is no direct API or CLI path to turn passthrough off while keeping the credential. openshell provider update --passthrough ... can replace with another non-empty list, but openshell provider update with no passthrough preserves the old list.

That is a security-relevant footgun: passthrough is the mode that exposes raw secrets to the agent process, so operators need a clean revocation path. I’d add either an explicit clear_passthrough_credentials field, an update field mask, or a dedicated provider passthrough update RPC/operation. Then add tests covering “single passthrough key -> clear to empty.”

Related code:

crates/openshell-server/src/grpc/provider.rs:155-170

crates/openshell-cli/src/run.rs:4253-4257

docs/sandboxes/manage-providers.mdx documents update support but not a way to clear passthrough.

github-actions · 2026-05-27T01:36:28Z

Label test:e2e applied for ba366d7. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…quest proto3 cannot distinguish 'absent' from 'empty list' on a repeated field, so an empty incoming passthrough_credentials list is treated as 'preserve existing'. A bool clear_passthrough_credentials on UpdateProviderRequest gives callers an explicit revoke-all path. The CLI exposes it as `openshell provider update --clear-passthrough`, mutually exclusive with `--passthrough KEY`. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Apply review feedback on PR #1355: - Format provider_update test call to satisfy `cargo fmt --check`. - Update six provider_create test sites and one Provider literal to carry the passthrough column added by the merge from main. - Qualify the "agent never sees real credentials" claim with a pointer to selective passthrough so the default placeholder path stays the obvious choice but the opt-in is discoverable. - Clarify that `--clear-passthrough` only affects sandboxes launched (or relaunched) after the update; live agent processes keep their injected env until restart. - Document the `clear_passthrough_credentials = true` update mode in the sandbox architecture doc. - Replace "honour" with "apply" in the doc comment and MDX page for consistency with the rest of the codebase. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

amfred · 2026-05-27T14:16:15Z

            })? {
            Some(record) => {
                hasher.update(record.id.as_bytes());
                hasher.update(record.updated_at_ms.to_le_bytes());


GPT 5.5 code review:

Provider env revisions still rely on record.updated_at_ms, and the new passthrough state is not hashed directly. The sandbox only refreshes provider credentials when provider_env_revision changes (crates/openshell-sandbox/src/lib.rs (line 2335)), so two provider writes inside the same millisecond can leave a sandbox using stale provider mode or credentials. That is especially risky for passthrough revocation because the old real-secret exposure may persist until another detectable revision change.

Suggested fix: hash record.resource_version in compute_provider_env_revision; optionally also hash sorted passthrough_credentials explicitly for defense in depth.

I asked for a more in-depth explanation because I didn't follow this initially myself.

The core issue is that compute_provider_env_revision() decides whether a running sandbox should refresh provider credentials. The sandbox poll loop compares the latest provider_env_revision against its current value; if unchanged, it does nothing. So this revision has to change for every provider state change that can affect the sandbox environment.

Right now it hashes record.updated_at_ms:

hasher.update(record.id.as_bytes()); hasher.update(record.updated_at_ms.to_le_bytes());

That is fragile because updated_at_ms only has millisecond precision. If two provider updates land in the same millisecond, the persisted provider can change while the computed revision stays the same. For this PR, that matters more than usual because provider updates now control whether a credential is injected as a real secret via passthrough or as a placeholder. A missed revision can mean a sandbox keeps stale credential behavior until some later unrelated config change wakes it up.

The best fix is to hash the store’s monotonic resource_version instead of, or in addition to, updated_at_ms:

hasher.update(record.id.as_bytes()); hasher.update(record.resource_version.to_le_bytes());

I would prefer replacing updated_at_ms with resource_version if that field is guaranteed to increment on every provider write, which the SQLite persistence path appears to do. Keeping both is also fine:

hasher.update(record.id.as_bytes()); hasher.update(record.resource_version.to_le_bytes()); hasher.update(record.updated_at_ms.to_le_bytes());

For defense in depth, I’d also hash the actual fields that affect env resolution, including passthrough:

let mut passthrough_keys: Vec<_> = provider.passthrough_credentials.iter().collect(); passthrough_keys.sort(); for key in passthrough_keys { hasher.update(b"passthrough"); hasher.update(key.as_bytes()); }

That makes the revision logic self-documenting: credentials, expiry, and passthrough mode all participate directly. Then add a focused test that performs two provider updates without sleeping and asserts the revision changes after each update. The existing test sleeps for 2ms, which hides exactly the race this should protect against.

amfred · 2026-05-27T14:27:03Z


 impl ProviderEnvironment {
    #[cfg(test)]
    fn is_empty(&self) -> bool {


GPT 5.5 code review.

crates/openshell-server/src/grpc/provider.rs (line 51): CI branch checks are blocked by Clippy because ProviderEnvironment::is_empty is unused under test builds. Local reproduction: cargo clippy -p openshell-server --all-targets -- -D warnings fails with method is_empty is never used. Suggested fix: remove the helper, use it in tests, or add a narrow #[allow(dead_code)] if it is intentionally retained.

More in-depth explanation:

The PR adds test-only helper methods on ProviderEnvironment:

impl ProviderEnvironment { #[cfg(test)] fn is_empty(&self) -> bool { self.environment.is_empty() } #[cfg(test)] fn get(&self, key: &str) -> Option<&String> { self.environment.get(key) } #[cfg(test)] fn contains_key(&self, key: &str) -> bool { self.environment.contains_key(key) } }

get and contains_key are used by tests, but is_empty is not. That produces a dead_code warning in test builds. CI runs Clippy with -D warnings, so the warning becomes a hard failure:

error: method `is_empty` is never used --> crates/openshell-server/src/grpc/provider.rs:51:8 | 51 | fn is_empty(&self) -> bool { | ^^^^^^^^ = note: `-D dead-code` implied by `-D warnings`

So this is not a sandbox escape or behavior bug. It is a branch-protection blocker: the Rust checks cannot pass until it is fixed.

The best fix is simply to remove is_empty if no test uses it:

impl ProviderEnvironment { #[cfg(test)] fn get(&self, key: &str) -> Option<&String> { self.environment.get(key) } #[cfg(test)] fn contains_key(&self, key: &str) -> bool { self.environment.contains_key(key) } }

I would avoid adding #[allow(dead_code)] here. That would make CI green, but it preserves unused API surface for no benefit and weakens the signal of -D warnings. Using the helper in a test is also acceptable if there is a natural assertion like “detached provider env resolves empty,” but adding a test assertion solely to justify the helper is a little backwards.

Recommended review comment:

is_empty is test-only but unused, and cargo clippy -p openshell-server --all-targets -- -D warnings fails because dead_code is denied. Please remove this helper unless a test actually needs it. Prefer removing it over adding an allow, since the helper is private and has no behavioral value right now.

laitingsheng added 3 commits May 1, 2026 18:01

chore: merge main into fix/894-customisable-credential-placeholder

d1a3885

Signed-off-by: Tinson Lai <tinsonl@nvidia.com> # Conflicts: # mise.lock

laitingsheng requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 13, 2026 04:45

laitingsheng marked this pull request as draft May 13, 2026 04:47

docs(providers): clarify passthrough key resolution

771b630

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng marked this pull request as ready for review May 13, 2026 14:19

style: collapse single-line calls to satisfy rustfmt

ada8cee

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the test:e2e Requires end-to-end coverage label May 13, 2026

test(e2e): accept revisioned placeholder in coexistence test

ba366d7

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added test:e2e Requires end-to-end coverage and removed test:e2e Requires end-to-end coverage labels May 27, 2026

laitingsheng added 3 commits May 27, 2026 02:31

chore: merge main into fix/894-customisable-credential-placeholder

f67061a

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

amfred reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(provider): selective credential passthrough#1355

feat(provider): selective credential passthrough#1355
laitingsheng wants to merge 10 commits into
mainfrom
fix/894-customisable-credential-placeholder

laitingsheng commented May 13, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

amfred commented May 26, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

amfred May 27, 2026

Uh oh!

amfred May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

laitingsheng commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Testing

Checklist

Uh oh!

copy-pr-bot Bot commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

amfred commented May 26, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

amfred May 27, 2026

Choose a reason for hiding this comment

Uh oh!

amfred May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

laitingsheng commented May 13, 2026 •

edited

Loading