cosmos_database_container_item_query: 93% of failures are auth credential errors (CredentialUnavailableException) #2290
Description
Summary
Telemetry analysis of cosmos_database_container_item_query over the last 30 days shows 21,183 total failures. Of these, 19,640 (92.7%, roughly 93%) share a single root cause: CredentialUnavailableException (HTTP 401), raised for users whose Azure CLI credentials are expired or not configured.
This suggests the tool's error handling and user guidance for auth failures could be improved.
Failure Breakdown (30d)
| # | Exception Type | Status | Count | % | Avg Latency | Root Cause |
|---|---|---|---|---|---|---|
| 1 | `Azure.Identity.CredentialUnavailableException` | 401 | 19,640 | 92.7% | 249ms | Azure CLI not logged in or token expired |
| 2 | (no exception captured) | - | 1,190 | 5.6% | 61,000ms | Silent timeouts: 61s avg, no error details surfaced |
| 3 | `Microsoft.Azure.Cosmos.CosmosException` | 403 | 150 | 0.7% | ~2s | User lacks Cosmos DB data-plane RBAC |
| 4 | `System.ArgumentNullException` | 500 | 100 | 0.5% | ~1s | Null required parameter in tool invocation |
| 5 | `ValidationError` | - | 35 | 0.2% | ~1s | Missing required args (--account, --database, --container, --subscription) |
| 6 | `Azure.RequestFailedException` (AuthorizationFailed) | 403 | ~30 | 0.1% | ~8s | ARM-level RBAC denial |
| 7 | `System.Net.Http.HttpRequestException` | 503 | 18 | <0.1% | ~3s | Cosmos DB service unavailable |
| 8 | `CosmosOperationCanceledException` | 500 | 8 | <0.1% | 22 min | Extreme query timeouts |
| 9 | `InvalidAuthenticationTokenTenant` | 401 | ~6 | <0.1% | ~2s | Wrong Entra ID tenant |
| 10 | `System.TypeInitializationException` | 500 | 10 | <0.1% | ~30ms | SDK initialization failure |
Recommendations
1. Better auth error UX (addresses 93% of failures)
When CredentialUnavailableException is caught, return a clear, actionable error message instead of a generic failure:
"Azure credentials not found. Please run `az login` to authenticate, then try again."
Consider proactively checking credential availability before attempting the Cosmos DB call.
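A minimal sketch of this recommendation, written in Python for brevity even though the exception names in the table suggest the tool itself is .NET. Everything here is an assumption made for illustration: `CredentialUnavailableError` is a local stand-in for the SDK exception, and `get_token` and `run_query` are hypothetical callables, not part of the tool's actual code.

```python
class CredentialUnavailableError(Exception):
    """Local stand-in for the SDK's credential-unavailable exception (hypothetical)."""


# The message proposed in this issue.
AUTH_HELP = (
    "Azure credentials not found. Please run `az login` to authenticate, "
    "then try again."
)


def _auth_failure(exc):
    """Build an actionable 401 response that keeps the original detail."""
    return {"ok": False, "status": 401, "message": AUTH_HELP, "detail": str(exc)}


def run_query_with_auth_guidance(get_token, run_query):
    """Check credentials first, then run the query.

    get_token and run_query are hypothetical zero-argument callables.
    Returns a dict with either the query result or an actionable error.
    """
    try:
        get_token()  # pre-flight check: fail fast before calling Cosmos DB
    except CredentialUnavailableError as exc:
        return _auth_failure(exc)
    try:
        return {"ok": True, "result": run_query()}
    except CredentialUnavailableError as exc:
        # A token can still expire between the pre-flight check and the query.
        return _auth_failure(exc)
```

The pre-flight check does not replace the second handler: a credential can become unavailable after the check passes, so both paths return the same guidance.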
2. Surface error details for silent timeouts (addresses 5.6%)
The 1,190 failures with no exception type captured have a 61-second average duration — these appear to be connection timeouts where the error is swallowed. Ensure the timeout exception and message are captured in telemetry.
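One way to avoid swallowing the error is to record the exception type, message, and duration before re-raising. The sketch below is illustrative only: `run_with_failure_telemetry`, the `emit` callback, and the event field names are all hypothetical and do not reflect the tool's real telemetry schema.

```python
import time


def run_with_failure_telemetry(operation, emit):
    """Run operation; on failure, emit exception details and re-raise.

    operation is a zero-argument callable; emit receives one dict per failure.
    Both are hypothetical hooks used only for this sketch.
    """
    start = time.monotonic()
    try:
        return operation()
    except Exception as exc:
        emit({
            "exception_type": type(exc).__name__,
            # Some timeout exceptions carry an empty message, so use a placeholder.
            "message": str(exc) or "(no message)",
            "duration_ms": round((time.monotonic() - start) * 1000),
        })
        raise  # keep the original failure visible to the caller
```

Re-raising after emitting leaves the caller's behavior unchanged while ensuring the failure row no longer appears as "(no exception captured)".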
Environment
- Data source: `RawEventsDependencies` table in `AzureDevExp`
- Time range: Last 30 days (as of 2026-03-30)
- Clients affected: Primarily VS Code (`clientname == 'Visual Studio Code'`)