CompactionProcessor never triggers for Claude models — tiktoken underestimates by 40%+ #975
Environment
- `@github/copilot-sdk`: 0.1.32
- `@github/copilot` (CLI): 1.0.6
Description
The CompactionProcessor in the Copilot CLI (`app.js`) estimates context utilization using tiktoken with a model-specific correction factor (the `MEs` map: `claude-opus-4.5: 1.15`). This factor is far too low: in production testing, the SDK's estimated token count remained below the 80% background-compaction threshold while actual Anthropic API input tokens exceeded 200k, triggering a hard 400 `invalid_request_error: prompt is too long`.
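A minimal sketch of the arithmetic (not SDK code; the actual-to-tiktoken ratio of 1.5 is an illustrative assumption consistent with the 40%+ undercount reported here): with a 1.15 correction factor, the estimate can still sit below the 80% threshold even when the real prompt is at Claude's hard limit.

```javascript
// Sketch: why a 1.15 correction factor can fail to cross the 80%
// background-compaction threshold for Claude models.
const CONTEXT_WINDOW = 200_000;                     // Claude's hard input limit
const COMPACTION_THRESHOLD = 0.8 * CONTEXT_WINDOW;  // 160k estimated tokens
const CORRECTION_FACTOR = 1.15;                     // MEs map value for claude-opus-4.5

// Assumption: tiktoken undercounts Claude prompts such that actual
// input_tokens ≈ 1.5 × the tiktoken count (a "40%+" gap).
const ACTUAL_TO_TIKTOKEN = 1.5;

const actualTokens = 200_000;                             // at the hard API limit
const tiktokenCount = actualTokens / ACTUAL_TO_TIKTOKEN;  // ≈ 133,333
const sdkEstimate = tiktokenCount * CORRECTION_FACTOR;    // ≈ 153,333

// The estimate never reaches the threshold, so compaction never fires
// even though the real prompt is already being rejected by the API.
console.log(sdkEstimate < COMPACTION_THRESHOLD); // true
```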
Evidence from a live session
- At 150k actual Anthropic `input_tokens`, the CompactionProcessor had not triggered
- At 200k actual tokens (the hard API limit), zero compaction events had been emitted
- The `session.compaction_start` event was never fired across 9+ queries
- When we manually called `session.rpc.compaction.compact()` at 150k real tokens, the SDK reported `conversationTokens: 160,471` internally, showing that the SDK's own estimate was closer to reality than expected, even though its threshold still wasn't reached
Impact
Sessions using Claude models become permanently stuck once context exceeds 200k tokens. Subsequent queries add to the context rather than reducing it, making recovery impossible without starting a new conversation.
Expected behavior
CompactionProcessor should trigger background compaction before hitting the model's context limit, regardless of provider.
Suggested fix options
- Use Anthropic's `/v1/messages/count_tokens` endpoint for accurate Claude token counting
- Increase the correction factor from 1.15 to at least 1.40; overestimation is safe (compaction triggers earlier), while underestimation is dangerous (compaction never triggers)
- Use the actual `input_tokens` from Anthropic API usage responses to calibrate the estimate at runtime
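A sketch of the third option (hypothetical helper, not SDK API): keep a smoothed estimate of the actual/tiktoken ratio from usage responses and scale future estimates by it, so the factor drifts toward reality instead of staying pinned at 1.15.

```javascript
// Hypothetical runtime calibration: an exponential moving average of the
// observed actual/tiktoken ratio replaces the fixed correction factor.
class CalibratedEstimator {
  constructor(baseFactor = 1.15) {
    this.factor = baseFactor;
  }

  // Called after each response with the tiktoken count for that prompt
  // and the actual input_tokens reported by the Anthropic API.
  observe(tiktokenCount, actualInputTokens) {
    const observedRatio = actualInputTokens / tiktokenCount;
    // Smooth (α = 0.3) to avoid over-reacting to a single response.
    this.factor = 0.7 * this.factor + 0.3 * observedRatio;
  }

  estimate(tiktokenCount) {
    return tiktokenCount * this.factor;
  }
}

const est = new CalibratedEstimator();
est.observe(100_000, 150_000);               // API reported 1.5× the tiktoken count
console.log(est.estimate(100_000) > 115_000); // true: factor has moved above 1.15
```

One observation already pulls the factor from 1.15 toward 1.5 (here to 1.255), so later estimates cross the compaction threshold before the real prompt hits the API limit.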
Current workaround
We call `session.rpc.compaction.compact()` manually after each query when real `input_tokens` (from `assistant.usage` events) exceed 120k tokens (60% of Claude's 200k limit).
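A sketch of that workaround. The `input_tokens` field and `session.rpc.compaction.compact()` call come from this report; how you receive usage events depends on your integration, so the helper below takes them as plain arguments and is shown against a stub session.

```javascript
// Force compaction once real usage passes a safety margin, since the
// built-in CompactionProcessor never fires for Claude models.
const COMPACT_AT = 120_000; // 60% of Claude's 200k hard limit

async function maybeCompact(session, usage) {
  if (usage.input_tokens > COMPACT_AT) {
    await session.rpc.compaction.compact(); // manual compaction
    return true;
  }
  return false;
}

// Example with a stub standing in for the real SDK session object:
const calls = [];
const stub = { rpc: { compaction: { compact: async () => calls.push('compact') } } };
maybeCompact(stub, { input_tokens: 150_000 }).then((fired) => {
  console.log(fired, calls.length); // true 1
});
```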