feat(graphile-llm): wire billing metering, inference logging, and global routes #1192
Merged
Branch updated from 3c6b433 to 3a257e3.
Add per-database billing integration to the graphile-llm package:

- config-cache.ts: LRU cache (5-min TTL, 50 entries) for billing_module metadata and API key resolution from app_secrets per database_id
- metering.ts: billing-aware wrappers (meteredEmbed, meteredChat) that call check_billing_quota() before and record_usage() after LLM calls
- LlmModulePlugin: exposes metering options on the build context
- LlmTextSearchPlugin: metered embedding with graceful degradation (when the quota is exceeded, the vector path is skipped and text-only search continues)
- LlmTextMutationPlugin: metered embedding that throws QuotaExceededError on mutations (it can't silently skip writing a vector the user asked for)
- MeteringConfig on GraphileLlmOptions: configurable meter slugs, estimated tokens, and skip toggle; auto-detects billing_module
- Uses the Graphile withPgClient pattern for all billing SQL calls

Billing functions (check_billing_quota, record_usage) are resolved from the tenant database's billing_module metaschema. When billing is not provisioned, all calls pass through unmetered.
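The split between "search degrades" and "mutation throws" can be sketched as one metered wrapper whose `null` result each caller interprets differently. This is a minimal sketch, not the package's code: the `BillingFns` callbacks stand in for the SQL calls to check_billing_quota/record_usage, and `buildSearch`, `embedForMutation`, and the ~4 chars/token estimate are illustrative assumptions.

```typescript
// Hypothetical shapes: the real plugin resolves check_billing_quota and
// record_usage per database from billing_module and calls them via SQL.
type Embedder = (text: string) => Promise<number[]>;
type MeteredEmbedder = (text: string) => Promise<number[] | null>;

interface BillingFns {
  checkQuota(entityId: string, meterSlug: string, amount: number): Promise<boolean>;
  recordUsage(entityId: string, meterSlug: string, amount: number): Promise<void>;
}

class QuotaExceededError extends Error {
  readonly meterSlug: string;
  constructor(meterSlug: string) {
    super(`Quota exceeded for meter "${meterSlug}"`);
    this.name = "QuotaExceededError";
    this.meterSlug = meterSlug;
  }
}

// Assumed estimate: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Check quota before the call, record usage after it. `null` means
// "quota exceeded"; each caller decides how to react.
function meteredEmbed(
  embed: Embedder,
  billing: BillingFns | null,
  entityId: string,
  meterSlug: string,
): MeteredEmbedder {
  return async (text) => {
    if (!billing) return embed(text); // billing not provisioned: unmetered
    const amount = estimateTokens(text);
    if (!(await billing.checkQuota(entityId, meterSlug, amount))) return null;
    const vector = await embed(text);
    await billing.recordUsage(entityId, meterSlug, amount);
    return vector;
  };
}

// Search: degrade gracefully by dropping the vector condition.
async function buildSearch(embed: MeteredEmbedder, query: string) {
  const vector = await embed(query);
  return vector ? { mode: "hybrid" as const, vector } : { mode: "text-only" as const };
}

// Mutation: the user asked for a stored vector, so fail loudly instead.
async function embedForMutation(
  embed: MeteredEmbedder,
  text: string,
  meterSlug: string,
): Promise<number[]> {
  const vector = await embed(text);
  if (vector === null) throw new QuotaExceededError(meterSlug);
  return vector;
}
```

Returning `null` instead of throwing from the wrapper keeps the policy decision in the plugin that knows whether skipping the vector is acceptable.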
…Plugin

- Extract all billing/metering logic into metering-plugin.ts
- LlmModulePlugin, TextSearchPlugin, and TextMutationPlugin are now pure (no billing imports, no metering context building)
- LlmMeteringPlugin uses AsyncLocalStorage to transparently wrap the embedder with quota checks; downstream plugins are unaware of billing
- Entity ID is resolved via a configurable callback (default: jwt.claims.user_id) instead of the non-existent jwt.claims.membership_id
- Metering is opt-in: only loaded when the metering option is truthy
- Add a schema-existence guard in config-cache (checks that metaschema_modules_public exists before querying the billing_module table)
- Graceful degradation: a missing schema, missing entity_id, or failed billing calls all result in unmetered passthrough
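One way the AsyncLocalStorage wrapping can work is to wrap the embedder once and read the per-request context from the store on each call. This is a sketch under assumptions: the `MeteringContext` shape (a pg-settings map), `withMetering`, and the `meter` callback are hypothetical, and quota-exceeded handling is left out to keep the focus on context propagation. `AsyncLocalStorage.run()` and `getStore()` are the Node.js `node:async_hooks` APIs.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical request context: the real plugin derives this from the
// GraphQL request; here it is just the pg settings map.
interface MeteringContext {
  pgSettings: Record<string, string | undefined>;
}

type Embedder = (text: string) => Promise<number[]>;
type EntityResolver = (ctx: MeteringContext) => string | null;

const meteringContext = new AsyncLocalStorage<MeteringContext>();

// Default mirrors the commit message: bill the user in jwt.claims.user_id.
const defaultResolveEntity: EntityResolver = (ctx) =>
  ctx.pgSettings["jwt.claims.user_id"] || null;

// Wrap the embedder once; downstream plugins still see a plain Embedder.
// No request context, no entity id, or a failing billing call all fall
// back to unmetered passthrough.
function withMetering(
  embed: Embedder,
  meter: (entityId: string, text: string) => Promise<void>,
  resolveEntity: EntityResolver = defaultResolveEntity,
): Embedder {
  return async (text) => {
    const ctx = meteringContext.getStore();
    const entityId = ctx ? resolveEntity(ctx) : null;
    if (entityId) {
      try {
        await meter(entityId, text);
      } catch {
        // Billing failure: degrade to unmetered rather than failing the request.
      }
    }
    return embed(text);
  };
}
```

Request handling would call `meteringContext.run(ctx, handler)` so that every embed inside that request sees the same context without it being threaded through plugin signatures.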
- LlmModulePlugin now exposes llmEmbeddingModel and llmChatModel on build
- LlmMeteringPlugin reads model names from build and uses them as default meter slugs (e.g. 'text-embedding-3-small' → billing meters table)
- Three-level waterfall: per-model → inference pool → universal credits (handled by the billing module's category_meter field)
- Remove hardcoded 'embedding_tokens'/'chat_tokens' defaults
- Add docs/spec/llm-metering.md, a full architecture reference for two-tier billing, the model = meter slug convention, and the waterfall
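The waterfall itself lives in the billing module, so the following is only an illustration of the lookup order. It assumes, without confirmation from the PR, that category_meter acts as a "fall back to this meter" pointer; the slugs `inference_pool` and `universal_credits` and both helper functions are hypothetical.

```typescript
// Illustrative only: in this PR the waterfall is handled by the billing
// module's category_meter field, not by graphile-llm. Treating
// category_meter as a fallback pointer is an assumption.
interface Meter {
  slug: string;
  categoryMeter: string | null;
}

// Follow category_meter links starting from the model's own meter
// (the model name doubles as the meter slug). Stops on cycles.
function meterChain(modelSlug: string, meters: Map<string, Meter>): string[] {
  const chain: string[] = [];
  let current: string | null = modelSlug;
  while (current !== null && meters.has(current) && !chain.includes(current)) {
    chain.push(current);
    current = meters.get(current)!.categoryMeter;
  }
  return chain;
}

// First meter in the chain with enough remaining balance, or null.
function pickMeter(
  chain: string[],
  remaining: (slug: string) => number,
  amount: number,
): string | null {
  return chain.find((slug) => remaining(slug) >= amount) ?? null;
}
```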
…user_module

The table was renamed from metaschema_modules_public.encrypted_secrets_module to metaschema_modules_public.config_secrets_user_module. Also updated the JOIN column from private_schema_id to schema_id to match the new schema.
…t length

Removed the configurable estimatedEmbeddingTokens option. Token counts are now estimated directly from the input text length (~4 chars/token). No tokenizer is needed because the billing system uses tokens as abstract units and the per-model credit_cost normalizes relative expense.
…point
- Add LlmAgentDiscoveryPlugin: discovers @agentThread/@agentMessage/@agentTask
tables at PostGraphile schema build time via smart tags and caches results
for REST middleware consumption
- Add LLM API router (llm-api.ts): Express router providing:
- POST /orgs/:entity_id/threads — create agent thread (RLS enforced)
- POST /orgs/:entity_id/threads/:thread_id/messages — send message + SSE
stream assistant response via OllamaClient, persist both messages
- Wire into server.ts after authenticate middleware, before graphile
- Export plugin, getAgentDiscovery, types from graphile-llm
- Add enableAgentDiscovery option to GraphileLlmPreset (default: true)
- Add graphile-llm and @agentic-kit/ollama deps to graphql-server
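For the "send message + SSE stream" route, the wire format is the standard Server-Sent Events framing: `data:` lines terminated by a blank line. The sketch below shows that framing plus a hypothetical helper that forwards chunks while accumulating the full text for persistence; the `{ delta }`/`{ done }` payload shape and `streamAndCollect` are assumptions, not the router's actual event schema.

```typescript
// Minimal Server-Sent Events framing: optional "event:" line, one "data:"
// line per payload line, then a blank line ending the event.
function sseEvent(data: unknown, event?: string): string {
  const payload = typeof data === "string" ? data : JSON.stringify(data);
  const lines: string[] = event ? [`event: ${event}`] : [];
  for (const line of payload.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Hypothetical helper: forward model chunks to the client as they arrive and
// return the full text so it can be persisted as the assistant message.
async function streamAndCollect(
  chunks: AsyncIterable<string>,
  write: (frame: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of chunks) {
    full += chunk;
    write(sseEvent({ delta: chunk }));
  }
  write(sseEvent({ done: true }));
  return full;
}
```

In an Express handler this would typically run after setting `Content-Type: text/event-stream` and `Cache-Control: no-cache`, with `write` bound to `res.write`; persisting both messages happens once the returned promise resolves.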
Restores:

- Embedder resolution priority comments in llm-module-plugin.ts
- v4-style resolver wrapping explanation in text-mutation-plugin.ts
- Section markers (// Only intercept, // Find vector columns, etc.)
- graphile-postgis pattern reference in text-search-plugin.ts
- Array handling comment (AND/OR) in embedTextInWhere
…ce, where/filter)
The ~4 chars/token heuristic was unreliable for non-English text and code.

- Pre-check now passes amount=1 (it just checks "is any quota remaining?")
- Post-call records the actual char count as the metered amount
- Token-accurate billing will come when providers return usage metadata
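The two-phase amounts described above (probe with 1, record the real size afterwards) look roughly like this. It is a sketch with hypothetical `Billing` callbacks; counting Unicode code points with `Array.from` rather than UTF-16 units via `.length` is a design choice of this example, not necessarily what the PR does.

```typescript
// Hypothetical billing callbacks standing in for check_billing_quota and
// record_usage.
interface Billing {
  checkQuota(meterSlug: string, amount: number): Promise<boolean>;
  recordUsage(meterSlug: string, amount: number): Promise<void>;
}

// Count Unicode code points rather than UTF-16 units, so an emoji counts as
// one character (a choice made for this sketch).
const charCount = (text: string): number => Array.from(text).length;

async function meterByChars<T>(
  billing: Billing,
  meterSlug: string,
  input: string,
  call: (input: string) => Promise<T>,
): Promise<T | null> {
  // Pre-check only asks "is any quota left?".
  if (!(await billing.checkQuota(meterSlug, 1))) return null;
  const result = await call(input);
  // Post-call records the actual size of the input.
  await billing.recordUsage(meterSlug, charCount(input));
  return result;
}
```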
Add env.ts as the single source of truth for all LLM environment variables and defaults (provider, model, baseUrl for both embedding and chat). Remove all scattered process.env reads and null coalescing from embedder.ts, chat.ts, llm-module-plugin.ts, and llm-api.ts.

- Remove @constructive-io/graphql-env dependency from graphile-llm
- Defaults: ollama provider, nomic-embed-text / llama3 models, http://localhost:11434 base URL
- Export getLlmEnvOptions from package index
- Update tests for new default behavior (env always resolves)
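A single env-resolution point with the listed defaults might look like the sketch below. The environment variable names and `resolveLlmEnv` are hypothetical (the PR's exported function is getLlmEnvOptions, whose exact variable names are not shown here); only the default values come from the commit message.

```typescript
// Sketch of a single env-resolution point in the spirit of env.ts.
// Env var names are hypothetical; defaults are from the commit message.
interface LlmEndpoint {
  provider: string;
  model: string;
  baseUrl: string;
}

interface LlmEnvOptions {
  embedding: LlmEndpoint;
  chat: LlmEndpoint;
}

type Env = Record<string, string | undefined>;

const DEFAULT_BASE_URL = "http://localhost:11434";

// Empty or whitespace-only values fall back to the default.
function pick(value: string | undefined, fallback: string): string {
  const trimmed = value?.trim();
  return trimmed ? trimmed : fallback;
}

function resolveLlmEnv(env: Env): LlmEnvOptions {
  return {
    embedding: {
      provider: pick(env.LLM_EMBEDDING_PROVIDER, "ollama"),
      model: pick(env.LLM_EMBEDDING_MODEL, "nomic-embed-text"),
      baseUrl: pick(env.LLM_EMBEDDING_BASE_URL, DEFAULT_BASE_URL),
    },
    chat: {
      provider: pick(env.LLM_CHAT_PROVIDER, "ollama"),
      model: pick(env.LLM_CHAT_MODEL, "llama3"),
      baseUrl: pick(env.LLM_CHAT_BASE_URL, DEFAULT_BASE_URL),
    },
  };
}
```

Taking the environment as a parameter (called as `resolveLlmEnv(process.env)`) keeps the resolver pure, which is what makes "env always resolves" easy to test.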
Agent table discovery now queries metaschema_modules_public.agent_chat_module at runtime instead of scanning for @agentThread/@agentMessage/@agentTask smart tags at PostGraphile schema build time.

Changes:
- agent-discovery-plugin: replaced the GraphileConfig.Plugin build hook with async getAgentDiscovery(pool, dbname), which queries the module config table with a 60s TTL cache
- llm-api: the router no longer takes a getDiscovery param; it calls getAgentDiscovery directly with the per-request pool
- server.ts: simplified createLlmApiRouter() call (no args)
- preset.ts: removed LlmAgentDiscoveryPlugin from the plugin list
- types.ts: removed the enableAgentDiscovery option
Branch updated from b5d6099 to 9c0265c.
API key management will be handled as a separate, intentional feature. Removed:

- apiKey from EmbedderConfig and ChatConfig types
- api_key_ref from LlmModuleData interface
- resolveApiKey() and SECRETS_MODULE_SCHEMA_SQL from config-cache
- apiKey field from LlmBillingCacheEntry
Major changes:

- Add global routes POST /v1/threads and POST /v1/threads/:thread_id/messages. These bill to actor_id from the JWT (no entity_id in the URL needed)
- Wire billing metering: check_billing_quota pre-check before the LLM call, record_usage with actual token counts after the response
- Store token_usage jsonb on assistant messages (input, output, totalTokens, model, latency_ms)
- Use a callWithUsage() wrapper: uses generateWithUsage() when available (@agentic-kit/ollama >=1.3.0), falls back to generate() + token estimation
- Refactor: extract shared handleCreateThread/handleSendMessage handlers used by both entity-scoped and global routes
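The callWithUsage fallback can be sketched as "prefer provider-reported usage, else estimate". The `ChatProvider` interface here is hypothetical and deliberately not the real @agentic-kit/ollama API; it only models "a generate() method plus an optional usage-reporting variant". The `estimated` flag is an addition of this sketch.

```typescript
// Hypothetical provider interface; NOT the real @agentic-kit/ollama API.
interface ProviderUsage {
  inputTokens: number;
  outputTokens: number;
}

interface ChatProvider {
  generate(prompt: string): Promise<string>;
  // Optional: providers that can report real token counts.
  generateWithUsage?(prompt: string): Promise<{ text: string; usage: ProviderUsage }>;
}

interface Usage extends ProviderUsage {
  totalTokens: number;
  estimated: boolean;
}

// ~4 characters per token, the estimate used elsewhere in this PR.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Prefer provider-reported usage; otherwise fall back to generate() and estimate.
async function callWithUsage(
  provider: ChatProvider,
  prompt: string,
): Promise<{ text: string; usage: Usage }> {
  if (provider.generateWithUsage) {
    const { text, usage } = await provider.generateWithUsage(prompt);
    const totalTokens = usage.inputTokens + usage.outputTokens;
    return { text, usage: { ...usage, totalTokens, estimated: false } };
  }
  const text = await provider.generate(prompt);
  const inputTokens = estimateTokens(prompt);
  const outputTokens = estimateTokens(text);
  return {
    text,
    usage: { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens, estimated: true },
  };
}
```

Keeping the branch inside one function means callers (metering, logging) always receive the same `Usage` shape regardless of which path produced it.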
Discovers inference_log_module config at runtime (60s TTL cache) and INSERTs into the inference log table after each LLM call, recording entity_id, actor_id, model, provider, request_type, input/output/total tokens, latency_ms, and status. Both streaming and non-streaming paths log (fire-and-forget). Logging is skipped gracefully if inference_log_module is not provisioned.
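A fire-and-forget log write means the request path never awaits the INSERT and never fails because of it. The sketch below assumes a hypothetical `InsertRow` callback in place of the real SQL against the discovered table; the row fields follow the columns listed above. `Promise.resolve().then(...)` is used so that even a synchronously throwing insert is routed to the error handler.

```typescript
// Hypothetical row shape based on the columns listed in the commit message;
// the real table name and columns come from inference_log_module.
interface InferenceLogRow {
  entity_id: string | null;
  actor_id: string | null;
  model: string;
  provider: string;
  request_type: string;
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  latency_ms: number;
  status: string;
}

type InsertRow = (row: InferenceLogRow) => Promise<void>;

// Fire-and-forget: the caller never awaits the INSERT, and failures
// (including synchronous throws) are only reported. A null insert means the
// module is not provisioned, so logging is skipped.
function logInference(
  insert: InsertRow | null,
  row: InferenceLogRow,
  onError: (err: unknown) => void = (err) => console.error("inference_log insert failed", err),
): void {
  if (!insert) return;
  Promise.resolve()
    .then(() => insert(row))
    .catch(onError);
}

// Measure wall-clock latency of an LLM call in milliseconds.
async function timed<T>(fn: () => Promise<T>): Promise<{ result: T; latencyMs: number }> {
  const start = performance.now();
  const result = await fn();
  return { result, latencyMs: Math.round(performance.now() - start) };
}
```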
Tests:

- Route registration: verifies all 4 routes (entity-scoped + global)
- Auth: both global and entity-scoped routes reject a missing JWT
- Token usage: non-streaming response includes a usage object (prompt_tokens, completion_tokens, total_tokens)
…ime check

Per team discussion, use the existing generate() API with token estimation from text length (~4 chars/token). The callWithUsage wrapper remains as a clean abstraction point for when a provider-native token counting API is approved.
…lugin caches

- Add ModuleConfigCache<T> to graphile-cache: a generic LRU cache with TTL, bounded memory, and named logging for module config lookups
- Migrate agent-discovery-plugin.ts from an unbounded Map to ModuleConfigCache
- Migrate config-cache.ts (billing) from a raw LRUCache to ModuleConfigCache
- Migrate the llm-api.ts inference log cache from an unbounded Map to ModuleConfigCache
- Replace the direct lru-cache dependency in graphile-llm with graphile-cache
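The properties ModuleConfigCache<T> is described as providing (bounded size, LRU eviction, TTL) can be illustrated with a small Map-based cache. This is a sketch, not graphile-cache's implementation (which builds on lru-cache), and named logging is omitted; `BoundedTtlCache` and `getOrLoad` are hypothetical names. It relies on JavaScript Maps iterating in insertion order, so the first key is always the least recently used.

```typescript
// Map-based sketch of a bounded LRU + TTL cache for module config lookups.
// Not graphile-cache's ModuleConfigCache; named logging is omitted.
class BoundedTtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly max: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(max: number, ttlMs: number, now: () => number = Date.now) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    this.entries.delete(key); // move to the most-recently-used position
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    while (this.entries.size > this.max) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest); // evict least recently used
    }
  }

  // Cache-aside helper; null results (e.g. "module not provisioned") are cached too.
  async getOrLoad(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.get(key);
    if (hit !== undefined) return hit;
    const value = await load();
    this.set(key, value);
    return value;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Caching `null` for unprovisioned modules matters here: without it, every request against a database lacking billing or inference logging would re-run the discovery query.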
…ge INSERT

inference_log is the source of truth for usage data. Agent messages now store only the model name for display; full token counts, latency, and provider info are recorded in the inference_log table.
Summary
Wire billing metering into the graphile-llm plugin system and LLM REST API endpoints. This PR adds:

- `ModuleConfigCache<T>`: a generic LRU cache added to `graphile-cache` for module config lookups. Replaces 3 ad-hoc caching implementations (2 unbounded `Map`s, 1 raw `LRUCache`) with a single bounded, TTL-based cache class.
- Metering plugin (`metering-plugin.ts`): wraps the embedder function with billing quota checks and usage recording. Pre-checks quota before embedding and records actual char-based usage afterwards.
- Billing config cache (`config-cache.ts`): discovers billing module function names (`record_usage`, `check_billing_quota`) at runtime per database, cached via `ModuleConfigCache`.
- Agent discovery (`agent-discovery-plugin.ts`): discovers agent tables from the `agent_chat_module` config table at runtime, cached via `ModuleConfigCache`.
- LLM REST API (`llm-api.ts`): Express router with entity-scoped (`/v1/orgs/:entity_id/threads`) and global (`/v1/threads`, bills to `actor_id` from JWT) routes. SSE streaming chat with metering, token estimation, and inference logging.
- Environment config (`env.ts`): single source of truth for LLM defaults (provider, model, baseUrl) via `getLlmEnvOptions()`.
- Latency telemetry: every LLM operation (embed, chat, search) records `latency_ms` in console logs.

Cache standardization
All 3 caches in graphile-llm now use `ModuleConfigCache<T>` from `graphile-cache`:

| Cache | Before |
|---|---|
| `agent-discovery` | `Map` + manual expiry |
| `billing-config` | `LRUCache` |
| `inference-log` | `Map` + manual expiry |

Agent message schema
`agent_message` stores only `model` (text) for display. Full usage data (`input_tokens`, `output_tokens`, `total_tokens`, `latency_ms`, `provider`, `status`, etc.) is recorded in `inference_log`, the canonical source of truth for all LLM usage. See companion PR constructive-db#1275.

Review & Testing Checklist for Human
- [ ] `ModuleConfigCache` is correctly bounded: check that `lru-cache` eviction works when max entries are exceeded
- [ ] `null` (quota exceeded signal) when `check_billing_quota` rejects
- [ ] `/v1/threads` route bills to `actor_id` from JWT (no `entity_id` in URL)

Notes
- `generateWithUsage` from `@agentic-kit/ollama` is NOT used (pending team approval). Token counts use `Math.ceil(text.length / 4)` estimation.
- `apiKey`/`api_key_ref` support was removed (tracked in constructive-planning#905)

Link to Devin session: https://app.devin.ai/sessions/2b5a29d83d3f478e8d3d972653b4879c
Requested by: @pyramation