feat(graphile-llm): wire billing metering, inference logging, and global routes by pyramation · Pull Request #1192 · constructive-io/constructive

pyramation · 2026-05-18T20:44:07Z

Summary

Wire billing metering into the graphile-llm plugin system and LLM REST API endpoints. This PR adds:

ModuleConfigCache<T> — generic LRU cache added to graphile-cache for module config lookups. Replaces 3 ad-hoc caching implementations (2 unbounded Maps, 1 raw LRUCache) with a single bounded, TTL-based cache class.
Metering plugin (metering-plugin.ts) — wraps the embedder function with billing quota checks and usage recording. Pre-checks quota before embedding, records actual char-based usage after.
Billing config cache (config-cache.ts) — discovers billing module function names (record_usage, check_billing_quota) at runtime per database, cached via ModuleConfigCache.
Agent discovery (agent-discovery-plugin.ts) — discovers agent tables from agent_chat_module config table at runtime, cached via ModuleConfigCache.
LLM REST API (llm-api.ts) — Express router with entity-scoped (/v1/orgs/:entity_id/threads) and global (/v1/threads, bills to actor_id from JWT) routes. SSE streaming chat with metering, token estimation, and inference logging.
Environment config (env.ts) — single source of truth for LLM defaults (provider, model, baseUrl) via getLlmEnvOptions().
Latency telemetry — every LLM operation (embed, chat, search) records latency_ms in console logs.

Cache standardization

All 3 caches in graphile-llm now use ModuleConfigCache<T> from graphile-cache:

| Cache | TTL | Max entries | Previous implementation |
| --- | --- | --- | --- |
| `agent-discovery` | 60s | 100 | Unbounded `Map` + manual expiry |
| `billing-config` | 5min | 50 | Raw `LRUCache` |
| `inference-log` | 60s | 100 | Unbounded `Map` + manual expiry |
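
To make the "bounded, TTL-based" behaviour in the table concrete, here is a minimal sketch of such a cache. It is not the `ModuleConfigCache<T>` source: the class name `BoundedTtlCache`, its options, and the injectable clock are assumptions for illustration, and it uses a plain `Map` for recency ordering rather than wrapping `lru-cache` as the PR appears to do.

```typescript
// Hypothetical stand-in for ModuleConfigCache<T>; not the graphile-cache API.
interface BoundedTtlCacheOptions {
  max: number;        // maximum number of entries kept
  ttlMs: number;      // time-to-live per entry, in milliseconds
  now?: () => number; // injectable clock (assumption, handy for tests)
}

class BoundedTtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private max: number;
  private ttlMs: number;
  private now: () => number;

  constructor(opts: BoundedTtlCacheOptions) {
    this.max = opts.max;
    this.ttlMs = opts.ttlMs;
    this.now = opts.now ?? (() => Date.now());
  }

  get(key: string): T | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entry
      return undefined;
    }
    // Re-insert so the key becomes the most recently used
    // (Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: T): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.max) {
      // Evict the least recently used entry: the first key in iteration order.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Settings mirroring the agent-discovery row of the table: 60s TTL, 100 entries.
const agentDiscoveryCache = new BoundedTtlCache<string>({ max: 100, ttlMs: 60_000 });
agentDiscoveryCache.set("db-1", "agent_thread");
```

The first checklist item below (eviction when max entries is exceeded) corresponds to the `size > max` branch in `set`.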

Agent message schema

agent_message stores only model (text) for display. Full usage data (input_tokens, output_tokens, total_tokens, latency_ms, provider, status, etc.) is recorded in inference_log — the canonical source of truth for all LLM usage. See companion PR constructive-db#1275.

Review & Testing Checklist for Human

Verify ModuleConfigCache is correctly bounded — check that lru-cache eviction works when max entries exceeded
Verify metering plugin correctly returns null (quota exceeded signal) when check_billing_quota rejects
Test the global /v1/threads route bills to actor_id from JWT (no entity_id in URL)
Verify SSE streaming works end-to-end with a real Ollama instance

Notes

generateWithUsage from @agentic-kit/ollama is NOT used — pending team approval. Token counts use Math.ceil(text.length / 4) estimation.
apiKey / api_key_ref support was removed (tracked in constructive-planning#905)
Cache standardization tracked in constructive-planning#906
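
The token estimation mentioned in the first note can be sketched as follows. `Math.ceil(text.length / 4)` comes from the note itself; the helper names and the `EstimatedUsage` shape are hypothetical, with field names borrowed from the inference_log columns listed in this PR.

```typescript
// Rough token estimate from text length (~4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical usage shape; field names follow the inference_log columns.
interface EstimatedUsage {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}

function estimateUsage(prompt: string, completion: string): EstimatedUsage {
  const input_tokens = estimateTokens(prompt);
  const output_tokens = estimateTokens(completion);
  return { input_tokens, output_tokens, total_tokens: input_tokens + output_tokens };
}
```

Because this counts characters rather than tokens, the result is only an approximation and will drift for code and non-English text.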

Link to Devin session: https://app.devin.ai/sessions/2b5a29d83d3f478e8d3d972653b4879c
Requested by: @pyramation

devin-ai-integration · 2026-05-18T20:44:10Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

Add per-database billing integration to the graphile-llm package:

- config-cache.ts: LRU cache (5-min TTL, 50 entries) for billing_module metadata and API key resolution from app_secrets per database_id
- metering.ts: billing-aware wrappers (meteredEmbed, meteredChat) that call check_billing_quota() before and record_usage() after LLM calls
- LlmModulePlugin: exposes metering options on the build context
- LlmTextSearchPlugin: metered embedding with graceful degradation; when quota is exceeded, skips the vector path (text-only search continues)
- LlmTextMutationPlugin: metered embedding that throws QuotaExceededError on mutations (can't silently skip writing a vector the user asked for)
- MeteringConfig on GraphileLlmOptions: configurable meter slugs, estimated tokens, and skip toggle. Auto-detects billing_module.
- Uses Graphile withPgClient pattern for all billing SQL calls

Billing functions (check_billing_quota, record_usage) are resolved from the tenant database's billing_module metaschema. When billing is not provisioned, all calls pass through unmetered.
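
The difference between the search and mutation paths when quota runs out can be sketched like this. It assumes, per the review checklist, that a metered embedder signals "quota exceeded" by resolving to `null`. The `QuotaExceededError` definition, `buildSearchPlan`, and `embedForMutation` are illustrative stand-ins, not the plugin code.

```typescript
// Hypothetical error type; the PR names QuotaExceededError but does not show it.
class QuotaExceededError extends Error {
  constructor(message = "LLM quota exceeded") {
    super(message);
    this.name = "QuotaExceededError";
  }
}

// Assumption: a metered embedder resolves to null when the quota check rejects.
type MeteredEmbedder = (text: string) => Promise<number[] | null>;

// Search path: degrade to text-only search when no vector is available.
async function buildSearchPlan(embed: MeteredEmbedder, query: string) {
  const vector = await embed(query);
  return vector === null
    ? { mode: "text-only" as const, query }
    : { mode: "hybrid" as const, query, vector };
}

// Mutation path: the caller asked for a stored vector, so fail loudly
// instead of silently writing a row without one.
async function embedForMutation(embed: MeteredEmbedder, text: string): Promise<number[]> {
  const vector = await embed(text);
  if (vector === null) throw new QuotaExceededError();
  return vector;
}
```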

…Plugin

- Extract all billing/metering logic into metering-plugin.ts
- LlmModulePlugin, TextSearchPlugin, TextMutationPlugin are now pure (no billing imports, no metering context building)
- LlmMeteringPlugin uses AsyncLocalStorage to transparently wrap the embedder with quota checks; downstream plugins are unaware of billing
- Entity ID resolved via configurable callback (default: jwt.claims.user_id) instead of non-existent jwt.claims.membership_id
- Metering is opt-in: only loaded when metering option is truthy
- Add schema-existence guard in config-cache (checks metaschema_modules_public exists before querying billing_module table)
- Graceful degradation: missing schema, missing entity_id, or failed billing calls all result in unmetered passthrough
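
As a rough illustration of the AsyncLocalStorage approach described in this commit, the sketch below wraps an embedder so that it picks up a per-request entity ID without any change to its signature. `AsyncLocalStorage` is the real Node.js API. `MeteringContext`, `wrapEmbedder`, `runWithMetering`, and the shape of the claims object are assumptions, and the quota check itself is omitted to keep the focus on context propagation.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical per-request metering context; field names are illustrative.
interface MeteringContext {
  entityId: string | null;
}

type Embedder = (text: string) => Promise<number[]>;
type EntityResolver = (claims: Record<string, unknown>) => string | null;

const meteringStore = new AsyncLocalStorage<MeteringContext>();

// Wrap an embedder so it consults the ambient context. Downstream callers
// keep calling embed(text) and never see billing concerns.
function wrapEmbedder(
  inner: Embedder,
  onMetered: (entityId: string, chars: number) => void
): Embedder {
  return async (text) => {
    const ctx = meteringStore.getStore();
    const vector = await inner(text);
    // No context or no entity ID: unmetered passthrough.
    if (ctx?.entityId) onMetered(ctx.entityId, text.length);
    return vector;
  };
}

// Default resolver mirrors the commit message: use the user_id claim.
const defaultResolver: EntityResolver = (claims) => {
  const id = claims["user_id"];
  return typeof id === "string" ? id : null;
};

// Run a request handler with the metering context established.
function runWithMetering<T>(
  claims: Record<string, unknown>,
  fn: () => Promise<T>,
  resolve: EntityResolver = defaultResolver
): Promise<T> {
  return meteringStore.run({ entityId: resolve(claims) }, fn);
}
```

The design point is that the search and mutation plugins only ever receive an `Embedder`, so metering can be switched on or off without touching them.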

- LlmModulePlugin now exposes llmEmbeddingModel and llmChatModel on build
- LlmMeteringPlugin reads model names from build and uses them as default meter slugs (e.g. 'text-embedding-3-small' → billing meters table)
- Three-level waterfall: per-model → inference pool → universal credits (handled by billing module's category_meter field)
- Remove hardcoded 'embedding_tokens'/'chat_tokens' defaults
- Add docs/spec/llm-metering.md: full architecture reference for two-tier billing, model=meter slug convention, and waterfall

…user_module

The table was renamed from metaschema_modules_public.encrypted_secrets_module to metaschema_modules_public.config_secrets_user_module. Also updated the JOIN column from private_schema_id to schema_id to match the new schema.

…t length

Removed the configurable estimatedEmbeddingTokens option. Token counts are now estimated directly from the input text length (~4 chars/token). No tokenizer is needed since the billing system uses tokens as abstract units and the credit_cost per model normalizes relative expense.

…point

- Add LlmAgentDiscoveryPlugin: discovers @agentThread/@agentMessage/@agentTask tables at PostGraphile schema build time via smart tags and caches results for REST middleware consumption
- Add LLM API router (llm-api.ts): Express router providing:
  - POST /orgs/:entity_id/threads: create agent thread (RLS enforced)
  - POST /orgs/:entity_id/threads/:thread_id/messages: send message + SSE stream assistant response via OllamaClient, persist both messages
- Wire into server.ts after authenticate middleware, before graphile
- Export plugin, getAgentDiscovery, types from graphile-llm
- Add enableAgentDiscovery option to GraphileLlmPreset (default: true)
- Add graphile-llm and @agentic-kit/ollama deps to graphql-server

Restores:

- Embedder resolution priority comments in llm-module-plugin.ts
- v4-style resolver wrapping explanation in text-mutation-plugin.ts
- Section markers (// Only intercept, // Find vector columns, etc.)
- graphile-postgis pattern reference in text-search-plugin.ts
- Array handling comment (AND/OR) in embedTextInWhere


The ~4 chars/token heuristic was unreliable for non-English text and code.

- Pre-check now passes amount=1 (just checks 'has any quota remaining?')
- Post-call records actual char count as the metered amount
- Token-accurate billing will come when providers return usage metadata
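
The pre-check/post-record flow from this commit might look roughly like the sketch below. The `Billing` interface is hypothetical: the real calls go through the SQL functions check_billing_quota and record_usage, whose signatures are not shown in this PR. The unmetered passthrough on billing failures and the `null` return on quota rejection follow the descriptions elsewhere in the PR.

```typescript
// Hypothetical billing interface standing in for the SQL functions
// check_billing_quota and record_usage.
interface Billing {
  checkQuota(entityId: string, meter: string, amount: number): Promise<boolean>;
  recordUsage(entityId: string, meter: string, amount: number): Promise<void>;
}

type Embedder = (text: string) => Promise<number[]>;

async function meteredEmbed(
  embed: Embedder,
  billing: Billing,
  entityId: string,
  meter: string,
  text: string
): Promise<number[] | null> {
  // Pre-check with amount=1: "is any quota left?", not an estimate of this call.
  let allowed = true;
  try {
    allowed = await billing.checkQuota(entityId, meter, 1);
  } catch {
    allowed = true; // failed billing call: unmetered passthrough
  }
  if (!allowed) return null; // quota exceeded signal

  const vector = await embed(text);

  // Post-call: record the actual character count as the metered amount.
  try {
    await billing.recordUsage(entityId, meter, text.length);
  } catch {
    // A recording failure must not fail an embedding that already succeeded.
  }
  return vector;
}
```

One consequence of checking only for "any quota remaining" is that a single large call can overshoot the remaining balance; the post-call record then reflects the real amount.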

Add env.ts as the single source of truth for all LLM environment variables and defaults (provider, model, baseUrl for both embedding and chat). Remove all scattered process.env reads and null coalescing from embedder.ts, chat.ts, llm-module-plugin.ts, and llm-api.ts.

- Remove @constructive-io/graphql-env dependency from graphile-llm
- Defaults: ollama provider, nomic-embed-text / llama3 models, http://localhost:11434 base URL
- Export getLlmEnvOptions from package index
- Update tests for new default behavior (env always resolves)
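
A single-source-of-truth resolver along these lines could look like the sketch below. Only the default values (ollama, nomic-embed-text, llama3, http://localhost:11434) come from the commit message. The function name `readLlmEnv`, the option shape, and every environment variable name are hypothetical and are not the names used by `getLlmEnvOptions()`.

```typescript
interface LlmEndpointOptions {
  provider: string;
  model: string;
  baseUrl: string;
}

interface LlmEnvOptions {
  embedding: LlmEndpointOptions;
  chat: LlmEndpointOptions;
}

type Env = Record<string, string | undefined>;

// Variable names below are hypothetical; only the default values come from the PR.
function readLlmEnv(env: Env): LlmEnvOptions {
  // `||` rather than `??` so that empty strings also fall back to the default.
  const provider = env["LLM_PROVIDER"] || "ollama";
  const baseUrl = env["LLM_BASE_URL"] || "http://localhost:11434";
  return {
    embedding: { provider, baseUrl, model: env["LLM_EMBEDDING_MODEL"] || "nomic-embed-text" },
    chat: { provider, baseUrl, model: env["LLM_CHAT_MODEL"] || "llama3" },
  };
}
```

Called as `readLlmEnv(process.env)`, this always resolves to a complete configuration, which matches the "env always resolves" behaviour the updated tests expect.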

Agent table discovery now queries metaschema_modules_public.agent_chat_module at runtime instead of scanning for @agentThread/@agentMessage/@agentTask smart tags at PostGraphile schema build time.

Changes:

- agent-discovery-plugin: replaced GraphileConfig.Plugin build hook with async getAgentDiscovery(pool, dbname) that queries the module config table with a 60s TTL cache
- llm-api: router no longer takes getDiscovery param; calls getAgentDiscovery directly with the per-request pool
- server.ts: simplified createLlmApiRouter() call (no args)
- preset.ts: removed LlmAgentDiscoveryPlugin from plugin list
- types.ts: removed enableAgentDiscovery option

API key management will be handled as a separate, intentional feature.

Removed:

- apiKey from EmbedderConfig and ChatConfig types
- api_key_ref from LlmModuleData interface
- resolveApiKey() and SECRETS_MODULE_SCHEMA_SQL from config-cache
- apiKey field from LlmBillingCacheEntry

Major changes:

- Add global routes: POST /v1/threads, POST /v1/threads/:thread_id/messages. These bill to actor_id from JWT (no entity_id in URL needed)
- Wire billing metering: check_billing_quota pre-check before LLM call, record_usage with actual token counts after response
- Store token_usage jsonb on assistant messages (input, output, totalTokens, model, latency_ms)
- Use callWithUsage() wrapper: uses generateWithUsage() when available (@agentic-kit/ollama >=1.3.0), falls back to generate() + token estimation
- Refactor: extract shared handleCreateThread/handleSendMessage handlers used by both entity-scoped and global routes
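
One way the shared handlers could decide whom to bill is sketched below: entity-scoped routes bill the `entity_id` from the URL, while global routes fall back to `actor_id` from the JWT. The request shape (`params`, `jwt`) and the function name are assumptions for illustration; the PR does not show where the router reads its claims from.

```typescript
// Hypothetical request shape: only the parts the shared handlers would need.
interface LlmRequestLike {
  params: { entity_id?: string; thread_id?: string };
  jwt?: { actor_id?: string };
}

type BillingTarget =
  | { ok: true; entityId: string; actorId: string; scope: "entity" | "global" }
  | { ok: false; status: 401; error: string };

function resolveBillingTarget(req: LlmRequestLike): BillingTarget {
  const actorId = req.jwt?.actor_id;
  // Both route families reject requests without a JWT.
  if (!actorId) return { ok: false, status: 401, error: "missing JWT" };

  const entityId = req.params.entity_id;
  return entityId
    ? { ok: true, entityId, actorId, scope: "entity" } // /v1/orgs/:entity_id/...
    : { ok: true, entityId: actorId, actorId, scope: "global" }; // /v1/threads...
}
```

Resolving the target once, before any handler logic runs, is what would let a single `handleCreateThread`/`handleSendMessage` pair serve both route families.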

Discovers inference_log_module config at runtime (60s TTL cache), and INSERTs into the inference log table after each LLM call with: entity_id, actor_id, model, provider, request_type, input/output/total tokens, latency_ms, and status. Both streaming and non-streaming paths log (fire-and-forget). Gracefully skips if inference_log_module is not provisioned.
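
The fire-and-forget logging described here can be sketched as follows. The row fields are the ones listed in the commit message; `InsertFn`, `logInference`, and `timed` are hypothetical helpers, and passing `null` for the insert function stands in for "inference_log_module is not provisioned".

```typescript
// Fields as listed in the commit message; exact types are assumptions.
interface InferenceLogRow {
  entity_id: string;
  actor_id: string;
  model: string;
  provider: string;
  request_type: string;
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  latency_ms: number;
  status: string;
}

type InsertFn = (row: InferenceLogRow) => Promise<void>;

// Fire-and-forget: never awaited by the caller, and failures never propagate.
function logInference(
  insert: InsertFn | null,
  row: InferenceLogRow,
  onError: (err: unknown) => void = () => {}
): void {
  if (!insert) return; // module not provisioned: skip quietly
  // Promise.resolve().then(...) also captures synchronous throws from insert.
  void Promise.resolve()
    .then(() => insert(row))
    .catch(onError);
}

// Measure latency around any LLM call.
async function timed<T>(fn: () => Promise<T>): Promise<{ result: T; latency_ms: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, latency_ms: Date.now() - start };
}
```

The trade-off of this pattern is that a lost log row goes unnoticed by the request, so the `onError` hook is where a warning would be emitted.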

Tests:

- Route registration: verifies all 4 routes (entity-scoped + global)
- Auth: both global and entity-scoped routes reject missing JWT
- Token usage: non-streaming response includes usage object (prompt_tokens, completion_tokens, total_tokens)

…ime check

Per team discussion: use existing generate() API with token estimation from text length (~4 chars/token). The callWithUsage wrapper remains as a clean abstraction point for when a provider-native token counting API is approved.


…lugin caches

- Add ModuleConfigCache<T> to graphile-cache: generic LRU cache with TTL, bounded memory, and named logging for module config lookups
- Migrate agent-discovery-plugin.ts from unbounded Map to ModuleConfigCache
- Migrate config-cache.ts (billing) from raw LRUCache to ModuleConfigCache
- Migrate llm-api.ts inference log cache from unbounded Map to ModuleConfigCache
- Replace direct lru-cache dependency in graphile-llm with graphile-cache

socket-security · 2026-05-21T00:27:52Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Package
@types/babel__generator@7.27.0
@testing-library/react@11.2.5
@tanstack/react-query@5.90.21
@testing-library/jest-dom@5.11.10
View full report


devin-ai-integration Bot assigned pyramation May 18, 2026

pyramation mentioned this pull request May 19, 2026

feat(graphile-llm): add inference usage logging to metering plugin #1196

Merged

3 tasks

devin-ai-integration Bot force-pushed the feat/llm-billing-metering branch from 3c6b433 to 3a257e3 Compare May 20, 2026 03:09

devin-ai-integration Bot changed the title ~~feat(graphile-llm): wire billing metering into LLM plugins~~ feat(graphile-llm): billing metering + agent discovery + REST streaming endpoint May 20, 2026

devin-ai-integration Bot changed the title ~~feat(graphile-llm): billing metering + agent discovery + REST streaming endpoint~~ feat(graphile-llm): billing metering + agent discovery + REST streaming endpoint + env config May 20, 2026

pyramation added 12 commits May 20, 2026 22:29

remove docs/spec/llm-metering.md (moving to constructive-db)

19bec10

fix: restore remaining inline comments (inject/remove, recurse, repla…

941d909

…ce, where/filter)

devin-ai-integration Bot force-pushed the feat/llm-billing-metering branch from b5d6099 to 9c0265c Compare May 20, 2026 22:31

pyramation added 4 commits May 20, 2026 22:44

devin-ai-integration Bot changed the title ~~feat(graphile-llm): billing metering + agent discovery + REST streaming endpoint + env config~~ feat(graphile-llm): wire billing metering, inference logging, and global routes May 20, 2026

pyramation added 4 commits May 20, 2026 23:46

fix(graphile-llm): restore embed latency timing in text-mutation-plugin

6dd823b

fix(graphile-llm): add latency timing to search embed and unmetered e…

e90630b

…mbed paths

refactor(llm-api): replace token_usage jsonb with model text on messa…

e17e1c2

…ge INSERT

inference_log is the source of truth for usage data. Agent messages now store only the model name for display; full token counts, latency, and provider info are recorded in the inference_log table.

pyramation merged commit 3349822 into main May 21, 2026
37 checks passed

pyramation deleted the feat/llm-billing-metering branch May 21, 2026 01:54

pyramation merged 21 commits into main from feat/llm-billing-metering
