KillToken™ API
Build against the LLM optimization gateway.
KillToken™ sits between your app and your model providers (OpenAI, Anthropic, Gemini, Mistral, DeepSeek, OpenRouter, Together AI, Perplexity, xAI, Azure OpenAI, any self-hosted OpenAI-compatible endpoint, AWS Bedrock, and Google Vertex AI). Send server-side LLM traffic through one gateway to measure prompt waste, track cost, enable safe optimization, reuse repeat-safe responses, and export tenant-level ROI data.
Start in minutes
Mint an API key, call `/v1/chat`, and inspect KillToken™ metrics.
Gateway reference
Request fields, response shape, optimization modes, and wrapper endpoints.
Analytics APIs
Requests, summaries, exports, ROI reports, and dashboard-backed metrics.
Production checks
Health, readiness, body limits, Redis cache, and Mongo persistence notes.
Quickstart
Call KillToken™ from a backend, worker, or secure codespace process. Do not expose tenant API keys in browser or mobile client code.
curl http://localhost:3000/v1/chat \
  -H "Authorization: Bearer kt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4.1",
    "optimizationMode": "measure_only",
    "messages": [
      { "role": "user", "content": "Write a concise project update." }
    ],
    "metadata": { "feature": "weekly-summary" }
  }'
Authentication
Create a tenant API key in the dashboard, then send it as a Bearer token. The full key is shown once and only a hash plus preview are stored.
Authorization: Bearer kt_...
Provider credentials (strict BYOK)
KillToken™ is strict BYOK: every provider call uses the authenticated tenant's own stored, encrypted credential. There is no env/platform fallback — no OpenAI/Anthropic env keys, no AWS environment credentials, and no Google ADC, gcloud, metadata-server, or platform service accounts. If a tenant has no active credential for the requested provider, the call returns 400 provider_credential_required before any cache, idempotency, or provider call. Manage credentials in the dashboard or via POST/GET/PATCH/DELETE /v1/provider-credentials.
- Single-key providers (`openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`) supply an `apiKey`. `azure_openai` and `openai_compatible` also supply an `apiKey` plus non-secret `config` (endpoint/baseUrl + default model/deployment).
- Multi-secret providers (`aws_bedrock`, `google_vertex`) supply a `secrets` bundle instead of an `apiKey`: `aws_bedrock` uses `{ accessKeyId, secretAccessKey, sessionToken? }`; `google_vertex` uses `{ clientEmail, privateKey, privateKeyId? }`. Mixing `apiKey` with `secrets` returns `400 invalid_provider_secrets`.
- `google_vertex` config requires `projectId`, `location`, and `defaultModel` (optional https-origin `endpointUrl`). The stored service-account key signs a short-lived OAuth2 JWT. Secrets are encrypted at rest and never returned, logged, or echoed in responses or errors.
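The credential rules above can be checked client-side before calling `POST /v1/provider-credentials`. The following is a minimal sketch, not the gateway's own validation: the helper name `buildCredentialPayload` and the request-body field names (`provider`, `apiKey`, `secrets`, `config`) are assumptions for illustration.

```javascript
// Provider groups as described in the list above.
const SINGLE_KEY = ["openai", "anthropic", "gemini", "mistral", "deepseek",
  "openrouter", "together", "perplexity", "xai"];
const KEY_PLUS_CONFIG = ["azure_openai", "openai_compatible"];
const MULTI_SECRET = ["aws_bedrock", "google_vertex"];

// Hypothetical helper: builds a credential body and rejects shapes the
// documentation says the gateway refuses. Field names are assumed.
function buildCredentialPayload(provider, { apiKey, secrets, config } = {}) {
  if (apiKey && secrets) {
    // Mirrors the documented 400 invalid_provider_secrets rule.
    throw new Error("invalid_provider_secrets: send apiKey or secrets, not both");
  }
  if (SINGLE_KEY.includes(provider)) {
    if (!apiKey) throw new Error(`${provider} requires apiKey`);
    return { provider, apiKey };
  }
  if (KEY_PLUS_CONFIG.includes(provider)) {
    if (!apiKey || !config) throw new Error(`${provider} requires apiKey and config`);
    return { provider, apiKey, config };
  }
  if (MULTI_SECRET.includes(provider)) {
    if (!secrets) throw new Error(`${provider} requires a secrets bundle`);
    if (provider === "google_vertex") {
      for (const field of ["projectId", "location", "defaultModel"]) {
        if (!config || !config[field]) {
          throw new Error(`google_vertex config requires ${field}`);
        }
      }
    }
    return config ? { provider, secrets, config } : { provider, secrets };
  }
  throw new Error(`provider_not_supported: ${provider}`);
}
```

Run this only on a backend, since the payload carries provider secrets.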
Integrating KillToken™ into your app
KillToken™ is a server-side gateway. Call it from a backend, worker, cron, or codespace — never from a browser or mobile client, because the request carries your tenant API key. Your code never holds provider keys; strict BYOK uses your tenant's stored credential.
- Mint a tenant API key in the dashboard (sent as `Authorization: Bearer kt_...`).
- Add a provider credential in the dashboard or via `POST /v1/provider-credentials`.
- Call `/v1/chat` (or a compatible wrapper) from your backend.
Runnable copies live in the repo's examples/ folder (sdk-chat.mjs, node-fetch-chat.mjs, openai-sdk-compatible.mjs, anthropic-messages-wrapper.mjs); each reads KILLTOKEN_BASE_URL and KILLTOKEN_API_KEY from the environment and contains no provider keys.
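A small loader can fail fast when either environment variable is missing. This is a sketch rather than code from the examples folder: `loadKillTokenConfig` is a hypothetical name, and the `kt_` prefix check assumes that all tenant keys follow the `kt_...` form shown in this page.

```javascript
// Hypothetical helper: reads the two variables the examples rely on and
// rejects obviously wrong values before any request is made.
function loadKillTokenConfig(env = process.env) {
  // Strip trailing slashes so `${baseUrl}/v1/chat` never contains "//".
  const baseUrl = (env.KILLTOKEN_BASE_URL || "").replace(/\/+$/, "");
  const apiKey = env.KILLTOKEN_API_KEY || "";
  if (!baseUrl) throw new Error("KILLTOKEN_BASE_URL is not set");
  // Assumption: tenant keys start with "kt_"; provider keys do not belong here.
  if (!apiKey.startsWith("kt_")) {
    throw new Error("KILLTOKEN_API_KEY must be a KillToken tenant key (kt_...)");
  }
  return { baseUrl, apiKey };
}
```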
Official SDK (recommended)
The first-party @killtoken/sdk package is the recommended backend path — server-side only, strict BYOK-safe, and dependency-light. It throws a KillTokenAPIError (status/code/safe message) on non-2xx and never includes keys, secrets, or headers in errors.
import { KillTokenClient } from "@killtoken/sdk";
// apiKey is your KillToken tenant key (kt_...), NOT a provider key.
const client = new KillTokenClient({ baseUrl: process.env.KILLTOKEN_BASE_URL, apiKey: process.env.KILLTOKEN_API_KEY });
const { response, metrics } = await client.chat({ provider: "openai", model: "gpt-4.1-mini",
messages: [{ role: "user", content: "Hello" }], idempotencyKey: "req-123", cachePolicy: { exactCache: "read_write" } });
// Also: client.providerCredentials.list() / create() / update() / delete() / test()
Plain fetch
const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ provider: "openai", model: "gpt-4.1-mini", optimizationMode: "measure_only",
    messages: [{ role: "user", content: "Hello" }], idempotencyKey: "req-123", cachePolicy: { exactCache: "read_write" } })
});
// Check the status before destructuring; an error body has no response/metrics.
if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
const { response, metrics } = await res.json();
Official OpenAI SDK (baseURL pointed at the wrapper)
import OpenAI from "openai";
// baseURL ends with /v1/openai; apiKey is your KillToken tenant key (NOT an OpenAI key).
const client = new OpenAI({ baseURL: `${process.env.KILLTOKEN_BASE_URL}/v1/openai`, apiKey: process.env.KILLTOKEN_API_KEY });
const completion = await client.chat.completions.create({ model: "gpt-4.1-mini", messages: [{ role: "user", content: "Hello" }] });
const metrics = completion.killtoken?.metrics; // also in the x-killtoken-metrics header
Anthropic Messages wrapper
const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/anthropic/messages`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ model: "claude-3-5-haiku-latest", max_tokens: 256, messages: [{ role: "user", content: "Hello" }] })
});
if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
const data = await res.json();
Strict BYOK error handling
- `provider_credential_required`: add a stored credential for that provider; KillToken never falls back to env/platform keys.
- `invalid_api_key`: the tenant Bearer key is missing/invalid (or a supplied provider key was empty on a credential write).
- `provider_not_supported`: `provider` is not a supported value.
- `streaming_not_supported`: the wrappers reject `stream: true`; send a non-streaming request.
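The remediation steps for these codes can be kept in one lookup so that callers log an actionable message. This is a sketch under the assumption that the caught error exposes a `code` property, as the SDK's `KillTokenAPIError` is described as doing; `explainKillTokenError` is a hypothetical helper name.

```javascript
// Remediation hints taken from the error descriptions in this section.
const BYOK_ERROR_HINTS = new Map([
  ["provider_credential_required", "add a stored credential for that provider; there is no env/platform fallback"],
  ["invalid_api_key", "check the tenant Bearer key (kt_...), or the provider key sent on a credential write"],
  ["provider_not_supported", "use one of the supported provider values"],
  ["streaming_not_supported", "remove stream: true and send a non-streaming request"],
]);

// Hypothetical helper: turns an error with a `code` into a log-friendly hint.
function explainKillTokenError(err) {
  const code = err && err.code;
  const hint = BYOK_ERROR_HINTS.get(code);
  return hint ? `${code}: ${hint}` : `unhandled error (${code ?? "no code"})`;
}
```

A typical use is inside a `catch` block around `client.chat(...)`, passing the caught error straight to the helper.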
idempotencyKey & cachePolicy
`idempotencyKey` is a string you choose per request; repeating it returns the stored result without re-calling or re-billing the provider, so it is safe to retry on timeouts. `cachePolicy.exactCache` (`read_write` / `read_only` / `write_only` / `bypass`) reuses byte-identical prior responses; a hit skips the provider call and shows in `metrics.cacheStatus`. Both require a cache backend (`KILLTOKEN_CACHE_ENABLED=true`).
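The retry guidance above can be sketched as a wrapper that sends the same body, and therefore the same `idempotencyKey`, on every attempt. The function name `chatWithRetry`, the injectable `fetchImpl`, and the choice to retry only on network errors and 5xx responses are assumptions of this sketch, not documented gateway behavior.

```javascript
// Hypothetical helper: retries POST /v1/chat while reusing one idempotencyKey.
async function chatWithRetry(body, { fetchImpl = fetch, baseUrl, apiKey, attempts = 3 } = {}) {
  if (typeof body.idempotencyKey !== "string" || body.idempotencyKey === "") {
    throw new Error("idempotencyKey is required for safe retries");
  }
  // Serialize once so every attempt carries the identical key and payload.
  const payload = JSON.stringify(body);
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/v1/chat`, {
        method: "POST",
        headers: { "content-type": "application/json", authorization: `Bearer ${apiKey}` },
        body: payload,
      });
      if (res.ok) return await res.json();
      const failure = new Error(`KillToken request failed: ${res.status}`);
      // Assumption: 4xx responses need a changed request, so do not retry them.
      failure.retryable = res.status >= 500;
      throw failure;
    } catch (err) {
      if (err.retryable === false) throw err;
      lastError = err; // network error, timeout, or 5xx: try again with the same key
    }
  }
  throw lastError;
}
```

Because the key is fixed before the loop, a retry after a timeout asks the gateway for the stored result instead of triggering a second provider call.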
POST /v1/chat
Primary gateway endpoint. The provider response is returned unchanged alongside KillToken™ metrics.
| Field | Required | Notes |
|---|---|---|
| provider | yes | One of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. Requires an active tenant BYOK credential for that provider (strict BYOK — no env/platform fallback). |
| model | yes | Provider model name. KillToken™ does not route to a different model. |
| messages | yes | Chat messages sent through the gateway. |
| optimizationMode | no | Use `measure_only` or `safe` for the MVP. Defaults to `measure_only`. |
| metadata | no | Tenant-owned trace/search context. |
| providerOptions | no | Provider-specific options, forwarded when supported. |
| cachePolicy | no | Exact-cache behavior. Defaults to bypass. |
| idempotencyKey | no | Replay-safe key for retries. Max 255 characters. |
export async function callKillToken(messages) {
  const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
    method: "POST",
    headers: {
      "authorization": `Bearer ${process.env.KILLTOKEN_API_KEY}`,
      "content-type": "application/json"
    },
    body: JSON.stringify({
      provider: "openai",
      model: "gpt-4.1",
      optimizationMode: "measure_only",
      messages
    })
  });
  if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
  return res.json();
}
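The field rules in the table above can also be checked before a request leaves your backend. The sketch below is a client-side pre-check only; `validateChatRequest` is a hypothetical helper, the gateway remains the source of truth, and the returned strings are illustrative rather than the gateway's exact error payloads.

```javascript
const PROVIDERS = ["openai", "anthropic", "gemini", "mistral", "deepseek", "openrouter",
  "together", "perplexity", "xai", "azure_openai", "openai_compatible",
  "aws_bedrock", "google_vertex"];
const OPTIMIZATION_MODES = ["measure_only", "safe"];
const EXACT_CACHE_MODES = ["read_write", "read_only", "write_only", "bypass"];

// Hypothetical helper: returns a list of problems; an empty list means the
// request satisfies the documented field rules.
function validateChatRequest(req) {
  const errors = [];
  if (!PROVIDERS.includes(req.provider)) errors.push("provider_not_supported");
  if (typeof req.model !== "string" || req.model === "") errors.push("model is required");
  if (!Array.isArray(req.messages) || req.messages.length === 0) errors.push("invalid_messages");
  if (req.optimizationMode !== undefined && !OPTIMIZATION_MODES.includes(req.optimizationMode)) {
    errors.push("optimizationMode must be measure_only or safe");
  }
  if (req.idempotencyKey !== undefined &&
      (typeof req.idempotencyKey !== "string" || req.idempotencyKey.length > 255)) {
    errors.push("idempotencyKey must be a string of at most 255 characters");
  }
  const exactCache = req.cachePolicy?.exactCache;
  if (exactCache !== undefined && !EXACT_CACHE_MODES.includes(exactCache)) {
    errors.push("invalid_cache_policy");
  }
  return errors;
}
```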
Cache & idempotency
Caching is server-enabled, then request-opt-in. Use exact cache only when the same request should return the same answer.
- `bypass`: Default. Do not read or write exact cache.
- `read_only`: Return a hit if present; do not write misses.
- `write_only`: Skip lookup; write the provider result.
- `read_write`: Read first and write on miss.
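The four modes reduce to two switches: whether to look up a stored response and whether to store a new one. The sketch below models that behavior with an in-memory `Map`. It illustrates the semantics only and is not the gateway's implementation; `exactCacheBehavior` and `callWithExactCache` are hypothetical names.

```javascript
// Maps each documented mode to its read/write behavior.
function exactCacheBehavior(mode = "bypass") {
  switch (mode) {
    case "bypass": return { read: false, write: false };
    case "read_only": return { read: true, write: false };
    case "write_only": return { read: false, write: true };
    case "read_write": return { read: true, write: true };
    default: throw new Error(`invalid_cache_policy: ${mode}`);
  }
}

// Illustrative model: `cache` is a Map, `key` identifies an identical request,
// and `callProvider` stands in for the real provider call.
async function callWithExactCache(cache, key, mode, callProvider) {
  const { read, write } = exactCacheBehavior(mode);
  if (read && cache.has(key)) {
    return { result: cache.get(key), cacheHit: true }; // provider call skipped
  }
  const result = await callProvider();
  if (write) cache.set(key, result);
  return { result, cacheHit: false };
}
```

With an empty cache, a `write_only` call followed by a `read_only` call for the same key produces one provider call and one hit.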
OpenAI-compatible wrapper
Point OpenAI-style chat-completions clients at KillToken™. Metrics are returned under `killtoken.metrics` and in response headers.
Streaming is not implemented in the MVP. `stream: true` returns `422 streaming_not_supported`.
Anthropic Messages wrapper
Anthropic-style requests use top-level `system` plus `messages`. Unsupported OpenAI-style tool payloads are rejected before provider execution.
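When an app already holds OpenAI-style messages, the `system` entries have to move to the top level before the body is sent to the Anthropic wrapper. The following is a minimal sketch that assumes string message content; `toAnthropicMessagesBody` is a hypothetical helper, and the `max_tokens` default of 256 simply mirrors the example request on this page.

```javascript
// Hypothetical helper: converts OpenAI-style chat messages into the
// top-level `system` plus `messages` shape used by Anthropic-style requests.
function toAnthropicMessagesBody({ model, messages, max_tokens = 256, stream = false }) {
  if (stream) {
    // The wrappers reject streaming, so fail locally before any network call.
    throw new Error("streaming_not_supported: send a non-streaming request");
  }
  const systemParts = [];
  const rest = [];
  for (const message of messages) {
    if (message.role === "system") systemParts.push(message.content);
    else rest.push({ role: message.role, content: message.content });
  }
  const body = { model, max_tokens, messages: rest };
  // Only set `system` when there is something to send.
  if (systemParts.length > 0) body.system = systemParts.join("\n\n");
  return body;
}
```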
Metrics, exports, and reports
Read APIs are tenant-scoped and privacy-safe by default. Request lists and exports omit raw prompt content.
- Paginated request trace list with provider/model/mode/cache filters.
- Single tenant-owned request trace.
- Aggregate totals, savings, cache hit rate, and top templates.
- CSV request export with fixed privacy-safe columns.
- JSON analytics export with filters and timestamp.
- Structured ROI report for estimated, verified, potential, and cache savings.
Operations
GET /health
Lightweight liveness check. No auth.
GET /ready
Readiness checks for persistence, cache, and dashboard auth. No secrets returned.
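A deploy script or worker can poll both endpoints before sending traffic. The sketch below assumes that a 2xx status from `/health` and `/ready` means the check passed, which this page does not state explicitly; `waitUntilReady` and its options are hypothetical names.

```javascript
// Hypothetical helper: resolves true once /health and /ready both return 2xx,
// or false after the given number of attempts.
async function waitUntilReady(baseUrl, { fetchImpl = fetch, attempts = 10, delayMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const health = await fetchImpl(`${baseUrl}/health`);
      // Only ask for readiness once the process is at least alive.
      const ready = health.ok ? await fetchImpl(`${baseUrl}/ready`) : null;
      if (ready && ready.ok) return true;
    } catch {
      // Gateway not reachable yet; fall through to the next attempt.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Neither endpoint is described as requiring auth, so no tenant key is sent here.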
Self-hosting on Render? The repo's docs/render-deploy.md covers the service blueprint, MongoDB Atlas, Upstash Redis, domain mapping, and rollback.
Common errors
| Status | Error | Meaning |
|---|---|---|
| 400 | invalid_messages | Messages are missing or malformed. |
| 400 | invalid_cache_policy | Cache policy is malformed. |
| 401 | invalid_api_key | Missing, unknown, or revoked Bearer key. |
| 422 | provider_not_supported | Provider is not one of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. |
| 422 | streaming_not_supported | Streaming proxy support is not in the MVP. |