KillToken™ API

Build against the LLM optimization gateway.

KillToken™ sits between your app and your model providers (OpenAI, Anthropic, Gemini, Mistral, DeepSeek, OpenRouter, Together AI, Perplexity, xAI, Azure OpenAI, any self-hosted OpenAI-compatible endpoint, AWS Bedrock, and Google Vertex AI). Send server-side LLM traffic through one gateway to measure prompt waste, track cost, enable safe optimization, reuse repeat-safe responses, and export tenant-level ROI data.

Quickstart

Call KillToken™ from a backend, worker, or secure codespace process. Do not expose tenant API keys in browser or mobile client code.

curl
curl http://localhost:3000/v1/chat \
  -H "Authorization: Bearer kt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4.1",
    "optimizationMode": "measure_only",
    "messages": [
      { "role": "user", "content": "Write a concise project update." }
    ],
    "metadata": { "feature": "weekly-summary" }
  }'

Authentication

Create a tenant API key in the dashboard, then send it as a Bearer token. The full key is shown once and only a hash plus preview are stored.

Authorization: Bearer kt_...

Provider credentials (strict BYOK)

KillToken™ is strict BYOK: every provider call uses the authenticated tenant's own stored, encrypted credential. There is no env/platform fallback — no OpenAI/Anthropic env keys, no AWS environment credentials, and no Google ADC, gcloud, metadata-server, or platform service accounts. If a tenant has no active credential for the requested provider, the call returns 400 provider_credential_required before any cache, idempotency, or provider call. Manage credentials in the dashboard or via POST/GET/PATCH/DELETE /v1/provider-credentials.

  • Single-key providers (openai, anthropic, gemini, mistral, deepseek, openrouter, together, perplexity, xai) supply an apiKey. azure_openai and openai_compatible also supply an apiKey plus non-secret config (endpoint/baseUrl + default model/deployment).
  • Multi-secret providers (aws_bedrock, google_vertex) supply a secrets bundle instead of an apiKey: aws_bedrock uses { accessKeyId, secretAccessKey, sessionToken? }; google_vertex uses { clientEmail, privateKey, privateKeyId? }. Mixing apiKey with secrets returns 400 invalid_provider_secrets.
  • google_vertex config requires projectId, location, and defaultModel (optional https-origin endpointUrl). The stored service-account key signs a short-lived OAuth2 JWT — secrets are encrypted at rest and never returned, logged, or echoed in responses or errors.
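The credential rules above can be checked locally before a write, so obvious mistakes fail in your own code rather than as a 400 from the gateway. The following is a minimal sketch, not the gateway's validation logic: the CredentialInput shape and the checkCredentialInput helper are hypothetical, and the exact request body accepted by POST /v1/provider-credentials is assumed from the field names in the prose.

```typescript
// Hypothetical client-side shape; field names (provider, apiKey, secrets, config)
// come from the prose above, the exact wire format is an assumption.
type MultiSecretProvider = "aws_bedrock" | "google_vertex";

interface CredentialInput {
  provider: string;
  apiKey?: string;
  secrets?: Record<string, string>;
  config?: Record<string, string>;
}

// Required members of each secrets bundle (sessionToken and privateKeyId are optional).
const REQUIRED_SECRETS: Record<MultiSecretProvider, string[]> = {
  aws_bedrock: ["accessKeyId", "secretAccessKey"],
  google_vertex: ["clientEmail", "privateKey"],
};

// google_vertex config fields documented as required (endpointUrl is optional).
const REQUIRED_VERTEX_CONFIG = ["projectId", "location", "defaultModel"];

function isMultiSecret(provider: string): provider is MultiSecretProvider {
  return provider === "aws_bedrock" || provider === "google_vertex";
}

// Returns a list of human-readable problems; an empty list means the input
// passes these local checks (the gateway may still reject it).
function checkCredentialInput(input: CredentialInput): string[] {
  const problems: string[] = [];
  if (input.apiKey !== undefined && input.secrets !== undefined) {
    problems.push("apiKey and secrets are mixed (documented as 400 invalid_provider_secrets)");
  }
  if (isMultiSecret(input.provider)) {
    for (const name of REQUIRED_SECRETS[input.provider]) {
      if (!input.secrets?.[name]) problems.push(`missing secret: ${name}`);
    }
  } else if (!input.apiKey) {
    problems.push("missing or empty apiKey");
  }
  if (input.provider === "google_vertex") {
    for (const name of REQUIRED_VERTEX_CONFIG) {
      if (!input.config?.[name]) problems.push(`missing config: ${name}`);
    }
  }
  return problems;
}
```

Run the check in the backend that manages credentials, and never log the input object, since it carries the secrets themselves.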

Integrating KillToken™ into your app

KillToken™ is a server-side gateway. Call it from a backend, worker, cron, or codespace — never from a browser or mobile client, because the request carries your tenant API key. Your code never holds provider keys; strict BYOK uses your tenant's stored credential.

  1. Mint a tenant API key in the dashboard (sent as Authorization: Bearer kt_...).
  2. Add a provider credential in the dashboard or via POST /v1/provider-credentials.
  3. Call /v1/chat (or a compatible wrapper) from your backend.

Runnable copies live in the repo's examples/ folder (sdk-chat.mjs, node-fetch-chat.mjs, openai-sdk-compatible.mjs, anthropic-messages-wrapper.mjs); each reads KILLTOKEN_BASE_URL and KILLTOKEN_API_KEY from the environment and contains no provider keys.

Official SDK (recommended)

The first-party @killtoken/sdk package is the recommended backend path — server-side only, strict BYOK-safe, and dependency-light. It throws a KillTokenAPIError (status/code/safe message) on non-2xx and never includes keys, secrets, or headers in errors.

import { KillTokenClient } from "@killtoken/sdk";
// apiKey is your KillToken tenant key (kt_...), NOT a provider key.
const client = new KillTokenClient({ baseUrl: process.env.KILLTOKEN_BASE_URL, apiKey: process.env.KILLTOKEN_API_KEY });
const { response, metrics } = await client.chat({
  provider: "openai",
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello" }],
  idempotencyKey: "req-123",
  cachePolicy: { exactCache: "read_write" }
});
// Also: client.providerCredentials.list() / create() / update() / delete() / test()

Plain fetch

const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
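  body: JSON.stringify({
    provider: "openai",
    model: "gpt-4.1-mini",
    optimizationMode: "measure_only",
    messages: [{ role: "user", content: "Hello" }],
    idempotencyKey: "req-123",
    cachePolicy: { exactCache: "read_write" }
  })
});
// Fail loudly on non-2xx instead of destructuring an error body.
if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
const { response, metrics } = await res.json();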

Official OpenAI SDK (baseURL pointed at the wrapper)

import OpenAI from "openai";
// baseURL ends with /v1/openai; apiKey is your KillToken tenant key (NOT an OpenAI key).
const client = new OpenAI({ baseURL: `${process.env.KILLTOKEN_BASE_URL}/v1/openai`, apiKey: process.env.KILLTOKEN_API_KEY });
const completion = await client.chat.completions.create({ model: "gpt-4.1-mini", messages: [{ role: "user", content: "Hello" }] });
const metrics = completion.killtoken?.metrics; // also in the x-killtoken-metrics header

Anthropic Messages wrapper

await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/anthropic/messages`, {
  method: "POST",
  headers: { "content-type": "application/json", authorization: `Bearer ${process.env.KILLTOKEN_API_KEY}` },
  body: JSON.stringify({ model: "claude-3-5-haiku-latest", max_tokens: 256, messages: [{ role: "user", content: "Hello" }] })
});

Strict BYOK error handling

  • provider_credential_required — add a stored credential for that provider; KillToken never falls back to env/platform keys.
  • invalid_api_key — the tenant Bearer key is missing/invalid (or a supplied provider key was empty on a credential write).
  • provider_not_supported — provider is not a supported value.
  • streaming_not_supported — the wrappers reject stream: true; send a non-streaming request.
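The four codes above all point at something the caller must change, so it can help to map each one to a remedy in a single place. This is a sketch under one loud assumption: the error code is read from an `error` string or an `error.code` string in the JSON body, which this page does not confirm. The Remedy type and both helper names are hypothetical.

```typescript
// Hypothetical remedy labels for the four documented error codes.
type Remedy =
  | "add_provider_credential"
  | "fix_tenant_key"
  | "fix_provider_value"
  | "disable_streaming"
  | "unknown";

// ASSUMPTION: the code lives at body.error (string) or body.error.code (string).
// Adjust this to the real error payload once you have seen one.
function extractErrorCode(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const err = (body as { error?: unknown }).error;
  if (typeof err === "string") return err;
  if (typeof err === "object" && err !== null) {
    const code = (err as { code?: unknown }).code;
    if (typeof code === "string") return code;
  }
  return undefined;
}

// Maps a parsed error body to the action described in the list above.
function remedyFor(body: unknown): Remedy {
  switch (extractErrorCode(body)) {
    case "provider_credential_required":
      return "add_provider_credential"; // store a credential for that provider
    case "invalid_api_key":
      return "fix_tenant_key"; // check the Bearer kt_... key
    case "provider_not_supported":
      return "fix_provider_value"; // use a supported provider value
    case "streaming_not_supported":
      return "disable_streaming"; // resend without stream: true
    default:
      return "unknown";
  }
}
```

None of these conditions clears on its own, so retrying the same request unchanged will not help.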

idempotencyKey & cachePolicy

idempotencyKey is a string you choose per request; repeating it returns the stored result without re-calling or re-billing the provider, so it is safe to retry on timeouts. cachePolicy.exactCache (read_write / read_only / write_only / bypass) reuses the stored response for a byte-identical prior request; a hit skips the provider call and shows in metrics.cacheStatus. Both require a cache backend (KILLTOKEN_CACHE_ENABLED=true).
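The retry guarantee only holds if every attempt carries the same idempotencyKey. Below is a minimal sketch of a retry loop that enforces this by resending the identical body. The sendWithRetry helper, the ChatBody type, and the injected send function are hypothetical names rather than part of any KillToken™ SDK; send is expected to throw only on transient failures such as timeouts.

```typescript
// Hypothetical body type: any chat fields plus a caller-chosen idempotencyKey.
interface ChatBody {
  idempotencyKey: string;
  [field: string]: unknown;
}

// The transport is injected so the loop stays independent of fetch or an SDK.
type Send<T> = (body: ChatBody) => Promise<T>;

// Retries up to maxAttempts times, always with the same body and therefore
// the same idempotencyKey, then rethrows the last error if every attempt fails.
async function sendWithRetry<T>(send: Send<T>, body: ChatBody, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(body);
    } catch (err) {
      lastError = err; // remember the failure and try again with the same key
    }
  }
  throw lastError;
}
```

A production version would likely add a delay between attempts and stop immediately on 4xx responses; both are left out to keep the sketch short.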

POST /v1/chat

Primary gateway endpoint. The provider response is returned unchanged alongside KillToken™ metrics.

| Field | Required | Notes |
| --- | --- | --- |
| provider | yes | One of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. Requires an active tenant BYOK credential for that provider (strict BYOK — no env/platform fallback). |
| model | yes | Provider model name. KillToken™ does not route to a different model. |
| messages | yes | Chat messages sent through the gateway. |
| optimizationMode | no | Use `measure_only` or `safe` for the MVP. Defaults to `measure_only`. |
| metadata | no | Tenant-owned trace/search context. |
| providerOptions | no | Provider-specific options, forwarded when supported. |
| cachePolicy | no | Exact-cache behavior. Defaults to bypass. |
| idempotencyKey | no | Replay-safe key for retries. Max 255 characters. |

TypeScript backend example

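// Minimal message shape used by this example.
type ChatMessage = { role: string; content: string };

// Sends one measure_only request through the gateway and returns the parsed JSON body.
export async function callKillToken(messages: ChatMessage[]) {
  const res = await fetch(`${process.env.KILLTOKEN_BASE_URL}/v1/chat`, {
    method: "POST",
    headers: {
      // Tenant key (kt_...), never a provider key.
      "authorization": `Bearer ${process.env.KILLTOKEN_API_KEY}`,
      "content-type": "application/json"
    },
    body: JSON.stringify({
      provider: "openai",
      model: "gpt-4.1",
      optimizationMode: "measure_only",
      messages
    })
  });

  if (!res.ok) throw new Error(`KillToken request failed: ${res.status}`);
  return res.json();
}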
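The request fields for POST /v1/chat can also be typed and checked before the request leaves your backend. The sketch below derives its types and defaults from the field table; the ChatRequest interface and both helpers are hypothetical, and the gateway remains the authority on what is valid.

```typescript
// Provider values listed in the field table.
const PROVIDERS = [
  "openai", "anthropic", "gemini", "mistral", "deepseek", "openrouter", "together",
  "perplexity", "xai", "azure_openai", "openai_compatible", "aws_bedrock", "google_vertex",
] as const;

type Provider = (typeof PROVIDERS)[number];
type ExactCache = "bypass" | "read_only" | "write_only" | "read_write";

// Hypothetical client-side type mirroring the documented fields.
interface ChatRequest {
  provider: Provider;
  model: string;
  messages: Array<{ role: string; content: string }>;
  optimizationMode?: "measure_only" | "safe";
  metadata?: Record<string, unknown>;
  providerOptions?: Record<string, unknown>;
  cachePolicy?: { exactCache: ExactCache };
  idempotencyKey?: string;
}

// Returns a list of problems; an empty list means the local checks passed.
function validateChatRequest(req: ChatRequest): string[] {
  const problems: string[] = [];
  if ((PROVIDERS as readonly string[]).indexOf(req.provider) === -1) {
    problems.push("provider is not a supported value");
  }
  if (req.model.trim() === "") problems.push("model is required");
  if (!Array.isArray(req.messages) || req.messages.length === 0) {
    problems.push("messages are missing");
  }
  // .length counts UTF-16 code units, which approximates the 255-character limit.
  if (req.idempotencyKey !== undefined && req.idempotencyKey.length > 255) {
    problems.push("idempotencyKey exceeds 255 characters");
  }
  return problems;
}

// Fills in the documented defaults so the effective request is explicit.
function withDefaults(req: ChatRequest): ChatRequest {
  return {
    ...req,
    optimizationMode: req.optimizationMode ?? "measure_only",
    cachePolicy: req.cachePolicy ?? { exactCache: "bypass" },
  };
}
```

Calling validateChatRequest before the fetch keeps malformed requests from counting against gateway traffic, and withDefaults makes logged requests show the mode and cache policy that actually apply.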

Cache & idempotency

Caching is server-enabled, then request-opt-in. Use exact cache only when the same request should return the same answer.

bypass

Default. Do not read or write exact cache.

read_only

Return a hit if present; do not write misses.

write_only

Skip lookup; write the provider result.

read_write

Read first and write on miss.
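The four modes reduce to two independent switches: whether the cache is consulted, and whether the provider result is stored. A small sketch of that mapping follows; the cacheBehavior helper is a hypothetical name used for illustration, not a gateway or SDK function.

```typescript
type ExactCache = "bypass" | "read_only" | "write_only" | "read_write";

// Translates a cachePolicy.exactCache mode into its read/write behavior.
// Omitting the mode gives the documented default, bypass.
function cacheBehavior(mode: ExactCache = "bypass"): { read: boolean; write: boolean } {
  return {
    read: mode === "read_only" || mode === "read_write",
    write: mode === "write_only" || mode === "read_write",
  };
}
```

For example, cacheBehavior("write_only") yields no lookup but a write, which suits warming the cache without serving possibly stale hits.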

OpenAI-compatible wrapper

Point OpenAI-style chat-completions clients at KillToken™. Metrics are returned under `killtoken.metrics` and in response headers.

POST /v1/openai/chat/completions

Streaming is not implemented in the MVP. `stream: true` returns `422 streaming_not_supported`.

Anthropic Messages wrapper

Anthropic-style requests use top-level `system` plus `messages`. Unsupported OpenAI-style tool payloads are rejected before provider execution.

POST /v1/anthropic/messages
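If your code already builds OpenAI-style message arrays, the system prompt has to move to the top level before calling this wrapper. The conversion can be sketched as below; toAnthropicStyle and its types are hypothetical helpers, and joining multiple system messages with a blank line is a choice made here, not documented behavior.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AnthropicStyleBody {
  model: string;
  max_tokens: number;
  system?: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

// Lifts system-role messages into the top-level `system` field and keeps
// the remaining user/assistant turns in their original order.
function toAnthropicStyle(model: string, maxTokens: number, messages: ChatMessage[]): AnthropicStyleBody {
  const systemParts: string[] = [];
  const turns: AnthropicStyleBody["messages"] = [];
  for (const m of messages) {
    if (m.role === "system") {
      systemParts.push(m.content);
    } else {
      turns.push({ role: m.role, content: m.content });
    }
  }
  const body: AnthropicStyleBody = { model, max_tokens: maxTokens, messages: turns };
  // Only set `system` when there is something to send.
  if (systemParts.length > 0) body.system = systemParts.join("\n\n");
  return body;
}
```

The result can be passed to JSON.stringify as the body of a POST to /v1/anthropic/messages.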

Metrics, exports, and reports

Read APIs are tenant-scoped and privacy-safe by default. Request lists and exports omit raw prompt content.

GET /v1/requests

Paginated request trace list with provider/model/mode/cache filters.

GET /v1/requests/:requestId

Single tenant-owned request trace.

GET /v1/analytics/summary

Aggregate totals, savings, cache hit rate, and top templates.

GET /v1/exports/requests.csv

CSV request export with fixed privacy-safe columns.

GET /v1/exports/analytics.json

JSON analytics export with filters and timestamp.

GET /v1/reports/roi

Structured ROI report for estimated, verified, potential, and cache savings.
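The provider/model/mode/cache filters on GET /v1/requests can be assembled with a small URL builder. This sketch rests on a clear assumption: the query parameter names (provider, model, mode, cache) are guessed from the filter description and are not confirmed by this page, so check them against the live API. The buildRequestsUrl helper is hypothetical.

```typescript
// ASSUMPTION: these query parameter names mirror the documented filter
// categories; the real parameter names may differ.
interface RequestFilters {
  provider?: string;
  model?: string;
  mode?: string;
  cache?: string;
}

// Builds the request-list URL, skipping filters that are unset or empty.
function buildRequestsUrl(baseUrl: string, filters: RequestFilters = {}): string {
  const keys: Array<keyof RequestFilters> = ["provider", "model", "mode", "cache"];
  const pairs: string[] = [];
  for (const key of keys) {
    const value = filters[key];
    if (value !== undefined && value !== "") {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
    }
  }
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash on the base URL
  return pairs.length > 0 ? `${base}/v1/requests?${pairs.join("&")}` : `${base}/v1/requests`;
}
```

Send the resulting URL with the same Authorization: Bearer kt_... header used for /v1/chat, since read APIs are tenant-scoped.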

Operations

GET /health

Lightweight liveness check. No auth.

GET /ready

Readiness checks for persistence, cache, and dashboard auth. No secrets returned.

Self-hosting on Render? The repo's docs/render-deploy.md covers the service blueprint, MongoDB Atlas, Upstash Redis, domain mapping, and rollback.

Common errors

| Status | Error | Meaning |
| --- | --- | --- |
| 400 | invalid_messages | Messages are missing or malformed. |
| 400 | invalid_cache_policy | Cache policy is malformed. |
| 401 | invalid_api_key | Missing, unknown, or revoked Bearer key. |
| 422 | provider_not_supported | Provider is not one of `openai`, `anthropic`, `gemini`, `mistral`, `deepseek`, `openrouter`, `together`, `perplexity`, `xai`, `azure_openai`, `openai_compatible`, `aws_bedrock`, `google_vertex`. |
| 422 | streaming_not_supported | Streaming proxy support is not in the MVP. |