LLM Gateway
One API, any provider. Use the OpenAI SDK you already know — Backbone routes to the right backend.
Overview
The AI Gateway exposes OpenAI-compatible endpoints (/api/v1/chat/completions, /api/v1/audio/transcriptions, /api/v1/models) and routes requests to whichever provider you've configured. Swap between OpenAI, Anthropic, Azure, Vertex AI, or Ollama without touching your code.
Why use it?
- No vendor lock-in — switch providers by changing the model string, not your codebase
- Standard API — any OpenAI-compatible SDK, tool, or framework just works
- Streaming — full SSE support for real-time responses
- Centralized credentials — manage API keys and routing per-organization
Models
Platform Models
Backbone comes with pre-configured models available on every tier. Use them by name — no provider prefix required.
Checking available models
Use the GET /api/v1/models endpoint to see all models currently available to your organization, including both platform and BYOK models.
Bring Your Own Key (BYOK)
On the Scale tier and above, you can connect your own provider accounts and use the provider/model format:
```
openai/gpt-4o
anthropic/claude-sonnet-4-5-20250929
azure-openai/gpt-4
vertex-ai/gemini-pro
ollama/llama3
```
The prefix tells the gateway where to route. The model name after the slash is passed directly to the provider. BYOK usage is billed through your own provider account and bypasses platform token limits.
BYOK availability
Bring Your Own Key requires the Scale tier or higher. Configure your providers in the AI Providers section of the sidebar.
Chat Completions
POST /api/v1/chat/completions
The main endpoint. Drop-in replacement for https://api.openai.com/v1/chat/completions.
Request
```bash
curl -X POST https://backbone.manfred-kunze.dev/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk_your_api_key" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Spring Boot?"}
    ]
  }'
```
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Platform model name (e.g., gpt-4o) or provider/model for BYOK |
| messages | array | Yes | Chat messages (role + content) |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens in the response |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | array | No | Stop sequences |
| tools | array | No | Tool definitions for function calling |
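A request body combining several of the optional parameters above might look like this. The values are illustrative, not recommendations:

```python
import json

# Sketch of a chat completion payload using the optional
# sampling parameters from the table above.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Spring Boot?"},
    ],
    "temperature": 0.7,   # 0-2; higher means more random output
    "max_tokens": 256,    # cap on completion length
    "top_p": 0.9,         # nucleus sampling, 0-1
    "stop": ["\n\n"],     # stop at the first blank line
}
print(json.dumps(payload, indent=2))
```

Omitted optional fields fall back to provider defaults, so only set what you need.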
Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Spring Boot is a Java-based framework..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```
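Reading the reply and token usage out of that body works exactly as it does with OpenAI's API. A minimal sketch using the example response:

```python
import json

# The example response body as returned by the gateway.
raw = """{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "Spring Boot is a Java-based framework..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
used = resp["usage"]["total_tokens"]
print(reply)
print(f"tokens used: {used}")
```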
Streaming
Set "stream": true and you'll get Server-Sent Events:
Request
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk_your_api_key",
    base_url="https://backbone.manfred-kunze.dev/api/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
SSE format
Streaming responses follow the standard OpenAI SSE format — data: {...} lines terminated with data: [DONE].
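The SDK normally handles this wire format for you, but if you consume the stream yourself, a minimal parser looks like this (assuming the standard OpenAI chunk shape):

```python
import json

def collect_sse(lines):
    """Accumulate assistant text from OpenAI-style SSE lines.

    Each event is a `data: {...}` line; the stream ends with
    `data: [DONE]`.
    """
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):  # role-only chunks carry no text
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(collect_sse(sample))  # Hello, world
```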
List Models
GET /api/v1/models
Lists all models available to your organization — both platform models and BYOK providers you've configured:
```bash
curl https://backbone.manfred-kunze.dev/api/v1/models \
  -H "Authorization: Bearer sk_your_api_key"
```
Framework Integrations
The gateway works with anything that speaks OpenAI — just point base_url at your Backbone instance.
LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key="sk_your_api_key",
    openai_api_base="https://backbone.manfred-kunze.dev/api/v1",
    model_name="gpt-4o"
)

response = llm.invoke("What is the capital of France?")
print(response.content)
```
Works with any OpenAI-compatible tool
LiteLLM, LlamaIndex, Haystack, Semantic Kernel — if it has a base_url setting, it works with Backbone.
Supported BYOK Providers
Connect your own accounts from any of these providers (Scale tier and above):
| Provider | Prefix | Example Models |
|---|---|---|
| OpenAI | openai | gpt-4o, gpt-4o-mini, o1 |
| Azure OpenAI | azure-openai | Your Azure deployment names |
| Anthropic | anthropic | claude-sonnet-4-5-20250929, claude-3-haiku |
| Google Vertex AI | vertex-ai | gemini-pro, gemini-ultra |
| Ollama | ollama | llama3, mistral, codellama |
Configure providers in the AI Providers section of the sidebar.