LLM Gateway
One API, any provider. Use the OpenAI SDK you already know — Backbone routes to the right backend.
Overview
The AI Gateway exposes OpenAI-compatible endpoints (/api/v1/chat/completions, /api/v1/audio/transcriptions, /api/v1/models) and routes requests to whichever provider you've configured. Swap between OpenAI, Anthropic, Azure, Vertex AI, or Ollama without touching your code.
Why use it?
- No vendor lock-in — switch providers by changing the model string, not your codebase
- Standard API — any OpenAI-compatible SDK, tool, or framework just works
- Streaming — full SSE support for real-time responses
- Centralized credentials — manage API keys and routing per-organization
Models
Platform Models
Backbone comes with pre-configured models available on every tier. Use them by name — no provider prefix required.
Checking available models
Use the GET /api/v1/models endpoint to see all models currently available to your organization, including both platform and BYOK models.
Bring Your Own Key (BYOK)
On the Scale tier and above, you can connect your own provider accounts and use the provider/model format:
```
openai/gpt-4o
anthropic/claude-sonnet-4-5-20250929
azure-openai/gpt-4
vertex-ai/gemini-pro
ollama/llama3
```
The prefix tells the gateway where to route. The model name after the slash is passed directly to the provider. BYOK usage is billed through your own provider account and bypasses platform token limits.
BYOK availability
Bring Your Own Key requires the Scale tier or higher. Configure your providers in the AI Providers section of the sidebar.
Chat Completions
POST /api/v1/chat/completions
The main endpoint. Drop-in replacement for https://api.openai.com/v1/chat/completions.
Request
```bash
curl -X POST https://backbone.manfred-kunze.dev/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk_your_api_key" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Spring Boot?"}
    ]
  }'
```
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Platform model name (e.g., gpt-4o) or provider/model for BYOK |
| messages | array | Yes | Chat messages (role + content) |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens in the response |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | array | No | Stop sequences |
| tools | array | No | Tool definitions for function calling |
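A request body combining several of the optional parameters above might look like this. The values are illustrative, not recommendations:

```python
import json

# Sketch of a chat completion payload using the optional
# sampling parameters from the table above.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Spring Boot?"},
    ],
    "temperature": 0.7,   # 0-2; higher means more random output
    "max_tokens": 256,    # cap on completion length
    "top_p": 0.9,         # nucleus sampling, 0-1
    "stop": ["\n\n"],     # stop at the first blank line
}
print(json.dumps(payload, indent=2))
```

Omitted optional fields fall back to provider defaults, so only set what you need.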
Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Spring Boot is a Java-based framework..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```
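Reading the reply and token usage out of that body works exactly as it does with OpenAI's API. A minimal sketch using the example response:

```python
import json

# The example response body as returned by the gateway.
raw = """{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "Spring Boot is a Java-based framework..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
used = resp["usage"]["total_tokens"]
print(reply)
print(f"tokens used: {used}")
```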
Streaming
Set "stream": true and you'll get Server-Sent Events:
Request
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk_your_api_key",
    base_url="https://backbone.manfred-kunze.dev/api/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
SSE format
Streaming responses follow the standard OpenAI SSE format — data: {...} lines terminated with data: [DONE].
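The SDK normally handles this wire format for you, but if you consume the stream yourself, a minimal parser looks like this (assuming the standard OpenAI chunk shape):

```python
import json

def collect_sse(lines):
    """Accumulate assistant text from OpenAI-style SSE lines.

    Each event is a `data: {...}` line; the stream ends with
    `data: [DONE]`.
    """
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):  # role-only chunks carry no text
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(collect_sse(sample))  # Hello, world
```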
List Models
GET /api/v1/models
Lists all models available to your organization — both platform models and BYOK providers you've configured:
```bash
curl https://backbone.manfred-kunze.dev/api/v1/models \
  -H "Authorization: Bearer sk_your_api_key"
```
Framework Integrations
The gateway works with anything that speaks OpenAI — just point base_url at your Backbone instance.
LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key="sk_your_api_key",
    openai_api_base="https://backbone.manfred-kunze.dev/api/v1",
    model_name="gpt-4o"
)

response = llm.invoke("What is the capital of France?")
print(response.content)
```
Works with any OpenAI-compatible tool
LiteLLM, LlamaIndex, Haystack, Semantic Kernel — if it has a base_url setting, it works with Backbone.
Supported BYOK Providers
Connect your own accounts from any of these providers (Scale tier and above):
| Provider | Prefix | Example Models |
|---|---|---|
| OpenAI | openai | gpt-4o, gpt-4o-mini, o1 |
| Azure OpenAI | azure-openai | Your Azure deployment names |
| Anthropic | anthropic | claude-sonnet-4-5-20250929, claude-3-haiku |
| Google Vertex AI | vertex-ai | gemini-pro, gemini-ultra |
| Ollama | ollama | llama3, mistral, codellama |
Configure providers in the AI Providers section of the sidebar.