Audio Transcription
Convert audio to text through the same gateway, using OpenAI-compatible Whisper endpoints.
Overview
The transcription API follows the OpenAI audio/transcriptions format exactly. If you're already using the OpenAI SDK for transcription, just change the base_url and you're done.
What you need
- An audio file in a supported format: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- A configured provider that supports Whisper (OpenAI or Azure OpenAI)
- Max file size: 25 MB
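You can check these constraints locally before uploading to avoid a round trip that's guaranteed to fail. A minimal sketch; the helper name and constants below mirror the list above and are not part of the API:

```python
import os

# Formats and size limit from the requirements above
SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the endpoint."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: .{ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
```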
Transcribe Audio
POST /api/v1/audio/transcriptions
Send your audio file as multipart/form-data:
Request
curl -X POST https://backbone.manfred-kunze.dev/api/v1/audio/transcriptions \
-H "Authorization: Bearer sk_your_api_key" \
-F "[email protected]" \
-F "model=whisper-1" \
-F "language=en"
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file to transcribe |
| model | string | Yes | Platform model name (e.g., whisper-1) or provider/model for BYOK |
| language | string | No | ISO-639-1 code (en, de, fr, etc.) |
| prompt | string | No | Guide the transcription style |
| response_format | string | No | json, text, srt, verbose_json, vtt (default: json) |
| temperature | number | No | Sampling temperature (0-1) |
| timestamp_granularities[] | array | No | word and/or segment (requires response_format=verbose_json) |
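If you are not using the SDK, these fields are plain multipart form fields. A small sketch that assembles them, skipping optional fields left unset (the helper name is illustrative, not part of any library):

```python
def build_transcription_form(model, language=None, prompt=None,
                             response_format=None, temperature=None):
    """Assemble form fields for POST /api/v1/audio/transcriptions.

    Only non-None optional fields are included, per the table above.
    The audio file itself is sent separately as the multipart 'file' part.
    """
    fields = {"model": model}
    for key, value in [("language", language), ("prompt", prompt),
                       ("response_format", response_format),
                       ("temperature", temperature)]:
        if value is not None:
            fields[key] = str(value)
    return fields
```

With the requests library, the result would be passed as `data=` alongside `files={"file": open("meeting.mp3", "rb")}`.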
Improve accuracy with language
Setting language explicitly improves accuracy and speed, since Whisper doesn't have to auto-detect the language.
Response Formats
Choose the output format that fits your use case:
JSON (default)
{ "text": "Welcome to today's meeting. We'll be discussing the Q4 roadmap..." }
Verbose JSON
Get word-level timestamps for precise alignment:
curl ... -F "response_format=verbose_json" \
-F "timestamp_granularities[]=word" \
-F "timestamp_granularities[]=segment"
{
"task": "transcribe",
"language": "en",
"duration": 45.2,
"text": "Hello and welcome...",
"words": [
{"word": "Hello", "start": 0.0, "end": 0.4},
{"word": "and", "start": 0.5, "end": 0.6},
{"word": "welcome", "start": 0.7, "end": 1.1}
]
}
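The words array lends itself to local post-processing, for example computing per-word durations for alignment. A sketch using the field names shown in the response above:

```python
def word_durations(response: dict) -> list:
    """Return (word, duration_in_seconds) pairs from a verbose_json response."""
    return [(w["word"], round(w["end"] - w["start"], 3))
            for w in response.get("words", [])]
```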
SRT / VTT Subtitles
For video captioning, use response_format=srt for SRT or response_format=vtt for WebVTT.
Generate Subtitle Files
from openai import OpenAI

client = OpenAI(
    api_key="sk_your_api_key",
    base_url="https://backbone.manfred-kunze.dev/api/v1",
)

# With response_format="srt" the SDK returns the subtitle text as a string.
with open("video.mp3", "rb") as audio:
    srt = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        response_format="srt",
    )

with open("subtitles.srt", "w") as f:
    f.write(srt)
Supported Models
The platform provides whisper-1 out of the box on all tiers.
With BYOK (Scale tier and above), you can route through your own provider:
| Provider | Model | Notes |
|---|---|---|
| OpenAI | openai/whisper-1 | Your own OpenAI Whisper API key |
| Azure OpenAI | azure-openai/{deployment} | Your Azure Whisper deployment |
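The model names in the table follow a provider/model convention, while platform models like whisper-1 have no provider prefix. A purely illustrative sketch of splitting such a name client-side (not an API call):

```python
def parse_model_name(name: str):
    """Split 'provider/model' into (provider, model).

    Platform-provided models carry no prefix, so provider is None.
    """
    if "/" in name:
        provider, model = name.split("/", 1)
        return provider, model
    return None, name
```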