Audio Transcription
Convert audio to text through the same gateway, using OpenAI-compatible Whisper endpoints.
Overview
The transcription API follows the OpenAI audio/transcriptions format exactly. If you're already using the OpenAI SDK for transcription, just change the base_url and you're done.
What you need
- An audio file in a supported format: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- A configured provider that supports Whisper (OpenAI or Azure OpenAI)
- Max file size: 25 MB
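You can check these constraints locally before uploading to avoid a round trip that's guaranteed to fail. A minimal sketch; the helper name and constants below mirror the list above and are not part of the API:

```python
import os

# Formats and size limit from the requirements above
SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the endpoint."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: .{ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
```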
Transcribe Audio
POST /api/v1/audio/transcriptions
Send your audio file as multipart/form-data:
Request
curl -X POST https://backbone.manfred-kunze.dev/api/v1/audio/transcriptions \
-H "Authorization: Bearer sk_your_api_key" \
-F "[email protected]" \
-F "model=whisper-1" \
-F "language=en"
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file to transcribe |
| model | string | Yes | Platform model name (e.g., whisper-1) or provider/model for BYOK |
| language | string | No | ISO-639-1 code (en, de, fr, etc.) |
| prompt | string | No | Guide the transcription style |
| response_format | string | No | json, text, srt, verbose_json, vtt (default: json) |
| temperature | number | No | Sampling temperature (0-1) |
| timestamp_granularities[] | array | No | word and/or segment (requires response_format=verbose_json) |
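If you are not using the SDK, these fields are plain multipart form fields. A small sketch that assembles them, skipping optional fields left unset (the helper name is illustrative, not part of any library):

```python
def build_transcription_form(model, language=None, prompt=None,
                             response_format=None, temperature=None):
    """Assemble form fields for POST /api/v1/audio/transcriptions.

    Only non-None optional fields are included, per the table above.
    The audio file itself is sent separately as the multipart 'file' part.
    """
    fields = {"model": model}
    for key, value in [("language", language), ("prompt", prompt),
                       ("response_format", response_format),
                       ("temperature", temperature)]:
        if value is not None:
            fields[key] = str(value)
    return fields
```

With the requests library, the result would be passed as `data=` alongside `files={"file": open("meeting.mp3", "rb")}`.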
Improve accuracy with language
Setting language explicitly improves accuracy and speed, since Whisper doesn't have to auto-detect the language.
Response Formats
Choose the output format that fits your use case:
JSON (default)
{ "text": "Welcome to today's meeting. We'll be discussing the Q4 roadmap..." }
Verbose JSON
Get word-level timestamps for precise alignment:
curl ... -F "response_format=verbose_json" \
-F "timestamp_granularities[]=word" \
-F "timestamp_granularities[]=segment"
{
"task": "transcribe",
"language": "en",
"duration": 45.2,
"text": "Hello and welcome...",
"words": [
{"word": "Hello", "start": 0.0, "end": 0.4},
{"word": "and", "start": 0.5, "end": 0.6},
{"word": "welcome", "start": 0.7, "end": 1.1}
]
}
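The words array lends itself to local post-processing, for example computing per-word durations for alignment. A sketch using the field names shown in the response above:

```python
def word_durations(response: dict) -> list:
    """Return (word, duration_in_seconds) pairs from a verbose_json response."""
    return [(w["word"], round(w["end"] - w["start"], 3))
            for w in response.get("words", [])]
```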
SRT / VTT Subtitles
For video captioning, use response_format=srt for SRT or response_format=vtt for WebVTT.
Generate Subtitle Files
from openai import OpenAI

client = OpenAI(
    api_key="sk_your_api_key",
    base_url="https://backbone.manfred-kunze.dev/api/v1",
)

# With response_format="srt" the SDK returns the subtitle text as a string.
with open("video.mp3", "rb") as audio:
    srt = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        response_format="srt",
    )

with open("subtitles.srt", "w") as f:
    f.write(srt)
Supported Models
The platform provides whisper-1 out of the box on all tiers.
With BYOK (Scale tier and above), you can route through your own provider:
| Provider | Model | Notes |
|---|---|---|
| OpenAI | openai/whisper-1 | Your own OpenAI Whisper API key |
| Azure OpenAI | azure-openai/{deployment} | Your Azure Whisper deployment |
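The model names in the table follow a provider/model convention, while platform models like whisper-1 have no provider prefix. A purely illustrative sketch of splitting such a name client-side (not an API call):

```python
def parse_model_name(name: str):
    """Split 'provider/model' into (provider, model).

    Platform-provided models carry no prefix, so provider is None.
    """
    if "/" in name:
        provider, model = name.split("/", 1)
        return provider, model
    return None, name
```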