Audio Transcription

Convert audio to text using OpenAI-compatible Whisper endpoints, through the same gateway.

Overview

The transcription API follows the OpenAI audio/transcriptions format exactly. If you're already using the OpenAI SDK for transcription, just change the base_url and you're done.

What you need

  • An audio file (supported: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
  • A configured provider that supports Whisper (OpenAI or Azure OpenAI)
  • Max file size: 25 MB
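
The format and size limits above can be checked client-side before uploading. A minimal sketch (the helper name is illustrative, not part of any SDK):

```python
import os

# Supported extensions and the 25 MB limit stated above.
SUPPORTED_EXTS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB

def check_audio_file(path):
    """Raise ValueError if the file would be rejected by the endpoint."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTS:
        raise ValueError(f"unsupported format: {ext!r}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
```

Failing fast locally avoids uploading a large file only to have the server reject it.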

Transcribe Audio

POST /api/v1/audio/transcriptions

Send your audio file as multipart/form-data:

Request

curl -X POST https://backbone.manfred-kunze.dev/api/v1/audio/transcriptions \
  -H "Authorization: Bearer sk_your_api_key" \
  -F "[email protected]" \
  -F "model=whisper-1" \
  -F "language=en"

Request Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file to transcribe |
| model | string | Yes | Platform model name (e.g., whisper-1) or provider/model for BYOK |
| language | string | No | ISO-639-1 code (en, de, fr, etc.) |
| prompt | string | No | Guide the transcription style |
| response_format | string | No | json, text, srt, verbose_json, vtt (default: json) |
| temperature | number | No | Sampling temperature (0-1) |
| timestamp_granularities[] | array | No | word and/or segment |
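
A small client-side sketch of these fields. The field names mirror the table; the validation rules are just the constraints stated above, not behavior guaranteed by the API:

```python
# Values allowed per the parameter table above.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
ALLOWED_GRANULARITIES = {"word", "segment"}

def build_form_fields(model, language=None, prompt=None,
                      response_format="json", temperature=None,
                      timestamp_granularities=None):
    """Assemble the multipart form fields (minus the file) for the request."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"bad response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be in [0, 1]")
    for g in timestamp_granularities or []:
        if g not in ALLOWED_GRANULARITIES:
            raise ValueError(f"bad granularity: {g!r}")
    fields = {"model": model, "response_format": response_format}
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if timestamp_granularities:
        fields["timestamp_granularities[]"] = list(timestamp_granularities)
    return fields
```

Pass the result as the form data alongside the file part of the multipart request.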

Response Formats

Choose the output format that fits your use case:

JSON (default)

{ "text": "Welcome to today's meeting. We'll be discussing the Q4 roadmap..." }

Verbose JSON

Get word-level timestamps for precise alignment:

curl ... -F "response_format=verbose_json" \
         -F "timestamp_granularities[]=word" \
         -F "timestamp_granularities[]=segment"
{
  "task": "transcribe",
  "language": "en",
  "duration": 45.2,
  "text": "Hello and welcome...",
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "and", "start": 0.5, "end": 0.6},
    {"word": "welcome", "start": 0.7, "end": 1.1}
  ]
}
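
The words array lends itself to post-processing. For example, a sketch that groups consecutive words into caption cues, splitting on pauses (the max_gap threshold is an arbitrary choice for illustration):

```python
def words_to_cues(words, max_gap=0.3):
    """Group word timestamps into cues, splitting on pauses > max_gap seconds."""
    cues, current = [], []
    for w in words:
        if current and w["start"] - current[-1]["end"] > max_gap:
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    return [
        {"start": c[0]["start"], "end": c[-1]["end"],
         "text": " ".join(w["word"] for w in c)}
        for c in cues
    ]

# The three words in the sample response above fall within max_gap of
# each other, so they collapse into a single cue.
sample = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "and", "start": 0.5, "end": 0.6},
    {"word": "welcome", "start": 0.7, "end": 1.1},
]
print(words_to_cues(sample))
```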

SRT / VTT Subtitles

For video captioning, use response_format=srt for SRT or response_format=vtt for WebVTT.

Generate Subtitle Files

from openai import OpenAI

client = OpenAI(
    api_key="sk_your_api_key",
    base_url="https://backbone.manfred-kunze.dev/api/v1"
)

# With response_format="srt" the SDK returns the subtitle text as a string
with open("video.mp3", "rb") as audio:
    srt = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        response_format="srt",
    )

with open("subtitles.srt", "w") as f:
    f.write(srt)

Supported Models

The platform provides whisper-1 out of the box on all tiers.

With BYOK (Scale tier and above), you can route through your own provider:

| Provider | Model | Notes |
|---|---|---|
| OpenAI | openai/whisper-1 | Your own OpenAI Whisper API key |
| Azure OpenAI | azure-openai/{deployment} | Your Azure Whisper deployment |
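
The naming convention above can be captured in a one-line helper (the function name is illustrative): platform models use a bare name, while BYOK routes prefix it with the provider.

```python
# Platform models: bare name ("whisper-1").
# BYOK routes: "provider/model" ("openai/whisper-1").
def model_name(model, provider=None):
    return f"{provider}/{model}" if provider else model

print(model_name("whisper-1"))            # platform-hosted
print(model_name("whisper-1", "openai"))  # BYOK via your own OpenAI key
```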
