Speech-to-Text (STT)

POST /v1/audio/transcriptions
Transcribes audio files to text. The endpoint is compatible with the OpenAI Whisper API format.

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (multipart/form-data) |
| model | string | Yes | Model name: whisper-1, gpt-4o-transcribe |
| language | string | No | Audio language (ISO-639-1 format), e.g. zh, en, ja |
| response_format | string | No | Output format: json (default), text, srt, verbose_json, vtt |
| temperature | number | No | Sampling temperature, 0-1 |
| prompt | string | No | Prompt to help the model understand context |
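
The constraints listed for these parameters can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_transcription_fields` is a hypothetical helper name, and the allowed values are copied from the parameter list above. It covers only the non-file fields.

```python
# Hypothetical helper: collects the non-file form fields for
# POST /v1/audio/transcriptions and checks them against the documented values.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(model, language=None, response_format="json",
                               temperature=None, prompt=None):
    """Return a dict of form fields, raising ValueError on invalid input."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {"model": model, "response_format": response_format}
    # Optional parameters are only included when the caller sets them.
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if prompt is not None:
        fields["prompt"] = prompt
    return fields


fields = build_transcription_fields("whisper-1", language="en")
```

Rejecting an unknown `response_format` or an out-of-range `temperature` locally avoids a round trip that would most likely fail anyway.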

Supported Audio Formats

mp3, mp4, mpeg, mpga, m4a, wav, webm

Request Examples

curl -X POST https://crazyrouter.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@audio.mp3 \
  -F model=whisper-1 \
  -F language=en \
  -F response_format=json
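
For readers who prefer Python without third-party packages, the same request can be assembled with the standard library. This is a sketch under assumptions: the helper names `encode_multipart` and `build_transcription_request` are made up for illustration, and the MIME type attached to the file part is a guess from the file name. The request is built but not sent.

```python
import mimetypes
import urllib.request
import uuid

API_URL = "https://crazyrouter.com/v1/audio/transcriptions"


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one "file" part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")

    # Guess the part's MIME type from the file name; fall back to a generic type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, filename, file_bytes):
    """Build (but do not send) the request shown in the curl example."""
    fields = {"model": "whisper-1", "language": "en", "response_format": "json"}
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )


# Placeholder bytes stand in for the contents of audio.mp3.
request = build_transcription_request("YOUR_API_KEY", "audio.mp3", b"fake audio bytes")
# To send it: urllib.request.urlopen(request).read()
```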

Response Examples

JSON Format

{
  "text": "Hello, welcome to the Crazyrouter API. Today we'll introduce the speech-to-text feature."
}

verbose_json Format

{
  "task": "transcribe",
  "language": "english",
  "duration": 5.2,
  "text": "Hello, welcome to the Crazyrouter API.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, welcome to the Crazyrouter API."
    }
  ]
}
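
A verbose_json response can be reduced to a list of timed spans with a few lines of Python. The sketch below parses the sample payload shown above; `segment_spans` is an illustrative name, and real responses may contain fields that this example does not show.

```python
import json

# Sample payload copied from the verbose_json example.
raw = """
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.2,
  "text": "Hello, welcome to the Crazyrouter API.",
  "segments": [
    {"id": 0, "start": 0.0, "end": 2.5, "text": "Hello, welcome to the Crazyrouter API."}
  ]
}
"""


def segment_spans(payload):
    """Return (start, end, text) tuples for each segment in a verbose_json response."""
    return [(s["start"], s["end"], s["text"]) for s in payload.get("segments", [])]


response = json.loads(raw)
spans = segment_spans(response)
for start, end, text in spans:
    print(f"[{start:.1f}s - {end:.1f}s] {text}")
```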

SRT Format

1
00:00:00,000 --> 00:00:02,500
Hello, welcome to the Crazyrouter API.
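
The SRT cue above lines up with the single segment in the verbose_json example: each cue is an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text. As an illustration of that mapping (not a description of how the server renders SRT), segments could be converted locally like this; both function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


srt = segments_to_srt(
    [{"start": 0.0, "end": 2.5, "text": "Hello, welcome to the Crazyrouter API."}]
)
print(srt)
```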

Audio Translation

POST /v1/audio/translations
Translate non-English audio to English text. Parameters are the same as the transcription endpoint.
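Python
from openai import OpenAI

# Assumption: the OpenAI SDK is pointed at Crazyrouter via the base URL from the curl example.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://crazyrouter.com/v1")

with open("chinese_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
    )

print(translation.text)  # Outputs English translation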

Specifying the language parameter can improve transcription accuracy. The audio file size limit is 25 MB.
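
Because uploads are limited by both format and size, a local pre-flight check can catch obvious problems before any bytes are sent. The sketch below is an assumption-laden example: `check_audio_file` is a hypothetical helper, the extension list is copied from Supported Audio Formats, and the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes, which the documentation does not specify.

```python
import os

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the docs do not say
# whether the limit is binary or decimal.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

The check looks only at the file name and size, so a mislabeled file would still pass; the server remains the final authority on what it accepts.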