Gemini Audio Understanding
POST /v1beta/models/{model}:generateContent
As of 2026-04-08, successful retests against Crazyrouter and a local `:4000` endpoint show that:
- `gemini-2.5-pro` can read `audio/wav` input
- the currently verified primary path is `inlineData`
- short-audio classification, transcription, language hints, and summaries can be requested directly in the text prompt
Verified Minimal Request
```bash
curl "https://crazyrouter.com/v1beta/models/gemini-2.5-pro:generateContent?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "inlineData": {
              "mimeType": "audio/wav",
              "data": "BASE64_WAV_DATA"
            }
          },
          {
            "text": "Return JSON with transcript, language, and summary."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 512
    }
  }'
```
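The minimal request uses `BASE64_WAV_DATA` as a placeholder. The sketch below fills it from a local file by printing the same request body with the file's raw, single-line Base64 in `inlineData.data`. `build_audio_request` is a hypothetical helper, not part of Crazyrouter. It assumes a POSIX shell with the `base64` and `tr` utilities, and because it inserts the prompt without JSON escaping, it rejects prompts that contain double quotes, backslashes, or newlines.

```bash
# Hypothetical helper: build_audio_request FILE MIME_TYPE PROMPT
# Prints a generateContent request body (same shape as the minimal request) to stdout.
# The body runs in a subshell, so its variables do not leak into the caller.
build_audio_request() (
  file=$1
  mime=$2      # inserted as-is; MIME types such as audio/wav need no escaping
  prompt=$3
  nl='
'
  if [ ! -r "$file" ]; then
    echo "build_audio_request: cannot read $file" >&2
    exit 1
  fi
  # The prompt is not JSON-escaped, so refuse characters that would break the JSON.
  case $prompt in
    *'"'* | *'\'* | *"$nl"*)
      echo "build_audio_request: prompt must not contain quotes, backslashes, or newlines" >&2
      exit 1 ;;
  esac
  # Raw Base64 on one line, with no Data URL prefix.
  # tr removes the line wrapping that GNU base64 adds by default.
  data=$(base64 < "$file" | tr -d '\n')
  printf '{"contents":[{"role":"user","parts":[{"inlineData":{"mimeType":"%s","data":"%s"}},{"text":"%s"}]}],"generationConfig":{"maxOutputTokens":512}}\n' \
    "$mime" "$data" "$prompt"
)
```

For example, `build_audio_request sample.wav audio/wav "Return JSON with transcript, language, and summary." | curl "https://crazyrouter.com/v1beta/models/gemini-2.5-pro:generateContent?key=YOUR_API_KEY" -H "Content-Type: application/json" --data-binary @-` reads the body from standard input instead of an inline `-d` string, which keeps large Base64 payloads off the command line. `sample.wav` is a placeholder path.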
Observed successful response shape (the `text` field holds JSON wrapped in a Markdown code fence, not a parsed JSON object):
```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "```json\n{\n \"transcript\": \"ding\",\n \"language\": \"zh-CN\",\n \"summary\": \"The audio contains one short notification sound.\"\n}\n```"
          }
        ]
      }
    }
  ]
}
```
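Because the observed `text` is a fenced string rather than a JSON object, it needs unwrapping before it can be parsed. A minimal sketch, assuming a POSIX shell with `sed`: the hypothetical `strip_json_fence` helper deletes lines that begin with three backticks (optionally indented) and passes every other line through, so unfenced text is left unchanged.

```bash
# Hypothetical helper: reads model text on stdin and drops Markdown fence lines,
# leaving the JSON between them on stdout. Text without a fence passes through.
strip_json_fence() {
  sed '/^[[:space:]]*```/d'
}
```

Assuming `jq` is installed and the response was saved to `response.json` (a placeholder name), `jq -r '.candidates[0].content.parts[0].text' response.json | strip_json_fence | jq .` extracts and parses the model's answer. The fence is model output, not a guaranteed format, so Structured Outputs remains the more reliable option when strict JSON matters.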
Request Notes
- Prefer `inlineData` for audio understanding
- Keep `mimeType` aligned with the actual audio format, such as `audio/wav` or `audio/mpeg`
- Put raw Base64 into `data`, without a Data URL prefix
- If you need strict JSON instead of JSON-looking text, combine this route with Structured Outputs
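The `mimeType` and `data` notes can be enforced before a request is sent. A sketch with two hypothetical helpers, assuming a POSIX shell: `audio_mime_type` derives the MIME type from the file extension, and `strip_data_url_prefix` removes a leading `data:<mime>;base64,` prefix if one is present. Only `audio/wav` is verified on this page; the `.mp3` to `audio/mpeg` mapping follows the example in the notes, and other extensions are rejected rather than guessed.

```bash
# Hypothetical helper: audio_mime_type FILE
# Prints a MIME type based on the extension; fails for anything not listed.
audio_mime_type() {
  case $1 in
    *.wav | *.WAV) echo "audio/wav" ;;   # verified on this page
    *.mp3 | *.MP3) echo "audio/mpeg" ;;  # example format from the notes
    *)
      echo "audio_mime_type: unsupported extension: $1" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: strip_data_url_prefix STRING
# Removes a leading "data:<mime>;base64," so only raw Base64 goes into inlineData.data.
# Strings without that prefix are printed unchanged.
strip_data_url_prefix() {
  printf '%s\n' "${1#data:*;base64,}"
}
```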
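For strict JSON, the `generationConfig` in the minimal request can ask for a JSON response with a fixed schema. This sketch assumes the route forwards Gemini's native `responseMimeType` and `responseSchema` fields unchanged; check the Structured Outputs page for the exact fields Crazyrouter accepts. `structured_generation_config` is a hypothetical helper, and its three schema properties simply mirror the prompt in the minimal request.

```bash
# Hypothetical helper: prints a generationConfig object that requests schema-constrained JSON.
# ASSUMPTION: the gateway passes Gemini's responseMimeType/responseSchema through as-is.
structured_generation_config() {
  cat <<'EOF'
{
  "maxOutputTokens": 512,
  "responseMimeType": "application/json",
  "responseSchema": {
    "type": "OBJECT",
    "properties": {
      "transcript": { "type": "STRING" },
      "language": { "type": "STRING" },
      "summary": { "type": "STRING" }
    },
    "required": ["transcript", "language", "summary"]
  }
}
EOF
}
```

Its output would replace the `"generationConfig"` object in the request body; with a schema in place, `text` should contain bare JSON rather than a fenced block.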
This page covers only the short-audio understanding path that was rechecked successfully. For longer STT workflows, realtime audio, or TTS, see the existing STT, Realtime, and TTS pages.