Streaming text to speech
Start playback before generation finishes. SSE-style data records.
POST /v1/tts/stream returns a text/plain body of SSE-style records — one JSON object per data: line, separated by blank lines. The request body is identical to Text to speech. Use streaming when you want to start playback before generation finishes.
The default response encoding is mp3, not raw PCM. Each audio_chunk value is base64-encoded audio in your chosen response_format; concatenate the chunks to reconstruct the file.
Event stream format
The response body is a sequence of records. Each record:
- Starts with data: (note the space).
- Contains one JSON object.
- Ends with \n\n (two newlines).
data: [DONE] — the stream ends when the HTTP body ends.
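The framing rules above can be sketched as a small parser. This is a minimal illustration, not the official client: iter_records is a hypothetical helper name, it accepts body bytes split at arbitrary boundaries (as a network read would deliver them), and it skips any payload that is not JSON as a defensive assumption. The loop simply ends when the input is exhausted, which matches a stream that ends when the HTTP body ends.

```python
import json
from typing import Iterable, Iterator, Optional


def _parse_record(raw: bytes) -> Optional[dict]:
    """Parse one complete record; return None for anything that is not data."""
    line = raw.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None  # Assumption: ignore lines that are not data records.
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        # Defensive assumption: an empty or literal [DONE] payload is not JSON.
        return None
    return json.loads(payload)


def iter_records(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one JSON object per record from arbitrarily split body bytes."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A record is complete once its terminating blank line has arrived.
        while b"\n\n" in buffer:
            raw, buffer = buffer.split(b"\n\n", 1)
            record = _parse_record(raw)
            if record is not None:
                yield record
    # Assumption: tolerate a final record that lacks the trailing blank line.
    if buffer.strip():
        record = _parse_record(buffer)
        if record is not None:
            yield record
```

Buffering raw bytes and decoding only complete records avoids splitting a multi-byte UTF-8 character across two network reads.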
Event types
audio
timestamps (only when timestamps: true)
words[i] starts at start_seconds[i] and ends at end_seconds[i].
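The index alignment described above can be sketched as follows. The field names words, start_seconds and end_seconds come from this page; that they sit at the top level of the timestamps event is an assumption, and WordSpan and word_spans are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class WordSpan(NamedTuple):
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_spans(event: dict) -> list[WordSpan]:
    """Pair words[i] with start_seconds[i] and end_seconds[i]."""
    words = event["words"]
    starts = event["start_seconds"]
    ends = event["end_seconds"]
    # The three arrays are parallel, so a length mismatch means a malformed event.
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("words, start_seconds and end_seconds differ in length")
    return [WordSpan(w, s, e) for w, s, e in zip(words, starts, ends)]
```

The resulting spans can drive captions or word highlighting during playback.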
Examples
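A minimal end-to-end sketch using only the Python standard library. Several things here are assumptions rather than documented behavior: the request body is sent as JSON, the caller supplies the full endpoint URL and any authentication headers, and audio_chunk is a top-level key of each audio event. The helper names iter_events, write_audio and stream_to_file are hypothetical. Each chunk is base64-decoded before it is appended, so the output file is the concatenation of decoded bytes in arrival order.

```python
import base64
import json
import urllib.request
from typing import BinaryIO, Iterable, Iterator


def iter_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the JSON object carried by each data: line."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything unexpected
        payload = line[len("data:"):].strip()
        if payload.startswith("{"):  # Assumption: skip non-object payloads.
            yield json.loads(payload)


def write_audio(events: Iterable[dict], sink: BinaryIO) -> int:
    """Decode each audio_chunk into sink as it arrives; return bytes written."""
    written = 0
    for event in events:
        chunk = event.get("audio_chunk")
        if chunk is None:
            continue  # for example, a timestamps event
        written += sink.write(base64.b64decode(chunk))
    return written


def stream_to_file(url: str, body: dict, headers: dict, path: str) -> int:
    """POST body to url (the /v1/tts/stream endpoint) and save the audio."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: JSON request body; auth headers come from the caller.
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    # The response object yields the body line by line, so audio is written
    # while generation is still in progress.
    with urllib.request.urlopen(request) as response, open(path, "wb") as sink:
        return write_audio(iter_events(response), sink)
```

To start playback early instead of saving a file, pass a sink whose write method feeds an audio player; write_audio only requires a binary writable object.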
See also
- WebSocket API — for multi-utterance, interactive use cases.
- Text to speech — same request body, single audio response.