WS /v1/tts/ws is the recommended transport for interactive applications — voice agents, dialog systems, anything where text arrives incrementally over the lifetime of a session.
A single WebSocket connection multiplexes multiple contexts. Each context represents one continuous utterance with a chosen voice and format. You can have many open contexts on one connection at once; the server tags every frame with its context_id so you can route audio back to the right player.
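As a rough illustration of routing by context_id, the sketch below dispatches incoming server frames to one handler per context. It assumes frames arrive as JSON text carrying a "context_id" key and a "type" key; those key names, the ContextRouter class, and the context names are hypothetical, so check them against the frame reference before relying on them.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


class ContextRouter:
    """Dispatch server frames to one handler per context.

    Assumption: every server frame is a JSON object with a "context_id" key.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, context_id: str, handler: Handler) -> None:
        self._handlers[context_id] = handler

    def unregister(self, context_id: str) -> None:
        self._handlers.pop(context_id, None)

    def dispatch(self, raw_frame: str) -> bool:
        """Route one frame; return False if no handler owns its context."""
        frame = json.loads(raw_frame)
        handler = self._handlers.get(frame.get("context_id"))
        if handler is None:
            return False
        handler(frame)
        return True


# Two parallel contexts on one connection, each feeding its own "player"
# (plain lists stand in for audio players here).
dialog_frames: list = []
music_frames: list = []

router = ContextRouter()
router.register("dialog", dialog_frames.append)
router.register("music", music_frames.append)

router.dispatch('{"type": "audio_chunk", "context_id": "dialog"}')
router.dispatch('{"type": "audio_chunk", "context_id": "music"}')
router.dispatch('{"type": "audio_chunk", "context_id": "dialog"}')
```

In a real client, dispatch would be called once per message read from the socket, and a context's handler would be unregistered once its context_closed frame arrives.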
When to use WebSocket vs streaming HTTP
| Need | Use |
|---|---|
| One utterance, low-latency playback | Streaming HTTP — simpler, fewer moving parts |
| Many utterances over a session, want connection reuse | WebSocket |
| Multiple parallel utterances (e.g. background music + dialog) | WebSocket with multiple contexts |
| Browser client | Streaming HTTP (the browser WebSocket API cannot set custom handshake headers such as x-api-key) |
Authentication
The x-api-key header is sent on the WebSocket handshake (not as the first frame after connect).
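A minimal sketch of attaching the key to the handshake, assuming the third-party `websockets` package for Python. The host in WS_URL is a placeholder (only the /v1/tts/ws path comes from this page), and handshake_headers and open_connection are hypothetical helper names.

```python
# Hypothetical host: only the /v1/tts/ws path comes from this page.
WS_URL = "wss://api.example.com/v1/tts/ws"


def handshake_headers(api_key: str) -> dict:
    """Return the extra headers to attach to the WebSocket upgrade request."""
    if not api_key:
        raise ValueError("an API key is required for the handshake")
    return {"x-api-key": api_key}


async def open_connection(api_key: str):
    """Open the socket with the key sent on the handshake itself."""
    # Imported lazily so handshake_headers works without the dependency.
    import websockets

    # `additional_headers` is the keyword in websockets 14 and later; the
    # older legacy client calls the same option `extra_headers`.
    return await websockets.connect(
        WS_URL, additional_headers=handshake_headers(api_key)
    )
```

From an async entry point this would be called as `ws = await open_connection(api_key)`. No authentication frame is sent afterwards; the first frame on the socket can be a start_context.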
Connection lifecycle
Start a context
Send a start_context frame with the voice, model, and (optionally) timestamps + response_format. The server replies with context_started.
Send text
Send one or more send_text frames. The server generates audio incrementally and emits audio_chunk frames (plus timestamps if enabled).
Flush
Send a flush frame to mark the end of the current utterance. The server finishes any in-progress generation and emits flush_completed.
Close the context
Send close_context to release server resources for that context. The server emits context_closed. The WebSocket itself stays open.
Audio format
WebSocket audio_chunk values are base64-encoded little-endian 16-bit PCM at 32 kHz mono by default. You can override this per context by passing response_format to start_context; it has the same shape as in the HTTP request.
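Putting the lifecycle and the default audio format together, the sketch below builds the client frames for one context in order and turns a received audio_chunk into playable WAV bytes. The JSON key names ("type", "context_id", "voice", "model", "text", "audio") and all function names are assumptions made for illustration; only the frame names and the default PCM parameters come from this page.

```python
import base64
import io
import json
import wave

# Defaults stated on this page: 16-bit little-endian PCM, 32 kHz, mono.
SAMPLE_RATE_HZ = 32_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def client_frames(context_id: str, voice: str, model: str, texts):
    """Yield the client frames for one utterance in lifecycle order.

    All JSON key names here are hypothetical.
    """
    yield json.dumps({"type": "start_context", "context_id": context_id,
                      "voice": voice, "model": model})
    for text in texts:
        yield json.dumps({"type": "send_text", "context_id": context_id,
                          "text": text})
    yield json.dumps({"type": "flush", "context_id": context_id})
    # A real client would likely wait for flush_completed before this frame
    # so that no trailing audio is lost.
    yield json.dumps({"type": "close_context", "context_id": context_id})


def decode_audio_chunk(frame: dict) -> bytes:
    """Return raw PCM bytes from an audio_chunk frame ("audio" key assumed)."""
    return base64.b64decode(frame["audio"])


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap default-format PCM in a WAV container for playback or saving."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

A client would send each string from client_frames over the socket and append decode_audio_chunk(frame) to a per-context buffer for every audio_chunk it receives. If response_format is overridden in start_context, the constants above no longer apply and must match the requested format.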
Next
- Frame reference — every client + server frame with shapes.
- Example: streaming a long document — concrete end-to-end walkthrough.