Frames are JSON objects sent as WebSocket text messages. The presence of a discriminator field (e.g. start_context, send_text, audio_chunk) identifies the frame type.
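Here is a minimal Python sketch of dispatching on the discriminator field. It assumes each frame carries exactly one of the frame-type keys listed on this page at the top level, and that options such as timestamps sit inside that key's value rather than next to it. That layout and the helper name frame_type are illustrative assumptions, not the documented schema.

```python
import json

# Frame-type keys named on this page. The top-level layout is an assumption.
CLIENT_FRAME_TYPES = {"start_context", "send_text", "flush", "close_context"}
SERVER_FRAME_TYPES = {
    "context_started", "audio_chunk", "timestamps",
    "flush_completed", "context_closed", "error",
}
ALL_FRAME_TYPES = CLIENT_FRAME_TYPES | SERVER_FRAME_TYPES


def frame_type(raw: str) -> str:
    """Return the discriminator key of a frame received as a WebSocket text message."""
    frame = json.loads(raw)
    if not isinstance(frame, dict):
        raise ValueError("frame is not a JSON object")
    # Assumes exactly one known discriminator key is present per frame.
    present = [key for key in frame if key in ALL_FRAME_TYPES]
    if len(present) != 1:
        raise ValueError(f"expected one discriminator field, found {present}")
    return present[0]
```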
Client → Server frames
start_context
Opens a new context. The server replies with context_started.
send_text
Add text to an open context. The server starts generating audio for it incrementally.
flush
Mark the end of the current utterance. The server finishes any in-progress generation and emits flush_completed.
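A sketch of building the client frames for one utterance as JSON strings: start_context, one send_text per piece of text, then flush. The payload layout is hypothetical: a top-level context_id, options nested under the discriminator key, and a text field inside send_text. The helper utterance_frames is also hypothetical. This page does not say whether a client should wait for context_started before sending text.

```python
import json


def utterance_frames(context_id: str, chunks: list[str], timestamps: bool = False) -> list[str]:
    """Build start_context, send_text (one per chunk) and flush frames as JSON strings.

    Hypothetical layout: context_id at the top level, options nested under the
    discriminator key, and a "text" field inside send_text.
    """
    frames = [{"start_context": {"timestamps": timestamps}, "context_id": context_id}]
    # Text may be sent in pieces; the server generates audio incrementally.
    frames += [{"send_text": {"text": chunk}, "context_id": context_id} for chunk in chunks]
    # flush marks the end of the utterance; the server answers with flush_completed.
    frames.append({"flush": {}, "context_id": context_id})
    return [json.dumps(frame) for frame in frames]
```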
close_context
Release server resources for a context. The WebSocket itself stays open.
Server → Client frames
context_started
Echoed by the server after a successful start_context.
audio_chunk
Audio bytes for a context. response_format is set: base64-encoded little-endian int16 PCM at 32 kHz mono.
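The base64-encoded little-endian int16 PCM at 32 kHz mono described above can be decoded with the Python standard library alone. This sketch takes the base64 string from an audio_chunk frame (this page does not name the field that holds it) and unpacks it as signed 16-bit samples. It assumes each chunk holds whole samples. If chunks can end mid-sample, concatenate the decoded bytes before unpacking.

```python
import base64
import struct

SAMPLE_RATE_HZ = 32_000  # 32 kHz mono
BYTES_PER_SAMPLE = 2     # int16


def decode_pcm_chunk(b64_audio: str) -> list[int]:
    """Decode a base64 payload into signed 16-bit little-endian mono samples."""
    raw = base64.b64decode(b64_audio)
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("payload is not a whole number of int16 samples")
    count = len(raw) // BYTES_PER_SAMPLE
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return list(struct.unpack(f"<{count}h", raw))


def duration_seconds(samples: list[int]) -> float:
    # Mono audio has one sample per time step, so duration = samples / rate.
    return len(samples) / SAMPLE_RATE_HZ
```

At 32 kHz mono int16, one second of audio is 32,000 samples, or 64,000 bytes of PCM before base64 encoding.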
timestamps
Word timing for a context (only when the context was started with timestamps: true).
flush_completed
Emitted after every audio_chunk for the flushed utterance has been sent.
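flush_completed follows the last audio_chunk of the flushed utterance. A client can therefore buffer audio per context and treat flush_completed as the signal that the utterance is complete. This minimal sketch assumes a hypothetical frame layout: context_id at the top level, and the base64 audio as the value of the audio_chunk key. The class name UtteranceAssembler is also illustrative.

```python
import base64
from collections import defaultdict
from typing import Optional


class UtteranceAssembler:
    """Buffer decoded audio per context and release it when flush_completed arrives.

    Hypothetical layout: context_id at the top level and the base64 audio as the
    value of the audio_chunk key.
    """

    def __init__(self) -> None:
        self._buffers: dict[str, bytearray] = defaultdict(bytearray)

    def handle(self, frame: dict) -> Optional[bytes]:
        ctx = frame["context_id"]
        if "audio_chunk" in frame:
            # Append raw PCM bytes; samples are not interpreted until the flush.
            self._buffers[ctx] += base64.b64decode(frame["audio_chunk"])
            return None
        if "flush_completed" in frame:
            # Every audio_chunk for the flushed utterance has already been sent.
            return bytes(self._buffers.pop(ctx, bytearray()))
        return None
```

The sketch concatenates raw bytes before interpreting them as samples. As a result, a sample that happens to be split across two chunks still comes out intact.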
context_closed
Confirmation that a context has been released.
error
Server-side error for a specific context or flush. The WebSocket stays open; you can continue with other contexts.
Frame ordering
Within a single context_id:
- context_started arrives before any audio_chunk or timestamps.
- audio_chunk and timestamps interleave during generation.
- flush_completed arrives after the last audio_chunk for the flushed utterance.
- context_closed arrives last.
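Ordering is only guaranteed within a context, and frames from different contexts can interleave on the same WebSocket. Use context_id to demultiplex on the client.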
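The following sketch demultiplexes on the client. It routes frames by context_id and rejects the out-of-order cases from the ordering rules: audio_chunk or timestamps before context_started, and any non-error frame after context_closed. Several choices here are assumptions: the frame layout (a top-level context_id and one discriminator key per frame), accepting error frames in any state, and the class name ContextRouter. The sketch does not check the flush_completed rule, because that would require knowing which chunks belong to which utterance.

```python
from collections import defaultdict

# Server frame-type keys from this page; assumed to appear once per frame.
SERVER_TYPES = ("context_started", "audio_chunk", "timestamps",
                "flush_completed", "context_closed", "error")


class ContextRouter:
    """Route server frames by context_id and enforce the per-context ordering rules."""

    def __init__(self) -> None:
        self.frames: dict[str, list[str]] = defaultdict(list)  # arrival order per context
        self._state: dict[str, str] = {}  # context_id -> "open" or "closed"

    def route(self, frame: dict) -> str:
        ctx = frame["context_id"]
        kind = next((k for k in SERVER_TYPES if k in frame), None)
        if kind is None:
            raise ValueError("no known server discriminator field")
        state = self._state.get(ctx)
        if kind == "error":
            pass  # scoped to one context; the socket and other contexts carry on
        elif state == "closed":
            raise ValueError(f"{kind} after context_closed for {ctx}")
        elif kind == "context_started":
            if state == "open":
                raise ValueError(f"duplicate context_started for {ctx}")
            self._state[ctx] = "open"
        elif kind in ("audio_chunk", "timestamps") and state is None:
            raise ValueError(f"{kind} before context_started for {ctx}")
        elif kind == "context_closed":
            self._state[ctx] = "closed"
        self.frames[ctx].append(kind)
        return kind
```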
See also
- Example: streaming a long document — end-to-end walkthrough.
- Overview — lifecycle, browser caveat, authentication.