This walkthrough shows feeding sentences from a long document into a single WebSocket context, receiving PCM audio chunks as they’re generated, and writing the result to a single WAV file at the end.
Setup
Install the SDK and have `KOVA_API_KEY` set in your environment.
Example
What’s happening
- Open one context for the whole document. Reusing a context means the voice and format stay consistent across the entire output, with no warm-up overhead between sentences.
- Send each sentence as a separate `send_text` frame with trailing whitespace so the model treats them as sequential prose, not concatenated tokens.
- Audio streams back incrementally: by the time you’ve sent the third sentence, audio for the first is already arriving.
- `flush` with a sentinel `flush_id` lets you know when the server has finished generating the last sentence (you wait for the matching `flush_completed`).
- Assemble PCM and write a single WAV file at the end. Python uses stdlib `wave`; JS uses the exported `pcm16ToWavBytes` primitive + `node:fs`. The SDKs don’t currently ship a higher-level `writePcm16WavFile` helper; if one is added later, this page should be updated to prefer it.
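The client-side parts of these steps can be sketched without the SDK. This is a minimal sketch, not the SDK's API: it assumes 16-bit mono PCM at 24 kHz (the sample rate is an assumption; use the rate you requested in `start_context`) and assumes events arrive as plain dicts with hypothetical `type`, `audio`, and `flush_id` keys. The helper names `with_trailing_space`, `collect_until_flush`, and `write_wav` are illustrative only.

```python
import wave
from typing import Iterable, List

SAMPLE_RATE = 24_000  # assumption: match the rate requested in start_context


def with_trailing_space(sentences: Iterable[str]) -> List[str]:
    """Ensure every sentence ends in whitespace before it is sent as send_text."""
    return [s if s[-1:].isspace() else s + " " for s in sentences]


def collect_until_flush(events: Iterable[dict], flush_id: str) -> bytes:
    """Concatenate PCM from audio_chunk events until the matching flush_completed.

    The event shape ("type", "audio", "flush_id") is hypothetical.
    """
    pcm = bytearray()
    for event in events:
        if event["type"] == "audio_chunk":
            pcm.extend(event["audio"])
        elif event["type"] == "flush_completed" and event.get("flush_id") == flush_id:
            break  # the sentinel flush finished, so the last sentence is done
    return bytes(pcm)


def write_wav(path: str, pcm: bytes, sample_rate: int = SAMPLE_RATE) -> None:
    """Write raw 16-bit mono PCM to a WAV file with the stdlib wave module."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

Buffering everything in a `bytearray` and writing once at the end keeps the WAV header simple, since the total length is known when the file is opened. For very long documents, writing chunks to the open `wave` writer as they arrive would avoid holding all audio in memory.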
Variations
- Multiple parallel contexts: open `ctx-narration` and `ctx-sfx` concurrently with different voices. Demultiplex `audio_chunk` by `context_id` on the client.
- Different output format: swap `encoding: "pcm"` for `encoding: "mp3"` and skip the WAV-header step.
- Without timestamps: omit `timestamps: true` from `start_context` to skip word-timing frames.
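For the parallel-contexts variation, client-side demultiplexing might look like the following. This is a sketch under assumptions: events are modeled as plain dicts with hypothetical `type`, `context_id`, and `audio` keys, and `demux_audio` is an illustrative helper, not an SDK function.

```python
from collections import defaultdict
from typing import Dict, Iterable


def demux_audio(events: Iterable[dict]) -> Dict[str, bytes]:
    """Group audio_chunk payloads by context_id, keeping arrival order per context.

    The event shape ("type", "context_id", "audio") is hypothetical.
    """
    buffers = defaultdict(bytearray)
    for event in events:
        # Non-audio frames (for example word timings) are ignored here.
        if event.get("type") == "audio_chunk":
            buffers[event["context_id"]].extend(event["audio"])
    return {context_id: bytes(buf) for context_id, buf in buffers.items()}
```

Each context's buffer can then be written to its own file, so interleaved chunks from `ctx-narration` and `ctx-sfx` never end up in the same audio stream.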