Text to speech
Synchronous TTS. Returns base64-encoded audio after generation completes.
POST /v1/tts is the simplest way to generate audio: send text, get a complete audio file back as base64. Use this when you have a single utterance and want to write the result to a file or play it back after generation completes.
If you need to start playback before generation finishes, use Streaming TTS instead.
Examples
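The following is a minimal sketch of a synchronous request in Python, using only the standard library. Only the path `POST /v1/tts`, the `response_format` parameter, and the base64-encoded result come from this page. The host `api.kova.ai`, the bearer-token header, the request field `text`, and the response field `audio` are assumptions made for illustration; check the request and response schema for the real names.

```python
import base64
import json
import urllib.request

# Assumed host; only the path /v1/tts is documented on this page.
API_URL = "https://api.kova.ai/v1/tts"


def build_request(text, api_key, response_format="mp3"):
    """Build the POST request. Field names and auth scheme are assumptions."""
    body = {
        "text": text,                        # assumed field name
        "response_format": response_format,  # bare encoding string or object
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(response_body):
    """Extract audio bytes from a JSON response body.

    Assumes the base64 payload is stored under an "audio" key.
    """
    payload = json.loads(response_body)
    return base64.b64decode(payload["audio"])


def synthesize_to_file(text, api_key, path):
    """Send the text, wait for generation to complete, write the audio to disk."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        audio = decode_audio(resp.read())
    with open(path, "wb") as f:
        f.write(audio)
```

Under these assumptions, `synthesize_to_file("Hello, world.", api_key, "hello.mp3")` would block until the full file has been generated and then write it in one step, which is the trade-off this endpoint makes against Streaming TTS.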
response_format
response_format accepts an object or, for backward compatibility, a bare encoding string (e.g. "mp3").
| encoding | sample_rate | bitrate | Notes |
|---|---|---|---|
| mp3 (default) | 16000–48000 Hz (default 32000) | 32–320 kbps (default 128k) | Compressed; best general-purpose default. |
| wav | 8000–48000 Hz (default 32000) | n/a | Uncompressed; one header per file. |
| pcm | 8000–48000 Hz (default 32000) | n/a | Raw signed 16-bit little-endian. Best for streaming if you assemble your own header. |
| linear16 | 8000–48000 Hz (default 32000) | n/a | Equivalent to pcm but emits a WAV header per chunk during streaming. |
| opus | one of 8000, 12000, 16000, 24000, 48000 (default 48000) | 32–192 kbps (default 64k) | Compressed, low-latency. |
| mulaw | 8000 Hz only | n/a | Telephony (G.711). |
| alaw | 8000 Hz only | n/a | Telephony (G.711). |
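The constraints in the table can be checked client-side before sending a request. The sketch below, in Python, accepts either form of `response_format` (a bare string or an object) and fills in the listed defaults. The object keys `encoding`, `sample_rate`, and `bitrate` are taken from the column names; expressing the bitrate as an integer in kbps and treating the ranges as inclusive are assumptions, and the server may accept or format values differently.

```python
# Limits transcribed from the table; bitrates are in kbps (assumed unit on the wire).
WIDE = range(8000, 48001)
FORMATS = {
    "mp3": {"rates": range(16000, 48001), "rate": 32000,
            "bitrates": range(32, 321), "bitrate": 128},
    "wav": {"rates": WIDE, "rate": 32000, "bitrates": None},
    "pcm": {"rates": WIDE, "rate": 32000, "bitrates": None},
    "linear16": {"rates": WIDE, "rate": 32000, "bitrates": None},
    "opus": {"rates": (8000, 12000, 16000, 24000, 48000), "rate": 48000,
             "bitrates": range(32, 193), "bitrate": 64},
    "mulaw": {"rates": (8000,), "rate": 8000, "bitrates": None},
    "alaw": {"rates": (8000,), "rate": 8000, "bitrates": None},
}


def normalize_response_format(fmt="mp3"):
    """Return a fully populated response_format object, or raise ValueError."""
    if isinstance(fmt, str):
        fmt = {"encoding": fmt}  # bare-string form, kept for backward compatibility
    encoding = fmt.get("encoding", "mp3")
    if encoding not in FORMATS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    spec = FORMATS[encoding]

    sample_rate = fmt.get("sample_rate", spec["rate"])
    if sample_rate not in spec["rates"]:
        raise ValueError(f"sample_rate {sample_rate} not allowed for {encoding}")
    result = {"encoding": encoding, "sample_rate": sample_rate}

    if spec["bitrates"] is None:
        # The table lists bitrate as n/a for uncompressed and telephony encodings.
        if "bitrate" in fmt:
            raise ValueError(f"bitrate does not apply to {encoding}")
    else:
        bitrate = fmt.get("bitrate", spec["bitrate"])
        if bitrate not in spec["bitrates"]:
            raise ValueError(f"bitrate {bitrate} kbps not allowed for {encoding}")
        result["bitrate"] = bitrate
    return result
```

For example, `normalize_response_format("opus")` yields an object with a 48000 Hz sample rate and a 64 kbps bitrate, while requesting `mulaw` at 16000 Hz raises a `ValueError` instead of reaching the API.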
See also
- Streaming TTS — same request shape, lower time-to-first-audio.
- Voices — available speaker ids.
- Errors — what 4xx and 5xx bodies look like.
Authorizations
Body
application/json