POST /v1/tts
Text to speech
curl --request POST \
  --url https://api.kova.ai/v1/tts \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "text": "<string>",
  "voice": "<string>",
  "normalize_text": false,
  "response_format": {
    "bitrate": "<string>",
    "encoding": "mp3",
    "sample_rate": 123
  },
  "temperature": 123,
  "timestamps": false
}
'

Example response:
{
  "audio": "<string>",
  "timestamps": {
    "end_seconds": [
      123
    ],
    "start_seconds": [
      123
    ],
    "words": [
      "<string>"
    ]
  }
}

POST /v1/tts is the simplest way to generate audio: send text and receive a complete audio file, base64-encoded, in the audio field of the response. Use it when you have a single utterance and want to write the result to a file or play it back after generation completes. If you need to start playback before generation finishes, use Streaming TTS instead.
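
For callers who are not using the SDK, the request and response shapes from the curl example can be sketched with the Python standard library alone. The helper names (build_tts_request, decode_audio, synthesize) are illustrative and not part of any Kova library; the URL, header, and field names are taken from the curl example on this page. The sketch assumes the audio field is standard base64.

```python
import base64
import json
import urllib.request

TTS_URL = "https://api.kova.ai/v1/tts"  # from the curl example on this page


def build_tts_request(api_key: str, text: str, voice: str,
                      encoding: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tts request. Hypothetical helper."""
    payload = {
        "text": text,
        "voice": voice,
        "response_format": {"encoding": encoding},
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def decode_audio(response_body: bytes) -> bytes:
    """Extract the `audio` field of a JSON response body and base64-decode it."""
    return base64.b64decode(json.loads(response_body)["audio"])


def synthesize(api_key: str, text: str, voice: str) -> bytes:
    """Send the request and return decoded audio bytes.

    Needs network access and a valid API key, so it is not called here.
    """
    with urllib.request.urlopen(build_tts_request(api_key, text, voice)) as resp:
        return decode_audio(resp.read())
```

Writing the bytes returned by synthesize(...) to a file opened in binary mode ("wb") should yield a playable file in the requested encoding.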

Examples

from kova_tts import KovaTTSClient, AudioResponseFormat

client = KovaTTSClient(api_key="kova_sk_...")

# Synthesize one utterance and request word-level timestamps with it.
result = client.tts(
    text="Welcome to Kova.",
    voice="cal",
    response_format=AudioResponseFormat(encoding="mp3"),
    timestamps=True,
)

# Save the returned audio to disk.
client.write_audio_file(result.audio, "welcome.mp3")

# timestamps may be absent, so check before reading it.
if result.timestamps:
    print(result.timestamps.words)
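
The timestamps object in the example response carries three arrays: words, start_seconds, and end_seconds. A minimal sketch of pairing them into per-word timings follows, assuming the arrays are index-aligned (the page does not state this explicitly). WordTiming and word_timings are hypothetical names, and the sample values are made up for illustration, not real API output.

```python
from typing import List, NamedTuple


class WordTiming(NamedTuple):
    word: str
    start: float  # seconds
    end: float    # seconds


def word_timings(timestamps: dict) -> List[WordTiming]:
    """Zip the three timestamp arrays into one record per word.

    Assumes the arrays are index-aligned; raises if their lengths differ.
    """
    words = timestamps["words"]
    starts = timestamps["start_seconds"]
    ends = timestamps["end_seconds"]
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("timestamp arrays differ in length")
    return [WordTiming(w, s, e) for w, s, e in zip(words, starts, ends)]


# Made-up values for illustration only.
sample = {
    "words": ["Welcome", "to", "Kova."],
    "start_seconds": [0.0, 0.4, 0.5],
    "end_seconds": [0.4, 0.5, 1.0],
}

for t in word_timings(sample):
    print(f"{t.start:.2f}-{t.end:.2f}  {t.word}")
```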

response_format

response_format accepts an object or, for backward compatibility, a bare encoding string (e.g. "mp3").
| encoding | sample_rate | bitrate | Notes |
|---|---|---|---|
| mp3 (default) | 16000–48000 Hz (default 32000) | 32–320 kbps (default 128k) | Compressed; best general-purpose default. |
| wav | 8000–48000 Hz (default 32000) | n/a | Uncompressed; one header per file. |
| pcm | 8000–48000 Hz (default 32000) | n/a | Raw signed 16-bit little-endian. Best for streaming if you assemble your own header. |
| linear16 | 8000–48000 Hz (default 32000) | n/a | Equivalent to pcm but emits a WAV header per chunk during streaming. |
| opus | one of 8000, 12000, 16000, 24000, 48000 (default 48000) | 32–192 kbps (default 64k) | Compressed, low-latency. |
| mulaw | 8000 Hz only | n/a | Telephony (G.711). |
| alaw | 8000 Hz only | n/a | Telephony (G.711). |
Encoding aliases are accepted and normalized: linear_pcm / linear-pcm / pcm_s16le / raw / mu-law / μ-law / ulaw / u-law / a-law. Documented names are canonical.
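
The constraints in the table can be checked on the client before a request is sent. The sketch below is one way to do that and is not the server's validation logic. Its assumptions: any integer sample rate inside a listed range is acceptable, bitrate strings look like "128k", and a bitrate supplied for an encoding marked n/a is treated as an error. check_response_format is a hypothetical helper. Aliases are deliberately not handled, because the page does not say which canonical name each alias maps to.

```python
from typing import Union

OPUS_RATES = (8000, 12000, 16000, 24000, 48000)

# encoding -> (allowed sample rates, default sample rate, bitrate range in kbps or None)
# Transcribed from the table on this page.
FORMATS = {
    "mp3": (range(16000, 48001), 32000, (32, 320)),
    "wav": (range(8000, 48001), 32000, None),
    "pcm": (range(8000, 48001), 32000, None),
    "linear16": (range(8000, 48001), 32000, None),
    "opus": (OPUS_RATES, 48000, (32, 192)),
    "mulaw": ((8000,), 8000, None),
    "alaw": ((8000,), 8000, None),
}


def check_response_format(fmt: Union[str, dict]) -> dict:
    """Return a response_format dict with defaults filled in, or raise ValueError."""
    if isinstance(fmt, str):  # bare encoding string, e.g. "mp3"
        fmt = {"encoding": fmt}
    encoding = fmt.get("encoding", "mp3")
    if encoding not in FORMATS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    rates, default_rate, bitrate_range = FORMATS[encoding]

    sample_rate = fmt.get("sample_rate", default_rate)
    if sample_rate not in rates:
        raise ValueError(f"sample_rate {sample_rate} not allowed for {encoding}")
    checked = {"encoding": encoding, "sample_rate": sample_rate}

    bitrate = fmt.get("bitrate")
    if bitrate is not None:
        if bitrate_range is None:
            # Assumption: the table lists bitrate as n/a for this encoding.
            raise ValueError(f"{encoding} does not take a bitrate")
        kbps = int(str(bitrate).rstrip("k"))  # assumes the "128k" style
        low, high = bitrate_range
        if not low <= kbps <= high:
            raise ValueError(f"bitrate {bitrate} outside {low}-{high} kbps")
        checked["bitrate"] = bitrate
    return checked
```

For example, check_response_format("mp3") fills in the default sample rate of 32000, while check_response_format({"encoding": "mulaw", "sample_rate": 16000}) raises ValueError.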

See also

  • Streaming TTS — same request shape, lower time-to-first-audio.
  • Voices — available speaker ids.
  • Errors — what 4xx and 5xx bodies look like.

Authorizations

x-api-key · string · header · required

Body

application/json
text · string · required
voice · string · required
normalize_text · boolean · default: false
response_format · AudioResponseFormat · object
temperature · number | null
timestamps · boolean · default: false

Response

Successful Response

audio · string · required
timestamps · SyncTimestamps · object