Kova uses concurrency limits, not per-minute or per-hour throttles.

Per-key limit

| Tier | Per-key max in-flight | Per-minute / per-hour | Max chars per request |
| --- | --- | --- | --- |
| Standard (kova_sk_) | 10 | no throttle | no limit |

If a single key already has 10 requests in flight, the 11th is rejected immediately with a 429; it is not queued server-side.

Server-wide semaphore

The server has a single global semaphore that bounds total concurrent inference (default MAX_CONCURRENT_TTS=15). Under sustained heavy load — many keys saturating their per-key limit at once — requests may queue briefly even if your key is under its individual limit. In practice this is rare; the per-key limit is the binding constraint for almost all integrations.

Queue locally

Maintain a bounded semaphore in your application sized to the per-key limit (10). Once it is full, either drop new work or apply backpressure to callers.
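
As a rough illustration, a local queue could look like the following asyncio sketch. `BoundedSender`, `max_waiting`, and the `send` coroutine are hypothetical names chosen for this example, not part of any Kova SDK; only the limit of 10 comes from the table above.

```python
import asyncio

PER_KEY_LIMIT = 10  # per-key max in-flight, from the limits table


class BoundedSender:
    """Caps concurrent requests for one key and drops work when too many callers wait."""

    def __init__(self, limit: int = PER_KEY_LIMIT, max_waiting: int = 100):
        self._slots = asyncio.Semaphore(limit)
        self._max_waiting = max_waiting  # assumed local queue bound
        self._waiting = 0

    async def run(self, send):
        # `send` is a hypothetical zero-argument coroutine function
        # that performs exactly one request.
        if self._waiting >= self._max_waiting:
            raise RuntimeError("local queue full; dropping request")
        self._waiting += 1
        try:
            await self._slots.acquire()  # backpressure: wait for a free slot
        finally:
            self._waiting -= 1
        try:
            return await send()
        finally:
            self._slots.release()  # always free the slot, even on error
```

Create the `BoundedSender` inside a running event loop (for example, within the coroutine passed to `asyncio.run`) and share one instance per API key. Raising when the wait list is full is the "drop" option; raising `max_waiting` leans toward backpressure instead.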

Retry with backoff

On 429, retry with exponential backoff + jitter. Start at 200ms, cap at 5s.
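
One way to sketch this retry policy in Python is shown below. The 200 ms start and 5 s cap come from the guidance above; the jitter strategy (a random delay between half the ceiling and the ceiling), the retry count, and the `send` callable returning `(status_code, body)` are assumptions made for this example.

```python
import random
import time


def backoff_delays(base=0.2, cap=5.0, max_retries=6):
    """Yield jittered delays; the ceiling doubles from `base` up to `cap` seconds."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        # Assumed jitter strategy: pick uniformly in [ceiling / 2, ceiling].
        yield random.uniform(ceiling / 2, ceiling)


def call_with_retry(send, max_retries=6, sleep=time.sleep):
    # `send` is a hypothetical callable that performs one request
    # and returns (status_code, body).
    status, body = send()
    for delay in backoff_delays(max_retries=max_retries):
        if status != 429:
            break
        sleep(delay)  # wait before retrying the rejected request
        status, body = send()
    return status, body  # may still be a 429 if all retries were used
```

Passing `sleep` as a parameter keeps the function easy to test; the caller decides what to do if the final response is still a 429.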

Split across keys

For higher throughput, create multiple keys (one per service / environment) — each gets its own per-key budget.
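
A small sketch of how per-service keys might be wired up, so that each service tracks its own budget independently. The service names, environment variable names, and the `KeyBudget` / `load_budgets` helpers are all hypothetical; only the limit of 10 per key is taken from this page.

```python
import os
import threading
from dataclasses import dataclass, field

PER_KEY_LIMIT = 10  # each key gets its own budget of in-flight requests


@dataclass
class KeyBudget:
    api_key: str
    # One semaphore per key, so services do not compete for slots locally.
    slots: threading.BoundedSemaphore = field(
        default_factory=lambda: threading.BoundedSemaphore(PER_KEY_LIMIT)
    )


def load_budgets(env_vars, environ=os.environ):
    """Build one independent budget per service from {service: env var name}."""
    budgets = {}
    for service, var in env_vars.items():
        key = environ.get(var)
        if not key:
            raise KeyError(f"missing API key for {service}: set {var}")
        budgets[service] = KeyBudget(api_key=key)
    return budgets
```

For example, `load_budgets({"ingest": "KOVA_KEY_INGEST", "realtime": "KOVA_KEY_REALTIME"})` would give two budgets of 10 slots each, and `with budgets["ingest"].slots:` would hold one slot for the duration of a request.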

Reuse WebSocket

For real-time apps, one WebSocket connection counts as one in-flight slot regardless of how many utterances you send through it.

Asking for higher limits

For sustained high-volume traffic, contact us to discuss provisioned capacity.