Rate limits

Kova uses concurrency limits, not per-minute or per-hour throttles.

Per-key limit

Tier	Per-key max in-flight	Per-minute / per-hour	Max chars per request
Standard (`kova_sk_`)	10	no throttle	no limit

If a key already has 10 requests in flight, the 11th receives a 429 immediately; it is not queued server-side.

Server-wide semaphore

The server has a single global semaphore that bounds total concurrent inference (default `MAX_CONCURRENT_TTS=15`). Under sustained heavy load, when many keys saturate their per-key limits at once, requests may queue briefly even if your key is under its individual limit. In practice this is rare: the per-key limit is the binding constraint for almost all integrations.

Recommended client-side patterns

Queue locally

Maintain a bounded semaphore in your application that matches the per-key limit. Once it is full, either drop new work or apply backpressure to callers.
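A minimal sketch of this pattern in Python with `asyncio`, assuming your client exposes the request as a zero-argument coroutine function. `BoundedSlots`, `SlotsFull`, and `make_call` are hypothetical names used for illustration only; they are not part of any Kova SDK.

```python
import asyncio

PER_KEY_LIMIT = 10  # per-key max in-flight, from the table above


class SlotsFull(Exception):
    """Raised in drop mode when every local slot is taken."""


class BoundedSlots:
    """Caps concurrent calls for one API key (hypothetical helper)."""

    def __init__(self, limit: int = PER_KEY_LIMIT) -> None:
        self._sem = asyncio.Semaphore(limit)

    async def run(self, make_call, drop_when_full: bool = False):
        # Drop mode: fail fast instead of waiting for a free slot.
        if drop_when_full and self._sem.locked():
            raise SlotsFull("all local slots are in use")
        # Backpressure mode: the caller waits here until a slot frees up.
        async with self._sem:
            return await make_call()
```

Create one `BoundedSlots` per key inside your running event loop and route every request for that key through `run`. With the default mode, callers wait for a slot, so the application should not trigger the 429 from the per-key limit on its own. With `drop_when_full=True`, excess work is rejected locally.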

Retry with backoff

On 429, retry with exponential backoff and jitter. Start at 200 ms and cap the delay at 5 s.
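One way to sketch this schedule in Python is shown below. It uses the "full jitter" variant (a random delay between zero and the current exponential ceiling), which is an assumption; the text above does not say which jitter strategy to use. `send` is a hypothetical zero-argument callable that returns a `(status, body)` tuple, and `max_attempts` is an illustrative default.

```python
import random
import time

BASE_DELAY_S = 0.2  # start at 200 ms
MAX_DELAY_S = 5.0   # cap at 5 s


def backoff_delay(attempt: int, rng=random.random) -> float:
    """Full-jitter delay in seconds for a zero-based retry attempt."""
    # Clamp the exponent so very large attempt numbers cannot overflow.
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** min(attempt, 16))
    return rng() * ceiling


def call_with_retry(send, max_attempts: int = 6, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Sleep only if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

The ceiling doubles on each retry (0.2 s, 0.4 s, 0.8 s, ...) until it reaches 5 s. The jitter spreads retries out so that clients rejected at the same moment do not all retry at the same moment.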

Split across keys

For higher throughput, create multiple keys (for example, one per service or environment). Each key gets its own per-key budget.
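If a single process holds several keys, the separate budgets can be tracked with a small pool that hands out the least-loaded key. This is only a sketch under that assumption: `KeyPool` and `KeySlot` are hypothetical names, the key strings in the usage note are placeholders, and the class is not thread-safe.

```python
from dataclasses import dataclass

PER_KEY_LIMIT = 10  # per-key max in-flight, from the table above


@dataclass
class KeySlot:
    api_key: str
    in_flight: int = 0


class KeyPool:
    """Tracks in-flight requests per key (hypothetical helper)."""

    def __init__(self, api_keys, limit: int = PER_KEY_LIMIT) -> None:
        if not api_keys:
            raise ValueError("at least one API key is required")
        self._slots = [KeySlot(key) for key in api_keys]
        self._limit = limit

    def acquire(self):
        """Return the least-loaded key, or None if every key is at its limit."""
        slot = min(self._slots, key=lambda s: s.in_flight)
        if slot.in_flight >= self._limit:
            return None
        slot.in_flight += 1
        return slot.api_key

    def release(self, api_key: str) -> None:
        """Free one slot on the given key after its request finishes."""
        for slot in self._slots:
            if slot.api_key == api_key and slot.in_flight > 0:
                slot.in_flight -= 1
                return
        raise KeyError("no in-flight request recorded for this key")
```

For example, `KeyPool(["kova_sk_placeholder_a", "kova_sk_placeholder_b"])` allows up to 20 requests in flight locally, 10 per key. Call `acquire()` before each request and `release(key)` in a `finally` block once it completes.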

Reuse WebSocket connections

For real-time apps, one WebSocket connection counts as one in-flight slot regardless of how many utterances you send through it.

Asking for higher limits

For sustained high-volume traffic, contact us to discuss provisioned capacity.
