Kova uses concurrency limits, not per-minute or per-hour throttles.
Per-key limit
| Tier | Max in-flight requests per key | Per-minute / per-hour throttle | Max characters per request |
|---|---|---|---|
| Standard (kova_sk_) | 10 | None | No limit |
Server-wide semaphore
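The server also has a single global semaphore that bounds total concurrent inference across all keys (default MAX_CONCURRENT_TTS=15). Under sustained heavy load, when many keys are at their per-key limit at the same time, requests may queue briefly even if your key is under its own limit. At the default value, two Standard keys running 10 requests each (20 in total) are already enough to exceed the global bound.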
In practice this is rare; the per-key limit is the binding constraint for almost all integrations.
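A small worked model of how the two limits combine may help. It assumes the Standard per-key limit of 10 and the default MAX_CONCURRENT_TTS of 15; the function name admitted_and_waiting is hypothetical, and a given deployment may configure a different global value.

```python
# Sketch: how per-key and server-wide concurrency limits combine.
# Assumed values: Standard per-key limit of 10 and the default
# MAX_CONCURRENT_TTS of 15 (a deployment may configure a different value).
from typing import List, Tuple

PER_KEY_LIMIT = 10
MAX_CONCURRENT_TTS = 15


def admitted_and_waiting(
    inflight_per_key: List[int],
    per_key_limit: int = PER_KEY_LIMIT,
    global_limit: int = MAX_CONCURRENT_TTS,
) -> Tuple[int, int]:
    """Return (running, queued) at the global semaphore.

    inflight_per_key holds how many requests each key is attempting; only
    the first per_key_limit requests of each key are counted.
    """
    within_key_limits = sum(min(n, per_key_limit) for n in inflight_per_key)
    running = min(within_key_limits, global_limit)
    queued = within_key_limits - running
    return running, queued


print(admitted_and_waiting([10]))      # (10, 0): one saturated key fits
print(admitted_and_waiting([10, 10]))  # (15, 5): two saturated keys overflow
```

With a single key, the per-key limit (10) is always below the global bound (15), which is why the per-key limit is the one most integrations hit.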
Recommended client-side patterns
Queue locally
Keep a bounded semaphore in your application sized to the per-key limit (10 for Standard keys). Once it is full, either drop new requests or apply backpressure so callers wait for a free slot.
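A minimal sketch of such a limiter with Python's asyncio, assuming your API calls are coroutines. KeyLimiter, submit, and try_submit are hypothetical names: submit applies backpressure by waiting for a slot, while try_submit drops the request (returns None) when every slot is busy.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

PER_KEY_LIMIT = 10  # Standard-tier per-key in-flight limit


class KeyLimiter:
    """Caps the number of in-flight requests sent with one API key."""

    def __init__(self, limit: int = PER_KEY_LIMIT) -> None:
        self._sem = asyncio.Semaphore(limit)

    async def submit(self, call: Callable[[], Awaitable[T]]) -> T:
        """Backpressure: wait for a free slot, then run the request."""
        async with self._sem:
            return await call()

    async def try_submit(self, call: Callable[[], Awaitable[T]]) -> Optional[T]:
        """Drop: return None immediately if every slot is busy."""
        if self._sem.locked():
            return None
        # acquire() does not suspend when a slot is free, so no other task
        # can take the slot between the check above and the acquire here.
        async with self._sem:
            return await call()
```

Wrap each outgoing request in one of the two methods; a single shared KeyLimiter per key keeps the whole process under that key's limit.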
Retry with backoff
On a 429 response, retry with exponential backoff plus jitter: start at 200 ms and cap the delay at 5 s.
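A sketch of that retry loop using the "full jitter" variant, where each delay is drawn uniformly between 0 and the current exponential ceiling. It assumes send is your own function returning an (HTTP status, body) pair; backoff_delay, call_with_retry, and the default of 8 attempts are illustrative choices, not part of the Kova API.

```python
import random
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

BASE_DELAY_S = 0.2  # first backoff ceiling: 200 ms
MAX_DELAY_S = 5.0   # never wait longer than 5 s


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay before retry number `attempt` (0-based)."""
    # Exponential ceiling 0.2, 0.4, 0.8, ... seconds, capped at 5 s.
    # min(attempt, 30) keeps 2 ** attempt from overflowing a float.
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** min(attempt, 30))
    uniform = rng.uniform if rng is not None else random.uniform
    return uniform(0.0, ceiling)


def call_with_retry(
    send: Callable[[], Tuple[int, T]],
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, T]:
    """Call `send` (hypothetical: returns (status, body)) until it is not a 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body  # still 429 after the last attempt
```

Randomizing the delay keeps many clients that were rejected at the same moment from retrying in lockstep; other jitter schemes would also fit the guidance above.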
Split across keys
For higher throughput, create multiple keys (for example, one per service or environment). Each key gets its own per-key budget.
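One way to spread load over several keys is a small pool that hands out, in round-robin order, whichever key still has a free slot. KeyPool and the key strings are hypothetical placeholders; the sketch only tracks counts and is not thread-safe, so call release() when each request finishes and add a lock if several threads share it.

```python
import itertools
from typing import Dict, List, Optional

PER_KEY_LIMIT = 10  # Standard-tier per-key in-flight limit


class KeyPool:
    """Spreads requests over several API keys, each with its own budget."""

    def __init__(self, keys: List[str], per_key_limit: int = PER_KEY_LIMIT) -> None:
        if not keys:
            raise ValueError("need at least one key")
        self._limit = per_key_limit
        self._in_flight: Dict[str, int] = {k: 0 for k in keys}
        self._order = itertools.cycle(keys)
        self._n = len(keys)

    def acquire(self) -> Optional[str]:
        """Return a key with a free slot (round-robin), or None if all are full."""
        for _ in range(self._n):
            key = next(self._order)
            if self._in_flight[key] < self._limit:
                self._in_flight[key] += 1
                return key
        return None

    def release(self, key: str) -> None:
        """Mark one request on `key` as finished."""
        if self._in_flight[key] <= 0:
            raise ValueError(f"no in-flight request on {key!r}")
        self._in_flight[key] -= 1

    @property
    def capacity(self) -> int:
        """Total in-flight requests the pool allows across all keys."""
        return self._limit * self._n
```

Adding keys raises your combined per-key budget, but all keys still share the server-wide semaphore described above.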
Reuse WebSocket
For real-time apps, one WebSocket connection counts as one in-flight slot regardless of how many utterances you send through it.
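A minimal sketch of keeping one connection open and sending every utterance through it, instead of opening a connection per utterance. The Connection protocol, the connect factory, the synchronous style, and sending plain text strings are all assumptions for illustration; the real endpoint and message format are not shown here.

```python
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    """Hypothetical minimal interface; adapt it to your WebSocket client."""

    def send(self, message: str) -> None: ...

    def close(self) -> None: ...


class UtteranceStream:
    """Opens one connection lazily and sends every utterance through it,
    so the whole session occupies a single in-flight slot."""

    def __init__(self, connect: Callable[[], Connection]) -> None:
        self._connect = connect
        self._conn: Optional[Connection] = None

    def speak(self, text: str) -> None:
        if self._conn is None:
            self._conn = self._connect()  # open once, on first use
        self._conn.send(text)  # placeholder payload, not the real message format

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Because the connection is created once and reused, a long conversation costs one slot of your per-key budget rather than one per utterance.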