API Rate Limits and 429 Handling

The Lilac API applies a default rate limit per organization to keep shared inference capacity fair across customers.

Default Limit

200 requests per minute per organization.

This limit applies to all Lilac API requests, including inference calls (chat completions, completions, and responses).
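One way to stay under the default limit is to throttle on the client side before requests are sent. The following is a minimal sketch, not part of the Lilac API: the class name `SlidingWindowThrottle` and its parameters are hypothetical, and it assumes a sliding 60-second window, which may not match how the server counts requests. Because the limit is per organization, a throttle like this only helps when a single process sends all of the organization's traffic.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Client-side throttle: at most `limit` calls per `window` seconds.

    Hypothetical helper; not thread-safe as written.
    """

    def __init__(self, limit=200, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest recorded call leaves it.
            self._sleep(self.window - (now - self._sent[0]))
```

Call `throttle.acquire()` immediately before each API request. A client-side throttle reduces 429 responses but does not replace handling them, since other clients in the same organization share the limit.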

429 Too Many Requests

Requests above the limit may receive an HTTP 429 Too Many Requests response. When this happens:

Back off and retry later.
If a Retry-After header or retry_after field is present, respect it.
Use exponential backoff for automated clients.
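The steps above can be sketched as a small retry helper. This is an illustration under stated assumptions, not an official client: `retry_delay` and `call_with_backoff` are hypothetical names, `send` is any callable you supply that returns `(status_code, headers, body)`, the `retry_after` field is assumed to sit at the top level of a JSON body, and the header lookup assumes the key is spelled `Retry-After`. The base delay, cap, and attempt count are arbitrary example values.

```python
import random
import time


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # A server-provided hint takes priority over our own schedule.
            return max(0.0, float(retry_after))
        except (TypeError, ValueError):
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the last 429 back to the caller
        # Prefer the header, then the body field (assumed top-level).
        hint = headers.get("Retry-After")
        if hint is None and isinstance(body, dict):
            hint = body.get("retry_after")
        sleep(retry_delay(attempt, hint))
    return status, headers, body
```

Jitter spreads retries from many clients over time so they do not all hit the API again at the same instant.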

Example

curl -i https://api.getlilac.com/v1/chat/completions \
  -H "Authorization: Bearer your-lilac-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "model": "moonshotai/kimi-k2.6", "messages": [{"role": "user", "content": "hi"}] }'

# HTTP/1.1 429 Too Many Requests
# Retry-After: 12

When using the OpenAI SDK, 429 responses are surfaced as RateLimitError in both the Python and JS SDKs. The SDK retries transient errors with exponential backoff by default; keep that behavior or implement your own.

Higher Limits

Need more than 200 requests per minute? Contact Lilac support or book a call with the founders to discuss higher limits.
