The Lilac API applies a default rate limit per organization to keep shared inference capacity fair across customers.

Default Limit

  • 200 requests per minute per organization.
This limit applies to all Lilac API requests, including inference calls (chat completions, completions, and responses).
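A client that sends bursts can avoid most 429s by pacing itself below the limit. The following is a minimal sketch of a sliding-window limiter, not an official Lilac client: the class name `MinuteWindowLimiter` and its parameters are hypothetical, and it assumes a single process. The limit is per organization, so several workers sharing it would each need a smaller share, and the exact windowing the server uses is not specified here.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 200, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests, oldest first

    def acquire(self) -> None:
        """Block until one more request fits in the current window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                break
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self._sent[0] + self.window - now)
        self._sent.append(now)
```

Calling `limiter.acquire()` immediately before each API request keeps a single process at or below the configured rate. A 429 can still occur if other clients in the same organization are sending traffic, so backoff handling remains necessary.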

429 Too Many Requests

Requests above the limit may receive an HTTP 429 Too Many Requests response. When this happens:
  • Back off and retry later.
  • If a Retry-After header or retry_after field is present, respect it.
  • Use exponential backoff for automated clients.
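The retry guidance above can be sketched as follows. This is an illustrative helper under stated assumptions, not Lilac's reference implementation: `backoff_delay` and `call_with_backoff` are hypothetical names, `send` stands in for whatever function performs the HTTP request and returns `(status_code, headers)`, and the base delay, cap, and attempt count are arbitrary defaults. A numeric `Retry-After` value takes priority; otherwise the delay is exponential with full jitter.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server-provided delay in seconds
        except ValueError:
            pass  # non-numeric value (e.g. an HTTP date): fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    status, headers = send()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        # Assumes `headers` exposes the header under the key "Retry-After".
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
        status, headers = send()
    return status, headers
```

The jitter spreads retries from many clients over time so they do not all return at the same instant. If the final attempt still returns 429, the helper hands that response back to the caller instead of raising.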

Example

curl -i https://api.getlilac.com/v1/chat/completions \
  -H "Authorization: Bearer your-lilac-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "model": "moonshotai/kimi-k2.6", "messages": [{"role": "user", "content": "hi"}] }'

# HTTP/1.1 429 Too Many Requests
# Retry-After: 12

With the OpenAI SDK, 429 responses are surfaced as RateLimitError in both the Python and JavaScript clients. The SDK retries transient errors with exponential backoff by default; keep that behavior or implement your own.

Higher Limits

Need more than 200 requests per minute? Contact Lilac support or book a call with the founders to discuss higher limits.