# Available Models
Lilac currently supports the following models. We’re actively adding more — reach out if there’s a model you’d like to see.

| Model | Model ID | Context Length | Input Price | Output Price |
|---|---|---|---|---|
| Kimi K2.5 | moonshotai/kimi-k2.5 | 262,144 tokens | $0.40 / M tokens | $2.00 / M tokens |
| GLM 5.1 | zai-org/glm-5.1 | 202,800 tokens | $0.90 / M tokens | $3.00 / M tokens |
| Gemma 4 (coming soon) | google/gemma-4 | — | $0.13 / M tokens | $0.38 / M tokens |
More models are coming soon. Request a model by emailing contact@getlilac.com.
## Kimi K2.5

Moonshot AI’s flagship multimodal reasoning model. 1T total parameters (32B activated) with a Mixture-of-Experts architecture.

Kimi K2.5 on Hugging Face: model card, benchmarks, and deployment guides.
### Capabilities
| Capability | Status | Details |
|---|---|---|
| Text input | Supported | Chat, instructions, system prompts |
| Image input | Supported | Native multimodal — pass images via image_url in messages |
| Text output | Supported | Completions, structured JSON, tool calls |
| Reasoning (thinking) | On by default | Chain-of-thought returned in reasoning field. Disable with chat_template_kwargs: {"thinking": false} |
| Tool calling | Supported | Function definitions with automatic argument extraction |
| Structured output | Supported | response_format with json_object or json_schema |
### Recommended Parameters

From the Kimi K2.5 model card:

| Mode | Temperature | Top P |
|---|---|---|
| Thinking (default) | 1.0 | 0.95 |
| Instant (thinking off) | 0.6 | 0.95 |
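The thinking toggle and the sampling defaults above can be combined in one request body. A minimal sketch, assuming the OpenAI-style chat-completions schema implied by the capability table; the helper name is illustrative, and the endpoint and auth are left out:

```python
# Sketch: toggling Kimi K2.5's thinking mode via chat_template_kwargs.
# build_chat_request is a hypothetical helper, not part of the Lilac SDK.
import json

def build_chat_request(messages, thinking=True):
    """Build a chat-completion payload for moonshotai/kimi-k2.5.

    Thinking is on by default; pass thinking=False to disable it and
    get a direct answer with no separate reasoning field.
    """
    payload = {
        "model": "moonshotai/kimi-k2.5",
        "messages": messages,
        # Recommended sampling from the model card: 1.0 in thinking
        # mode, 0.6 in instant mode (top_p 0.95 for both).
        "temperature": 1.0 if thinking else 0.6,
        "top_p": 0.95,
    }
    if not thinking:
        payload["chat_template_kwargs"] = {"thinking": False}
    return payload

req = build_chat_request(
    [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
    thinking=False,
)
print(json.dumps(req, indent=2))
# POST this JSON to the chat-completions endpoint with your API key.
```

With thinking left on, the chain of thought comes back in the response’s `reasoning` field, separate from the final answer.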
### Vision

Kimi K2.5 natively supports image inputs. Pass images as base64 data URIs or URLs in the `content` array.
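A sketch of both input styles, assuming the OpenAI-style content-array format described above; the helper names and file path are illustrative:

```python
# Sketch: building an image-input message for Kimi K2.5.
# Both helpers are hypothetical, shown to illustrate the message shape.
import base64
import json

def image_part_from_file(path, mime="image/png"):
    """Encode a local image as a base64 data-URI image_url part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

def vision_message(text, image_parts):
    """A user message mixing text and one or more images."""
    return {"role": "user",
            "content": [{"type": "text", "text": text}, *image_parts]}

# A remote image can be passed by URL instead of base64:
remote = {"type": "image_url",
          "image_url": {"url": "https://example.com/chart.png"}}
msg = vision_message("What does this chart show?", [remote])
print(json.dumps(msg, indent=2))
```

The resulting message goes into the `messages` list of a normal chat-completions request.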
## GLM 5.1

Z.ai’s next-generation flagship model for agentic engineering. 754B total parameters in a Mixture-of-Experts architecture, with state-of-the-art coding capabilities — it holds up over long-horizon tasks, handles ambiguous problems well, and sustains hundreds of tool calls per run. 202.8K context window, 131.1K max output. MIT licensed.

GLM 5.1 on Hugging Face: model card, benchmarks, and deployment guides.
### Capabilities
| Capability | Status | Details |
|---|---|---|
| Text input | Supported | Chat, instructions, system prompts |
| Text output | Supported | Completions, structured JSON, tool calls |
| Image input | Not supported | GLM 5.1 is text-only |
| Reasoning (thinking) | On by default | Chain-of-thought returned in reasoning field. Disable with chat_template_kwargs: {"thinking": false} |
| Tool calling | Supported | Function definitions with automatic argument extraction — strong performance on agentic tasks |
| Structured output | Supported | response_format with json_object |
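Since tool calling is GLM 5.1’s headline strength, here is a sketch of a tool-calling request, assuming OpenAI-style function definitions; the weather tool and helper are made-up examples:

```python
# Sketch: a tool-calling request for zai-org/glm-5.1.
# build_tool_request and get_weather are illustrative, not a Lilac API.
import json

def build_tool_request(messages, tools):
    """Build a chat-completion payload with tool definitions attached."""
    return {
        "model": "zai-org/glm-5.1",
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call a tool
    }

get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

req = build_tool_request(
    [{"role": "user", "content": "Is it raining in Oslo?"}], [get_weather]
)
print(json.dumps(req, indent=2))
# When the model decides to call the tool, the response carries the
# extracted arguments as a JSON string in the message's tool_calls.
```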
### Recommended Parameters

From the Z.ai platform docs:

| Mode | Temperature | Top P |
|---|---|---|
| Thinking (default) | 1.0 | 0.95 |
| Instant (thinking off) | 0.6 | 0.95 |
### Example request
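A plain chat request over the OpenAI-compatible HTTP API. The base URL below is an assumption — substitute the endpoint and API key from your Lilac dashboard:

```python
# Sketch of a chat request to GLM 5.1. BASE_URL is a placeholder;
# check your Lilac dashboard for the real endpoint.
import json
import urllib.request

BASE_URL = "https://api.getlilac.com/v1"  # placeholder, verify before use

def chat(payload, api_key):
    """POST a chat-completion payload and return the parsed response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = {
    "model": "zai-org/glm-5.1",
    "messages": [{"role": "user", "content": "Write a haiku about type systems."}],
    "temperature": 1.0,  # thinking-mode default from the table above
    "top_p": 0.95,
}
print(json.dumps(payload, indent=2))
# response = chat(payload, api_key="YOUR_KEY")
# print(response["choices"][0]["message"]["content"])
```

With thinking on (the default), the chain of thought is returned separately in the message’s `reasoning` field.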
## Listing Models via API
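A sketch of fetching the model list, assuming an OpenAI-style `GET /models` endpoint; as above, the base URL is a placeholder:

```python
# Sketch: listing available models. BASE_URL is a placeholder --
# check your Lilac dashboard for the real endpoint and your API key.
import json
import urllib.request

BASE_URL = "https://api.getlilac.com/v1"  # placeholder, verify before use

def extract_model_ids(body):
    """Pull model IDs out of an OpenAI-style /models list response."""
    return [m["id"] for m in body.get("data", [])]

def list_models(api_key):
    """GET /models and return the available model IDs."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_model_ids(json.load(resp))

# model_ids = list_models(api_key="YOUR_KEY")
```

The returned IDs are the same strings used in the `model` field of chat requests, e.g. `moonshotai/kimi-k2.5`.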

