Documentation Index
Fetch the complete documentation index at: https://docs.getlilac.com/llms.txt
Use this file to discover all available pages before exploring further.
Available Models
Lilac currently supports the following models. We’re actively adding more — reach out if there’s a model you’d like to see.| Model | Model ID | Context Length | Quantization | Input Price | Cache Read Price | Output Price |
|---|---|---|---|---|---|---|
| Kimi K2.6 | moonshotai/kimi-k2.6 | 262,144 tokens | INT4 | $0.70 / M tokens | $0.20 / M tokens | $3.50 / M tokens |
| GLM 5.1 | zai-org/glm-5.1 | 202,800 tokens | FP8 | $0.90 / M tokens | $0.27 / M tokens | $3.00 / M tokens |
| Gemma 4 | google/gemma-4-31b-it | 262,100 tokens | BF16 | $0.11 / M tokens | — | $0.35 / M tokens |
| MiniMax M2.7 | minimaxai/minimax-m2.7 | 204,800 tokens | FP8 | $0.30 / M tokens | $0.055 / M tokens | $1.20 / M tokens |
Cache read is the rate for repeated input tokens served from cache. It’s billed at a lower rate than standard input tokens on supported models. Models that don’t support cached input tokens are marked with
—.More models are coming soon. Request a model by emailing contact@getlilac.com.
Kimi K2.6
Moonshot AI’s flagship multimodal reasoning model. 1T total parameters (32B activated) with a Mixture-of-Experts architecture.Kimi K2.6 on Hugging Face
Model card, benchmarks, and deployment guides.
Capabilities
| Capability | Status | Details |
|---|---|---|
| Text input | Supported | Chat, instructions, system prompts |
| Image input | Supported | Native multimodal — pass images via image_url in messages |
| Text output | Supported | Completions, structured JSON, tool calls |
| Reasoning (thinking) | On by default | Chain-of-thought returned in reasoning field. Kimi K2.6’s Moonshot chat template honors chat_template_kwargs: {"thinking": false} (the enable_thinking key is ignored here). For forward compatibility across models, see the Reasoning section. |
| Tool calling | Supported | Function definitions with automatic argument extraction |
| Structured output | Supported | response_format with json_object or json_schema |
Recommended Parameters
From the Kimi K2.6 model card:| Mode | Temperature | Top P |
|---|---|---|
| Thinking (default) | 1.0 | 0.95 |
| Instant (thinking off) | 0.6 | 0.95 |
Vision
Kimi K2.6 natively supports image inputs. Pass images as base64 data URIs or URLs in thecontent array:
- Python
- cURL
GLM 5.1
Z.ai’s next-generation flagship model for agentic engineering. 754B total parameters in a Mixture-of-Experts architecture, with state-of-the-art coding capabilities — it holds up over long-horizon tasks, handles ambiguous problems well, and sustains hundreds of tool calls per run. 202.8K context window, 131.1K max output. MIT licensed.GLM 5.1 on Hugging Face
Model card, benchmarks, and deployment guides.
Capabilities
| Capability | Status | Details |
|---|---|---|
| Text input | Supported | Chat, instructions, system prompts |
| Text output | Supported | Completions, structured JSON, tool calls |
| Image input | Not supported | GLM 5.1 is text-only |
| Reasoning (thinking) | On by default | Chain-of-thought returned in reasoning field. GLM 5.1’s chat template honors chat_template_kwargs: {"enable_thinking": false} (the thinking key is ignored here, and leaving it set alone will cause the chain-of-thought to leak into content terminated by a </think> marker — see the Reasoning section for details and the forward-compatible form). |
| Tool calling | Supported | Function definitions with automatic argument extraction — strong performance on agentic tasks |
| Structured output | Supported | response_format with json_object |
Recommended Parameters
From the Z.ai platform docs:| Mode | Temperature | Top P |
|---|---|---|
| Thinking (default) | 1.0 | 0.95 |
| Instant (thinking off) | 0.6 | 0.95 |
Example request
- Python
- JavaScript
- cURL
Gemma 4
Google’s open-weight multimodal model. 31B parameters with native support for text, image, and video inputs. 262K context window with BF16 precision. Released under the Gemma license.Gemma 4 on Hugging Face
Model card, benchmarks, and deployment guides.
Capabilities
| Capability | Status | Details |
|---|---|---|
| Text input | Supported | Chat, instructions, system prompts |
| Image input | Supported | Native multimodal — pass images via image_url in messages |
| Video input | Supported | Pass video frames as a sequence of images |
| Text output | Supported | Completions, structured JSON |
| Reasoning (thinking) | Off by default | Chain-of-thought returned in reasoning field when enabled. Gemma 4’s chat template honors chat_template_kwargs: {"enable_thinking": true} (the thinking key is ignored here). Unlike Kimi K2.6 and GLM 5.1, thinking is off by default — you must opt in. See the Reasoning section for the forward-compatible form. |
| Tool calling | Supported | Function definitions with automatic argument extraction |
| Structured output | Supported | response_format with json_object or json_schema |
Structured output caveat. On current vLLM builds, combining
--reasoning-parser gemma4 with enable_thinking: false can silently disable xgrammar-backed structured output — see vllm-project/vllm#39130. If you rely on response_format: json_schema with Gemma 4, leave thinking enabled or validate output client-side.Enabling reasoning
Gemma 4 is the only model in the catalog where reasoning is off by default. To turn it on, use the forward-compatible form recommended in the Reasoning section:- Python
- cURL
Vision
Gemma 4 natively supports image inputs. Pass images as base64 data URIs or URLs in thecontent array:
- Python
- cURL
Video
Gemma 4 can process video by accepting a sequence of frames as images. Extract frames from your video and pass them as multipleimage_url entries:
- Python
- cURL
MiniMax M2.7
MiniMax’s text-only language model. 204.8K context window with reasoning, tool calling, and structured output support over an OpenAI-compatible API.MiniMax M2.7 on Hugging Face
Model card, benchmarks, and deployment guides.
Capabilities
| Capability | Status | Details |
|---|---|---|
| Text input | Supported | Chat, instructions, system prompts |
| Text output | Supported | Completions, structured JSON, tool calls |
| Image input | Not supported | MiniMax M2.7 is text-only |
| Reasoning (thinking) | Supported | Chain-of-thought returned in the reasoning field. See the Reasoning section for forward-compatible toggles. |
| Tool calling | Supported | Function definitions with automatic argument extraction |
| Structured output | Supported | response_format with json_object or json_schema |
Limits
| Limit | Value |
|---|---|
| Context length | 204,800 tokens |
| Max completion tokens | 204,800 tokens |
Supported parameters
MiniMax M2.7 accepts the following sampling and request parameters via the OpenAI-compatible API:temperature, top_p, top_k, max_tokens, stop, frequency_penalty, presence_penalty, repetition_penalty, seed, min_p, logit_bias, logprobs, top_logprobs, response_format, structured_outputs, tools, tool_choice.
Example request
- Python
- JavaScript
- cURL
Listing Models via API
- Python
- JavaScript
- cURL

