Documentation Index
Fetch the complete documentation index at: https://docs.getlilac.com/llms.txt
Use this file to discover all available pages before exploring further.
The responses endpoint is OpenAI’s newer API format with built-in support for structured output and tool calling.
Endpoint
POST https://api.getlilac.com/v1/responses
Example
from openai import OpenAI
client = OpenAI(
base_url="https://api.getlilac.com/v1",
api_key="your-lilac-api-key",
)
response = client.responses.create(
model="moonshotai/kimi-k2.6",
input="Explain GPU inference in two sentences.",
)
print(response.output_text)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.getlilac.com/v1",
apiKey: "your-lilac-api-key",
});
const response = await client.responses.create({
model: "moonshotai/kimi-k2.6",
input: "Explain GPU inference in two sentences.",
});
console.log(response.output_text);
curl https://api.getlilac.com/v1/responses \
-H "Authorization: Bearer your-lilac-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2.6",
"input": "Explain GPU inference in two sentences."
}'
Request Parameters
Required
| Parameter | Type | Description |
|---|
model | string | Model ID (e.g., moonshotai/kimi-k2.6). |
input | string or array | User prompt as a string, or conversation history as an array of message objects. |
Sampling
| Parameter | Type | Default | Description |
|---|
instructions | string | null | System-level instructions for the model. |
temperature | float | 1.0 | Sampling temperature (0–2). |
top_p | float | 1.0 | Nucleus sampling threshold. |
max_output_tokens | integer | null | Maximum tokens to generate (including reasoning tokens). |
stream | boolean | false | Stream the response via SSE. |
Structured Output
| Parameter | Type | Default | Description |
|---|
text | object | null | Structured output format with JSON Schema. See example below. |
The responses endpoint uses a flat tool format — name, description, and parameters are top-level fields, not nested under function.
| Parameter | Type | Default | Description |
|---|
tools | array | null | List of tool definitions (see format below). |
The tool format differs from /v1/chat/completions. See the tool calling example below for the correct format.
Reasoning
Models with reasoning (like Kimi K2.6 and GLM 5.1) include chain-of-thought by default. The response includes a reasoning output item containing the model’s thinking. Reasoning tokens count toward your usage.
Disabling reasoning is not currently supported on the /v1/responses endpoint. To control reasoning, use Chat Completions with chat_template_kwargs: {"thinking": false} instead.
Structured Output
Force the model to return JSON matching a schema:
response = client.responses.create(
model="moonshotai/kimi-k2.6",
input="Give me a color with its name and hex code.",
text={
"format": {
"type": "json_schema",
"name": "color",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"hex": {"type": "string"}
},
"required": ["name", "hex"]
}
}
},
)
print(response.output_text)
# {"name": "Teal", "hex": "#008080"}
const response = await client.responses.create({
model: "moonshotai/kimi-k2.6",
input: "Give me a color with its name and hex code.",
text: {
format: {
type: "json_schema",
name: "color",
schema: {
type: "object",
properties: {
name: { type: "string" },
hex: { type: "string" },
},
required: ["name", "hex"],
},
},
},
});
console.log(response.output_text);
curl https://api.getlilac.com/v1/responses \
-H "Authorization: Bearer your-lilac-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2.6",
"input": "Give me a color with its name and hex code.",
"text": {
"format": {
"type": "json_schema",
"name": "color",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"hex": {"type": "string"}
},
"required": ["name", "hex"]
}
}
}
}'
The responses endpoint uses a flat tool format where name, description, and parameters are at the top level:
response = client.responses.create(
model="moonshotai/kimi-k2.6",
input="What's the weather in NYC?",
tools=[
{
"type": "function",
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
],
)
for item in response.output:
if item.type == "function_call":
print(f"{item.name}({item.arguments})")
# get_weather({"location": "NYC"})
const response = await client.responses.create({
model: "moonshotai/kimi-k2.6",
input: "What's the weather in NYC?",
tools: [
{
type: "function",
name: "get_weather",
description: "Get current weather for a location",
parameters: {
type: "object",
properties: {
location: { type: "string" },
},
required: ["location"],
},
},
],
});
for (const item of response.output) {
if (item.type === "function_call") {
console.log(`${item.name}(${item.arguments})`);
}
}
curl https://api.getlilac.com/v1/responses \
-H "Authorization: Bearer your-lilac-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2.6",
"input": "What is the weather in NYC?",
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
]
}'
With Instructions
Use instructions to set system-level context:
response = client.responses.create(
model="moonshotai/kimi-k2.6",
input="Give me a color",
instructions="Always respond in JSON with 'name' and 'hex' fields.",
max_output_tokens=50,
)
curl https://api.getlilac.com/v1/responses \
-H "Authorization: Bearer your-lilac-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/kimi-k2.6",
"input": "Give me a color",
"instructions": "Always respond in JSON with name and hex fields.",
"max_output_tokens": 50
}'
Differences from Chat Completions
| Feature | Chat Completions | Responses |
|---|
| Input format | messages array | input string or array |
| Tool format | Nested under function | Flat (name/description/parameters at top level) |
| Max tokens param | max_tokens | max_output_tokens |
| Structured output | response_format | text.format |
| Disable reasoning | chat_template_kwargs: {"thinking": false} | Not supported |
| System prompt | system role message | instructions parameter |