The responses endpoint is OpenAI’s newer API format with built-in support for structured output and tool calling.

Endpoint

POST https://api.getlilac.com/v1/responses

Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="your-lilac-api-key",
)

response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="Explain GPU inference in two sentences.",
)

print(response.output_text)

Request Parameters

Required

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID (e.g., `moonshotai/kimi-k2.5`). |
| `input` | string or array | User prompt as a string, or conversation history as an array of message objects. |
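When `input` is an array, each element is a message object with `role` and `content` keys. A minimal sketch of passing conversation history (the message contents are illustrative, and the live call is shown commented out since it needs a valid key):

```python
# Conversation history passed as the `input` array.
# Each message is a dict with "role" and "content" keys.
conversation = [
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "It samples from the smallest token set whose cumulative probability exceeds top_p."},
    {"role": "user", "content": "And how does temperature interact with it?"},
]

# With a configured client (requires a key):
# response = client.responses.create(
#     model="moonshotai/kimi-k2.5",
#     input=conversation,
# )
```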

Sampling

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `instructions` | string | null | System-level instructions for the model. |
| `temperature` | float | 1.0 | Sampling temperature (0–2). |
| `top_p` | float | 1.0 | Nucleus sampling threshold. |
| `max_output_tokens` | integer | null | Maximum tokens to generate (including reasoning tokens). |
| `stream` | boolean | false | Stream the response via SSE. |

Structured Output

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `text` | object | null | Structured output format with JSON Schema. See example below. |

Tools

The responses endpoint uses a flat tool format: `name`, `description`, and `parameters` are top-level fields, not nested under `function`.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `tools` | array | null | List of tool definitions (see format below). |

The tool format differs from `/v1/chat/completions`. See the tool calling example below for the correct format.

Reasoning

Models with reasoning (like Kimi K2.5 and GLM 5.1) include chain-of-thought by default. The response includes a reasoning output item containing the model’s thinking. Reasoning tokens count toward your usage.
Disabling reasoning is not currently supported on the /v1/responses endpoint. To control reasoning, use Chat Completions with chat_template_kwargs: {"thinking": false} instead.
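Since reasoning arrives as a separate output item, the final answer and the chain-of-thought can be separated by filtering on item type. A minimal sketch, with output items shown as plain dicts for illustration (the SDK returns objects, but the `type` values follow the same convention):

```python
def split_reasoning(output_items):
    """Separate reasoning items from message items in a Responses output list."""
    reasoning = [item for item in output_items if item["type"] == "reasoning"]
    messages = [item for item in output_items if item["type"] == "message"]
    return reasoning, messages
```

This keeps billing-relevant reasoning content inspectable without mixing it into the text you display to users.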

Structured Output

Force the model to return JSON matching a schema:
response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="Give me a color with its name and hex code.",
    text={
        "format": {
            "type": "json_schema",
            "name": "color",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "hex": {"type": "string"}
                },
                "required": ["name", "hex"]
            }
        }
    },
)

print(response.output_text)
# {"name": "Teal", "hex": "#008080"}

Tool Calling

The responses endpoint uses a flat tool format where name, description, and parameters are at the top level:
response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="What's the weather in NYC?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    ],
)

for item in response.output:
    if item.type == "function_call":
        print(f"{item.name}({item.arguments})")
        # get_weather({"location": "NYC"})
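After executing the tool, its result goes back to the model in a follow-up request as a `function_call_output` item. A sketch of building that item (the item shape follows the OpenAI Responses convention; the follow-up call is commented out and assumes `response` and `tools` from the example above):

```python
import json

def make_tool_result(call_id, result):
    """Build a function_call_output item echoing a tool's result to the model."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": json.dumps(result),
    }

tool_result = make_tool_result("call_123", {"temp_f": 41, "conditions": "cloudy"})

# Follow-up request (requires a key; includes the prior output for context):
# followup = client.responses.create(
#     model="moonshotai/kimi-k2.5",
#     input=[*response.output, tool_result],
#     tools=tools,
# )
```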

With Instructions

Use instructions to set system-level context:
response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="Give me a color",
    instructions="Always respond in JSON with 'name' and 'hex' fields.",
    max_output_tokens=50,
)

Differences from Chat Completions

| Feature | Chat Completions | Responses |
| --- | --- | --- |
| Input format | `messages` array | `input` string or array |
| Tool format | Nested under `function` | Flat (`name`/`description`/`parameters` at top level) |
| Max tokens param | `max_tokens` | `max_output_tokens` |
| Structured output | `response_format` | `text.format` |
| Disable reasoning | `chat_template_kwargs: {"thinking": false}` | Not supported |
| System prompt | `system` role message | `instructions` parameter |
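The renames in the table can be sketched as a small translation helper for migrating existing Chat Completions request dicts. This is illustrative only; it covers just the mappings listed above (messages → input/instructions, max_tokens → max_output_tokens) and assumes at most one system message:

```python
def chat_to_responses(params):
    """Translate a Chat Completions request dict into Responses form."""
    out = dict(params)
    if "messages" in out:
        messages = out.pop("messages")
        # A system-role message becomes the `instructions` parameter.
        system = [m for m in messages if m["role"] == "system"]
        if system:
            out["instructions"] = system[0]["content"]
        out["input"] = [m for m in messages if m["role"] != "system"]
    if "max_tokens" in out:
        out["max_output_tokens"] = out.pop("max_tokens")
    return out
```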