The responses endpoint is OpenAI’s newer API format with built-in support for structured output and tool calling.

Endpoint

POST https://api.getlilac.com/v1/responses

Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.getlilac.com/v1",
    api_key="your-lilac-api-key",
)

response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="Explain GPU inference in two sentences.",
)

print(response.output_text)

Request Parameters

Required

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID (e.g., `moonshotai/kimi-k2.5`). |
| `input` | string or array | User prompt as a string, or conversation history as an array of message objects. |
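When `input` is an array, each element is a message object with `role` and `content` keys. A minimal sketch of passing conversation history (the message contents are illustrative, and the live call is shown commented out since it needs a valid key):

```python
# Conversation history passed as the `input` array.
# Each message is a dict with "role" and "content" keys.
conversation = [
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "It samples from the smallest token set whose cumulative probability exceeds top_p."},
    {"role": "user", "content": "And how does temperature interact with it?"},
]

# With a configured client (requires a key):
# response = client.responses.create(
#     model="moonshotai/kimi-k2.5",
#     input=conversation,
# )
```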

Sampling

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `instructions` | string | null | System-level instructions for the model. |
| `temperature` | float | 1.0 | Sampling temperature (0–2). |
| `top_p` | float | 1.0 | Nucleus sampling threshold. |
| `max_output_tokens` | integer | null | Maximum tokens to generate (including reasoning tokens). |
| `stream` | boolean | false | Stream the response via SSE. |

Structured Output

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `text` | object | null | Structured output format with JSON Schema. See example below. |

Tools

The responses endpoint uses a flat tool format: `name`, `description`, and `parameters` are top-level fields, not nested under `function`.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `tools` | array | null | List of tool definitions (see format below). |

The tool format differs from `/v1/chat/completions`. See the tool calling example below for the correct format.

Reasoning

Models with reasoning (like Kimi K2.5 and GLM 5.1) include chain-of-thought by default. The response includes a reasoning output item containing the model’s thinking. Reasoning tokens count toward your usage.
Disabling reasoning is not currently supported on the /v1/responses endpoint. To control reasoning, use Chat Completions with chat_template_kwargs: {"thinking": false} instead.
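Since reasoning arrives as a separate output item, the final answer and the chain-of-thought can be separated by filtering on item type. A minimal sketch, with output items shown as plain dicts for illustration (the SDK returns objects, but the `type` values follow the same convention):

```python
def split_reasoning(output_items):
    """Separate reasoning items from message items in a Responses output list."""
    reasoning = [item for item in output_items if item["type"] == "reasoning"]
    messages = [item for item in output_items if item["type"] == "message"]
    return reasoning, messages
```

This keeps billing-relevant reasoning content inspectable without mixing it into the text you display to users.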

Structured Output

Force the model to return JSON matching a schema:
response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="Give me a color with its name and hex code.",
    text={
        "format": {
            "type": "json_schema",
            "name": "color",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "hex": {"type": "string"}
                },
                "required": ["name", "hex"]
            }
        }
    },
)

print(response.output_text)
# {"name": "Teal", "hex": "#008080"}

Tool Calling

The responses endpoint uses a flat tool format where name, description, and parameters are at the top level:
response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="What's the weather in NYC?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    ],
)

for item in response.output:
    if item.type == "function_call":
        print(f"{item.name}({item.arguments})")
        # get_weather({"location": "NYC"})
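After executing the tool, its result goes back to the model in a follow-up request as a `function_call_output` item. A sketch of building that item (the item shape follows the OpenAI Responses convention; the follow-up call is commented out and assumes `response` and `tools` from the example above):

```python
import json

def make_tool_result(call_id, result):
    """Build a function_call_output item echoing a tool's result to the model."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": json.dumps(result),
    }

tool_result = make_tool_result("call_123", {"temp_f": 41, "conditions": "cloudy"})

# Follow-up request (requires a key; includes the prior output for context):
# followup = client.responses.create(
#     model="moonshotai/kimi-k2.5",
#     input=[*response.output, tool_result],
#     tools=tools,
# )
```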

With Instructions

Use instructions to set system-level context:
response = client.responses.create(
    model="moonshotai/kimi-k2.5",
    input="Give me a color",
    instructions="Always respond in JSON with 'name' and 'hex' fields.",
    max_output_tokens=50,
)

Differences from Chat Completions

| Feature | Chat Completions | Responses |
| --- | --- | --- |
| Input format | `messages` array | `input` string or array |
| Tool format | Nested under `function` | Flat (`name`/`description`/`parameters` at top level) |
| Max tokens param | `max_tokens` | `max_output_tokens` |
| Structured output | `response_format` | `text.format` |
| Disable reasoning | `chat_template_kwargs: {"thinking": false}` | Not supported |
| System prompt | `system` role message | `instructions` parameter |
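The renames in the table can be sketched as a small translation helper for migrating existing Chat Completions request dicts. This is illustrative only; it covers just the mappings listed above (messages → input/instructions, max_tokens → max_output_tokens) and assumes at most one system message:

```python
def chat_to_responses(params):
    """Translate a Chat Completions request dict into Responses form."""
    out = dict(params)
    if "messages" in out:
        messages = out.pop("messages")
        # A system-role message becomes the `instructions` parameter.
        system = [m for m in messages if m["role"] == "system"]
        if system:
            out["instructions"] = system[0]["content"]
        out["input"] = [m for m in messages if m["role"] != "system"]
    if "max_tokens" in out:
        out["max_output_tokens"] = out.pop("max_tokens")
    return out
```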