The Lilac GPU operator is a Kubernetes controller that runs inside your cluster. It discovers idle GPUs, communicates with the Lilac control plane, and manages inference workload pods — all without touching your existing workloads.

Architecture

The Sync Loop

The operator runs a reconciliation loop every 30 seconds for each GPU pool:

1. Schedule check. Is the current time within the pool's availability window? If not, the operator skips this pool.
2. GPU discovery. The operator scans nodes matching the pool's nodeSelector and counts available GPUs, distinguishing between your pods and Lilac inference pods.
3. Capacity calculation. The operator applies your configured limits (maxGPUs and maxUtilizationPct) to determine how many GPUs Lilac can use.
4. Control plane sync. The operator sends a full state snapshot (node inventory, running workloads, draining workloads) to the Lilac control plane and receives back a desired state with workload assignments.
5. Reconcile. The operator creates new inference pods for assigned workloads, drains pods that are no longer needed, and cleans up any pods that have drifted from the desired spec.
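The decision-making steps of the loop can be sketched in a few lines. This is a minimal illustration, not the operator's actual code: the function and field names (lilac_capacity, PoolLimits, reconcile) are hypothetical, and it assumes maxUtilizationPct is applied as a percentage of the pool's free GPUs.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolLimits:
    max_gpus: int             # hard cap on GPUs Lilac may use in this pool
    max_utilization_pct: int  # cap as a percentage of currently free GPUs


def lilac_capacity(free_gpus: int, limits: PoolLimits) -> int:
    """Step 3 (capacity calculation): apply both configured limits."""
    by_pct = math.floor(free_gpus * limits.max_utilization_pct / 100)
    return max(0, min(free_gpus, limits.max_gpus, by_pct))


def reconcile(running: set[str], desired: set[str]) -> tuple[set[str], set[str]]:
    """Step 5 (reconcile): diff running inference pods against the
    desired state returned by the control plane."""
    to_create = desired - running  # assigned workloads with no pod yet
    to_drain = running - desired   # pods that are no longer needed
    return to_create, to_drain
```

For example, with 8 free GPUs, maxGPUs: 6, and maxUtilizationPct: 50, the sketch yields 4 usable GPUs: the 50% cap (4) is tighter than both the free count and the hard cap.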

Connection States

The operator maintains a connection state with the control plane:
State      Meaning
Connected  Syncing normally
Degraded   Sync failed; retrying on next cycle
Draining   Disconnected for over 10 minutes; gracefully shutting down inference pods
A single successful sync returns the operator from Degraded to Connected.
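These transitions amount to a small state machine. A minimal sketch, assuming the 10-minute draining threshold from the table above; the class and method names are illustrative, not the operator's actual API:

```python
from datetime import datetime, timedelta

DRAIN_AFTER = timedelta(minutes=10)  # disconnect threshold before draining


class ConnectionTracker:
    """Tracks the operator's connection state with the control plane."""

    def __init__(self, now: datetime):
        self.state = "Connected"
        self.last_success = now

    def record_sync(self, ok: bool, now: datetime) -> str:
        if ok:
            # A single successful sync returns the operator to Connected.
            self.state = "Connected"
            self.last_success = now
        elif now - self.last_success >= DRAIN_AFTER:
            # Disconnected too long: gracefully shut down inference pods.
            self.state = "Draining"
        else:
            # Sync failed; retry on the next 30-second cycle.
            self.state = "Degraded"
        return self.state
```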

Preemption

When your workloads need GPUs back, the operator handles it automatically. See Preemption for details on how this works.

What Gets Deployed

When the control plane assigns a workload, the operator creates a pod running vLLM — a high-performance inference engine. Each pod:
  • Runs a single model
  • Uses one or more GPUs on a single node
  • Is labeled and managed by the operator
  • Is automatically cleaned up when no longer needed
Your existing pods, namespaces, and resources are never modified.
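The pod properties above might translate into a manifest along these lines. This is a hedged sketch only: the pod name, label keys, image tag, and port-free layout are assumptions rather than the operator's actual output, though nvidia.com/gpu is the standard Kubernetes resource name for NVIDIA GPUs and --model is a real vLLM argument.

```python
def build_inference_pod(workload_id: str, model: str, node: str, gpus: int) -> dict:
    """Sketch of a vLLM inference pod the operator might create.
    Names and labels here are illustrative assumptions."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"lilac-inference-{workload_id}",
            # Labels let the operator find and clean up its own pods
            # without ever touching yours.
            "labels": {
                "lilac.ai/managed": "true",
                "lilac.ai/workload": workload_id,
            },
        },
        "spec": {
            "nodeName": node,  # one or more GPUs on a single node
            "containers": [{
                "name": "vllm",
                "image": "vllm/vllm-openai:latest",  # assumed image tag
                "args": ["--model", model],          # one model per pod
                "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
            }],
        },
    }
```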