Operator Installation

This guide walks through installing the Lilac GPU operator in your cluster. The operator is deployed via a Helm chart hosted on AWS ECR.

Prerequisites

Kubernetes 1.28 or later. If you’re running a single node and don’t already have a Kubernetes cluster, k3s is a lightweight option that’s easier to set up and connect
kubectl configured with cluster admin access
helm v3
NVIDIA GPU nodes with the NVIDIA GPU Operator installed
A Lilac supplier API key (generated during onboarding)
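
The prerequisites above can be checked from a shell before installing. The following is a minimal sketch, not part of the Lilac tooling: the function names `version_ge` and `preflight` are hypothetical, the version comparison assumes a `sort` that supports `-V` (GNU coreutils), and the Kubernetes version is read from the kubelet version of the first node rather than from the API server. It does not check for the NVIDIA GPU Operator or the API key.

```shell
# Succeed when version $1 is at least version $2 (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical preflight check for the tools and access listed above.
preflight() {
  for tool in kubectl helm; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing tool: $tool" >&2; return 1; }
  done

  # Kubelet version of the first node, e.g. v1.30.2 or v1.30.2+k3s1.
  k8s="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
  k8s="${k8s#v}"; k8s="${k8s%%+*}"
  version_ge "$k8s" 1.28 || { echo "Kubernetes 1.28+ required, found '$k8s'" >&2; return 1; }

  # helm version --short prints something like v3.14.0+g<commit>.
  hv="$(helm version --short)"
  hv="${hv#v}"
  case "$hv" in
    3.*) ;;
    *) echo "helm v3 required, found '$hv'" >&2; return 1 ;;
  esac

  # Cluster admin access: can the current user do anything, anywhere?
  [ "$(kubectl auth can-i '*' '*' --all-namespaces)" = "yes" ] \
    || { echo "current kubectl context lacks cluster admin access" >&2; return 1; }

  echo "preflight ok"
}
```

Run `preflight` after sourcing the snippet; it stops at the first failed check and prints the reason to stderr.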

Install with Helm

Create the namespace

kubectl create namespace lilac-system

Install the operator

helm install lilac-gpu-operator \
  oci://public.ecr.aws/lilac/lilac-gpu-operator \
  --version 0.3.5 \
  --namespace lilac-system \
  --set apiKey="YOUR_SUPPLIER_API_KEY" \
  --set clusterName="my-gpu-cluster"

If you hit AWS Public ECR rate limits while installing or pulling the operator image, use our Docker Hub mirror instead. Keep the same command and values, but replace oci://public.ecr.aws/lilac/lilac-gpu-operator with oci://docker.io/getlilac/lilac-gpu-operator.

Replace my-gpu-cluster with a name that identifies this cluster in your dashboard.

The Helm chart automatically creates the API key secret, control plane config, CRDs, RBAC, and service accounts, so no manual setup is required.
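
To make the registry swap described above concrete, here is a small sketch that wraps the same install command and lets you choose between the two chart locations given in this guide. The function names `chart_ref` and `install_operator`, the registry nicknames `ecr` and `dockerhub`, and the `LILAC_API_KEY` environment variable are assumptions made for illustration; the chart references, version, and `--set` keys are the ones shown above.

```shell
# Map a registry nickname to the chart reference given in this guide.
chart_ref() {
  case "$1" in
    ecr)       echo "oci://public.ecr.aws/lilac/lilac-gpu-operator" ;;
    dockerhub) echo "oci://docker.io/getlilac/lilac-gpu-operator" ;;
    *)         echo "unknown registry: $1" >&2; return 1 ;;
  esac
}

# Install from the chosen registry.
#   $1: registry nickname (ecr or dockerhub)
#   $2: cluster name shown in the dashboard
# The API key is read from LILAC_API_KEY so it is not typed on the command line.
install_operator() {
  ref="$(chart_ref "$1")" || return 1
  helm install lilac-gpu-operator "$ref" \
    --version 0.3.5 \
    --namespace lilac-system \
    --set apiKey="$LILAC_API_KEY" \
    --set clusterName="$2"
}
```

For example, `install_operator dockerhub my-gpu-cluster` runs the same install against the Docker Hub mirror.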

Verify the installation

kubectl get pods -n lilac-system

You should see the operator pod running:

NAME                                              READY   STATUS    RESTARTS   AGE
lilac-gpu-operator-...                             1/1     Running   0          30s
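
If you are scripting the install, you may prefer to block until the operator is ready instead of reading the pod list by eye. This is a sketch that assumes the chart creates a Deployment named `lilac-gpu-operator` (the name used by the log command later in this guide); the wrapper name `wait_for_operator` and the 120-second default are illustrative choices.

```shell
# Wait for the operator Deployment to finish rolling out.
#   $1: optional timeout, e.g. 60s (defaults to 120s)
wait_for_operator() {
  kubectl rollout status deployment/lilac-gpu-operator \
    --namespace lilac-system \
    --timeout="${1:-120s}"
}
```

`kubectl rollout status` exits non-zero if the timeout passes before the rollout completes, so the function can gate later steps in a script.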

Create a GPU pool

Apply a basic GPUPool to tell the operator which GPUs to manage. Save the following as gpu-pool.yaml and apply it:

apiVersion: gpu.getlilac.com/v1alpha1
kind: GPUPool
metadata:
  name: b200-gpu-pool
  namespace: lilac-system
spec:
  nodeSelector:
    nvidia.com/gpu.product: B200
  cache:
    enabled: true
    capacity: 1000Gi
  workloads:
    inference: true

kubectl apply -f gpu-pool.yaml

See GPU Pool Configuration for more advanced setups, including time-based schedules, preemption policies, per-node cache overrides, and Hugging Face token configuration.
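
The pool above selects nodes by the `nvidia.com/gpu.product` label, so it is worth confirming that at least one node carries the value you put in `nodeSelector`. The sketch below counts matching nodes; the function name `count_pool_nodes` is hypothetical, and the exact label value on your nodes may differ from `B200`, so treat the argument as something to verify against your own cluster.

```shell
# Print how many nodes carry the given nvidia.com/gpu.product label value.
#   $1: product value used in the GPUPool nodeSelector, e.g. B200
count_pool_nodes() {
  kubectl get nodes -l "nvidia.com/gpu.product=$1" -o name | wc -l | tr -d ' '
}
```

If `count_pool_nodes B200` prints `0`, list the values actually present with `kubectl get nodes -L nvidia.com/gpu.product` and adjust the selector to match.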

Verify Control Plane Connection

Check the operator logs to confirm it connected to the Lilac control plane:

kubectl logs -n lilac-system deploy/lilac-gpu-operator

Look for a log line like:

INFO  control plane sync successful  cluster_id=abc123

Your cluster should also appear as Connected in the Lilac dashboard within 30 seconds.
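
The log check above can also be polled from a script. The sketch below assumes the log line contains the text `control plane sync successful`, as in the sample above; the real wording may vary between operator versions. The function name `wait_for_sync` and the 5-second polling interval are illustrative.

```shell
# Poll the operator logs until the sync message appears or the timeout passes.
#   $1: optional timeout in seconds (defaults to 30)
wait_for_sync() {
  timeout_s="${1:-30}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    if kubectl logs -n lilac-system deploy/lilac-gpu-operator 2>/dev/null \
        | grep -q "control plane sync successful"; then
      echo "connected"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "no sync log line after ${timeout_s}s" >&2
  return 1
}
```

The function prints `connected` and returns success once the line is found, and returns a non-zero status otherwise.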

Helm Values

| Value | Required | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | Yes | (none) | Supplier API key from the Lilac dashboard |
| `clusterName` | Yes | (none) | Human-readable name for your cluster |
| `controlPlaneUrl` | No | `https://api.getlilac.com` | Control plane URL |
| `disconnectTimeout` | No | `10m` | Time before cluster is marked disconnected |
| `image.tag` | No | Chart app version | Override the operator image tag |
| `resources.limits` | No | `cpu: 500m, memory: 128Mi` | Resource limits for the operator pod |
| `resources.requests` | No | `cpu: 10m, memory: 64Mi` | Resource requests for the operator pod |
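
Optional values can be kept in a file instead of a long list of `--set` flags. The sketch below writes an example values file; it assumes `resources.limits` and `resources.requests` are nested maps with `cpu` and `memory` keys, which matches the defaults shown in the table but has not been checked against the chart. The function name `write_values`, the file name, and the chosen override values (`15m`, `256Mi`) are illustrative only.

```shell
# Write an example values file with a few optional overrides.
#   $1: path of the file to create
write_values() {
  cat > "$1" <<'EOF'
# Optional overrides; apiKey and clusterName are still passed with --set.
disconnectTimeout: 15m
resources:
  requests:
    cpu: 10m
    memory: 64Mi
  limits:
    cpu: 500m
    memory: 256Mi
EOF
}
```

After `write_values lilac-values.yaml`, pass the file to the install or upgrade command with `-f lilac-values.yaml`.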

Upgrading

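To upgrade the operator, run the command below with --version set to the chart version you want to move to (0.3.5 is shown here). The --reuse-values flag keeps the values from the previous release, so you don't need to pass apiKey and clusterName again: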

helm upgrade --install lilac-gpu-operator \
  oci://public.ecr.aws/lilac/lilac-gpu-operator \
  --version 0.3.5 \
  --namespace lilac-system \
  --reuse-values
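
Because `--reuse-values` carries the previous settings forward, it can help to look at what is currently deployed before upgrading. The sketch below uses standard Helm commands; the wrapper name `show_release` is hypothetical. Note that `helm get values` prints user-supplied values in clear text, which here includes `apiKey`.

```shell
# Show the deployed chart version and the values that --reuse-values would keep.
show_release() {
  helm list --namespace lilac-system --filter '^lilac-gpu-operator$'
  helm get values lilac-gpu-operator --namespace lilac-system
}
```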

Uninstalling

helm uninstall lilac-gpu-operator --namespace lilac-system

Uninstalling the operator will drain all Lilac inference workloads from your cluster. Your own workloads are not affected.

Next Steps

Configure GPU Pools

How the Operator Works