A GPU pool is a custom resource that tells the operator which GPUs in your cluster are available for Lilac inference workloads. You control everything — which nodes, how many GPUs, what hours, and how preemption works.

Creating a GPU Pool

Apply a GPUPool resource to your cluster:
apiVersion: gpu.getlilac.com/v1alpha1
kind: GPUPool
metadata:
  name: production-gpus
  namespace: lilac-system
spec:
  nodeSelector:
    nvidia.com/gpu.product: B200
  capacity:
    maxGPUs: 8
    maxUtilizationPct: 90
  schedule:
    mode: scheduled
    timezone: America/New_York
    windows:
      - days: [mon, tue, wed, thu, fri]
        start: "18:00"
        end: "08:00"
      - days: [sat, sun]    # all day
  preemption:
    gracePeriod: 30s
    priority: tenant
  workloads:
    inference: true
Save the manifest as gpu-pool.yaml, then apply it:
kubectl apply -f gpu-pool.yaml

Configuration Reference

nodeSelector

Standard Kubernetes label selector. Only nodes matching these labels are included in the pool.
nodeSelector:
  nvidia.com/gpu.product: B200    # GPU model
  topology.kubernetes.io/zone: us-east-1a  # Optional: limit to a zone

capacity

Control how much of your GPU fleet Lilac can use.
Field              Type             Description
maxGPUs            integer          Maximum number of GPUs Lilac can use across all nodes
maxUtilizationPct  integer (0–100)  Maximum percentage of matching GPUs Lilac can consume
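When both fields are set, the stricter limit presumably wins (the exact interaction isn't spelled out here). As a sketch, on a fleet of 16 matching GPUs the fragment below would cap Lilac at 8 GPUs, since maxGPUs is tighter than 90% of 16:

```yaml
capacity:
  maxGPUs: 8              # hard cap across all matching nodes
  maxUtilizationPct: 90   # never consume more than 90% of matching GPUs
```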

schedule

Define when GPUs are available for Lilac workloads.
Mode       Behavior
always     GPUs are always available (respecting capacity limits)
scheduled  GPUs are only available during defined time windows
schedule:
  mode: scheduled
  timezone: America/New_York
  windows:
    - days: [mon, tue, wed, thu, fri]
      start: "18:00"
      end: "08:00"
    - days: [sat, sun]    # all day — omit start/end
Use mode: always if you have dedicated GPUs that aren’t used for other workloads. Use mode: scheduled to share GPUs between your workloads (daytime) and Lilac (evenings/weekends).
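For a dedicated pool, the schedule block reduces to a single line; timezone and windows only apply in scheduled mode:

```yaml
schedule:
  mode: always   # available around the clock; capacity limits still apply
```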

preemption

Controls what happens when your workloads need GPUs back.
Field        Type      Description
gracePeriod  duration  Time given to inference pods to finish in-flight requests before termination
priority     string    tenant means your workloads always take priority
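If your inference requests are long-running, you may want a longer drain window than the 30s used in the example above. A sketch, assuming gracePeriod accepts the usual duration strings (e.g. 120s):

```yaml
preemption:
  gracePeriod: 120s   # give long in-flight requests time to drain (assumed valid duration)
  priority: tenant    # your workloads always preempt Lilac
```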

workloads

Toggle which workload types this pool accepts.
Field      Type     Description
inference  boolean  Allow inference workloads on this pool

Multiple Pools

You can create multiple GPU pools for different hardware or schedules:
# Pool for A100 GPUs — always available
apiVersion: gpu.getlilac.com/v1alpha1
kind: GPUPool
metadata:
  name: dedicated-a100s
  namespace: lilac-system
spec:
  nodeSelector:
    nvidia.com/gpu.product: A100
  capacity:
    maxGPUs: 4
  schedule:
    mode: always
  preemption:
    gracePeriod: 30s
    priority: tenant
  workloads:
    inference: true

Checking Pool Status

kubectl get gpupool -n lilac-system
NAME              PHASE     GPUS   IDLE   WORKLOADS   AGE
production-gpus   Active    8      6      3           2d
dedicated-a100s   Active    4      4      2           1d
For detailed status:
kubectl describe gpupool production-gpus -n lilac-system