A GPU pool is a custom resource that tells the operator which GPUs in your cluster are available for Lilac inference workloads. You control everything — which nodes, how many GPUs, what hours, and how preemption works.
## Creating a GPU Pool

Apply a GPUPool resource to your cluster:
```yaml
apiVersion: gpu.getlilac.com/v1alpha1
kind: GPUPool
metadata:
  name: production-gpus
  namespace: lilac-system
spec:
  nodeSelector:
    nvidia.com/gpu.product: B200
  capacity:
    maxGPUs: 8
    maxUtilizationPct: 90
  schedule:
    mode: scheduled
    timezone: America/New_York
    windows:
      - days: [mon, tue, wed, thu, fri]
        start: "18:00"
        end: "08:00"
      - days: [sat, sun] # all day
  preemption:
    gracePeriod: 30s
    priority: tenant
  workloads:
    inference: true
```
```bash
kubectl apply -f gpu-pool.yaml
```
## Configuration Reference
### nodeSelector

A standard Kubernetes `nodeSelector`: a map of labels a node must carry to be included in the pool. Only nodes matching all of these labels are included.
```yaml
nodeSelector:
  nvidia.com/gpu.product: B200            # GPU model
  topology.kubernetes.io/zone: us-east-1a # Optional: limit to a zone
```
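On clusters running NVIDIA's GPU Feature Discovery, the `nvidia.com/gpu.product` label is applied to nodes automatically. If your nodes aren't labeled yet, you can add the label by hand; the node name below is a placeholder:

```bash
# Label a GPU node so it matches the pool's selector.
# "gpu-node-1" is a placeholder — substitute your node name.
kubectl label node gpu-node-1 nvidia.com/gpu.product=B200

# Confirm which nodes the pool's selector would match.
kubectl get nodes -l nvidia.com/gpu.product=B200
```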
### capacity

Control how much of your GPU fleet Lilac can use.

| Field | Type | Description |
|---|---|---|
| `maxGPUs` | integer | Maximum number of GPUs Lilac can use across all nodes |
| `maxUtilizationPct` | integer (0–100) | Maximum percentage of matching GPUs Lilac can consume |
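The two fields are independent caps. Assuming the operator enforces whichever limit is stricter, a pool over a fleet of 16 matching GPUs would resolve like this (a sketch; the rounding behavior is an assumption, not documented above):

```yaml
capacity:
  maxGPUs: 8             # absolute cap: 8 GPUs
  maxUtilizationPct: 90  # 90% of 16 matching GPUs = 14 (rounded down)
  # effective limit: the stricter of the two, min(8, 14) = 8 GPUs
```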
### schedule

Define when GPUs are available for Lilac workloads.

| Mode | Behavior |
|---|---|
| `always` | GPUs are always available (respecting capacity limits) |
| `scheduled` | GPUs are only available during defined time windows |
```yaml
schedule:
  mode: scheduled
  timezone: America/New_York
  windows:
    - days: [mon, tue, wed, thu, fri]
      start: "18:00"
      end: "08:00"
    - days: [sat, sun] # all day — omit start/end
```
Use `mode: always` if you have dedicated GPUs that aren't used for other workloads. Use `mode: scheduled` to share GPUs between your own workloads (daytime) and Lilac (evenings/weekends).
### preemption

Controls what happens when your workloads need GPUs back.

| Field | Type | Description |
|---|---|---|
| `gracePeriod` | duration | Time given to inference pods to finish in-flight requests before termination |
| `priority` | string | `tenant` means your workloads always take priority |
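If your inference requests are long-running (for example, lengthy generations), a 30s grace period may cut them off mid-flight; a longer value trades slower GPU reclaim for cleaner drains. A sketch, assuming `gracePeriod` accepts duration strings like `5m` in the same style as the `30s` used above:

```yaml
preemption:
  gracePeriod: 5m   # allow up to five minutes to drain in-flight requests
  priority: tenant  # your workloads still win once the grace period expires
```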
### workloads

Toggle which workload types this pool accepts.

| Field | Type | Description |
|---|---|---|
| `inference` | boolean | Allow inference workloads on this pool |
## Multiple Pools

You can create multiple GPU pools for different hardware or schedules:
```yaml
# Pool for A100 GPUs — always available
apiVersion: gpu.getlilac.com/v1alpha1
kind: GPUPool
metadata:
  name: dedicated-a100s
  namespace: lilac-system
spec:
  nodeSelector:
    nvidia.com/gpu.product: A100
  capacity:
    maxGPUs: 4
  schedule:
    mode: always
  preemption:
    gracePeriod: 30s
    priority: tenant
  workloads:
    inference: true
```
## Checking Pool Status
```bash
kubectl get gpupool -n lilac-system
```
```
NAME              PHASE    GPUS   IDLE   WORKLOADS   AGE
production-gpus   Active   8      6      3           2d
dedicated-a100s   Active   4      4      2           1d
```
For detailed status:
```bash
kubectl describe gpupool production-gpus -n lilac-system
```
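Because GPUPool is a regular custom resource, the standard kubectl output options also work; for example, dumping the full object (the exact fields under `.status` depend on the operator version) or watching for phase changes:

```bash
# Full object, including the .status block, as YAML
kubectl get gpupool production-gpus -n lilac-system -o yaml

# Watch pools for changes (Ctrl-C to stop)
kubectl get gpupool -n lilac-system -w
```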