Monitor your cluster’s connection status and workloads using the Lilac dashboard and Kubernetes tools.

Dashboard

The Clusters section of the Lilac console shows:

Cluster Status

Status | Meaning
--- | ---
Connected | The operator is syncing normally with the control plane.
Degraded | A recent sync failed; the operator will retry on the next sync cycle.
Draining | The cluster has been disconnected for 10 minutes or longer; inference pods are being gracefully removed.

Workload Activity

  • Desired vs. reported workloads per pool
  • Workload details: GPU count, pod phase, ready status, restart counts
  • Draining workload counts
  • Model assignments (which models are running on which nodes)

GPU Allocation

  • Per-pool breakdown of tenant vs. Lilac GPU usage
  • Node details (GPU product, total GPUs)
  • Last sync timestamps and lease expiration

Detailed usage statistics (tokens processed and revenue earned) are included in your monthly report. See Revenue & Payouts for details.

Kubernetes Monitoring

Check Pool Status

kubectl get gpupool -n lilac-system

View Operator Logs

kubectl logs -n lilac-system deploy/lilac-gpu-operator-lilac-gpu-operator -f

Key log events to watch:

Event | Meaning
--- | ---
control plane sync successful | A normal sync completed.
workload created | A new inference pod was deployed.
preemption triggered | GPUs are being reclaimed for your workloads.
workload drained | An inference pod was gracefully removed.
sync failed | The control plane is unreachable; the operator will retry.
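
To get a quick tally of these events over a time window, you can pipe the operator logs through a small filter. The sketch below is a hypothetical helper (`lilac_log_summary` is not part of Lilac). It assumes each event phrase from the table appears verbatim in the log lines. Check a few lines of real output first, because the actual log format may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Lilac): counts how many log lines
# contain each key event phrase from the table above.
# Assumption: the phrases appear verbatim (case-sensitive) in the operator logs.
lilac_log_summary() {
  awk '
    BEGIN {
      # Comma-separated so every awk implementation splits on a literal character.
      n = split("control plane sync successful,workload created,preemption triggered,workload drained,sync failed", ev, ",")
    }
    {
      # A line is counted once for every phrase it contains.
      for (i = 1; i <= n; i++) if (index($0, ev[i]) > 0) count[i]++
    }
    END {
      # "+ 0" prints 0 for phrases that never appeared.
      for (i = 1; i <= n; i++) printf "%s: %d\n", ev[i], count[i] + 0
    }
  '
}

# Usage (requires cluster access):
#   kubectl logs -n lilac-system deploy/lilac-gpu-operator-lilac-gpu-operator --since=1h | lilac_log_summary
```

Matching plain substrings keeps the helper independent of the log layout. The trade-off is a false count whenever a phrase happens to appear inside an unrelated message.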

View Running Inference Pods

kubectl get pods -n lilac-system -l app.kubernetes.io/managed-by=lilac
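
For a quick health check that mirrors the workload details on the dashboard (pod phase, ready status, restart counts), you can filter kubectl's `custom-columns` output in the shell. This is a sketch only. `lilac_flag_pods` is a hypothetical helper, and the default threshold of 3 restarts is an arbitrary example value, not a Lilac recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Lilac): reads lines of
#   NAME PHASE READY RESTARTS
# as produced by the kubectl command in the usage comment below, and prints
# pods that are not Running, have a container that is not ready, or have
# restarted more than MAX_RESTARTS times in total (default 3, an arbitrary example).
lilac_flag_pods() {
  local max_restarts=${1:-3}
  awk -v max="$max_restarts" '
    NF >= 4 {
      # READY and RESTARTS hold one comma-separated value per container;
      # pods without container statuses show "<none>".
      notready = 0
      nr = split($3, ready, ",")
      for (i = 1; i <= nr; i++) if (ready[i] != "true") notready = 1
      # Sum restarts across containers ("<none>" counts as 0 in awk arithmetic).
      total = 0
      nc = split($4, restarts, ",")
      for (i = 1; i <= nc; i++) total += restarts[i]
      if ($2 != "Running" || notready || total > max)
        printf "%s phase=%s ready=%s restarts=%d\n", $1, $2, $3, total
    }
  '
}

# Usage (requires cluster access):
#   kubectl get pods -n lilac-system -l app.kubernetes.io/managed-by=lilac --no-headers \
#     -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,READY:.status.containerStatuses[*].ready,RESTARTS:.status.containerStatuses[*].restartCount' \
#     | lilac_flag_pods 3
```

An empty result means every managed pod is Running, fully ready, and under the restart threshold.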

Kubernetes Events

The operator emits Kubernetes events for key state transitions:

Event | Description
--- | ---
PoolCleanedUp | All managed workloads were deleted from the pool.
ControlPlaneDegraded | The control plane stopped responding.
ControlPlaneDisconnected | The disconnect timeout elapsed; workloads are being drained.
WorkloadPreempted | A workload was evicted after its grace period.
WorkloadDraining | Draining began; the event includes the reason and grace period.

View events:
kubectl get events -n lilac-system --sort-by='.lastTimestamp'
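
To see only the operator's own state transitions, you can narrow the event list to the reasons in the table above. In this sketch, `lilac_operator_events` is a hypothetical helper that filters `custom-columns` output on the REASON column. It assumes the TIME column contains no spaces, which holds for RFC 3339 timestamps and for `<none>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Lilac): keeps only events whose
# REASON (second column) is one of the operator reasons listed above.
lilac_operator_events() {
  awk '$2 ~ /^(PoolCleanedUp|ControlPlaneDegraded|ControlPlaneDisconnected|WorkloadPreempted|WorkloadDraining)$/'
}

# Usage (requires cluster access):
#   kubectl get events -n lilac-system --sort-by='.lastTimestamp' --no-headers \
#     -o custom-columns='TIME:.lastTimestamp,REASON:.reason,OBJECT:.involvedObject.name,MESSAGE:.message' \
#     | lilac_operator_events
```

To check a single reason, `kubectl get events -n lilac-system --field-selector reason=WorkloadPreempted` does the filtering on the API server instead.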