Apply and manage Compute solutions with a CRD


HiberScale technology provides hibernated nodes that can be resumed to reduce costs and startup delays. For more information, see HiberScale technology.

You can enable Kompass HiberScale technology for workloads by defining a WorkloadDescriptor custom resource.

The WorkloadDescriptor references a workload and defines two protection options, Headroom reduction (spike protection) and Spot protection, that control when HiberScale resumes hibernated nodes.

When protection is activated for a workload, HiberScale creates hibernated node pools for that workload, based on the QScaler CRD.

You can override the local nodepool settings, such as instance types and number of nodes, as well as the global HiberScale settings, as described in Advanced configuration and observability.

WorkloadDescriptor overview

A WorkloadDescriptor is a namespaced resource.

The main fields include:

  • workloadReference: Identifies the workload by apiVersion, kind, and name. Supported workload types include Deployment and StatefulSet.

  • protection: Defines conditions that trigger HiberScale for spike and Spot protection.

  • resources: Specifies the vCPU and memory reserved for hibernated nodes that protect the workload.

Example:

apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: test-deployment
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  protection:
    spike:
      active: true
      threshold: "10%"
      strategy: "default"
    spot:
      active: false
  resources:
    cpu: 8000m
    memory: 8Gi

For a full reference of the fields, see WorkloadDescriptor field reference.

Lifecycle

After a WorkloadDescriptor is created, Kompass performs the following operations before the resource is activated:

  1. Image size calculation: Kompass calculates the uncompressed size of container images using the QCacheRevisionCreation CRD. For the first WorkloadDescriptor, the calculation starts immediately. For later ones, the calculation respects the cooldown defined in the QubexConfig (revisionMinCreationInterval, default 30 minutes).

  2. Shard updates: Kompass creates or updates QCacheShard CRs with image size and distribution details.

  3. QNode updates: For the first WorkloadDescriptor, Kompass creates new QNodes. For subsequent ones, existing QNodes wake up and download new images.

  4. Active state: When QNodes reach CacheLag=0 (all images downloaded), Kompass sets the WorkloadDescriptor state to Active. Pre-pulled images reduce startup delays when HiberScale resumes hibernated nodes.

The protection section contains options that define when HiberScale resumes hibernated nodes:

  • Spike protection (Headroom reduction): reacts to unschedulable Pods

  • Spot protection: reacts to Spot interruption notices

Spike protection

The spike section specifies how Kompass responds to unschedulable Pods. When spike protection is active, Kompass can also adjust the HPA minReplicas value according to a strategy that you configure. For more information on how Kompass protects workloads, see Headroom reduction.

  • active: Determines whether spike protection and minimum replica management are enabled for the workload.

    • true: Kompass resumes hibernated nodes when the number of unschedulable Pods exceeds the threshold.

    • false: Kompass ignores unschedulable Pods and does not resume hibernated nodes.

  • threshold: The percentage of Pods in the workload that must be unschedulable before Kompass resumes hibernated nodes.

  • strategy: Defines how Kompass manages the HPA minReplicas value.

    • manual: Kompass does not control minReplicas; the user manages it directly.

    • default: Kompass sets minReplicas based on past usage to reduce costs without affecting SLAs.

    • conservative: Similar to default, but uses more cautious calculations. This reduces savings but provides additional protection for sensitive workloads.
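As an illustration of the fields above, the following fragment shows a spike section for a latency-sensitive workload. It is a minimal sketch: the 25% threshold is an arbitrary illustrative value, not a recommendation from Kompass.

```yaml
# Fragment of a WorkloadDescriptor spec (illustrative values only).
spec:
  protection:
    spike:
      active: true               # resume hibernated nodes for unschedulable Pods
      threshold: "25%"           # illustrative: share of the workload's Pods that must be unschedulable
      strategy: "conservative"   # more cautious minReplicas management than "default"
```

With strategy set to manual, Kompass would still resume hibernated nodes based on the threshold but would leave minReplicas under your control.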

Argo CD

When using Argo CD, configure it so that changes Kompass makes to the HPA are not treated as conflicts; otherwise, Argo CD may revert the Kompass adjustments.
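One possible way to do this is with the ignoreDifferences setting of an Argo CD Application. The fragment below is a sketch that assumes Kompass only modifies spec.minReplicas on the HPA; check which fields Kompass actually changes in your cluster before relying on it.

```yaml
# Fragment of an Argo CD Application spec.
# project, source, and destination are omitted for brevity.
spec:
  ignoreDifferences:
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      jsonPointers:
        - /spec/minReplicas             # assumption: the only HPA field Kompass adjusts
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true   # keep the ignored field untouched during syncs
```

Without the RespectIgnoreDifferences sync option, Argo CD ignores the difference when computing sync status but can still overwrite the field during a sync.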

Spot protection

The spot section specifies how Kompass responds to Spot interruptions.

  • active: Determines whether the workload is protected against Spot interruptions. For more information on how Kompass protects workloads, see Spot management.

    • true: Kompass resumes hibernated nodes in response to interruption notices.

    • false: Kompass ignores interruption notices.
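The following sketch shows a WorkloadDescriptor that enables only Spot protection, here for a StatefulSet. The resource names, namespace, and resource values are hypothetical, and the example assumes that threshold and strategy can be omitted when spike protection is inactive.

```yaml
apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: test-statefulset      # hypothetical name
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: StatefulSet
    name: test-statefulset    # hypothetical workload
  protection:
    spike:
      active: false           # unschedulable Pods do not resume hibernated nodes
    spot:
      active: true            # interruption notices resume hibernated nodes
  resources:
    cpu: 4000m                # illustrative values
    memory: 8Gi
```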

Resources

The resources section defines CPU and memory values for hibernated nodes.

  • Values determine the resources allocated to the pool of hibernated nodes used to protect the workload.

    Recommended vCPU and RAM allocation

    Allocate resources equal to the current per-Pod requests multiplied by the maximum number of Pods you expect.

  • You can prevent workload protection in these ways:

    • Delete the WorkloadDescriptor.

      This is preferred when using IaC tools.

    • Set both spike.active=false and spot.active=false.

      This prevents HiberScale from creating hibernated nodes without deleting the WorkloadDescriptor. Any previously created hibernated nodes are removed.
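The sizing recommendation above can be worked through with a small example. Assume, hypothetically, that each Pod requests 500m CPU and 512Mi of memory and that the workload is expected to peak at 16 Pods.

```yaml
# Worked sizing example (hypothetical Pod requests and Pod count).
#   CPU:    16 Pods x 500m  = 8000m
#   Memory: 16 Pods x 512Mi = 8192Mi = 8Gi
spec:
  resources:
    cpu: 8000m
    memory: 8Gi
```

If the Pod requests or the expected peak change later, recompute both values in the same way.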

Deactivation

You can deactivate HiberScale for a workload by setting the active field to false for both spike and spot.

For more information, see Resources.
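As a sketch, a deactivated protection section could look like the following. The threshold and strategy values are carried over from the earlier example and left in place; only the two active flags change.

```yaml
# Fragment of a WorkloadDescriptor spec with both protections disabled.
spec:
  protection:
    spike:
      active: false
      threshold: "10%"
      strategy: "default"
    spot:
      active: false
```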

Advanced configuration and observability

The following options provide additional configuration and integration. They are not required for basic operation but can be used to tune behavior or integrate with observability systems.

QScaler local setting overrides

The QScaler CRD defines hibernated node pools. For each autoscaler or Karpenter nodepool that protects a workload, Kompass creates a corresponding QScaler object.

You can override the following settings from the upstream nodepool:

  • instanceTypes: compatible instance types that support hibernation

  • maxHibernatedQNodes: maximum number of hibernated nodes

  • maxRunningQNodes: maximum number of Kompass-created nodes

Example:

spec:
 overrides:
   instanceTypes:
   - c5.xlarge
   - c5.2xlarge
   maxHibernatedQNodes:
     type: Absolute
     value: 10
   maxRunningQNodes:
     type: Absolute
     value: 10

Sizing types for maximum values:

  • Absolute: fixed number of nodes

  • PercentOfPool: percentage of the maximum size of the upstream nodepool

  • PercentOfRunning: percentage of the current number of nodes in the upstream nodepool

Recommendation: For predictability, use Absolute sizing.
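For comparison, the sketch below caps hibernated nodes relative to the upstream nodepool instead of using a fixed number. It assumes that value is interpreted as a whole-number percentage when type is PercentOfPool; the source does not state the value format, so verify it against the QScaler field reference.

```yaml
# Fragment of a QScaler spec (value format for percentage types is an assumption).
spec:
  overrides:
    maxHibernatedQNodes:
      type: PercentOfPool
      value: 50        # assumed to mean 50% of the upstream nodepool's maximum size
    maxRunningQNodes:
      type: Absolute
      value: 10        # fixed cap of 10 Kompass-created nodes
```

Percentage-based limits follow changes to the upstream nodepool, which is why Absolute sizing is the more predictable choice.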

QubexConfig global setting overrides

The QubexConfig CRD defines global HiberScale settings.

You can override the default settings.

Common setting:

  • cache.revisionMinCreationInterval: Minimum time between checks for new images in the cluster (default 30 minutes).
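A hypothetical override might look like the fragment below. Both the placement of cache under spec and the duration format are assumptions, since the source gives only the setting name and its 30-minute default; confirm the schema in your cluster before applying anything like this.

```yaml
# Fragment of a QubexConfig (field placement and duration format are assumptions).
spec:
  cache:
    revisionMinCreationInterval: 60m   # illustrative: wait at least 60 minutes between image checks
```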

Metrics ingestion

Kompass control plane components expose Prometheus metrics on a /metrics endpoint. These endpoints can be scraped by Prometheus to ingest Kompass metrics into an observability system.
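A minimal Prometheus scrape configuration for such an endpoint could be sketched as follows. The job name, service address, and port are hypothetical placeholders; substitute the actual addresses of the Kompass components in your cluster.

```yaml
# Fragment of prometheus.yml (target address and port are placeholders).
scrape_configs:
  - job_name: kompass                 # hypothetical job name
    metrics_path: /metrics
    static_configs:
      - targets:
          - kompass-controller.kompass.svc:8080   # hypothetical service and port
```

In clusters that run the Prometheus Operator, a ServiceMonitor that selects the Kompass services would be the equivalent approach.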

Where to go from here