Apply and manage Headroom reduction and Spot management with CRDs

HiberScale technology provides hibernated nodes that can resume within seconds to reduce application start times by up to 60%. This acceleration allows Kompass to safely lower the minimum replica count for workloads, reducing the number of Pods and nodes required while maintaining application responsiveness and availability.

The result is lower infrastructure costs without compromising workload performance. For more information, see HiberScale technology.

You can enable Kompass HiberScale technology for workloads by defining a WorkloadDescriptor custom resource.

The WorkloadDescriptor references a workload and defines two protection options, Headroom reduction (spike protection) and Spot protection, that control when HiberScale resumes hibernated nodes.

When protection is activated for a workload, HiberScale creates hibernated node pools for it. Kompass manages these pools through QScaler custom resources, each corresponding to a cluster autoscaler or Karpenter nodepool in your cluster.

You can override local nodepool settings, such as instance types and the number of nodes, as well as global HiberScale settings, as described in Advanced configuration and observability.

WorkloadDescriptor overview

A WorkloadDescriptor is a namespaced resource.

The main fields include:

workloadReference: Identifies the workload by apiVersion, kind, and name. Supported workload types include Deployment and StatefulSet.
protection: Defines conditions that trigger HiberScale for spike and Spot protection.
draining: Controls how Kompass evicts Pods from its nodes during the drain process.
resources: Specifies the vCPU and memory reserved for hibernated nodes that protect the workload.

Example:

apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: test-deployment
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  protection:
    spike:
      active: true
      threshold: "10%"
      strategy: "default"
    spot:
      active: false
  draining:
    method: regular # accelerated (default)|regular 
    threshold: 10% # percentage|integer value 
  resources:
    cpu: 8000m
    memory: 8Gi

For a full reference of the fields, see WorkloadDescriptor field reference.

The protection section contains options that define when HiberScale resumes hibernated nodes:

Spike protection (Headroom reduction): reacts to unschedulable Pods
Spot protection: reacts to Spot interruption notices

Spike protection

The spike section specifies how Kompass manages unschedulable Pods. When spike protection is active, Kompass can also adjust the HPA minReplicas value according to the strategy you configure. For more information on how Kompass protects workloads, see Headroom reduction.

active: Determines whether the workload is protected against spikes and whether Kompass manages its minimum replica count.
- true: Kompass resumes hibernated nodes when the number of unschedulable Pods exceeds the threshold.
- false: Kompass ignores unschedulable Pods and does not resume hibernated nodes.
threshold: The percentage of Pods in the workload that must be unschedulable before Kompass resumes hibernated nodes.
strategy: Defines how Kompass manages the HPA minReplicas value.
- manual: Kompass does not control minReplicas; the user manages it directly.
- default: Kompass sets minReplicas based on past usage to reduce costs without affecting SLAs.
- conservative: Similar to default, but uses more cautious calculations. This reduces savings but provides additional protection for sensitive workloads.
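draining: Controls how Kompass evicts Pods from its nodes during the drain process. Unlike the options above, draining is set directly under spec (a sibling of protection), as in the WorkloadDescriptor example.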
- method: Selects the drain strategy.
  - accelerated (default): Evicts Pods without threshold control.
  - regular: Uses threshold-controlled eviction.
- threshold: Maximum number of Pods that can be unscheduled at the same time. Accepts a percentage (for example, 10%) or an integer (for example, 5). Applies only when method is regular.
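As an illustration of these options, the following WorkloadDescriptor leaves minReplicas under your control and drains Pods with threshold-controlled eviction. It is a sketch that reuses only the fields from the example above; the specific values (a 20% spike threshold, the manual strategy, and at most 5 Pods unscheduled at once) are arbitrary choices for illustration.

```yaml
apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: test-deployment
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  protection:
    spike:
      active: true
      threshold: "20%"     # share of the workload's Pods that must be unschedulable
      strategy: "manual"   # Kompass does not change the HPA minReplicas value
    spot:
      active: false
  draining:
    method: regular        # threshold-controlled eviction
    threshold: 5           # integer form: at most 5 Pods unscheduled at the same time
  resources:
    cpu: 8000m
    memory: 8Gi
```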

Argo CD
When using Argo CD, configure it so that Kompass changes to the HPA are not treated as out-of-sync differences; otherwise, Argo CD may revert the Kompass adjustments.
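One way to do this, assuming Kompass only changes the HPA minReplicas field (as described under strategy), is Argo CD's ignoreDifferences setting together with the RespectIgnoreDifferences=true sync option, which also stops syncs from overwriting the ignored field. The Application name and HPA name below are hypothetical, and the project, source, and destination fields are omitted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                    # hypothetical Application name
  namespace: argocd
spec:
  # project, source, and destination omitted for brevity
  ignoreDifferences:
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      name: test-deployment       # hypothetical HPA name
      namespace: test
      jsonPointers:
        - /spec/minReplicas       # field adjusted by Kompass
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true   # keep syncs from reverting the ignored field
```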

Spot protection

The spot section specifies how Kompass responds to Spot interruptions. For more information on how Kompass protects workloads, see Spot management.

active: Determines whether the workload is protected against Spot interruptions.
- true: Kompass resumes hibernated nodes in response to interruption notices.
- false: Kompass ignores interruption notices.
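For example, a StatefulSet that needs only Spot protection could be described as follows. The workload name (web), namespace, and resource values are hypothetical placeholders; the fields are the same ones used in the earlier example, with draining set explicitly to its default method.

```yaml
apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: web                # hypothetical name
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: StatefulSet      # StatefulSet is a supported workload type
    name: web
  protection:
    spike:
      active: false        # ignore unschedulable Pods
    spot:
      active: true         # resume hibernated nodes on Spot interruption notices
  draining:
    method: accelerated    # default method; threshold does not apply
  resources:
    cpu: 4000m
    memory: 8Gi
```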

Resources

The resources section defines the vCPU and memory allocated to the pool of hibernated nodes that protects the workload.

Recommended vCPU and RAM allocation
It is recommended to allocate an amount of resources equal to the current Pod requests multiplied by the maximum number of Pods expected.
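As a worked example of this recommendation, assume a hypothetical workload whose Pods each request 500m CPU and 1Gi of memory and that is expected to peak at 16 Pods. Multiplying the per-Pod requests by 16 gives the following values:

```yaml
# Hypothetical sizing: 16 Pods x (500m CPU, 1Gi memory) per Pod
spec:
  resources:
    cpu: 8000m     # 16 x 500m
    memory: 16Gi   # 16 x 1Gi
```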

Deactivation

You can stop protecting a workload in either of these ways:
- Delete the WorkloadDescriptor.
  This is the preferred approach when using IaC tools.
- Set both spike.active and spot.active to false.
  This deactivates HiberScale for the workload without deleting the WorkloadDescriptor: HiberScale stops creating hibernated nodes for it and removes any it created previously.

For more information, see Resources.
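To deactivate HiberScale for a workload while keeping its WorkloadDescriptor, set both active fields to false. This fragment shows only the protection section of the spec:

```yaml
spec:
  protection:
    spike:
      active: false   # stop reacting to unschedulable Pods
    spot:
      active: false   # stop reacting to Spot interruption notices
```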

Advanced configuration and observability

The following options provide additional configuration and integration. They are not required for basic operation but can be used to tune behavior or integrate with observability systems.

QScaler local setting overrides

The QScaler CRD defines hibernated node pools. For each autoscaler or Karpenter nodepool that protects a workload, Kompass creates a corresponding QScaler object.

You can override the following settings from the upstream nodepool:

instanceTypes: compatible instance types that support hibernation
maxHibernatedQNodes: maximum number of hibernated Kompass nodes
maxRunningQNodes: maximum number of running Kompass-created nodes

Example:

spec:
  overrides:
    instanceTypes:
      - c5.xlarge
      - c5.2xlarge
    maxHibernatedQNodes:
      type: Absolute
      value: 10
    maxRunningQNodes:
      type: Absolute
      value: 10

Sizing types for maximum values:

Absolute: fixed number of nodes
PercentOfPool: percentage of the maximum size of the upstream nodepool
PercentOfRunning: percentage of the current number of nodes in the upstream nodepool
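The fragment below shows the percentage-based types. It assumes that, for these types, value is read as a whole-number percentage; that format is not shown in this document, so confirm it in the QScaler field reference before use. Under that assumption, with an upstream nodepool whose maximum size is 40 nodes and which currently runs 8 nodes, the limits work out as noted in the comments.

```yaml
spec:
  overrides:
    maxHibernatedQNodes:
      type: PercentOfPool      # relative to the upstream nodepool's maximum size
      value: 25                # assumed whole-number percent: 25% of 40 is 10 nodes
    maxRunningQNodes:
      type: PercentOfRunning   # relative to the nodes currently running upstream
      value: 50                # assumed whole-number percent: 50% of 8 is 4 nodes
```

Because PercentOfRunning tracks the current node count, its limit changes as the upstream nodepool scales, which is why Absolute sizing is more predictable.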

Recommendation: For predictability, use Absolute sizing.

QubexConfig global setting overrides

The QubexConfig CRD defines global HiberScale settings.

You can override the default settings after installation in the QubexConfig CR:

kubectl edit qubexconfig qubex-config -n <namespace>

Common settings:

cacheConfig.revisionMinCreationInterval: Minimum time to wait between checks for new images in the cluster.
drainingConfig.scaleInProtectionDuration: Period during which machines are not cordoned; machines can be cordoned only after it elapses.
drainingConfig.drainGracePeriod: Grace period allowed for a self-managed drain before forced termination.
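The dotted names above are paths into the QubexConfig resource. As a sketch, this is how they nest; placing them under spec is an assumption, and the values are left as placeholders because their format is not specified here. You can check the actual layout first with kubectl get qubexconfig qubex-config -n <namespace> -o yaml.

```yaml
# Assumed location under spec; confirm against your installed QubexConfig
spec:
  cacheConfig:
    revisionMinCreationInterval: <value>
  drainingConfig:
    scaleInProtectionDuration: <value>
    drainGracePeriod: <value>
```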

Metrics ingestion

Kompass control plane components expose Prometheus metrics on a /metrics endpoint. These endpoints can be scraped by Prometheus to ingest Kompass metrics into an observability system.
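If you run the Prometheus Operator, a ServiceMonitor is one way to scrape these endpoints. This sketch assumes the Kompass components are exposed through a Kubernetes Service with a named metrics port; the label selector and port name are hypothetical and must match your installation. If the components have no Service, a PodMonitor with podMetricsEndpoints works the same way.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kompass-metrics                    # hypothetical name
  namespace: <namespace>
spec:
  namespaceSelector:
    matchNames:
      - <namespace>                        # namespace where Kompass is installed
  selector:
    matchLabels:
      app.kubernetes.io/part-of: kompass   # hypothetical label on the Kompass Services
  endpoints:
    - port: metrics                        # hypothetical Service port name
      path: /metrics
      interval: 30s
```

Your Prometheus instance must also be configured to select this ServiceMonitor (for example, through its serviceMonitorSelector).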