HiberScale technology provides hibernated nodes that can be resumed to reduce costs and startup delays. For more information, see HiberScale technology.
You can enable Kompass HiberScale technology for workloads by defining a WorkloadDescriptor custom resource.
The WorkloadDescriptor references a workload and defines two protection options, Headroom reduction (spike protection) and Spot protection, that control when HiberScale resumes hibernated nodes.
When protection is activated for a workload, HiberScale creates hibernated node pools for that workload, based on the QScaler CRD.
You can override the local nodepool settings, such as instance types and the number of nodes, as well as the global HiberScale settings, as described in Advanced configuration and observability.
WorkloadDescriptor overview
A WorkloadDescriptor is a namespaced resource.
The main fields include:
workloadReference: Identifies the workload by apiVersion, kind, and name. Supported workload types include Deployment and StatefulSet.
protection: Defines conditions that trigger HiberScale for spike and Spot protection.
resources: Specifies the vCPU and memory reserved for hibernated nodes that protect the workload.
Example:
apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: test-deployment
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  protection:
    spike:
      active: true
      threshold: "10%"
      strategy: "default"
    spot:
      active: false
  resources:
    cpu: 8000m
    memory: 8Gi
For a full reference of the fields, see WorkloadDescriptor field reference.
Lifecycle
After a WorkloadDescriptor is created, Kompass performs the following operations before the resource is activated:
Image size calculation: Kompass calculates the uncompressed size of container images using the QCacheRevisionCreation CRD. For the first WorkloadDescriptor, the calculation starts immediately. For later ones, the calculation respects the cooldown defined in the QubexConfig (revisionMinCreationInterval, default 30 minutes).
Shard updates: Kompass creates or updates QCacheShard CRs with image size and distribution details.
QNode updates: For the first WorkloadDescriptor, Kompass creates new QNodes. For subsequent ones, existing QNodes wake up and download new images.
Active state: When QNodes reach CacheLag=0 (all images downloaded), Kompass sets the WorkloadDescriptor state to Active. Pre-pulled images reduce startup delays when HiberScale resumes hibernated nodes.
The protection section contains options that define when HiberScale resumes hibernated nodes:
Spike protection (Headroom reduction): reacts to unschedulable Pods
Spot protection: reacts to Spot interruption notices
Spike protection
The spike section specifies how Kompass manages unschedulable Pods. When active, it can adjust the HPA minReplicas according to a strategy that you configure. For more information on how Kompass protects workloads, see Headroom reduction.
active: Determines whether spike protection and minimum replica management are enabled for the workload.
true: Kompass resumes hibernated nodes when the number of unschedulable Pods exceeds the threshold.
false: Kompass ignores unschedulable Pods and does not resume hibernated nodes.
threshold: The percentage of Pods in the workload that must be unschedulable before Kompass resumes hibernated nodes.
strategy: Defines how Kompass manages the HPA minReplicas value.
manual: Kompass does not control minReplicas; the user manages it directly.
default: Kompass sets minReplicas based on past usage to reduce costs without affecting SLAs.
conservative: Similar to default, but uses more cautious calculations. This reduces savings but provides additional protection for sensitive workloads.
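As an illustration of the spike fields described above, the following fragment is a minimal sketch of a spike section for a sensitive workload. The 20% threshold is an arbitrary example value, not a Kompass recommendation.

```yaml
spec:
  protection:
    spike:
      active: true               # resume hibernated nodes on unschedulable Pods
      threshold: "20%"           # example value: share of the workload's Pods that must be unschedulable
      strategy: "conservative"   # more cautious minReplicas calculation than "default"
```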
Argo CD
When using Argo CD, configure it so that Kompass changes to the HPA are not treated as conflicts; otherwise, Argo CD may revert Kompass adjustments.
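This page does not prescribe a specific Argo CD setting. One common approach, sketched here under that caveat, is to list the HPA minReplicas field under ignoreDifferences in the Argo CD Application and to enable the RespectIgnoreDifferences sync option so that a sync does not overwrite the field. The Application name, repository URL, path, and namespaces below are hypothetical placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: test-app                                  # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git     # hypothetical
    targetRevision: HEAD
    path: manifests                               # hypothetical
  destination:
    server: https://kubernetes.default.svc
    namespace: test
  ignoreDifferences:
    # Do not report drift on the field that Kompass adjusts.
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      jsonPointers:
        - /spec/minReplicas
  syncPolicy:
    syncOptions:
      # Keep the ignored field untouched during sync as well.
      - RespectIgnoreDifferences=true
```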
Spot protection
The spot section specifies how Kompass responds to Spot interruptions.
active: Determines whether the workload is protected against Spot interruptions. For more information on how Kompass protects workloads, see Spot management.
true: Kompass resumes hibernated nodes in response to interruption notices.
false: Kompass ignores interruption notices.
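For illustration, the sketch below enables only Spot protection for a StatefulSet. The resource name, namespace, and resource values are hypothetical, and whether the spike section may be reduced to active: false alone is an assumption that mirrors how the spot section is written in the earlier example.

```yaml
apiVersion: kompass.zesty.co/v1alpha1
kind: WorkloadDescriptor
metadata:
  name: test-statefulset      # hypothetical
  namespace: test
spec:
  workloadReference:
    apiVersion: apps/v1
    kind: StatefulSet
    name: test-statefulset    # hypothetical
  protection:
    spike:
      active: false           # ignore unschedulable Pods
    spot:
      active: true            # resume hibernated nodes on Spot interruption notices
  resources:
    cpu: 4000m                # example values
    memory: 4Gi
```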
Resources
The resources section defines CPU and memory values for hibernated nodes.
Values determine the resources allocated to the pool of hibernated nodes used to protect the workload.
Recommended vCPU and RAM allocation
Allocate resources equal to the current per-Pod requests multiplied by the maximum number of Pods expected.
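As a worked example of this calculation, assume a hypothetical workload whose Pods each request 500m CPU and 512Mi memory, and that is expected to scale to at most 16 Pods. The arithmetic is shown in the comments.

```yaml
# Assumed workload (hypothetical values):
#   per-Pod requests: cpu 500m, memory 512Mi
#   maximum expected Pods: 16
#
# cpu:    500m  x 16 = 8000m
# memory: 512Mi x 16 = 8192Mi = 8Gi
spec:
  resources:
    cpu: 8000m
    memory: 8Gi
```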
You can prevent workload protection in these ways:
Delete the WorkloadDescriptor.
This is preferred when using IaC tools.
Set both spike.active=false and spot.active=false.
This prevents HiberScale from creating hibernated nodes without deleting the WorkloadDescriptor. If there are previously created hibernated nodes, they will be removed.
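The second option can be sketched as the following protection fragment. Keeping the threshold and strategy fields in place while spike protection is inactive is an assumption; it makes re-enabling protection a one-line change.

```yaml
spec:
  protection:
    spike:
      active: false        # no hibernated nodes are resumed for unschedulable Pods
      threshold: "10%"     # retained for later re-activation (assumed to be allowed)
      strategy: "default"
    spot:
      active: false        # Spot interruption notices are ignored
```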
Deactivation
You can deactivate HiberScale by disabling the active fields for both spike and spot.
For more information, see Resources.
Advanced configuration and observability
The following options provide additional configuration and integration. They are not required for basic operation but can be used to tune behavior or integrate with observability systems.
QScaler local setting overrides
The QScaler CRD defines hibernated node pools. For each autoscaler or Karpenter nodepool that protects a workload, Kompass creates a corresponding QScaler object.
You can override the following settings from the upstream nodepool:
instanceTypes: compatible instance types that support hibernation
maxHibernatedQNodes: maximum number of hibernated nodes
maxRunningQNodes: maximum number of Kompass-created nodes
Example:
spec:
  overrides:
    instanceTypes:
      - c5.xlarge
      - c5.2xlarge
    maxHibernatedQNodes:
      type: Absolute
      value: 10
    maxRunningQNodes:
      type: Absolute
      value: 10
Sizing types for maximum values:
Absolute: fixed number of nodes
PercentOfPool: percentage of the maximum size of the upstream nodepool
PercentOfRunning: percentage of the current number of nodes in the upstream nodepool
Recommendation: For predictability, use Absolute sizing.
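For comparison, a relative sizing type might look like the sketch below. It assumes that the overrides nest under spec.overrides and that value is a plain number interpreted as a percentage for the PercentOfPool type. Neither the nesting nor the value format is confirmed here, so check both against the QScaler CRD schema in your cluster.

```yaml
spec:
  overrides:
    maxHibernatedQNodes:
      type: PercentOfPool
      value: 50      # assumption: 50% of the upstream nodepool's maximum size
    maxRunningQNodes:
      type: Absolute
      value: 10      # fixed number of nodes
```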
QubexConfig global setting overrides
The QubexConfig CRD defines global HiberScale settings.
You can override the default settings.
Common setting:
cache.revisionMinCreationInterval: Time to wait between checking for new images in the cluster.
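A sketch of such an override is shown below. The placement under spec and the duration string format are assumptions based on the dotted setting name. The only documented fact used is the default of 30 minutes.

```yaml
spec:
  cache:
    # Assumption: duration string. Raises the cooldown from the 30-minute default.
    revisionMinCreationInterval: 60m
```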
Metrics ingestion
Kompass control plane components expose Prometheus metrics on a /metrics endpoint. These endpoints can be scraped by Prometheus to ingest Kompass metrics into an observability system.
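A minimal sketch of a Prometheus scrape job for these endpoints follows. The namespace name and the Pod label used for filtering are hypothetical; replace them with the values used by your Kompass installation.

```yaml
scrape_configs:
  - job_name: kompass
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - kompass          # hypothetical namespace
    relabel_configs:
      # Keep only Pods carrying a (hypothetical) app.kubernetes.io/part-of=kompass label.
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_part_of]
        regex: kompass
        action: keep
```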