How Spot management works

When you activate Spot management on a workload, the following steps run automatically to protect it from Spot interruptions:

1. Kompass ensures that the workload and Karpenter configurations allow the use of Spot nodes, modifying the configurations if necessary.
2. Hibernated nodes start warming up.
   For more information about hibernated nodes, see HiberScale technology.
3. Interruption protection begins immediately for all Pods from protected workloads that run on Spot instance nodes.
4. When AWS issues a Spot interruption notice, Kompass reactivates pre-baked hibernated nodes to replace the interrupted nodes.
5. At the same time, Karpenter evicts the Pods running on the interrupted nodes and launches new nodes.
   (Pods are evicted according to the limits in the PDB.)
6. When the Kompass nodes are ready to host Pods, Pods are scheduled to those nodes.
7. Kompass hibernated nodes are cordoned 5* minutes after being reactivated.
8. Kompass nodes are gradually drained until all Pods are hosted by autoscaler nodes.
   To ensure smooth draining, the number of Pods may exceed the number that was running before the interruption.
9. The Kompass nodes are terminated when they are no longer hosting Pods, or 60* minutes after being cordoned, whichever comes first.
* You can configure these time periods in the QubexConfig CR. For more information, see Apply and manage Headroom reduction and Spot management with CRD.
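
The timing rules in the steps above (cordon 5 minutes after reactivation, terminate when empty or 60 minutes after cordoning, whichever comes first) can be sketched as a small calculation. This is an illustrative model only, not Kompass code: the names `NodeTimeline`, `plan_timeline`, and `termination_time` are hypothetical, and the two durations are the defaults stated above, which the QubexConfig CR can override.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Default periods taken from the steps above (assumed here as constants;
# in practice they are configurable in the QubexConfig CR).
CORDON_AFTER = timedelta(minutes=5)
TERMINATE_AFTER_CORDON = timedelta(minutes=60)


@dataclass
class NodeTimeline:
    """Hypothetical record of the key moments for one reactivated node."""
    reactivated_at: datetime
    cordon_at: datetime
    terminate_deadline: datetime


def plan_timeline(
    reactivated_at: datetime,
    cordon_after: timedelta = CORDON_AFTER,
    terminate_after_cordon: timedelta = TERMINATE_AFTER_CORDON,
) -> NodeTimeline:
    """Compute when a reactivated node is cordoned and its latest termination time."""
    cordon_at = reactivated_at + cordon_after
    return NodeTimeline(
        reactivated_at=reactivated_at,
        cordon_at=cordon_at,
        terminate_deadline=cordon_at + terminate_after_cordon,
    )


def termination_time(timeline: NodeTimeline, drained_at: Optional[datetime]) -> datetime:
    """Return the termination time: when the node became empty, or the deadline,
    whichever comes first. drained_at is None if the node never emptied."""
    if drained_at is None:
        return timeline.terminate_deadline
    return min(drained_at, timeline.terminate_deadline)
```

For example, a node reactivated at 12:00 would be cordoned at 12:05 and terminated no later than 13:05; if its last Pod moves to an autoscaler node at 12:30, the model returns 12:30 as the termination time.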

The following diagram illustrates the process:

See also: