When you activate Spot management on a workload, Kompass performs the following steps automatically to protect it:
Kompass checks that the workload and Karpenter configurations allow the use of Spot nodes, and modifies them if necessary.
Hibernated nodes start warming up.
For more information about hibernated nodes, see HiberScale technology.
Interruption protection begins immediately for all Pods of protected workloads that run on Spot instance nodes.
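The first step depends on Karpenter being allowed to provision Spot capacity. The following Python sketch shows roughly what such a check could look at. It inspects the `karpenter.sh/capacity-type` requirement in a Karpenter NodePool's `spec.template.spec.requirements` list.

This is an illustration under stated assumptions:
- The helpers `nodepool_allows_spot` and `_requirement_permits_spot` are hypothetical. They are not Kompass's implementation.
- The sketch ignores workload-level constraints such as node affinity and tolerations.
- It assumes Karpenter on AWS defaults to On-Demand when no capacity-type requirement is set.

```python
from typing import Any

# Well-known Karpenter label that distinguishes Spot from On-Demand capacity.
CAPACITY_TYPE_KEY = "karpenter.sh/capacity-type"


def _requirement_permits_spot(req: dict[str, Any]) -> bool:
    """Evaluate a single capacity-type requirement against the value "spot"."""
    operator = req.get("operator")
    values = req.get("values", [])
    if operator == "In":
        return "spot" in values
    if operator == "NotIn":
        return "spot" not in values
    if operator == "Exists":
        return True
    # "DoesNotExist" and any unrecognized operator are treated as excluding Spot.
    return False


def nodepool_allows_spot(nodepool: dict[str, Any]) -> bool:
    """Hypothetical check: can this NodePool launch Spot nodes at all?"""
    requirements = (
        nodepool.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("requirements", [])
    )
    matching = [r for r in requirements if r.get("key") == CAPACITY_TYPE_KEY]
    if not matching:
        # Assumption: without a capacity-type requirement, Karpenter on AWS
        # defaults to On-Demand, so Spot is not allowed.
        return False
    # Requirements on the same key are combined, so every one must permit Spot.
    return all(_requirement_permits_spot(r) for r in matching)


# Usage: a NodePool that allows both Spot and On-Demand capacity.
pool = {
    "spec": {
        "template": {
            "spec": {
                "requirements": [
                    {
                        "key": CAPACITY_TYPE_KEY,
                        "operator": "In",
                        "values": ["spot", "on-demand"],
                    }
                ]
            }
        }
    }
}
print(nodepool_allows_spot(pool))  # True
```

A NodePool restricted to `"values": ["on-demand"]` would fail this check. That is the kind of configuration the first step would need to change.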
When AWS issues a Spot interruption notice, Kompass reactivates pre-baked hibernated nodes to replace the interrupted nodes.
At the same time, Karpenter evicts the Pods running on the interrupted nodes and launches new nodes.
(Pods are evicted within the limits set by the workload's PodDisruptionBudget, or PDB.)
When the Kompass nodes are ready to host Pods, the displaced Pods are scheduled onto them.
Five minutes after reactivation, the Kompass nodes are cordoned.
The Kompass nodes are then drained gradually until all Pods are hosted on nodes launched by the autoscaler (Karpenter).
To keep draining smooth, the number of running Pods may temporarily exceed the number that were running before the interruption.
When the Kompass nodes are empty (no longer hosting any Pods), they are terminated.
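The PDB limit in the interruption steps follows standard Kubernetes eviction behavior: the Eviction API refuses an eviction that would take a workload below its budget.

The following is a simplified Python model of the `disruptionsAllowed` value that the Kubernetes disruption controller maintains for a PodDisruptionBudget.
- It assumes percentages are rounded up.
- It ignores details such as `unhealthyPodEvictionPolicy` and how the expected Pod count is looked up from the owning controller.
- The function names are illustrative. They are not part of any Kubernetes client library.

```python
from typing import Optional, Union

# A PDB field may be an absolute count (int) or a percentage string such as "25%".
IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an int or "NN%" value against total, rounding percentages up."""
    if isinstance(value, int):
        return value
    if value.endswith("%"):
        percent = int(value[:-1])
        # Integer ceiling of percent * total / 100, avoiding float rounding.
        return (percent * total + 99) // 100
    return int(value)


def disruptions_allowed(
    expected_pods: int,
    healthy_pods: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Simplified model of how many Pods a PDB currently lets evictions remove."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = scaled_value(min_available, expected_pods)
    else:
        unavailable = scaled_value(max_unavailable, expected_pods)
        desired_healthy = max(0, expected_pods - unavailable)
    return max(0, healthy_pods - desired_healthy)


print(disruptions_allowed(expected_pods=4, healthy_pods=4, min_available=3))  # 1
print(disruptions_allowed(expected_pods=10, healthy_pods=10, max_unavailable="25%"))  # 3
```

Take a four-replica workload with `minAvailable: 3`: in this model only one Pod can be evicted at a time. Further evictions wait until replacement Pods become healthy.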
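The temporary surge during draining can be illustrated with a toy Python model: one Kompass node is drained onto one autoscaler node. The model makes these assumptions, for illustration only:
- Pods move one at a time.
- Each replacement Pod starts before its original is removed.
- The only value taken from the documented flow is the 5-minute cordon delay.
- `Node`, `maybe_cordon`, and `drain_with_surge` are hypothetical names, not Kompass or Kubernetes APIs.

```python
from dataclasses import dataclass, field

# From the documented flow: reactivated Kompass nodes are cordoned after 5 minutes.
CORDON_DELAY_SECONDS = 5 * 60


@dataclass
class Node:
    name: str
    pods: list[str] = field(default_factory=list)
    cordoned: bool = False


def maybe_cordon(node: Node, seconds_since_reactivation: int) -> None:
    """Cordon the node (stop new scheduling onto it) once the delay has elapsed."""
    if seconds_since_reactivation >= CORDON_DELAY_SECONDS:
        node.cordoned = True


def drain_with_surge(kompass: Node, autoscaler: Node) -> list[int]:
    """Move Pods one at a time, starting each replacement before removing the original.

    Returns the total running Pod count after every action, which shows the
    temporary surge above the starting count.
    """
    if not kompass.cordoned:
        raise ValueError("cordon the Kompass node before draining it")
    counts = []
    for pod in list(kompass.pods):
        autoscaler.pods.append(f"{pod}-new")  # replacement starts on the autoscaler node first
        counts.append(len(kompass.pods) + len(autoscaler.pods))
        kompass.pods.remove(pod)  # original is removed once the replacement runs
        counts.append(len(kompass.pods) + len(autoscaler.pods))
    return counts


kompass_node = Node("kompass-1", ["web-a", "web-b", "web-c"])
karpenter_node = Node("karpenter-1")
maybe_cordon(kompass_node, seconds_since_reactivation=300)
counts = drain_with_surge(kompass_node, karpenter_node)
print(counts)  # [4, 3, 4, 3, 4, 3]
if not kompass_node.pods:
    print(f"{kompass_node.name} is empty and can be terminated")
```

In this model, starting replacements first keeps capacity from dropping below the pre-interruption level during the drain. The cost is briefly running one extra Pod per move.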
The following diagram illustrates the process:
Where to go from here