Install the Zesty K8s Scaler

This topic describes how to install the Zesty K8s Scaler (“Scaler”) in your cluster. The Scaler powers the Kompass Spot management and Headroom reduction solutions.

The Scaler includes these parts:

  • QScaler: Transforms traditional node scaling by maintaining a strategic pool of hibernated VMs. Instead of the time-consuming process of creating new VMs during demand spikes, QScaler instantly resumes pre-configured VMs from hibernation.

  • QCache: Revolutionizes container image management by pre-baking frequently used images into VM volumes. When a node starts, it has immediate access to required container images locally, eliminating registry download times during Pod startup.


Prerequisites

  • On the workstation

    • AWS CLI v2.x or later
      To check the version, run this command:
      $ aws --version
      To verify that the CLI is connected to your AWS account, run this command:
      $ aws sts get-caller-identity

    • AWS permissions to create the AWS resources listed in Generated AWS resources

    • Zesty-provided files:

      • Configuration file containing any customizations suggested by Zesty.

      • JSON file with environment-specific secrets.
        Keep this file safe, and consider deleting it after installation.

  • K8s connectivity

    • You must be connected to the K8s cluster with admin privileges.
      To check your current permissions, run this command:
      $ kubectl auth can-i '*' '*'
      The command prints yes when you have cluster-admin-level access.

  • Cluster

    • EKS v1.28 or later (a quick version check is shown after this list)

    • If you use Terraform, CloudFormation, Pulumi, or other IaC tools to create or maintain your cluster, ensure that they do not overwrite or modify any of the K8s Scaler components in the qubex-system namespace, including any CRDs that the Scaler generates.

    • The K8s Agent must be installed before installing the Scaler.
      For more information, see Install the K8s Agent.
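
As a quick check of the cluster prerequisites, you can confirm the control plane version before installing. The commands below are generic examples; replace <CLUSTER_NAME> and <REGION> with your own values:

    # Check the EKS control plane version (must be 1.28 or later)

    $ aws eks describe-cluster --name <CLUSTER_NAME> --region <REGION> --query "cluster.version" --output text

    # Alternatively, check the version reported by the API server

    $ kubectl version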

To install the Scaler:

  1. Download the latest qubexctl installer by running the following:
    # OPTIONAL: set version to download

    $ export QUBEXCTL_VERSION=0.9.1

    # Run download script

    $ curl -sfL https://qubex-assets.s3.eu-west-3.amazonaws.com/qubexctl-install.sh | bash

  2. Install the Scaler by running this command:
    $ ./qubexctl qscaler install aws --qubex-secret-path=secret.b64 -f qubex-config.yaml

Installation may take several minutes. In addition to installing the components, the installer runs small-scale simulations to verify that the Scaler works in your environment.
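
After the installer finishes, a quick way to confirm that the Scaler is healthy is to inspect the qubex-system namespace. This is a generic kubectl check, not a Zesty-specific procedure:

    # Verify that the Scaler components in the qubex-system namespace are running

    $ kubectl get pods -n qubex-system

    # Review recent events in the namespace if any pod is not in the Running state

    $ kubectl get events -n qubex-system --sort-by=.lastTimestamp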

Limitations

Kompass does not support the burstable T instance type families. If your node pool is configured with burstable T instances, Zesty protects their workloads using alternative nodes that provide the same or higher levels of resources.
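
To check whether any of your current nodes use burstable T instances, you can list the node instance types. This relies on the standard node.kubernetes.io/instance-type label that EKS sets on its nodes:

    # Show each node with its instance type; look for t2, t3, t3a, or t4g families

    $ kubectl get nodes -L node.kubernetes.io/instance-type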

Resources created during installation

During installation, the following resource types are created:

  • AWS: AWS resources are generated to provide high availability and to enable communication between the components of the platform.

  • Kubernetes: K8s resources are generated to enable ongoing operations as well as monitoring and support.

  • Accounts: Docker Hub accounts are created to host container images.

  • Runtime resources: Other AWS and K8s resources created after installation.

These resources are described in the following sections.

Generated AWS resources

The following resources are generated, grouped by resource type:

  • SQS queues

    • QScaler-spot-failures-<Cluster>-<Hash>: A queue used by the K8s Scaler to receive Spot interruption messages from AWS.

    • QScaler-resume-events-<Hash>: A queue used by the K8s Scaler for communications between Hibernated nodes and the control plane regarding node resumptions.

  • EventBridge rules

    • QScalerSpotFailuresEventQueueName-<Hash>: Makes AWS send a message to the QScaler-spot-failures queue for each Spot interruption in the region.

  • IAM policies

    • QScalerManageResumeQueuesPolicy-<Region>-<Cluster>

    • QScalerReadFromResumeQueuePolicy-<Region>-<Cluster>

    • QScalerReadFromSpotFailureQueuePolicy-<Region>-<Cluster>

    • QScalerVmControlPolicy-<Region>-<Cluster>

    • QScalerReadFromECRPolicy-<Region>-<Cluster>

    • QScalerControlFisPolicy-<Region>-<Cluster>

  • IAM roles

    • QScalerControllerRole-<Hash>: Used by the QScaler controller.

    • QScalerNodeRole-<Hash>: Used by the Hibernated node VMs; also has the permissions required to register as a node in the cluster.

    • QScalerNode-<Region>-<Cluster>: An InstanceProfile for the QScalerNodeRole role.

  • Security groups

    • qubex-vpc-endpoint-sg-<CLUSTER_NAME>: Security group that limits access to the QCache interface endpoint (the ECR pull through).

  • ECR

    • qubex-s3-vpc-endpoint: S3 VPC interface endpoint, an optional component (if the VPC has an S3 gateway endpoint) used to reduce the costs of accessing the network.

  • ECR pull through rules: These rules reduce repeated pulling of container images over the network (and the resulting high network costs):

    • qubex-public-ecr

    • qubex-public-k8s

    • qubex-public-quay

    • qubex-public-ghcr

    • qubex-public-dockerhub

  • Secrets

    • ecr-pullthroughcache/ghcr-<CLUSTER_NAME>-<RANDOM>: Used by the ECR pull through rule to access public GHCR repositories.

    • ecr-pullthroughcache/dockerhub-<CLUSTER_NAME>-<RANDOM>: Used by the ECR pull through rule to access public Docker Hub repositories.
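
If you want to review some of these resources after installation, commands along the following lines can help. These are illustrative AWS CLI examples based on the naming prefixes above, not Zesty-provided tooling:

    # List the QScaler SQS queues

    $ aws sqs list-queues --queue-name-prefix QScaler

    # List the QScaler IAM policies created in the account

    $ aws iam list-policies --scope Local --query "Policies[?starts_with(PolicyName, 'QScaler')].PolicyName"

    # List the ECR pull through cache rules

    $ aws ecr describe-pull-through-cache-rules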

Generated K8s resources

The following resources are generated, grouped by category:

  • Telemetry

    • qubex-opentelemetry-collector

    • qubex-qscaler-kpi-collector

    • qubex-qscaler-metrics

    • qubex-vector-non-qubex

    • qubex-vector-qnodes

  • Control plane

    • qubex-qscaler-container-cache

    • qubex-qscaler-controller

  • RBAC

    • qubex-opentelemetry-collector

    • qubex-qscaler-container-cache

    • qubex-qscaler-controller

    • qubex-vector

    • Snapshoter
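
To view these objects after installation, you can query the cluster directly. The following is a generic sketch; it assumes the components run in the qubex-system namespace mentioned in the prerequisites, and the object kinds shown are assumptions, since the list above names the resources but not their kinds:

    # List the Scaler workloads and service accounts in the qubex-system namespace

    $ kubectl get deployments,daemonsets,serviceaccounts -n qubex-system

    # List cluster-scoped RBAC bindings related to the Scaler

    $ kubectl get clusterrolebindings -o name | grep qubex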

Additional created resources

The following additional resources are created:

  • Accounts: Accounts are created in Docker Hub, used by the QCache pull through to cache images pulled from Docker Hub.

  • Runtime resources:

    • EC2 instances reflecting the hibernated nodes created by the control plane

    • SQS queues used for communications between the hibernated nodes, the control plane, and AWS EC2 (for Spot interruptions)

Collected logs

The following logs are collected:

  • From all nodes

    • Kernel logs

    • kubelet logs

    • containerd logs

    • docker logs

    • aws-node DaemonSet logs

    • kube-proxy DaemonSet logs

    • Scaler agent internal logs

  • From Hibernated nodes

    • nodeadm-config logs

    • nodeadm-run logs

    • systemd-hibernate logs

    • /var/log/cloud-init-output.log file

    • /var/log/node-agent.log file

    • /var/log/pm-suspend.log file

    • Filesystem metrics (to monitor QCache)

