Install the Zesty K8s Scaler

This topic describes how to install the Zesty K8s Scaler (“Scaler”) in your cluster. The Scaler powers the Kompass Spot management and Headroom reduction solutions.

The Scaler includes these parts:

  • QScaler: Transforms traditional node scaling by maintaining a strategic pool of hibernated VMs. Instead of the time-consuming process of creating new VMs during demand spikes, QScaler instantly resumes pre-configured VMs from hibernation.

  • QCache: Revolutionizes container image management by pre-baking frequently used images into VM volumes. When a node starts, it has immediate access to required container images locally, eliminating registry download times during Pod startup.


Prerequisites

  • On the workstation

    • AWS CLI v2.x or later
      To check the version, run this command:
      $ aws --version
      To verify that the CLI is connected to your AWS account, run this command:
      $ aws sts get-caller-identity

    • AWS permissions to create the AWS resources listed in Generated AWS resources

    • Zesty-provided files:

      • Configuration file containing any customizations suggested by Zesty.

      • JSON file with environment-specific secrets.
        Keep this file safe, and consider deleting it after installation.

  • K8s connectivity

    • You must be connected to the K8s cluster with admin privileges.
      To check your current permissions, run this command:
      $ kubectl auth can-i '*' '*'
      The command prints yes when you have cluster-admin-level access.

  • Cluster

    • EKS v1.28 or later (a quick version check is shown after this list)

    • If you use Terraform, CloudFormation, Pulumi, or other IaC tools to create or maintain your cluster, ensure that they do not overwrite or modify any of the K8s Scaler components in the qubex-system namespace, including any CRDs that the Scaler generates.

    • The K8s Agent must be installed before installing the Scaler.
      For more information, see Install the K8s Agent.
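
As a quick check of the cluster prerequisites, you can confirm the control plane version before installing. The commands below are generic examples; replace <CLUSTER_NAME> and <REGION> with your own values:

    # Check the EKS control plane version (must be 1.28 or later)

    $ aws eks describe-cluster --name <CLUSTER_NAME> --region <REGION> --query "cluster.version" --output text

    # Alternatively, check the version reported by the API server

    $ kubectl version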

To install the Scaler:

  1. Download the latest qubexctl installer by running the following:
    # OPTIONAL: set version to download

    $ export QUBEXCTL_VERSION=0.9.1

    # Run download script

    $ curl -sfL https://qubex-assets.s3.eu-west-3.amazonaws.com/qubexctl-install.sh | bash

  2. Install the Scaler by running this command:
    $ ./qubexctl qscaler install aws --qubex-secret-path=secret.b64 -f qubex-config.yaml

Installation may take several minutes. In addition to installing the components, the installer runs small-scale simulations to verify that the Scaler works in your environment.
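
After the installer finishes, a quick way to confirm that the Scaler is healthy is to inspect the qubex-system namespace. This is a generic kubectl check, not a Zesty-specific procedure:

    # Verify that the Scaler components in the qubex-system namespace are running

    $ kubectl get pods -n qubex-system

    # Review recent events in the namespace if any pod is not in the Running state

    $ kubectl get events -n qubex-system --sort-by=.lastTimestamp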

Limitations

Kompass does not support the burstable T instance type families. If your node pool is configured with burstable T instances, Zesty protects their workloads using alternative nodes that provide the same or higher levels of resources.
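
To check whether any of your current nodes use burstable T instances, you can list the node instance types. This relies on the standard node.kubernetes.io/instance-type label that EKS sets on its nodes:

    # Show each node with its instance type; look for t2, t3, t3a, or t4g families

    $ kubectl get nodes -L node.kubernetes.io/instance-type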

Resources created during installation

During installation, the following resource types are created:

  • AWS: AWS resources are generated to provide high availability and to enable communication between the components of the platform.

  • Kubernetes: K8s resources are generated to enable ongoing operations as well as monitoring and support.

  • Accounts: Docker Hub accounts are created to host container images.

  • Runtime resources: Other AWS and K8s resources created after installation.

These resources are described in the following sections.

Generated AWS resources

The following resources are generated, grouped by resource type:

  • SQS queues

    • QScaler-spot-failures-<Cluster>-<Hash>: A queue used by the K8s Scaler to receive Spot interruption messages from AWS.

    • QScaler-resume-events-<Hash>: A queue used by the K8s Scaler for communications between Hibernated nodes and the control plane regarding node resumptions.

  • EventBridge rules

    • QScalerSpotFailuresEventQueueName-<Hash>: Makes AWS send a message to the QScaler-spot-failures queue for each Spot interruption in the region.

  • IAM policies

    • QScalerManageResumeQueuesPolicy-<Region>-<Cluster>

    • QScalerReadFromResumeQueuePolicy-<Region>-<Cluster>

    • QScalerReadFromSpotFailureQueuePolicy-<Region>-<Cluster>

    • QScalerVmControlPolicy-<Region>-<Cluster>

    • QScalerReadFromECRPolicy-<Region>-<Cluster>

    • QScalerControlFisPolicy-<Region>-<Cluster>

  • IAM roles

    • QScalerControllerRole-<Hash>: Used by the QScaler controller.

    • QScalerNodeRole-<Hash>: Used by the Hibernated node VMs; also has the permissions required to register as a node in the cluster.

    • QScalerNode-<Region>-<Cluster>: An InstanceProfile for the QScalerNodeRole role.

  • Security groups

    • qubex-vpc-endpoint-sg-<CLUSTER_NAME>: Security group that limits access to the QCache interface endpoint (the ECR pull through).

  • ECR

    • qubex-s3-vpc-endpoint: S3 VPC interface endpoint, an optional component (if the VPC has an S3 gateway endpoint) used to reduce the costs of accessing the network.

  • ECR pull through rules: These rules reduce repeated pulling of container images over the network (and the resulting high network costs):

    • qubex-public-ecr

    • qubex-public-k8s

    • qubex-public-quay

    • qubex-public-ghcr

    • qubex-public-dockerhub

  • Secrets

    • ecr-pullthroughcache/ghcr-<CLUSTER_NAME>-<RANDOM>: Used by the ECR pull through rule to access public GHCR repositories.

    • ecr-pullthroughcache/dockerhub-<CLUSTER_NAME>-<RANDOM>: Used by the ECR pull through rule to access public Docker Hub repositories.
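
If you want to review some of these resources after installation, commands along the following lines can help. These are illustrative AWS CLI examples based on the naming prefixes above, not Zesty-provided tooling:

    # List the QScaler SQS queues

    $ aws sqs list-queues --queue-name-prefix QScaler

    # List the QScaler IAM policies created in the account

    $ aws iam list-policies --scope Local --query "Policies[?starts_with(PolicyName, 'QScaler')].PolicyName"

    # List the ECR pull through cache rules

    $ aws ecr describe-pull-through-cache-rules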

Generated K8s resources

The following resources are generated, grouped by category:

  • Telemetry

    • qubex-opentelemetry-collector

    • qubex-qscaler-kpi-collector

    • qubex-qscaler-metrics

    • qubex-vector-non-qubex

    • qubex-vector-qnodes

  • Control plane

    • qubex-qscaler-container-cache

    • qubex-qscaler-controller

  • RBAC

    • qubex-opentelemetry-collector

    • qubex-qscaler-container-cache

    • qubex-qscaler-controller

    • qubex-vector

    • Snapshoter
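
To view these objects after installation, you can query the cluster directly. The following is a generic sketch; it assumes the components run in the qubex-system namespace mentioned in the prerequisites, and the object kinds shown are assumptions, since the list above names the resources but not their kinds:

    # List the Scaler workloads and service accounts in the qubex-system namespace

    $ kubectl get deployments,daemonsets,serviceaccounts -n qubex-system

    # List cluster-scoped RBAC bindings related to the Scaler

    $ kubectl get clusterrolebindings -o name | grep qubex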

Additional created resources

The following additional resources are created:

  • Accounts: Accounts are created in Docker Hub, used by the QCache pull through to cache images pulled from Docker Hub.

  • Runtime resources:

    • EC2 instances reflecting the hibernated nodes created by the control plane

    • SQS queues used for communications between the hibernated nodes, the control plane, and AWS EC2 (for Spot interruptions)

Collected logs

The following logs are collected:

  • From all nodes

    • Kernel logs

    • kubelet logs

    • containerd logs

    • docker logs

    • aws-node DaemonSet logs

    • kube-proxy DaemonSet logs

    • Scaler agent internal logs

  • From Hibernated nodes

    • nodeadm-config logs

    • nodeadm-run logs

    • systemd-hibernate logs

    • /var/log/cloud-init-output.log file

    • /var/log/node-agent.log file

    • /var/log/pm-suspend.log file

    • Filesystem metrics (to monitor QCache)

