Launch the EMR cluster with Zesty Disk

This topic describes how to launch an Amazon EMR cluster with a bootstrap action script, using either the AWS CLI or an AWS SDK. The bootstrap action script, which is stored in an Amazon S3 bucket, is required because it installs the Zesty Disk Agent on the EMR cluster nodes.

Bootstrap actions are scripts that run on each EMR cluster node before Amazon EMR installs the specified applications and before the node starts processing data. They can be used to install additional software or to configure settings.
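To illustrate how a bootstrap action receives its arguments, here is a minimal sketch of a bootstrap script skeleton. It is a hypothetical example and not the actual Zesty script: the function name `main`, the file name `bootstrap.sh`, and the printed message are assumptions. The only behavior it relies on is that EMR passes the values given in `Args=` to the script as positional parameters.

```shell
#!/bin/sh
# Hypothetical bootstrap action skeleton; NOT the actual Zesty script.
# EMR passes the values given in Args= to the script as positional parameters.

main() {
    api_key="${1:-}"
    if [ -z "$api_key" ]; then
        echo "usage: bootstrap.sh <API_key>" >&2
        return 1
    fi
    # The real installation steps are not shown in this article.
    echo "API key received (${#api_key} characters); agent installation would start here"
}

# Run only when arguments were supplied, as EMR does through Args=.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

The key length is printed instead of the key itself so that the secret does not end up in the bootstrap logs.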

Prerequisites

To launch the EMR cluster with Zesty Disk:

If you run the command from the AWS CLI, edit or add the following options:

  • --instance-groups

    Point to the updated file that contains the initial EBS size, as described in Set up an EMR cluster.

  • --custom-ami-id

    Enter the ID of the compatible AMI that you created, as described in Create a compatible AMI.

  • --tags

    Add ZestyDisk=True so that Zesty Disk can identify the EBS volumes to manage.

  • --bootstrap-actions

    Point to the script that installs the Zesty Disk Agent, and supply your API key as the script argument.
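As an illustration of the file that `--instance-groups` points to, the following sketch writes a small instance-groups file with an explicit EBS volume size. The instance types, instance counts, volume type, and the 32 GB size are placeholder assumptions, not values recommended by Zesty; use the initial EBS size described in Set up an EMR cluster. The file name matches the example used later in this article.

```shell
#!/bin/sh
# Write a hypothetical instance-groups file.
# Instance types, counts, volume type, and sizes are placeholders.
cat > instance_groups_config.json <<'EOF'
[
  {
    "Name": "Master",
    "InstanceGroupType": "MASTER",
    "InstanceType": "m5.xlarge",
    "InstanceCount": 1,
    "EbsConfiguration": {
      "EbsBlockDeviceConfigs": [
        {
          "VolumeSpecification": { "VolumeType": "gp2", "SizeInGB": 32 },
          "VolumesPerInstance": 1
        }
      ]
    }
  },
  {
    "Name": "Core",
    "InstanceGroupType": "CORE",
    "InstanceType": "m5.xlarge",
    "InstanceCount": 2,
    "EbsConfiguration": {
      "EbsBlockDeviceConfigs": [
        {
          "VolumeSpecification": { "VolumeType": "gp2", "SizeInGB": 32 },
          "VolumesPerInstance": 1
        }
      ]
    }
  }
]
EOF
echo "wrote instance_groups_config.json"
```

The `SizeInGB` value in each `VolumeSpecification` is where the initial EBS size is set for that instance group.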

The following is an example bash script that launches an EMR cluster based on the custom AMI:

AWS_PROFILE=<sso> aws emr create-cluster \
--name "<cluster-name>" \
--release-label emr-<version> \
--use-default-roles \
--enable-debugging \
--ec2-attributes KeyName=<key-pair> \
--applications Name=<app-name> \
--instance-groups file://<filename.json> \
--log-uri s3://<bucket_path> \
--region us-east-1 \
--custom-ami-id ami-<ID> \
--tags ZestyDisk=True EMR=True \
--bootstrap-actions Path=s3://static.zesty.co/emr/demo/bootstrap-collector.sh,Name=InstallZestyCollector,Args=<API_key>
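The command above can also be wrapped in a small function with a dry-run mode, so that the final argument list can be reviewed before anything is launched. This is a minimal sketch under stated assumptions: the variable names (`CLUSTER_NAME`, `AMI_ID`, `ZESTY_API_KEY`, and so on), the `launch_cluster` function, and the `DRY_RUN` switch are introduced here for illustration, and the default values are the example values used in this article.

```shell
#!/bin/sh
# Placeholder values taken from the examples in this article; override via the environment.
CLUSTER_NAME="${CLUSTER_NAME:-zesty_cluster}"
RELEASE_LABEL="${RELEASE_LABEL:-emr-6.1.0}"
KEY_PAIR="${KEY_PAIR:-rnd}"
APP_NAME="${APP_NAME:-Hadoop}"
INSTANCE_GROUPS_FILE="${INSTANCE_GROUPS_FILE:-instance_groups_config.json}"
LOG_URI="${LOG_URI:-s3://company-devops-sso/emr-logs/}"
REGION="${REGION:-us-east-1}"
AMI_ID="${AMI_ID:-ami-<ID>}"
ZESTY_API_KEY="${ZESTY_API_KEY:-<API_key>}"
BOOTSTRAP_PATH="s3://static.zesty.co/emr/demo/bootstrap-collector.sh"
# DRY_RUN=1 (the default in this sketch) prints the command instead of calling AWS.
DRY_RUN="${DRY_RUN:-1}"

launch_cluster() {
    # Build the argument list as positional parameters so values with spaces stay intact.
    set -- aws emr create-cluster \
        --name "$CLUSTER_NAME" \
        --release-label "$RELEASE_LABEL" \
        --use-default-roles \
        --enable-debugging \
        --ec2-attributes "KeyName=$KEY_PAIR" \
        --applications "Name=$APP_NAME" \
        --instance-groups "file://$INSTANCE_GROUPS_FILE" \
        --log-uri "$LOG_URI" \
        --region "$REGION" \
        --custom-ami-id "$AMI_ID" \
        --tags ZestyDisk=True EMR=True \
        --bootstrap-actions "Path=$BOOTSTRAP_PATH,Name=InstallZestyCollector,Args=$ZESTY_API_KEY"

    if [ "$DRY_RUN" = "1" ]; then
        printf '%s ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

launch_cluster
```

To launch for real, set `AWS_PROFILE` to your profile, replace the placeholder AMI ID and API key, and run the script with `DRY_RUN=0`.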

The following table describes the script commands and arguments:

| Command/argument | Description | Example |
|---|---|---|
| AWS_PROFILE=<sso> | Sets the AWS_PROFILE environment variable to <sso>, which tells the AWS CLI which profile to use for authentication. The profile must be configured with appropriate credentials and permissions in your AWS configuration files. | AWS_PROFILE=sso_rnd |
| aws emr create-cluster | The AWS CLI command that creates a new EMR cluster. | |
| --name "<cluster-name>" | Specifies the name of the EMR cluster. | --name "zesty_cluster" |
| --release-label emr-<version> | Defines the EMR release version to use. | --release-label emr-6.1.0 |
| --use-default-roles | Instructs AWS to use the default roles for EMR, which typically include the role for the EMR service and the role that lets the EC2 instances access other AWS services as required. | |
| --enable-debugging | Enables the EMR debugging tool for the cluster. This option requires --log-uri, because the logs must be stored in Amazon S3. | |
| --ec2-attributes KeyName=<key-pair> | Specifies EC2 instance settings. In this example, it sets KeyName to the name of the key pair that you use to SSH into the EC2 instances of the EMR cluster. | --ec2-attributes KeyName=rnd |
| --applications Name=<app-name> | Defines the application to install on the cluster. | --applications Name=Hadoop |
| --instance-groups file://<filename.json> | Points to a local JSON file that specifies the configuration of the EC2 instance groups in the cluster, such as the number and type of instances, whether they are On-Demand or Spot, and their roles (master, core, and task nodes). | --instance-groups file://instance_groups_config.json |
| --log-uri s3://<bucket_path> | Specifies the S3 location where the EMR cluster's logs are stored. | --log-uri s3://company-devops-sso/emr-logs/ |
| --region <launch_region> | Specifies the AWS Region where the cluster is launched. | --region us-east-1 |
| --custom-ami-id ami-<ID> | Specifies a custom AMI ID for the EC2 instances in the cluster. Use the ID of the AMI that you created for this purpose, as described in Create a compatible AMI. | --custom-ami-id ami-288dnh210033 |
| --tags ZestyDisk=True EMR=True | Applies the specified tags to the EMR cluster, which can be useful for tracking billing or managing resources by group. The ZestyDisk tag is mandatory. | |
| --bootstrap-actions Path=s3://static.zesty.co/emr/demo/bootstrap-collector.sh,Name=InstallZestyCollector,Args=<API_key> | Defines a bootstrap action script that runs on all cluster nodes before the specified application starts. The script is located at s3://static.zesty.co/emr/demo/bootstrap-collector.sh. | --bootstrap-actions Path=s3://static.zesty.co/emr/demo/bootstrap-collector.sh,Name=InstallZestyCollector,Args=12345678ysunnqwbiy392nlkni0Ujs00pw0okq31 |
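Some of the constraints described in the table can be checked before the cluster is launched. The following pre-flight check is a minimal sketch: the function name `validate_launch_args` and its argument order are assumptions made for illustration, and it only verifies the shape of the values (an `ami-` prefix, an `s3://` log URI, and the mandatory `ZestyDisk=True` tag), not that the resources exist.

```shell
#!/bin/sh
# Hypothetical helper: validate_launch_args <ami-id> <log-uri> <tag>...
# Returns 0 when the values have the expected shape, 1 otherwise.
validate_launch_args() {
    ami_id=$1
    log_uri=$2
    shift 2

    case "$ami_id" in
        ami-?*) ;;
        *) echo "custom AMI ID must start with 'ami-': $ami_id" >&2; return 1 ;;
    esac

    case "$log_uri" in
        s3://?*) ;;
        *) echo "log URI must start with 's3://': $log_uri" >&2; return 1 ;;
    esac

    # The ZestyDisk tag is mandatory so that Zesty Disk can identify the volumes.
    for tag in "$@"; do
        if [ "$tag" = "ZestyDisk=True" ]; then
            return 0
        fi
    done
    echo "missing mandatory tag ZestyDisk=True" >&2
    return 1
}

# Example call with the placeholder values from this article.
if validate_launch_args "ami-<ID>" "s3://<bucket_path>" ZestyDisk=True EMR=True; then
    echo "arguments look valid"
fi
```

A check like this can run at the top of a launch script so that a missing tag is caught before a cluster is created.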

