Create Cluster

Pick your instance types. Depending on the account you are using, you might run into limits for these instance types.

export PC_MASTER=c5.xlarge
export PC_COMPUTE=c5n.18xlarge
export PC_MIN=3
export PC_MAX=3
export PC_MAINTAIN=true
  • PC_MASTER defines the instance type used for the head node of the resulting SLURM cluster. As it is not used to run workloads, we pick a smaller instance than for the compute nodes.
  • PC_COMPUTE defines the instance type used for the compute nodes.
  • PC_MIN sets the minimum number of instances in the cluster. To speed up job submission we keep instances running. In a production environment, this might be set to 0.
  • PC_MAX sets the maximum size the cluster can scale out to.
  • PC_MAINTAIN configures maintain_initial_size. When set to true, the cluster won’t attempt to scale down below its initial size.
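
Before writing the configuration, it can help to confirm that the variables from the export block are set and consistent. The following is a minimal sketch, not part of ParallelCluster: check_pc_vars is a hypothetical helper name, and the rule that PC_MIN should not exceed PC_MAX is an assumption based on how the two values are used as initial and maximum queue size.

```shell
# Hypothetical helper: sanity-check the sizing variables exported above.
check_pc_vars() {
  # Every variable from the export block must be non-empty.
  for name in PC_MASTER PC_COMPUTE PC_MIN PC_MAX PC_MAINTAIN; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      return 1
    fi
  done
  # PC_MIN and PC_MAX must consist of digits only.
  case "$PC_MIN$PC_MAX" in
    *[!0-9]*)
      echo "PC_MIN and PC_MAX must be integers"
      return 1
      ;;
  esac
  # Assumption: the initial size should not exceed the maximum size.
  if [ "$PC_MIN" -gt "$PC_MAX" ]; then
    echo "PC_MIN ($PC_MIN) is larger than PC_MAX ($PC_MAX)"
    return 1
  fi
  echo "ok"
}
```

Running check_pc_vars after the exports prints ok when all five variables are set and PC_MIN is not larger than PC_MAX; otherwise it names the first problem it finds.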

Write configuration…

mkdir -p ~/.parallelcluster
cat > ~/.parallelcluster/config << EOF
[aws]
aws_region_name = ${REGION}

[global]
sanity_check = true
cluster_template = default
update_check = true

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[vpc public]
vpc_id = ${VPC_ID}
master_subnet_id = ${SUBNET_ID}

[cluster default]
vpc_settings = public
key_name = lab-3-your-key
compute_instance_type=${PC_COMPUTE}
master_instance_type=${PC_MASTER}
initial_queue_size = ${PC_MIN}
max_queue_size = ${PC_MAX}
maintain_initial_size = ${PC_MAINTAIN}
scheduler=slurm
cluster_type = ondemand
s3_read_write_resource=arn:aws:s3:::${S3_BUCKET_NAME}*
placement_group = DYNAMIC
placement = compute
base_os = alinux2
tags = {"Name" : "pcluster-node"}
disable_hyperthreading = true
fsx_settings = fsxshared
enable_efa = compute
dcv_settings = hpc-dcv

[dcv hpc-dcv]
enable = master

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
EOF
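
Because the heredoc is unquoted, the shell expands ${REGION}, ${VPC_ID}, ${SUBNET_ID} and the other variables while writing the file, and any variable that is unset silently becomes an empty value. A small check like the one below can catch that before you create the cluster. This is a sketch under assumptions: find_empty_keys is a hypothetical helper, and it only detects keys whose value is completely empty (it would not notice a missing ${S3_BUCKET_NAME} inside the ARN).

```shell
# Hypothetical helper: print every key in an INI-style file whose value is empty.
find_empty_keys() {
  # Match lines of the form "key =" or "key=" with nothing after the equals sign,
  # then print the key name with surrounding whitespace removed.
  awk -F= '/^[A-Za-z0-9_]+[ \t]*=[ \t]*$/ { gsub(/[ \t]/, "", $1); print $1 }' "$1"
}
```

For example, find_empty_keys ~/.parallelcluster/config prints nothing when every key has a value, and prints names such as vpc_id when the corresponding variable was not exported.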

This configuration creates a three-node cluster and maintains its size. Please make sure you evaluate the settings before you use them outside of this time-boxed workshop!

Configuration to evaluate:

  • master_subnet_id/vpc_id: Depending on the setup of Cloud9 or the instances you chose, you might need to change the subnet ID in order to connect to the correct AZ.
  • compute_instance_type/master_instance_type: Make sure you pick the right instances for your workload.
  • maintain_initial_size: If you want your cluster to scale down to zero, change this to false to allow the scheduler to terminate instances. To avoid waiting while instances start up, we set this to true.
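
If you decide to change maintain_initial_size before creating the cluster, you can edit the file by hand or rewrite the line with sed. The following sketch assumes the key appears once at the start of a line, as in the configuration written above; set_maintain_initial_size is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the maintain_initial_size line in a config file.
# Usage: set_maintain_initial_size <config file> <true|false>
set_maintain_initial_size() {
  config_file=$1
  new_value=$2
  tmp_file=$(mktemp)
  # Write the edited copy to a temporary file first, so the original is only
  # overwritten if sed succeeds (avoids the non-portable "sed -i").
  sed "s/^maintain_initial_size[ ]*=.*/maintain_initial_size = ${new_value}/" \
    "$config_file" > "$tmp_file" \
    && cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}
```

For example, set_maintain_initial_size ~/.parallelcluster/config false switches the setting so that instances can be terminated when idle; all other lines are left unchanged.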

Create Cluster

Now that we have configured ParallelCluster, we are going to create a cluster. The second argument, pc-fsx, is the name of the cluster you are creating. As we do not specify a configuration file (-c config), the command picks up the default configuration we created above.

pcluster create pc-fsx

The creation step will take around 10 minutes to finish.