slurm-gcp is an open-source software solution that enables setting up Slurm clusters on Google Cloud Platform with ease. With it, you can create and manage Slurm cluster infrastructure in GCP, deployed in different configurations.
Google's HPC Toolkit, available on GitHub, can be used to manage and deploy Slurm clusters and other supporting infrastructure via HPC Blueprints.
See supported Operating Systems.
SchedMD provides professional services and commercial support to help you get up and running and stay running.
Issues and/or enhancement requests can be submitted to SchedMD's Bugzilla.
Also, join community discussions on either the Slurm User mailing list or the Google Cloud & Slurm Community Discussion Group.
slurm-gcp can be deployed and used in different configurations and methods to meet your computing needs.
See HPC Blueprints for HPC Toolkit example cluster configurations that are production ready.
All Slurm cluster resources will exist in the cloud.
See the Cloud Cluster Guide for details.
Only Slurm compute nodes will exist in the cloud. The Slurm controller and other Slurm components will remain in the on-premises environment.
See the Hybrid Cluster Guide for details.
Two or more clusters are connected, allowing jobs to be submitted from and run on different clusters. This can be a mix of on-premises and cloud clusters.
See the Federated Cluster Guide for details.
See the Upgrade to v5 Guide for details.
slurm-gcp. Please reach out to us here. We will be happy to support you!
I skimmed over the fixes (there were no ambiguous ones) -- seems all good but please double check ;)
bandwidth_tier is missing from the partitions variable of the singularity example v4 module. As a result, the following errors come up:
╷
│ Error: Invalid value for input variable
│
│ on main.tf line 35, in module "slurm_cluster_network":
│ 35: partitions = var.partitions
│
│ The given value is not suitable for
│ module.slurm_cluster_network.var.partitions declared at
│ ../../modules/network/io.tf:70,1-22: incorrect list element type: attribute
│ "bandwidth_tier" is required.
╵
╷
│ Error: Invalid value for input variable
│
│ on main.tf line 62, in module "slurm_cluster_controller":
│ 62: partitions = var.partitions
│
│ The given value is not suitable for
│ module.slurm_cluster_controller.var.partitions declared at
│ ../../modules/controller/io.tf:149,1-22: incorrect list element type:
│ attribute "bandwidth_tier" is required.
╵
╷
│ Error: Invalid value for input variable
│
│ on main.tf line 113, in module "slurm_cluster_compute":
│ 113: partitions = var.partitions
│
│ The given value is not suitable for
│ module.slurm_cluster_compute.var.partitions declared at
│ ../../modules/compute/io.tf:59,1-22: incorrect list element type: attribute
│ "bandwidth_tier" is required.
╵
ERRO[0037] 1 error occurred:
* exit status 1
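
The three errors above share one cause: Terraform rejects an object value that lacks an attribute required by the variable's type constraint, so every element of partitions needs a bandwidth_tier entry. A minimal sketch of what the fix might look like, assuming the example sets partitions in a tfvars file; the value "platform_default" is an assumption and should be checked against the variable description in the module's io.tf:

```hcl
# Hypothetical excerpt of the example's tfvars file. Only the names
# `partitions` and `bandwidth_tier` come from the error output above.
partitions = [
  {
    # ... the partition's existing attributes stay unchanged ...

    # Assumed value: confirm the accepted values in the module's
    # variable description before applying.
    bandwidth_tier = "platform_default"
  },
]
```

Because the same partitions value is passed to the network, controller, and compute modules, adding the attribute once in the example's input should address all three errors.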