


{"id":352244,"date":"2024-02-19T09:10:38","date_gmt":"2024-02-19T09:10:38","guid":{"rendered":"\/forum\/forums\/topic\/running-a-test-job-on-a-slurm-cluster\/"},"modified":"2024-07-03T07:57:01","modified_gmt":"2024-07-03T07:57:01","slug":"running-a-test-job-on-a-slurm-cluster","status":"closed","type":"topic","link":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/running-a-test-job-on-a-slurm-cluster\/","title":{"rendered":"Running a test job on a SLURM Cluster"},"content":{"rendered":"<p>This guide helps you validate that the cluster resources have been deployed successfully on Ansys Gateway.<br \/>To set up your resources, follow the recommended HPC cluster configurations by application on the documentation page: <a href=\"https:\/\/ansyshelp.ansys.com\/account\/secured?returnurl=\/Views\/Secured\/CSP\/v000\/en\/gateway_ru\/ru\/recommended_configurations_by_application.html\">Recommended Configurations by Application (ansys.com)<\/a><br \/>Cluster Requirements:&nbsp;<\/p>\n<p>To configure an HPC workflow that uses a Slurm cluster, the following resources are required, in this order:&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<table class=\"MsoTableGrid\" style=\"border-collapse: collapse;border: none;width: 94.7972%\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td style=\"width: 58.572%;border: 1pt solid windowtext;padding: 0cm 5.4pt\" valign=\"top\" width=\"312\">\n<p class=\"MsoNormal\" style=\"line-height: normal\"><span style=\"font-size: 12.0pt;font-family: 'Times New Roman',serif\">&nbsp;<\/span><\/p>\n<ol start=\"1\" type=\"1\">\n<li class=\"MsoNormal\" style=\"line-height: normal\"><a title=\"NFS file storage server with a shared folder.\" href=\"https:\/\/ansyshelp.ansys.com\/Views\/Secured\/gateway\/v000\/en\/gateway_ru\/config_slurm.html#nfs_storage_slurm\"><strong><span style=\"font-size: 12.0pt;font-family: 'Times New Roman',serif;color: blue\">NFS file storage server with a shared 
folder.<\/span><\/strong><\/a><\/li>\n<li class=\"MsoNormal\" style=\"line-height: normal\"><a title=\"Linux Virtual Desktop with Slurm Controller and KDE\" href=\"https:\/\/ansyshelp.ansys.com\/Views\/Secured\/gateway\/v000\/en\/gateway_ru\/config_slurm.html#vdi_slurm_kde\"><strong><span style=\"font-size: 12.0pt;font-family: 'Times New Roman',serif;color: blue\">Linux Virtual Desktop with Slurm Controller and KDE<\/span><\/strong><\/a><\/li>\n<li class=\"MsoNormal\" style=\"line-height: normal\"><a title=\"HPC Cluster with Slurm Node with NFS and EFA\" href=\"https:\/\/ansyshelp.ansys.com\/Views\/Secured\/gateway\/v000\/en\/gateway_ru\/config_slurm.html#slurm_cluster_node\"><strong><span style=\"font-size: 12.0pt;font-family: 'Times New Roman',serif;color: blue\">HPC Cluster with Slurm Node with NFS and EFA<\/span><\/strong><\/a><\/li>\n<\/ol>\n<\/td>\n<td style=\"width: 41.4174%;border-top: 1pt solid windowtext;border-right: 1pt solid windowtext;border-bottom: 1pt solid windowtext;border-left: none;padding: 0cm 5.4pt\" valign=\"top\" width=\"312\">\n<p class=\"MsoNormal\" style=\"line-height: normal\"><span style=\"font-size: 12.0pt;font-family: 'Times New Roman',serif;color: #2b579a;background: #E6E6E6\"><a class=\"wp-colorbox-image cboxElement\" href=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333331-mceclip0.png\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333331-mceclip0.png\"><\/a><br \/><\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Ensure that all resources are ready and in the <strong>RUNNING<\/strong> state before proceeding.<\/p>\n<table class=\"MsoTableGrid\" style=\"border-collapse: collapse;border: none;width: 94.709%\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td style=\"width: 50.7315%;border: 1pt solid windowtext;padding: 0cm 5.4pt\" valign=\"top\" width=\"312\">\n<p class=\"MsoNormal\" style=\"line-height: normal\"><span 
style=\"color: #2b579a;background: #E6E6E6\"><a class=\"wp-colorbox-image cboxElement\" href=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333395-mceclip1.png\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333395-mceclip1.png\"><\/a><br \/><\/span><\/p>\n<\/td>\n<td style=\"width: 49.2583%;border-top: 1pt solid windowtext;border-right: 1pt solid windowtext;border-bottom: 1pt solid windowtext;border-left: none;padding: 0cm 5.4pt\" valign=\"top\" width=\"312\">\n<p class=\"MsoNormal\" style=\"line-height: normal\"><span style=\"color: #2b579a;background: #E6E6E6\"><a class=\"wp-colorbox-image cboxElement\" href=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333399-mceclip2.png\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333399-mceclip2.png\"><\/a><br \/><\/span><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.7315%;border-right: 1pt solid windowtext;border-bottom: 1pt solid windowtext;border-left: 1pt solid windowtext;border-top: none;padding: 0cm 5.4pt\" valign=\"top\" width=\"312\">\n<p class=\"MsoNormal\" style=\"line-height: normal\"><span style=\"color: #2b579a;background: #E6E6E6\"><a class=\"wp-colorbox-image cboxElement\" href=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333405-mceclip3.png\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333405-mceclip3.png\"><\/a><br \/><\/span><\/p>\n<\/td>\n<td style=\"width: 49.2583%;border-top: none;border-left: none;border-bottom: 1pt solid windowtext;border-right: 1pt solid windowtext;padding: 0cm 5.4pt\" valign=\"top\" width=\"312\">\n<p class=\"MsoNormal\" style=\"line-height: normal\"><span style=\"font-size: 12.0pt;font-family: 'Times New Roman',serif\">&nbsp;<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Now run the 
test script.<\/p>\n<ol>\n<li>Note the shared folder name.<br \/><a class=\"wp-colorbox-image cboxElement\" href=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333453-mceclip4.png\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333453-mceclip4.png\"><\/a><\/li>\n<li>Connect to the Slurm Controller VM (using RDP or SSH).<\/li>\n<li>Create a file named &#8220;test.sh&#8221; and paste in the contents below:\n<pre>#!\/bin\/bash<br># Usage: bash test.sh &lt;shared-folder-name&gt;<br>if [ -z \"$1\" ]; then<br>&nbsp; echo \"Usage: bash test.sh &lt;shared-folder-name&gt;\"<br>&nbsp; exit 1<br>fi<br># Exported so that the batch job inherits it (sbatch propagates the environment by default)<br>export NFSDIR=\"\/mnt\/$1\"<br>if [ ! -d \"$NFSDIR\" ]; then<br>&nbsp; echo \"${NFSDIR} does not exist.\"<br>&nbsp; exit 1<br>fi<br>mkdir -p \"${NFSDIR}\/testjob\"<br># Total node count of the partition: 4th field of sinfo's %F (allocated\/idle\/other\/total)<br>NODES=$(sinfo -p ACS_cluster -o \"%a %F\" | tail -n1 | grep \"up\" | awk -F'\/' '{print $4}')<br>if [ -z \"${NODES}\" ]; then<br>&nbsp; echo \"No available nodes on ACS_cluster partition\"<br>&nbsp; exit 1<br>fi<br># Submit the job script below across all nodes of the partition<br>sbatch -N\"$NODES\" &lt;&lt; 'EOF'<br>#!\/bin\/bash<br>#SBATCH --job-name=\"TestJob\"<br>#SBATCH --exclusive<br>#SBATCH --output=\"%x-%j\".out<br>#SBATCH --error=\"%x-%j\".err<br>#SBATCH --partition=ACS_cluster<br>date<br>echo \"------------\"<br>echo \"SLURM_JOB_ID : $SLURM_JOB_ID\"<br>echo \"SLURM_JOB_NODELIST : $SLURM_JOB_NODELIST\"<br>echo \"SLURM_JOB_NUM_NODES : $SLURM_JOB_NUM_NODES\"<br>echo \"SLURM_NODELIST : $SLURM_NODELIST\"<br>echo \"SLURM_TASKS_PER_NODE : $SLURM_TASKS_PER_NODE\"<br>echo \"WORKING DIRECTORY : $SLURM_SUBMIT_DIR\"<br>echo \"NFS STORAGE DIRECTORY: $NFSDIR\"<br>echo \"------------\"<br>cd \"$NFSDIR\/testjob\"<br>touch \"TestJob-$SLURM_JOB_ID.txt\"<br># Collect the hostnames reported by the allocated nodes<br>MACHINELIST=$(srun hostname)<br># For each reported hostname, launch a one-node job step that appends a greeting to the result file<br>for HOST in ${MACHINELIST}; do<br>&nbsp; &nbsp; &nbsp; &nbsp; srun --nodes=1 echo \"hello from node ${HOST}\" &gt;&gt; \"TestJob-$SLURM_JOB_ID.txt\" &amp;<br>done<br># Wait for the background job steps to finish<br>wait<br>EOF<\/pre>\n<\/li>\n<li>Run the &#8220;test.sh&#8221; script, passing the shared folder name from step 1 as its argument:\n<pre>bash test.sh sharedfolder<\/pre>\n<\/li>\n<li>Check the output; it should look like 
this:\n<pre>cat \/mnt\/sharedfolder\/testjob\/TestJob-*.txt<\/pre>\n<p><a class=\"wp-colorbox-image cboxElement\" href=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333678-mceclip5.png\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2024\/02\/19-02-2024-1708333678-mceclip5.png\"><\/a><\/p><\/li>\n<\/ol>\n<pre><strong><br>References<\/strong>: <a href=\"https:\/\/ansyshelp.ansys.com\/account\/secured?returnurl=\/Views\/Secured\/gateway\/v000\/en\/gateway_ru\/test_slurm_cluster.html\">Testing a Slurm Cluster (ansys.com)<\/a><\/pre>\n","protected":false},"template":"","class_list":["post-352244","topic","type-topic","status-closed","hentry"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_bbp_author_ip":["23.206.193.146"]," _bbp_last_reply_id":["0"]," _bbp_likes_count":["0"],"_btv_view_count":["1952"],"_bbp_topic_status":["answered"],"_bbp_status":["publish"],"_edit_lock":["1719992930:20790"],"_bbp_likes_count":["1"],"_bbp_topic_id":["352244"],"_bbp_forum_id":["233598"],"_bbp_engagement":["20790"],"_bbp_voice_count":["1"],"_bbp_reply_count":["0"],"_bbp_last_reply_id":["0"],"_bbp_last_active_id":["352244"],"_bbp_last_active_time":["2024-02-19 
09:10:38"]},"test":"nikos-nikoloutsakosansys-com"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/352244","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":2,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/352244\/revisions"}],"predecessor-version":[{"id":370268,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/352244\/revisions\/370268"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/media?parent=352244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}