


Running ANSYS HFSS on multiple nodes on a SLURM-based cluster
Posted 2021-08-31 at https://innovationspace.ansys.com/forum/forums/topic/running-ansys-hfss-on-multiple-nodes-on-slurm-based-cluster/

Hi all,

Could someone point me to standardized instructions for running HFSS on multiple nodes of a SLURM-based cluster? We are currently limited by the RAM of a single node, although multi-core simulations (parametric sweeps) do work on one node.

I did find an interesting solution here (/forum/discussion/18341/running-hfss-on-a-slurm-based-machine-rsm-cannot-be-accessed), but I cannot yet get my cluster admins to install the ansoftrsmservice executable. Is there a way around installing it? I can see that ARC is pre-installed on the cluster, but running the ARC node executable on every node before submitting the job might not be possible. If anyone has a standardized procedure for running multi-node jobs on SLURM-based clusters using ARC or anything else, I would be happy to hear it.

I tried to submit a job from ansysedt on a cluster compute node (https://vis.tacc.utexas.edu/#) via Tools > Job Management > Submit Job. The hostnames of the assigned nodes are given by the NODELIST column of squeue -u <username>.

This is the preview of the job submission:

/home1/apps/ANSYS/AnsysEM20.1/Linux64/desktopjob -cmd dso -jobid RSM_14259 -machinelist list=c506-011:1:21:90%:1,c506-012:1:21:90%:1,c506-013:1:21:90%:1,c506-014:1:21:90%:1 -monitor -waitforlicense -useelectronicsppe -ng -batchoptions " -batchsolve 20210711_Single_Qubit1_a:Optimetrics:ParametricSetup1'work2/08252/ameya/stampede2/tmp/Cavity Qubit Start_parametric_2.aedt'

I found that the default port slurmd uses to listen for incoming requests from slurmctld is 6818 (https://slurm.schedmd.com/network.html), but I could not configure the port when submitting the job.

When I submit the job, I get a message that the submission was successful and am redirected to Monitor Job - RSM, which shows the following error messages:

Connecting to running job.
==================================
Running LSDSO job 'RSM_5415'
Location: /home1/apps/ANSYS/AnsysEM20.1/Linux64/desktopjob
Batch Solve/Save: 'work2/08252/ameya/stampede2/tmp/Cavity Qubit Start_parametric_2.aedt
Starting Batch Run: 04:11:30PM Thursday, August 12, 2021
Temp directory: /tmp
==================================
Error: (T=08/12/21 16:11:31): Failed to launch engine on machine 'c506-012' or obtain a compatible interface
Error: (T=08/12/21 16:11:31): Failed to activate child job. Node 'c506-012' is removed from this job's available resources.
Error: (T=08/12/21 16:11:31): Failed to launch engine on machine 'c506-013' or obtain a compatible interface
Error: (T=08/12/21 16:11:31): Failed to activate child job. Node 'c506-013' is removed from this job's available resources.
Error: (T=08/12/21 16:11:31): Failed to launch...
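For reference, the `-machinelist list=...` string in the job preview does not have to be typed by hand: it can be generated inside the batch script from the nodes SLURM actually granted. A minimal sketch (the host names, core count of 21, and the 90% RAM limit are taken from the preview above; in a real job you would replace the sample list with the output of `scontrol show hostnames "$SLURM_JOB_NODELIST"`):

```shell
#!/bin/bash
# Sketch: build the "-machinelist list=..." argument that desktopjob/ansysedt expects.
# In a real SLURM job, replace the sample list with:
#   hosts=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
hosts="c506-011 c506-012 c506-013 c506-014"

machinelist=""
for h in $hosts; do
  # Format per the job preview: host:tasks:cores:max-RAM%:1
  machinelist="${machinelist}${machinelist:+,}${h}:1:21:90%:1"
done

echo "list=${machinelist}"
# -> list=c506-011:1:21:90%:1,c506-012:1:21:90%:1,c506-013:1:21:90%:1,c506-014:1:21:90%:1
```

The `${machinelist:+,}` expansion inserts a comma only when the variable is already non-empty, which avoids a leading comma on the first host.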
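One common way around RSM entirely is to skip the Submit Job GUI and call ansysedt in non-graphical batch mode directly from an sbatch script. The following is only a sketch under assumptions: the install and project paths are the ones from the post, most flags (`-ng`, `-monitor`, `-waitforlicense`, `-machinelist`, `-batchsolve`) appear in the job preview above, but the `-distributed` flag and the `-batchoptions` value are from memory of the AnsysEM command line and should be checked against your version's documentation before use:

```shell
#!/bin/bash
#SBATCH --job-name=hfss_sweep
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=21
#SBATCH --time=04:00:00

# Paths from the original post; adjust for your site.
AEDT=/home1/apps/ANSYS/AnsysEM20.1/Linux64/ansysedt
PROJECT="/work2/08252/ameya/stampede2/tmp/Cavity Qubit Start_parametric_2.aedt"

# Build the machine list from the nodes SLURM actually granted.
machinelist=""
for h in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
  machinelist="${machinelist}${machinelist:+,}${h}:1:${SLURM_CPUS_PER_TASK}:90%:1"
done

# -ng runs without the GUI; -distributed (assumed flag, verify for your
# version) spreads the solve across the listed machines.
"$AEDT" -ng -monitor -waitforlicense -distributed \
  -machinelist list="$machinelist" \
  -batchsolve "20210711_Single_Qubit1_a:Optimetrics:ParametricSetup1" "$PROJECT"
```

Because this path never goes through RSM or ARC, it sidesteps both the ansoftrsmservice install and the per-node ARC executables; SLURM itself provides the node allocation that RSM would otherwise manage.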