{"id":347091,"date":"2024-01-29T17:26:39","date_gmt":"2024-01-29T17:26:39","guid":{"rendered":"\/forum\/forums\/topic\/ansysedt-slurm-tight-intelmpi-integration-issue\/"},"modified":"2024-02-12T22:42:02","modified_gmt":"2024-02-12T22:42:02","slug":"ansysedt-slurm-tight-intelmpi-integration-issue","status":"closed","type":"topic","link":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/ansysedt-slurm-tight-intelmpi-integration-issue\/","title":{"rendered":"AnsysEDT SLURM tight IntelMPI integration issue"},"content":{"rendered":"<p>Hello all,<\/p>\n<p>&nbsp;<\/p>\n<p>I have been trying to run HFSS simulations on the HPC available at my institution. AnsysEDT 2023R2 has recently been installed on Rocky Linux 8.8.<\/p>\n<p>&nbsp;<\/p>\n<p>Running jobs on an exclusive node obtained through salloc works fine: I can salloc --nodes=1 --exclusive, ssh in, launch AnsysEDT, and run both manual-configuration and auto-configuration jobs with no problems after changing MPI Version to 2021. The job distributes and runs well.<\/p>\n<p>Issues arise when trying to use the available SLURM integration.<\/p>\n<p>When running a job with auto settings and auto setup, say across 2 nodes, the multiple hf3d processes launched on each node get pinned to only 1 CPU per node. I verified this by enabling the debug options and inspecting the log_mpirun.fl_xxxxxxxxx.log file. This happens both with the version of IntelMPI bundled with AnsysEDT and with the one on the cluster (selected via the $INTELMPI_ROOT variable). I have set $AnsTempDir and the tempdirectory batch option to locations accessible by all nodes, and I have also tried various $ANSOFT_MPI_INTERCONNECT and $ANSOFT_MPI_INTERCONNECT_VARIANT options.<\/p>\n<p>I also tried running a job with manual settings and the manual setup, again across 2 nodes. 
The pre-processing completes successfully, and in a distributed fashion; however, upon solving the first frequency for adaptive meshing, the process exits with a message to contact customer support. From the log files, I have traced the issue to a SIGSEGV (signal 11) in the MUMPS driver called by hf3d. Again, this happens with all the variations mentioned above.<\/p>\n<p>HPCLicenseType is pool<\/p>\n<p>tempdirectory is set to something reasonable<\/p>\n<p>HFSS\/MPIVendor intel<\/p>\n<p>HFSS\/MPIVersion 2021<\/p>\n<p>HFSS\/RemoteSpawnCommand scheduler<\/p>\n<p>&nbsp;<\/p>\n<p>Any help would be great! Thank you.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"template":"","class_list":["post-347091","topic","type-topic","status-closed","hentry","topic-tag-intel-mpi-1","topic-tag-mpi-with-slurm","topic-tag-slurm"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_bbp_subscription":["213550","2937"],"_bbp_author_ip":["23.206.193.59"]," _bbp_last_reply_id":["0"]," _bbp_likes_count":["0"],"_btv_view_count":["766"],"_bbp_topic_status":["answered"],"_edit_lock":["1706626134:109055"],"_bbp_topic_id":["347091"],"_bbp_forum_id":["27793"],"_bbp_engagement":["2937","213550"],"_bbp_voice_count":["2"],"_bbp_reply_count":["3"],"_bbp_last_reply_id":["349404"],"_bbp_last_active_id":["349404"],"_bbp_last_active_time":["2024-02-06 
15:28:31"]},"test":"krrs87durham-ac-uk"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/347091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":1,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/347091\/revisions"}],"predecessor-version":[{"id":350913,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/347091\/revisions\/350913"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/media?parent=347091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}