


{"id":311898,"date":"2023-10-17T18:07:21","date_gmt":"2023-10-17T18:07:21","guid":{"rendered":"\/forum\/?post_type=topic&#038;p=311898"},"modified":"2023-10-17T18:07:21","modified_gmt":"2023-10-17T18:07:21","slug":"mpi-failure-at-frequency-sweep","status":"closed","type":"topic","link":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/mpi-failure-at-frequency-sweep\/","title":{"rendered":"MPI failure at frequency sweep"},"content":{"rendered":"<p>Hi,<\/p>\n<p>I&#8217;m using ansysem\/22 on SLURM for a simulation with an adaptive solver and a discrete frequency sweep (8 data points). The sbat file is at the end of this post, but in summary, I&#8217;m using 1 node, 8 tasks per node, 1 core per task, and 15 GB of RAM per core. I also set MPIVendor to Intel in the batch options.<\/p>\n<p>With this setup, I get the following error: &#8220;The attempted launch of solvers via MPI failed while connecting to communication pipes&#8221; (a screenshot of the complete error is attached). I&#8217;ve looked online (including similar posts on the Intel forum and here) and haven&#8217;t been able to solve the problem so far. The error occurs both in an interactive session and with -ng through sbat job submission. I don&#8217;t think anything is wrong with my simulation file, because the same error occurs when I use the Ansys test file located at &#8220;sw\/pkgs\/***\/ansysem\/22.R2\/v222\/Linux64\/schedulers\/diagnostics\/Projects\/HFSS\/OptimTee-DiscreteSweep.aedt&#8221;.<\/p>\n<p>What happens is that HFSS solves the adaptive frequency and then hangs while distributing the frequencies, showing the message {solved 1 out of 8 frequencies being solved in parallel} until the MPI timeout is reached and the error appears. I&#8217;ve checked the HPC analysis, and the number of cores, etc. seem okay.<\/p>\n<p>I would really appreciate any feedback on this issue.<\/p>\n<p>Best,<\/p>\n<p>Ehsan<\/p>\n<p><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2023\/10\/17-10-2023-1697565952-MPI_Pipe.png\" alt=\"\"><img decoding=\"async\" src=\"\/forum\/wp-content\/uploads\/sites\/2\/2023\/10\/17-10-2023-1697565924-sbat.png\" alt=\"\"><\/p>\n","protected":false},"template":"","class_list":["post-311898","topic","type-topic","status-closed","hentry","topic-tag-frequency-sweep-1","topic-tag-mpi-with-slurm"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_bbp_author_ip":["23.206.193.41"],"_bbp_subscription":["295894","2937"],"_btv_view_count":["2081"],"_bbp_likes_count":["1"],"_bbp_topic_status":["unanswered"],"_bbp_status":["publish"],"_bbp_topic_id":["311898"],"_bbp_forum_id":["27793"],"_bbp_engagement":["2937","295894"],"_bbp_voice_count":["2"],"_bbp_reply_count":["5"],"_bbp_last_reply_id":["313356"],"_bbp_last_active_id":["313356"],"_bbp_last_active_time":["2023-10-25 19:59:25"]},"test":"hafeziumich-edu"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/311898","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":0,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/311898\/revisions"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/media?parent=311898"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}