


Trouble Running MPI with ANSYS Fluent on HPC
Posted 2025-01-06 on the Ansys Innovation Space forum (innovationspace.ansys.com).

Hi,

I'm encountering issues when running ANSYS Fluent with MPI on an HPC cluster. Below are my SLURM job script and a snippet of the log output for reference.

I'm unsure how to proceed at this point and would appreciate any guidance or suggestions.

Job Script:

#!/bin/bash
#SBATCH -J Fluent
#SBATCH -o run.out
#SBATCH -N 2
#SBATCH -n 256
#SBATCH -p development
#SBATCH -t 2:00:00
#SBATCH -A MYPROJECT

set echo on
total_tasks=256
tasks_per_node=128

fluent232=/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/bin/fluent

module load ansys

echo "Generating PNODES, removing log files!"
rm -f pnodes
nlist=$(scontrol show hostname $SLURM_NODELIST | paste -d, -s)
echo $nlist
echo $SLURM_CPUS_ON_NODE
for node in $(echo $nlist | tr "," " "); do
    for i in $(seq 1 $tasks_per_node); do
        echo $node >> pnodes
    done
done

$fluent232 3ddp -t$total_tasks -g -cnf=pnodes -mpi=intel -pib.infinipath -ssh -g < run.inp >> run.log

Log Output (Snippet):

Host spawning Node 0 on machine "c303-005.ls6.tacc.utexas.edu" (unix).
/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -flux -node -t256 -pinfiniband -mpi=intel -cnf=pnodes -ssh -mport 129.114.41.53:129.114.41.53:40663:0
Starting /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.MYID.919430 --rsh=ssh -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.MYID.919430 -np 256 /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/lnamd64/3ddp_node/fluent_mpi.23.2.0 node -mpiw intel -pic infiniband -mport 129.114.41.53:129.114.41.53:40663:0
[mpiexec@c303-005.ls6.tacc.utexas.edu] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on c303-006 (pid 925626, exit code 65280)
[mpiexec@c303-005.ls6.tacc.utexas.edu] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
[mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1027): error setting up the bootstrap proxies

I suspect there is an issue with how MPI is set up or how the nodes are being used, but I'm not sure where to start troubleshooting.

Could someone help me:

1. Identify possible issues in my SLURM job script.
2. Understand whether the MPI configuration might be causing this issue.
3. Suggest debugging or diagnostic steps I can take.

Thank you!
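For anyone reading along: the nested loop that builds the pnodes machinefile can be exercised on its own. This sketch uses a hypothetical two-node list in place of the `scontrol show hostname | paste` output, and reduces tasks_per_node from 128 to 2 so the result is easy to inspect; it shows the format Fluent's -cnf= option consumes, one hostname per MPI rank with each node repeated tasks_per_node times.

```shell
#!/bin/sh
# Standalone sketch of the pnodes generation from the job script above,
# runnable without SLURM. "c303-005,c303-006" is a stand-in for the node
# list the real script derives from $SLURM_NODELIST.
nlist="c303-005,c303-006"   # hypothetical node list
tasks_per_node=2            # 128 in the real script

rm -f pnodes
for node in $(echo "$nlist" | tr ',' ' '); do
    for i in $(seq 1 "$tasks_per_node"); do
        echo "$node" >> pnodes   # one line per MPI rank on that node
    done
done

cat pnodes
# c303-005
# c303-005
# c303-006
# c303-006
```

With the real values (2 nodes, tasks_per_node=128) the file would contain 256 lines, matching the -t256 rank count.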
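On the diagnostic side, one concrete clue can be read straight off the error line: hydra reports bstrap_proxy on c303-006 failing with "exit code 65280". That number is a wait-status value, with the child's actual exit code in the high byte:

```shell
#!/bin/sh
# Decode the wait-status reported by hydra: the child's exit code is stored
# in the high byte, so shifting right by 8 bits recovers it.
echo $((65280 >> 8))   # prints 255
```

An exit status of 255 is what ssh itself returns when a connection or authentication error occurs (per the ssh(1) man page), so a plausible first check — a hedged reading, not a confirmed diagnosis — would be whether passwordless ssh from c303-005 to c303-006 actually works inside the job allocation.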