


Trouble Running MPI with ANSYS Fluent on HPC

    • th0mas


      I’m having trouble running ANSYS Fluent with MPI on an HPC cluster.
      Below, I’ve included my SLURM job script and a snippet of the log output for reference.

      I’m unsure how to proceed at this point and would appreciate any guidance or suggestions.

      Job Script:

      #SBATCH -J Fluent 
      #SBATCH -o run.out 
      #SBATCH -N 2 
      #SBATCH -n 256 
      #SBATCH -p development
      #SBATCH -t 2:00:00

      set echo on


      module load ansys

      echo "Generating PNODES, removing log files!"
      rm -f pnodes
      nlist=$(scontrol show hostname $SLURM_NODELIST | paste -d, -s)
      echo $nlist
      echo $SLURM_CPUS_ON_NODE
      for node in $(echo $nlist | tr "," " "); do
        for i in $(seq 1 $tasks_per_node); do
          echo $node >> pnodes
        done
      done

      $fluent232 3ddp -t$total_tasks -g -cnf=pnodes -mpi=intel -pib.infinipath -ssh -g < run.inp >> run.log
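For readers following along, the PNODES step in the script above can be sketched as a small shell function. This is only a sketch under assumptions: the helper name `write_pnodes`, its arguments, and the example output path are hypothetical and not part of the original job script. In a real job the host list would presumably come from `scontrol show hostnames`, and the per-node count from the requested tasks divided by the node count (256 / 2 = 128 for the `#SBATCH` values shown).

```shell
# Hypothetical helper: write one hostname per MPI rank to a hosts file.
# Usage: write_pnodes "<comma-separated hosts>" <tasks_per_node> <outfile>
write_pnodes() {
    nlist=$1
    tasks_per_node=$2
    outfile=$3
    : > "$outfile"                          # create or truncate the file
    for node in $(echo "$nlist" | tr ',' ' '); do
        i=0
        while [ "$i" -lt "$tasks_per_node" ]; do
            echo "$node" >> "$outfile"      # one line per rank on this node
            i=$((i + 1))
        done
    done
}

# Example: the two hosts named in the log, with a small per-node count
write_pnodes "c303-005,c303-006" 3 "${TMPDIR:-/tmp}/pnodes.example"
```

Passing the counts in explicitly avoids relying on variables such as `$tasks_per_node` being set elsewhere in the script; the number of lines in the file should then match the value given to `-t`.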

      Log Output (Snippet):

      Host spawning Node 0 on machine "c303-005.ls6.tacc.utexas.edu" (unix).
      /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -flux -node -t256 -pinfiniband -mpi=intel -cnf=pnodes -ssh -mport
      Starting /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.MYID.919430 --rsh=ssh -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.MYID.919430 -np 256 /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/lnamd64/3ddp_node/fluent_mpi.23.2.0 node -mpiw intel -pic infiniband -mport
      [mpiexec@c303-005.ls6.tacc.utexas.edu] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on c303-006 (pid 925626, exit code 65280)
      [mpiexec@c303-005.ls6.tacc.utexas.edu] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
      [mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
      [mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
      [mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1027): error setting up the bootstrap proxies

      I suspect there might be an issue with how MPI is set up or how the nodes are being utilized, but I’m not sure where to start troubleshooting.

      Could someone help me:

      1. Identify possible issues in my SLURM job script.
      2. Understand if the MPI configuration might be causing this issue.
      3. Suggest any debug or diagnostic steps I can take.
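One diagnostic that fits the `unable to run bstrap_proxy on c303-006` line in the log is to check, from inside the job, whether the first node can start a command on every host in `pnodes` without a password prompt, since the launch uses `-ssh`. The exit code 65280 is 255 × 256, and 255 is the status ssh returns for its own errors, which is consistent with an ssh problem but does not prove one. The sketch below assumes that failure mode; the function name `check_hosts` and the `PROBE` override are hypothetical, and only standard OpenSSH options (`BatchMode`, `ConnectTimeout`) are used.

```shell
# Probe each unique host in a hosts file and report which ones fail.
# PROBE is a hypothetical override so the check can be exercised without ssh.
PROBE=${PROBE:-ssh -o BatchMode=yes -o ConnectTimeout=5}

check_hosts() {
    hostfile=$1
    failed=0
    for host in $(sort -u "$hostfile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        if $PROBE "$host" true >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=$((failed + 1))
        fi
    done
    return "$failed"                        # 0 only if every host responded
}
```

Inside the job, after `pnodes` has been written, this would be called as `check_hosts pnodes`. If a host fails, the next place to look is probably the site's guidance on launching across nodes; Intel MPI also documents the `I_MPI_HYDRA_BOOTSTRAP` variable for choosing a launcher other than ssh, though whether the Fluent wrapper honors it on this system is something to confirm rather than assume.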

      Thank you!

    • MangeshANSYS
      Ansys Employee


      Do you run into the same issue with Ansys Fluent 2024 R2? If not, I would recommend running 2024 R2.

      Also see if the information on this page helps.


