Fluids

Topics related to Fluent, CFX, Turbogrid and more.

Trouble Running MPI with ANSYS Fluent on HPC

    • th0mas
      Subscriber

      Hi,

      I’m encountering issues when running ANSYS Fluent with MPI on an HPC cluster.
      Below, I’ve included my SLURM job script and a snippet of the log output for reference.

      I’m unsure how to proceed at this point and would appreciate any guidance or suggestions.

      Job Script:

      #!/bin/bash
      #SBATCH -J Fluent 
      #SBATCH -o run.out 
      #SBATCH -N 2 
      #SBATCH -n 256 
      #SBATCH -p development
      #SBATCH -t 2:00:00
      #SBATCH -A MYPROJECT

      set -x   # echo commands as they run (bash equivalent of csh "set echo on")
      total_tasks=256
      tasks_per_node=128

      fluent232=/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/bin/fluent

      module load ansys

      echo "Generating PNODES, removing log files!"
      rm -f pnodes
      nlist=$(scontrol show hostname $SLURM_NODELIST | paste -d, -s)
      echo $nlist
      echo $SLURM_CPUS_ON_NODE
      for node in $(echo $nlist | tr "," " "); do
      for i in $(seq 1 $tasks_per_node); do
      echo $node >> pnodes
      done
      done

      $fluent232 3ddp -t$total_tasks -g -cnf=pnodes -mpi=intel -pib.infinipath -ssh < run.inp >> run.log


      Log Output (Snippet):

      Host spawning Node 0 on machine "c303-005.ls6.tacc.utexas.edu" (unix).
      /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -flux -node -t256 -pinfiniband -mpi=intel -cnf=pnodes -ssh -mport 129.114.41.53:129.114.41.53:40663:0
      Starting /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.MYID.919430 --rsh=ssh -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.MYID.919430 -np 256 /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/lnamd64/3ddp_node/fluent_mpi.23.2.0 node -mpiw intel -pic infiniband -mport 129.114.41.53:129.114.41.53:40663:0
      [mpiexec@c303-005.ls6.tacc.utexas.edu] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on c303-006 (pid 925626, exit code 65280)
      [mpiexec@c303-005.ls6.tacc.utexas.edu] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
      [mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
      [mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
      [mpiexec@c303-005.ls6.tacc.utexas.edu] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1027): error setting up the bootstrap proxies


      I suspect there might be an issue with how MPI is set up or how the nodes are being utilized, but I’m not sure where to start troubleshooting.

      Could someone help me:

      1. Identify possible issues in my SLURM job script.
      2. Understand if the MPI configuration might be causing this issue.
      3. Suggest any debug or diagnostic steps I can take.

      Thank You!



    • MangeshANSYS
      Ansys Employee

      Hello

      Do you also run into the same issue with Ansys Fluent 2024 R2? If not, I would recommend running 2024 R2.

      See if the information on this page helps:
      https://ansyshelp.ansys.com/public/account/secured?returnurl=/Views/Secured/corp/v242/en/fluent_beta_doc/flu_beta_par_winshare.html
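
      If 2024 R2 behaves the same way, note that the mpiexec message "unable to run bstrap_proxy on c303-006 ... exit code 65280" usually means the remote shell returned exit status 255, i.e. the ssh launch of the Hydra proxy on the second node failed. Below is a minimal diagnostic sketch (not an official procedure) you can run inside the same SLURM allocation, before the fluent line; I_MPI_HYDRA_BOOTSTRAP is a standard Intel MPI environment variable, but whether the Fluent launcher honours it when -ssh is also given may vary.

      # Check that passwordless ssh works from the first node to every
      # other node in the allocation.
      for node in $(scontrol show hostnames "$SLURM_NODELIST"); do
          ssh -o BatchMode=yes -o ConnectTimeout=10 "$node" hostname \
              || echo "ssh to $node FAILED"
      done

      # Optionally ask Intel MPI's Hydra launcher to bootstrap through
      # SLURM instead of ssh.
      export I_MPI_HYDRA_BOOTSTRAP=slurm

      If the ssh test prompts for a password or fails, fixing inter-node ssh (or switching to the SLURM bootstrap) would be the first thing to try.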
