

Hello,

The result is still the same; here is the full output:

[ssivaraman@node187 [kelvin2] ~]$  export I_MPI_BOOTSTRAP=ssh
[ssivaraman@node187 [kelvin2] ~]$ 
[ssivaraman@node187 [kelvin2] ~]$ echo $I_MPI_BOOTSTRAP
ssh
[ssivaraman@node187 [kelvin2] ~]$ 
[ssivaraman@node187 [kelvin2] ~]$ fluent 3ddp -t256 -mpi=intel -cnf=node[187-188] -ssh -g
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -t256 -mpi=intel -cnf=node[187-188] -ssh -g
Hostfile does not exist, will try to use it as hostname!
ssh: Could not resolve hostname node[187-188]: Name or service not known
ssh: Could not resolve hostname node[187-188]: Name or service not known
/opt/apps/ansys/v241/fluent/fluent24.1.0/cortex/lnamd64/cortex.24.1.0 -f fluent -g (fluent "3ddp  -host -r24.1.0 -t256 -cnf=node[187-188] -path/opt/apps/ansys/v241/fluent -ssh")

 

Opening input/output transcript to file "/users/ssivaraman/fluent-20241023-145644-1854988.trn".
Auto-Transcript Start Time:  14:56:44, 23 Oct 2024 
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -host -t256 -cnf=node[187-188] -path/opt/apps/ansys/v241/fluent -ssh -cx node187.pri.kelvin2.alces.network:44871:44849
Starting /opt/apps/ansys/v241/fluent/fluent24.1.0/lnamd64/3ddp_host/fluent.24.1.0 host -cx node187.pri.kelvin2.alces.network:44871:44849 "(list (rpsetvar (QUOTE parallel/function) "fluent 3ddp -flux -node -r24.1.0 -t256 -pdefault -mpi=intel -cnf=node[187-188] -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "256") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/opt/apps/ansys/v241/fluent") (rpsetvar (QUOTE parallel/hostsfile) "node[187-188]") (rpsetvar (QUOTE gpuapp/devices) ""))"

 

              Welcome to ANSYS Fluent 2024 R1

 

              Copyright 1987-2024 ANSYS, Inc. All Rights Reserved.
              Unauthorized use, distribution or duplication is prohibited.
              This product is subject to U.S. laws governing export and re-export.
              For full Legal Notice, see documentation.

 

Build Time: Nov 22 2023 10:07:25 EST  Build Id: 10184  

Connected License Server List:  1055@193.61.145.219

 

     --------------------------------------------------------------
     This is an academic version of ANSYS FLUENT. Usage of this product
     license is limited to the terms and conditions specified in your ANSYS
     license form, additional terms section.
     --------------------------------------------------------------
Host spawning Node 0 on machine "node187.pri.kelvin2.alces.network" (unix).
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -flux -node -t256 -pdefault -mpi=intel -cnf=node[187-188] -ssh -mport 10.10.15.27:10.10.15.27:38735:0
Starting /opt/apps/ansys/v241/fluent/fluent24.1.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.ssivaraman.1855431 --rsh=ssh -genv FI_PROVIDER tcp -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /opt/apps/ansys/v241/fluent/fluent24.1.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /opt/apps/ansys/v241/fluent/fluent24.1.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.ssivaraman.1855431 -np 256 /opt/apps/ansys/v241/fluent/fluent24.1.0/lnamd64/3ddp_node/fluent_mpi.24.1.0 node -mpiw intel -pic default -mport 10.10.15.27:10.10.15.27:38735:0
[mpiexec@node187.pri.kelvin2.alces.network] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node[187-188] (pid 1856274, exit code 65280)
[mpiexec@node187.pri.kelvin2.alces.network] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@node187.pri.kelvin2.alces.network] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@node187.pri.kelvin2.alces.network] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
[mpiexec@node187.pri.kelvin2.alces.network] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1027): error setting up the bootstrap proxies
[mpiexec@node187.pri.kelvin2.alces.network] Possible reasons:
[mpiexec@node187.pri.kelvin2.alces.network] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@node187.pri.kelvin2.alces.network] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@node187.pri.kelvin2.alces.network] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@node187.pri.kelvin2.alces.network] 4. Ssh bootstrap cannot launch processes on remote host. Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@node187.pri.kelvin2.alces.network]    You may try using -bootstrap option to select alternative launcher.
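In case it helps with the diagnosis: the log shows the bracketed range node[187-188] being treated as a literal hostname ("Hostfile does not exist, will try to use it as hostname!"), so ssh cannot resolve it. Below is only a sketch of how the range could be expanded into a plain hosts file for -cnf, assuming the cluster scheduler is SLURM and scontrol is available on the node; the file name fluent.hosts is arbitrary.

# Expand the bracketed node range into one hostname per line (assumes SLURM's scontrol)
scontrol show hostnames "node[187-188]" > fluent.hosts
# Expected contents: node187 and node188, one per line
cat fluent.hosts
# Quick check that passwordless ssh works from the first node to the second
ssh node188 hostname
# Then point -cnf at the file instead of the raw bracketed range
fluent 3ddp -t256 -mpi=intel -cnf=fluent.hosts -ssh -g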

 

Thank you!

Yours,
Dmitriy