Reply To: Fluent fails with Intel MPI protocol on 2 nodes

dv.makarov
Subscriber

Hello,

The result is still the same with I_MPI_BOOTSTRAP=ssh set. Here is the full output:

[ssivaraman@node187 [kelvin2] ~]$  export I_MPI_BOOTSTRAP=ssh
[ssivaraman@node187 [kelvin2] ~]$ 
[ssivaraman@node187 [kelvin2] ~]$ echo $I_MPI_BOOTSTRAP
ssh
[ssivaraman@node187 [kelvin2] ~]$ 
[ssivaraman@node187 [kelvin2] ~]$ fluent 3ddp -t256 -mpi=intel -cnf=node[187-188] -ssh -g
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -t256 -mpi=intel -cnf=node[187-188] -ssh -g
Hostfile does not exist, will try to use it as hostname!
ssh: Could not resolve hostname node[187-188]: Name or service not known
ssh: Could not resolve hostname node[187-188]: Name or service not known
/opt/apps/ansys/v241/fluent/fluent24.1.0/cortex/lnamd64/cortex.24.1.0 -f fluent -g (fluent "3ddp  -host -r24.1.0 -t256 -cnf=node[187-188] -path/opt/apps/ansys/v241/fluent -ssh")

 

Opening input/output transcript to file "/users/ssivaraman/fluent-20241023-145644-1854988.trn".
Auto-Transcript Start Time:  14:56:44, 23 Oct 2024 
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -host -t256 -cnf=node[187-188] -path/opt/apps/ansys/v241/fluent -ssh -cx node187.pri.kelvin2.alces.network:44871:44849
Starting /opt/apps/ansys/v241/fluent/fluent24.1.0/lnamd64/3ddp_host/fluent.24.1.0 host -cx node187.pri.kelvin2.alces.network:44871:44849 "(list (rpsetvar (QUOTE parallel/function) "fluent 3ddp -flux -node -r24.1.0 -t256 -pdefault -mpi=intel -cnf=node[187-188] -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "256") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/opt/apps/ansys/v241/fluent") (rpsetvar (QUOTE parallel/hostsfile) "node[187-188]") (rpsetvar (QUOTE gpuapp/devices) ""))"

 

              Welcome to ANSYS Fluent 2024 R1

 

              Copyright 1987-2024 ANSYS, Inc. All Rights Reserved.
              Unauthorized use, distribution or duplication is prohibited.
              This product is subject to U.S. laws governing export and re-export.
              For full Legal Notice, see documentation.

 

Build Time: Nov 22 2023 10:07:25 EST  Build Id: 10184  

Connected License Server List:  1055@193.61.145.219

 

     --------------------------------------------------------------
     This is an academic version of ANSYS FLUENT. Usage of this product
     license is limited to the terms and conditions specified in your ANSYS
     license form, additional terms section.
     --------------------------------------------------------------
Host spawning Node 0 on machine "node187.pri.kelvin2.alces.network" (unix).
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -flux -node -t256 -pdefault -mpi=intel -cnf=node[187-188] -ssh -mport 10.10.15.27:10.10.15.27:38735:0
Starting /opt/apps/ansys/v241/fluent/fluent24.1.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.ssivaraman.1855431 --rsh=ssh -genv FI_PROVIDER tcp -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /opt/apps/ansys/v241/fluent/fluent24.1.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /opt/apps/ansys/v241/fluent/fluent24.1.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.ssivaraman.1855431 -np 256 /opt/apps/ansys/v241/fluent/fluent24.1.0/lnamd64/3ddp_node/fluent_mpi.24.1.0 node -mpiw intel -pic default -mport 10.10.15.27:10.10.15.27:38735:0
[mpiexec@node187.pri.kelvin2.alces.network] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node[187-188] (pid 1856274, exit code 65280)
[mpiexec@node187.pri.kelvin2.alces.network] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@node187.pri.kelvin2.alces.network] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@node187.pri.kelvin2.alces.network] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
[mpiexec@node187.pri.kelvin2.alces.network] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1027): error setting up the bootstrap proxies
[mpiexec@node187.pri.kelvin2.alces.network] Possible reasons:
[mpiexec@node187.pri.kelvin2.alces.network] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@node187.pri.kelvin2.alces.network] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@node187.pri.kelvin2.alces.network] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@node187.pri.kelvin2.alces.network] 4. Ssh bootstrap cannot launch processes on remote host. Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@node187.pri.kelvin2.alces.network]    You may try using -bootstrap option to select alternative launcher.
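One thing that stands out in the output above: the "Hostfile does not exist, will try to use it as hostname!" message and the two "Could not resolve hostname node[187-188]" errors suggest that the bracketed range passed to -cnf is being treated as a single literal hostname rather than being expanded into node187 and node188. A minimal sketch of expanding it by hand into a hosts file follows. The function name expand_hostlist and the file name hosts.txt are my own placeholders, not anything from Fluent or Intel MPI, and the sketch only handles the simple prefix[A-B] form without zero padding.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand "prefix[A-B]" into one hostname per line.
# Only a single numeric range is handled; zero-padded numbers such as
# node[001-004] would lose their padding, so treat this as a sketch.
expand_hostlist() {
    local spec="$1"
    # prefix without brackets, then "[", first number, "-", last number, "]"
    local re='^([^][]*)[[]([0-9]+)-([0-9]+)[]]$'
    if [[ "$spec" =~ $re ]]; then
        local prefix="${BASH_REMATCH[1]}"
        local i
        # 10# forces base 10 so a leading zero is not read as octal
        for (( i = 10#${BASH_REMATCH[2]}; i <= 10#${BASH_REMATCH[3]}; i++ )); do
            printf '%s%d\n' "$prefix" "$i"
        done
    else
        # No bracket range found: pass the value through unchanged
        printf '%s\n' "$spec"
    fi
}

# Possible usage (file name is an assumption):
#   expand_hostlist 'node[187-188]' > hosts.txt
#   fluent 3ddp -t256 -mpi=intel -cnf=hosts.txt -ssh -g
```

If the job runs under Slurm, I believe `scontrol show hostnames 'node[187-188]'` produces the same one-host-per-line list, but I have not confirmed that it is available on these nodes.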

 

Thank you!

Yours,
Dmitriy