Reply To: Fluent fails with Intel MPI protocol on 2 nodes
Morning,
Still crashing when attempting to start on 2 nodes:
==============================================================================
[ssivaraman@node187 [kelvin2] ~]$ module load ansys/v241/ulster
ansys/v241/ulsterutility.c(2245):ERROR:50: Cannot open file '/opt/apps/etc/modules/ansys/v241/qub' for 'reading'
OK
[ssivaraman@node187 [kelvin2] ~]$ export I_MPI_HYDRA_BOOTSTRAP=ssh
[ssivaraman@node187 [kelvin2] ~]$ echo $I_MPI_HYDRA_BOOTSTRAP
ssh
[ssivaraman@node187 [kelvin2] ~]$ fluent 3ddp -t256 -mpi=intel -cnf=node187,node188 -ssh -g
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -t256 -mpi=intel -cnf=node187,node188 -ssh -g
/opt/apps/ansys/v241/fluent/fluent24.1.0/cortex/lnamd64/cortex.24.1.0 -f fluent -g (fluent "3ddp -pmpi-auto-selected -host -r24.1.0 -t256 -mpi=intel -cnf=node187,node188 -path/opt/apps/ansys/v241/fluent -ssh")

Opening input/output transcript to file "/users/ssivaraman/fluent-20241024-094521-1942286.trn".
Auto-Transcript Start Time: 09:45:21, 24 Oct 2024
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -pmpi-auto-selected -host -t256 -mpi=intel -cnf=node187,node188 -path/opt/apps/ansys/v241/fluent -ssh -cx node187.pri.kelvin2.alces.network:44677:44067
Starting /opt/apps/ansys/v241/fluent/fluent24.1.0/lnamd64/3ddp_host/fluent.24.1.0 host -cx node187.pri.kelvin2.alces.network:44677:44067 "(list (rpsetvar (QUOTE parallel/function) "fluent 3ddp -flux -node -r24.1.0 -t256 -pmpi-auto-selected -mpi=intel -cnf=node187,node188 -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "256") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/opt/apps/ansys/v241/fluent") (rpsetvar (QUOTE parallel/hostsfile) "node187,node188") (rpsetvar (QUOTE gpuapp/devices) ""))"

             Welcome to ANSYS Fluent 2024 R1

             Copyright 1987-2024 ANSYS, Inc. All Rights Reserved.
             Unauthorized use, distribution or duplication is prohibited.
             This product is subject to U.S. laws governing export and re-export.
             For full Legal Notice, see documentation.

Build Time: Nov 22 2023 10:07:25 EST  Build Id: 10184
Connected License Server List:  1055@193.61.145.219

    --------------------------------------------------------------
    This is an academic version of ANSYS FLUENT. Usage of this product
    license is limited to the terms and conditions specified in your ANSYS
    license form, additional terms section.
    --------------------------------------------------------------
Host spawning Node 0 on machine "node187.pri.kelvin2.alces.network" (unix).
/opt/apps/ansys/v241/fluent/fluent24.1.0/bin/fluent -r24.1.0 3ddp -flux -node -t256 -pmpi-auto-selected -mpi=intel -cnf=node187,node188 -ssh -mport 10.10.15.27:10.10.15.27:46493:0
Starting /opt/apps/ansys/v241/fluent/fluent24.1.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.ssivaraman.1942769 --rsh=ssh -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /opt/apps/ansys/v241/fluent/fluent24.1.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /opt/apps/ansys/v241/fluent/fluent24.1.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.ssivaraman.1942769 -np 256 /opt/apps/ansys/v241/fluent/fluent24.1.0/lnamd64/3ddp_node/fluent_mpi.24.1.0 node -mpiw intel -pic mpi-auto-selected -mport 10.10.15.27:10.10.15.27:46493:0
[mpiexec@node187.pri.kelvin2.alces.network] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node188 (pid 1943623, exit code 65280)
[mpiexec@node187.pri.kelvin2.alces.network] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@node187.pri.kelvin2.alces.network] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@node187.pri.kelvin2.alces.network] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
[mpiexec@node187.pri.kelvin2.alces.network] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1027): error setting up the bootstrap proxies
[mpiexec@node187.pri.kelvin2.alces.network] Possible reasons:
[mpiexec@node187.pri.kelvin2.alces.network] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@node187.pri.kelvin2.alces.network] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@node187.pri.kelvin2.alces.network] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@node187.pri.kelvin2.alces.network] 4. Ssh bootstrap cannot launch processes on remote host. Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@node187.pri.kelvin2.alces.network]    You may try using -bootstrap option to select alternative launcher.
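The mpiexec hints above can be checked directly from node187 before relaunching Fluent. A minimal sketch (hostnames are taken from the transcript; the port range is an illustrative assumption, not a value from this cluster):

```shell
# Sketch of checks for the mpiexec hints above.
# Hostnames node187/node188 come from the transcript; the port range
# below is an illustrative assumption for hint 3.
for h in node187 node188; do
  # Hint 4: passwordless ssh -- BatchMode fails fast instead of
  # hanging on a password prompt.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
    echo "$h: passwordless ssh OK"
  else
    echo "$h: passwordless ssh FAILED"
  fi
done
# Hint 3: if a firewall sits between the nodes, pin Intel MPI to a
# range that is known to be open (example range only).
export I_MPI_PORT_RANGE=50000:50100
```

Since the bstrap_proxy failed specifically on node188 (exit code 65280, i.e. ssh returning 255), the passwordless-ssh check from node187 to node188 is the first thing worth ruling out.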