TAGGED: ansys-fluent, fluent
February 22, 2022 at 1:54 pm
skylerp
Subscriber
OS: CentOS 7
We use Slurm as a scheduler and have never had issues with MPI. By default our nodes use openmpi3/3.1.4. When requesting nodes I use the command:
salloc -N 3 -n 20 --mem=80G -C ib
This gives me 3 nodes, 20 tasks, 80G of memory per node, and InfiniBand-connected nodes. I then SSH to the host node (with X11 forwarding), run Ansys (the runwb2 command), and load my Fluent Workbench file. When clicking "Setup" I change the processing options to "Parallel Per Machine File" with 3 processes, then click "Show More Options" -> "Parallel Settings" and set Interconnects = infiniband and MPI Types = openmpi. I then supply the file containing the machine names allocated by the salloc command.
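For reference, the same launch can roughly be reproduced outside Workbench from inside the salloc allocation. The fluent path and parallel options below are taken from the console log further down; the scontrol step is just how we happen to build the machine file, so treat this as a sketch rather than exactly what Workbench runs:
# build the machine file from the Slurm allocation
scontrol show hostnames "$SLURM_JOB_NODELIST" > /mnt/beegfs/home/testUser/testNode.txt
# launch Fluent in parallel with the same options Workbench generates
/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent 3d -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh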
This is the console log I receive:
/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent -r19.5.0 3d -pinfiniband -host -alnamd64 -t3 -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -path/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent -ssh -cx node007.hpc.fau.edu:39136:43808
Starting /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/lnamd64/3d_host/fluent.19.5.0 host -cx node007.hpc.fau.edu:39136:43808 "(list (rpsetvar (QUOTE parallel/function) "fluent 3d -flux -node -alnamd64 -r19.5.0 -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "3") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent") (rpsetvar (QUOTE parallel/hostsfile) "/mnt/beegfs/home/testUser/testNode.txt") )"
Welcome to ANSYS Fluent 2019 R3
Copyright 1987-2019 ANSYS, Inc. All Rights Reserved.
Unauthorized use, distribution or duplication is prohibited.
This product is subject to U.S. laws governing export and re-export.
For full Legal Notice, see documentation.
Build Time: Aug 05 2019 15:40:42 EDT Build Id: 10249
*********************************************
Info: Your license enables 4-way parallel execution.
For faster simulations, please start the application with the appropriate parallel options.
*********************************************
--------------------------------------------------------------
This is an academic version of ANSYS FLUENT. Usage of this product
license is limited to the terms and conditions specified in your ANSYS
license form, additional terms section.
--------------------------------------------------------------
Host spawning Node 0 on machine "node007.hpc.fau.edu" (unix).
/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent -r19.5.0 3d -flux -node -alnamd64 -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh -mport 10.116.1.7:10.116.1.7:43013:0
Starting fixfiledes /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/multiport/mpi/lnamd64/openmpi/bin/mpirun --mca btl self,vader,mvapi --prefix /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/multiport/mpi/lnamd64/openmpi -x LD_LIBRARY_PATH -x KMP_AFFINITY=disabled -x FLUENT_ARCH=lnamd64 -x FLUENT_PROD_DIR=/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0 -x PYTHONHOME=/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/../../commonfiles/CPython/2_7_15/linx64/Release/python -np 3 --hostfile /tmp/fluent-appfile.testUser.32486 /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/lnamd64/3d_node/fluent_mpi.19.5.0 node -mpiw openmpi -pic infiniband -mport 10.116.1.7:10.116.1.7:43013:0
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):
Hostname: node007
Requested CQE: 16384
Error: Cannot allocate memory
Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the "ud" oob component
in this run.
Hostname: node007
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):
Hostname: node082
Requested CQE: 16384
Error: Cannot allocate memory
Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the "ud" oob component
in this run.
Hostname: node082
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):
Hostname: node081
Requested CQE: 16384
Error: Cannot allocate memory
Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.
Host: node007.hpc.fau.edu
Framework: btl
Component: mvapi
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
mca_bml_base_open() failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[node007:32653] *** An error occurred in MPI_Init
[node007:32653] *** reported by process [4164222977,2]
[node007:32653] *** on a NULL communicator
[node007:32653] *** Unknown error
[node007:32653] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node007:32653] *** and potentially your MPI job)
[node007.hpc.fau.edu:32614] 3 more processes have sent help message help-oob-ud.txt / create-cq-failed
[node007.hpc.fau.edu:32614] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[node007.hpc.fau.edu:32614] 3 more processes have sent help message help-oob-ud.txt / no-ports-usable
[node007.hpc.fau.edu:32614] 2 more processes have sent help message help-mca-base.txt / find-available:not-valid
[node007.hpc.fau.edu:32614] 2 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[node007.hpc.fau.edu:32614] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
Any help would be greatly appreciated. Thanks
February 25, 2022 at 6:28 pm
Hunter Wang
Ansys Employee
Try the default IBM MPI in v195 for Fluent. Also try Intel MPI.
Change the Interconnect from InfiniBand to Ethernet to see how Open MPI, IBM MPI, or Intel MPI behave.
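From the command line the equivalents would be roughly as follows (flag names as I understand them for v195; please verify against the Fluent launcher or documentation before relying on them):
# default MPI (IBM MPI in v195) over InfiniBand
fluent 3d -t3 -cnf=/mnt/beegfs/home/testUser/testNode.txt -pib
# Intel MPI over InfiniBand
fluent 3d -t3 -cnf=/mnt/beegfs/home/testUser/testNode.txt -mpi=intel -pib
# Open MPI over Ethernet
fluent 3d -t3 -cnf=/mnt/beegfs/home/testUser/testNode.txt -mpi=openmpi -peth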
March 1, 2022 at 3:50 pm
skylerp
Subscriber
I tried multiple MPI types, including Intel MPI, over both IB and Ethernet; all produce the same error as above.
April 6, 2022 at 2:02 pm
skylerp
Subscriber
Any other ideas?
The topic ‘Ansys Fluent Running in Parallel issues’ is closed to new replies.