Ansys Products

Discuss installation & licensing of our Ansys Teaching and Research products.

Ansys Fluent Running in Parallel issues

    • skylerp
      Subscriber

      OS: CentOS 7

      We use Slurm as a scheduler and have never had issues with MPI. By default our nodes use openmpi3/3.1.4. When requesting nodes, I use the command:

      salloc -N 3 -n 20 --mem=80G -C ib

      This gives me 3 nodes, 20 cores each, 80G of memory, and uses InfiniBand. I then SSH to the host node (with X11 forwarding), run Ansys (the runwb2 command), and load my Fluent Workbench file. When launching "Setup" I change the processing options to "Parallel Per Machine File" with 3 processes, then click "Show More Options" -> "Parallel Settings" and set Interconnects = infiniband and MPI Types = openmpi. Finally, I supply the file containing the machine names allocated by the salloc command.
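
      For reference, a minimal sketch of how that machine file can be generated from inside the salloc session (the testNode.txt path is the one passed via -cnf in the log below; scontrol show hostnames is the standard Slurm way to expand the allocated node list):

      # Expand the allocation's node list into a one-hostname-per-line
      # machine file for Fluent's -cnf / "Parallel Per Machine File" option.
      scontrol show hostnames "$SLURM_JOB_NODELIST" > /mnt/beegfs/home/testUser/testNode.txt

      # Sanity check: should print the three allocated node names.
      cat /mnt/beegfs/home/testUser/testNode.txt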


      This is the console log I receive:

      /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent -r19.5.0 3d -pinfiniband -host -alnamd64 -t3 -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -path/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent -ssh -cx node007.hpc.fau.edu:39136:43808

      Starting /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/lnamd64/3d_host/fluent.19.5.0 host -cx node007.hpc.fau.edu:39136:43808 "(list (rpsetvar (QUOTE parallel/function) "fluent 3d -flux -node -alnamd64 -r19.5.0 -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "3") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent") (rpsetvar (QUOTE parallel/hostsfile) "/mnt/beegfs/home/testUser/testNode.txt") )"


      Welcome to ANSYS Fluent 2019 R3


      Copyright 1987-2019 ANSYS, Inc. All Rights Reserved.

      Unauthorized use, distribution or duplication is prohibited.

      This product is subject to U.S. laws governing export and re-export.

      For full Legal Notice, see documentation.


      Build Time: Aug 05 2019 15:40:42 EDT Build Id: 10249


      *********************************************

      Info: Your license enables 4-way parallel execution.

      For faster simulations, please start the application with the appropriate parallel options.

      *********************************************


      --------------------------------------------------------------

      This is an academic version of ANSYS FLUENT. Usage of this product

      license is limited to the terms and conditions specified in your ANSYS

      license form, additional terms section.

      --------------------------------------------------------------

      Host spawning Node 0 on machine "node007.hpc.fau.edu" (unix).

      /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent -r19.5.0 3d -flux -node -alnamd64 -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh -mport 10.116.1.7:10.116.1.7:43013:0

      Starting fixfiledes /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/multiport/mpi/lnamd64/openmpi/bin/mpirun --mca btl self,vader,mvapi --prefix /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/multiport/mpi/lnamd64/openmpi -x LD_LIBRARY_PATH -x KMP_AFFINITY=disabled -x FLUENT_ARCH=lnamd64 -x FLUENT_PROD_DIR=/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0 -x PYTHONHOME=/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/../../commonfiles/CPython/2_7_15/linx64/Release/python -np 3 --hostfile /tmp/fluent-appfile.testUser.32486 /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/lnamd64/3d_node/fluent_mpi.19.5.0 node -mpiw openmpi -pic infiniband -mport 10.116.1.7:10.116.1.7:43013:0

      --------------------------------------------------------------------------

      Failed to create a completion queue (CQ):


      Hostname: node007

      Requested CQE: 16384

      Error: Cannot allocate memory


      Check the CQE attribute.

      --------------------------------------------------------------------------

      --------------------------------------------------------------------------

      Open MPI has detected that there are UD-capable Verbs devices on your

      system, but none of them were able to be setup properly. This may

      indicate a problem on this system.


      You job will continue, but Open MPI will ignore the "ud" oob component

      in this run.


      Hostname: node007

      --------------------------------------------------------------------------

      --------------------------------------------------------------------------

      Failed to create a completion queue (CQ):


      Hostname: node082

      Requested CQE: 16384

      Error: Cannot allocate memory


      Check the CQE attribute.

      --------------------------------------------------------------------------

      --------------------------------------------------------------------------

      Open MPI has detected that there are UD-capable Verbs devices on your

      system, but none of them were able to be setup properly. This may

      indicate a problem on this system.


      You job will continue, but Open MPI will ignore the "ud" oob component

      in this run.


      Hostname: node082

      --------------------------------------------------------------------------

      --------------------------------------------------------------------------

      Failed to create a completion queue (CQ):


      Hostname: node081

      Requested CQE: 16384

      Error: Cannot allocate memory


      Check the CQE attribute.

      --------------------------------------------------------------------------

      --------------------------------------------------------------------------

      A requested component was not found, or was unable to be opened. This

      means that this component is either not installed or is unable to be

      used on your system (e.g., sometimes this means that shared libraries

      that the component requires are unable to be found/loaded). Note that

      Open MPI stopped checking at the first component that it did not find.


      Host: node007.hpc.fau.edu

      Framework: btl

      Component: mvapi

      --------------------------------------------------------------------------

      --------------------------------------------------------------------------

      It looks like MPI_INIT failed for some reason; your parallel process is

      likely to abort. There are many reasons that a parallel process can

      fail during MPI_INIT; some of which are due to configuration or environment

      problems. This failure appears to be an internal failure; here's some

      additional information (which may only be relevant to an Open MPI

      developer):


      mca_bml_base_open() failed

      --> Returned "Not found" (-13) instead of "Success" (0)

      --------------------------------------------------------------------------

      [node007:32653] *** An error occurred in MPI_Init

      [node007:32653] *** reported by process [4164222977,2]

      [node007:32653] *** on a NULL communicator

      [node007:32653] *** Unknown error

      [node007:32653] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,

      [node007:32653] *** and potentially your MPI job)

      [node007.hpc.fau.edu:32614] 3 more processes have sent help message help-oob-ud.txt / create-cq-failed

      [node007.hpc.fau.edu:32614] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

      [node007.hpc.fau.edu:32614] 3 more processes have sent help message help-oob-ud.txt / no-ports-usable

      [node007.hpc.fau.edu:32614] 2 more processes have sent help message help-mca-base.txt / find-available:not-valid

      [node007.hpc.fau.edu:32614] 2 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure

      [node007.hpc.fau.edu:32614] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle


      Any help would be greatly appreciated. Thanks

    • Hunter Wang
      Ansys Employee
      Try the default IBM MPI in v195 for Fluent, and also try Intel MPI.
      Change the interconnect from InfiniBand to Ethernet to see how Open MPI, IBM MPI, or Intel MPI behave.
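      As a rough command-line sketch of those combinations (the -t3, -cnf, -pinfiniband, and -mpi=openmpi values are taken from the log above; the ibmmpi, intel, and ethernet values are assumptions to verify against the v195 fluent launcher help/documentation):
      # Same case and machine file, different MPI (MPI names assumed):
      fluent 3d -t3 -cnf=/mnt/beegfs/home/testUser/testNode.txt -pinfiniband -mpi=ibmmpi
      fluent 3d -t3 -cnf=/mnt/beegfs/home/testUser/testNode.txt -pinfiniband -mpi=intel
      # Ethernet interconnect instead of InfiniBand (flag name assumed):
      fluent 3d -t3 -cnf=/mnt/beegfs/home/testUser/testNode.txt -pethernet -mpi=openmpi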
    • skylerp
      Subscriber
      I tried multiple MPI versions, including Intel MPI, over both IB and Ethernet; all produce the same error as above.
    • skylerp
      Subscriber
      Any other ideas?