Platform

Topics related to optiSLang, HPC, DesignXplorer, Cloud and more.

ANSYS Workbench HPC MPI Command Line Settings

    • KartiSinghFreeman
      Subscriber

      Hey,

      I'm currently submitting jobs with PBS Pro on the university HPC, but I'm struggling to get multi-node simulations to use more than one core. I've tried submitting through Workbench and through the Mechanical solver directly, but neither appears to use more than one core.

      Currently my submission script looks like this:

      #!/bin/bash -l
      #PBS -N Test
      #PBS -l select=4:ncpus=3:mpiprocs=3:mem=10gb
      #PBS -l walltime=2:00:00
      cd $PBS_O_WORKDIR
      module load intel
      module load ansys/21.1
      /pkg/suse12/software/ANSYS/21.1/v211/ansys/bin/ansysdis211 -i ds.dat -o solve.out -b -dis -mpi intelmpi -np 10

      I've also provided a screencap of the command line options within Workbench's settings for Mechanical APDL.
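
      As an aside, the select line above reserves 4 x 3 = 12 MPI slots while the launch line asks for -np 10, and nothing tells Distributed ANSYS which hosts it may use, so the ranks typically all land on the first node. Below is a minimal sketch of one way to keep the PBS request and the solver launch consistent, assuming your MAPDL release accepts the -machines argument (check the Parallel Processing Guide for your version); only the paths and module names are taken from the script above, everything else is illustrative.

      #!/bin/bash -l
      #PBS -N Test
      #PBS -l select=4:ncpus=3:mpiprocs=3:mem=10gb
      #PBS -l walltime=2:00:00
      cd $PBS_O_WORKDIR
      module load intel
      module load ansys/21.1

      # PBS lists each node once per mpiproc, so counting duplicates gives a
      # host:cores list such as node1:3:node2:3:node3:3:node4:3 (12 ranks here).
      MACHINES=$(sort "$PBS_NODEFILE" | uniq -c | awk '{printf "%s%s:%s", sep, $2, $1; sep=":"}')

      # -machines tells Distributed ANSYS which hosts to span and how many
      # cores to use on each; in this sketch it takes the place of -np.
      /pkg/suse12/software/ANSYS/21.1/v211/ansys/bin/ansysdis211 \
          -b -dis -mpi intelmpi \
          -machines "$MACHINES" \
          -i ds.dat -o solve.out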

    • mrife
      Ansys Employee
      Hi @KartiSinghFreeman, the MAPDL Options shown are for the MAPDL Component System in Workbench, not for WB Mechanical. What does the solve process look like for the PBS queue in WB Mechanical?
      Mike
    • KartiSinghFreeman
      Subscriber
      Thanks, that makes sense now.
      I managed to get Workbench working with the script below, but now my issue is that I cannot solve with more than two processors. If I go into the GUI and load each component locally one at a time, I can solve with more processors, but when I submit to the HPC it defaults to two. I have no idea how to increase the processor count.
      #!/bin/bash -l
      #PBS -N Test
      #PBS -l select=10:ncpus=1:mpiprocs=1:mem=10gb
      #PBS -l walltime=00:30:00
      #PBS -p 1023
      cd $PBS_O_WORKDIR
      module load ansys/20.1
      export I_MPI_HYDRA_BOOTSTRAP=ssh; export KMP_AFFINITY=balanced
      runwb2 -B -E "Update();Save(Overwrite=True)" -F Test.wbpj


    • mrife
      Ansys Employee
      Is there a cluster admin you can ask? I think ncpus and mpiprocs being 1 is the answer. Normally we connect to PBS via RSM, and RSM knows how to submit the correct PBS command to launch the job. I think it first asks PBS for the compute node list and the number of CPU cores to use on each, given the total number of cores you requested to solve on. I think these are stored as variables and used in the select/ncpus/mpiprocs line (see the sketch after this reply).
      Mike
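
      To illustrate that idea, here is a hypothetical wrapper in which a single core count drives the select/ncpus/mpiprocs values before qsub is called, roughly the way RSM derives them. The numbers, the cores-per-node split, and the run_wb_update.sh name are placeholders; note also that reserving cores in PBS does not by itself raise the core count Mechanical asks for, which is still set on the Mechanical/RSM side.

      #!/bin/bash -l
      # Hypothetical submit wrapper: one core count drives the PBS resource request.
      TOTAL_CORES=10          # cores you want the Mechanical solve to use
      CORES_PER_NODE=5        # how many cores to pack per node (site-specific choice)
      CHUNKS=$(( (TOTAL_CORES + CORES_PER_NODE - 1) / CORES_PER_NODE ))

      # Build the select statement from variables instead of hard-coding 1s.
      qsub -N Test \
           -l select=${CHUNKS}:ncpus=${CORES_PER_NODE}:mpiprocs=${CORES_PER_NODE}:mem=10gb \
           -l walltime=00:30:00 \
           run_wb_update.sh     # script containing the runwb2 -B -E "Update();Save(Overwrite=True)" call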
    • KartiSinghFreeman
      Subscriber
      It's definitely something related to the cluster, as these are the two errors I get before it defaults to 2 cores:
      --------------------- Error 1
      #!/bin/sh
      echo job started on $(hostname)
      # check shared cluster directory
      ClusterSharedDirectory="/home/Test/_ProjectScratch/Scr7200/"
      [ -d "$ClusterSharedDirectory" ] || { echo "Shared cluster directory does not exist on execution host, make sure it is mounted, shared out and can be accessed from all nodes."; echo 1008 > "/home/Test/_ProjectScratch/Scr7200/exitcode_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout"; exit 1008; }
      # check AWP_ROOT
      echo AWP_ROOT201=$AWP_ROOT201
      [ -z "$AWP_ROOT201" ] && echo "AWP_ROOT201 is not set on execution host" && echo 1000 > "/home/Test/_ProjectScratch/Scr7200/exitcode_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout" && exit 1000
      [ -d "$AWP_ROOT201" ] || { echo "AWP_ROOT201 directory does not exist on execution host"; echo 1009 > "/home/Test/_ProjectScratch/Scr7200/exitcode_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout"; exit 1009; }
      # check command can be found
      command="$AWP_ROOT201/commonfiles/CPython/3_7/linx64/Release/python/runpython"
      [ -f "$command" ] || { echo "$command not found"; echo 1007 > "/home/Test/_ProjectScratch/Scr7200/exitcode_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout"; exit 1007; }
      # running the cluster command
      echo command: "$AWP_ROOT201/commonfiles/CPython/3_7/linx64/Release/python/runpython" -B -E "$AWP_ROOT201/RSM/Config/scripts/ClusterJobs.py" "/home/Test/_ProjectScratch/Scr7200/control_6aa0b86a-5272-463b-8291-ce057e10320e.rsm"
      "$AWP_ROOT201/commonfiles/CPython/3_7/linx64/Release/python/runpython" -B -E "$AWP_ROOT201/RSM/Config/scripts/ClusterJobs.py" "/home/Test/_ProjectScratch/Scr7200/control_6aa0b86a-5272-463b-8291-ce057e10320e.rsm"
      --------------------- Error 2
      ARC
      cl4n007
      $AWP_ROOT201/SEC/SolverExecutionController/runsec.sh

      null
      done
      NOSCRATCH
      NOUNC
      /home/Test/_ProjectScratch/Scr7200/
      true
      SSH
      NOLIVELOGFILE
      stdout_6aa0b86a-5272-463b-8291-ce057e10320e.live
      stderr_6aa0b86a-5272-463b-8291-ce057e10320e.live
      *.dat
      file*.*
      *.mac
      thermal.build
      commands.xml
      SecInput.txt
      done
      file.abt
      sec.interrupt
      done
      *.xml
      *.NR*
      *.swf
      CAERepOutput.xml
      Load_*.inp
      Mode_mapping_*.txt
      NotSupportedElems.dat
      ObjectiveHistory.out
      PostImage*.png
      cyclic_map.json
      exit.topo
      file*.dsub
      file*.ldhi
      file*.mntr
      file*.png
      file*.r0*
      file*.r1*
      file*.r2*
      file*.r3*
      file*.r4*
      file*.r5*
      file*.r6*
      file*.r7*
      file*.r8*
      file*.r9*
      file*.rd*
      file*.rst
      file.BCS
      file.DSP
      file.PCS
      file.ce
      file.cm
      file.cnd
      file.cnm
      file.err
      file.gst
      file.json
      file.nd*
      file.nlh
      file.nr*
      file.rdb
      file.rfl
      file.spm
      file0.BCS
      file0.PCS
      file0.ce
      file0.cnd
      file0.err
      file0.gst
      file0.nd*
      file0.nlh
      file0.nr*
      frequencies_*.out
      input.x17
      intermediate*.topo
      morphed*.stl
      post.out
      record.txt
      solve*.out
      topo.err
      topo.out
      vars.topo
      SecDebugLog.txt
      secStart.log
      sec.validation.executed
      sec.envvarvalidation.executed
      sec.failure
      *.xml
      *.NR*
      *.swf
      CAERepOutput.xml
      Load_*.inp
      Mode_mapping_*.txt
      NotSupportedElems.dat
      ObjectiveHistory.out
      PostImage*.png
      cyclic_map.json
      exit.topo
      file*.dsub
      file*.ldhi
      file*.mntr
      file*.png
      file*.r0*
      file*.r1*
      file*.r2*
      file*.r3*
      file*.r4*
      file*.r5*
      file*.r6*
      file*.r7*
      file*.r8*
      file*.r9*
      file*.rd*
      file*.rst
      file.BCS
      file.DSP
      file.PCS
      file.ce
      file.cm
      file.cnd
      file.cnm
      file.err
      file.gst
      file.json
      file.nd*
      file.nlh
      file.nr*
      file.rdb
      file.rfl
      file.spm
      file0.BCS
      file0.PCS
      file0.ce
      file0.cnd
      file0.err
      file0.gst
      file0.nd*
      file0.nlh
      file0.nr*
      frequencies_*.out
      input.x17
      intermediate*.topo
      morphed*.stl
      post.out
      record.txt
      solve*.out
      topo.err
      topo.out
      vars.topo
      SecDebugLog.txt
      sec.solverexitcode
      secStart.log
      sec.failure
      sec.envvarvalidation.executed
      done
      stdout_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      stderr_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      control_6aa0b86a-5272-463b-8291-ce057e10320e.rsm
      hosts.dat
      exitcode_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      exitcodeCommands_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      stdout_6aa0b86a-5272-463b-8291-ce057e10320e.live
      stderr_6aa0b86a-5272-463b-8291-ce057e10320e.live
      ClusterJobCustomization.xml
      ClusterJobs.py
      clusterjob_6aa0b86a-5272-463b-8291-ce057e10320e.sh
      clusterjob_6aa0b86a-5272-463b-8291-ce057e10320e.bat
      inquire.request
      inquire.confirm
      request.upload.rsm
      request.download.rsm
      wait.download.rsm
      scratch.job.rsm
      volatile.job.rsm
      restart.xml
      cancel_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      liveLogLastPositions_6aa0b86a-5272-463b-8291-ce057e10320e.rsm
      stdout_6aa0b86a-5272-463b-8291-ce057e10320e_kill.rsmout
      stderr_6aa0b86a-5272-463b-8291-ce057e10320e_kill.rsmout
      sec.interrupt
      stdout_6aa0b86a-5272-463b-8291-ce057e10320e_*.rsmout
      stderr_6aa0b86a-5272-463b-8291-ce057e10320e_*.rsmout
      stdout_task_*.live
      stderr_task_*.live
      control_task_*.rsm
      stdout_task_*.rsmout
      stderr_task_*.rsmout
      exitcode_task_*.rsmout
      exitcodeCommands_task_*.rsmout
      file.abt
      done
      RSM_IRON_PYTHON_HOME = /pkg/suse12/software/ANSYS/20.1/v201/aisol/../commonfiles/IronPython
      RSM_TASK_WORKING_DIRECTORY = /home/Test/_ProjectScratch/Scr7200
      RSM_USE_SSH_LINUX = True
      RSM_QUEUE_NAME = local
      RSM_CONFIGUREDQUEUE_NAME = Local
      RSM_COMPUTE_SERVER_MACHINE_NAME = cl4n007
      RSM_HPC_JOBNAME = Mechanical
      RSM_HPC_DISPLAYNAME = Wishbone_Test-DP0-Model (C2)-Static Structural (C3)-Solution (C4)
      RSM_HPC_CORES = 2
      RSM_HPC_DISTRIBUTED = TRUE
      RSM_HPC_NODE_EXCLUSIVE = FALSE
      RSM_HPC_QUEUE = local
      RSM_HPC_USER = nxxxxxxxx
      RSM_HPC_WORKDIR = /home/Test/_ProjectScratch/Scr7200
      RSM_HPC_JOBTYPE = Mechanical_ANSYS
      RSM_HPC_ANSYS_LOCAL_INSTALL_DIRECTORY = /pkg/suse12/software/ANSYS/20.1/v201/aisol/..
      RSM_HPC_VERSION = 201
      RSM_HPC_STAGING = /home/Test/_ProjectScratch/Scr7200/
      RSM_HPC_LOCAL_PLATFORM = Linux
      RSM_HPC_CLUSTER_TARGET_PLATFORM = Linux
      RSM_HPC_STDOUTFILE = stdout_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      RSM_HPC_STDERRFILE = stderr_6aa0b86a-5272-463b-8291-ce057e10320e.rsmout
      RSM_HPC_STDOUTLIVE = stdout_6aa0b86a-5272-463b-8291-ce057e10320e.live
      RSM_HPC_STDERRLIVE = stderr_6aa0b86a-5272-463b-8291-ce057e10320e.live
      RSM_HPC_SCRIPTS_DIRECTORY_LOCAL = /pkg/suse12/software/ANSYS/20.1/v201/aisol/../RSM/Config/scripts
      RSM_HPC_SCRIPTS_DIRECTORY = $AWP_ROOT201/RSM/Config/scripts
      RSM_HPC_SUBMITHOST = localhost
      RSM_HPC_STORAGEID = e0ab39a5-9380-430c-9b38-4303e97fa96dMechanical=LocalNoCopy#localhost$/home/Test/_ProjectScratch/Scr7200/Tuesday, November 23, 2021 02:01:40.634 PMTrue
      RSM_HPC_PLATFORMSTORAGEID = /home/Test/_ProjectScratch/Scr7200/
      RSM_HPC_NATIVEOPTIONS =
      ARC_ROOT = /pkg/suse12/software/ANSYS/20.1/v201/aisol/../RSM/Config/scripts/../../ARC
      RSM_HPC_KEYWORD = ARC
      RSM_PYTHON_LOCALE = en-us
      done
      AWP_ROOT201
      2.0
      6aa0b86a-5272-463b-8291-ce057e10320e
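
      One detail worth noting in the dump above is RSM_HPC_CORES = 2, which matches the two cores the job keeps falling back to, suggesting the limit is being set on the Mechanical/RSM side before PBS is even involved. A hypothetical quick check, assuming the values above come from the control_*.rsm file RSM writes into the job's scratch directory (in that file the value sits on the line after each variable name):

      # Illustrative only: print the core count the client sent with the job.
      grep -A1 "RSM_HPC_CORES" /home/Test/_ProjectScratch/Scr7200/control_*.rsm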

    • KartiSinghFreeman
      Subscriber
      I've spent hours on this and I cannot get it to run on more than two distributed cores on the HPC. On my local computer I can run more cores using the university licenses. This **** so much.
      I don't seem to have this issue with Fluent, but with Mechanical it will not work.
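
      For completeness, a rough way to confirm how many cores a solve actually used is to look at the Distributed ANSYS banner near the top of solve.out; the exact wording varies between releases, so the pattern below is only illustrative.

      # Illustrative only: show the distributed/core-count lines from the solver output.
      grep -i -E "distributed|cores|processes" /home/Test/_ProjectScratch/Scr7200/solve*.out | head -n 20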