Electronics

Topics related to HFSS, Maxwell, SIwave, Icepak, Electronics Enterprise, and more

SLURM integration

    • JOHN LAGRONE
      Subscriber

      We have not been able to get SLURM integration working with Electronics Desktop components (e.g. HFSS) on multi-node jobs (preferred), or with RSM (which previously worked for us), in 24.1.

      We have modified the ansoftrsmservice.cfg to contain

              $begin 'Scheduler'
                    'SchedulerName'='generic'
                    'ConfigString'='{"Proxy":"slurm"}'
              $end 'Scheduler'

      and set multiple relevant environment variables, e.g. 

      ANSYS_HOME=/hpc/m3/apps/ansys/24R1/AnsysEM/v241/Linux64
      export PATH=${ANSYS_HOME}:$PATH
      export ANSOFT_DEBUG_LOG=$HOME/ansys_debug.log
      export ANSYSEM_ROOT241=${ANSYS_HOME}
      export ANSYSEM_COMMON_PREFIX=${ANSYS_HOME}/common
      export ANSYSEM_TASKS_PER_NODE=$( echo ${SLURM_TASKS_PER_NODE} | cut -f 1 -d \()
      export ANSYS_EM_EXEC_DIR=${ANSYS_HOME}
      export ANSYSEM_GENERIC_MPI_WRAPPER=${ANSYS_HOME}/schedulers/scripts/utils/slurm_srun_wrapper.sh
      export ANS_NODEPCHECK=1
      export ANSYSEM_GENERIC_EXEC_PATH=${ANSYS_HOME}/common/mono/Linux64/bin:${ANSYS_HOME}/common/IronPython
      export ANSYS_EM_GENERIC_COMMON_TEMP=/tmp
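
      For context, a hedged sketch of the sbatch script these exports sit in (the job name, resource counts, time limit, and helper file name are placeholders from our setup, not values required by Ansys):

      #!/bin/bash
      #SBATCH --job-name=hfss_multinode
      #SBATCH --nodes=2
      #SBATCH --ntasks-per-node=4
      #SBATCH --time=02:00:00

      # Pull in the exports listed above, collected in one file for convenience.
      source ./ansysem_env.sh
      # The ansysedt invocation itself is sketched after the batch options below.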

      Our batch options look like:

      $begin 'Config'
      'HFSS/NumCoresPerDistributedTask'=4
      'HFSS 3D Layout Design/NumCoresPerDistributedTask'=4
      'HFSS/HPCLicenseType'='Pool'
      'HPCLicenseType'='Pool'
      'HFSS/SolveAdaptiveOnly'=0
      'HFSS/MPIVendor'='Intel'
      'HFSS 3D Layout Design/MPIVendor'='Intel'
      'Maxwell 2D/MPIVendor'='Intel'
      'Maxwell 3D/MPIVendor'='Intel'
      'Q3D Extractor/MPIVendor'='Intel'
      'Icepak/MPIVendor'='Intel'
      'HFSS/RemoteSpawnCommand'='scheduler'
      'HFSS 3D Layout Design/RemoteSpawnCommand'='scheduler'
      'Maxwell 3D/RemoteSpawnCommand'='scheduler'
      'Maxwell 2D/RemoteSpawnCommand'='scheduler'
      'Q3D Extractor/RemoteSpawnCommand'='scheduler'
      'Icepak/RemoteSpawnCommand'='scheduler'
      'Desktop/Settings/ProjectOptions/ProductImprovementOptStatus'='0'
      'Desktop/Settings/ProjectOptions/AnsysEMPreferredSubnetAddress'='10.215.24.0/21'
      $end 'Config'
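
      For reference, roughly how the solver gets invoked against that options file from the sbatch script sketched above (a hedged sketch; the file and project names are placeholders, and the exact flag set may differ per setup):

      # Non-graphical, distributed batch solve using the batch options above.
      ${ANSYS_HOME}/ansysedt -ng -batchsolve -distributed -monitor \
          -batchoptions ./batch_options.cfg ./project.aedt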

      It appears that slurm_srun_wrapper.sh is not setting the host(s) correctly. For example, from a generated log:

      (02:14:48 PM May 24, 2024) Command = /hpc/m3/apps/ansys/24R1/AnsysEM/v241/Linux64/schedulers/scripts/utils/slurm_srun_wrapper.sh
      (02:14:48 PM May 24, 2024) All args = --nodelist c001.infiniband.cluster,c002.infiniband.cluster -N 2 -n 2 --input none --external-launcher /hpc/m3/apps/ansys/24R1/AnsysEM/v241/Linux64/common/fluent_mpi/multiport/mpi/lnamd64/intel/bin/pmi_proxy --control-port c001.cm.cluster:36271 --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher slurm --launcher-exec /hpc/m3/apps/ansys/24R1/AnsysEM/v241/Linux64/schedulers/scripts/utils/slurm_srun_wrapper.sh --launcher-exec-args --external-launcher --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1142573241 --usize -2 --proxy-id -1

      The wrapper sets host=$3, which in this case is "-N". We assume something is not being passed correctly; the end result is a malformed srun command that kills our SLURM daemons. Changing it to host=$2 in this case seems to help, as it actually gets a hostname, but the job still does not appear to communicate between nodes.
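
      As an illustration of the positional assumption that breaks (this is not the shipped wrapper logic), the kind of scan we would expect instead of a fixed host=$3: pull the host from the token that follows --nodelist.

      # Sketch only: locate --nodelist in the wrapper's argument list and take
      # the value that follows it, rather than assuming it is always $3.
      host=""
      args=("$@")
      for i in "${!args[@]}"; do
          if [ "${args[$i]}" = "--nodelist" ]; then
              host="${args[$((i + 1))]}"
              break
          fi
      done
      echo "host resolved to: ${host}"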

      We have also been unable to get a multi-node job to work with RSM as we could previously (we did something very similar to https://www.chpc.utah.edu/documentation/software/scripts/run_ansysedt.slr). Note that we also see "errors" in the RSM logs saying we have multiple network interfaces (which is true) and telling us to see the documentation on how to set the appropriate network interface. We have been unable to find that documentation for the RSM that is bundled with Electronics Desktop; we can only find the documentation for the full RSM, which says to set the interface with rsmutil, but that utility does not exist in the Electronics Desktop installation (e.g. https://ansyshelp.ansys.com/account/secured?returnurl=/Views/Secured/corp/v241/en/wb_rsm/wb_rsm_mult_nic.html).
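
      For what it is worth, a quick check with ordinary Linux tooling (nothing Ansys-specific) to confirm which interface carries the preferred subnet set in the batch options above:

      # List interface names with their IPv4 addresses and look for the
      # 10.215.24.0/21 range configured via AnsysEMPreferredSubnetAddress.
      ip -o -4 addr show | awk '{print $2, $4}'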

    • randyk
      Ansys Employee

      Discussing directly

    • randyk
      Ansys Employee

      Case resolution:
      Issue related to SLURM 23.11.
      SLURM 23.11 changed the behaviour of mpirun when using I_MPI_HYDRA_BOOTSTRAP=slurm (the default): it injects two environment variables and passes the --external-launcher option to the launcher command.
      Patches were provided to resolve the issue.
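
      For anyone hitting this before applying the patch, a hedged way to observe the behaviour described above from inside a job allocation (the specific variables SLURM injects are documented in the knowledgebase article, not listed here):

      # Inside an sbatch/salloc job: show the Intel MPI Hydra bootstrap setting
      # and the SLURM-provided variables present in the job environment.
      env | grep -E '^(I_MPI_HYDRA|SLURM_)' | sort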

    • Benjamin Choi
      Subscriber

      Hello, could you provide the exact solution for this? That would be greatly appreciated.

    • randyk
      Ansys Employee

      Hi Benjamin,

      The solution is Ansys knowledgebase article 000064207.
      It is available on the Ansys Partner Community; if you do not have access, your Ansys ASC will.

      That solution provides two patches that can be applied to AnsysEM 2022R2 through 2024R2.
      AnsysEM 2025R1 will work natively with SLURM 23.11.

      thanks
      Randy
