Electronics

Topics related to HFSS, Maxwell, SIwave, Icepak, Electronics Enterprise and more.

ansysedt in batch using MPI

    • jsarlo
      Subscriber

      I am working with AnsysEM 2022r2. I am a sysadmin working with a researcher. We are trying to run a batch job with Slurm and want to use multiple compute nodes in the cluster. We have the .aedt input file and are using the following as the execution line of the job script:

      ansysedt -ng -batchsolve -dis -mpi -machinelist list=$hl num=$SLURM_NTASKS ${InputFile}

      When I watch the compute nodes that get assigned, I only see the first one being used; nothing ever starts on the second compute node. The $hl list gets built to something like list=compute-4-53-ib0:48:48:98%,compute-7-19-ib0:48:48:98%. I have also tried building the list with individual 1:1 entries repeated 48 times per compute node (compute-4-53-ib0:1:1:98%,compute-4-53-ib0:1:1:98%...compute-7-19-ib0:1:1:98%, ...).
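
      For completeness, a rough sketch of how a list like $hl could be assembled from the Slurm allocation (the -ib0 suffix and the 48:48:98% fields are copied from the example above; the real script may differ):

      hl=""
      for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
          hl="${hl:+${hl},}${host}-ib0:48:48:98%"   # hostname:tasks:cores:RAM%, as in the example above
      done
      echo "list=$hl"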

      Is there something else that needs to be on the command line to use both compute nodes or is there something else that needs to be done?

      Jeff

       

    • randyk
      Forum Moderator

      Hi Jeff, 

      Please consider creating the following script (job.sh in this example) and submitting it with:
      dos2unix   ./job.sh
      chmod +x ./job.sh
      sbatch ./job.sh

      Modify the #SBATCH directives, the JobName/AnalysisSetup variables, and the numcores value as needed.
      Note: the numcores value on the final ansysedt line must match the allocated resource core count.

      job.sh
      #!/bin/bash
      #SBATCH -N 2               # allocate 2 nodes
      #SBATCH -n 32             # 32 tasks total
      #SBATCH -J AnsysEMTest     # sensible name for the job
      #SBATCH -p default           # partition name
      ##SBATCH --mem 0            #allocates all the memory on the node to the job
      ##SBATCH --time 0
      ##SBATCH --mail-user="user@company.com"
      ##SBATCH --mail-type=ALL
       
      # Project Name and setup
      JobName=OptimTee.aedt
      AnalysisSetup=""
       
      # Project location
      JobFolder=$(pwd)
       
      #### Do not modify any items below this line unless requested ####
      InstFolder=/opt/AnsysEM/v222/Linux64
       
      #SLURM
      export ANSYSEM_GENERIC_MPI_WRAPPER=${InstFolder}/schedulers/scripts/utils/slurm_srun_wrapper.sh
      export ANSYSEM_COMMON_PREFIX=${InstFolder}/common
      srun_cmd="srun --overcommit --export=ALL -n 1 -N 1 --cpu-bind=none --mem-per-cpu=0 --overlap "
      # note: the srun '--overlap' option was introduced in Slurm 20.11. If running an older Slurm version, remove the '--overlap' argument.
      export ANSYSEM_TASKS_PER_NODE="${SLURM_TASKS_PER_NODE}"
       
      # Setup Batchoptions
      echo "\$begin 'Config'" > ${JobFolder}/${JobName}.options
      echo "'Desktop/Settings/ProjectOptions/HPCLicenseType'='Pack'" >> ${JobFolder}/${JobName}.options
      echo "'HFSS/RAMLimitPercent'=90" >> ${JobFolder}/${JobName}.options
      echo "'HFSS 3D Layout Design/RAMLimitPercent'=90" >> ${JobFolder}/${JobName}.options
      echo "'HFSS/RemoteSpawnCommand'='scheduler'" >> ${JobFolder}/${JobName}.options
      echo "'HFSS 3D Layout Design/RemoteSpawnCommand'='scheduler'" >> ${JobFolder}/${JobName}.options
      # If multiple networks on execution host, specify network CIDR 
      # echo "'Desktop/Settings/ProjectOptions/AnsysEMPreferredSubnetAddress'='192.168.1.0/24'" >> ${JobFolder}/${JobName}.options
      echo "\$end 'Config'" >> ${JobFolder}/${JobName}.options
       
      # Submit the AEDT job (SLURM requires 'srun' and the tight-integration change to slurm_srun_wrapper.sh)
      ${srun_cmd} ${InstFolder}/ansysedt -ng -monitor -waitforlicense -useelectronicsppe=1 -distributed -machinelist numcores=32 -auto -batchoptions ${JobFolder}/${JobName}.options -batchsolve ${AnalysisSetup} ${JobFolder}/${JobName} > ${JobFolder}/${JobName}.progress
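
      For reference, the batchoptions section above generates an OptimTee.aedt.options file with these contents:

      $begin 'Config'
      'Desktop/Settings/ProjectOptions/HPCLicenseType'='Pack'
      'HFSS/RAMLimitPercent'=90
      'HFSS 3D Layout Design/RAMLimitPercent'=90
      'HFSS/RemoteSpawnCommand'='scheduler'
      'HFSS 3D Layout Design/RemoteSpawnCommand'='scheduler'
      $end 'Config'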



    • jsarlo
      Subscriber

      Thanks for the suggestion. I did try this, with a change to the licensing, and I used

      #SBATCH -N 2
      #SBATCH --ntasks-per-node 4
       
      just to make sure that it should try to use the second compute node. I still got the same results, with no processes showing up on the second node. I have tried this with both 2022r2 and 2023r2. I'm not sure if there is something missing in the input file or if something in the install isn't configured properly. We are now getting to the point where users have jobs too large to fit on just one node and really need to be able to use multiple nodes.
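
      As a generic sanity check (nothing AnsysEM-specific), something like this could be used to confirm whether anything ever launches on the second node; the hostname check goes inside the job script, the ssh loop can be run from a login node while the solve is running, and the node names are just the ones from my first post:

      # Confirm the allocation actually spans both nodes
      srun -N 2 --ntasks-per-node=1 hostname
      # Look for ansysedt/solver processes on each allocated node
      for n in compute-4-53 compute-7-19; do
          echo "== $n =="
          ssh "$n" "pgrep -af ansysedt || echo none"
      done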
       
      Is there something else I need to be checking?
       
      Jeff
    • randyk
      Forum Moderator

      Hi Jeff,

      Can you please paste the contents of OptimTee.aedt.batchinfo/OptimTee-xxx.log?
      -- If you need to, do a replace-all on the hostnames and project path listed in that file.

      thanks
      Randy

    • jsarlo
      Subscriber

      This is what that file has:

      Ansys Electronics Desktop Version 2023.2.0, Build: 2023-05-16 22:11:08
      Location: /share/apps/AnsysEM-2023r2/v232/Linux64/ansysedt.exe
      Batch Solve/Save: /project/hpcc/jeff/ansys/maxwell/2node_pipe_2023/Manual_half_pipe.aedt
      Starting Batch Run: 1:34:05 PM  Oct 05, 2023
      Temp directory: /tmp
      Project directory: /home/jsarlo/Ansoft
      [info] Running SLURM job with ID 1878304.  Command line: "/share/apps/AnsysEM-2023r2/v232/Linux64/ansysedt.exe -ng -monitor -waitforlicense -useelectronicsppe=1 -distributed -machinelist numcores=8 -auto -batchoptions /project/hpcc/jeff/ansys/maxwell/2node_pipe_2023/Manual_half_pipe.aedt.options -batchsolve /project/hpcc/jeff/ansys/maxwell/2node_pipe_2023/Manual_half_pipe.aedt".
      Simulation settings:
      [info] Simulation settings:

      Design type: Maxwell 3D
      [info]
      Design type: Maxwell 3D
      Allow off core: False
      [info] Allow off core: False
      Using automatic settings
      [info] Using automatic settings
      Optimetrics variations will be solved sequentially.

      [info] Optimetrics variations will be solved sequentially.

      Machines:
      [info] Machines:
      compute-2-20 [773417 MB]: RAM: 90%, 4 cores, 0 GPUs
      compute-2-24 [773417 MB]: RAM: 90%, 4 cores, 0 GPUs

      [info] compute-2-20 [773417 MB]: RAM: 90%, 4 cores, 0 GPUs
      compute-2-24 [773417 MB]: RAM: 90%, 4 cores, 0 GPUs

      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Solution Process : Start Time: 10/05/2023 13:34:19, Host: compute-2-20.local, Processor: 48, OS: Linux 4.18.0-477.15.1.el8_8.x86_64, Product: Maxwell 3D 2023.2.0 (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE]        Executing From: /share/apps/AnsysEM-2023r2/v232/Linux64/MAXWELLCOMENGINE.exe (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] HPC : Type: Auto, MPI Vendor: Intel, MPI Version: 2018 (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Machine 1 : Name: compute-2-20.local, RAM Limit: 90.000000%, Cores: 4 (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Machine 2 : Name: compute-2-24.local, RAM Limit: 90.000000%, Cores: 4 (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Stop :   (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Design Validation : Level: Perform full validations, Elapsed Time: 00:00:00, Memory: 75.6 M (01:34:19 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Adaptive Meshing : Time: 10/05/2023 13:34:20 (01:34:20 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Pass 2 (01:34:20 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Adaptive Refine : Real Time 00:08:52 : CPU Time 00:09:01 : Memory 3.39 G : Tetrahedra: 4213922, Cores: 1 (01:43:12 PM  Oct 05, 2023)
      [error] Project:Manual_half_pipe, Design:Maxwell3DDesign1 (EddyCurrent), Unable to create child process: 3dedy. Please contact Ansys technical support. -- Simulating on machine: compute-2-20 (01:45:15 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Stop :   (01:45:15 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Stop : Elapsed Time: 00:10:55 (01:45:15 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE]        Unable to create child process: 3dedy. Please contact Ansys technical support. (01:45:15 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE]        Stop Time: 10/05/2023 13:45:15, Status: Engine Detected Error (01:45:15 PM  Oct 05, 2023)
      [info] Project:Manual_half_pipe, Setup1 : [PROFILE] Stop : Elapsed Time: 00:10:56, ComEngine Memory: 90.6 M (01:45:15 PM  Oct 05, 2023)
      [error] Project:Manual_half_pipe, Design:Maxwell3DDesign1 (EddyCurrent), Simulation completed with execution error on server: compute-2-20. (01:45:16 PM  Oct 05, 2023)
      Stopping Batch Run: 1:45:23 PM  Oct 05, 2023

    • jsarlo
      Subscriber

      This was from the 2023r2 test.
