
Speed up the job in HPC (MPP)

    • rajesh.pamarthi2711
      Subscriber
      Hi,
       
      I am trying to run a simulation of my .k file on an HPC cluster. The model has a total of 240,952 elements, and the following software is available on my HPC:
       
      ansyscl
      easybuild
      libmppdyna_d__avx2_ifort190_intelmpi.so
      libmppdyna_s__avx2_ifort190_intelmpi.so
      ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018.l2a
      ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib
      ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib.tgz
      ls-dyna_mpp_s_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018.l2a
      ls-dyna_mpp_s_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib
      ls-dyna_mpp_s_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib.tgz
      ls-dyna_smp_d_R14_1_0_x64_centos79_ifort190_sse2
      ls-dyna_smp_d_R14_1_0_x64_centos79_ifort190_sse2.l2a
      ls-dyna_smp_d_R14_1_0_x64_centos79_ifort190_sse2.tgz
      ls-dyna_smp_s_R14_1_0_x64_centos79_ifort190_sse2
      ls-dyna_smp_s_R14_1_0_x64_centos79_ifort190_sse2.l2a
      mpp-dyna
      mpp-dyna-d
      mpp-dyna-s
      smp-dyna
      smp-dyna-d
      smp-dyna-s
       
      I am submitting an sbatch job from the command line in the following manner:
       
      #!/bin/bash
      #SBATCH --time=01:00:00       # walltime
      #SBATCH --nodes=1             # use 1 node
      #SBATCH --ntasks=32          # number of processor cores (i.e. tasks)
      #SBATCH --mem-per-cpu=4000M   # memory per CPU core
      #SBATCH --output=output.log
      #SBATCH --error=error.log
       
      module load GCC/13.2.0 OpenMPI/4.1.6 intel-compilers/2023.2.1 impi/2021.10.0 Python/3.11.5 SciPy-bundle/2023.11 matplotlib/3.8.2 LS-DYNA/14.1.0
       
      srun mpp-dyna i=/home/pro/main.k memory=120000000
       
       
      But with the above settings (ntasks=32), the status.out file states the run will take 285 hours to complete. That is too long. Can someone help me speed up the completion?

      Thank you
       
    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      How many cores did you actually use for your LS-DYNA run? Have a look at the d3hsp or the mes0000 file. If you ran on 32 cores, you should have mes0000 - mes0031 in the working directory. Is that the case?

      Your command line to run LS-DYNA is:

      srun mpp-dyna i=/home/pro/main.k memory=120000000

       

      The first part should point to the mpirun executable of your MPI (Intel, Platform, Open, etc.), followed by the "-np" option to specify the number of processors to use in your LS-DYNA run. I don't see it in your case. Here is an example of a command line to run MPP LS-DYNA with Platform MPI on Linux:

      /data2/rgenest/bin/ibm/platform_mpi/bin/mpirun -np 4 /data2/rgenest/lsdyna/ls-dyna_mpp_d_R13_0_0_x64_centos610_ifort190_sse2_platformmpi/ls-dyna_mpp_d_R13_0_0_x64_centos610_ifort190_sse2_platformmpi i=/data2/rgenest/runs/Test/input.k memory=20m
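
      Since your cluster provides Intel MPI builds of the solver (the *_intelmpi-2018 executables), the same pattern applies with Intel MPI. Here is a minimal sketch; the mpirun path and solver path are placeholders you would adapt to your installation:

      /path/to/impi/bin/mpirun -np 32 /path/to/ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018 i=/home/pro/main.k memory=120000000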

       

      Have you tried with a command line like above? You will find more LS-DYNA command line examples in the following forum post:

      Run LS-DYNA in HPC

       

      Let me know how it goes.

       

      Reno.

    • Reno Genest
      Ansys Employee

      Hello,

      After you make sure you run on 32 cores, you can try binding each process to a core and see if you can speed up the calculation further. I found the following in our knowledge database:

      "

      By default, an MPI process migrates between cores as the OS manages resources and attempts to
      get the best load balance on the system.
      But because LS-DYNA is a memory intensive application, such migration can significantly degrade performance
      since memory access can take longer if the process is moved to a core farther from the memory it is using.
      To avoid this performance degradation, it is important to bind each MPI process to a core.  
      
      Each MPI has its own way of binding the processes to cores, and furthermore, 
      threaded MPP (HYBRID) employs a different strategy from pure MPP.
      
      
      I.  Pure MPP
      ============
      
      To bind processes to cores, include the following MPI execution line directives according to the type of MPI used.
      
      HP-MPI, Platform MPI, and IBM Platform MPI:
      
      	-cpu_bind or -cpu_bind=rank
              -cpu_bind=MAP_CPU:0,1,2,...   <<<< not recommended unless user really needs to bind MPI processes to specific cores
      
      IBM Platform MPI 9.1.4 and later:
      
      	-affcycle=numa
      
      Intel MPI:
      
      	-genv I_MPI_PIN_DOMAIN=core
      
      Open MPI:
      
      	--bind-to numa
      "


      Let me know how it goes.

       

      Reno.

    • rajesh.pamarthi2711
      Subscriber

      Hi Reno,

      When I executed the job using only

      srun mpp-dyna i=/home/pro/main.k memory=120000000

      I can see the mes0000 - mes0031 files in my working directory. I think the SLURM manager automatically allots the ntasks to LS-DYNA, and it works.

      And due to a compatibility issue between the modules, I have removed the OpenMPI module, so currently I am only using
      module load GCC/13.2.0 intel-compilers/2023.2.1 impi/2021.10.0 Python/3.11.5 SciPy-bundle/2023.12 matplotlib/3.8.2 LS-DYNA/14.1.0

      With the above modules, I use the following LS-DYNA execution command (please look at the mpirun path and the solver path):


      /software/rapids/r24.04/impi/2021.10.0-intel-compilers-2023.2.1/mpi/2021.10.0/bin/mpirun -genv I_MPI_PIN_DOMAIN=core -np 24 ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib i={new_file_path} memory=200M

      I have run some jobs with different options:

      1) np32, memory=20m, core binding disabled - 185 hrs (status.out file)

      2) np32, memory=20m, core binding enabled - 168 hrs (status.out file)

      3) np32, memory=200m, core binding enabled - 168 hrs (status.out file)

      4) np24, memory=200m, core binding enabled - 190 hrs (status.out file)

      I am using Intel MPI here, as you can see in the loaded modules, and I have given the path to the solver executable; can you take a look at the path? I have tried many different settings (changing nodes, cores, and memory in sbatch, and the memory option in the LS-DYNA command), but I am not able to optimize further. The simulation mostly consists of running the electromagnetic solver.


      Can you let me know where I am going wrong?

    • rajesh.pamarthi2711
      Subscriber

      Hi,

      I would also like to ask: is setting the memory= and memory2= options in the LS-DYNA command mandatory?

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      Is this a pure electromagnetic run? Or do you have structural components as well? What is driving your timestep? Also, what part of the model is taking the most time (clock %) to solve? Have a look at the end of the d3hsp file for the timing information. Here is an example:

       

      Reno.

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      The memory= and memory2= options are not mandatory, but not requesting enough memory can lead to issues if your model is large. Note that memory2= is only used in MPP. You will find more information about memory in the LS-DYNA User Manual Vol I, Appendix O: LS-DYNA Keyword Manual
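
      For a sense of scale (a sketch only; the solver name and values are illustrative): LS-DYNA memory is specified in words, and one word is 8 bytes for the double-precision solver, so memory=200m requests roughly 1.6 GB for the first processor:

      mpirun -np 32 mppdyna i=input.k memory=200m memory2=40m

      In MPP, memory= applies to the first processor, which performs the decomposition, while memory2= sets the memory for each of the remaining processors.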

       

      Let me know how it goes.

       

      Reno. 

    • rajesh.pamarthi2711
      Subscriber

       

      Hi,

      I have tried to run the simulation with 3 nodes and 147 tasks. The termination time is 6.000e-04 (*CONTROL_TERMINATION).

      #SBATCH --nodes=3             # use 3 nodes
      #SBATCH --ntasks=147          # number of processor cores (i.e. tasks)
      #SBATCH --mem-per-cpu=4000M   # memory per CPU core
      #SBATCH --error=error.log
       
      module load GCC/13.2.0 intel-compilers/2023.2.1 impi/2021.10.0 Python/3.11.5 SciPy-bundle/2023.12 matplotlib/3.8.2 LS-DYNA/14.1.0
       
      python ls.py

      with the following lines in the Python code:

      lsdyna_command = f'/software/rapids/r24.04/impi/2021.10.0-intel-compilers-2023.2.1/mpi/2021.10.0/bin/mpirun -genv I_MPI_PIN_DOMAIN=core -np 147 /software/rapids/r24.04/LS-DYNA/14.1.0-intel-2023b/ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib i={new_file_path} memory=128M'

          # Execute the LS-Dyna command
          print("Running LS-Dyna...")
          run_command(lsdyna_command)


      With the above, the simulation started executing but appears to have terminated before the total completion time. In my d3hsp, it gave out:

       709 t 3.5450E-05 dt 5.00E-08 electromagnetism step
       
      License routines forcing premature code termination.
      Contact with the license server has been lost.
      The server may have died or a network connectivity problem
      may have occurred.
       
           710 t 3.5500E-05 dt 1.00E+06 write d3dump01 file          12/24/24 21:37:58
           710 t 3.5500E-05 dt 1.00E+06 flush i/o buffers            12/24/24 21:37:58
           710 t 3.5500E-05 dt 1.00E+06 write d3plot file            12/24/24 21:37:59
       
       N o r m a l    t e r m i n a t i o n                          12/24/24 21:37:59
       
       S t o r a g e   a l l o c a t i o n   
       
       Memory required to complete solution (memory=   5235K memory2=   2508K)
                Minimum   2408K on processor     5
                Maximum   2508K on processor    36
                Average   2424K
       
       Matrix Assembly dynamically allocated memory
                Maximum     56M
       
       Additional dynamically allocated memory
                Minimum    162M on processor   121
                Maximum    217M on processor   146
                Average    170M
       
       Total allocated memory
                Minimum    220M on processor   121
                Maximum    275M on processor   146
                Average    228M
       
       T i m i n g   i n f o r m a t i o n
                              CPU(seconds)   %CPU  Clock(seconds) %Clock
 ----------------------------------------------------------------
 Keyword Processing ... 2.8257E+00    0.01     2.8348E+00    0.01
 MPP Decomposition .... 1.6675E+01    0.06     1.7915E+01    0.06
   Init Proc .......... 1.0730E+01    0.04     1.0759E+01    0.04
   Decomposition ...... 5.4681E-01    0.00     5.5070E-01    0.00
   Translation ........ 5.3983E+00    0.02     6.6042E+00    0.02
 Initialization ....... 1.3516E+01    0.05     1.3698E+01    0.05
   Init Proc Phase 1 .. 2.6186E+00    0.01     2.6754E+00    0.01
   Init Proc Phase 2 .. 1.4131E+00    0.01     1.4373E+00    0.01
 Element processing ... 9.0849E-01    0.00     6.1107E+01    0.22
   Solids ............. 5.8148E-01    0.00     3.8135E+01    0.13
   E Other ............ 8.5733E-02    0.00     6.8885E+00    0.02
 Binary databases ..... 2.9037E+01    0.10     4.4529E+01    0.16
 ASCII database ....... 1.1411E-01    0.00     7.0965E+00    0.03
 Contact algorithm .... 8.2748E+01    0.30     1.1298E+02    0.40
   Interf. ID         1 1.9341E+01    0.07     1.9670E+01    0.07
   Interf. ID         2 5.4628E+01    0.20     7.0596E+01    0.25
 Rigid Bodies ......... 4.2108E+01    0.15     5.3479E+01    0.19
 EM solver ............ 2.7242E+04   98.45     2.7758E+04   97.85
   Misc ............... 6.1301E+02    2.22     6.6360E+02    2.34
   System Solve ....... 2.5164E+04   90.94     2.5492E+04   89.86
   FEM matrices setup . 1.1896E+02    0.43     1.3114E+02    0.46
   BEM matrices setup . 3.5966E+02    1.30     3.8563E+02    1.36
   FEMSTER to DYNA .... 4.3925E+02    1.59     5.2618E+02    1.85
   Compute fields ..... 5.4711E+02    1.98     5.5925E+02    1.97
 Time step size ....... 1.4727E+02    0.53     1.5686E+02    0.55
 Group force file ..... 1.0339E-02    0.00     8.1230E-01    0.00
 Others ............... 1.1413E+01    0.04     1.6252E+01    0.06
   Force Sharing ...... 1.1304E+01    0.04     1.3013E+01    0.05
 Misc. 1 .............. 4.3898E+00    0.02     2.2644E+01    0.08
   Scale Masses ....... 7.7710E-03    0.00     5.5401E-01    0.00
   Force Constraints .. 1.3901E-02    0.00     6.8301E-01    0.00
   Force to Accel ..... 1.0230E-02    0.00     7.7030E-01    0.00
   Constraint Sharing . 3.3435E-02    0.00     1.2809E+00    0.00
   Update RB nodes .... 3.8281E-02    0.00     2.4141E+00    0.01
 Misc. 2 .............. 1.1875E-01    0.00     9.0182E+00    0.03
 Misc. 3 .............. 7.7356E+01    0.28     8.4258E+01    0.30
 Misc. 4 .............. 1.1225E-01    0.00     6.7043E+00    0.02
   Timestep Init ...... 1.6812E-02    0.00     4.6854E-01    0.00
   Apply Loads ........ 8.1626E-02    0.00     5.6807E+00    0.02
 ----------------------------------------------------------------
        T o t a l s            2.7670E+04  100.00     2.8368E+04  100.00
       
       Problem time       =    3.5500E-05
       Problem cycle      =       710
       Total CPU time     =     27670 seconds (   7 hours 41 minutes 10 seconds)
       CPU time per zone cycle  =     161549.932 nanoseconds
       Clock time per zone cycle=     165622.228 nanoseconds
       
       Parallel execution with    147 MPP proc
       NLQ used/max                64/    64


      I think the run did not reach the total termination time; it stopped prematurely.



       

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      Could you create a new post for this problem? You get the following message:

      "

      License routines forcing premature code termination.
      Contact with the license server has been lost.
      The server may have died or a network connectivity problem
      may have occurred."
       
      It looks like you lost communication with the license manager. This is a different problem. 
       
      In the new post, state the error message and also specify if you are using the LSTC License manager or the Ansys license manager to run LS-DYNA.
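
      As a quick check (an assumption about typical setups; confirm with your cluster administrators), the license-related environment variables usually reveal which manager is configured:

      # The LSTC license manager is typically configured through:
      echo $LSTC_LICENSE $LSTC_LICENSE_SERVER

      # The Ansys license manager is typically configured through:
      echo $ANSYSLMD_LICENSE_FILE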
       
      Reno.
    • rajesh.pamarthi2711
      Subscriber

      Hi Reno,

      Apart from the license issue, do you think there is any issue with the way the LS-DYNA command is specified in the Python script? Is everything else all right? It is taking many hours to run the model, which has 241,000 elements.

      Also, on the HPC, how do I know which type of license manager is running?

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      The command line seems fine; you ran for more than 7 hours on 147 MPP processors.

      The EM solver part takes most of the calculation clock % time. Your EM timestep before the license problem was 5e-8 second. This is quite small. Did you specify this timestep using *EM_CONTROL_TIMESTEP? Could you increase the EM timestep to reach your termination time faster? Note that if you have structural parts, the solid mechanics timestep should be smaller than or equal to the EM timestep.
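
      For a rough sense of scale (simple arithmetic from the numbers above): with end_time = 6.0E-4 s and a constant EM timestep of 5.0E-8 s, the run needs 6.0E-4 / 5.0E-8 = 12,000 EM steps. Your job completed 710 steps in about 7.7 hours, so at the same pace the full run would take roughly (12000 / 710) x 7.7 ≈ 130 hours.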

       

      Reno.

    • Reno Genest
      Ansys Employee

       

      Hello Rajesh,

      Also, note the following information from our training on the Ansys Learning Hub (ALH): the memory= command has no effect on the EM part of the model.

      EM: Eddy Current Applications (Materials ONLY) – EM: Eddy Current Applications

       

      Have you tried with the latest R15.0.2 MPP solver?

      LS-DYNA (user=user) Download Page

      Username: user

      Password: computer

       

      Reno.

       

       

    • rajesh.pamarthi2711
      Subscriber

      Hi Reno,

      Yes, it is specified through *EM_CONTROL_TIMESTEP. I cannot increase the EM timestep; I have to make it run with that value.

      Do you think there could be any other issue, for example with the environment variables or the license?

      I have parts such as an inductor, a workpiece, a field shaper, and others, 5 in total, all using *SECTION_SOLID with element form 2.

      *CONTROL_TERMINATION - end_time = 6.000e-04
      *CONTROL_TIMESTEP - TSSFAC = 0.667
      *EM_CONTROL_TIMESTEP - timestep = 5.000e-08

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      Please create a new post for the license question. We will help you in the new thread.

      Thanks,

       

      Reno.
