
Speed up the job in HPC (MPP)

    • rajesh.pamarthi2711
      Subscriber
      Hi,
       
      I am trying to run a simulation of my .k file on an HPC cluster. The model has a total of 240,952 elements, and the following software is available on my HPC:
       
      ansyscl
      easybuild
      libmppdyna_d__avx2_ifort190_intelmpi.so
      libmppdyna_s__avx2_ifort190_intelmpi.so
      ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018.l2a
      ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib
      ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib.tgz
      ls-dyna_mpp_s_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018.l2a
      ls-dyna_mpp_s_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib
      ls-dyna_mpp_s_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib.tgz
      ls-dyna_smp_d_R14_1_0_x64_centos79_ifort190_sse2
      ls-dyna_smp_d_R14_1_0_x64_centos79_ifort190_sse2.l2a
      ls-dyna_smp_d_R14_1_0_x64_centos79_ifort190_sse2.tgz
      ls-dyna_smp_s_R14_1_0_x64_centos79_ifort190_sse2
      ls-dyna_smp_s_R14_1_0_x64_centos79_ifort190_sse2.l2a
      mpp-dyna
      mpp-dyna-d
      mpp-dyna-s
      smp-dyna
      smp-dyna-d
      smp-dyna-s
       
      I am submitting an sbatch job from the command line in the following manner:
       
      #!/bin/bash
      #SBATCH --time=01:00:00       # walltime
      #SBATCH --nodes=1             # use 1 node
      #SBATCH --ntasks=32          # number of processor cores (i.e. tasks)
      #SBATCH --mem-per-cpu=4000M   # memory per CPU core
      #SBATCH --output=output.log
      #SBATCH --error=error.log
       
      module load GCC/13.2.0 OpenMPI/4.1.6 intel-compilers/2023.2.1 impi/2021.10.0 Python/3.11.5 SciPy-bundle/2023.11 matplotlib/3.8.2 LS-DYNA/14.1.0
       
      srun mpp-dyna i=/home/pro/main.k memory=120000000
       
       
      But with the above settings (ntasks=32), the status.out file states the run will take 285 hours to complete. That is too long. Can someone help me speed up the completion?

      Thank you
       
    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      How many cores did you actually use for your LS-DYNA run? Have a look at the d3hsp or the mes0000 file. If you ran on 32 cores, you should have mes0000 - mes0031 in the working directory. Is that the case?

      Your command line to run LS-DYNA is:

      srun mpp-dyna i=/home/pro/main.k memory=120000000

       

      The first part should point to the mpirun executable of your MPI (Intel, Platform, Open, etc.), followed by the "-np" option to specify the number of processors to use in your LS-DYNA run. I don't see it in your case. Here is an example of a command line to run MPP LS-DYNA with Platform MPI on Linux:

      /data2/rgenest/bin/ibm/platform_mpi/bin/mpirun -np 4 /data2/rgenest/lsdyna/ls-dyna_mpp_d_R13_0_0_x64_centos610_ifort190_sse2_platformmpi/ls-dyna_mpp_d_R13_0_0_x64_centos610_ifort190_sse2_platformmpi i=/data2/rgenest/runs/Test/input.k memory=20m
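
      Since your cluster provides Intel MPI builds of the solver (the *_intelmpi-2018 executables), the same pattern applies with Intel MPI. Here is a minimal sketch; the mpirun path and solver path are placeholders you would adapt to your installation:

      /path/to/impi/bin/mpirun -np 32 /path/to/ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018 i=/home/pro/main.k memory=120000000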

       

      Have you tried with a command line like above? You will find more LS-DYNA command line examples in the following forum post:

      Run LS-DYNA in HPC

       

      Let me know how it goes.

       

      Reno.

    • Reno Genest
      Ansys Employee

      Hello,

      After you make sure you run on 32 cores, you can try binding each process to a core and see if you can speed up the calculation further. I found the following in our knowledge database:

      "

      By default, an MPI process migrates between cores as the OS manages resources and attempts to
      get the best load balance on the system.
      But because LS-DYNA is a memory intensive application, such migration can significantly degrade performance
      since memory access can take longer if the process is moved to a core farther from the memory it is using.
      To avoid this performance degradation, it is important to bind each MPI process to a core.  
      
      Each MPI has its own way of binding the processes to cores, and furthermore, 
      threaded MPP (HYBRID) employs a different strategy from pure MPP.
      
      
      I.  Pure MPP
      ============
      
      To bind processes to cores, include the following MPI execution line directives according to the type of MPI used.
      
      HP-MPI, Platform MPI, and IBM Platform MPI:
      
      	-cpu_bind or -cpu_bind=rank
              -cpu_bind=MAP_CPU:0,1,2,...   <<<< not recommended unless user really needs to bind MPI processes to specific cores
      
      IBM Platform MPI 9.1.4 and later:
      
      	-affcycle=numa
      
      Intel MPI:
      
      	-genv I_MPI_PIN_DOMAIN=core
      
      Open MPI:
      
      	--bind-to numa
      "


      Let me know how it goes.

       

      Reno.

    • rajesh.pamarthi2711
      Subscriber

      Hi Reno,

      When I executed the job using only

      srun mpp-dyna i=/home/pro/main.k memory=120000000

      I can see the mes0000 - mes0031 files in my working directory. I think the SLURM manager automatically allots the ntasks to LS-DYNA, and it works.

      And due to a compatibility issue between the modules, I have removed the OpenMPI module, so currently I am only using
      module load GCC/13.2.0 intel-compilers/2023.2.1 impi/2021.10.0 Python/3.11.5 SciPy-bundle/2023.12 matplotlib/3.8.2 LS-DYNA/14.1.0

      With the above modules, I use the following LS-DYNA execution command (please look at the mpirun path and the solver path):


      /software/rapids/r24.04/impi/2021.10.0-intel-compilers-2023.2.1/mpi/2021.10.0/bin/mpirun -genv I_MPI_PIN_DOMAIN=core -np 24 ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib i={new_file_path} memory=200M

      I have run some jobs with different options:

      1) np32, memory=20m, core binding disabled - 185 hrs (status.out file)

      2) np32, memory=20m, core binding enabled - 168 hrs (status.out file)

      3) np32, memory=200m, core binding enabled - 168 hrs (status.out file)

      4) np24, memory=200m, core binding enabled - 190 hrs (status.out file)

      I am using Intel MPI here, as you can see in the loaded modules, and I have given the path to the solver executable; can you take a look at the path? I have tried many different settings (changing nodes, cores, and memory in sbatch, and the memory option in the LS-DYNA command), but I am not able to optimize further. The simulation mostly consists of running the electromagnetic solver.


      Can you let me know where I am going wrong?

    • rajesh.pamarthi2711
      Subscriber

      Hi,

      I would also like to ask: is setting the memory= and memory2= options in the LS-DYNA command mandatory?

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      Is this a pure electromagnetic run? Or do you have structural components as well? What is driving your timestep? Also, what part of the model is taking the most time (clock %) to solve? Have a look at the end of the d3hsp file for the timing information. Here is an example:

       

      Reno.

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      The memory= and memory2= options are not mandatory, but not requesting enough memory can lead to issues if your model is large. Note that memory2= is only used in MPP. You will find more information about memory in the LS-DYNA User Manual Vol I, Appendix O: LS-DYNA Keyword Manual
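
      For a sense of scale (a sketch only; the solver name and values are illustrative): LS-DYNA memory is specified in words, and one word is 8 bytes for the double-precision solver, so memory=200m requests roughly 1.6 GB for the first processor:

      mpirun -np 32 mppdyna i=input.k memory=200m memory2=40m

      In MPP, memory= applies to the first processor, which performs the decomposition, while memory2= sets the memory for each of the remaining processors.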

       

      Let me know how it goes.

       

      Reno. 

    • rajesh.pamarthi2711
      Subscriber

       

      Hi,

      I have tried to run the simulation with 3 nodes and 147 tasks. The termination time is 6.000e-04 (*CONTROL_TERMINATION).

      #SBATCH --nodes=3             # use 3 nodes
      #SBATCH --ntasks=147          # number of processor cores (i.e. tasks)
      #SBATCH --mem-per-cpu=4000M   # memory per CPU core
      #SBATCH --error=error.log
       
      module load GCC/13.2.0 intel-compilers/2023.2.1 impi/2021.10.0 Python/3.11.5 SciPy-bundle/2023.12 matplotlib/3.8.2 LS-DYNA/14.1.0
       
      python ls.py

      with the following lines in the Python code:

      lsdyna_command = f'/software/rapids/r24.04/impi/2021.10.0-intel-compilers-2023.2.1/mpi/2021.10.0/bin/mpirun -genv I_MPI_PIN_DOMAIN=core -np 147 /software/rapids/r24.04/LS-DYNA/14.1.0-intel-2023b/ls-dyna_mpp_d_R14_1_0_x64_centos79_ifort190_avx2_intelmpi-2018_sharelib i={new_file_path} memory=128M'

          # Execute the LS-Dyna command
          print("Running LS-Dyna...")
          run_command(lsdyna_command)


      With the above, the simulation started executing but appears to have terminated before the total completion time. In my d3hsp, it gave out:

       709 t 3.5450E-05 dt 5.00E-08 electromagnetism step
       
      License routines forcing premature code termination.
      Contact with the license server has been lost.
      The server may have died or a network connectivity problem
      may have occurred.
       
           710 t 3.5500E-05 dt 1.00E+06 write d3dump01 file          12/24/24 21:37:58
           710 t 3.5500E-05 dt 1.00E+06 flush i/o buffers            12/24/24 21:37:58
           710 t 3.5500E-05 dt 1.00E+06 write d3plot file            12/24/24 21:37:59
       
       N o r m a l    t e r m i n a t i o n                          12/24/24 21:37:59
       
       S t o r a g e   a l l o c a t i o n   
       
       Memory required to complete solution (memory=   5235K memory2=   2508K)
                Minimum   2408K on processor     5
                Maximum   2508K on processor    36
                Average   2424K
       
       Matrix Assembly dynamically allocated memory
                Maximum     56M
       
       Additional dynamically allocated memory
                Minimum    162M on processor   121
                Maximum    217M on processor   146
                Average    170M
       
       Total allocated memory
                Minimum    220M on processor   121
                Maximum    275M on processor   146
                Average    228M
       
       T i m i n g   i n f o r m a t i o n
                              CPU(seconds)   %CPU  Clock(seconds) %Clock
 ----------------------------------------------------------------
 Keyword Processing ... 2.8257E+00    0.01     2.8348E+00    0.01
 MPP Decomposition .... 1.6675E+01    0.06     1.7915E+01    0.06
   Init Proc .......... 1.0730E+01    0.04     1.0759E+01    0.04
   Decomposition ...... 5.4681E-01    0.00     5.5070E-01    0.00
   Translation ........ 5.3983E+00    0.02     6.6042E+00    0.02
 Initialization ....... 1.3516E+01    0.05     1.3698E+01    0.05
   Init Proc Phase 1 .. 2.6186E+00    0.01     2.6754E+00    0.01
   Init Proc Phase 2 .. 1.4131E+00    0.01     1.4373E+00    0.01
 Element processing ... 9.0849E-01    0.00     6.1107E+01    0.22
   Solids ............. 5.8148E-01    0.00     3.8135E+01    0.13
   E Other ............ 8.5733E-02    0.00     6.8885E+00    0.02
 Binary databases ..... 2.9037E+01    0.10     4.4529E+01    0.16
 ASCII database ....... 1.1411E-01    0.00     7.0965E+00    0.03
 Contact algorithm .... 8.2748E+01    0.30     1.1298E+02    0.40
   Interf. ID         1 1.9341E+01    0.07     1.9670E+01    0.07
   Interf. ID         2 5.4628E+01    0.20     7.0596E+01    0.25
 Rigid Bodies ......... 4.2108E+01    0.15     5.3479E+01    0.19
 EM solver ............ 2.7242E+04   98.45     2.7758E+04   97.85
   Misc ............... 6.1301E+02    2.22     6.6360E+02    2.34
   System Solve ....... 2.5164E+04   90.94     2.5492E+04   89.86
   FEM matrices setup . 1.1896E+02    0.43     1.3114E+02    0.46
   BEM matrices setup . 3.5966E+02    1.30     3.8563E+02    1.36
   FEMSTER to DYNA .... 4.3925E+02    1.59     5.2618E+02    1.85
   Compute fields ..... 5.4711E+02    1.98     5.5925E+02    1.97
 Time step size ....... 1.4727E+02    0.53     1.5686E+02    0.55
 Group force file ..... 1.0339E-02    0.00     8.1230E-01    0.00
 Others ............... 1.1413E+01    0.04     1.6252E+01    0.06
   Force Sharing ...... 1.1304E+01    0.04     1.3013E+01    0.05
 Misc. 1 .............. 4.3898E+00    0.02     2.2644E+01    0.08
   Scale Masses ....... 7.7710E-03    0.00     5.5401E-01    0.00
   Force Constraints .. 1.3901E-02    0.00     6.8301E-01    0.00
   Force to Accel ..... 1.0230E-02    0.00     7.7030E-01    0.00
   Constraint Sharing . 3.3435E-02    0.00     1.2809E+00    0.00
   Update RB nodes .... 3.8281E-02    0.00     2.4141E+00    0.01
 Misc. 2 .............. 1.1875E-01    0.00     9.0182E+00    0.03
 Misc. 3 .............. 7.7356E+01    0.28     8.4258E+01    0.30
 Misc. 4 .............. 1.1225E-01    0.00     6.7043E+00    0.02
   Timestep Init ...... 1.6812E-02    0.00     4.6854E-01    0.00
   Apply Loads ........ 8.1626E-02    0.00     5.6807E+00    0.02
 ----------------------------------------------------------------
        T o t a l s            2.7670E+04  100.00     2.8368E+04  100.00
       
       Problem time       =    3.5500E-05
       Problem cycle      =       710
       Total CPU time     =     27670 seconds (   7 hours 41 minutes 10 seconds)
       CPU time per zone cycle  =     161549.932 nanoseconds
       Clock time per zone cycle=     165622.228 nanoseconds
       
       Parallel execution with    147 MPP proc
       NLQ used/max                64/    64


      I think the run did not reach the total termination time; it stopped prematurely.



       

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      Could you create a new post for this problem? You get the following message:

      "

      License routines forcing premature code termination.
      Contact with the license server has been lost.
      The server may have died or a network connectivity problem
      may have occurred."
       
      It looks like you lost communication with the license manager. This is a different problem. 
       
      In the new post, state the error message and also specify if you are using the LSTC License manager or the Ansys license manager to run LS-DYNA.
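
      As a quick check (an assumption about typical setups; confirm with your cluster administrators), the license-related environment variables usually reveal which manager is configured:

      # The LSTC license manager is typically configured through:
      echo $LSTC_LICENSE $LSTC_LICENSE_SERVER

      # The Ansys license manager is typically configured through:
      echo $ANSYSLMD_LICENSE_FILE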
       
      Reno.
    • rajesh.pamarthi2711
      Subscriber

      Hi Reno,

      Apart from the license issue, do you think there is any issue with the way the LS-DYNA command is specified in the Python script? Is everything else all right? It is taking many hours to run the model, which has 241,000 elements.

      Also, on the HPC, how do I know which type of license manager is running?

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      The command line seems fine; you ran for more than 7 hours on 147 MPP processors.

      The EM solver part takes most of the calculation clock % time. Your EM timestep before the license problem was 5e-8 second. This is quite small. Did you specify this timestep using *EM_CONTROL_TIMESTEP? Could you increase the EM timestep to reach your termination time faster? Note that if you have structural parts, the solid mechanics timestep should be smaller than or equal to the EM timestep.
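
      For a rough sense of scale (simple arithmetic from the numbers above): with end_time = 6.0E-4 s and a constant EM timestep of 5.0E-8 s, the run needs 6.0E-4 / 5.0E-8 = 12,000 EM steps. Your job completed 710 steps in about 7.7 hours, so at the same pace the full run would take roughly (12000 / 710) x 7.7 ≈ 130 hours.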

       

      Reno.

    • Reno Genest
      Ansys Employee

       

      Hello Rajesh,

      Also, note the following information from our training on the Ansys Learning Hub (ALH): the memory= command has no effect on the EM part of the model.

      EM: Eddy Current Applications (Materials ONLY) – EM: Eddy Current Applications

       

      Have you tried with the latest R15.0.2 MPP solver?

      LS-DYNA (user=user) Download Page

      Username: user

      Password: computer

       

      Reno.

       

       

    • rajesh.pamarthi2711
      Subscriber

      Hi Reno,

      Yes, it is specified through *EM_CONTROL_TIMESTEP. I cannot increase the EM timestep; I have to make it run with that value.

      Do you think there could be any other issue, for example with the environment variables or the license?

      I have parts such as an inductor, a workpiece, a field shaper, and others, 5 in total, all using *SECTION_SOLID with element form 2.

      *CONTROL_TERMINATION - end_time = 6.000e-04
      *CONTROL_TIMESTEP - TSSFAC = 0.667
      *EM_CONTROL_TIMESTEP - timestep = 5.000e-08

    • Reno Genest
      Ansys Employee

      Hello Rajesh,

      Please create a new post for the license question. We will help you in the new thread.

      Thanks,

       

      Reno.
