Ansys Products

Discuss installation & licensing of our Ansys Teaching and Research products.

Ansys Mechanical APDL GPU acceleration on A100 help

    • Matt Belding
      Subscriber

      I'm trying to accelerate a finite element model that generates time-series responses on an HPC cluster with A100 GPUs (4 per node). I've added the -acc nvidia and -na 4 flags to the executable call in my Slurm script, and the PCS output file shows that GPU acceleration has been enabled. However, I've noticed a decrease in performance compared with our baseline workstation, which has only a few Intel i7 cores (12 hours vs 14 hours). The solver is PCG and each GPU has 40 GB of video memory. Besides the known flags to enable GPU acceleration, are there any additional steps needed to properly take advantage of acceleration in Ansys Mechanical?

    • mrife
      Ansys Employee

      Hi Matt

      Answers to questions on solution performance always require knowledge of the FEM, starting with its size: how many total degrees of freedom (number of equations) are being solved, as shown in the PCS file? Also, which version of MAPDL is being used?

    • Matt Belding
      Subscriber

      It has 341,307 DOF per test case. It contains 113,769 nodes and 96,088 elements (of which 94,720 are 8-node solid elements with 3 DOF at each node, and the rest are 2-node spring-damper elements). The model is analyzed for 3,050 sub-steps in the time domain. We're using 2023R1.
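As a quick sanity check on the figures above, the reported DOF count matches 3 DOF at each of the 113,769 nodes. The short sketch below only redoes that arithmetic; the variable names are illustrative and nothing here comes from the PCS file itself.

```python
# Figures quoted in the post above.
nodes = 113_769
elements = 96_088
solid_elements = 94_720
dof_per_node = 3  # 8-node solids with 3 translational DOF per node

total_dof = nodes * dof_per_node
spring_dampers = elements - solid_elements

print(f"total DOF: {total_dof:,}")            # total DOF: 341,307
print(f"spring-dampers: {spring_dampers:,}")  # spring-dampers: 1,368
```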

    • mrife
      Ansys Employee

      Hi Matt

      OK, that model is very, very small with respect to GPU acceleration (of a solve). Without knowing anything about the cluster, other than that the compute nodes have some model of Intel i7 CPUs, I'd suggest running a test to compare using one GPU.

      First, change the model so that it is not solving for all 3,050 sub-steps. We only need to solve for a few in order to compare compute performance, so change the loading setup to solve for maybe 10 sub-steps, or maybe just the first 2-3 load steps.

      Next, I usually start with 50,000 degrees of freedom per CPU core as a baseline test. If the CPU were a leading-edge model, I'd take that down to around 30,000. With 50k DOF per core, I'd try solving on 8 CPU cores to start (I also prefer even numbers!). When done, make a copy of the output and PCS files, then solve again on 8 CPU cores plus 1 of the GPUs and copy the resulting output and PCS files. Lastly, try using 4 CPU cores and 1 GPU.

      Save the files, then report back the total CPU time and the total elapsed time for each solution.

      Mike
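The test plan above can be sketched as a small script that derives the starting core count from the 50,000 DOF per core rule of thumb and prints one launch line per run. This is only a sketch under assumptions: the executable name `ansys231`, the input file `model.dat`, and the job names are hypothetical placeholders, and the `-b`, `-np`, `-j`, `-i`, and `-o` options should be checked against your site's MAPDL install. Only `-acc nvidia` and `-na` come from the thread.

```python
import math

def cores_for(total_dof, dof_per_core=50_000):
    """Rule-of-thumb core count, rounded up to an even number."""
    cores = math.ceil(total_dof / dof_per_core)
    return cores + (cores % 2)

def mapdl_command(cores, gpus, exe="ansys231", job="bench"):
    """Build a batch launch line; exe and file names are placeholders."""
    tag = f"{job}_{cores}c_{gpus}g"  # distinct job name keeps each run's output and PCS files apart
    cmd = [exe, "-b", "-np", str(cores), "-j", tag,
           "-i", "model.dat", "-o", f"{tag}.out"]
    if gpus:
        cmd += ["-acc", "nvidia", "-na", str(gpus)]
    return cmd

baseline = cores_for(341_307)  # DOF count reported for the original model
runs = [(baseline, 0), (baseline, 1), (baseline // 2, 1)]
commands = [mapdl_command(c, g) for c, g in runs]

for cmd in commands:
    print(" ".join(cmd))
```

Giving each run its own job name is one way to avoid copying the output and PCS files by hand between solves.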

    • Matt Belding
      Subscriber

       

      Hi Mike, 

      We reduced the model to the following: 

      28,707 nodes and 24,022 elements (of which 23,680 are 8-node solid elements with 3 DOF at each node, and the rest are 2-node spring-damper elements). That gives almost 86,000 DOFs, and the model is analyzed for 15 sub-steps in the time domain. We ran it on 16, 8, and 4 cores with either no GPU or 1 GPU (A100). The results were the following:

      NUM_CPU   NUM_GPU   Elapsed Time (s)   CPU Time (s)
      4         0         70                 236.255
      4         1         66                 222.73
      8         0         94                 609.604
      8         1         56                 368.356
      16        0         83                 761.411
      16        1         63                 631.41

       

      Let me know if I’m missing anything else. 

      Matt
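To make the comparison easier to read, the elapsed times can be turned into a GPU speedup per core count. The sketch below assumes the run-together table rows split as (cores, GPUs, elapsed s, CPU s), for example "4070236.255" read as 4, 0, 70, 236.255. That split is an interpretation of the posted numbers, not something confirmed in the thread.

```python
# (cores, gpus, elapsed_s, cpu_s), under the assumed column split described above.
rows = [
    (4, 0, 70, 236.255),
    (4, 1, 66, 222.73),
    (8, 0, 94, 609.604),
    (8, 1, 56, 368.356),
    (16, 0, 83, 761.411),
    (16, 1, 63, 631.41),
]

# Index elapsed time by (cores, gpus) so CPU-only and GPU runs can be paired.
elapsed = {(cores, gpus): t for cores, gpus, t, _ in rows}

speedup = {
    cores: round(elapsed[(cores, 0)] / elapsed[(cores, 1)], 2)
    for cores in sorted({r[0] for r in rows})
}

for cores, s in speedup.items():
    print(f"{cores:>2} cores: GPU speedup {s}x")
```

Under that reading, the fastest CPU-only run is the 4-core one, so adding cores alone did not help on this reduced model.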

       

    • mrife
      Ansys Employee

      Hi Matt

      OK, there seems to be a misunderstanding: the original model was too small for the chosen hardware, and this new model is even worse. The solve process is spending too much time communicating between the domains (groups of elements) compared to the time it takes to perform the computations. Did you try solving the original model with the same set of hardware choices? Plus maybe a bigger model, or the same model with a more refined mesh?

       

      Mike 
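One rough way to see the point about model size is to compare the DOF handled by each core against the 50,000 DOF per core starting point mentioned earlier in the thread. The sketch below treats that figure as a rule-of-thumb yardstick rather than a hard limit, and uses 86,000 as a round stand-in for the "almost 86,000" DOF of the reduced model.

```python
def dof_per_core(total_dof, cores):
    """Average number of equations each core's domain has to handle."""
    return total_dof / cores

GUIDELINE = 50_000  # rule-of-thumb starting point from the thread, not a hard limit

models = {"original": 341_307, "reduced": 86_000}
core_counts = [4, 8, 16]

load = {
    (name, cores): dof_per_core(dof, cores)
    for name, dof in models.items()
    for cores in core_counts
}

for (name, cores), per_core in load.items():
    note = "below guideline" if per_core < GUIDELINE else "at or above guideline"
    print(f"{name:>8} model, {cores:>2} cores: {per_core:>9,.0f} DOF/core ({note})")
```

On these numbers the reduced model sits well under the guideline at every core count tested, which is consistent with communication between domains outweighing the computation.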
