September 27, 2023 at 9:32 pm, Matt Belding (Subscriber)
I'm trying to accelerate my finite element model, which generates time-series responses, on an HPC cluster of A100 GPUs (4 per node). I've added the -acc nvidia and -na 4 flags to the MAPDL command line in my Slurm script, and the .pcs output file confirms that GPU acceleration is enabled. However, performance is worse than on our baseline workstation, which has only a few Intel i7 cores (12 hours on the workstation vs. 14 hours on the cluster). The solver is PCG, and each GPU has 40 GB of memory. Besides the known flags that enable GPU acceleration, are there any additional steps needed to take full advantage of GPU acceleration in Ansys Mechanical?
-
September 29, 2023 at 2:08 pm, mrife (Ansys Employee)
Hi Matt
Answering questions about solution performance always requires knowing something about the FE model, starting with its size: how many total degrees of freedom (equations) are being solved, as reported in the .pcs file? Also, which version of MAPDL are you using?
-
September 29, 2023 at 2:24 pm, Matt Belding (Subscriber)
It has 341,307 DOF per test case: 113,769 nodes and 96,088 elements (94,720 of them are 8-node solid elements with 3 DOF per node, and the rest are 2-node spring-damper elements). The model is solved for 3,050 substeps in the time domain. We're using 2023 R1.
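As a quick sanity check, the reported DOF count follows directly from the node count when every node carries 3 DOF. This is a minimal sketch that assumes the spring-damper elements only connect existing solid nodes and add no extra nodes or DOF of their own (an assumption, not something stated in the post).

```python
# Sanity check of the DOF count quoted in the post.
# Assumption: every node has 3 translational DOF (UX, UY, UZ) from the 8-node
# solids, and the 2-node spring-dampers reuse those nodes without adding DOF.
NODES = 113_769
DOF_PER_NODE = 3


def total_dof(nodes: int, dof_per_node: int) -> int:
    """Return the number of equations for a model with uniform DOF per node."""
    return nodes * dof_per_node


if __name__ == "__main__":
    print(total_dof(NODES, DOF_PER_NODE))  # 341307
```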
-
September 29, 2023 at 2:48 pm, mrife (Ansys Employee)
Hi Matt
OK, that model is very, very small with respect to GPU acceleration of a solve. Without knowing anything about the cluster, other than that the compute nodes have some model of Intel i7 CPU, I'd suggest running a test that compares against a single GPU. First, change the model so that it does not solve all 3,050 substeps; we only need a few to compare compute performance. Set up the loading to solve maybe 10 substeps, or just the first 2 or 3 load steps.

Next, I usually start with 50,000 degrees of freedom per CPU core as a baseline test; for a leading-edge CPU I'd lower that to around 30,000. At 50k DOF per core, I'd start by solving on 8 CPU cores (I also prefer even numbers!). When that finishes, make a copy of the output and .pcs files, then solve again on 8 CPU cores plus 1 GPU and save the resulting output and .pcs files too. Lastly, try 4 CPU cores and 1 GPU. Save the files, then report back the total CPU time and the total elapsed time for each solution.
Mike
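The sizing rule and the three comparison runs described above can be sketched as follows. This is a minimal illustration, not an Ansys tool: the 50,000 DOF/core figure is the rule of thumb from this post, the executable name `ansys231` and the input file `model.inp` are hypothetical placeholders for your own installation and model, and only the standard MAPDL launch options `-b`, `-np`, `-j`, `-i`, `-o`, `-acc`, and `-na` are used.

```python
import math

# Rule of thumb from the post: ~50,000 DOF per CPU core as a starting point
# (about 30,000 for a leading-edge CPU). A heuristic, not a documented limit.


def suggested_cores(total_dof: int, dof_per_core: int = 50_000) -> int:
    """Cores needed at the given DOF-per-core load, rounded up to an even count."""
    cores = math.ceil(total_dof / dof_per_core)
    return cores + (cores % 2)  # bump odd counts to the next even number


def mapdl_command(cores: int, gpus: int, job: str) -> list[str]:
    """Build one batch MAPDL command line.

    Hypothetical names: 'ansys231' (executable) and 'model.inp' (input file)
    stand in for your own installation and model.
    """
    cmd = ["ansys231", "-b", "-np", str(cores), "-j", job,
           "-i", "model.inp", "-o", f"{job}.out"]
    if gpus > 0:
        cmd += ["-acc", "nvidia", "-na", str(gpus)]
    return cmd


if __name__ == "__main__":
    cores = suggested_cores(341_307)  # 341,307 / 50,000 = 6.8 -> 7 -> 8
    # The three comparison runs: 8 CPU, 8 CPU + 1 GPU, 4 CPU + 1 GPU.
    for n_cpu, n_gpu in [(cores, 0), (cores, 1), (4, 1)]:
        print(" ".join(mapdl_command(n_cpu, n_gpu, f"cpu{n_cpu}_gpu{n_gpu}")))
```

Giving each run its own job name keeps the output and .pcs files from overwriting one another, so the timings can be compared side by side afterwards.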
-
October 9, 2023 at 4:07 pm, Matt Belding (Subscriber)
Hi Mike,
We reduced the model to 28,707 nodes and 24,022 elements (23,680 of them are 8-node solid elements with 3 DOF per node, and the rest are 2-node spring-damper elements). That gives about 86,000 DOF, and the model is solved for 15 substeps in the time domain. We ran 4, 8, and 16 cores, each with no GPU and with 1 GPU (A100). The results were:
NUM_CPU  NUM_GPU  Elapsed time (s)  CPU time (s)
4        0        70                236.255
4        1        66                222.73
8        0        94                609.604
8        1        56                368.356
16       0        83                761.411
16       1        63                631.41

Let me know if I'm missing anything else.
Matt
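The reported timings can be reduced to two ratios: the elapsed-time gain from adding one GPU at a fixed core count, and the CPU-only scaling relative to 4 cores. This sketch uses only the numbers from the table above; the function names are illustrative.

```python
# Timings from the table above (reduced ~86,000-DOF model, 15 substeps).
# key: (CPU cores, GPUs) -> (elapsed time in s, CPU time in s)
TIMINGS = {
    (4, 0): (70, 236.255),
    (4, 1): (66, 222.73),
    (8, 0): (94, 609.604),
    (8, 1): (56, 368.356),
    (16, 0): (83, 761.411),
    (16, 1): (63, 631.41),
}


def gpu_speedup(cores: int) -> float:
    """Elapsed-time speedup from adding one GPU at a fixed core count."""
    return TIMINGS[(cores, 0)][0] / TIMINGS[(cores, 1)][0]


def cpu_scaling(cores: int, base: int = 4) -> float:
    """CPU-only elapsed-time speedup relative to `base` cores (<1 means slower)."""
    return TIMINGS[(base, 0)][0] / TIMINGS[(cores, 0)][0]


if __name__ == "__main__":
    for c in (4, 8, 16):
        print(f"{c:>2} cores: GPU speedup {gpu_speedup(c):.2f}x, "
              f"CPU-only scaling vs 4 cores {cpu_scaling(c):.2f}x")
```

The CPU-only runs get slower when going from 4 to 8 or 16 cores (ratios of about 0.74 and 0.84), and the best GPU run (8 cores + 1 GPU, 56 s) is only about 1.25x faster than the best CPU-only run (4 cores, 70 s). Both patterns are what one would expect when a model is too small for the hardware.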
-
October 9, 2023 at 5:57 pm, mrife (Ansys Employee)
Hi Matt
OK, there seems to be a misunderstanding: the original model was already too small for the chosen hardware, and this new model is even worse. The solve spends too much time communicating between the domains (groups of elements) compared with the time it takes to perform the computations. Did you try solving the original model with the same set of hardware choices? Or perhaps a bigger model, or the same model with a more refined mesh?
Mike
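One way to see why the reduced model is "even worse" is to compare the DOF each core receives against the roughly 50,000 DOF/core starting point suggested earlier in the thread. This minimal sketch uses the node counts from the posts and assumes 3 DOF on every node.

```python
# DOF per core for the original and reduced models, compared with the
# ~50,000 DOF/core starting point suggested earlier in the thread.
# Assumption: 3 DOF on every node (node counts are taken from the posts).
BASELINE_DOF_PER_CORE = 50_000


def dof_per_core(nodes: int, cores: int, dof_per_node: int = 3) -> float:
    """Average number of equations assigned to each core."""
    return nodes * dof_per_node / cores


if __name__ == "__main__":
    for label, nodes in [("original", 113_769), ("reduced", 28_707)]:
        for cores in (4, 8, 16):
            load = dof_per_core(nodes, cores)
            print(f"{label:>8}, {cores:>2} cores: {load:>8.0f} DOF/core "
                  f"({load / BASELINE_DOF_PER_CORE:.0%} of baseline)")
```

For the reduced model, even 4 cores get only about 21,500 DOF each, and 16 cores drop to about 5,400, so each domain has very little work relative to the inter-domain communication.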
-
- The topic ‘Ansys Mechanical APDL GPU acceleration on A100 help’ is closed to new replies.
© 2025 Copyright ANSYS, Inc. All rights reserved.