TAGGED: computation-time
-
-
August 26, 2024 at 4:52 pm, zzhang868 (Bbp_participant)
Hello,
I’m looking for advice on how to effectively accelerate computation in ANSYS Mechanical.
I've tried using a GPU (RTX 6000), and while "Solution Statistics" indicates that the GPU is enabled, there's no noticeable speed improvement. Additionally, increasing the number of CPU cores or nodes hasn't helped. For example:
- Case 1: I currently use a server with 3 nodes, each with 24 cores. Increasing to 25 nodes doesn’t improve computation time, and reducing the number of cores slows it down further.
- Case 2: Even with GPU enabled, the computation remains slow.
Could you advise on the optimal settings for better performance? What might be causing the inefficiencies in these cases?
Thank you for your kind help.
-
August 26, 2024 at 8:22 pm
-
August 27, 2024 at 12:53 pm, Ashish Khemka (Forum Moderator)
Hi Zihan,
For the second query can you please create a separate forum thread?
Regards,
Ashish Khemka
-
August 27, 2024 at 3:11 pm, zzhang868 (Bbp_participant)
I have moved the second query to a separate thread. Could you please have a look at these questions? Any suggestions would be appreciated!
-
-
September 11, 2024 at 5:08 pm, mrife (Ansys Employee)
Hi Zihan
For the first question, let me give a little background first. In distributed parallel solutions, the FEM domain is split into N domains at the element level when using N CPU cores. Then N instances of the solver process (here MAPDL) are launched, and each process solves one of the smaller domains. Each process has to communicate with the other processes that share nodes (FEM nodes, not compute cluster nodes). If the communication gets to be more than the computation that the CPU core is doing, then too many CPU cores are being used. So for any model there will be a point beyond which using more CPU cores just slows down the solve.
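To illustrate why there is a turning point, here is a toy cost model in Python. It is only a sketch and not how MAPDL actually behaves: it assumes the compute work divides evenly across cores and that communication cost grows linearly with the core count. The function names and the constants `work` and `comm_per_core` are made up for illustration.

```python
# Toy model (an assumption, not MAPDL's real cost behaviour):
#   solve_time(N) = work / N + comm_per_core * N
# The first term shrinks as cores are added, the second term grows.

def solve_time(cores, work=10_000.0, comm_per_core=1.0):
    """Estimated solve time in arbitrary units for a given core count."""
    return work / cores + comm_per_core * cores

def best_core_count(max_cores, work=10_000.0, comm_per_core=1.0):
    """Core count in 1..max_cores that minimises the toy solve time."""
    return min(range(1, max_cores + 1),
               key=lambda n: solve_time(n, work, comm_per_core))

if __name__ == "__main__":
    best = best_core_count(400)
    print("fastest core count in the toy model:", best)
    for n in (25, 50, best, 200, 400):
        print(n, "cores ->", round(solve_time(n), 1), "time units")
```

In this toy model the time falls as cores are added, bottoms out, and then rises again once the communication term dominates, which is the behaviour described above.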
In the mathematics of solving FEA equations we use degrees of freedom [DOF] instead of number of nodes/elements. Let's say this is a FEM with only solid structural elements with translational DOFs in X, Y and Z. Then the total number of DOFs is 3 * number of nodes.
To know how many CPU cores to use, we need to know the range of DOFs per CPU core in which the core is most efficient. When working with a new CPU that I've not tested before, I assume something like 30,000 DOFs per CPU core and figure out how many CPU cores and compute nodes are needed from there, ignoring for now any question of network communication speed between the compute nodes.
So for your model I'd try 170 CPU cores, or 7 compute nodes, as a test. Then maybe try 6 and 8 compute nodes (fully used) to see how the hardware responds to slightly more and slightly fewer DOFs per CPU core. Depending on what happens, you may then want to run other tests.
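The arithmetic behind this sizing can be sketched in a few lines of Python. This is only an illustration of the rule of thumb above: the function names are made up, the 1,700,000 FEM node count is a hypothetical value chosen so that the result lands on 170 cores (the actual model size is not stated in this thread), and 24 cores per compute node is taken from the original post.

```python
import math

def estimate_cores(num_fem_nodes, dofs_per_fem_node=3,
                   dofs_per_core=30_000, cores_per_compute_node=24):
    """Return (total DOFs, CPU cores, compute nodes) for a starting test run.

    Assumes solid structural elements with 3 translational DOFs per FEM node
    and the 30,000 DOFs-per-core starting guess from the reply above.
    """
    total_dofs = num_fem_nodes * dofs_per_fem_node
    cores = math.ceil(total_dofs / dofs_per_core)
    # Round to the nearest whole compute node, using at least one.
    compute_nodes = max(1, round(cores / cores_per_compute_node))
    return total_dofs, cores, compute_nodes

def dofs_per_core(total_dofs, compute_nodes, cores_per_compute_node=24):
    """DOFs handled by each core when the compute nodes are fully used."""
    return total_dofs / (compute_nodes * cores_per_compute_node)

if __name__ == "__main__":
    # Hypothetical model size, not taken from the thread.
    total, cores, nodes = estimate_cores(1_700_000)
    print(total, "DOFs ->", cores, "cores, about", nodes, "compute nodes")
    for n in (6, 7, 8):
        print(n, "compute nodes:", round(dofs_per_core(total, n)), "DOFs per core")
```

Running the 6, 7 and 8 node cases side by side shows how far each test moves away from the 30,000 DOFs-per-core starting point.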
For now I'd not use the GPU as a solver accelerator. Since the GPU is helping all of the solver processes, one GPU for 24 CPU cores is a bit much; I'd prefer to see 2 GPUs for those 24 cores. I try to keep to about 1 GPU per 12 or so CPU cores (and hence solver processes).
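As a quick check of that ratio, a minimal Python sketch follows. The function name is hypothetical, and the 12 cores per GPU figure is just the rough rule of thumb from this reply, not a documented limit.

```python
import math

def suggested_gpus(cpu_cores, cores_per_gpu=12):
    """GPUs needed to keep to roughly `cores_per_gpu` solver processes per GPU."""
    return math.ceil(cpu_cores / cores_per_gpu)

if __name__ == "__main__":
    for cores in (12, 24, 48):
        print(cores, "CPU cores ->", suggested_gpus(cores), "GPU(s)")
```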
I'm also assuming the use of the sparse (direct) solver. If using the iterative solver, then the DOFs per CPU core will probably be different for that cluster.
-
September 12, 2024 at 10:27 pm, zzhang868 (Bbp_participant)
Thank you so much for your feedback!
-
© 2024 Copyright ANSYS, Inc. All rights reserved.