General

Fluent GPU Solver Hardware Buying Guide

Tagged:

    • FAQ

      INTRODUCTION

      GPU hardware is very different from CPU hardware. Understanding these differences will help you make the right buying decision.

       

      Table of Contents

      Preface

      FAQs

      1. What are the requirements to run Fluent on GPUs?
      2. How do I choose GPU cards that work for me?
      3. Which GPU cards are recommended for use with the Fluent GPU solver?
      4. Won’t the (non-recommended) card I already have work just as well as a recommended one?
      5. Assuming I use a recommended GPU card, how much faster can I expect my simulations to run?
      6. I have only a mid-range budget. Can you recommend a card for me?
      7. If you had to recommend one, all-around best card for most situations, which would it be?
      8. What if I want to use Ansys cloud solutions instead of buying my own GPU hardware?
      9. Can you recommend a card for specific models?
      10. Benchmark before buying
      11. Are there any other resources I can learn from?

       


      Preface

      What Are CPUs?

      Central Processing Units (CPUs) have relatively few flexible cores that can handle complex instruction sets. Each CPU handles serial computation, file input/output, networking, and communication with peripherals such as USB ports, keyboards, and mice. For memory-bound applications like CFD, a CPU can often calculate single-precision and double-precision variables at similar speed. (For compute-bound applications such as molecular dynamics, the CPU is roughly half as fast in double precision.)

      What Are GPUs?

      Graphics Processing Unit (GPU) hardware, on the other hand, is tuned to specific applications. GPUs use different groupings of specialized cores and are massively parallel in nature. The groupings of cores are called Streaming Multiprocessors (SMs) by NVIDIA or Compute Units (CUs) by AMD. GPU cores can only process simple instructions and are clocked slower (and are therefore more energy efficient) than CPU cores. The dramatic GPU simulation speedup occurs because there are far more cores in a GPU than in a CPU.

      GPU cores fall into several specialized categories:

      1. FP32 – single precision cores
      2. FP64 – double precision cores
      3. INT32 – long integer cores
      4. Tensor – accelerate matrix operations
      5. RT – “Ray Tracing” cores that calculate rays from an object to a camera

      GPUs do not necessarily contain all core types!  Different GPU models will have varying amounts of these cores, or none at all. This is why it is so important to understand your application and how it fits with the capabilities of a particular GPU.  

       

      Using GPUs with the Fluent GPU Solver

      The Fluent GPU solver, when run in single precision (3d), takes advantage of all the FP32 cores but does not use any FP64 cores, even if they are available. Tensor and integer cores are used as needed and are grouped nearby in the SM or CU.
      Fluent does not currently use the RT (ray tracing) cores for a CFD solution; in the future, certain radiation models may benefit from them. The Ansys Optics codes that perform ray-tracing optical analysis do use the RT cores and are faster by orders of magnitude.

       

      The Good News and Bad News

      Most of the less expensive GPU cards have no FP64 cores at all. This includes everything up to and including the Nvidia RTX 6000 Ada, which is the workstation-packaged version of the Nvidia L40 server GPU. The good news is that Fluent can be run in double precision (fluent 3ddp) on these cards even though they have no FP64 cores: the CUDA libraries emulate FP64 by using two FP32 CUDA cores. The bad news is that on such hardware, double precision means half the solve speed.

      Perhaps it is time to think carefully about double precision. Traditionally, the double-precision solver was reserved for cases that truly benefit from it: simulations driven by weak gradients, such as natural convection, or by complex physics, such as multiphase flows. In recent years, cheap, plentiful CPU RAM and a small speed penalty have made the double-precision solver an easy default.

      Today’s progress requires a closer look at the choice of “3d” or “3ddp”. The Fluent GPU solver tends to be more robust and to converge better in single precision than the Fluent CPU solver. In addition, the latest builds of Fluent introduce a hybrid-precision capability that delivers double-precision-like convergence and accuracy using more FP32 calculations. The speed-versus-cost tradeoff is therefore much larger than it has been for some time.

      Inexpensive GPU cards offer large cost efficiency if single precision solutions or raytracing are of primary interest.

      Figure 1.  Nvidia H100 GPU internals

      Figure 2.  Nvidia H100 SM layout
      All the small green squares are SMs. The number of SMs differs slightly by variant: H100 PCIe (114 SMs), H100 SXM5 (132 SMs), GH100 (144 SMs). The Grace Hopper architecture co-locates ARM CPUs with the GPUs on the card.

      Figure 3.  Nvidia H100 SM internals showing different types of cores.  Note the FP64 cores for native double precision calculations

       

      When Double Precision is Required

      Higher-end GPUs like the Nvidia H100 offer dedicated FP64 cores, so double-precision solves are fast. Lower-end GPU cards provide double precision by using two FP32 (single-precision) cores to emulate an FP64 core. Where native FP64 cores exist, no emulation is needed. The H100 cards are also faster for single-precision calculations, thanks to faster FP32 cores and higher memory bandwidth.

      [Reference comparison:  https://vast.ai/article/nvidia-h100-vs-l40s-power-meets-versatility]

      Interestingly, the H100 GPUs have no RT cores. Ansys optics simulations, which benefit from RT cores, are still fast on H100 GPUs, but cost efficiency is much better on L40 hardware, which has 142 RT cores.

      Now that we have a better understanding of the unique aspects of the GPU cards, let’s move on to discuss more fundamental aspects including particular models, RAM requirements and licensing.

       


      FAQs

      1. What are the requirements to run Fluent on GPUs?

      General requirements:

      • Fluent benefits from GPUs because of their dedicated architecture for matrix operations.
      • You can use more than one GPU on the same or on multiple computers if your model does not fit in the memory of a single GPU.
      • The sum of the memory of all GPU cards must be able to hold the model and the computation overhead.
      • Certain input/output operations still require the CPU and the main memory. Each system should have at least the same amount of system memory as the sum of the GPU memory in this system. For example, a system with two GPU cards with 40 GB each should have at least 80 GB of system memory. It is possible that more system memory than GPU memory is needed, especially for polyhedral meshes. This applies to all computers involved in the calculation.
      • You need an Ansys CFD Enterprise license with enough HPC Packs, HPC tasks or an Ansys CFD HPC Ultimate license. See also Fluent GPU Solver FAQ.

      Requirements specific to Nvidia cards:

      • The graphics card and its driver must be compatible with CUDA 11.8 or newer for Fluent 2025 R1 and CUDA 12.8 for Fluent 2025 R2. Maxwell, Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper, and Blackwell architectures should be compatible with CUDA 12.8. Kepler GPUs (introduced in 2013) are only supported up to Fluent 2025 R1.
      • CUDA 12.8 must be installed together with the driver for 2025 R2.

      Requirements specific to AMD cards:

      • The graphics card and its driver must be compatible with ROCm 6.0 or newer. This version was released in 2023. All RDNA and CDNA architectures are compatible.
      • AMD cards can only be used under Linux with Fluent 2025 R1 or later.

       

      2. How do I choose GPU cards that work for me?

      Consider budget and needed memory first. The best approach is to benchmark yourself using cloud services. Ansys publishes benchmark results, but we cannot consider every possible use case and every possible combination of models.

      The exact memory needs depend on the type of cells, the number of cells and boundary facets, the number of equations, the type of the solver, and the specifics of the used models. Still, it is possible to provide a rough rule of thumb as a lower limit:

      • The Fluent Cortex process needs about 55 MB of GPU memory and 725 MB of system memory. This is only needed once, independent of the number of GPUs used. Post-processing is done with this process. It can take a lot more system and GPU memory depending on what is shown.
      • Each Fluent compute process requires about 95 MB of GPU memory and 210 MB of system memory, regardless of single or double precision mode. Data passes through the system memory of these processes during I/O operations (file access). The system memory needed for this process during these operations can be larger than the GPU memory needed for computation.
      • Approximate GPU memory for the calculation of 1 million fluid cells with a two-equation turbulence model and active energy equation:

       

      Mesh type   | Single precision, segregated | Single precision, coupled | Double precision, segregated | Double precision, coupled
      Tetrahedral | 1.0 GB | 1.8 GB | 1.6 GB | 3.0 GB
      Hexahedral  | 1.2 GB | 2.2 GB | 1.9 GB | 3.6 GB
      Polyhedral  | 1.8 GB | 3.4 GB | 2.8 GB | 5.6 GB
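      The rule of thumb above can be wrapped in a quick estimator. This is a minimal sketch, not an official Ansys sizing tool: the function name is ours, and the numbers are simply the table values plus the per-process overheads quoted above.

```python
# Rough GPU memory estimate, based on the rule-of-thumb table above
# (per 1 million fluid cells, two-equation turbulence + energy equation).
GB_PER_MILLION_CELLS = {
    # (mesh, precision, solver): GB per 1M cells
    ("tet", "single", "segregated"): 1.0,
    ("tet", "single", "coupled"): 1.8,
    ("tet", "double", "segregated"): 1.6,
    ("tet", "double", "coupled"): 3.0,
    ("hex", "single", "segregated"): 1.2,
    ("hex", "single", "coupled"): 2.2,
    ("hex", "double", "segregated"): 1.9,
    ("hex", "double", "coupled"): 3.6,
    ("poly", "single", "segregated"): 1.8,
    ("poly", "single", "coupled"): 3.4,
    ("poly", "double", "segregated"): 2.8,
    ("poly", "double", "coupled"): 5.6,
}

def estimate_gpu_memory_gb(million_cells, mesh="hex", precision="double",
                           solver="coupled", n_gpus=1):
    """Lower-bound GPU memory needed per card, in GB, including the
    fixed overheads quoted above (~95 MB per compute process plus
    ~55 MB once for the Cortex process)."""
    solve = million_cells * GB_PER_MILLION_CELLS[(mesh, precision, solver)]
    overhead = 0.095 * n_gpus + 0.055  # GB
    return (solve + overhead) / n_gpus

# Example: 50 million hex cells, double precision, coupled solver, 4 GPUs
print(round(estimate_gpu_memory_gb(50, "hex", "double", "coupled", 4), 1))
```

      Remember that this is a lower limit; actual requirements grow with additional physics models and boundary facets, so benchmark before committing to hardware.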

       

      Now that you have a rough understanding of the minimum memory requirements, you can select candidates. For your convenience, you can find a few characteristics of common cards below. The theoretical computation speed in single and double precision refers to the specialized cores. Even though most cards do not have specialized double-precision cores, they can still calculate in double precision at roughly a quarter to half the speed of single precision. Memory bandwidth is important for transporting data to the SMs/CUs.

       

      Card name | Memory (GB) | Bandwidth (GB/s) | SMs/CUs | Single precision (TFLOPS) | Double precision (TFLOPS) | Released

      Workstation cards:
      RTX 6000 Ada | 48 | 960 | 142 | 91.1 | – | 2023
      RTX 5000 Ada | 32 | 576 | 100 | 65.3 | – | 2023
      RTX 4500 Ada | 24 | 432 | 60 | 39.6 | – | 2023
      RTX 4000 Ada | 20 | 360 | 48 | 26.7 | – | 2023
      RTX 2000 Ada | 16 | 224 | 22 | 12 | – | 2024
      RTX A6000 | 48 | 768 | 84 | 38.7 | – | 2022
      RTX A5500 | 24 | 768 | 80 | 34.1 | – | 2022
      RTX A5000 | 24 | 768 | 64 | 27.8 | – | 2022
      RTX A4500 | 20 | 640 | 56 | 23.7 | – | 2023
      RTX A4000 | 16 | 448 | 48 | 19.2 | – | 2022
      RTX A2000 | 6 or 12 | 288 | 26 | 8 | – | 2022
      RTX 5000 A Mobile | 16 | 576 | 76 | 42.6 | – | 2023
      RTX 4000 A Mobile | 12 | 432 | 58 | 42.6 | – | 2023
      RTX 2000 A Mobile | 8 | 256 | 24 | 14.5 | – | 2023
      Radeon Pro W7900 | 48 | 864 | 96 | 61.3 | – | 2023
      Radeon Pro W7800 48GB | 48 | 864 | 70 | 45.2 | – | 2023
      Radeon Pro W7800 | 32 | 576 | 70 | 45.2 | – | 2023
      Radeon Pro W7700 | 16 | 576 | 48 | 28.3 | – | 2023
      Radeon Pro W7600 | 8 | 288 | 32 | 21.4 | – | 2023
      Radeon Pro W6800 | 32 | 512 | 60 | 17.83 | 1.11 | 2021
      Radeon Pro W6600 | 8 | 224 | 28 | 10.4 | – | 2021

      Server cards:
      H200 SXM | 141 | 4800 | – | 67 | 37 | 2024
      H200 NVL (PCIe) | 141 | 4800 | – | 60 | 30 | 2024
      H100 SXM | 80 | 3350 | 132 | 67 | 34 | 2024
      H100 NVL (PCIe) | 94 | 3900 | 114 | 60 | 30 | 2024
      A100 SXM | 80 | 2039 | 108 | 19.5 | 9.7 | 2022
      A100 PCIe | 80 | 1935 | 108 | 19.5 | 9.7 | 2022
      L40 | 48 | 864 | 142 | 90.5 | – | 2022
      L40S | 48 | 864 | 142 | 91.6 | – | 2024
      A30 | 24 | 933 | 72 | 10.3 | 5.2 | 2022
      Instinct MI325X | 256 | 6000 | 304 | 163.4 | 163.4 | 2024
      Instinct MI300X | 192 | 5300 | 304 | 163.4 | 163.4 | 2023
      Instinct MI250X | 128 | 3200 | 220 | 95.7 | 95.7 | 2021
      Instinct MI250 | 128 | 3200 | 208 | 45.3 | 45.3 | 2021
      Instinct MI210 | 64 | 1600 | 104 | 22.6 | 22.6 | 2022

       

      3. Which GPU cards are recommended for use with the Fluent GPU solver?

      A list of tested hardware for the different Ansys products is available here: Native GPU Accelerator Capabilities

      Fluent is tested and verified with all the following Nvidia and AMD GPU cards:

      Workstation: RTX A4000, RTX A5000, RTX A6000, RTX 6000 Ada, Quadro RTX 6000
      PROS: These cards are typically affordable and readily available. They can be used for many other applications, including high-end visualization.
      CONS: Compared with high-end server cards, they are slow in double precision and do not offer as much memory.

      Server: A100, H100, Instinct MI210
      PROS: Offer the maximum calculation speed and memory for a single card that is currently available.
      CONS: The cards are expensive and can be difficult to get.

      Server: A40, L40
      PROS: Performance and price are slightly above the high-end workstation cards but much lower than the high-end server cards. For single-precision calculations they are an excellent choice.
      CONS: Compared with the high-end server cards they are slower in double precision.

      *These cards have all been internally tested by the Ansys team. However, the Fluent GPU Solver supports many more GPU cards than those mentioned above. We recommend benchmarking your GPU cards to find the best one for your application.

       

      4. Won’t the (non-recommended) card I already have work just as well as a recommended one?

      If your existing hardware is compatible with CUDA 12.8 or ROCm 6.0, Fluent should run even if the card is not recommended.

      If “non-recommended” means a gaming card, be aware that Ansys does not test gaming cards. Fluent will most likely still run on one as long as it fulfills the minimum requirements. Like the Nvidia L40, L40S, and workstation cards, gaming cards have no native FP64 cores, which reduces the speed of double-precision calculations compared to server cards, even when the GPU generation and number of streaming multiprocessors (SMs) are identical.

       

       

      5. Assuming I use a recommended GPU card, how much faster can I expect my simulations to run?

      You can get an impression of the possible speed-up from the difference in theoretical single-precision or double-precision computation speed and the difference in memory bandwidth. Which of these categories has more impact depends on the specific calculation.
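      As a rough back-of-the-envelope: when a solve is memory-bound, run time scales approximately with memory bandwidth. A minimal sketch under that assumption (the function is ours, and real speed-ups depend on the specific calculation and models):

```python
# Rough speed-up estimate for a memory-bandwidth-bound solve, assuming
# run time scales inversely with memory bandwidth (an assumption; the
# actual speed-up depends on the specific calculation).
def bandwidth_speedup(bw_new_gbs, bw_old_gbs):
    return bw_new_gbs / bw_old_gbs

# Example with bandwidth values from the card table:
# RTX A4000 (448 GB/s) -> H100 SXM (3350 GB/s)
print(round(bandwidth_speedup(3350, 448), 1))  # roughly 7.5x
```

      For compute-bound portions of a solve, compare the FP32 or FP64 TFLOPS columns instead.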

       

      6. I have only a mid-range budget. Can you recommend a card for me?

      Any graphics card that fulfills the minimum requirements is faster than the fastest workstation CPU of the same generation. Obviously, you still need a CPU, but it can be a cheaper one when combined with a powerful graphics card.

      Consider the size of your models and select a card or multiple cards that have enough memory to run your simulations. In most cases you benefit from a higher memory bandwidth.

      The Nvidia L40 and L40S come with a slightly higher price tag than the high-end workstation graphics cards. Speed and memory are also slightly higher. Both high-end workstation cards and the visualization server cards are good choices for a limited budget.

       

       

      7. If you had to recommend one, all-around best card for most situations, which would it be?

      If the computer is used for computation and visualization, a card like the L40 or L40S is an interesting choice because compared to A100, H100, and H200 it is affordable and has hardware for visualization. The A100, H100, and H200 are significantly faster for computations in double precision but lack visualization capabilities.

      If the computer is only used for computation, A100, H100, and H200 are good choices from Nvidia. These are different generations of the same class of cards. Support for Nvidia hardware is spread widely across many different software packages.

      The AMD Instinct cards are very compelling in terms of computation speed and memory. If you also plan to use other GPU-based software products besides Fluent, first check whether they support AMD hardware.

       

       

      8. What if I want to use Ansys cloud solutions instead of buying my own GPU hardware?

      “Ansys Access on Microsoft Azure” and “Ansys Gateway powered by AWS” offer single instances of one or multiple GPUs to run Fluent jobs on different configurations. Contact us to discuss if one of these offers is suitable for your needs.

       

      9. Can you recommend a card for specific models?

      There are so many possible model combinations that we cannot recommend a specific card without detailed context.
      The most important question to consider is the requirement of double precision. If this is needed, the high-end server products are more appealing despite their price tag. When judging the need for double precision, consider benchmarking your model with the GPU solver in single and double precision for accuracy and speed. Due to the different architecture of the solver, you might be able to run in single precision even when the CPU solver requires double precision.

      The second question is about the required memory. Every additional model adds to the memory consumption. Again, benchmarking can help you find the optimal amount of memory that is needed for your applications. Remember that it is not necessary that the model fits into a single card. You can distribute it across multiple GPU cards in one or multiple computers.
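      The card-count arithmetic implied above can be sketched as follows. The 1.2 overhead factor is our own placeholder assumption, not an Ansys figure; benchmark your case to find the real overhead.

```python
import math

def cards_needed(model_memory_gb, card_memory_gb, overhead_factor=1.2):
    """Smallest number of identical GPU cards whose combined memory
    holds the model plus computation overhead. The 1.2 overhead
    factor is an assumed placeholder, not an Ansys figure."""
    return math.ceil(model_memory_gb * overhead_factor / card_memory_gb)

# Example: a model needing ~180 GB on 48 GB cards (e.g. L40-class)
print(cards_needed(180, 48))
```

      Keep in mind that the system memory rule still applies: each computer should have at least as much RAM as the sum of its GPU memory.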

       

      10. Benchmark before buying

      Avoid surprises. GPU cards are bespoke hardware that require a much closer look at benefits and tradeoffs. Hopefully this document has given you a better understanding of what to look for in a GPU. However, nothing beats running your case on a GPU to gain experience. If you have absolutely no GPUs available, at least run the Fluent GPU solver on your CPU-based system (use -gpu=-1). This verifies that the models you need are available in the GPU solver and gives an accurate RAM estimate.
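      A minimal sketch of such a dry run, assuming a standard Fluent installation on the PATH; the journal file name is hypothetical and the exact flags may vary by version:

```shell
# Launch the Fluent GPU solver on a CPU-only machine to verify model
# availability and get a RAM estimate. "-gpu=-1" disables GPU offload,
# "-t8" requests 8 CPU processes, and "journal.jou" is a hypothetical
# journal file containing your case setup.
fluent 3ddp -gpu=-1 -t8 -i journal.jou
```

      Watch the reported memory usage during this run to size the GPU memory you will actually need.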

       

      11. Are there any other resources I can learn from?

      Yes, please reference the below resources to learn more: