How is parallel processing done in Ansys Fluent?
Parallel processing in Ansys Fluent involves an interaction between Ansys Fluent, a host process, and a set of compute-node processes. Ansys Fluent interacts with the host process and the collection of compute nodes using a utility called cortex that manages Ansys Fluent’s user interface and basic graphical functions.
Parallel Ansys Fluent splits the mesh and data into multiple partitions, then assigns each mesh partition to a different compute process (or node). The number of partitions is equal to or less than the number of processors (or cores) available on your compute cluster. The compute-node processes can be executed on a massively parallel computer, a multiple-CPU workstation, or a network cluster of computers.
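As a rough illustration of the idea of splitting a mesh and assigning one partition per compute process, the following Python sketch divides a range of cell indices into near-equal contiguous blocks. It is not Ansys Fluent's partitioning method: the function names `partition_cells` and `assign_partitions` are hypothetical, and a real partitioner also tries to keep the interfaces between partitions small, which this naive split ignores.

```python
def partition_cells(num_cells, num_partitions):
    """Split cell indices 0..num_cells-1 into contiguous, near-equal blocks."""
    if num_partitions < 1:
        raise ValueError("need at least one partition")
    base, extra = divmod(num_cells, num_partitions)
    partitions = []
    start = 0
    for p in range(num_partitions):
        # The first `extra` partitions take one additional cell each.
        size = base + (1 if p < extra else 0)
        partitions.append(range(start, start + size))
        start += size
    return partitions


def assign_partitions(partitions):
    """Map partition i to a compute node label (hypothetical naming)."""
    return {f"compute-node-{i}": part for i, part in enumerate(partitions)}


# Example: 10 cells spread over 4 compute processes.
parts = partition_cells(num_cells=10, num_partitions=4)
assignment = assign_partitions(parts)
for node, cells in assignment.items():
    print(node, list(cells))
```

Every cell lands in exactly one partition, and partition sizes differ by at most one cell, which is the simplest form of load balancing.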
Generally, as the number of compute nodes increases, turnaround time for solutions will decrease. This is referred to as solver “scalability.” However, beyond a certain point, the ratio of network communication to computation increases, leading to reduced parallel efficiency, so it is important to size the system appropriately for the simulation.
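The trade-off between computation and communication can be made concrete with a toy cost model. The Python sketch below assumes that compute time divides evenly across nodes while communication cost grows linearly with the node count. Both the model and the numbers (`compute_time`, `comm_cost_per_node`) are illustrative assumptions, not measurements of Ansys Fluent.

```python
def turnaround_time(n_nodes, compute_time=100.0, comm_cost_per_node=0.5):
    """Toy model: perfectly divisible compute plus linearly growing communication."""
    return compute_time / n_nodes + comm_cost_per_node * (n_nodes - 1)


def speedup(n_nodes):
    """Serial time divided by parallel time."""
    return turnaround_time(1) / turnaround_time(n_nodes)


def efficiency(n_nodes):
    """Speedup per node; 1.0 would be ideal scaling."""
    return speedup(n_nodes) / n_nodes


for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:3d} nodes: time={turnaround_time(n):7.2f}  "
          f"speedup={speedup(n):5.2f}  efficiency={efficiency(n):4.2f}")
```

Under these assumed costs, efficiency falls steadily as nodes are added, and past a certain node count the turnaround time itself starts to rise again, which is why system sizing matters.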
Ansys Fluent uses a host process that does not store any mesh or solution data. Instead, the host process only interprets commands from Ansys Fluent’s graphics-related interface, cortex.
The host passes those commands over a socket interconnect to a single designated compute node called compute-node-0. This specialized compute node then distributes the host commands to the other compute nodes. Each compute node simultaneously executes the same program on its own data set. Communication from the compute nodes to the host is possible only through compute-node-0 and only when all compute nodes have synchronized with each other.
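The command flow described above (host to compute-node-0, then on to the remaining compute nodes, with results returning along the same path) can be mimicked in a few lines of Python. This is a single-process simulation for illustration only: the class names, the `relay` method, and the two commands are hypothetical and do not correspond to any Ansys Fluent API.

```python
class ComputeNode:
    """One compute process holding its own slice of the data."""

    def __init__(self, node_id, data):
        self.node_id = node_id
        self.data = list(data)

    def execute(self, command):
        # Every node runs the same program, but only on its own data.
        if command == "count-cells":
            return len(self.data)
        if command == "max-value":
            return max(self.data)
        raise ValueError(f"unknown command: {command}")


class ComputeNodeZero(ComputeNode):
    """The designated node that relays host commands to its peers."""

    def __init__(self, data, peers):
        super().__init__(0, data)
        self.peers = peers

    def relay(self, command):
        # Run the command locally, forward it to every peer, and gather
        # all results before anything is handed back to the host.
        results = {self.node_id: self.execute(command)}
        for peer in self.peers:
            results[peer.node_id] = peer.execute(command)
        return results


class Host:
    """Stores no mesh data; it only talks to compute-node-0."""

    def __init__(self, node0):
        self.node0 = node0

    def send(self, command):
        return self.node0.relay(command)


peers = [ComputeNode(1, [4.0]), ComputeNode(2, [0.5, 3.0, 2.0])]
host = Host(ComputeNodeZero([1.5, 2.5], peers))
counts = host.send("count-cells")
maxima = host.send("max-value")
print(counts, maxima)
```

Note that the `Host` object holds a reference only to node 0, mirroring the rule that all host traffic passes through compute-node-0.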
Each compute node is virtually connected to every other compute node, and relies on inter-process communication to perform such functions as sending and receiving arrays, synchronizing, and performing global operations (such as summations over all cells). Inter-process communication is managed by a message-passing library. For example, the message-passing library could be a vendor implementation of the Message Passing Interface (MPI) standard.
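A global operation such as a summation over all cells amounts to each compute node reducing its own cells locally, followed by a combination step that leaves every node with the same total. The Python sketch below imitates that pattern in a single process. In a real parallel run the combination step would be carried out by the message-passing library (in MPI terms, an all-reduce); `local_sum` and `all_reduce_sum` are hypothetical helper names.

```python
def local_sum(cell_values):
    """Partial sum over the cells owned by one compute node."""
    return sum(cell_values)


def all_reduce_sum(per_node_values):
    """Combine one value per node and give every node the global total."""
    total = sum(per_node_values)
    return [total] * len(per_node_values)


# Three partitions of cell values, one per compute node (made-up numbers).
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
partials = [local_sum(p) for p in partitions]
global_sums = all_reduce_sum(partials)
print(partials, global_sums)
```

Only one number per node crosses the interconnect in this pattern, rather than the cell data itself, which keeps the communication volume small relative to the local work.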
Each parallel Ansys Fluent process (as well as the serial process) is identified by a unique integer ID. The host collects messages from compute-node-0 and performs operations (such as printing, displaying messages, and writing to a file) on all of the data, in the same way as the serial solver. You have the option of bypassing the host when reading or writing parallel data files, so that the data is passed directly between the compute nodes and the disk in a parallel fashion. This can reduce the time for data file I/O operations (for details, see Reading and Writing Parallel Data Files).