General

What can I do if I observe parallel performance issues with ANSYS CFX on new hardware?

    • FAQFAQ
      Participant

      We have observed this behavior from time to time on newer machines. It was related to the CPU binding (affinity) of the solver processes. Some systems allow a process to migrate between processors at run time, which can degrade performance. To ensure that each process is bound to a specific CPU, add the following option on the command line (available from ANSYS CFX 18.0 onward):

          cfx5solve -def name.def -part 24 -affinity explicit …..

      For the CFX solver, if process affinity is not set explicitly, the affinity settings applied by MPI are used. In many cases these are sensible, but in some cases they are not optimal (for example, processes may not be bound to cores), and in a few cases they are incorrect and cause slow operation.

      To display the affinity settings for diagnostic purposes, use the expert parameter setting ‘affinity diagnostics option = 3’. If the solver is run from the command line with cfx5solve, an expert parameter can be set by specifying a CCL file on the command line. For example, create a file named expert.ccl containing:

          FLOW:
            EXPERT PARAMETERS:
              affinity diagnostics option = 3
            END
          END

      and specify it on the cfx5solve command line along with the usual options:

          cfx5solve -ccl expert.ccl …….

      Alternatively, undocumented expert parameters such as this one can be set in CFX-Pre by creating the file described above and importing it with ‘File -> Import -> CCL’.
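The two steps above (create the CCL file, then pass it to the solver) can be scripted. The following is only a minimal sketch: the helper names `write_expert_ccl` and `build_solver_command` are hypothetical, the CCL text and the option names (`-def`, `-part`, `-affinity explicit`, `-ccl`) are taken from the post, and `name.def` and the partition count are placeholders. It only assembles the command; it does not launch the solver.

```python
from pathlib import Path
import shlex

# CCL content as given in the post; the indentation is cosmetic.
EXPERT_CCL = """\
FLOW:
  EXPERT PARAMETERS:
    affinity diagnostics option = 3
  END
END
"""


def write_expert_ccl(directory="."):
    """Write expert.ccl (file name from the post) into `directory`."""
    path = Path(directory) / "expert.ccl"
    path.write_text(EXPERT_CCL)
    return path


def build_solver_command(def_file, partitions, ccl_file):
    """Assemble a cfx5solve argument list using only options named in the post.

    Any further options your run normally needs are not covered here.
    """
    return [
        "cfx5solve",
        "-def", str(def_file),
        "-part", str(partitions),
        "-affinity", "explicit",
        "-ccl", str(ccl_file),
    ]


if __name__ == "__main__":
    ccl = write_expert_ccl()
    # Print the command for inspection instead of running it.
    print(shlex.join(build_solver_command("name.def", 20, ccl)))
```

Printing the command first makes it easy to check it against your usual job script before adding the remaining options and submitting the run.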
      Running the solver with the expert parameter described above displays output similar to the following in the CFX solver output file:

          Part Processing Elements – Initial (ordered CPU/core/HT)
             1 11111111111111111111
             2 11111111111111111111
             3 11111111111111111111
             4 11111111111111111111
             5 11111111111111111111
             6 11111111111111111111
             7 11111111111111111111
             8 11111111111111111111
             9 11111111111111111111
            10 11111111111111111111
            11 11111111111111111111
            12 11111111111111111111
            13 11111111111111111111
            14 11111111111111111111
            15 11111111111111111111
            16 11111111111111111111
            17 11111111111111111111
            18 11111111111111111111
            19 11111111111111111111
            20 11111111111111111111

      In the example above, any partition can run on any core. The solver will still run normally, but this affinity setting may not give optimal performance.

      Displaying the affinity can also highlight problems, for example:

          Part Processing Elements – Initial (ordered CPU/core/HT)
             1 11000000000000000000
             2 11000000000000000000
             3 11000000000000000000
             4 11000000000000000000
             5 11000000000000000000
             6 11000000000000000000
             7 11000000000000000000
             8 11000000000000000000
             9 11000000000000000000
            10 11000000000000000000
            11 11000000000000000000
            12 11000000000000000000
            13 11000000000000000000
            14 11000000000000000000
            15 11000000000000000000
            16 11000000000000000000
            17 11000000000000000000
            18 11000000000000000000
            19 11000000000000000000
            20 11000000000000000000

      The example above shows all 20 partitions restricted to just 2 of the cores. This is a rare situation and indicates a CPU binding problem. The solver will run extremely slowly in this case.

      Process affinity can be set explicitly with the cfx5solve command line option, for example:

          cfx5solve -def name.def -part 20 -affinity explicit …..
      Typical output from this would be:

          Part Processing Elements – Revised (ordered CPU/core/HT)
             1 10000000000000000000
             2 00000000001000000000
             3 01000000000000000000
             4 00000000000100000000
             5 00100000000000000000
             6 00000000000010000000
             7 00010000000000000000
             8 00000000000001000000
             9 00001000000000000000
            10 00000000000000100000
            11 00000100000000000000
            12 00000000000000010000
            13 00000010000000000000
            14 00000000000000001000
            15 00000001000000000000
            16 00000000000000000100
            17 00000000100000000000
            18 00000000000000000010
            19 00000000010000000000
            20 00000000000000000001

      The example above shows one partition assigned to each core as a result of the -affinity explicit option. In most cases, this will give optimal performance. A warning is shown in the solver output file if process affinity was requested but could not be set. A warning is also shown if hyperthreading is enabled; it is recommended that hyperthreading be disabled.
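For runs with many partitions it can be easier to check the affinity table with a small script than by eye. The sketch below is an illustration, not an ANSYS tool: the function names and the three labels ("unbound", "oversubscribed", "pinned") are made up here, and it assumes you pass in only the table excerpt, with one "partition number, 0/1 mask" pair per line as in the examples in this post. The exact layout in your solver output file may differ.

```python
import re

# One table row: a partition number followed by a string of 0/1 flags.
ROW = re.compile(r"^\s*(\d+)\s+([01]+)\s*$")


def parse_affinity_table(text):
    """Return {partition number: mask string} for every row found in `text`."""
    masks = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            masks[int(match.group(1))] = match.group(2)
    return masks


def classify(masks):
    """Label a parsed table with one of three illustrative categories."""
    if not masks:
        return "no table found"
    # Set of processing-element indices each partition is allowed to run on.
    allowed = {
        part: {i for i, flag in enumerate(mask) if flag == "1"}
        for part, mask in masks.items()
    }
    cores_in_use = set().union(*allowed.values())
    one_core_each = all(len(cores) == 1 for cores in allowed.values())
    if one_core_each and len(cores_in_use) == len(allowed):
        return "pinned"          # one partition per core, no sharing
    if len(cores_in_use) < len(allowed):
        return "oversubscribed"  # more partitions than cores they may use
    return "unbound"             # enough cores, but partitions may migrate


if __name__ == "__main__":
    sample = """\
    1 1100
    2 1100
    3 1100
    4 1100
    """
    print(classify(parse_affinity_table(sample)))
```

With the three tables shown in this post, the first falls into the "unbound" case, the second into "oversubscribed", and the third into "pinned".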