Fluids

Topics related to Fluent, CFX, Turbogrid and more.

Ways to accelerate a CFD-DEM simulation

    • parsaary
      Subscriber

      Hi,

      I need some help speeding up a two-way coupled Fluent-Rocky simulation running on an HPC cluster. I'm currently using a single node, since I think Rocky doesn't work across multiple nodes. The current allocation is 4 GPUs for Rocky, 20 CPUs for Fluent, and the remaining 4 CPUs in the node are left free to handle data transfer between Rocky and Fluent. I don't know what the bottleneck is, but I have pasted below an excerpt from the Fluent and Rocky logs at the same flow time (a rough way to compare the timestamps in these excerpts is sketched after the logs). Thanks for your help!

      Fluent log:

      ...Signal received from Rocky.

      ...Reading Rocky data.

      ...Rocky data read message sent!

      Flow time = 0.0005170320000002374s, time step = 25060

      ********************************************************************

      Elapsed time = 4099.916666666666 s

      ********************************************************************

       

      /solve/dual-time-iterate 1 50

      Updating solution at time level N...

      done.

       

      iter  continuity     u-water  u-particle     v-water  v-particle     w-water  w-particle     k-water  k-particle   eps-water  eps-partic  vf-particl     time/iter

      30380  2.4872e-06  3.0265e-07  0.0000e+00  2.1331e-07  0.0000e+00  3.5284e-08  0.0000e+00  2.4211e-05  0.0000e+00  1.6416e-09  0.0000e+00  1.1737e-05  0:10:07   50

       

      ...ReceiveFluentDataReadMessage Received

      ...Exporting Fluent flow data to Rocky!

      ...Exporting Reference Density = 1.23e+00

      ...Fluent flow data written to Rocky!

      ...Fluent data sent to Rocky! 30381  2.5017e-06  3.0370e-07  0.0000e+00  2.1372e-07  0.0000e+00  3.5308e-08  0.0000e+00  2.4211e-05  0.0000e+00  1.6415e-09  0.0000e+00  1.1737e-05  0:09:55   49

      !30381 solution is converged

       

      ...Signal received from Rocky.

      ...Reading Rocky data.

      ...Rocky data read message sent!

      Flow time = 0.0005170492000002374s, time step = 25061

      ********************************************************************

      Elapsed time = 4100.716666666667 s

       

      Rocky Log:

      message date="2025-10-23 17:09:04" up_time="11:19:36.149"
      CFDCoupling: Send CFD data
      Rocky current time = 5.17032e-05 iteration: 24048 last output time: 1.72172e-05
      CFD flow time: 1.72e-08 iteration: 0

      message date="2025-10-23 17:09:04" up_time="11:19:36.150"
      Sending fluent message

      message date="2025-10-23 17:09:08" up_time="11:19:40.675"
      Send Rocky Data Write Message.

      message date="2025-10-23 17:09:08" up_time="11:19:40.729"
      Waiting for fluent message

      message date="2025-10-23 17:09:16" up_time="11:19:48.482"
      Fluent write data message received.

      message date="2025-10-23 17:09:17" up_time="11:19:49.283"
      Sent Fluent Data Read Message.

      message date="2025-10-23 17:09:17" up_time="11:19:49.283"
      CFDCoupling: Received CFD data. Target DT = 1.72e-08

      message date="2025-10-23 17:09:17" up_time="11:19:49.535"
      CFDCoupling: Send CFD data
      Rocky current time = 5.17204e-05 iteration: 24056 last output time: 1.72172e-05
      CFD flow time: 1.72e-08 iteration: 0
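
      One way to see which side is actually waiting is to difference the timestamps that are already in these two excerpts: the "Elapsed time" lines in the Fluent transcript and the date="..." stamps on the Rocky messages. The short Python sketch below does just that; it is only an illustration, and the file names fluent.trn and rocky.log are placeholders for wherever your own transcripts are written, not names taken from this thread. Whichever side shows long gaps while the other reports little work over the same interval is the one holding up the coupling.

      # Rough, unofficial helper: difference the timestamps in the Fluent and Rocky
      # transcripts to see where the wall-clock time per coupling exchange goes.
      # "fluent.trn" and "rocky.log" are placeholder file names.
      import re
      from datetime import datetime

      def fluent_elapsed_times(path="fluent.trn"):
          """Return the Fluent 'Elapsed time' values in order of appearance."""
          times = []
          with open(path) as f:
              for line in f:
                  m = re.search(r"Elapsed time\s*=\s*([\d.]+)\s*s", line)
                  if m:
                      times.append(float(m.group(1)))
          return times

      def rocky_message_gaps(path="rocky.log"):
          """Return (timestamp, seconds since previous message) for each Rocky entry."""
          stamps = []
          with open(path) as f:
              for line in f:
                  m = re.search(r'date="([\d\- :]+)"', line)
                  if m:
                      stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
          return [(t, (t - p).total_seconds()) for p, t in zip(stamps, stamps[1:])]

      if __name__ == "__main__":
          elapsed = fluent_elapsed_times()
          # Wall time Fluent reports between consecutive coupled time steps
          print("Fluent s per step:", [round(b - a, 3) for a, b in zip(elapsed, elapsed[1:])])
          # Gaps between Rocky coupling messages; a long gap right after
          # "Sending fluent message" means Rocky is waiting on Fluent, and vice versa.
          for stamp, gap in rocky_message_gaps():
              print(stamp, f"+{gap:.1f} s")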

       

       

    • Jackson Gomes
      Ansys Employee

       

      Dear Parsary,

      If the bottleneck is on the Fluent side, you can allocate more CPU cores and even run it in distributed mode (a minimal example of a distributed launch command is sketched below). If the bottleneck is on the Rocky side, you can allocate more GPUs to increase performance. For additional details on Rocky performance and hardware recommendations, please refer to the Rocky GPU Buying Guide available in the Ansys Knowledge resources.

      Hope this helps! 

      If you’d like to explore more learning materials about Rocky software, you can find them here: https://innovationspace.ansys.com/ais-rocky/

       

      Warm Regards,

      Jackson
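
      For reference, a distributed-parallel Fluent launch from a cluster job script usually has the shape sketched below (composed here in Python purely for illustration). The 3ddp solver name and the -t, -cnf=, -g and -i options are standard Fluent launcher flags, but the core count, hosts file and journal name are placeholders, so check them against your own installation and your scheduler's documentation.

      # Illustrative only: the shape of a distributed-parallel Fluent launch line.
      # The core count, hosts file and journal name below are placeholders.
      ncores = 40                   # e.g. 20 cores on each of 2 CPU nodes
      hosts_file = "hosts.txt"      # machine file listing the compute nodes to use
      journal = "run_coupled.jou"   # journal that sets up and starts the coupled run

      launch = " ".join([
          "fluent", "3ddp",         # double-precision 3D solver
          f"-t{ncores}",            # total number of Fluent processes
          f"-cnf={hosts_file}",     # distribute those processes over the listed hosts
          "-g",                     # run without the GUI
          "-i", journal,            # read solver commands from the journal file
      ])
      print(launch)                 # paste the printed line into your job script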

       

       

      • parsaary
        Subscriber

        Hi Jackson,

         

        Thank you very much for your help. I think the bottleneck is the Fluent portion of the simulation. Usually I run a simulation on 1 HPC node (each node has 4 GPUs, which I allocate to Rocky, and 24 CPUs, 20 of which I allocate to Fluent in local parallel, leaving 4 CPUs unrequested so they can handle data transfer, etc.).

         

        However, am I correct in understanding that Rocky cannot run multi-node? In that case, would I only be using the GPUs from node 1 for Rocky? For example, when using 2 of my usual HPC nodes, 40 CPUs (20 per node) would go to Fluent, 8 CPUs (4 per node) would be left unrequested (data transfer, etc.), but still only 4 GPUs (from one node) could be requested for Rocky, leaving the 4 GPUs on node 2 unused...

         

        If my understanding above is correct and only GPUs from one node can be used for Rocky, would it be possible to use only one such HPC node (with both GPUs and CPUs) and request the second node as a plain CPU node? I ask because my university has many more CPU nodes than CPU+GPU nodes, so queue times would be much shorter if I could employ this type of parallel distribution. Thank you very much for your help.

         

        Best regards,

         

        Parsa
