Fluids

Topics related to Fluent, CFX, Turbogrid and more.

Ways to accelerate a CFD-DEM simulation

    • parsaary
      Subscriber

      Hi,

      I need some help speeding up a two-way coupled Fluent-Rocky simulation running on an HPC cluster. I'm currently using a single node, since I think Rocky doesn't work across multiple nodes. The current allocation is 4 GPUs for Rocky, 20 CPUs for Fluent, and the remaining 4 CPUs in the node are left free to handle data transfer between Rocky and Fluent. I don't know what the bottleneck is, but I have pasted below an excerpt from the Fluent and Rocky logs at the same flow time (a rough way to compare the timestamps in these excerpts is sketched after the logs). Thanks for your help!

      Fluent log:

      ...Signal received from Rocky.

      ...Reading Rocky data.

      ...Rocky data read message sent!

      Flow time = 0.0005170320000002374s, time step = 25060

      ********************************************************************

      Elapsed time = 4099.916666666666 s

      ********************************************************************

       

      /solve/dual-time-iterate 1 50

      Updating solution at time level N...

      done.

       

      iter  continuity     u-water  u-particle     v-water  v-particle     w-water  w-particle     k-water  k-particle   eps-water  eps-partic  vf-particl     time/iter

      30380  2.4872e-06  3.0265e-07  0.0000e+00  2.1331e-07  0.0000e+00  3.5284e-08  0.0000e+00  2.4211e-05  0.0000e+00  1.6416e-09  0.0000e+00  1.1737e-05  0:10:07   50

       

      ...ReceiveFluentDataReadMessage Received

      ...Exporting Fluent flow data to Rocky!

      ...Exporting Reference Density = 1.23e+00

      ...Fluent flow data written to Rocky!

      ...Fluent data sent to Rocky! 30381  2.5017e-06  3.0370e-07  0.0000e+00  2.1372e-07  0.0000e+00  3.5308e-08  0.0000e+00  2.4211e-05  0.0000e+00  1.6415e-09  0.0000e+00  1.1737e-05  0:09:55   49

      !30381 solution is converged

       

      ...Signal received from Rocky.

      ...Reading Rocky data.

      ...Rocky data read message sent!

      Flow time = 0.0005170492000002374s, time step = 25061

      ********************************************************************

      Elapsed time = 4100.716666666667 s

       

      Rocky Log:

      message date="2025-10-23 17:09:04" up_time="11:19:36.149"
      CFDCoupling: Send CFD data
      Rocky current time = 5.17032e-05 iteration: 24048 last output time: 1.72172e-05
      CFD flow time: 1.72e-08 iteration: 0

      message date="2025-10-23 17:09:04" up_time="11:19:36.150"
      Sending fluent message

      message date="2025-10-23 17:09:08" up_time="11:19:40.675"
      Send Rocky Data Write Message.

      message date="2025-10-23 17:09:08" up_time="11:19:40.729"
      Waiting for fluent message

      message date="2025-10-23 17:09:16" up_time="11:19:48.482"
      Fluent write data message received.

      message date="2025-10-23 17:09:17" up_time="11:19:49.283"
      Sent Fluent Data Read Message.

      message date="2025-10-23 17:09:17" up_time="11:19:49.283"
      CFDCoupling: Received CFD data. Target DT = 1.72e-08

      message date="2025-10-23 17:09:17" up_time="11:19:49.535"
      CFDCoupling: Send CFD data
      Rocky current time = 5.17204e-05 iteration: 24056 last output time: 1.72172e-05
      CFD flow time: 1.72e-08 iteration: 0
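
      One way to see which side is actually waiting is to difference the timestamps that are already in these two excerpts: the "Elapsed time" lines in the Fluent transcript and the date="..." stamps on the Rocky messages. The short Python sketch below does just that; it is only an illustration, and the file names fluent.trn and rocky.log are placeholders for wherever your own transcripts are written, not names taken from this thread. Whichever side shows long gaps while the other reports little work over the same interval is the one holding up the coupling.

      # Rough, unofficial helper: difference the timestamps in the Fluent and Rocky
      # transcripts to see where the wall-clock time per coupling exchange goes.
      # "fluent.trn" and "rocky.log" are placeholder file names.
      import re
      from datetime import datetime

      def fluent_elapsed_times(path="fluent.trn"):
          """Return the Fluent 'Elapsed time' values in order of appearance."""
          times = []
          with open(path) as f:
              for line in f:
                  m = re.search(r"Elapsed time\s*=\s*([\d.]+)\s*s", line)
                  if m:
                      times.append(float(m.group(1)))
          return times

      def rocky_message_gaps(path="rocky.log"):
          """Return (timestamp, seconds since previous message) for each Rocky entry."""
          stamps = []
          with open(path) as f:
              for line in f:
                  m = re.search(r'date="([\d\- :]+)"', line)
                  if m:
                      stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
          return [(t, (t - p).total_seconds()) for p, t in zip(stamps, stamps[1:])]

      if __name__ == "__main__":
          elapsed = fluent_elapsed_times()
          # Wall time Fluent reports between consecutive coupled time steps
          print("Fluent s per step:", [round(b - a, 3) for a, b in zip(elapsed, elapsed[1:])])
          # Gaps between Rocky coupling messages; a long gap right after
          # "Sending fluent message" means Rocky is waiting on Fluent, and vice versa.
          for stamp, gap in rocky_message_gaps():
              print(stamp, f"+{gap:.1f} s")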

       

       

    • Jackson Gomes
      Ansys Employee

       

      Dear Parsary,

      If the bottleneck is on the Fluent side, you can allocate more CPU cores and even run it in distributed mode (a minimal example of a distributed launch command is sketched below). If the bottleneck is on the Rocky side, you can allocate more GPUs to increase performance. For additional details on Rocky performance and hardware recommendations, please refer to the Rocky GPU Buying Guide available in the Ansys Knowledge resources.

      Hope this helps! 

      If you’d like to explore more learning materials about Rocky software, you can find them here: https://innovationspace.ansys.com/ais-rocky/

       

      Warm Regards,

      Jackson
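
      For reference, a distributed-parallel Fluent launch from a cluster job script usually has the shape sketched below (composed here in Python purely for illustration). The 3ddp solver name and the -t, -cnf=, -g and -i options are standard Fluent launcher flags, but the core count, hosts file and journal name are placeholders, so check them against your own installation and your scheduler's documentation.

      # Illustrative only: the shape of a distributed-parallel Fluent launch line.
      # The core count, hosts file and journal name below are placeholders.
      ncores = 40                   # e.g. 20 cores on each of 2 CPU nodes
      hosts_file = "hosts.txt"      # machine file listing the compute nodes to use
      journal = "run_coupled.jou"   # journal that sets up and starts the coupled run

      launch = " ".join([
          "fluent", "3ddp",         # double-precision 3D solver
          f"-t{ncores}",            # total number of Fluent processes
          f"-cnf={hosts_file}",     # distribute those processes over the listed hosts
          "-g",                     # run without the GUI
          "-i", journal,            # read solver commands from the journal file
      ])
      print(launch)                 # paste the printed line into your job script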

       

       

      • parsaary
        Subscriber

        Hi Jackson,

         

        Thank you very much for your help. I think the bottleneck is the Fluent portion of the simulation. Usually I run a simulation on 1 HPC node (each node has 4 GPUs, which I allocate to Rocky, and 24 CPUs, 20 of which I allocate to Fluent in local parallel, leaving 4 CPUs unrequested so they can handle data transfer, etc.).

         

        However, am I correct in understanding that Rocky cannot run multi-node? In that case, would I only be using the GPUs from node 1 for Rocky? For example, when using 2 of my usual HPC nodes, 40 CPUs (20 per node) would go to Fluent, 8 CPUs (4 per node) would be left unrequested (data transfer, etc.), but still only 4 GPUs (from one node) could be requested for Rocky, leaving the 4 GPUs on node 2 unused...

         

        If my understanding above is correct and only GPUs from one node can be used for Rocky, would it be possible to use only one such HPC node (with both GPUs and CPUs) and request the second node as a plain CPU node? I ask because my university has many more CPU nodes than CPU+GPU nodes, so queue times would be much shorter if I could employ this type of parallel distribution. Thank you very much for your help.

         

        Best regards,

         

        Parsa
