Fluids

Topics related to Fluent, CFX, Turbogrid and more.

CFX solver crashing: multi-step linear communication method

    • grayg34
      Subscriber
      When running jobs on a large number of cores, I have encountered crashes caused by a fatal overflow in the linear solver. If I run the same job (same .def file and initial values) on fewer cores, it runs fine. The only difference when using more cores is the linear solution communication method:

       +--------------------------------------------------------------------+
       |                Linear Solution Communication Method                |
       +--------------------------------------------------------------------+

       For improved performance, a multi-step communication method has been
       automatically enabled for the linear solution. The threshold at which
       this method is enabled is currently 64 partitions.

       Number of collection masters = 8
       Minimum number of collected partitions = 12
       Maximum number of collected partitions = 12

      Has anyone encountered similar problems? Is there a solution besides using fewer cores?
    • DrAmine
      Ansys Employee
      You can increase that number to something like 128 by adjusting the expert parameter:

      mg minpart parallel collection = 128

      Trying this should reveal whether or not the linear solution communication method is the cause.
    • grayg34
      Subscriber
      I turned off the multi-step communication method as you suggested and reran the case on the larger number of cores (2 compute nodes, 48 cores each). The run did not crash. I am very sure the linear solution communication method is the cause of the problem.
    • DrAmine
      Ansys Employee
      Nice to read that.
    • grayg34
      Subscriber
      I may have been mistaken about the root cause of the problem. The same 2-node run with the multi-step communication disabled ran past the point where it crashed before, but it has now crashed with the same fatal overflow in the linear solver at a later time.
    • DrAmine
      Ansys Employee
      Does it run with a lower core count?
    • grayg34
      Subscriber
      It runs on a single node, which is fewer cores in total. I have not tried running on 2 nodes with fewer cores.

      The cluster I am using has 48-core and 32-core nodes. I have been using 48-core nodes exclusively. Would it be a useful troubleshooting step to try a run on 2 nodes with 32 cores per node?

      I have access to a different cluster that has 32-core nodes, and my 2-node jobs there haven't encountered the same problems. The cases I am running on both clusters are very similar.
    • grayg34
      Subscriber
      One other thing just came to mind, though I am not sure if it is relevant: I have increased the ILURES memory factor. I have the same value set regardless of the number of nodes or cores.
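
The partition counts discussed in this thread can be worked out with a short script, which may help when deciding which node layout to test next. This is only a sketch, not CFX code: the function names and the `LAYOUTS` table are hypothetical, and it assumes one partition per core. The solver output states only that the threshold "is currently 64 partitions", so whether the multi-step method switches on at exactly 64 is unknown, and that case is reported separately.

```python
# Node layouts mentioned in the thread: (nodes, cores per node).
LAYOUTS = {
    "1 x 48": (1, 48),
    "2 x 48": (2, 48),
    "2 x 32": (2, 32),
}

DEFAULT_THRESHOLD = 64   # from the solver output quoted in the first post
RAISED_THRESHOLD = 128   # value suggested for 'mg minpart parallel collection'


def partition_count(nodes: int, cores_per_node: int) -> int:
    """Total partitions, assuming one partition per core."""
    return nodes * cores_per_node


def relation_to_threshold(partitions: int, threshold: int) -> str:
    """Classify a partition count relative to a threshold.

    The 'at' case is kept separate because the thread does not say
    whether the method is enabled at exactly the threshold.
    """
    if partitions > threshold:
        return "above"
    if partitions == threshold:
        return "at"
    return "below"


report = {
    name: (
        partition_count(*layout),
        relation_to_threshold(partition_count(*layout), DEFAULT_THRESHOLD),
        relation_to_threshold(partition_count(*layout), RAISED_THRESHOLD),
    )
    for name, layout in LAYOUTS.items()
}

for name, (parts, vs_default, vs_raised) in report.items():
    print(f"{name}: {parts} partitions, {vs_default} 64, {vs_raised} 128")

# Consistency check on the quoted output: 8 collection masters with
# 12 collected partitions each match the partition count of the 2 x 48 run.
assert 8 * 12 == partition_count(2, 48)
```

Under these assumptions, the 2 x 48 run (96 partitions) is the only layout clearly above the default threshold, and raising the threshold to 128 puts it below. The 2 x 32 layout lands exactly on 64, so the solver output of such a run would need to be checked for the "Linear Solution Communication Method" banner before drawing conclusions from it.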