August 17, 2020 at 2:44 pm
grayg34
Subscriber

When running jobs with a large number of cores, I have encountered crashes caused by a fatal overflow in the linear solver. If I run the same job (same .def file and initial values) on fewer cores, it runs fine. The only difference when using more cores is the linear solution communication method:

 +--------------------------------------------------------------------+
 |                Linear Solution Communication Method                |
 +--------------------------------------------------------------------+
 For improved performance, a multi-step communication method has been automatically enabled for the linear solution. The threshold at which
 this method is enabled is currently 64 partitions.
 Number of collection masters = 8
 Minimum number of collected partitions = 12
 Maximum number of collected partitions = 12

Has anyone encountered similar problems, or is there a solution other than using fewer cores?
August 17, 2020 at 2:53 pm
DrAmine
Ansys Employee

You can increase that threshold to something like 128 by adjusting the expert parameter:

mg minpart parallel collection = 128

Trying this should reveal whether or not the linear solution communication method is the cause.
August 18, 2020 at 3:16 pm
grayg34
Subscriber

I turned off the multi-step communication method as you suggested and reran the case on the larger number of cores (2 compute nodes, 48 cores each). The run did not crash. I am very sure the linear solution communication method is the cause of the problem.
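The numbers quoted in this thread are consistent with a simple threshold check. The following is a minimal sketch in Python, not CFX code: it assumes one partition per core and that the multi-step method switches on once the partition count reaches the threshold (the solver's exact comparison is not stated in the thread). The function name `multi_step_enabled` is hypothetical.

```python
def multi_step_enabled(partitions: int, threshold: int) -> bool:
    """Return True if the multi-step communication method would be enabled.

    Assumption: the method is enabled when the partition count reaches the
    threshold. The thread only states the threshold value, not the exact rule.
    """
    return partitions >= threshold


# Values taken from the thread: 2 compute nodes with 48 cores each.
nodes, cores_per_node = 2, 48
partitions = nodes * cores_per_node  # assumes one partition per core

# The solver log reports 8 collection masters with 12 partitions each,
# which accounts for every partition of the 2-node run.
collection_masters, partitions_per_master = 8, 12
assert collection_masters * partitions_per_master == partitions

# Default threshold (64) versus the raised expert parameter value (128).
for threshold in (64, 128):
    state = "enabled" if multi_step_enabled(partitions, threshold) else "disabled"
    print(f"{partitions} partitions, threshold {threshold}: multi-step {state}")

# A single 48-core node stays below the default threshold.
print("single node:", multi_step_enabled(cores_per_node, 64))
```

Under these assumptions, raising the threshold to 128 is enough to switch the method off for the 96-partition run, while a single 48-core node never triggers it even at the default setting.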
August 19, 2020 at 6:02 am
DrAmine
Ansys Employee

Nice to read that.
August 21, 2020 at 4:03 am
grayg34
Subscriber

I may have been mistaken about the root cause of the problem. The same 2-node run with the multi-step communication method disabled ran past the point where it crashed before, but it has now crashed at a later time with the same fatal overflow in the linear solver.
August 21, 2020 at 5:58 am
DrAmine
Ansys Employee

Does it run with a lower core count?
August 21, 2020 at 2:26 pm
grayg34
Subscriber

It runs on a single node, which is fewer cores in total. I have not tried running on 2 nodes with fewer cores.

The cluster I am using has 48-core and 32-core nodes. I have been using 48-core nodes exclusively. Would it be a useful troubleshooting step to try a run on 2 nodes with 32 cores per node?

I have access to a different cluster that has 32-core nodes, and my 2-node jobs there haven't encountered the same problems. The cases I am running on both clusters are very similar.
August 21, 2020 at 2:26 pm
grayg34
Subscriber

One other thing just came to mind, though I am not sure it is relevant: I have increased the ILURES memory factor. I have the same value set regardless of the number of nodes or cores.
- The topic ‘CFX solver crashing: multi-step linear communication method’ is closed to new replies.