August 25, 2021 at 2:55 pm
sleong
Subscriber
Our HPC cluster uses the IBM LSF scheduler. Jobs run in a Docker container with Intel MPI across multiple nodes. The problem is that our cluster nodes do not allow SSH access. Is there a way to disable SSH for Ansys Fluent?
August 27, 2021 at 11:05 am
ANSYS_MMadore
Ansys Employee
Please set the following two system environment variables.
In Bash Shell:
export FLUENT_SSH=blaunch
export SCHEDULER_RSH=1
In C Shell:
setenv FLUENT_SSH blaunch
setenv SCHEDULER_RSH 1
You also have to use -scheduler_tight_coupling on your command line.
If you aren't using blaunch, you could also try adding SSH_SPAWN=0 -pcheck=0 to the fluent command. You still need -scheduler_tight_coupling on the command line.
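As a rough sketch of how the blaunch settings above might sit in a job script: the block below exports the two variables and assembles a launch line. The `3ddp` solver mode, the `-g`, `-t`, and `-i` options, and the journal name `run.jou` are illustrative assumptions and not part of the reply; only `FLUENT_SSH`, `SCHEDULER_RSH`, and `-scheduler_tight_coupling` come from it. The script only prints the command, so adapt it before using it in a real LSF submission.

```shell
#!/bin/bash
# Settings from the reply above: have Fluent start remote processes
# through LSF's blaunch instead of SSH.
export FLUENT_SSH=blaunch
export SCHEDULER_RSH=1

# Assemble the launch line. "3ddp", "-g", "-t" and "-i" are assumed,
# illustrative launcher options; run.jou is a hypothetical journal file.
build_fluent_cmd() {
    local nprocs="$1" journal="$2"
    printf 'fluent 3ddp -g -t%s -scheduler_tight_coupling -i %s\n' "$nprocs" "$journal"
}

# LSB_DJOB_NUMPROC is set by LSF inside a job; default to 1 elsewhere.
build_fluent_cmd "${LSB_DJOB_NUMPROC:-1}" run.jou
```

Printing the command first makes it easy to confirm the flags before the job is actually submitted.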
September 13, 2021 at 2:09 pm
sleong
Subscriber
Hi mmadore, thank you very much for the solution; I can submit to our cluster now. However, I now get the error "Received signal SIGSEGV.". A test job on 2 nodes occasionally runs to completion, but most of the time it fails with that error. Jobs on more than 2 nodes always fail with it. How can I fix the problem?
Building...
mesh
auto partitioning mesh by Metis (fast) distributing mesh
parts..................................................
==============================================================================
Node 47: Process 24: Received signal SIGSEGV.
==============================================================================
==============================================================================
Node 45: Process 22: Received signal SIGSEGV.
==============================================================================
==============================================================================
Node 44: Process 21: Received signal SIGSEGV.
==============================================================================
==============================================================================
Node 42: Process 19: Received signal SIGSEGV.
==============================================================================
*** Error in `/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/3ddp_node/fluent_mpi.21.1.0': double free or corruption (fasttop): 0x0000000007c26520 ***
*** Error in `/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/3ddp_node/fluent_mpi.21.1.0': double free or corruption (fasttop): 0x000000000692a870 ***
*** Error in `/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/3ddp_node/fluent_mpi.21.1.0': double free or corruption (fasttop): 0x000000000566beb0 ***
*** Error in `/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/3ddp_node/fluent_mpi.21.1.0': double free or corruption (fasttop): 0x00000000057102b0 ***
*** Error in `/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/3ddp_node/fluent_mpi.21.1.0': double free or corruption (fasttop): 0x00000000058c4740 ***
======= Backtrace: =========
======= Backtrace: =========
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x7fc52cba4299]
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x7f95ced34299]
/lib64/libc.so.6(+0x81299)[0x7f2134f55299]
/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/syslib/libstdc++.so.6(_ZNSsD1Ev+0x3e)[0x7f2139c6bede]
/lib64/libc.so.6(+0x81299)[0x7f3d58ee1299]
/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/syslib/libstdc++.so.6(_ZNSsD1Ev+0x3e)[0x7f3d5dbf7ede]
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x7f1a1e11e299]
/export/ansys21/v211/fluent/lib/lnamd64/libansysfluidssettingsparsers.so(_ZN5ansys21GenericSettingsParserD2Ev+0x46)[0x7f1a2f8ebcd6]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f1a1e0d705a]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f2134f0e05a]
/export/ansys21/v211/fluent/lib/lnamd64/libansysfluidsproject.so(+0x37e43)[0x7f2146207e43]
======= Memory map: ========
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f3d58e9a05a]
/export/ansys21/v211/fluent/lib/lnamd64/libansysfluidsproject.so(+0x37e43)[0x7f3d6a193e43]
======= Memory map: ========
/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/syslib/libstdc++.so.6(_ZNSsD1Ev+0x3e)[0x7fc5318baede]
/lib64/libc.so.6(+0x39ce9)[0x7fc52cb5cce9]
/lib64/libc.so.6(+0x39d37)[0x7fc52cb5cd37]
/export/ansys21/v211/fluent/fluent21.1.0/multiport/lnamd64/mpi/shared/libmport.so(+0x87012)[0x7fc53c5b2012]
/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/syslib/libstdc++.so.6(_ZNSsD1Ev+0x3e)[0x7f95d3a4aede]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f95ceced05a]
/export/ansys21/v211/fluent/fluent21.1.0/cortex/lnamd64/libExpr.so(+0xac0e3)[0x7f95df0d60e3]
======= Memory map: ========
/export/ansys21/v211/fluent/lib/lnamd64/libansysfluidsfactory.so(+0x5c63)[0x7f1a2fd94c63]
======= Memory map: ========
/export/ansys21/v211/fluent/fluent21.1.0/multiport/lnamd64/mpi/shared/libmport.so(+0x8712d)[0x7fc53c5b212d]
/export/ansys21/v211/fluent/fluent21.1.0/multiport/lnamd64/mpi/shared/libmport.so(+0x8f1a3)[0x7fc53c5ba1a3]
/lib64/libpthread.so.0(+0x7ea5)[0x7fc539664ea5]
/lib64/libc.so.6(clone+0x6d)[0x7fc52cc2196d]
======= Memory map: ========
*** Error in `/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/3ddp_node/fluent_mpi.21.1.0': double free or corruption (fasttop): 0x0000000007d47910 ***
===============Message from the Cortex Process================================
Fatal error in one of the compute processes.
==============================================================================
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x7f24a88bf299]
/export/ansys21/v211/fluent/fluent21.1.0/lnamd64/syslib/libstdc++.so.6(_ZNSsD1Ev+0x3e)[0x7f24ad5d5ede]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f24a887805a]
/export/ansys21/v211/fluent/lib/lnamd64/libansysfluidsproject.so(+0x37e43)[0x7f24b9b71e43]
======= Memory map: ========
September 13, 2021 at 2:44 pm
ANSYS_MMadore
Ansys Employee
Can you share the full text of the .trn file for review?
September 13, 2021 at 4:37 pm
sleong
Subscriber
Attached please find the trn output.
September 13, 2021 at 5:12 pm
ANSYS_MMadore
Ansys Employee
n32-63  compute1-exec-6.ris.  32/72  Linux-64  9-40  Intel(R) Xeon(R) Gold 6154
n0-31   compute1-exec-98.ris  32/32  Linux-64  9-40  Intel(R) Xeon(R) Gold 6242
host    compute1-exec-98.ris         Linux-64  539   Intel(R) Xeon(R) Gold 6242
exec-6 is a different architecture from exec-98, which runs the host and n0-31, and hyperthreading is enabled on exec-6. Perhaps keep the host on exec-98 and have the compute processes solve only on exec-6?
It looks like you have HT enabled; perhaps try disabling it.
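The mixed-architecture observation can be checked mechanically. The sketch below is a hypothetical helper, not something Fluent provides: it takes the three host-table rows quoted in this reply and lists the distinct CPU models. It assumes every row ends with a model string starting with `Intel(R)`, so it would need adjusting for other vendors.

```shell
#!/bin/bash
# Host table rows copied from the reply above.
host_table='n32-63 compute1-exec-6.ris. 32/72 Linux-64 9-40 Intel(R) Xeon(R) Gold 6154
n0-31 compute1-exec-98.ris 32/32 Linux-64 9-40 Intel(R) Xeon(R) Gold 6242
host compute1-exec-98.ris Linux-64 539 Intel(R) Xeon(R) Gold 6242'

# Keep the text from "Intel(R)" to the end of each line, then de-duplicate.
cpu_models() {
    sed -n 's/.*\(Intel(R).*\)$/\1/p' | sort -u
}

models=$(printf '%s\n' "$host_table" | cpu_models)
count=$(printf '%s\n' "$models" | wc -l | tr -d ' ')

printf '%s\n' "$models"
if [ "$count" -gt 1 ]; then
    echo "mixed CPU models in this job: $count"
fi
```

More than one model in the output suggests the job spans heterogeneous nodes, which is the situation described above.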
Can you try:
-mpi=intel2019
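For illustration, the suggested option is simply appended to the existing launch line. In the sketch below, `with_mpi` is a hypothetical helper and the base command (`3ddp`, `-g`, `-t64`, `-i run.jou`) is an assumed example; only `-mpi=intel2019` and `-scheduler_tight_coupling` come from this thread. The script prints the resulting command without running it.

```shell
#!/bin/bash
# Append an MPI selection to an existing Fluent launch line.
# The default flavor is the one suggested in the reply above.
with_mpi() {
    local cmd="$1" flavor="${2:-intel2019}"
    printf '%s -mpi=%s\n' "$cmd" "$flavor"
}

# Hypothetical base command; adjust the solver mode, process count,
# and journal file to match your own job.
with_mpi "fluent 3ddp -g -t64 -scheduler_tight_coupling -i run.jou"
```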
September 13, 2021 at 5:41 pm
sleong
Subscriber
Hi Mmadore, thank you very much! I tried 2, 4, and 8 nodes with 64, 16, and 144 processes using "-mpi=intel2019", and all of the runs completed successfully without any problem. Thank you very much for your help!
The topic ‘Running Ansys Fluent in a HPC Cluster with LSF scheduler, Intel MPI, and Docker without SSH’ is closed to new replies.