


Ansys Fluent Running in Parallel issues
Posted 2022-02-22 on the Ansys Innovation Space forum (https://innovationspace.ansys.com/forum/forums/topic/ansys-fluent-running-in-parallel-issues/)

OS: CentOS 7

We use Slurm as our scheduler and have never had issues with MPI. By default our nodes use openmpi3/3.1.4. When requesting nodes I use the command:

salloc -N 3 -n 20 --mem=80G -C ib

This gives me 3 nodes with the InfiniBand feature (-C ib), 80G of memory per node, and 20 tasks (-n counts tasks across the whole allocation, not per node). I then SSH to the host node with X11 forwarding, start Ansys Workbench (runwb2), and load my Fluent Workbench project. When I click "Setup", I change the processing option to "Parallel Per Machine File" with 3 processes. Then I click "Show More Options" -> "Parallel Settings" and set Interconnects = infiniband and MPI Types = openmpi. Finally, I supply the file containing the names of the machines allocated by the salloc command.

This is the console log I receive:

/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent -r19.5.0 3d -pinfiniband -host -alnamd64 -t3 -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -path/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent -ssh -cx node007.hpc.fau.edu:39136:43808

Starting /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/lnamd64/3d_host/fluent.19.5.0 host -cx node007.hpc.fau.edu:39136:43808 "(list (rpsetvar (QUOTE parallel/function) "fluent 3d -flux -node -alnamd64 -r19.5.0 -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "3") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent") (rpsetvar (QUOTE parallel/hostsfile) "/mnt/beegfs/home/testUser/testNode.txt") )"

Welcome to ANSYS Fluent 2019 R3

Copyright 1987-2019 ANSYS, Inc. All Rights Reserved.
Unauthorized use, distribution or duplication is prohibited.
This product is subject to U.S. laws governing export and re-export.
For full Legal Notice, see documentation.

Build Time: Aug 05 2019 15:40:42 EDT Build Id: 10249

*********************************************
Info: Your license enables 4-way parallel execution.
For faster simulations, please start the application with the appropriate parallel options.
*********************************************

-----------------------------------------------------------------
This is an academic version of ANSYS FLUENT. Usage of this product
license is limited to the terms and conditions specified in your ANSYS
license form, additional terms section.
-----------------------------------------------------------------
Host spawning Node 0 on machine "node007.hpc.fau.edu" (unix).
/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/bin/fluent -r19.5.0 3d -flux -node -alnamd64 -t3 -pinfiniband -mpi=openmpi -cnf=/mnt/beegfs/home/testUser/testNode.txt -ssh -mport 10.116.1.7:10.116.1.7:43013:0
Starting fixfiledes /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/multiport/mpi/lnamd64/openmpi/bin/mpirun --mca btl self,vader,mvapi --prefix /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/multiport/mpi/lnamd64/openmpi -x LD_LIBRARY_PATH -x KMP_AFFINITY=disabled -x FLUENT_ARCH=lnamd64 -x FLUENT_PROD_DIR=/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0 -x PYTHONHOME=/opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/../../commonfiles/CPython/2_7_15/linx64/Release/python -np 3 --hostfile /tmp/fluent-appfile.testUser.32486 /opt/ohpc/pub/apps/ansys/2019R3_Fluent/v195/fluent/fluent19.5.0/lnamd64/3d_node/fluent_mpi.19.5.0 node -mpiw openmpi -pic infiniband -mport 10.116.1.7:10.116.1.7:43013:0
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):

Hostname: node007
Requested CQE: 16384
Error: Cannot allocate memory

Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.

You job will continue, but Open MPI will ignore the "ud" oob component
in this run.

Hostname: node007
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):

Hostname: node082
Requested CQE: 16384
Error: Cannot allocate memory

Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.

You job will continue, but Open MPI will ignore the "ud" oob component
in this run.

Hostname: node082
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):

Hostname: node081
Requested CQE: 16384
Error: Cannot allocate memory

Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: node007.hpc.fau.edu
Framework: btl
Component: mvapi
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

mca_bml_base_open() failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[node007:32653] *** An error occurred in MPI_Init
[node007:32653] *** reported by process [4164222977,2]
[node007:32653] *** on a NULL communicator
[node007:32653] *** Unknown error
[node007:32653] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node007:32653] ***    and potentially your MPI job)
[node007.hpc.fau.edu:32614] 3 more processes have sent help message help-oob-ud.txt / create-cq-failed
[node007.hpc.fau.edu:32614] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[node007.hpc.fau.edu:32614] 3 more processes have sent help message help-oob-ud.txt / no-ports-usable
[node007.hpc.fau.edu:32614] 2 more processes have sent help message help-mca-base.txt / find-available:not-valid
[node007.hpc.fau.edu:32614] 2 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[node007.hpc.fau.edu:32614] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle

Any help would be greatly appreciated. Thanks
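The post gives Fluent a machine file through -cnf (testNode.txt) listing the nodes that salloc allocated. That file can be generated from the allocation instead of being written by hand. On a Slurm cluster, the usual tool is scontrol show hostnames "$SLURM_JOB_NODELIST", which prints one hostname per line.

The Python sketch below performs the same expansion for simple compact nodelists such as node[007,081-082]. It rests on several assumptions:

- Fluent accepts one hostname per line in the -cnf file. The post does not show the format of testNode.txt.
- Only one bracket group is allowed per host expression.
- The helper names and the default output name fluent_hosts.txt are hypothetical.

```python
import os
import re
import sys
from typing import List

# One host expression: a prefix, an optional [ranges] group, and an optional suffix.
_HOST_EXPR = re.compile(r"([^,\[\]]+)(?:\[([^\]]*)\]([^,\[\]]*))?")


def expand_hostlist(nodelist: str) -> List[str]:
    """Expand a simple Slurm-style hostlist such as 'node[007,081-082],gpu01'.

    Hypothetical helper: handles at most one bracket group per host
    expression and keeps zero padding (007 stays 007). Use
    'scontrol show hostnames' on a real cluster for full hostlist syntax.
    """
    hosts = []
    for m in _HOST_EXPR.finditer(nodelist):
        prefix, ranges, suffix = m.group(1), m.group(2), m.group(3) or ""
        if ranges is None:
            # Plain hostname with no bracket group.
            hosts.append(prefix)
            continue
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo  # a single number like "007" is a one-element range
            width = len(lo)  # preserve the zero padding of the lower bound
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


def write_machine_file(nodelist: str, path: str) -> List[str]:
    """Write one hostname per line to path and return the expanded host list."""
    hosts = expand_hostlist(nodelist)
    with open(path, "w") as fh:
        fh.write("\n".join(hosts) + "\n")
    return hosts


if __name__ == "__main__":
    # salloc and sbatch export SLURM_JOB_NODELIST inside the allocation.
    nodelist = os.environ.get("SLURM_JOB_NODELIST")
    if nodelist is None:
        print("SLURM_JOB_NODELIST is not set; run this inside a Slurm allocation.")
    else:
        out_path = sys.argv[1] if len(sys.argv) > 1 else "fluent_hosts.txt"
        for host in write_machine_file(nodelist, out_path):
            print(host)
```

Run the script from inside the salloc session and pass the target path as the first argument, for example /mnt/beegfs/home/testUser/testNode.txt. The resulting file is then what "Parallel Per Machine File" points at. The script is only a convenience for keeping the file in sync with the allocation. It does not address the CQ or mvapi errors in the log.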