


{"id":395091,"date":"2024-11-24T19:21:20","date_gmt":"2024-11-24T19:21:20","guid":{"rendered":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/running-ansys-fluent-on-a-hpc-cluster\/"},"modified":"2024-11-24T19:21:20","modified_gmt":"2024-11-24T19:21:20","slug":"running-ansys-fluent-on-a-hpc-cluster","status":"publish","type":"topic","link":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/running-ansys-fluent-on-a-hpc-cluster\/","title":{"rendered":"Running ANSYS Fluent on an HPC Cluster"},"content":{"rendered":"<p>&lt;p&gt;I am trying to run a Fluent simulation across multiple nodes of an HPC cluster, using PyFluent to open the case file, initialize it, and run it. All of the simulation settings were configured and saved ahead of time. The PyFluent script works when I submit the job on one node, but when I submit it on multiple nodes, Fluent times out during launch. I originally thought this was an MPI issue, but I am not sure. I&#8217;ve tried so many different things that I&#8217;ve lost track, and nothing has worked yet. I&#8217;ve attached the batch script, the PyFluent script, and the error from running on multiple nodes below. If you need any more info to help me, I&#8217;ll be happy to get it to you.
Thanks.&lt;br&gt;&lt;br&gt;batch script:&lt;br&gt;&lt;br&gt;&lt;\/p&gt;&lt;p&gt;#!\/bin\/bash&lt;br&gt;#SBATCH -J ansysjob &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;# job name&lt;br&gt;#SBATCH -e ansysjob.%j.err &nbsp; &nbsp; &nbsp; # error file name&lt;br&gt;#SBATCH -o ansysjob.%j.out &nbsp; &nbsp; &nbsp; # output file name&lt;br&gt;#SBATCH -N 1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # number of nodes&lt;br&gt;#SBATCH -n 128 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # request 128 cores&lt;br&gt;#SBATCH -t 0:20:00 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # designate max run time&lt;br&gt;#SBATCH -A DDM23001 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;# charge job to my project&lt;br&gt;#SBATCH -p development &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # designate queue&lt;\/p&gt;&lt;p&gt;# Load necessary modules&lt;br&gt;module load python3\/3.9.7&lt;br&gt;module load ansys&lt;\/p&gt;&lt;p&gt;# Define Fluent environment&lt;br&gt;export AWP_ROOT232='\/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232'&lt;\/p&gt;&lt;p&gt;# Set library path for Fluent shared libraries&lt;br&gt;export LD_LIBRARY_PATH=\/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/lib\/lnamd64:$LD_LIBRARY_PATH&lt;\/p&gt;&lt;p&gt;# Pre-create blank output file&lt;br&gt;touch \/scratch\/10223\/mjs7392\/jcat\/fluent_output.log&lt;\/p&gt;&lt;p&gt;# Make the PyFluent script executable&lt;br&gt;chmod 700 \/scratch\/10223\/mjs7392\/jcat\/intake_script.py&lt;\/p&gt;&lt;p&gt;# Run the Python script and capture all output&lt;br&gt;python \/scratch\/10223\/mjs7392\/jcat\/intake_script.py &gt; \/scratch\/10223\/mjs7392\/jcat\/fluent_output.log 2&gt;&amp;1&lt;\/p&gt;&lt;p&gt;&lt;br&gt;PyFluent script:&lt;br&gt;&lt;br&gt;&lt;\/p&gt;&lt;p&gt;# created by Michael Salas&lt;\/p&gt;&lt;p&gt;import ansys.fluent.core as pyfluent&lt;br&gt;import os&lt;br&gt;import time &nbsp;# Import the time module for tracking execution time&lt;\/p&gt;&lt;p&gt;# Set
environment for Fluent location&lt;br&gt;os.environ['AWP_ROOT232'] = '\/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232' &nbsp;# Path to Fluent installation&lt;\/p&gt;&lt;p&gt;# Path to the Fluent case file (HDF5 case file)&lt;br&gt;case_file = r'\/scratch\/10223\/mjs7392\/jcat\/jcat_files\/dp0\/FLTG\/Fluent\/FLTG-Setup-Output.cas.h5'&lt;\/p&gt;&lt;p&gt;# Start tracking time&lt;br&gt;start_time = time.time()&lt;\/p&gt;&lt;p&gt;# Initialize a Fluent session&lt;br&gt;solver = pyfluent.launch_fluent(&lt;br&gt;&nbsp; &nbsp; mode=&quot;solver&quot;,&lt;br&gt;&nbsp; &nbsp; precision=pyfluent.Precision.DOUBLE,&lt;br&gt;&nbsp; &nbsp; dimension=pyfluent.Dimension.THREE&lt;br&gt;)&lt;\/p&gt;&lt;p&gt;# Read the HDF5 case file ('.cas.h5')&lt;br&gt;solver.file.read_case(file_type=&quot;case&quot;, file_name=case_file)&lt;\/p&gt;&lt;p&gt;solution = solver.settings.solution&lt;\/p&gt;&lt;p&gt;# Initialize the solution&lt;br&gt;solution.initialization.standard_initialize()&lt;br&gt;print(&quot;INITIALIZED&quot;)&lt;\/p&gt;&lt;p&gt;# Run the calculation&lt;br&gt;solution.run_calculation.dual_time_iterate()&lt;br&gt;print(&quot;RAN CALC&quot;)&lt;\/p&gt;&lt;p&gt;# End tracking time&lt;br&gt;end_time = time.time()&lt;\/p&gt;&lt;p&gt;# Calculate elapsed time in seconds&lt;br&gt;elapsed_time = end_time - start_time&lt;\/p&gt;&lt;p&gt;# Format the elapsed time in a readable way (e.g., HH:MM:SS)&lt;br&gt;hours, rem = divmod(elapsed_time, 3600)&lt;br&gt;minutes, seconds = divmod(rem, 60)&lt;br&gt;formatted_time = f&quot;{int(hours):02}:{int(minutes):02}:{seconds:05.2f}&quot;&lt;\/p&gt;&lt;p&gt;# Define the output log file path&lt;br&gt;log_file_path = &quot;\/scratch\/10223\/mjs7392\/simulation_time_log.txt&quot;&lt;\/p&gt;&lt;p&gt;# Write the elapsed time to the log file&lt;br&gt;with open(log_file_path, &quot;w&quot;) as log_file:&lt;br&gt;&nbsp; &nbsp; log_file.write(f&quot;Simulation completed
successfully.\\n&quot;)&lt;br&gt;&nbsp; &nbsp; log_file.write(f&quot;Total job run time: {formatted_time} (HH:MM:SS)\\n&quot;)&lt;br&gt;&lt;br&gt;&lt;br&gt;The error:&lt;br&gt;&lt;br&gt;&lt;\/p&gt;&lt;p&gt;Host spawning Node 0 on machine &quot;c304-005.ls6.tacc.utexas.edu&quot; (unix).&lt;br&gt;\/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/fluent23.2.0\/bin\/fluent -r23.2.0 3ddp -flux -node -t128 -pmpi-auto-selected -mpi=intel -cnf=c304-005:64,c304-006:64 -ssh -mport 129.114.41.77:129.114.41.77:33065:0&lt;br&gt;Starting \/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/fluent23.2.0\/multiport\/mpi\/lnamd64\/intel2021\/bin\/mpirun -f \/tmp\/fluent-appfile.mjs7392.2114516 --rsh=ssh -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME \/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/fluent23.2.0\/..\/..\/commonfiles\/CPython\/3_10\/linx64\/Release\/python -genv FLUENT_PROD_DIR \/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/fluent23.2.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile \/tmp\/fluent-appfile.mjs7392.2114516 -np 128 \/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/fluent23.2.0\/lnamd64\/3ddp_node\/fluent_mpi.23.2.0 node -mpiw intel -pic mpi-auto-selected -mport 129.114.41.77:129.114.41.77:33065:0&lt;br&gt;pyfluent.launcher ERROR: Exception caught - TimeoutError: The launch process has timed out.&lt;br&gt;Traceback (most recent call last):&lt;br&gt;&nbsp; File &quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/launcher\/standalone_launcher.py&quot;, line 253, in __call__&lt;br&gt;&nbsp; &nbsp; raise ex&lt;br&gt;&nbsp; File &quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/launcher\/standalone_launcher.py&quot;, line 233, in __call__&lt;br&gt;&nbsp; &nbsp; _await_fluent_launch(&lt;br&gt;&nbsp; File
&quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/launcher\/launcher_utils.py&quot;, line 59, in _await_fluent_launch&lt;br&gt;&nbsp; &nbsp; raise TimeoutError(&quot;The launch process has timed out.&quot;)&lt;br&gt;TimeoutError: The launch process has timed out.&lt;\/p&gt;&lt;p&gt;The above exception was the direct cause of the following exception:&lt;\/p&gt;&lt;p&gt;Traceback (most recent call last):&lt;br&gt;&nbsp; File &quot;\/scratch\/10223\/mjs7392\/jcat\/intake_script.py&quot;, line 17, in &lt;module&gt;&lt;br&gt;&nbsp; &nbsp; solver = pyfluent.launch_fluent(&lt;br&gt;&nbsp; File &quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/utils\/deprecate.py&quot;, line 49, in wrapper&lt;br&gt;&nbsp; &nbsp; return func(*args, **kwargs)&lt;br&gt;&nbsp; File &quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/utils\/deprecate.py&quot;, line 49, in wrapper&lt;br&gt;&nbsp; &nbsp; return func(*args, **kwargs)&lt;br&gt;&nbsp; File &quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/launcher\/launcher.py&quot;, line 285, in launch_fluent&lt;br&gt;&nbsp; &nbsp; return launcher()&lt;br&gt;&nbsp; File &quot;\/home1\/10223\/mjs7392\/.local\/lib\/python3.9\/site-packages\/ansys\/fluent\/core\/launcher\/standalone_launcher.py&quot;, line 296, in __call__&lt;br&gt;&nbsp; &nbsp; raise LaunchFluentError(self._launch_cmd) from ex&lt;br&gt;ansys.fluent.core.launcher.error_handler.LaunchFluentError:&nbsp;&lt;br&gt;Fluent Launch string: nohup \/scratch\/tacc\/apps\/ANSYS\/2023R2\/v232\/fluent\/bin\/fluent 3ddp -t128 -cnf=c304-005:64,c304-006:64 -gu -sifile=\/tmp\/serverinfo-xjyx628q.txt -nm
&amp;&lt;\/p&gt;<\/p>\n","protected":false},"template":"","class_list":["post-395091","topic","type-topic","status-publish","hentry","topic-tag-ansys-hpc-2","topic-tag-batch-hpc","topic-tag-hpc-cluster-1"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_bbp_subscription":["302181","13659"],"_bbp_author_ip":["198.54.134.150"],"_btv_view_count":["2437"],"_bbp_topic_status":["unanswered"],"_bbp_topic_id":["395091"],"_bbp_forum_id":["27792"],"_bbp_engagement":["13659","302181"],"_bbp_voice_count":["2"],"_bbp_reply_count":["4"],"_bbp_last_reply_id":["395260"],"_bbp_last_active_id":["395260"],"_bbp_last_active_time":["2024-11-26 00:46:20"]},"test":"michaelsalasutexas-edu"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/395091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":0,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/395091\/revisions"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/media?parent=395091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}