TAGGED: ansys-hpc, batch-hpc, hpc-cluster
November 24, 2024 at 7:21 pm | michaelsalas | Subscriber
I am trying to run a Fluent simulation across multiple nodes of an HPC cluster, using PyFluent to open the case file, initialize it, and run it. All the simulation settings are already defined and saved in the case file. The PyFluent script works when I submit the job on one node, but when I submit it on multiple nodes, Fluent times out when launching. I originally thought this was an MPI issue, but I am not sure. I have tried so many different things that I have lost track, and nothing has worked yet. I've attached the batch script, the PyFluent script, and the error from the multi-node run below. If you need any more info to help me, I'll be happy to get it to you. Thanks.
batch script:
#!/bin/bash
#SBATCH -J ansysjob           # job name
#SBATCH -e ansysjob.%j.err    # error file name
#SBATCH -o ansysjob.%j.out    # output file name
#SBATCH -N 1                  # number of nodes (1 as posted; the failing run logged below used 2)
#SBATCH -n 128                # request 128 cores
#SBATCH -t 0:20:00            # designate max run time
#SBATCH -A DDM23001           # charge job to myproject
#SBATCH -p development        # designate queue

# Load necessary modules
module load python3/3.9.7
module load ansys

# Define Fluent environment
export AWP_ROOT232='/scratch/tacc/apps/ANSYS/2023R2/v232'

# set library path for Fluent shared libraries
export LD_LIBRARY_PATH=/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/lib/lnamd64:$LD_LIBRARY_PATH

# pre-create blank output file
touch /scratch/10223/mjs7392/jcat/fluent_output.log

# give permissions to the pyfluent script
chmod 700 /scratch/10223/mjs7392/jcat/intake_script.py

# Run the Python script with MPI configuration
python /scratch/10223/mjs7392/jcat/intake_script.py > /scratch/10223/mjs7392/jcat/fluent_output.log 2>&1
pyfluent script:
# created by Michael Salas
import ansys.fluent.core as pyfluent
import os
import time  # Import the time module for tracking execution time

# set environment for Fluent location
os.environ['AWP_ROOT232'] = '/scratch/tacc/apps/ANSYS/2023R2/v232'  # Path to Fluent installation

# path to the Fluent case file (HDF5 case file)
case_file = r'/scratch/10223/mjs7392/jcat/jcat_files/dp0/FLTG/Fluent/FLTG-Setup-Output.cas.h5'

# Start tracking time
start_time = time.time()

# initialize a Fluent session
solver = pyfluent.launch_fluent(
  mode="solver",
  precision=pyfluent.Precision.DOUBLE,
  dimension=pyfluent.Dimension.THREE
)

# read the HDF5 case file ('.cas.h5')
solver.file.read_case(file_type="case", file_name=case_file)

solution = solver.settings.solution

# initialize the solution
solution.initialization.standard_initialize()
print("INITIALIZED")

# run the calculation
solution.run_calculation.dual_time_iterate()
print("RAN CALC")

# End tracking time
end_time = time.time()

# Calculate elapsed time in seconds
elapsed_time = end_time - start_time

# Format the elapsed time in a readable way (e.g., HH:MM:SS)
hours, rem = divmod(elapsed_time, 3600)
minutes, seconds = divmod(rem, 60)
formatted_time = f"{int(hours):02}:{int(minutes):02}:{seconds:05.2f}"

# Define the output log file path
log_file_path = "/scratch/10223/mjs7392/simulation_time_log.txt"

# Write the elapsed time to the log file
with open(log_file_path, "w") as log_file:
  log_file.write(f"Simulation completed successfully.\n")
  log_file.write(f"Total job run time: {formatted_time} (HH:MM:SS)\n")
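Since the traceback below comes from PyFluent giving up on the launch rather than from Fluent reporting an error, one thing that may be worth ruling out is the launcher's own wait limit. The following is a minimal sketch, assuming the installed PyFluent version accepts the `start_timeout` and `additional_arguments` keywords of `launch_fluent`. The helper name `build_launch_kwargs`, the 300-second value, and the idea of passing `-mpi=openmpi` through are illustrative assumptions, not a confirmed fix.

```python
def build_launch_kwargs(start_timeout=300, extra_args=""):
    """Collect keyword arguments for launch_fluent (hypothetical helper)."""
    kwargs = {
        "mode": "solver",
        # Seconds to wait for Fluent to come up (assumed keyword; value is a guess).
        "start_timeout": start_timeout,
    }
    if extra_args:
        # Extra command-line flags appended to the Fluent launch string.
        kwargs["additional_arguments"] = extra_args
    return kwargs


def launch(**overrides):
    # Imported lazily so the helper above can be checked without Fluent installed.
    import ansys.fluent.core as pyfluent

    kwargs = build_launch_kwargs(**overrides)
    return pyfluent.launch_fluent(
        precision=pyfluent.Precision.DOUBLE,
        dimension=pyfluent.Dimension.THREE,
        **kwargs,
    )
```

Calling `launch(start_timeout=600, extra_args="-mpi=openmpi")` would then replace the plain `launch_fluent(...)` call in the script. If the launch still times out with a much longer limit, the cause is more likely in how the remote node processes are started than in the wait itself.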
The error:
Host spawning Node 0 on machine "c304-005.ls6.tacc.utexas.edu" (unix).
/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -flux -node -t128 -pmpi-auto-selected -mpi=intel -cnf=c304-005:64,c304-006:64 -ssh -mport 129.114.41.77:129.114.41.77:33065:0
Starting /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.mjs7392.2114516 --rsh=ssh -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.mjs7392.2114516 -np 128 /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/fluent23.2.0/lnamd64/3ddp_node/fluent_mpi.23.2.0 node -mpiw intel -pic mpi-auto-selected -mport 129.114.41.77:129.114.41.77:33065:0
pyfluent.launcher ERROR: Exception caught - TimeoutError: The launch process has timed out.
Traceback (most recent call last):
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 253, in __call__
  raise ex
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 233, in __call__
  _await_fluent_launch(
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/launcher_utils.py", line 59, in _await_fluent_launch
  raise TimeoutError("The launch process has timed out.")
TimeoutError: The launch process has timed out.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/scratch/10223/mjs7392/jcat/intake_script.py", line 17, in <module>
  solver = pyfluent.launch_fluent(
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/utils/deprecate.py", line 49, in wrapper
  return func(*args, **kwargs)
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/utils/deprecate.py", line 49, in wrapper
  return func(*args, **kwargs)
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/launcher.py", line 285, in launch_fluent
  return launcher()
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 296, in __call__
  raise LaunchFluentError(self._launch_cmd) from ex
ansys.fluent.core.launcher.error_handler.LaunchFluentError:
Fluent Launch string: nohup /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/bin/fluent 3ddp -t128 -cnf=c304-005:64,c304-006:64 -gu -sifile=/tmp/serverinfo-xjyx628q.txt -nm &
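The launch string at the end of the traceback shows what PyFluent picked up from the Slurm allocation. Below is a small sketch for checking that the `-t` count matches the per-host counts in `-cnf`. `parse_launch_string` is a hypothetical helper name, and only the two flags visible in the log are handled.

```python
import re


def parse_launch_string(launch):
    """Return (total_tasks, {host: tasks}) read from a Fluent launch string."""
    t_match = re.search(r"(?:^|\s)-t\s*(\d+)", launch)
    cnf_match = re.search(r"-cnf=(\S+)", launch)
    total = int(t_match.group(1)) if t_match else None
    hosts = {}
    if cnf_match:
        for entry in cnf_match.group(1).split(","):
            host, _, count = entry.partition(":")
            # An entry without ":count" is treated as a single task.
            hosts[host] = int(count) if count else 1
    return total, hosts


# Launch string copied from the error output above.
launch = (
    "nohup /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/bin/fluent 3ddp "
    "-t128 -cnf=c304-005:64,c304-006:64 -gu "
    "-sifile=/tmp/serverinfo-xjyx628q.txt -nm &"
)
total, hosts = parse_launch_string(launch)
print(total, hosts, sum(hosts.values()) == total)
```

For this log the counts are consistent (64 + 64 = 128 across two hosts), so the allocation itself appears to have been read correctly and the timeout would have to come from a later stage of the launch.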
November 25, 2024 at 2:11 pm | Federico | Ansys Employee
Hello,
Does your batch script work for other cases that don't involve PyFluent?
For PyFluent-related questions, I would recommend posting on the Ansys Developer Forum.
-
November 25, 2024 at 8:01 pm | michaelsalas | Subscriber
If I launch Fluent through the batch script instead of through PyFluent, I can run the initialize and run commands using a journal file. I started going down this path a little, but I was having trouble launching Fluent from the batch script too.
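For the journal-file route mentioned above, the journal can be generated from Python so that the case path stays in one place. This is a minimal sketch that assumes the usual Fluent text-interface commands (`/file/read-case`, `/solve/initialize/initialize-flow`, `/solve/dual-time-iterate`). The helper names and the step counts are hypothetical placeholders, since the thread does not say how many time steps the case needs.

```python
from pathlib import Path


def journal_lines(case_file, time_steps, iters_per_step):
    """Build the TUI commands for read, initialize, run, exit (assumed commands)."""
    return [
        f'/file/read-case "{case_file}"',
        "/solve/initialize/initialize-flow",
        f"/solve/dual-time-iterate {time_steps} {iters_per_step}",
        "/exit yes",
    ]


def write_journal(journal_path, case_file, time_steps, iters_per_step):
    """Write the journal to disk, one command per line."""
    lines = journal_lines(case_file, time_steps, iters_per_step)
    Path(journal_path).write_text("\n".join(lines) + "\n")
    return journal_path


# Placeholder values: 10 time steps with 20 iterations each.
example = journal_lines("FLTG-Setup-Output.cas.h5", 10, 20)
```

The resulting file would then be passed to Fluent with the `-i` option in the batch script, for example `fluent 3ddp -g -t$SLURM_NTASKS -cnf=hosts.$SLURM_JOB_ID -i run.jou`, where `run.jou` is an assumed file name.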
-
November 25, 2024 at 8:36 pm | Federico | Ansys Employee
What kind of trouble?
-
November 26, 2024 at 12:46 am | michaelsalas | Subscriber
The launch process still times out.
Here is the batch script:
#!/bin/bash
#SBATCH -J ansysjob           # job name
#SBATCH -e ansysjob.%j.err    # error file name
#SBATCH -o ansysjob.%j.out    # output file name
#SBATCH -N 2                  # request 2 nodes
#SBATCH -n 256                # request 256 tasks (128 per node)
#SBATCH -t 0:20:00            # designate max run time
#SBATCH -A DDM23001           # charge job to myproject
#SBATCH -p development        # designate queue

# Load necessary modules
module load python3/3.9.7
module load ansys

# Write one "host:count" line per node for Fluent's -cnf option
srun hostname -s | uniq -c | sort -k2 -V | awk '{printf("%s:%d\n",$2,$1)}' > hosts.$SLURM_JOB_ID
# Wait for finishing
wait

# set library path for Fluent shared libraries
export LD_LIBRARY_PATH=/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/lib/lnamd64:$LD_LIBRARY_PATH

# give permissions to the pyfluent script
chmod 700 /scratch/10223/mjs7392/jcat/intake_script.py

# pre-create blank output file
touch /scratch/10223/mjs7392/jcat/fluent_output.log

# change to the directory where the Slurm job was submitted
cd $SLURM_SUBMIT_DIR

# run Fluent
/scratch/tacc/apps/ANSYS/2023R2/v232/fluent/bin/fluent 3ddp -g -mpi=openmpi -pib -cnf=hosts.$SLURM_JOB_ID -t $SLURM_NTASKS -driver > /scratch/10223/mjs7392/jcat/fluent_output.log 2>&1 &
wait

# run the script
python3 /scratch/10223/mjs7392/jcat/intake_script.py
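The `srun hostname` pipeline in the script writes one `host:count` line per node. The same bookkeeping can be sketched in Python, which is handy for checking the file contents before handing it to `-cnf`. `format_hosts` is a hypothetical helper, and it assumes the hostnames (one per task) have already been collected, for example from the output of `srun hostname -s`.

```python
from collections import Counter


def format_hosts(hostnames):
    """Turn a list of per-task hostnames into sorted 'host:count' lines."""
    counts = Counter(hostnames)
    # Plain lexical sort; the shell version uses `sort -V` (version sort),
    # which gives the same order when all names have the same shape.
    return [f"{host}:{counts[host]}" for host in sorted(counts)]


# Two tasks on each of two nodes, deliberately interleaved.
sample = ["c306-006", "c306-005", "c306-006", "c306-005"]
lines = format_hosts(sample)
print("\n".join(lines))
```

One difference from the shell pipeline: `uniq -c` only merges adjacent duplicate lines, so interleaved `srun` output can produce several lines for the same host, whereas `Counter` always yields one line per host.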
Here is the error:
Warning: DISPLAY environment variable is not set.
 Graphics and GUI will not operate correctly
 without this being set properly.
Warning: DISPLAY environment variable is not set.
 Graphics and GUI will not operate correctly
 without this being set properly.
pyfluent.launcher ERROR: Exception caught - TimeoutError: The launch process has timed out.
Traceback (most recent call last):
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 253, in __call__
  raise ex
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 233, in __call__
  _await_fluent_launch(
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/launcher_utils.py", line 59, in _await_fluent_launch
  raise TimeoutError("The launch process has timed out.")
TimeoutError: The launch process has timed out.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/scratch/10223/mjs7392/jcat/intake_script.py", line 17, in <module>
  solver = pyfluent.launch_fluent(
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/utils/deprecate.py", line 49, in wrapper
  return func(*args, **kwargs)
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/utils/deprecate.py", line 49, in wrapper
  return func(*args, **kwargs)
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/launcher.py", line 285, in launch_fluent
  return launcher()
 File "/home1/10223/mjs7392/.local/lib/python3.9/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 296, in __call__
  raise LaunchFluentError(self._launch_cmd) from ex
ansys.fluent.core.launcher.error_handler.LaunchFluentError:
Fluent Launch string: nohup /scratch/tacc/apps/ANSYS/2023R2/v232/fluent/bin/fluent 3ddp -t256 -cnf=c306-005:128,c306-006:128 -gu -sifile=/tmp/serverinfo-0g7hxku6.txt -nm &
The PyFluent script is the same for both batch files.
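Both tracebacks end in `_await_fluent_launch`, and both launch strings pass a `-sifile=` path, which suggests the launcher is waiting for Fluent to write that server-info file. The loop below is a simplified illustration of that kind of wait, not PyFluent's actual implementation. `wait_for_update` and its parameters are hypothetical.

```python
import os
import time


def wait_for_update(path, timeout, last_mtime=None, poll=1.0):
    """Return True once `path` exists with an mtime newer than `last_mtime`.

    Returns False if that has not happened within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            mtime = os.path.getmtime(path)
            if last_mtime is None or mtime > last_mtime:
                return True
        time.sleep(poll)
    return False


# A path that never appears simply runs out the clock.
appeared = wait_for_update("/nonexistent/serverinfo.txt", timeout=0.2, poll=0.05)
print(appeared)
```

Running a check like this against the `-sifile` path from the launch string, with a generous timeout, can help tell apart a Fluent session that starts late from one that never finishes starting on the second node.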
© 2024 Copyright ANSYS, Inc. All rights reserved.