TAGGED: ansys-fluent, dcs, Design Points, distributed-computing, dps, hpc, hpc-cluster, workbench
March 1, 2024 at 5:29 am | dbhands | Subscriber
Hello, I currently have an HPC Slurm cluster to which users submit Fluent simulations through RSM, and this works well. When a parameter set is submitted, the meshing and solving tasks are submitted together with the same number of cores, and I would like to be able to specify a different machine for meshing than for solving.
To address this I have set up a DCS server and a DC Evaluator (DCE) on the head node of the Slurm cluster, plus a DC Evaluator on a Windows server for geometry updates (SpaceClaim). The goal is to have the DCE on the head node submit the run to the Slurm cluster partition. When I submit, the project updates the geometry, then spends about 1.5 minutes updating the solution before Workbench reports that the project failed and says to check DPS for more information. The error log from DPS is posted below. When the DCE is set to solve directly, the solution calculates as expected, but I need to be able to run across the cluster.
What can I do to fix this error? In addition, it seems that meshing and solution are bundled into the same task in DPS. Is it possible to break these up into two tasks so they can be solved on different machines or with different core counts?
Output from DPS:
Job is running on hostname node00.*****. (removed from this post)
Job user from this host: ********* (removed from this post)
Starting directory: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
Reading control file /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/control_ffcba1bc-b678-4069-80e1-67b0aa785470.rsm ....
Correct Cluster verified
Cluster Type: SLURM
Underlying Cluster: SLURM
Compute Server is running on NODE00.REDACTED.CR
Reading commands and arguments...
  Command 1: C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat, arguments: -B -R "test2d_Workbench_Solution.wbjn" -Z Dpdb.EvaluatorRun,Dpdb.EvaluatingProjectUpdate --output "test2d_Workbench_Solution_log.txt", redirectFile: None
Running from shared staging directory ...
Cluster Shared Directory: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
Job file clean up: True
Use SSH on Linux cluster nodes: True
StdoutLiveLogFile: stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.live
StderrLiveLogFile: stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.live
Reading input files...
Reading cancel files...
Reading output files...
Reading exclude files...
Reading environment variables...
  RSM_IRON_PYTHON_HOME = /ansys_inc/v222/commonfiles/IronPython
  RSM_COMPUTE_SERVER_MACHINE_NAME = node00.redacted.cr
  RSM_HPC_JOBNAME = RemoteJobName
  RSM_HPC_USER = redacted
  RSM_HPC_WORKDIR = /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
  RSM_HPC_STAGING = /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
  RSM_HPC_STDOUTFILE = stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  RSM_HPC_STDERRFILE = stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  RSM_HPC_STDOUTLIVE = stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.live
  RSM_HPC_STDERRLIVE = stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.live
  RSM_HPC_SCRIPTS_DIRECTORY_LOCAL = /ansys_inc/v222/RSM/Config/scripts
  RSM_HPC_SCRIPTS_DIRECTORY = /ansys_inc/v222/RSM/Config/scripts
  RSM_HPC_STORAGEID =   ec2368dc-fca3-4690-80ef-8d47d4885614   RsmJobRunnerStorage=LocalOS#CRAC$\\\ansys\qfb2scrd.gmu   Friday, March 01, 2024 09:43:55.922 AM   True
  RSM_HPC_PLATFORMSTORAGEID = \\\ansys\qfb2scrd.gmu
  ARC_ROOT = /ansys_inc/v222/RSM/Config/scripts/../../ARC
Reading AWP_ROOT environment variable name ...
  AWP_ROOT environment variable name is: AWP_ROOT222
Reading Low Disk Space Warning Limit ...
  Low disk space warning threshold set at: 2.0GiB
Reading File identifier ...
  File identifier found as: ffcba1bc-b678-4069-80e1-67b0aa785470
Done reading control file.
AWP_ROOT222 install directory: /ansys_inc/v222
SLURM_JOB_NODELIST = node00.redacted.cr,node[01-03]
RSM_MACHINES = node00.redacted.cr:22:node01:24:node02:24:node03:24
ALTERNATE_MACHINES = node00.redacted.cr:22:node01:24:node02:24:node03:24
Number of nodes assigned for current job = 4
Machine list: ['node00.redacted.cr', 'node01', 'node02', 'node03']
Start running job commands ...
Running on machine : node00.redacted.cr
Current Directory: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
Running command: C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat -B -R "test2d_Workbench_Solution.wbjn" -Z Dpdb.EvaluatorRun,Dpdb.EvaluatingProjectUpdate --output "test2d_Workbench_Solution_log.txt"
Redirecting output to  None
Final command arg list : ['C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat', '-B', '-R', 'test2d_Workbench_Solution.wbjn', '-Z', 'Dpdb.EvaluatorRun,Dpdb.EvaluatingProjectUpdate', '--output', 'test2d_Workbench_Solution_log.txt']
Running Process
** Traceback (most recent call last):
**   File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 292, in main
  exitCodeList, exitCode = runCommandList(_commandList, _commandArgList, _commandRedirectList, _commandProgressMonitoringFlags, _targetCluster, _usingLocalScratch)
**   File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1189, in runCommandList
  cFiles, cmdProgMonFlagList[cmdIndex], enablePrints)
**   File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1214, in runCommand
**   File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1802, in begin
  self.process = subprocess.Popen(self.argList, bufsize=-1, stdin=subprocess.PIPE, stdout=stdoutStream, stderr=stderrStream, cwd=os.getcwd(), universal_newlines=True)
**   File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 800, in __init__
  restore_signals, start_new_session)
**   File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 1551, in _execute_child
  raise child_exception_type(errno_num, err_msg, err_filename)
** FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
Saving exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcode_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  Exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcode_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout has been created.
Saving exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcodeCommands_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  Exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcodeCommands_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout has been created.
ClusterJobs Exiting with code: 9999
Individual Command Exit Codes are: [None]
Fatal error when running job command(s).
[Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
A Job command did not return exit code. The job will fail with exit code 9999
Traceback (most recent call last):
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 292, in main
  exitCodeList, exitCode = runCommandList(_commandList, _commandArgList, _commandRedirectList, _commandProgressMonitoringFlags, _targetCluster, _usingLocalScratch)
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1189, in runCommandList
  cFiles, cmdProgMonFlagList[cmdIndex], enablePrints)
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1214, in runCommand
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1802, in begin
  self.process = subprocess.Popen(self.argList, bufsize=-1, stdin=subprocess.PIPE, stdout=stdoutStream, stderr=stderrStream, cwd=os.getcwd(), universal_newlines=True)
 File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 800, in __init__
  restore_signals, start_new_session)
 File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 1551, in _execute_child
  raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
March 11, 2024 at 4:03 pm | MangeshANSYS | Ansys Employee
Please check the target platform mix.
The log shows
but then further down I see
**   File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 1551, in _execute_child
  raise child_exception_type(errno_num, err_msg, err_filename)
** FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
and further
[Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
A Job command did not return exit code. The job will fail with exit code 9999
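The failure can be reproduced outside RSM. This is a minimal sketch, not the ClusterJobs.py code: it passes the Windows-style command path from the log to `subprocess.Popen`, which is the call the traceback ends in. On a Linux node backslashes are not path separators, so the whole string is treated as a single file name that does not exist. The helper name `try_launch` is hypothetical.

```python
import errno
import subprocess

# Command path copied from the log; it can only exist on a Windows install.
WINDOWS_CMD = r"C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat"


def try_launch(arg_list):
    """Start a process; return None on success, or the errno if it cannot start."""
    try:
        proc = subprocess.Popen(
            arg_list, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError as exc:
        # FileNotFoundError is a subclass of OSError and carries errno.ENOENT (2).
        return exc.errno
    proc.wait()
    return None


if __name__ == "__main__":
    if try_launch([WINDOWS_CMD, "-B"]) == errno.ENOENT:
        print("Executable not found: the command path does not exist on this host")
```

On a Linux host this takes the `ENOENT` branch, matching the `[Errno 2]` in the log. That suggests the command line was built for a Windows target and then run on a Linux node.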
March 11, 2024 at 8:20 pm | dbhands | Subscriber
Thanks for the reply. I noticed this as well. Where would I check this setting? I intend to run on Linux, so the Windows file paths are incorrect. In the DC Evaluator settings on the head node of the Slurm cluster, the machine is set to Linux, and it solves when set to direct. When set to submit to RSM, however, the submission fails with the error above. With RSM selected, the only other option I see is "queue", where I have typed in the name of one of the RSM queues. I do not see anywhere to specify the platform type, but I would assume this should be handled by the RSM head node on the Slurm cluster, as it is during normal submissions to the RSM head node.
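To spot this kind of mismatch in a saved DPS/RSM log, something like the following might help. It is a rough sketch: the function names are hypothetical, and the two regular expressions assume the log keeps the `Command N: ..., arguments:` and `AWP_ROOTnnn install directory:` line formats seen in the output above. It flags any command whose path style (Windows drive letter or not) differs from that of the install directory on the cluster.

```python
import re
from pathlib import PureWindowsPath

# Line formats taken from the DPS output in this thread (an assumption for other versions).
COMMAND_RE = re.compile(r"^Command \d+: (?P<cmd>.+?), arguments:")
ROOT_RE = re.compile(r"^AWP_ROOT\d+ install directory: (?P<root>.+)$")

SAMPLE_LOG = "\n".join([
    r"  Command 1: C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat, arguments: -B, redirectFile: None",
    "AWP_ROOT222 install directory: /ansys_inc/v222",
])


def is_windows_path(path):
    """True if the path starts with a drive letter or a UNC share."""
    return bool(PureWindowsPath(path).drive)


def find_platform_mix(log_text):
    """Return commands whose path style differs from the install directory's."""
    commands, roots = [], []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        cmd_match = COMMAND_RE.match(line)
        if cmd_match:
            commands.append(cmd_match.group("cmd"))
        root_match = ROOT_RE.match(line)
        if root_match:
            roots.append(root_match.group("root"))
    if not roots:
        return []  # nothing to compare against
    cluster_is_windows = is_windows_path(roots[-1])
    return [c for c in commands if is_windows_path(c) != cluster_is_windows]


if __name__ == "__main__":
    for cmd in find_platform_mix(SAMPLE_LOG):
        print("Path style does not match the cluster install:", cmd)
```

The check only compares path styles. It does not say which setting produced the Windows command line.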
March 11, 2024 at 8:47 pm | MangeshANSYS | Ansys Employee
Can you please explain the setup and which machine is set to what?
1. Which computer is the project being opened and submitted from? What is the OS? What is the DCS setting?
2. Which is the computer where meshing needs to happen? What is the OS?
3. How and where is the submission to Slurm configured?
Please add screenshots, obscuring any information that should not be on a public forum.
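One way to gather the per-machine facts asked for above is a short script run on each host (the submitting workstation, the Windows evaluator, and the Slurm head node). This is only a sketch under assumptions: it presumes a Python interpreter is available on each machine, `describe_host` is a hypothetical name, and `AWP_ROOT222` is the variable name reported in the log for v222.

```python
import os
import platform
import socket


def describe_host(awp_var="AWP_ROOT222"):
    """Collect host name, OS, and the Ansys install root seen by this process."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),           # e.g. "Linux" or "Windows"
        "os_release": platform.release(),
        awp_var: os.environ.get(awp_var, "<not set>"),
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Comparing the output across machines shows which hosts are Windows and which are Linux, and whether each one sees an install root in the matching path style.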
- The topic ‘Setting up DCS services for a cluster’ is closed to new replies.