TAGGED: ansys-fluent, dcs, Design Points, distributed-computing, dps, hpc, hpc-cluster, workbench
March 1, 2024 at 5:29 am, dbhands (Subscriber)
Hello, I currently have an HPC Slurm cluster that users submit Fluent simulations to through RSM, and that works well. When submitting parameter sets, however, the meshing and solving tasks are submitted together with the same core count, and I would like the ability to specify a different machine for meshing versus solving.
To address this I have set up a DCS server on the head node of the cluster, along with a DC Evaluator on the same head node and a second DC Evaluator on a Windows server for geometry updates (SpaceClaim). The goal is for the DCE on the head node to submit the run to the Slurm cluster partition. When I submit, the project updates the geometry, then starts updating the solution for about 1.5 minutes before Workbench reports that the project failed and tells me to check DPS for more information. The error log from DPS is posted below. When the DCE is set to solve directly, the solution calculates as expected; however, I need to be able to run across the cluster.
What can I do to fix this error? In addition, it seems that meshing and solution are bundled into the same task in DPS. Is it possible to break these into two tasks so they can run on different machines or with different core counts?
Output from DPS:
Job is running on hostname node00.*****. (removed from this post)
Job user from this host: ********* (removed from this post)
Starting directory: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
Reading control file /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/control_ffcba1bc-b678-4069-80e1-67b0aa785470.rsm ....
Correct Cluster verified
Cluster Type: SLURM
Underlying Cluster: SLURM
  RSM_CLUSTER_TYPE = SLURM
Compute Server is running on NODE00.REDACTED.CR
Reading commands and arguments...
  Command 1: C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat, arguments: -B -R "test2d_Workbench_Solution.wbjn" -Z Dpdb.EvaluatorRun,Dpdb.EvaluatingProjectUpdate --output "test2d_Workbench_Solution_log.txt", redirectFile: None
Running from shared staging directory ...
  RSM_USE_LOCAL_SCRATCH = False
  RSM_LOCAL_SCRATCH_DIRECTORY =
  RSM_LOCAL_SCRATCH_PARTIAL_UNC_PATH =
Cluster Shared Directory: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
  RSM_SHARE_STAGING_DIRECTORY = /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
Job file clean up: True
Use SSH on Linux cluster nodes: True
  RSM_USE_SSH_LINUX = True
LivelogFile: NOLIVELOGFILE
StdoutLiveLogFile: stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.live
StderrLiveLogFile: stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.live
Reading input files...
  test2d.wbpz
  test2d_Workbench_Solution.wbjn
  test2d_Workbench_Geometry.wppz
Reading cancel files...
  *.abt
Reading output files...
  test2d_output.wbpz
  test2d_Workbench_Solution_log.txt
  console_output.txt
  *.out
  *.trn
  *.log
  *.txt
Reading exclude files...
  persistedStorage/*
  stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  control_ffcba1bc-b678-4069-80e1-67b0aa785470.rsm
  hosts.dat
  exitcode_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  exitcodeCommands_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.live
  stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.live
  ClusterJobCustomization.xml
  ClusterJobs.py
  clusterjob_ffcba1bc-b678-4069-80e1-67b0aa785470.sh
  clusterjob_ffcba1bc-b678-4069-80e1-67b0aa785470.bat
  inquire.request
  inquire.confirm
  request.upload.rsm
  request.download.rsm
  wait.download.rsm
  scratch.job.rsm
  volatile.job.rsm
  restart.xml
  cancel_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  liveLogLastPositions_ffcba1bc-b678-4069-80e1-67b0aa785470.rsm
  stdout_ffcba1bc-b678-4069-80e1-67b0aa785470_kill.rsmout
  stderr_ffcba1bc-b678-4069-80e1-67b0aa785470_kill.rsmout
  sec.interrupt
  stdout_ffcba1bc-b678-4069-80e1-67b0aa785470_*.rsmout
  stderr_ffcba1bc-b678-4069-80e1-67b0aa785470_*.rsmout
  stdout_task_*.live
  stderr_task_*.live
  control_task_*.rsm
  stdout_task_*.rsmout
  stderr_task_*.rsmout
  exitcode_task_*.rsmout
  exitcodeCommands_task_*.rsmout
  persistedStorage/*
  *.abt
Reading environment variables...
  ANSYS_FRAMEWORK_UNDER_RSM = True
  ANSYS_FRAMEWORK_DEVELOPMENT = 1
  ANSYS_TEST_ME = 2
  ANSYS_FRAMEWORK_UNDER_RSM = True
  ANSYS_FRAMEWORK_DEVELOPMENT = 1
  ANSYS_TEST_ME = 2
  RSM_IRON_PYTHON_HOME = /ansys_inc/v222/commonfiles/IronPython
  RSM_TASK_WORKING_DIRECTORY = /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
  RSM_USE_SSH_LINUX = True
  RSM_QUEUE_NAME = Aero
  RSM_CONFIGUREDQUEUE_NAME = Aero
  RSM_COMPUTE_SERVER_MACHINE_NAME = node00.redacted.cr
  RSM_HPC_JOBNAME = RemoteJobName
  RSM_HPC_DISPLAYNAME = task_2
  RSM_HPC_CORES = 94
  RSM_HPC_DISTRIBUTED = TRUE
  RSM_HPC_NODE_EXCLUSIVE = FALSE
  RSM_HPC_QUEUE = Aero
  RSM_HPC_USER = redacted
  RSM_HPC_WORKDIR = /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
  RSM_HPC_JOBTYPE = NotUsed
  RSM_HPC_ANSYS_LOCAL_INSTALL_DIRECTORY = /ansys_inc/v222
  RSM_HPC_VERSION = 222
  RSM_HPC_STAGING = /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
  RSM_HPC_LOCAL_PLATFORM = Linux
  RSM_HPC_CLUSTER_TARGET_PLATFORM = Linux
  RSM_HPC_STDOUTFILE = stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  RSM_HPC_STDERRFILE = stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  RSM_HPC_STDOUTLIVE = stdout_ffcba1bc-b678-4069-80e1-67b0aa785470.live
  RSM_HPC_STDERRLIVE = stderr_ffcba1bc-b678-4069-80e1-67b0aa785470.live
  RSM_HPC_SCRIPTS_DIRECTORY_LOCAL = /ansys_inc/v222/RSM/Config/scripts
  RSM_HPC_SCRIPTS_DIRECTORY = /ansys_inc/v222/RSM/Config/scripts
  RSM_HPC_SUBMITHOST = 10.115.50.220
  RSM_HPC_STORAGEID =   ec2368dc-fca3-4690-80ef-8d47d4885614   RsmJobRunnerStorage=LocalOS#CRAC$\\192.0.0.100\ansys\qfb2scrd.gmu   Friday, March 01, 2024 09:43:55.922 AM   True
  RSM_HPC_PLATFORMSTORAGEID = \\192.0.0.100\ansys\qfb2scrd.gmu
  RSM_HPC_NATIVEOPTIONS =
  ARC_ROOT = /ansys_inc/v222/RSM/Config/scripts/../../ARC
  RSM_HPC_KEYWORD = SLURM
  RSM_PYTHON_LOCALE = en-us
Reading AWP_ROOT environment variable name ...
  AWP_ROOT environment variable name is: AWP_ROOT222
Reading Low Disk Space Warning Limit ...
  Low disk space warning threshold set at: 2.0GiB
Reading File identifier ...
  File identifier found as: ffcba1bc-b678-4069-80e1-67b0aa785470
Done reading control file.
RSM_AWP_ROOT_NAME = AWP_ROOT222
AWP_ROOT222 install directory: /ansys_inc/v222
SLURM_JOB_NODELIST = node00.redacted.cr,node[01-03]<
SLURM_TASKS_PER_NODE = 22,24(x3)<
RSM_MACHINES = node00.redacted.cr:22:node01:24:node02:24:node03:24
ALTERNATE_MACHINES = node00.redacted.cr:22:node01:24:node02:24:node03:24
Number of nodes assigned for current job = 4
Machine list: ['node00.redacted.cr', 'node01', 'node02', 'node03']
Start running job commands ...
Running on machine : node00.redacted.cr
Current Directory: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu
Running command: C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat -B -R "test2d_Workbench_Solution.wbjn" -Z Dpdb.EvaluatorRun,Dpdb.EvaluatingProjectUpdate --output "test2d_Workbench_Solution_log.txt"
Redirecting output to  None
Final command arg list : ['C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat', '-B', '-R', 'test2d_Workbench_Solution.wbjn', '-Z', 'Dpdb.EvaluatorRun,Dpdb.EvaluatingProjectUpdate', '--output', 'test2d_Workbench_Solution_log.txt']
Running Process
** Traceback (most recent call last):
**  File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 292, in main
  exitCodeList, exitCode = runCommandList(_commandList, _commandArgList, _commandRedirectList, _commandProgressMonitoringFlags, _targetCluster, _usingLocalScratch)
**  File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1189, in runCommandList
  cFiles, cmdProgMonFlagList[cmdIndex], enablePrints)
**  File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1214, in runCommand
  clusterCmd.begin()
**  File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1802, in begin
  self.process = subprocess.Popen(self.argList, bufsize=-1, stdin=subprocess.PIPE, stdout=stdoutStream, stderr=stderrStream, cwd=os.getcwd(), universal_newlines=True)
**  File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 800, in __init__
  restore_signals, start_new_session)
**  File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 1551, in _execute_child
  raise child_exception_type(errno_num, err_msg, err_filename)
** FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
Saving exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcode_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  Exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcode_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout has been created.
Saving exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcodeCommands_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout
  Exit code file: /HARDDRIVE/ANSYS/Staging/qfb2scrd.gmu/exitcodeCommands_ffcba1bc-b678-4069-80e1-67b0aa785470.rsmout has been created.
ClusterJobs Exiting with code: 9999
Individual Command Exit Codes are: [None]
Fatal error when running job command(s).
[Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
A Job command did not return exit code. The job will fail with exit code 9999
Traceback (most recent call last):
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 292, in main
  exitCodeList, exitCode = runCommandList(_commandList, _commandArgList, _commandRedirectList, _commandProgressMonitoringFlags, _targetCluster, _usingLocalScratch)
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1189, in runCommandList
  cFiles, cmdProgMonFlagList[cmdIndex], enablePrints)
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1214, in runCommand
  clusterCmd.begin()
 File "/ansys_inc/v222/RSM/Config/scripts/ClusterJobs.py", line 1802, in begin
  self.process = subprocess.Popen(self.argList, bufsize=-1, stdin=subprocess.PIPE, stdout=stdoutStream, stderr=stderrStream, cwd=os.getcwd(), universal_newlines=True)
 File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 800, in __init__
  restore_signals, start_new_session)
 File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 1551, in _execute_child
  raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
March 11, 2024 at 4:03 pm, MangeshANSYS (Ansys Employee)
Hello,
Please check for a target platform mix-up.
The log shows:
 RSM_HPC_CLUSTER_TARGET_PLATFORM = Linux
but further down I see:
**  File "/ansys_inc/v222/commonfiles/CPython/3_7/linx64/Release/python/lib/python3.7/subprocess.py", line 1551, in _execute_child
  raise child_exception_type(errno_num, err_msg, err_filename)
** FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
and further:
[Errno 2] No such file or directory: 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat': 'C:\\Program Files\\ANSYS Inc\\v222\\Framework\\bin\\Win64\\runwb2.bat'
A Job command did not return exit code. The job will fail with exit code 9999
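To illustrate the mix-up, here is a minimal Python sketch (not the actual ClusterJobs.py logic) of why a Windows-style launcher path fails on a Linux execution node with exactly this FileNotFoundError. The Linux launcher path below is an assumption based on a typical /ansys_inc/v222 install layout, so please verify it against your own installation:

# Minimal sketch, assuming a standard /ansys_inc/v222 Linux install;
# this is not the RSM/ClusterJobs.py implementation itself.
import os
import subprocess

target_platform = os.environ.get("RSM_HPC_CLUSTER_TARGET_PLATFORM", "Linux")

if target_platform == "Linux":
    # Assumed Linux-side Workbench launcher location; verify on your cluster.
    launcher = "/ansys_inc/v222/Framework/bin/Linux64/runwb2"
else:
    launcher = r"C:\Program Files\ANSYS Inc\v222\Framework\bin\Win64\runwb2.bat"

args = [launcher, "-B", "-R", "test2d_Workbench_Solution.wbjn"]

if not os.path.exists(launcher):
    # This is the situation in your log: the Windows .bat path does not
    # exist on node00, so subprocess.Popen raises FileNotFoundError (Errno 2).
    print("Launcher not found on this node:", launcher)
else:
    subprocess.Popen(args)

In other words, the command that DPS/RSM generated for this task points at a Windows installation, so whichever evaluator or project setting builds that command needs to target the Linux cluster platform instead.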
March 11, 2024 at 8:20 pm, dbhands (Subscriber)
Hello,
Thanks for the reply. I noticed this as well. Where would I check this setting? I intend to run on Linux, so the Windows file paths are incorrect. In the DC Evaluator settings on the head node of the Slurm cluster, the machine is set to Linux, and it solves when set to direct execution. When set to submit to RSM, however, the submission fails with the error above. With RSM selected, the only other option I see is "queue", where I have entered the name of one of the RSM queues. I do not see anywhere to specify the platform type, but I would assume this should be handled by the RSM head node on the Slurm cluster, as it is during normal submissions to the RSM head node.
March 11, 2024 at 8:47 pm, MangeshANSYS (Ansys Employee)
Can you please explain the setup and which machine is set to what?
1. Which computer is the project being opened and submitted from? What is its OS? What is the DCS setting?
2. Which computer does the meshing need to happen on? What is its OS?
3. How and where is the submission to Slurm configured?
Please add screenshots, obscuring any information that should not be on a public forum.
The topic ‘Setting up DCS services for a cluster’ is closed to new replies.