October 5, 2023 at 8:08 pm | 0okami | Subscriber
We are having some issues sending a test job to a SLURM HPC cluster from the RSM Configuration Utility on a Windows laptop. The Windows laptop is currently using 2023 R2, but we also tried 2023 R1 and still have the same issues. From reading the log file, it seems to fail at the end with a "Failed to find cluster queue" error message. Any ideas or troubleshooting steps you would recommend?
M Empty Stdout Variable From Primary Command: 'RSM_HPC_PRIMARY_STDOUT'
10/4/2023 10:41:54 AM External operation: 'checkQueueExists' has failed. This may or may not become a fatal error
10/4/2023 10:41:54 AM HpcCommand Failed: queryQueues
10/4/2023 10:41:54 AM Submit Failed
10/4/2023 10:41:54 AM Failed to find cluster queue common
10/4/2023 10:41:54 AM Ansys.Rsm.ClusterOperations.GenericClusterException: Failed to find cluster queue common
10/4/2023 10:41:54 AM at Ansys.Rsm.ClusterOperations.Operations.SubmitOperation.PreSubmit(IClusterOperationsLogger logger, IBatchJobCommandDefinition commandDefinition, ClusterJobInfo jobInfo, IJobHandle handle)
10/4/2023 10:41:54 AM at Ansys.Rsm.ClusterOperations.Operations.SubmitOperation.DoExecute(IClusterOperationsLogger logger, ClusterJobInfo jobInfo, IJobHandle handle)
10/4/2023 10:41:54 AM at Ansys.Rsm.ClusterOperations.Operations.AbstractJobBasedClusterOperation.Execute(IClusterOperationsLogger logger, ClusterJobInfo jobInfo, IJobHandle handle)
10/4/2023 10:41:54 AM at Ansys.Rsm.ClusterOperations.JobBasedClusterOperations.Submit(ClusterJobInfo jobInfo, IClusterOperationsLogger logger, IJobHandle handle, IExternalStorageHandler storageHandler, IBatchJobCommandDefinition commandDefinition, String& jobId)
10/4/2023 10:41:54 AM at Ansys.Rsm.JobManagement.Core.ClusterHandler.Submit(Id internalId, JobDefinition jobDefinition, WaitHandle cancelHandle)
10/4/2023 10:41:54 AM --- Ansys.Rsm.JobManagement.Core.ClusterInteractionException: Submit Failed
10/4/2023 10:41:54 AM at Ansys.Rsm.JobManagement.Core.AbstractClusterInteracter`2.WaitUntilDone(TimeSpan timeout, TResult& result)
10/4/2023 10:41:54 AM at Ansys.Rsm.JobManagement.Core.SubmitAsyncResult.PerformClusterSubmit()
10/4/2023 10:41:54 AM at Ansys.Rsm.JobManagement.Core.SubmitAsyncResult.DoTaskOperation()
10/4/2023 10:41:54 AM Job submission failed.
10/4/2023 10:41:54 AM Finalize not required since submission failed or aborted.
10/4/2023 10:41:54 AM Release request received.
October 10, 2023 at 3:22 pm | MangeshANSYS | Ansys Employee
Hello,
Please ensure that squeue or similar Slurm commands can be found when the job is being submitted.
Please refer to section 2.2.3.1, "Adding Common Job Environment Variables for Jobs", at:
https://ansyshelp.ansys.com/account/secured?returnurl=/Views/Secured/corp/v232/en/wb_rsm/wb_rsm_setup_linux.html%23rsm_native_environvar
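As a rough pre-flight check, the Python sketch below reports whether the Slurm client commands resolve on `PATH`. It is meant to be run on the Linux side, ideally in the same non-interactive way RSM connects, since that environment can differ from an interactive login. The command list and the helper names `find_slurm_commands` and `missing_commands` are illustrative assumptions, not part of RSM.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed set of Slurm client commands a submission may rely on (illustrative only).
DEFAULT_SLURM_COMMANDS = ("sinfo", "squeue", "sbatch", "scancel")


def find_slurm_commands(
    commands: Iterable[str] = DEFAULT_SLURM_COMMANDS,
    path: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Map each command to its resolved location, or None if it is not on PATH.

    `path` overrides the search path; None means "use the current PATH".
    """
    return {cmd: shutil.which(cmd, path=path) for cmd in commands}


def missing_commands(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the commands that could not be resolved, in their original order."""
    return [cmd for cmd, location in found.items() if location is None]


if __name__ == "__main__":
    found = find_slurm_commands()
    for cmd, location in found.items():
        print(f"{cmd:8s} -> {location or 'NOT FOUND'}")
    missing = missing_commands(found)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
```

If anything is reported as missing here but works in an interactive shell, that points at the job environment rather than at Slurm itself.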
November 28, 2023 at 3:41 pm | MangeshANSYS | Ansys Employee
Is this the first time RSM is being configured in your environment?
Or are there other users or prior versions that are working correctly?
Some things to double-check:
On Linux, is the variable AWP_ROOT232 set (for example, in .bashrc or a similar startup file)?
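A minimal Python sketch of the AWP_ROOT232 check, to be run on the Linux side. The helper name `check_awp_root` is hypothetical; the `232` suffix matches the 2023 R2 (v232) release discussed in this thread. It only inspects the environment of whatever session runs it, so results from an interactive shell may not reflect what a non-interactive RSM session sees.

```python
import os
from typing import Mapping, Optional


def check_awp_root(env: Mapping[str, str] = os.environ, version: str = "232") -> Optional[str]:
    """Return the value of AWP_ROOT<version> if set and non-blank, else None."""
    value = env.get(f"AWP_ROOT{version}", "").strip()
    return value or None


if __name__ == "__main__":
    root = check_awp_root()
    if root is None:
        print("AWP_ROOT232 is not set in this environment")
    else:
        # Also report whether the directory the variable points to actually exists.
        status = "exists" if os.path.isdir(root) else "does NOT exist"
        print(f"AWP_ROOT232={root} ({status})")
```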
From Windows, what is the output of the command below? Do you see the same output as when running sinfo on Linux, for example in an interactive PuTTY session? (Please use your actual username and Linux machine hostname when running this command.)
plink.exe -batch -i "%KEYPATH%" your_actual_username@your_actual_linux_machine_name sinfo
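The plink comparison can also be scripted. The Python sketch below is a hedged example, assuming that the RSM queue name `common` from the log corresponds to a Slurm partition. It runs `sinfo -h -o %P` (no header, partition names only, with `*` marking the default partition) through plink and checks whether that partition is listed. The helper names, the reuse of the `KEYPATH` environment variable, and the placeholder user/host strings are assumptions for illustration.

```python
import os
import subprocess
from typing import Set


def list_partitions_via_plink(key_path: str, user: str, host: str, timeout: float = 60.0) -> str:
    """Run `sinfo -h -o %P` on the Linux host through plink and return its stdout."""
    cmd = ["plink.exe", "-batch", "-i", key_path, f"{user}@{host}", "sinfo", "-h", "-o", "%P"]
    # check=True raises CalledProcessError if plink or sinfo exits non-zero.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


def parse_partitions(sinfo_output: str) -> Set[str]:
    """Extract partition names, stripping the '*' that marks the default partition."""
    return {line.strip().rstrip("*") for line in sinfo_output.splitlines() if line.strip()}


def queue_exists(queue: str, sinfo_output: str) -> bool:
    """True if `queue` appears as a partition name in the sinfo output."""
    return queue in parse_partitions(sinfo_output)


if __name__ == "__main__":
    # Placeholders taken from the reply above; replace them with real values.
    output = list_partitions_via_plink(
        os.environ["KEYPATH"], "your_actual_username", "your_actual_linux_machine_name"
    )
    print(output)
    print("queue 'common' found:", queue_exists("common", output))
```

If the output is empty, differs from an interactive PuTTY session, or does not list `common`, that would be consistent with the "Failed to find cluster queue common" error in the log.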
The topic ‘RSM Error when sending test job to HPC SLURM – Failed to find Cluster Queue’ is closed to new replies.
© 2024 Copyright ANSYS, Inc. All rights reserved.