Ansys Products

Discuss installation & licensing of our Ansys Teaching and Research products.

RSM Error when sending test job to HPC SLURM – Failed to find Cluster Queue

    • David
      Subscriber

      We are having trouble sending a test job to a SLURM HPC cluster from the RSM Configuration Utility on a Windows laptop. The laptop is currently running 2023 R2, but we also tried 2023 R1 and had the same issue. According to the log file, the submission fails at the end with a "Failed to find cluster queue" error message. Any ideas or troubleshooting steps you would recommend?

      Empty Stdout Variable From Primary Command: 'RSM_HPC_PRIMARY_STDOUT'
      10/4/2023 10:41:54 AM  External operation: 'checkQueueExists' has failed.  This may or may not become a fatal error
      10/4/2023 10:41:54 AM  HpcCommand Failed: queryQueues
      10/4/2023 10:41:54 AM  Submit Failed
      10/4/2023 10:41:54 AM  Failed to find cluster queue common
      10/4/2023 10:41:54 AM  Ansys.Rsm.ClusterOperations.GenericClusterException: Failed to find cluster queue common
      10/4/2023 10:41:54 AM     at Ansys.Rsm.ClusterOperations.Operations.SubmitOperation.PreSubmit(IClusterOperationsLogger logger, IBatchJobCommandDefinition commandDefinition, ClusterJobInfo jobInfo, IJobHandle handle)
      10/4/2023 10:41:54 AM     at Ansys.Rsm.ClusterOperations.Operations.SubmitOperation.DoExecute(IClusterOperationsLogger logger, ClusterJobInfo jobInfo, IJobHandle handle)
      10/4/2023 10:41:54 AM     at Ansys.Rsm.ClusterOperations.Operations.AbstractJobBasedClusterOperation.Execute(IClusterOperationsLogger logger, ClusterJobInfo jobInfo, IJobHandle handle)
      10/4/2023 10:41:54 AM     at Ansys.Rsm.ClusterOperations.JobBasedClusterOperations.Submit(ClusterJobInfo jobInfo, IClusterOperationsLogger logger, IJobHandle handle, IExternalStorageHandler storageHandler, IBatchJobCommandDefinition commandDefinition, String& jobId)
      10/4/2023 10:41:54 AM     at Ansys.Rsm.JobManagement.Core.ClusterHandler.Submit(Id internalId, JobDefinition jobDefinition, WaitHandle cancelHandle)
      10/4/2023 10:41:54 AM  --- Ansys.Rsm.JobManagement.Core.ClusterInteractionException: Submit Failed
      10/4/2023 10:41:54 AM     at Ansys.Rsm.JobManagement.Core.AbstractClusterInteracter`2.WaitUntilDone(TimeSpan timeout, TResult& result)
      10/4/2023 10:41:54 AM     at Ansys.Rsm.JobManagement.Core.SubmitAsyncResult.PerformClusterSubmit()
      10/4/2023 10:41:54 AM     at Ansys.Rsm.JobManagement.Core.SubmitAsyncResult.DoTaskOperation()
      10/4/2023 10:41:54 AM  Job submission failed.
      10/4/2023 10:41:54 AM  Finalize not required since submission failed or aborted.
      10/4/2023 10:41:54 AM  Release request received.
    • MangeshANSYS
      Ansys Employee

      Hello,

      Please ensure that squeue or similar Slurm commands can be found when the job is being submitted.

      Please refer to section 2.2.3.1, "Adding Common Job Environment Variables for Jobs", at
      https://ansyshelp.ansys.com/account/secured?returnurl=/Views/Secured/corp/v232/en/wb_rsm/wb_rsm_setup_linux.html%23rsm_native_environvar
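
One way to test this suggestion is to check, on the Linux side, which Slurm client commands the shell can actually resolve. A non-interactive session often has a different PATH than an interactive terminal, so it helps to run the check the same way the job would be submitted. The sketch below is only an illustration: the helper name `missing_commands` and the list of commands to probe are assumptions, not part of RSM or Slurm.

```shell
#!/bin/sh
# Hypothetical helper (not part of RSM or Slurm): print every command
# name given as an argument that cannot be resolved on the current PATH.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Probe a few standard Slurm client commands; adjust the list as needed.
missing=$(missing_commands sinfo squeue sbatch scancel)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
    echo "PATH is: $PATH"
else
    echo "All probed Slurm commands were found."
fi
```

If any command is reported as missing here but works in an interactive login, the difference in PATH between the two sessions is a likely place to look.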

    • MangeshANSYS
      Ansys Employee

      Is this the first time RSM is being configured in your environment?

      Or are there other users or prior versions that are working correctly?

       

      Some things to double check:

      On Linux, say in .bashrc or similar, is the variable AWP_ROOT232 set?

      From Windows, what is the output of the command below? Do you see the same output as when running sinfo on Linux, say in an interactive PuTTY session? (Please use your actual username and Linux machine hostname when running this command.)

      plink.exe -batch -i "%KEYPATH%" your_actual_username@your_actual_linux_machine_name sinfo
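
The two checks above can be sketched as one small script to run on the Linux machine. It reports whether AWP_ROOT232 points to an existing directory and whether the queue named in the log (`common`) appears among the partitions that `sinfo` lists. The helper names `var_is_dir` and `has_partition` are hypothetical, and the directory test goes slightly beyond the original question, which only asks whether the variable is set.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the variable named in $1 is set
# and its value is an existing directory.
var_is_dir() {
    eval "value=\${$1:-}"
    [ -n "$value" ] && [ -d "$value" ]
}

# Hypothetical helper: succeed if partition $1 appears in the output of
# `sinfo -h -o '%P'` read from stdin. Slurm appends '*' to the default
# partition, so that suffix is stripped before the exact-match comparison.
has_partition() {
    sed 's/\*$//' | grep -qxF -- "$1"
}

# Queue name taken from the log ("Failed to find cluster queue common");
# pass a different name as the first argument if needed.
queue="${1:-common}"

if var_is_dir AWP_ROOT232; then
    echo "AWP_ROOT232 is set to $AWP_ROOT232"
else
    echo "AWP_ROOT232 is unset or does not point to a directory"
fi

if ! command -v sinfo >/dev/null 2>&1; then
    echo "sinfo was not found on PATH"
elif sinfo -h -o '%P' | has_partition "$queue"; then
    echo "Partition '$queue' is visible to sinfo"
else
    echo "Partition '$queue' is not listed by sinfo"
fi
```

Saved under a name of your choice on the Linux machine (for example `check_rsm_env.sh`, a hypothetical filename), the script could be run through the same plink command by replacing `sinfo` with `sh check_rsm_env.sh`, so that the results reflect the non-interactive session rather than an interactive PuTTY login.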
