
Reply To: Lumerical cluster job crashes arbitrarily

Vighnesh Natarajan
Subscriber

Hello,

  • The operating system is Ubuntu 20.04.6 LTS (Focal Fossa). I log in via ssh and use X11 to open a GUI window.
  • This issue comes up with FDTD.
  • The issue occurs only when I use SLURM as the job scheduler. In the resource manager in the GUI, I select SLURM as the job launcher and have set up a batch script that runs the job. For example:
    • The sbatch command is "sbatch --mem=84G --time=48:00:00 -N 12 -n 96 --ntasks-per-node=8"
    • This invokes "mpirun -np 96 --use-hwthread-cpus /share/apps/lumerical/v222/bin/fdtd-engine-ompi-lcl -logall -remote {PROJECT_FILE_PATH}"
  • SLURM is the only way I can submit large batch jobs to the cluster across multiple nodes. If I set the resource manager to "Local computer", it uses the mpich2nem solver, which never crashes. However, that uses only one core: the core from which I launched the GUI in the remote ssh session.
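
As a quick sanity check on the numbers above, here is a minimal shell sketch that rebuilds the two command lines from the node and task counts, and confirms that the task count requested from SLURM matches the `-np` value passed to mpirun. The variable names are only illustrative and are not part of the Lumerical or SLURM setup; the flags and paths are copied from the commands quoted above, and `{PROJECT_FILE_PATH}` is left as the literal placeholder.

```shell
#!/bin/bash
# Illustrative variable names (assumptions); the values come from the post.
NODES=12
TASKS_PER_NODE=8

# Total MPI ranks implied by the request: 12 nodes * 8 tasks per node.
TOTAL_TASKS=$((NODES * TASKS_PER_NODE))

# Note: sbatch's --mem is a per-node limit, so this asks for 84G on each node.
SBATCH_CMD="sbatch --mem=84G --time=48:00:00 -N ${NODES} -n ${TOTAL_TASKS} --ntasks-per-node=${TASKS_PER_NODE}"

# The placeholder is kept verbatim; it is not expanded by the shell here.
MPIRUN_CMD="mpirun -np ${TOTAL_TASKS} --use-hwthread-cpus /share/apps/lumerical/v222/bin/fdtd-engine-ompi-lcl -logall -remote {PROJECT_FILE_PATH}"

# Only print the commands; nothing is submitted to the scheduler.
echo "$SBATCH_CMD"
echo "$MPIRUN_CMD"
```

With these values the request is self-consistent: the total works out to the same 96 tasks that appear in both the sbatch and mpirun lines.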

Hope this information is useful in helping diagnose the problem.