April 17, 2024 at 11:26 pm
Vighnesh Natarajan
Subscriber
Hello,
- The operating system is Ubuntu 20.04.6 LTS (Focal Fossa). I log in via ssh and use X11 to open the GUI window.
- This issue comes up with FDTD
- The issue occurs only when I use SLURM as the job scheduler. In the resource manager in the GUI, I use SLURM as the job launcher and have set up a batch script that runs the job. For example:
- The sbatch command is "sbatch --mem=84G --time=48:00:00 -N 12 -n 96 --ntasks-per-node=8"
- This invokes "mpirun -np 96 --use-hwthread-cpus /share/apps/lumerical/v222/bin/fdtd-engine-ompi-lcl -logall -remote {PROJECT_FILE_PATH}"
- SLURM is the only way I can submit large batch jobs to the cluster across multiple nodes. If I set the resource manager to "Local computer", it uses the mpich2nem solver, which never crashes. However, that uses only one core: the core from which I launched the GUI in the remote ssh session.
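For reference, here is a small shell sketch of the numbers in the sbatch line above. It is not the actual batch script: the variable names are made up for illustration, and it only checks that nodes times tasks per node matches the total task count (the same 96 passed to "mpirun -np"), then prints the sbatch command instead of submitting anything.

```shell
#!/bin/bash
# Illustrative sketch only: variable names are hypothetical, values are
# copied from the sbatch command quoted above. Nothing is submitted.
NODES=12             # -N
TASKS_PER_NODE=8     # --ntasks-per-node
NTASKS=96            # -n, also the value given to "mpirun -np"
MEM=84G              # --mem (SLURM treats this as memory per node)
TIME_LIMIT=48:00:00  # --time

# The total task count should equal nodes x tasks per node.
if [ $((NODES * TASKS_PER_NODE)) -ne "$NTASKS" ]; then
    echo "task count mismatch: ${NODES} x ${TASKS_PER_NODE} != ${NTASKS}" >&2
    exit 1
fi

# Rebuild the command line and print it rather than running it.
SBATCH_CMD="sbatch --mem=${MEM} --time=${TIME_LIMIT} -N ${NODES} -n ${NTASKS} --ntasks-per-node=${TASKS_PER_NODE}"
echo "$SBATCH_CMD"
```

With the values above the check passes (12 x 8 = 96), so the requested layout and the "-np 96" passed to mpirun agree.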
Hope this information is useful in helping diagnose the problem.