Running Lumerical FDTD files on a Campus cluster using Sun Grid Engine (SGE)
TAGGED: cluster
-
August 30, 2024 at 7:07 pm, katsuya2 (Subscriber)
I want to use the Campus cluster (Sun Grid Engine) to run FDTD files.
https://answers.uillinois.edu/scs/page.php?id=104365
I followed these instructions to install Lumerical (2024-R1.3) in my cluster folder:
https://optics.ansys.com/hc/en-us/articles/360035201613-Configuring-your-cluster-for-Ansys-Lumerical
Next, based on another article (https://optics.ansys.com/hc/en-us/articles/360039028654-Job-scheduler-submission-scripts-SGE-Slurm-Torque), I created the following shell script as "fdtd_test.sh".
---
#!/bin/csh
#$ -N fdtd_test
#$ -cwd
#$ -o run.out
#$ -e run.err
#$ -q intel24
#$ -pe orte 4
module load mpi/openmpi-x86_64
echo "Running on nodes:"
cat $PE_HOSTFILE
/home/katsuya2/tools/lumerical/v241/bin/fdtd-mpi-status.sh
/home/katsuya2/tools/lumerical/v241/bin/fdtd-run-pbs.sh
ls -l /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
ls -l rod_ctab_1.fsp
export PATH=/home/katsuya2/tools/lumerical/v241/bin:$PATH
export LD_LIBRARY_PATH=/home/katsuya2/tools/lumerical/v241/lib:$LD_LIBRARY_PATH
ulimit -c unlimited
mpirun /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -logall -fullinfo rod_ctab_1.fsp
---
However, it didn't work, and I got the following error:
"TERM environment variable not set.
/home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
/home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
"
It looks like libmpi.so.40 is missing, but I'm not sure how to install it.
Could you give me some advice on running this file on the cluster?
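A quick way to see which shared libraries the engine cannot resolve is to filter the output of `ldd`. The sketch below assumes a Linux node where `ldd` and `awk` are available; the helper name `missing_libs` is made up for illustration and is not part of Lumerical.

```shell
#!/bin/bash
# Hypothetical helper: print the names of the shared libraries that ldd
# reports as "not found". It reads ldd output on stdin, so it also works
# on output saved from a compute node.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Usage on a compute node (engine path taken from the post above):
#   ldd /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl | missing_libs
# An empty result means every dependency was resolved.
```

If `libmpi.so.40` is listed, the Open MPI libraries are not on the library search path of the job, for example because no suitable module was loaded inside the job script.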
-
September 13, 2024 at 3:40 pm, Lito (Ansys Employee)
@katsuya2,
Please consult your IT/cluster admin to install and configure OpenMPI on your cluster. See the KB for more information.
>>Running simulations with MPI on Linux – Ansys Optics
Your submission script will be something like:
export PATH=/home/katsuya2/tools/lumerical/v241/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/katsuya2/tools/lumerical/v241/lib
ulimit -c unlimited
mpiexec -n {processes} /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -t 1 rod_ctab_1.fsp
-
September 18, 2024 at 10:43 pm, katsuya2 (Subscriber)
Hi, thank you for your help.
I talked with the IT admin, who said that OpenMPI is available on the cluster but was not sure which version the Ansys software requires.
Do you know where that information is available in the KB?
I have revised the script as follows, but the file still does not run. Can you see where I went wrong?
module load mpi/openmpi-x86_64
echo "Running on nodes:"
cat $PE_HOSTFILE
/home/katsuya2/tools/lumerical/v241/bin/fdtd-mpi-status.sh
/home/katsuya2/tools/lumerical/v241/bin/fdtd-run-pbs.sh
ls -l /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
ls -l rod_ctab_1.fsp
export PATH=/home/katsuya2/tools/lumerical/v241/bin:$PATH
export LD_LIBRARY_PATH=/home/katsuya2/tools/lumerical/v241/lib:$LD_LIBRARY_PATH
ulimit -c unlimited
mpiexec -n /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -t rod_ctab_1.fsp
Best,
Katsu
-
September 19, 2024 at 10:16 pm, Lito (Ansys Employee)
We have tested and support OpenMPI 3 and 4, as indicated in our KB guide. >Running simulations with MPI on Linux – Ansys Optics<
Sorry, I missed the "number of processes" after the "-n" flag in OpenMPI in my previous email. It is also missing the "1" after the "-t" argument of the engine binary.
mpiexec -n ## /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -t 1 rod_ctab_1.fsp
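Under SGE, the number of slots granted by a request such as `#$ -pe orte 4` is exported to the job as `$NSLOTS`, so the value after `-n` does not have to be hard-coded. Below is a minimal sketch that only prints the resulting command; the helper name `engine_cmd` is made up for illustration, and the paths are the ones used in this thread.

```shell
#!/bin/bash
# Engine path and project file as used in this thread.
ENGINE=/home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
PROJECT=rod_ctab_1.fsp

# Hypothetical helper: print the launch command, taking the process count
# from SGE's NSLOTS variable and falling back to 1 outside of a job.
engine_cmd() {
    echo "mpiexec -n ${NSLOTS:-1} $ENGINE -t 1 $PROJECT"
}

# Print the command for inspection. In the job script itself the call
# would be:  mpiexec -n "$NSLOTS" "$ENGINE" -t 1 "$PROJECT"
engine_cmd
```

Printing the command first makes it easy to confirm in run.out that the process count matches the parallel environment request.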
-
September 20, 2024 at 5:54 pm, katsuya2 (Subscriber)
Hello, thank you for your quick response.
I changed the script to use 4 processes and openmpi/4.1.6. It then started running, but I got the following errors.
The script produced "rod_ctab_1_p0.log" for the first time, but it reported "Error: Could not connect to Ansys license server".
Do you have any ideas for dealing with this issue? I'd appreciate your help.
In run.err
TERM environment variable not set.
--------------------------------------------------------------------------
WARNING: There is at least one non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: compute-3-2
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[36203,1],2]
Exit code: 1
--------------------------------------------------------------------------
[compute-3-2.local:85943] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[compute-3-2.local:85943] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
In run.out
Running on nodes:
compute-3-2.local 4 intel24@compute-3-2.local UNDEFINED
FDTD Solutions includes two versions of the core simulation engine that allow it
to integrate with different MPI distributions for parallel computation. This
includes Intel MPI and Open MPI compatible engines, as well as a standalone
engine for local multithreaded simulations. Using MPI is strongly suggested
for best performance, for both local and distributed simulations.
This utility analyzes your system to determine which engines are suitable.
If you are planning on integrating FDTD Solutions with your own MPI distribution
please make sure that it is installed and the location of the shared libraries
is included in your system library path (ld.so.conf or LD_LIBRARY_PATH),
otherwise this utility will not be able to detect the correct engine to use.
Please note the name of the engine program that is suitable for your system.
You will need to specify this when configuring your system to run jobs. If you
intend to use the local threaded option, then no configuration is required, this
is the default option.
1) fdtd-engine Threaded executable, no external MPI dependencies
2) fdtd-engine-ompi-lcl Open MPI (uses libraries from: /share/apps/openmpi/4.1.6/lib/libmpi.so.40)
Remember that you must use the mpiexec/mpirun command that belongs
to the MPI distribution you are using to start your FDTD engine jobs
Your system path is configured to use the following mpiexec command.
If this isn't the one you intend to use, please update your path,
or ensure that you always use an absolute path to specify the mpiexec
command.
/share/apps/openmpi/4.1.6/bin/mpiexec
Your system path is configured to use the following mpirun command.
If this isn't the one you intend to use, please update your path,
or ensure that you always use an absolute path to specify the mpirun
command.
/share/apps/openmpi/4.1.6/bin/mpirun
Press
to continue.
-rwxr-xr-x 1 katsuya2 domain users 53989664 Jul 1 14:11 /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
-rw-r--r-- 1 katsuya2 domain users 373010 Jul 10 15:14 rod_ctab_1.fsp
compute-3-2.local(process 0): Error: Could not connect to Ansys license server specified at
Would you like to reconfigure your license settings?, Response: No
compute-3-2.local(process 0): License error: Could not connect to Ansys license server specified at
compute-3-2.local(process 1): Error: there was a failure with the license. Process number: 0 had this error
compute-3-2.local(process 2): Error: there was a failure with the license. Process number: 0 had this error
compute-3-2.local(process 3): Error: there was a failure with the license. Process number: 0 had this error
compute-3-2.local(process 0): Error: there was a failure with the license. Process number: 0 had this error
-
September 20, 2024 at 8:45 pm, Lito (Ansys Employee)
See this KB for more information on the licensing error "-15 -- Cannot connect to the Ansys license server":
>>>Fixing common licensing errors – Ansys Optics
Please make sure that the license manager is configured for shared license access and that the cluster can connect to the license server on the ports used by the Ansys license manager.
>>>Configuring the Ansys license manager for shared access – Ansys Optics
-
September 20, 2024 at 11:05 pm, katsuya2 (Subscriber)
Thank you very much again.
I understand this is because the license manager hasn't yet been configured for shared license access.
Since my license is provided through the university webstore, I cannot find a license file on my local PC, but I know the server name and that the server is active.
Therefore, the license manager is stopped, and I don't know how to move forward. I guess the only option is to contact the university license manager and ask to be allowed to configure it?
Best,
Katsu
-
September 24, 2024 at 10:08 pm, Lito (Ansys Employee)
If you are able to run Lumerical simulations on your local desktop (university computer), check the Lumerical license configuration on that machine to see which server and port it obtains licenses from, and use the same license configuration on the cluster. See the following KB for details:
- Lumerical license configuration with the Ansys Optics Launcher (GUI) – Ansys Optics
- Lumerical license configuration from the command line (headless Linux systems/without GUI) – Ansys Optics
Otherwise, confirm with your IT/admins that the cluster nodes are allowed to communicate with your license server on the ports used by the Ansys license manager.
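One way to test port access from a compute node without extra tools is bash's `/dev/tcp` redirection. The following is a rough sketch, assuming `bash` and the coreutils `timeout` command are installed on the node. The helper names are made up for illustration, `license.example.edu` is a placeholder host, and the ports in the usage line (1055 and 2325) are assumed values that should be replaced with whatever your license administrator reports.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within 3 seconds. Relies on bash's /dev/tcp redirection and on timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical helper: report the status of each port given after the host.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}

# Usage inside a job or an interactive session on a compute node
# (placeholder host, assumed ports):
#   check_ports license.example.edu 1055 2325
```

A port reported as not reachable from the compute node, while the same check succeeds from the desktop, would point to a firewall or routing restriction between the cluster and the license server.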
-
October 2, 2024 at 4:47 pm, katsuya2 (Subscriber)
Thanks,
I believe I have already configured the license in the Ansys Optics Launcher (GUI), and the server was active on the default port (1055).
The university license manager told me: "lmgrd Port 1055, Vendor Daemon Port 55947, Interconnect Port 2325. They are all static."
I'm not sure whether this information helps you understand my situation.
Since the university license manager and the cluster IT/admins are in separate groups, cluster IT couldn't help me. I'd appreciate your help.
Best,
Katsu
-
October 3, 2024 at 9:14 pm, katsuya2 (Subscriber)
Hi,
I resolved this issue by defining the license environment variable in the shell script.
https://optics.ansys.com/hc/en-us/articles/7595785040403-Setting-environment-variable-in-Linux
Thank you for your help!
Best,
Katsu
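The thread does not show the final script or say which variable was set. As a sketch only, the lines below assume the standard Ansys licensing variables `ANSYSLMD_LICENSE_FILE` and `ANSYSLI_SERVERS`, the ports quoted earlier in the thread, and a placeholder host name `license.example.edu`, since the real server name is not given.

```shell
#!/bin/bash
# Placeholder host: the real license server name is not given in this thread.
LICENSE_HOST=license.example.edu

# Assumed variable names (standard Ansys licensing variables). The thread
# does not state which variable was actually defined in the job script.
export ANSYSLMD_LICENSE_FILE="1055@${LICENSE_HOST}"   # license manager port
export ANSYSLI_SERVERS="2325@${LICENSE_HOST}"         # interconnect port

# Echo the settings so they show up in run.out for later troubleshooting.
echo "ANSYSLMD_LICENSE_FILE=$ANSYSLMD_LICENSE_FILE"
echo "ANSYSLI_SERVERS=$ANSYSLI_SERVERS"

# The mpiexec line from the job script would follow here, so that the
# engine processes inherit these variables.
```

For jobs that span several nodes, Open MPI's `mpiexec -x VARNAME` option can forward a variable to the remote processes; on a single node, as in the run above, the exported variables are inherited directly.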
-
- The topic ‘Running Lumerical FDTD files on a Campus cluster using Sun Grid Engine (SGE)’ is closed to new replies.
© 2024 Copyright ANSYS, Inc. All rights reserved.