Photonics

Running Lumerical FDTD files on a Campus cluster using Sun Grid Engine (SGE)

    • katsuya2
      Subscriber

      I want to use the Campus cluster (Sun Grid Engine) to run FDTD files. 
      https://answers.uillinois.edu/scs/page.php?id=104365

      I followed these instructions to install Lumerical (2024-R1.3) in my cluster folder.
      https://optics.ansys.com/hc/en-us/articles/360035201613-Configuring-your-cluster-for-Ansys-Lumerical

      Next, based on another article (https://optics.ansys.com/hc/en-us/articles/360039028654-Job-scheduler-submission-scripts-SGE-Slurm-Torque), I created the following shell script as "fdtd_test.sh". 

      ---

      #!/bin/csh
      #$ -N fdtd_test
      #$ -cwd
      #$ -o run.out
      #$ -e run.err
      #$ -q intel24
      #$ -pe orte 4


      module load mpi/openmpi-x86_64

      echo "Running on nodes:"
      cat $PE_HOSTFILE

      /home/katsuya2/tools/lumerical/v241/bin/fdtd-mpi-status.sh
      /home/katsuya2/tools/lumerical/v241/bin/fdtd-run-pbs.sh

      ls -l /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
      ls -l rod_ctab_1.fsp

      export PATH=/home/katsuya2/tools/lumerical/v241/bin:$PATH
      export LD_LIBRARY_PATH=/home/katsuya2/tools/lumerical/v241/lib:$LD_LIBRARY_PATH
      ulimit -c unlimited

      mpirun /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -logall -fullinfo rod_ctab_1.fsp

      --- 

      However, it didn't work, and I got this error:
      "TERM environment variable not set.
      /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
      /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
      "
      It looks like libmpi.so.40 is missing, but I'm not sure how to install it. 
      Could you give me some advice on running this file on the cluster? 

    • Lito
      Ansys Employee

      @katsuya2,

      Please consult with your IT/cluster admin to install and configure OpenMPI on your cluster. See the KB for more information. 
      >>Running simulations with MPI on Linux – Ansys Optics 

      Your submission script will be something like:  

      export PATH=/home/katsuya2/tools/lumerical/v241/bin:$PATH
      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/katsuya2/tools/lumerical/v241/lib
      ulimit -c unlimited
      mpiexec -n {processes}  /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -t 1  rod_ctab_1.fsp
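A fuller version of that submission script might look like the sketch below. It is an illustration under assumptions, not a tested configuration for this cluster: the queue, parallel environment, and paths are copied from the original post, the module name is site-specific (openmpi/4.1.6 is the version later reported to work in this thread), and the run_fdtd helper and DRY_RUN switch are hypothetical names added here. The script requests bash with "-S" because "export" is Bourne-shell syntax and is not valid in csh. NSLOTS and JOB_ID are environment variables that SGE sets inside a job.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -N fdtd_test
#$ -cwd
#$ -o run.out
#$ -e run.err
#$ -q intel24
#$ -pe orte 4

# Install location taken from the original post.
LUMERICAL_HOME=/home/katsuya2/tools/lumerical/v241

# Hypothetical helper: launch the Open MPI engine on one project file.
# NSLOTS is set by SGE to the slot count granted by "-pe"; default to 1.
# When DRY_RUN is set, the command is printed instead of executed.
run_fdtd() {
    local project="$1"
    ${DRY_RUN:+echo} mpiexec -n "${NSLOTS:-1}" \
        "$LUMERICAL_HOME/bin/fdtd-engine-ompi-lcl" -t 1 "$project"
}

# JOB_ID is only set inside an SGE job, so nothing runs when the file
# is sourced interactively for a dry run.
if [ -n "${JOB_ID:-}" ]; then
    module load openmpi/4.1.6   # assumption: module name differs per site
    export PATH="$LUMERICAL_HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$LUMERICAL_HOME/lib:$LD_LIBRARY_PATH"
    ulimit -c unlimited
    run_fdtd rod_ctab_1.fsp
fi
```

Using NSLOTS keeps the process count in step with the "-pe" request, so the number only has to be changed in one place.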
    • katsuya2
      Subscriber

      Hi, thank you for your help. 

      I talked with the IT admin, and he said they have OpenMPI available but aren't sure which version the Ansys software requires.
      Do you know where that information is available in the KB? 

      I have now changed the script as follows, but the file still doesn't run. Do you see where I went wrong?

      module load mpi/openmpi-x86_64

      echo "Running on nodes:"
      cat $PE_HOSTFILE

      /home/katsuya2/tools/lumerical/v241/bin/fdtd-mpi-status.sh
      /home/katsuya2/tools/lumerical/v241/bin/fdtd-run-pbs.sh

      ls -l /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
      ls -l rod_ctab_1.fsp

      export PATH=/home/katsuya2/tools/lumerical/v241/bin:$PATH
      export LD_LIBRARY_PATH=/home/katsuya2/tools/lumerical/v241/lib:$LD_LIBRARY_PATH

      ulimit -c unlimited

      mpiexec -n /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -t rod_ctab_1.fsp

      Best,
      Katsu

    • Lito
      Ansys Employee

      We have tested and support OpenMPI 3 and 4, as indicated in our KB guide. >Running simulations with MPI on Linux – Ansys Optics
      Sorry, in my previous reply I left out the number of processes after the OpenMPI "-n" flag. The "1" after the engine binary's "-t" argument is also missing.

      mpiexec -n ## /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl -t 1 rod_ctab_1.fsp
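Before submitting, it can help to confirm that the engine actually resolves libmpi.so.40 once the MPI module is loaded. The sketch below is one way to do that with ldd; the unresolved_libs helper is a hypothetical name, and the engine path is the one from the original post.

```shell
#!/bin/bash
# Engine path taken from the original post.
ENGINE=/home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl

# Hypothetical helper: reduce `ldd` output (read from stdin) to the names
# of the shared libraries the dynamic loader could not resolve.
unresolved_libs() {
    awk '/not found/ { print $1 }'
}

# Only inspect the engine if it exists on this machine.
if [ -x "$ENGINE" ]; then
    missing=$(ldd "$ENGINE" | unresolved_libs)
    if [ -n "$missing" ]; then
        echo "Unresolved libraries:"
        echo "$missing"
    else
        echo "All shared libraries resolved"
    fi
fi
```

Run it in the same environment as the job (after the "module load" line). If libmpi.so.40 is listed, the loaded module does not provide an Open MPI build whose library directory is on LD_LIBRARY_PATH.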


    • katsuya2
      Subscriber

      Hello, thank you for your quick response. 

      I changed the script to use 4 processes and openmpi/4.1.6. It started running but produced the following errors. 
      The script wrote "rod_ctab_1_p0.log" for the first time, but the log said "Error: Could not connect to Ansys license server". 
      Do you have any ideas on how to deal with this issue? I'd appreciate your help. 

      In run.err
      TERM environment variable not set.
      --------------------------------------------------------------------------
      WARNING: There is at least one non-excluded one OpenFabrics device found,
      but there are no active ports detected (or Open MPI was unable to use
      them).  This is most certainly not what you wanted.  Check your
      cables, subnet manager configuration, etc.  The openib BTL will be
      ignored for this job.

        Local host: compute-3-2
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      Primary job  terminated normally, but 1 process returned
      a non-zero exit code. Per user-direction, the job has been aborted.
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      mpiexec detected that one or more processes exited with non-zero status, thus causing
      the job to be terminated. The first process to do so was:

        Process name: [[36203,1],2]
        Exit code:    1
      --------------------------------------------------------------------------
      [compute-3-2.local:85943] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
      [compute-3-2.local:85943] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

      In run.out
      Running on nodes:
      compute-3-2.local 4 intel24@compute-3-2.local UNDEFINED
      FDTD Solutions includes two versions of the core simulation engine that allow it
      to integrate with different MPI distributions for parallel computation. This
      includes Intel MPI and Open MPI compatible engines, as well as a standalone
      engine for local multithreaded simulations. Using MPI is strongly suggested
      for best performance, for both local and distributed simulations.

      This utility analyzes your system to determine which engines are suitable.
      If you are planning on integrating FDTD Solutions with your own MPI distribution
      please make sure that it is installed and the location of the shared libraries
      is included in your system library path (ld.so.conf or LD_LIBRARY_PATH),
      otherwise this utility will not be able to detect the correct engine to use.

      Please note the name of the engine program that is suitable for your system.
      You will need to specify this when configuring your system to run jobs. If you
      intend to use the local threaded option, then no configuration is required, this
      is the default option.

        1) fdtd-engine            Threaded executable, no external MPI dependencies
        2) fdtd-engine-ompi-lcl   Open MPI (uses libraries from:  /share/apps/openmpi/4.1.6/lib/libmpi.so.40)

      Remember that you must use the mpiexec/mpirun command that belongs
      to the MPI distribution you are using to start your FDTD engine jobs

      Your system path is configured to use the following mpiexec command.
      If this isn't the one you intend to use, please update your path,
      or ensure that you always use an absolute path to specify the mpiexec
      command.

      /share/apps/openmpi/4.1.6/bin/mpiexec

      Your system path is configured to use the following mpirun command.
      If this isn't the one you intend to use, please update your path,
      or ensure that you always use an absolute path to specify the mpirun
      command.

      /share/apps/openmpi/4.1.6/bin/mpirun

      Press to continue.-rwxr-xr-x 1 katsuya2 domain users 53989664 Jul  1 14:11 /home/katsuya2/tools/lumerical/v241/bin/fdtd-engine-ompi-lcl
      -rw-r--r-- 1 katsuya2 domain users 373010 Jul 10 15:14 rod_ctab_1.fsp
      compute-3-2.local(process 0): Error: Could not connect to Ansys license server specified at
      Would you like to reconfigure your license settings?, Response: No
      compute-3-2.local(process 0): License error: Could not connect to Ansys license server specified at
      compute-3-2.local(process 1): Error: there was a failure with the license. Process number: 0 had this error
      compute-3-2.local(process 2): Error: there was a failure with the license. Process number: 0 had this error
      compute-3-2.local(process 3): Error: there was a failure with the license. Process number: 0 had this error
      compute-3-2.local(process 0): Error: there was a failure with the license. Process number: 0 had this error

    • Lito
      Ansys Employee

      See this KB for more information on the licensing error: -15 -- Cannot connect to the Ansys license server
      >>>Fixing common licensing errors – Ansys Optics 

      Please make sure that the license manager is configured for shared license access and that the cluster can connect to the license server on the ports used by the Ansys license manager. 
      >>>Configuring the Ansys license manager for shared access – Ansys Optics 
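One way to test that connectivity from a compute node is a plain TCP probe, sketched below. The port_open helper and the LICENSE_HOST variable are hypothetical names; the three ports are the ones the university license manager reports later in this thread, and other sites may use different values. The probe relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within 3 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Ports reported later in this thread: lmgrd 1055, vendor daemon 55947,
# interconnect 2325. LICENSE_HOST is a placeholder for the real server.
if [ -n "${LICENSE_HOST:-}" ]; then
    for port in 1055 55947 2325; do
        if port_open "$LICENSE_HOST" "$port"; then
            echo "$LICENSE_HOST:$port reachable"
        else
            echo "$LICENSE_HOST:$port NOT reachable"
        fi
    done
fi
```

For example, run it inside a short test job as LICENSE_HOST=license.example.edu bash check_ports.sh (both names are placeholders). A port that is not reachable from the compute node points to a firewall or routing issue rather than a Lumerical setting.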

    • katsuya2
      Subscriber

      Thank you very much again. 

      I understand this is because the license manager hasn't been configured yet for shared license access. 
      Since my license is shared through the university webstore, I cannot find the license file on my local PC. But I know the server name and that it's active.
      Therefore, the license manager is stopped. I don't know how to go forward. I guess the only way is to contact the university license manager and ask for permission to configure it?

      Best,
      Katsu

    • Lito
      Ansys Employee

      If you are able to run Lumerical simulations on your local desktop (university computer), check the Lumerical license configuration on that machine to see which server and port it obtains licenses from, and use the same license configuration on the cluster. See the following KB for details:

      Otherwise, check with your IT/admins that the cluster nodes are allowed to communicate with your license server on the ports used by the Ansys license manager. 

    • katsuya2
      Subscriber

      Thanks, 

      I think I've already configured the license in the Ansys Optics Launcher (GUI), and the server was active on the default port (1055). 
      The university license manager told me "Lmgrad Port 1055, Vendor Daemon Port 55947, Interconnect Port 2325. They are all static." 
      I'm not sure if this info helps you understand my situation. 

      Since the university license manager and the cluster IT/admins are in separate departments, cluster IT couldn't help me.  

      I'd appreciate your help. 

      Best,
      Katsu

    • katsuya2
      Subscriber

      Hi,

      I resolved this issue by defining the license environment in the shell script. 

      https://optics.ansys.com/hc/en-us/articles/7595785040403-Setting-environment-variable-in-Linux

      Thank you for your help!

      Best,

      Katsu
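The thread does not show the lines that were added, so the following is only a guess at their shape. It assumes the standard Ansys licensing variables ANSYSLMD_LICENSE_FILE and ANSYSLI_SERVERS, the ports quoted earlier in the thread (1055 and 2325), and a placeholder server name; check the linked KB article for the variables your version expects.

```shell
#!/bin/bash
# Placeholder host name: replace with the university license server.
LICENSE_SERVER=license.example.edu

# Assumption: standard Ansys licensing variables in port@host form.
# 1055 is the lmgrd port and 2325 the interconnect port quoted above.
export ANSYSLMD_LICENSE_FILE="1055@${LICENSE_SERVER}"
export ANSYSLI_SERVERS="2325@${LICENSE_SERVER}"

echo "License settings: $ANSYSLMD_LICENSE_FILE / $ANSYSLI_SERVERS"
```

These lines would go in the submission script before the mpiexec call, so that every engine process started by the job inherits them.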
