Photonics

Topics related to Lumerical and more.

Errors Running Ring Modulator Example on Cluster

    • adevata
      Subscriber

      Hello! I am using Lumerical (2021R2) through our high-performance computing cluster. Specifically, I am trying to replicate this example simulation: https://optics.ansys.com/hc/en-us/articles/360042322794-Ring-Modulator. Our cluster uses the research package, and we have access to all simulation modules used in this example. For further context, I have had no issues running FDE, FDTD, varFDTD, and EME simulations on this cluster.
      I am having trouble executing step 3 using the CHARGE solver. I have already successfully completed steps 1 and 2, and the results are identical to those in the link. For step 3, I have tried a) clicking simulate --> run at the top right; b) right-clicking step 3 in the parameter sweeps and optimization tab --> run; c) right-clicking the parameter extraction sweep in the sweeps and optimization tab --> run. Each of these three methods opens the file associated with step 1, runs the simulation, closes the file, and then repeats for steps 2 and 3. It does the same for step 4, but errors out after executing the first voltage step (see step 4 error below). Note that all of this happens within the single run command from methods a-c above: I do not touch the software after clicking run in the .ldev file associated with step 3; it automatically re-executes everything starting from step 1.
      As such, I have some questions:
      1. Step 3, #2 is unclear to me: How do I properly run step 3 of this simulation such that it allows me to run step 4?
      2. Is there a way to only run step 3 without having the software rerun step 1 and 2?
      3. What is going on in step 4 to cause it to break, either internally within the step 3 nest or externally when I run step 4 separately?
      Our cluster's IT support recommended reaching out on this forum. Any and all help would be greatly appreciated. Thank you for your time and consideration, and I look forward to your reply!
      Edit: I ran this locally on Lumerical 2024 R1.3 and had no issues. The final results closely matched what was shown in the link above. However, running this locally is not a long-term solution; I must run this on the cluster in the near future.
      Best,
      AD
      ----------
      ldev error:
      "/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec" -n 1 "/software/lumerical/2021r2/lumerical/v212/bin/device-engine" "/projects/NODE/USER/TrialRuns/RingResonator/c118296f-ring-modulator/waveguide_modulator.ldev"
      ----------------------------------------
      srun job start: Fri Nov 1 09:21:19 CDT 2024
      Job ID: 5747109
      Username: userNAME
      Queue: NODE
      Account: NODE
      ----------------------------------------
      The following variables are not
      guaranteed to be the same in the
      prologue and the job run script
      ----------------------------------------
      step 4 error: /projects/NODE/USER/TrialRuns/RingResonator/c118296f-ring-modulator/waveguide_modulator_parameter_extraction/voltage_neff_sweep_step4.lsf line 16: The optimization or parameter sweep object 'sweep_voltage' has no results, please run the optimization or parameter sweep before using getsweepresult.

    • kghaffari
      Ansys Employee

      Hi AD,

      Sorry, I think there was an issue with the forum and my earlier reply was lost. Please let me know if you received that response.

      In short, I think the issues are related to the CHARGE simulation not running. I recommend avoiding the sweep for now and testing by just clicking run. I tested the example on my side and did not see the issue you describe. Can you confirm whether the issues also occur when running on a local machine (not the cluster)? Currently we are not able to reproduce the issue.

      Best regards,

      Khash

    • adevata
      Subscriber

      Hi Khash,

      Thanks for your reply. I remember seeing your reply about running it locally at first, but I wasn't able to test it before I noticed that the reply was gone. I did edit my post today stating that I was able to run the simulation locally on Lumerical 2024 R1.3, and had no issues at all. However, as I point out in my edit, this is not a sustainable solution; I need to run this over the cluster.

      Just to confirm, your recommendation is to use option a) from my list above, right? This ignores the parameter extraction completely. I just ran it again over the cluster (using option a)) and got the same ldev error I included in my original post. The actual error message is quite a bit longer, with PATH, mpiexec/mpich2, nemesis, and munmap_chunk mentioned a couple of times. If you'd like, I can share the full error message, although most of it looks like gibberish to me (random numbers and letters strung together).

      I'm not sure if you are interfacing with my university on this issue, but if you would be willing to schedule a call sometime in the coming week, I'd be more than happy to do that. For now, please let me know what next steps I should take to either run the simulation properly or to test/debug the issues we're having.

      Thank you for your time and help, and I look forward to your reply!

      Cheers,

      AD

    • kghaffari
      Ansys Employee

      Hi,

      Thank you for the update. Yes, I recommend just running the CHARGE simulations for your troubleshooting and avoiding the extraction sweeps for now.

      Are you running the simulation on the cluster from the CAD or from the command line with a job scheduler? If using a job scheduler, please share the submission script so we can comment.

      Please note that CHARGE cannot run on multiple nodes, so you will have to request 1 node on the cluster and use all threads/cores on that 1 node. If you are using multiple nodes, this may be causing the error.
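
      As a rough illustration of the single-node requirement, the sketch below refuses to build the engine command when the job spans more than one node. It assumes the cluster uses Slurm and reads the standard SLURM_JOB_NUM_NODES variable; the helper names single_node_allocation and charge_command are hypothetical, and the command layout simply mirrors the mpiexec line quoted in the original post.

```python
import os


def single_node_allocation(env=None):
    """Return True if the Slurm job spans one node, or if no Slurm job is detected.

    Assumption: the scheduler is Slurm, which exports SLURM_JOB_NUM_NODES
    inside a job allocation.
    """
    env = os.environ if env is None else env
    nodes = env.get("SLURM_JOB_NUM_NODES")
    return nodes is None or int(nodes) == 1


def charge_command(mpiexec, engine, project, env=None):
    """Build the argument list for one CHARGE engine process (hypothetical helper).

    The layout follows the command shown in the original post:
    mpiexec -n 1 device-engine <file>.ldev
    """
    if not single_node_allocation(env):
        raise RuntimeError("CHARGE cannot run on multiple nodes; request 1 node.")
    return [mpiexec, "-n", "1", engine, project]


if __name__ == "__main__":
    # Placeholder paths; substitute the ones reported by your own installation.
    print(charge_command("mpiexec", "device-engine", "waveguide_modulator.ldev"))
```

      The returned list can be passed to subprocess.run(cmd, check=True) from inside the allocation, which also makes it easy to capture the full engine output for sharing here.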

      Best,

      Khash

      • adevata
        Subscriber

        Hi Khash,

        Thanks again for your reply. Our cluster allows us to access Lumerical's GUI via an interactive session (I believe this is the CAD?), so I am opening CHARGE that way. Our Slurm scheduler code takes this request and assigns me to our lab's computing node to start a session. Once I am within this session, I open Lumerical by running "module load lumerical/2021r2", hitting enter, then confirming with "CAD" and enter again (I don't think this CAD stands for computer-aided design, as I need to use the same command to open any Lumerical software using this method). I don't currently have access to the scheduler code that allows me to open an interactive session, but our IT team may be willing to share it (most likely not publicly though). Would you still like to see this?

        All of these simulations are running on a single node. I am not requesting the full processing and memory capabilities of this node as the simulations don't require it. 

        Let me know if there is any additional information I can provide. Thanks for the help!
        - AD

    • Lito
      Ansys Employee

      @adevata,

      The ring modulator example includes simulations for FDTD, MODE, CHARGE and INTERCONNECT.

      To debug/troubleshoot the issue, please assist us with the following:

      • Is the issue only happening with CHARGE simulations? Or do you have problems with all simulations, e.g. FDTD, MODE and INTC?
      • Does the error happen when running script.lsf?
      • Or when you are running the simulation file directly (not running a script)?
      • Are you able to run the “waveguide_modulator.ldev” example from the Lumerical GUI on the cluster?
      • Or are you running the simulation file directly from the command line?
      • If you run into an issue running a simulation from the Lumerical GUI on the cluster, send a screenshot of the error message (job details).

      Otherwise, if you are having issues running script.lsf files:

      • Copy and paste (here) the script you are running that is generating the error.
      • Send a screenshot of the error message shown when running the script.lsf file on the cluster.
      • Send the command you use to run an interactive session or the command used to request resources on the cluster.

       

       
       
       
       