TAGGED: cluster, error, example, hpc-cluster, Lumerical-CHARGE, question, ring-resonators, simulation
-
-
November 11, 2024 at 3:11 pm adevata Subscriber
Hello! I am using Lumerical (2021R2) through our high-performance computing cluster. Specifically, I am trying to replicate this example simulation: https://optics.ansys.com/hc/en-us/articles/360042322794-Ring-Modulator. Our cluster uses the research package, and we have access to all simulation modules used in this example. For further context, I have had no issues running FDE, FDTD, varFDTD, and EME simulations on this cluster.
I am having trouble executing step 3 using the CHARGE solver. I have already successfully completed steps 1 and 2, and the results are identical to those in the link. For step 3, I have tried:
a) clicking simulate --> run at the top right;
b) right-clicking step 3 in the parameter sweeps and optimization tab --> run;
c) right-clicking the parameter extraction sweep in the sweeps and optimization tab --> run.
All three of these methods open the file associated with step 1, run the simulation, close the file, and repeat for steps 2 and 3. The same happens for step 4, but it errors out after executing the first voltage step (see step 4 error below). Note that all of this happens within the single run command from methods a-c above: I do not touch the simulation software after clicking run in the .ldev file associated with step 3; it automatically re-executes everything starting from step 1.
As such, I have some questions:
1. Step 3, #2 is unclear to me: how do I properly run step 3 of this simulation so that I can then run step 4?
2. Is there a way to run only step 3 without having the software rerun steps 1 and 2?
3. What is going on in step 4 to cause it to break, either internally within the step 3 nest or externally when I run step 4 separately?
Our cluster's IT support recommended reaching out on this forum. Any and all help would be greatly appreciated.
Thank you for your time and consideration, and I look forward to your reply!
Edit: I ran this locally on Lumerical 2024 R1.3 and had no issues. The final results closely matched what is shown in the link above. However, running this locally is not a long-term solution; I must run this on the cluster in the near future.
Best,
AD
----------
ldev error:
"/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec" -n 1 "/software/lumerical/2021r2/lumerical/v212/bin/device-engine" "/projects/NODE/USER/TrialRuns/RingResonator/c118296f-ring-modulator/waveguide_modulator.ldev"
----------------------------------------
srun job start: Fri Nov 1 09:21:19 CDT 2024
Job ID: 5747109
Username: userNAME
Queue: NODE
Account: NODE
----------------------------------------
The following variables are not
guaranteed to be the same in the
prologue and the job run script
----------------------------------------
step 4 error:
/projects/NODE/USER/TrialRuns/RingResonator/c118296f-ring-modulator/waveguide_modulator_parameter_extraction/voltage_neff_sweep_step4.lsf line 16: The optimization or parameter sweep object 'sweep_voltage' has no results, please run the optimization or parameter sweep before using getsweepresult.
-
November 19, 2024 at 11:32 pm kghaffari Ansys Employee
Hi AD,
Sorry, I think there has been an issue with the forum and my reply was lost. Please let me know if you received my earlier response.
In short, I think the issues are related to the CHARGE simulation not running. I recommend not running the sweep for now and just testing by clicking run. I tested the example on my side and did not see the issue you describe. Can you confirm whether the issues also occur when running on a local machine (and not the cluster)? Currently we are not able to reproduce the issue.
Best regards,
Khash
-
November 20, 2024 at 1:40 am adevata Subscriber
Hi Khash,
Thanks for your reply. I remember seeing your reply about running it locally at first, but I wasn't able to test it before I noticed that the reply was gone. I did edit my post today stating that I was able to run the simulation locally on Lumerical 2024 R1.3, and had no issues at all. However, as I point out in my edit, this is not a sustainable solution; I need to run this over the cluster.
Just to confirm, your recommendation is to use option a) that I've listed above, right? This ignores the parameter extraction completely. I just ran it again over the cluster (using option a)) and got the same ldev error I included in my original post. The actual error message is quite a bit longer, with PATH, mpiexec/mpich2, nemesis, and munmap_chunk mentioned a couple of times. If you'd like, I can share the full error message, although most of it looks like gibberish to me (random numbers and letters strung together).
I'm not sure if you are interfacing with my university on this issue, but if you would be willing to schedule a call sometime in the coming week, I'd be more than happy to do that. For now, please let me know what next steps I should take, either to run the simulation properly or to test/debug the issues we're having.
Thank you for your time and help, and I look forward to your reply!
Cheers,
AD
-
November 21, 2024 at 8:51 pm kghaffari Ansys Employee
Hi,
Thank you for the update. Yes, I recommend just running the CHARGE simulations for your troubleshooting and avoiding the extraction sweeps for now.
Are you running the simulation on the cluster from the CAD or from the command line with a job scheduler? If using a job scheduler, please share the submission script so we can comment.
Please note that CHARGE cannot run on multiple nodes, so you will have to request 1 node on the cluster and use all threads/cores on that 1 node. If you are using multiple nodes, this may be causing the error.
Best,
Khash
-
November 21, 2024 at 9:27 pm adevata Subscriber
Hi Khash,
Thanks again for your reply. Our cluster allows us to access Lumerical's GUI via an interactive session (I believe this is the CAD?), so I am opening CHARGE that way. Our Slurm scheduler code takes this request and assigns me to our lab's computing node to start a session. Once I am within this session, I open Lumerical by running "module load lumerical/2021r2", hitting enter, then confirming with "CAD" and enter again (I don't think this CAD stands for computer-aided design, as I need to use the same command to open any Lumerical software using this method). I don't currently have access to the scheduler code that allows me to open an interactive session, but our IT team may be willing to share it (most likely not publicly, though). Would you still like to see this?
All of these simulations are running on a single node. I am not requesting the full processing and memory capabilities of this node as the simulations don't require it.
Let me know if there is any additional information I can provide. Thanks for the help!
- AD
-
-
November 28, 2024 at 12:09 am Lito Ansys Employee
@adevata,
The ring modulator example includes example simulations for FDTD, MODE, CHARGE and INTERCONNECT.
To debug/troubleshoot the issue, please assist us with the following:
- Is the issue only happening with CHARGE simulations? Or do you have problems with all simulations, e.g. FDTD, MODE and INTC?
- Does the error happen when running script.lsf?
- Or when you are running the simulation file directly (not running a script)?
- Are you able to run the “waveguide_modulator.ldev” example from the Lumerical GUI on the cluster?
- Or are you running the simulation file directly from the command line?
- If you run into an issue running a simulation from the Lumerical GUI on the cluster, send a screenshot of the error message (job details) similar to below.
Otherwise, if you are having issues running script.lsf files,
- Copy and paste (here) the script you are running that is generating the error.
- Send a screenshot of the error message – when running the script.lsf file on the cluster.
- Send the command you use to run an interactive session or the command used to request for resources on the cluster.
-
December 3, 2024 at 4:18 pm adevata Subscriber
Hi Lito,
To address your questions in order:
- Following up from my original post, this issue only happens in CHARGE. I do not have issues with FDTD, MODE, and INTC, although because CHARGE would bug out, MODE would also bug out in Step 4. When I used a bogus wg_charge.mat, Step 4 would work.
- I'm not sure what script.lsf is, it is not in the original .zip file I downloaded from my link in the original post.
- The error occurs when I run the simulation file.
- Yes, I am able to open waveguide_modulator.ldev using Lumerical's GUI on the cluster.
- No, I am not using a command line to run the simulation file. The command line is only used to open the Lumerical GUI. I do not touch the command line at all during my cluster session.
- I pasted about 10 lines of the error message in my original post, but I'll include the full error message at the end of this reply.
- N/A
- N/A
- We use an internal website to request resources to a GNOME desktop. Once open, I use the terminal app to open Lumerical: I type "module load lumerical/2021r2", hit enter, then type "CAD" to confirm our licensing agreement, and hit enter again. The Lumerical GUI opens, and I can use it just how one uses the application if installed locally.
Please let me know if there is any additional information I can provide. Thank you for your help!
- AD
ldev error message:
"/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec" -n 1 "/software/lumerical/2021r2/lumerical/v212/bin/device-engine" "/projects/b1204/adevata/TrialRuns/RingResonator/c118296f-ring-modulator/waveguide_modulator.ldev"
----------------------------------------
srun job start: Fri Nov 1 09:21:19 CDT 2024
Job ID: 5747109
Username: gep5932
Queue: b1204
Account: b1204
----------------------------------------
The following variables are not
guaranteed to be the same in the
prologue and the job run script
----------------------------------------
PATH (in prologue) : ::::/software/lumerical/2021r2/lumerical/v212/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-10.2.0/openmpi-4.0.5-ptbl3lv4c5n35qrk2uiyrwbkd7th74i4/bin:/usr/local/ucx-1.8.1/bin:/usr/local/pmix/pmix-4.2.6/bin:/usr/local/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/hwloc-2.1.0-3eb4pd7xfilv65wakxqaay6iaazkdupj/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-10.2.0/numactl-2.0.12-chgirxkdplzhqitntuggjygkznwphamo/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-10.2.0-lt24t6msesfxor4hrpw6n6mttru2sbht/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/zstd-1.4.5-qeoj4o7typ4lewlprh7cwjzfeiqed3do/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/binutils-2.35-av2f4xad3uforj35udka2qsyb354p6up/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gettext-0.21-sbihn3s2dmd4mkd5xgxgcf3rhx3enve5/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/xz-5.2.5-t3skhhvtli57tsykjjw73h737wby7h52/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/tar-1.32-dfca2rbtzaxqyy2yshb5i67ey4sjke2v/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/ncurses-6.2-twwptxrvtozlx3ogxkgfpmymggfai6dx/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/libxml2-2.9.10-sbmpowvpmrw3sytyglnc3y6stcfbkg4o/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/libiconv-1.16-mhxc7msxw72jrbh2fl7cjcdp7dng7sej/bin:/hpc/software/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/bzip2-1.0.8-7emst37et6cf3qff2srmvfsjnlykhqr5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/TurboVNC/bin:/home/gep5932/.local/bin:/home/gep5932/bin
WORKDIR is: /home/gep5932
----------------------------------------
*** Error in `/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec': munmap_chunk(): invalid pointer: 0x00007ffc2bf46324 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x2abcf2c50474]
/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec[0x41f1b8]
/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec[0x407f73]
/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec[0x403179]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2abcf2bf3555]
/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec[0x40367d]
======= Memory map: ========
00400000-00436000 r-xp 00000000 00:2d 70270893 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec.hydra
00635000-00636000 r--p 00035000 00:2d 70270893 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec.hydra
00636000-00638000 rw-p 00036000 00:2d 70270893 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec.hydra
00638000-0063e000 rw-p 00000000 00:00 0
009b5000-009f7000 rw-p 00000000 00:00 0 [heap]
2abcf1bee000-2abcf1c10000 r-xp 00000000 07:00 24297 /usr/lib64/ld-2.17.so
2abcf1c10000-2abcf1c13000 rw-p 00000000 00:00 0
2abcf1c27000-2abcf1c2d000 rw-p 00000000 00:00 0
2abcf1e0f000-2abcf1e10000 r--p 00021000 07:00 24297 /usr/lib64/ld-2.17.so
2abcf1e10000-2abcf1e11000 rw-p 00022000 07:00 24297 /usr/lib64/ld-2.17.so
2abcf1e11000-2abcf1e12000 rw-p 00000000 00:00 0
2abcf1e12000-2abcf1e16000 r-xp 00000000 00:2d 68514462 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libmpl.so.1.0.0
2abcf1e16000-2abcf2015000 ---p 00004000 00:2d 68514462 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libmpl.so.1.0.0
2abcf2015000-2abcf2016000 r--p 00003000 00:2d 68514462 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libmpl.so.1.0.0
2abcf2016000-2abcf2017000 rw-p 00004000 00:2d 68514462 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libmpl.so.1.0.0
2abcf2017000-2abcf2028000 r-xp 00000000 00:2d 68514467 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libhwloc.so.0.1.0
2abcf2028000-2abcf2227000 ---p 00011000 00:2d 68514467 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libhwloc.so.0.1.0
2abcf2227000-2abcf2228000 r--p 00010000 00:2d 68514467 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libhwloc.so.0.1.0
2abcf2228000-2abcf2229000 rw-p 00011000 00:2d 68514467 /software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/lib/libhwloc.so.0.1.0
2abcf2229000-2abcf2388000 r-xp 00000000 07:00 26456 /usr/lib64/libxml2.so.2.9.1
2abcf2388000-2abcf2587000 ---p 0015f000 07:00 26456 /usr/lib64/libxml2.so.2.9.1
2abcf2587000-2abcf258f000 r--p 0015e000 07:00 26456 /usr/lib64/libxml2.so.2.9.1
2abcf258f000-2abcf2591000 rw-p 00166000 07:00 26456 /usr/lib64/libxml2.so.2.9.1
2abcf2591000-2abcf2593000 rw-p 00000000 00:00 0
2abcf2593000-2abcf25aa000 r-xp 00000000 07:00 25700 /usr/lib64/libnsl-2.17.so
2abcf25aa000-2abcf27a9000 ---p 00017000 07:00 25700 /usr/lib64/libnsl-2.17.so
2abcf27a9000-2abcf27aa000 r--p 00016000 07:00 25700 /usr/lib64/libnsl-2.17.so
2abcf27aa000-2abcf27ab000 rw-p 00017000 07:00 25700 /usr/lib64/libnsl-2.17.so
2abcf27ab000-2abcf27ad000 rw-p 00000000 00:00 0
2abcf27ad000-2abcf27b4000 r-xp 00000000 07:00 25979 /usr/lib64/librt-2.17.so
2abcf27b4000-2abcf29b3000 ---p 00007000 07:00 25979 /usr/lib64/librt-2.17.so
2abcf29b3000-2abcf29b4000 r--p 00006000 07:00 25979 /usr/lib64/librt-2.17.so
2abcf29b4000-2abcf29b5000 rw-p 00007000 07:00 25979 /usr/lib64/librt-2.17.so
2abcf29b5000-2abcf29cc000 r-xp 00000000 07:00 25904 /usr/lib64/libpthread-2.17.so
2abcf29cc000-2abcf2bcb000 ---p 00017000 07:00 25904 /usr/lib64/libpthread-2.17.so
2abcf2bcb000-2abcf2bcc000 r--p 00016000 07:00 25904 /usr/lib64/libpthread-2.17.so
2abcf2bcc000-2abcf2bcd000 rw-p 00017000 07:00 25904 /usr/lib64/libpthread-2.17.so
2abcf2bcd000-2abcf2bd1000 rw-p 00000000 00:00 0
2abcf2bd1000-2abcf2d95000 r-xp 00000000 07:00 24599 /usr/lib64/libc-2.17.so
2abcf2d95000-2abcf2f94000 ---p 001c4000 07:00 24599 /usr/lib64/libc-2.17.so
2abcf2f94000-2abcf2f98000 r--p 001c3000 07:00 24599 /usr/lib64/libc-2.17.so
2abcf2f98000-2abcf2f9a000 rw-p 001c7000 07:00 24599 /usr/lib64/libc-2.17.so
2abcf2f9a000-2abcf2f9f000 rw-p 00000000 00:00 0
2abcf2f9f000-2abcf2fa1000 r-xp 00000000 07:00 24765 /usr/lib64/libdl-2.17.so
2abcf2fa1000-2abcf31a1000 ---p 00002000 07:00 24765 /usr/lib64/libdl-2.17.so
2abcf31a1000-2abcf31a2000 r--p 00002000 07:00 24765 /usr/lib64/libdl-2.17.so
2abcf31a2000-2abcf31a3000 rw-p 00003000 07:00 24765 /usr/lib64/libdl-2.17.so
2abcf31a3000-2abcf31b8000 r-xp 00000000 07:00 26493 /usr/lib64/libz.so.1.2.7
2abcf31b8000-2abcf33b7000 ---p 00015000 07:00 26493 /usr/lib64/libz.so.1.2.7
2abcf33b7000-2abcf33b8000 r--p 00014000 07:00 26493 /usr/lib64/libz.so.1.2.7
2abcf33b8000-2abcf33b9000 rw-p 00015000 07:00 26493 /usr/lib64/libz.so.1.2.7
2abcf33b9000-2abcf33de000 r-xp 00000000 07:00 25539 /usr/lib64/liblzma.so.5.2.2
2abcf33de000-2abcf35dd000 ---p 00025000 07:00 25539 /usr/lib64/liblzma.so.5.2.2
2abcf35dd000-2abcf35de000 r--p 00024000 07:00 25539 /usr/lib64/liblzma.so.5.2.2
2abcf35de000-2abcf35df000 rw-p 00025000 07:00 25539 /usr/lib64/liblzma.so.5.2.2
2abcf35df000-2abcf36e0000 r-xp 00000000 07:00 25542 /usr/lib64/libm-2.17.so
2abcf36e0000-2abcf38df000 ---p 00101000 07:00 25542 /usr/lib64/libm-2.17.so
2abcf38df000-2abcf38e0000 r--p 00100000 07:00 25542 /usr/lib64/libm-2.17.so
2abcf38e0000-2abcf38e1000 rw-p 00101000 07:00 25542 /usr/lib64/libm-2.17.so
2abcf38e1000-2abcf38f6000 r-xp 00000000 07:00 24953 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2abcf38f6000-2abcf3af5000 ---p 00015000 07:00 24953 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2abcf3af5000-2abcf3af6000 r--p 00014000 07:00 24953 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2abcf3af6000-2abcf3af7000 rw-p 00015000 07:00 24953 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7ffc2bf20000-7ffc2bf50000 rw-p 00000000 00:00 0 [stack]
7ffc2bff5000-7ffc2bff7000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
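Most of a dump like the one above is the process memory map; the informative part is the single glibc banner line naming the binary that aborted and why. As a minimal sketch (the function name `summarize_abort` and the regular expression are illustrative assumptions, not part of any Lumerical or glibc tooling), the banner can be pulled out of a run-together log like this:

```python
import re

# Matches the glibc abort banner:
#   *** Error in `<binary>': <message>: 0x<address> ***
GLIBC_ABORT = re.compile(
    r"\*\*\* Error in `(?P<binary>[^']+)': "
    r"(?P<message>.+?): (?P<address>0x[0-9a-fA-F]+) \*\*\*"
)


def summarize_abort(log_text):
    """Return the aborting binary, glibc message and pointer, or None."""
    match = GLIBC_ABORT.search(log_text)
    if match is None:
        return None
    return match.groupdict()


# Banner copied from the error message in this thread, with the
# neighbouring text still fused to it as in the pasted log.
log = (
    "----------------------------------------"
    "*** Error in `/software/lumerical/2021r2/lumerical/v212/mpich2/"
    "nemesis/bin/mpiexec': munmap_chunk(): invalid pointer: "
    "0x00007ffc2bf46324 ***======= Backtrace: ========="
)

summary = summarize_abort(log)
print(summary["binary"])
print(summary["message"])
```

For this log the reported binary is the bundled `mpiexec`, not `device-engine`, which matches the later observation in the thread that the error refers to MPICH2.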
-
December 4, 2024 at 1:53 am Lito Ansys Employee
@adevata,
From your message,
We use an internal website to request resources to a GNOME desktop. Once open, I use the terminal app to open Lumerical: I type “module load lumerical/2021r2”, hit enter, then type “CAD” to confirm our licensing agreement, and hit enter again. The Lumerical GUI opens, and I can use it just how one uses the application if installed locally.
- How many resources e.g. nodes, cores, memory, etc. are you requesting?
- When you run the simulation file from the Lumerical CAD/IDE – what is your Resources configuration?
- Have you tried running using “Local Computer” as the Job launching preset – run on only 1 node? (see here)
Please send a screenshot of your Resources configuration & Resource advanced options.
-
December 4, 2024 at 3:14 am adevata Subscriber
Hi Lito,
Thanks again for your reply! Again, in order, here are my responses:
- I'm using 1 node, 1 core, and 1 CPU. On that, I've requested 128 GB RAM.
- The attached pictures below show my resource and advanced resource configurations.
- I believe my default is to run on only one node using the "Local Computer" preset. Could you confirm that is the case from my images?
Let me know if there's anything else I can provide. Thank you!
- AD
-
December 4, 2024 at 8:35 pm Lito Ansys Employee
Can you run the simulation (waveguide_modulator.ldev) using these settings? If it runs using "Local Computer", the issue is with the MPI installation.
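To check which launcher a given run actually used, the launch line recorded at the top of the job log can be inspected. The following is a rough sketch, assuming the log records the command in the same quoted form as the "ldev error" line posted in this thread; the helper name `describe_launch` is hypothetical, and the thread does not state what the launch line looks like under the "Local Computer" preset, so the sketch only reports what a given line contains.

```python
import shlex
from pathlib import PurePosixPath


def describe_launch(command_line):
    """Report the launcher and project file named in a logged launch line."""
    # shlex.split honours the double quotes around each path.
    tokens = shlex.split(command_line)
    launcher = PurePosixPath(tokens[0]).name
    return {
        "launcher": launcher,
        # True when the first program on the line is an MPI launcher.
        "via_mpi": launcher.startswith("mpiexec"),
        # The project file is the last argument on the logged line.
        "project": PurePosixPath(tokens[-1]).name,
    }


# Launch line as posted in the thread (with the poster's placeholders).
line = (
    '"/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec" '
    '-n 1 "/software/lumerical/2021r2/lumerical/v212/bin/device-engine" '
    '"/projects/NODE/USER/TrialRuns/RingResonator/c118296f-ring-modulator/'
    'waveguide_modulator.ldev"'
)

info = describe_launch(line)
print(info)
```

Comparing this output before and after changing the job launching preset gives a quick way to see whether the engine is still being started through the bundled MPICH2 launcher.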
-
December 5, 2024 at 5:01 pm adevata Subscriber
Just to confirm, are you talking about the settings that I currently have configured? Or the ones that you had sent initially?
Also, by simulation (waveguide_modulator.ldev), do you mean to go to the CHARGE tab, then in the simulation section on the right, click the run button? Or is this a command I should be entering into a job terminal somewhere? I'm guessing it's the first, since I'm launching the Lumerical GUI as a job.
Thank you!
- AD
-
-
December 5, 2024 at 7:03 pm Lito Ansys Employee
Please try to run the simulation file, waveguide_modulator.ldev, from the CHARGE CAD/GUI using the settings that you showed in your post.
-
December 5, 2024 at 7:07 pm Lito Ansys Employee
This is to verify whether you can run the simulation file using Local Computer as the job launching preset. The error message in your post was referring to MPICH2.
ldev error:
"/software/lumerical/2021r2/lumerical/v212/mpich2/nemesis/bin/mpiexec" -n 1 "/software/lumerical/2021r2/lumerical/v212/bin/device-engine" "/projects/NODE/USER/TrialRuns/RingResonator/c118296f-ring-modulator/waveguide_modulator.ldev"
-
December 6, 2024 at 11:10 pm adevata Subscriber
I believe the simulation completed successfully; the .log file output from the CHARGE simulator says so too. I'm not really sure what has changed, but step 4 is now running fine as well. The preset screenshots I sent showed the original preset without any modifications. Do you have any pointers on how to recreate the original issue so that it can be effectively solved? Thanks!
- AD
-
-
December 7, 2024 at 12:19 am Lito Ansys Employee
@adevata,
The error message in your post was referring to MPICH2.
Since the Multiphysics solvers (CHARGE/HEAT/FEEM/DGTD/MQW) are single-process, multithreaded solvers, you can only run these simulations on 1 node/machine. These solvers do not support distributed computing or running on more than 1 node/machine, so do not request more than 1 node on the cluster for them. It's best to run the job using Local Computer as the job launching preset. See the KB for more information.
- Resource configuration for Lumerical solvers running with a single process – Ansys Optics
- Resource configuration elements and controls – Ansys Optics
- Distributed computing – Ansys Optics
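Because the Lumerical GUI in this thread is started inside a Slurm session, the single-node requirement can be checked before launching CHARGE. This is a small sketch, not an Ansys tool: it assumes the session exposes Slurm's standard `SLURM_JOB_NUM_NODES` (or the older `SLURM_NNODES`) environment variable, and the function name `check_single_node` is made up for illustration.

```python
import os


def check_single_node(env):
    """Return warnings about the Slurm allocation described by env."""
    warnings = []
    # Slurm sets SLURM_JOB_NUM_NODES; SLURM_NNODES is the older name.
    nodes = env.get("SLURM_JOB_NUM_NODES") or env.get("SLURM_NNODES")
    if nodes is None:
        warnings.append("no Slurm node count found; is this a Slurm job?")
    elif int(nodes) > 1:
        warnings.append(
            f"{nodes} nodes allocated; CHARGE/HEAT/FEEM/DGTD/MQW need 1"
        )
    return warnings


# Run inside the interactive session before opening the Lumerical GUI.
for message in check_single_node(os.environ):
    print("WARNING:", message)
```

An empty result only confirms the allocation size; it says nothing about whether the bundled MPI launcher works on that node.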
-
© 2024 Copyright ANSYS, Inc. All rights reserved.