September 25, 2021 at 3:52 pm
ihammond (Subscriber)
I am running Lumerical FDTD on a cluster, but partway through the simulation it keeps terminating with:
"terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)"
This looks like a memory issue, but I have allocated hundreds of GB per node and confirmed that this is plenty based on the memory estimate in the GUI. The crash is random: of two identical simulations, sometimes one crashes and the other runs to completion.
To work around this, I automated my Slurm submission file to keep rerunning the simulation file with the -resume flag for Lumerical FDTD, and configured my simulation to checkpoint every 10 minutes. This lets me finish a simulation no matter what, but it often takes many crashes and resumes. I would be fine continuing this way, except that whenever the simulation does not run successfully in one shot, the data is ruined. Somewhere in the checkpoint and resume process my monitor data is lost: my calculation of Purcell enhancement becomes wildly inaccurate, and my monitors lack any field data.
Advice on how to prevent the crashes would be ideal, but another solution would be to figure out why data is lost after a crash. Thanks!
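Editor's sketch: the crash-and-resume workaround described above can be expressed as a small retry loop. This is only an illustration, not the poster's actual submission script (which was an attachment and is not reproduced here). The function name `run_until_success`, the attempt cap, and the placeholder command lines in the comments are assumptions; only the existence of a `-resume` flag comes from the post.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_until_success(
    first_cmd: Sequence[str],
    resume_cmd: Sequence[str],
    max_attempts: int = 20,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run first_cmd once, then resume_cmd until an attempt exits with code 0.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails, so a job that can never finish does not loop forever.
    """
    for attempt in range(1, max_attempts + 1):
        cmd = first_cmd if attempt == 1 else resume_cmd
        if runner(cmd) == 0:
            return attempt
    raise RuntimeError(f"simulation did not finish after {max_attempts} attempts")


# Hypothetical usage; the exact engine command line is an assumption and
# should be copied from your own submission script:
#   run_until_success(
#       ["<mpi launcher and engine>", "sim.fsp"],
#       ["<mpi launcher and engine>", "-resume", "sim.fsp"],
#   )
```

Capping the number of attempts and recording how many were needed makes it easier to correlate "resumed N times" with the monitor-data loss described in the post.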
September 27, 2021 at 4:16 pm
Guilin Sun (Ansys Employee)
Please check the memory requirements and post them here. Usually a sufficient amount of memory needs to be allocated to each node/process. In addition, please provide your operating system and software version information.
September 27, 2021 at 4:38 pm
ihammond (Subscriber)
I am running on a supercomputer with Red Hat 7.8 and Slurm, using Lumerical FDTD 2021 R2, version 8.26.2717. I use the attached Slurm submission script (submit.txt; normally a .bash file, renamed so it could be uploaded to this forum) to get everything done. The attached PNG is a screenshot of the memory requirements. Thanks!
September 27, 2021 at 4:55 pm
Guilin Sun (Ansys Employee)
Thank you! Please disable all monitors, test on a single node/process/thread, and see what happens.
September 27, 2021 at 4:57 pm
ihammond (Subscriber)
Okay, I will do so and then post the results here afterwards. Thanks.
September 28, 2021 at 4:04 pm
ihammond (Subscriber)
It appears to work without any crashing on one node with the monitors disabled.
September 28, 2021 at 4:23 pm
Guilin Sun (Ansys Employee)
Since the file is not large, please test one full simulation on ONE node; do not use more than one node. Try it and let me know if it works.
Before you try, please copy all objects except the FDTD object, paste them into a new project file, and add/modify an FDTD object there to run the test.
September 28, 2021 at 4:30 pm
ihammond (Subscriber)
On one node, with monitors enabled this time, it looks as though the simulation will take 40 hours. Also, is the new FDTD object supposed to be any different from the one previously used?
September 28, 2021 at 5:01 pm
Guilin Sun (Ansys Employee)
The 40 hours is an estimate based on the simulation time you set. The simulation can terminate early.
For the file that crashed on the cluster, where was the file created? If it was created on a different computer, the crash could be due to a version mismatch. Make sure the two computers have the same version of the software.
September 28, 2021 at 5:47 pm
ihammond (Subscriber)
I generate the script using the Python API, but it is run on the same computer and the same install of Lumerical. The only difference is that the API calls the GUI version (lumerical-fdtd-solutions) whereas the cluster calls fdtd-engine, which should be the same version since they come from the same install.
September 29, 2021 at 4:02 pm
ihammond (Subscriber)
With monitors enabled on a single node, it runs without crashing.
September 29, 2021 at 4:09 pm
Guilin Sun (Ansys Employee)
Great! Sometimes the generated file may not be perfect for some unknown reason. It is unknown because the problem is not repeatable.
September 29, 2021 at 4:11 pm
ihammond (Subscriber)
So is there no way to run my scripts in parallel on multiple nodes/processes without crashing? Running on one processor takes quite a bit of time.
September 29, 2021 at 4:23 pm
Guilin Sun (Ansys Employee)
Since this is a non-repeatable issue, you can try again. For this specific case, though, the meshing itself needs very little memory; it is the monitors that require a lot of memory. So using more processes may cause some issues. You could try using more threads instead and see if that helps.
September 29, 2021 at 4:27 pm
ihammond (Subscriber)
Interesting. Is this related to why crashing and loading checkpoints destroys monitor data?
September 29, 2021 at 11:27 pm
Guilin Sun (Ansys Employee)
It is hard to say at this moment, as the problem is not repeatable. Checkpointing should not affect anything. If it does, that is a bug. Please confirm through further testing whether the checkpoints cause any issues.
October 1, 2021 at 6:41 pm
Lito (Ansys Employee)
"I generate the script using the python api, but it's run on the same computer and install of Lumerical. Only difference is that the api calls the gui version of lumerical-fdtd-solutions whereas the cluster calls the fdtd-engine which should match the same version, considering they are on the same install."
Do you have a GUI connection to the cluster, and do you run the Python script on the cluster to create the simulation file?
October 1, 2021 at 7:46 pm
ihammond (Subscriber)
I use the Python API with the GUI to create the fsp file without any cluster, then use the attached submission script with Slurm on the cluster to run the simulation with the engine. Once it completes, I open the GUI again without the cluster.
October 1, 2021 at 7:53 pm
Lito (Ansys Employee)
So you create the simulation file on your local computer using the script, and not with the Lumerical installation on the cluster? If so, can you send the About page of the FDTD installation that you used to create the simulation file with your script?
October 4, 2021 at 4:51 pm
ihammond (Subscriber)
Yes, sort of. The installation for the cluster is on the same computer as the local machine. I remote into the supercomputer (without using the cluster, just accessing the front end) and create the script using fdtd-solutions on the supercomputer. Then I run the file on the cluster (which is governed by the same computer and the same Lumerical installation), but this time I use fdtd-engine to run it because it is a cluster. This is the About page, and I have attached the Python script (and the supporting text files it calls) that creates the fsp file. The Python script is labeled python_script.
October 5, 2021 at 12:08 am
Lito (Ansys Employee)
Based on your cluster submit script, you are running on only 1 node with 5 processes. Does the issue happen with monitors enabled when using more than 5 processes on 1 node? Or does it only happen when you use more than 1 node?
October 5, 2021 at 12:10 am
ihammond (Subscriber)
The issue happens as long as there's more than 1 process, not necessarily dependent on the number of nodes.
October 6, 2021 at 1:42 am
Lito (Ansys Employee)
Can you try running the FDTD simulation from this example? Run it with 6 processes on your cluster. Let us know if you run into the same issue.
October 6, 2021 at 1:59 am
ihammond (Subscriber)
The download link for the zip file on that page appears to be broken. It says to log in to download, but I am already logged in and still cannot download it. Also, do you want 6 processes on one node or on multiple nodes?
October 6, 2021 at 7:07 pm
Lito (Ansys Employee)
Access to the Lumerical Application Gallery requires support registration. Kindly register for support while accessing your current/active Lumerical license, then download the example file. You can run on either one or multiple nodes; let us know how it goes for that example simulation file.
Best, Lito
The topic ‘How to stop Lumerical FDTD from crashing on a cluster’ is closed to new replies.