How to stop Lumerical FDTD from crashing on a cluster
Posted: 2021-09-25 15:52:20 GMT (last modified 2021-10-06 19:07:46 GMT)

I am running Lumerical FDTD on a cluster, but after simulating for a while it always fails with:

    terminate called after throwing an instance of 'std::bad_alloc'
      what():  std::bad_alloc
    APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

This looks like a memory issue, but I have allocated hundreds of GB per node and confirmed that this is plenty, based on the memory estimate in the GUI. The crash is also random: of two identical simulations, one will sometimes crash while the other runs to completion.

To work around this, I automated my Slurm submission script to keep relaunching the simulation file with the FDTD -resume flag, and configured the simulation to checkpoint every 10 minutes. This lets me finish a simulation no matter what, but it often takes many crash-and-resume cycles. I would be fine with this approach, except that whenever a simulation does not complete in a single run, the data is ruined. Somehow the checkpoint and resume process loses the monitor data: the monitors contain no field data, and my calculated Purcell enhancement becomes wildly inaccurate.

Ideally I would like advice on how to prevent the crash itself. Failing that, I would like to understand why the data is lost when the simulation resumes after a crash. Thanks!
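The crash-and-resume workaround described above amounts to a retry loop around the solver launch: run once normally, and on any non-zero exit relaunch with `-resume` so the engine continues from its latest checkpoint. The Python sketch below illustrates that loop under stated assumptions. The helper names `run_until_complete` and `subprocess_runner` are hypothetical, and the engine command line (binary name, project file, and where `-resume` goes) is a placeholder, not Lumerical's documented invocation, so adapt it to the command your Slurm script actually runs.

```python
import subprocess
from typing import Callable, Sequence


def run_until_complete(run: Callable[[bool], int], max_attempts: int = 20) -> int:
    """Call run(resume) until it returns exit code 0.

    The first call uses resume=False; every retry uses resume=True so the
    solver restarts from its latest checkpoint. Returns the number of
    attempts used, or raises RuntimeError if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        resume = attempt > 1
        code = run(resume)
        if code == 0:
            return attempt
        print(f"attempt {attempt} exited with code {code}; relaunching with resume")
    raise RuntimeError(f"simulation did not finish after {max_attempts} attempts")


def subprocess_runner(base_cmd: Sequence[str], resume_flag: str = "-resume") -> Callable[[bool], int]:
    """Build a run(resume) callable that launches base_cmd as a subprocess.

    HYPOTHETICAL: base_cmd and the position of resume_flag (appended at the
    end here) are placeholders; check them against your installation.
    """
    def run(resume: bool) -> int:
        cmd = list(base_cmd) + ([resume_flag] if resume else [])
        return subprocess.run(cmd).returncode

    return run
```

In a job script this would be wired up as something like `run_until_complete(subprocess_runner(["<engine command>", "<project>.fsp"]))`, with the placeholders filled in. Taking `run` as a callable keeps the retry policy separate from how the engine is launched (directly, via `mpirun`, or via `srun`). Note that this only automates the relaunch; it does nothing about the monitor data being lost on resume.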