


{"id":362344,"date":"2024-04-15T22:44:24","date_gmt":"2024-04-15T22:44:24","guid":{"rendered":"\/forum\/forums\/topic\/lumerical-cluster-job-crashes-arbitrarily\/"},"modified":"2024-04-15T22:44:24","modified_gmt":"2024-04-15T22:44:24","slug":"lumerical-cluster-job-crashes-arbitrarily","status":"closed","type":"topic","link":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/lumerical-cluster-job-crashes-arbitrarily\/","title":{"rendered":"Lumerical cluster job crashes arbitrarily"},"content":{"rendered":"<p>I run many parallel jobs on a cluster. Sometimes, for no apparent reason, a job crashes citing &#8220;std::bad_alloc&#8221; with no further information. The RAM is sufficient: I have allocated 10x the RAM that the requirements ask for and have confirmed by monitoring with htop that this limit is never reached. The crash doesn&#8217;t happen if I use just a single core on a local computer resource setting. However, it happens on parallel Slurm jobs with multiple cores, and I need to use multiple cores to speed up simulations.<\/p>\n<p>The crash is not reproducible: it happens randomly, and if I run the same job again (same simulation with the same resources in the resource manager), it sometimes runs and sometimes doesn&#8217;t. 
This leads me to believe that the problem has nothing to do with RAM and has some other cause.<\/p>\n<p>The issue is intermittent: sometimes many job submissions run without crashing, but at other times crashes are frequent, which interrupts the simulation sweeps that I run.<\/p>\n<p>This is with Lumerical version 2022 R2.<\/p>\n","protected":false},"template":"","class_list":["post-362344","topic","type-topic","status-closed","hentry"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_bbp_subscription":["343740","4274"],"_bbp_author_ip":["23.206.193.146"]," _bbp_last_reply_id":["0"]," _bbp_likes_count":["0"],"_btv_view_count":["367"],"_bbp_topic_status":["unanswered"],"_bbp_topic_id":["362344"],"_bbp_forum_id":["27833"],"_bbp_engagement":["4274","343740"],"_bbp_voice_count":["2"],"_bbp_reply_count":["6"],"_bbp_last_reply_id":["363618"],"_bbp_last_active_id":["363618"],"_bbp_last_active_time":["2024-04-22 21:06:03"]},"test":"vn95cornell-edu"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/362344","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":0,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/362344\/revisions"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/media?parent=362344"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}