


{"id":34334,"date":"2019-03-29T15:23:15","date_gmt":"2019-03-29T15:23:15","guid":{"rendered":"\/forum\/forums\/topic\/job-hang-in-rsm-and-cluster\/"},"modified":"2019-03-29T15:23:15","modified_gmt":"2019-03-29T15:23:15","slug":"job-hang-in-rsm-and-cluster","status":"closed","type":"topic","link":"https:\/\/innovationspace.ansys.com\/forum\/forums\/topic\/job-hang-in-rsm-and-cluster\/","title":{"rendered":"Job hang in RSM and Cluster"},"content":{"rendered":"<p>After getting RSM running and configuring the firewall, we now see the job reach the cluster, and qstat shows four cores allocated on a node.<\/p>\n<p>However, the Windows client reports the job as failed. On the Linux node, the job is still occupying its queue slot in the running state.<\/p>\n<p>How can I find at least a hint of the cause?<\/p>\n<p><strong>The output from qstat shows:<\/strong><\/p>\n<pre>[root@titan .ansys]# qstat\nJob id            Name             Username        Time Use S Queue\n----------------  ---------------- --------------- -------- - ------\n419               Josh_Workbench   fluentuser      00:01:28 R normal<\/pre>\n<p><strong>The following is an excerpt from the user log:<\/strong><\/p>\n<pre>2019-03-29 09:52:35 [DEBUG] ProcessActivityTracker shows activity.\n2019-03-29 09:53:35 [DEBUG] ProcessActivityTracker shows activity.\n2019-03-29 09:54:35 [DEBUG] Proxy found inactive and will shutdown\n2019-03-29 09:54:35 [FATAL] Unhandled Exception has occurred and the program will exit.\n2019-03-29 09:54:35 [FATAL] System.Runtime.Remoting.RemotingException: Requested service not found (Ansys.Rsm.UPHost.HostController, Ans.Rsm.UPHost, Version=19.2.0.0, Culture=neutral, PublicKeyToken=null). No receiver for uri \/HostController\n\nServer stack trace:\n  at System.Runtime.Remoting.Messaging.MethodCall.ResolveMethod () [0x00064] in &lt;a79e78aef2044b2abf7520c5f76ff5bc&gt;:0\n  at System.Runtime.Remoting.Messaging.MethodCall..ctor (System.Object handlerObject, System.Runtime.Serialization.Formatters.Binary.BinaryMethodCallMessage smuggledMsg) [0x00088] in &lt;a79e78aef2044b2abf7520c5f76ff5bc&gt;:0\n  at System.Runtime.Serialization.Formatters.Binary.BinaryMethodCall.Read(System.Object[] callA, System.Object handlerObject) [0x00189] in &lt;a79e78aef2044b2abf7520c5f76ff5bc&gt;:0<\/pre>\n","protected":false},"template":"","class_list":["post-34334","topic","type-topic","status-closed","hentry"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_bbp_old_topic_id":["6506"],"_bbp_old_topic_author_name_id":["Anonymous"],"_bbp_old_is_topic_anonymous_id":["false"],"_bbp_old_closed_status_id":["publish"],"_bbp_author_ip":[null],"_bbp_old_sticky_status_id":["normal"],"_bbp_likes_count":["0","0"],"_btv_view_count":["4649"],"_bbp_topic_status":["unanswered"],"_bbp_status":["publish"],"_bbp_topic_id":["34334"],"_bbp_forum_id":["27796"],"_bbp_engagement":["157443","157480","163006"],"_bbp_voice_count":["3"],"_bbp_reply_count":["21"],"_bbp_last_reply_id":["83112"],"_bbp_last_active_id":["83112"],"_bbp_last_active_time":["2019-04-16 18:53:54"]},"test":"srwheat"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/34334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":0,"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/topics\/34334\/revisions"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/forum\/wp-json\/wp\/v2\/media?parent=34334"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}