Ansys Products

Discuss installation & licensing of our Ansys Teaching and Research products.

Command line configuring RSM on a Linux server running SLURM/Torque

    • swheat
      Subscriber

      I am installing RSM on a Linux cluster head node, Titan.  I don't have a GUI environment for this headnode.  For the time being, what I need is the ability to set up a single account for remote users using RSM; call it "fluent_user".  I need to somehow configure where the temporary files go.  And I need to configure it towards PBS/Torque.  We run SLURM, but I'm told the SLURM/Torque wrapper works fine for RSM.  We are running v193 RSM and Fluent.

    • JakeC
      Ansys Employee

      Hi srwheat,


       


      The wrapper for SLURM can sometimes work, but it is certainly not recommended.


      In either case, on the submission node, you need to install the RSM Launcher service:


      /ansys_inc/v193/RSM/Config/tools/linux/install_daemon -launcher


       


      Thank you,


      Jake

    • JakeC
      Ansys Employee

      On the client side, are you submitting from a windows machine or a linux machine?


      Also which temp files are you referring to?  Log files, or the project staging directory?


       


      Thank you,


      Jake


       

    • swheat
      Subscriber

      jcallery,


      If the wrapper is not recommended, then what would you recommend I use for my SLURM configuration?


      As for the launcher, once I run that command, is the daemon set up to run always at boot time?  


      A friend had suggested I do the following:


         Enable the daemon script to start at reboot. The best way is adding it to root's cron.


       


         @reboot /ansys_inc/v193/RSM/Config/tools/linux/rsmlauncher restart &


      Are the two methods equivalent?


      Thanks!


      Stephen

    • swheat
      Subscriber

      Yes, the jobs are being submitted from Windows.


      As for the file sharing, yes, this is for the staging.  I have successfully gotten the configuration tool to run; I have set things up to use an NFS directory that is available to all the cluster nodes for the staging.


      I told it to use "Torque for Moab"; now I'm wondering if I should have chosen "PBS Pro", to connect to my wrappers.


      Is there some manual somewhere that walks through the configuration that I could follow?  Even with just a few selections, the combinatorics are against me for getting it to work.

    • JakeC
      Ansys Employee

      Hi Stephen,


      SLURM is not supported at all with RSM out of the box.


      I would recommend using a supported scheduler.


       


      Using the command I suggested will create a /etc/init.d entry for you, so that it will start on reboot.


      Which version of linux are you running on the cluster?


       


      Thank you,


      Jake

    • swheat
      Subscriber

      SLURM is mandated ... sigh.


      Centos 7.5, latest kernel.

    • JakeC
      Ansys Employee

      When submitting from windows I would use the RSM Internal transfer mechanism in the File Transfer Section, and not worry about OS file transfers.


      This is where the project files will go on the cluster side.


       


      Unfortunately I don't really know anything about the wrappers, so you will probably need to try both types in the client side RSM Configuration Utility.


       


      Thank you,


      Jake

    • swheat
      Subscriber

      We are trying the internal path, but we are running into a problem: the Windows RSM Config tool is v192, whereas we are on v193 on the cluster.  So it tried to connect via port 9192, where nothing is listening.  On the HPC Resource tab we then chose SSH to connect, which forces us to use an external mechanism for file transfer; we chose SCP via SSH.  When we tried that, we got the following error after the "Submission in progress..." line: "KEYPATH required for SCP transfer but not defined in environment".  Any hints on that?  Can we get v192 to work with v193 using our first approach?

    • swheat
      Subscriber

      OK, we found instructions on doing KEYPATH.  We already had the ability to ssh (openssh) from windows to linux w/o using password.  So, we pointed the KEYPATH variable to the private key file.  When trying plink -i %KEYPATH% user@system pwd, it said it could not use the KEYPATH file; but plink w/o the KEYPATH parameter worked fine.  While trying to test the queue in RSMManager on the windows side, it too said that it could not use the KEYPATH file.


      We had the file in the Users\myname\.ssh folder, and it was named xyz, so in Control Panel we set a user variable KEYPATH to c:\users\myname\.ssh\xyz, and we could see from a new command shell window that it was set.
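A note on the KEYPATH failure described above: plink reads private keys in PuTTY's own .ppk format, so a key generated by OpenSSH generally has to be converted with PuTTYgen before `plink -i` will accept it. That may be what is happening here, although the thread does not confirm it. The following is a minimal sketch for telling the two formats apart by the first line of the file. `key_format` is a hypothetical helper name (not part of PuTTY or RSM), and the script assumes a POSIX shell is available where the key file lives.

```shell
# Classify a private key file by its first line.
# key_format is a hypothetical helper, not part of PuTTY or RSM.
key_format() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        # PuTTY's own format; this is what plink -i expects.
        PuTTY-User-Key-File-*) echo "ppk" ;;
        # OpenSSH or PEM private key; convert with PuTTYgen before using plink.
        "-----BEGIN "*"PRIVATE KEY-----"*) echo "openssh-or-pem" ;;
        *) echo "unknown" ;;
    esac
}
```

Running `key_format /path/to/keyfile` on the file that KEYPATH points to should print `ppk` if plink can be expected to read it.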

    • tsiriaks
      Ansys Employee

      I will leave other parts for Jake to comment but I know that, regardless of any trick, you must use identical versions on client (Windows) and cluster.


      Thanks,


      Win

    • JakeC
      Ansys Employee

      Hi srwheat,


      Win is correct.  You will need to have matching versions of the Ansys products on the client machine and the cluster.


      So it sounds like you need to install Ansys 19.2 on the cluster and then install the 19.2 version of the RSM Launcher service on the cluster:


      sudo /ansys_inc/v192/RSM/Config/tools/linux/install_daemon -launcher


       


      Thank you,
      Jake

    • swheat
      Subscriber

      So, some news.  It turns out that I have some clients running v19.1 and some running v19.2.  I had already installed v19.3.


      I installed v19.2, and that installed "ok"; I haven't been able to do anything to test it yet, as I'm not a fluent user, so I need a user to work with me on validating the install.


      But, as I try to install v19.1, I only get to 65% done; this has happened twice.  The second time, I've let it run for a long time ... now going for more than 30min after getting stuck.  I haven't killed it yet.


      The last few lines in the detailed log are:


      AWPROOTDIR = /opt/apps/ansys/v191


      Creating ANSYS RSM Cluster Master Service script ...


      Created service script: arcmaster


       


      AWPROOTDIR = /opt/apps/ansys/v191


      Creating ANSYS RSM Cluster Node Service script ...


      Created service script: arcnode


      That is where it remains stuck.  I am presuming that I can have several versions installed and running at the same time.  Is that correct?


      On my firewall, I have opened up the following tcp ports: 9191 9192 and 9193


      Any idea as to why the hang is happening?  This help thread is getting a bit diverse on topic as the actions to solve the original problem are creating new questions that must be solved first.

    • swheat
      Subscriber

      I tried to add in to this post the results of "ps -aef | grep ans" which shows several processes for v191 running.  But "add post" would not work with that text.

    • swheat
      Subscriber

      I found out how to attach the file above.

    • swheat
      Subscriber

      I found these two log files from the v191 install


       

    • JakeC
      Ansys Employee

      Hi srwheat,


      Regarding the stuck installation please make sure that libpng12 is installed, then retry the installation.


      Thank you,


      Jake


       


       

    • swheat
      Subscriber

      Jake,


      Thanks!  That solved the install problem.  I will now be able to turn my attention to the original problem.


       


      Stephen

    • swheat
      Subscriber

      Jake,


      I have installed the client on my windows laptop.  I have RSM configurator set to go to my cluster.  I get the following, endlessly, listing in my rsm log files:


      2019-03-12 142:49 [WARN] Failed to operate on the mutexPath: /tmp/MMFLock_USERPROXY191_ERV_STEPHEN WHEAT_FLUENTUSER.lock : Chmod failed to operate on MMFLock_USERPROXY191_ERV_STEPHEN WHEAT_FLUENTUSER.lock in /tmp : chmod: cannot access ‘MMFLock_USERPROXY191_ERV_STEPHEN’: No such file or directory


       


      I have tried running rsmlauncher as root (via disabling the rsmadmin account) and as rsmadmin, with the same results.  However, I think when I was running as rsmadmin, my client "ready" light was always red.  When running as root, the client ready light goes green.  And, if I stop the server, the client takes notice.


      Any ideas?


      Stephen

    • swheat
      Subscriber

      Also, I have my credentials on both the client rsm config and the server rsm config for user "fluentuser", with the same password on each, both linked to Titan.


      I don't know why the server would be using information about my windows account in the lock file.

    • JakeC
      Ansys Employee

      Hi srwheat,


      Can you manually delete the MMFLock*.lock file?


       


      Then try running the job again.


       


      Thank you,


      Jake


       

    • swheat
      Subscriber

      There is no such file there.  But, when the client is trying to "download the queue" or "queue test button", this file does get created: UserProxyLauncherLock.lock


       


      But, it disappears when I restart the service.

    • JakeC
      Ansys Employee

      Hi srwheat,


       


      When it is created, what are the permissions on it?  Also what are the permissions on /tmp?


      Can anyone read/write to and from the /tmp directory?


      If you create a file there with a normal user, are you able to change the permissions of that file with chmod as that normal user?


       


      Thank you,
      Jake

    • swheat
      Subscriber

      /tmp is drwxrwxrwt (mode 1777)


      Verified any user can create/delete files


      As it turns out, these two files show up right at the beginning of the transaction, but the first disappears quickly; I just happened to catch it this time


      -rw-r--r-- 1 root root 40 Mar 12 14:50 MMFLock_USERPROXY191_ERV_STEPHEN WHEAT_FLUENTUSER.lock


      -rw-rw-rw- 1 root root 40 Mar 12 14:50 UserProxyLauncherLock.lock
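The permission check on /tmp discussed above can be scripted so that it is easy to repeat on every node. This is only a sketch: `check_tmp_dir` is a hypothetical helper name, and it assumes GNU `stat` (as shipped with CentOS), where `%a` prints the octal mode including the sticky bit.

```shell
# Report whether a directory is world-writable with the sticky bit set
# (mode 1777, which ls shows as drwxrwxrwt).
# check_tmp_dir is a hypothetical helper; assumes GNU stat.
check_tmp_dir() {
    dir=$1
    mode=$(stat -c '%a' "$dir") || return 2
    if [ "$mode" = "1777" ]; then
        echo "ok: $dir is mode 1777"
    else
        echo "unexpected: $dir is mode $mode"
        return 1
    fi
}
```

For example, `check_tmp_dir /tmp` could be run on the head node and on each compute node to confirm they agree.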


       

    • swheat
      Subscriber

      Is it possible that the issue is the space between STEPHEN and WHEAT in the file name?
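The chmod message in the log (it complains only about the part of the file name before the space) is consistent with a path being passed to chmod without quoting. How RSM builds that command internally is not known from this thread, so the following is just a small reproduction of the general shell behaviour, using an illustrative file name in a scratch directory.

```shell
# Reproduce a chmod failure caused by an unquoted path that contains a space.
workdir=$(mktemp -d)
lock="$workdir/MMFLock_DEMO_STEPHEN WHEAT.lock"   # illustrative name only
: > "$lock"                                        # create the empty file

# Unquoted: the shell splits the path at the space, so chmod receives two
# arguments, neither of which names an existing file.
if chmod 666 $lock 2>/dev/null; then unquoted=ok; else unquoted=failed; fi

# Quoted: chmod receives the full path as a single argument.
if chmod 666 "$lock" 2>/dev/null; then quoted=ok; else quoted=failed; fi

echo "unquoted: $unquoted, quoted: $quoted"
rm -rf "$workdir"
```

The unquoted call fails while the quoted one succeeds, which is why testing with a Windows user name that has no space is a reasonable next step.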

    • JakeC
      Ansys Employee

      Hi srwheat,


      It is a possibility.  


      Can you try from a user that does not have a space in the name?


      Also, could you please create a file in /tmp as a regular user and then do a chmod 777 on that file?


      I want to make sure it doesn't throw an error.


       


      Thank you,


      Jake

    • swheat
      Subscriber

      Yes, I will be working with a user later today or tomorrow.  I verified that the file can be created and chmod'd .

    • swheat
      Subscriber

      Progress; RSM on windows client with a user name without spaces works.  But, we can't get workbench on that same client to see RSM as a service.  Looking on the web, it tells us how to use RSM from workbench when it is loaded as a service, but there are no instructions we can find to do that.


      What steps do I need to do in Workbench on the client after RSM is configured and verified on that client?


       


      We're close!


      Thanks,


      Stephen

    • swheat
      Subscriber

      Further update.  We tripped onto how to do that.  We did a right click in the schematic space and found properties where we could update the "update option".  Then the job ran, but failed with the following:


       


      find: error while loading shared libraries: libSM.so.6: cannot open shared object file: No such file or directory


      /opt/apps/ansys/v192/Tools/mono/Linux64/bin/mono: error while loading shared libraries: libSM.so.6: cannot open shared object file: No such file or directory


      Seems like we are really close.

    • swheat
      Subscriber

      OK, found that the compute nodes did not have libSM installed; have fixed that and we are rebooting now.  Will update this when it is done and we have had a chance to run the job again.
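Missing shared libraries such as libSM.so.6 can be found before a job is submitted by asking the dynamic loader which dependencies it cannot resolve. A minimal sketch follows; `filter_missing` and `missing_libs` are hypothetical helper names, and the approach assumes `ldd` and `awk` are available on the compute nodes.

```shell
# Print the names of libraries that ldd reports as "not found".
# filter_missing reads ldd-style output on stdin (hypothetical helper).
filter_missing() {
    awk '/not found/ { print $1 }'
}

# List unresolved shared libraries for one binary (hypothetical helper).
missing_libs() {
    ldd "$1" 2>/dev/null | filter_missing
}
```

Running `missing_libs /opt/apps/ansys/v192/Tools/mono/Linux64/bin/mono` (the path from the error message above) on each compute node should print nothing once every dependency is installed.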

    • swheat
      Subscriber

      Now we get the following error: terminated after throwing an instance of std::runtime_error


      what(): locale::facet::_S_create_c_locale name not valid


      It's almost as if the job is trying to visualize on the compute nodes.


      We're missing something.  We'll wait for your insight, as we are without further ideas here.


       


      Thanks,


      Stephen
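The `locale::facet::_S_create_c_locale name not valid` message is raised by the C++ standard library, and it commonly appears when the locale named in LANG or LC_ALL is not installed on the machine running the program, rather than because of anything graphical. Whether that is the cause here is not confirmed in the thread. A minimal sketch for checking it on a compute node follows; `locale_in_list` is a hypothetical helper name, and the lookup of the current locale is simplified to LC_ALL, then LANG.

```shell
# Succeed if the locale name given as $1 appears in the list read from stdin.
# Comparison ignores case and the "UTF-8" vs "utf8" spelling difference.
# locale_in_list is a hypothetical helper.
locale_in_list() {
    want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/')
    tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/' | grep -qxF "$want"
}

# Simplified: LC_ALL wins, then LANG, else the default C locale.
current=${LC_ALL:-${LANG:-C}}
if locale -a 2>/dev/null | locale_in_list "$current"; then
    echo "locale $current is installed"
else
    echo "locale $current is NOT installed; try LC_ALL=C or install it"
fi
```

If the locale turns out to be missing, setting `LC_ALL=C` in the job environment is a common workaround to try.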

    • swheat
      Subscriber

      We've managed to deduce that perhaps the update should be done on the fluent task itself vs the project.  It's still not working properly, but the original question of this thread has been resolved.

    • JakeC
      Ansys Employee

      Sounds great, thanks for the update.


       


      Thank you,


      Jake

    • Memorisexu
      Subscriber

      I have a question about the account.  I think I have finished the installation.  It created an account, RSMADMIN, for me, but I don't know its password, so how can I enter the password it asks for?


    • JakeC
      Ansys Employee

      MemoriseXuXu,  Please create a new topic for your question.


      rsmadmin is not a user that will be used to do anything other than start the RSM and ARC services.


      This user cannot be used to log in or authenticate against.


      Please fill in YOUR username and password for the machine you are trying to solve on, not rsmadmin.


       


      Thank you,


      Jake

  • The topic ‘Command line configuring RSM on a Linux server running SLURM/Torque’ is closed to new replies.