February 26, 2019 at 11:51 pm
alitabei
Subscriber
Hi,
Is there a setting to maximize/adjust memory usage of jobs submitted by RSM?
I am seeing (via the smem command) that my RSM-submitted job uses only 250 MB of memory, and it takes unexpectedly long to solve on 96 CPUs. There is 1.5 TB of memory available.
Is this something I need to adjust to make all of the memory available, or is it set automatically?
Thanks
February 27, 2019 at 5:34 pm
JakeC
Ansys Employee
Hi Alitabei,
Chances are you don't need to specify memory usage.
Please post the contents of the solve.out for the job here. That contains many of the statistics needed to determine where the bottleneck is.
How many machines did you send the job to?
What kind of interconnect are the compute nodes running on?
Thank you,
Jake
February 28, 2019 at 4:05 pm
alitabei
Subscriber
Thanks for your reply. We have a UGE (SGE) cluster configuration; I can ask our admin for more details if needed.
The solver output (file attached) says that:
Maximum total memory used : 64208.0 MB
Maximum total memory allocated : 100585.0 MB
Total physical memory available : 503 GB
Maximum total memory available (all machines) : 1510 GB
So only 64 GB of memory was used. Each of the three machines has roughly 500 GB available, 1510 GB in total.
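Working through those figures (my own quick sketch, using only the numbers quoted from solve.out above), the job touched just a few percent of the cluster's total memory:

```python
# Figures quoted from solve.out above:
used_mb = 64208.0          # Maximum total memory used
allocated_mb = 100585.0    # Maximum total memory allocated
available_gb = 1510        # Maximum total memory available (all machines)

available_mb = available_gb * 1024
print(f"used:      {used_mb / available_mb:.1%} of available")
print(f"allocated: {allocated_mb / available_mb:.1%} of available")
```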
In terms of calculation speed (and efficient use of memory), is this the best possible result, or are there things I can change?
If specific info on the cluster is needed, please let me know.
Thanks
March 1, 2019 at 8:36 pm
JakeC
Ansys Employee
Hi Alitabei,
The cluster appears to be well set up.
Interconnect speeds are good, and it is solving in-core, meaning disk I/O is kept to a minimum.
These two lines specify communication speed between the master node and the other two nodes:
Communication speed from master to core  32 = 3586.93 MB/sec
Communication speed from master to core  64 = 3575.49 MB/sec
The CPU time and the elapsed time are close, meaning little time was lost in MPI communication and disk I/O, as expected with in-core solving.
The only thing I did notice is that you are solving on an NFS mount.
This may be slowing down what disk I/O does happen. I would suggest solving from a local disk and using local scratch if possible.
However, with a job this size, you may not see much of a difference.
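A quick way to check whether a directory sits on NFS (a hedged sketch: `df -PT` is the GNU/Linux form, and the current directory here is just a stand-in for your actual solve directory):

```shell
# Print the filesystem type of the directory a run would use.
# "$PWD" is an example; substitute the real solve directory.
fstype=$(df -PT "$PWD" | awk 'NR==2 {print $2}')
echo "filesystem type: $fstype"
# If this prints nfs or nfs4, consider a local disk plus local scratch.
```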
This is a small job: 300k nodes at 3 DOF apiece. It is essentially too small to benefit from 96 cores.
I would suggest trying 32 cores; you may get very similar performance.
Beyond that you will probably see diminishing returns.
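The diminishing returns can be illustrated with Amdahl's law (a generic back-of-the-envelope sketch; the 5% serial fraction is an assumption for illustration, not something measured from your job):

```python
def speedup(n_cores, serial_fraction):
    """Ideal Amdahl's-law speedup on n_cores when serial_fraction
    of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With an assumed 5% serial fraction, tripling the core count
# from 32 to 96 buys comparatively little extra speedup:
for n in (8, 32, 96):
    print(f"{n:3d} cores -> {speedup(n, 0.05):.1f}x")
```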
From the log:
Element load balance ratio = 6.713
You want that number as close to 1 as possible. A value this high means some CPUs are working much harder than others because the solver could not divide the job up evenly.
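Roughly speaking, the ratio compares the busiest domain against the average (this is my own illustrative metric, not necessarily MAPDL's exact formula):

```python
def load_balance_ratio(elements_per_core):
    """Busiest domain's element count divided by the average
    element count per domain. 1.0 means a perfect balance."""
    avg = sum(elements_per_core) / len(elements_per_core)
    return max(elements_per_core) / avg

# Perfectly balanced decomposition:
print(load_balance_ratio([1000, 1000, 1000, 1000]))  # 1.0
# One overloaded core drives the ratio well above 1:
print(load_balance_ratio([4000, 400, 400, 400]))
```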
Some analyses require long compute times not because of a large number of degrees of freedom, but because a large number of calculations is performed, such as a large number of time steps. If the DOF count is small, parallel processing will likely not significantly decrease the solution time.
The only other suggestion I can make is faster memory and/or a higher CPU clock rate.
The current CPUs in your cluster run at 2.1 GHz.
Thank you,
Jake
The topic ‘Memory usage in cluster runs’ is closed to new replies.
© 2026 Copyright ANSYS, Inc. All rights reserved.