Photonics

Computer suitable for Lumerical simulation

    • xyjxyj1992
      Subscriber

      Hello, we are trying to get a more powerful PC for running Lumerical simulations. Does anyone have recommendations for CPUs or RAM that can help accelerate the simulation, or should we just go for higher memory capacity and a faster clock rate? Is there any setting that can also help increase the calculation efficiency (like letting the GPU come into play)? Thank you so much!!

    • Lito
      Ansys Employee
      Lumerical does not support GPU acceleration. The best option is a multi-processor computer. These systems provide good performance because each processor has its own memory bus connection to the RAM. Simulation speeds are typically limited by the memory bandwidth between RAM and processor, so having one memory bus per processor means the simulation speed tends to scale well with the number of processors. See this post for more information. This post on our FDTD tests/benchmarks might come in handy.
      Best Lito
    • xyjxyj1992
      Subscriber
      Thank you so much for your reply! Do you mean that the more cores (or threads) the better, or that having more than one CPU in a computer can help accelerate the simulation?
    • Lito
      Ansys Employee
      From the article, I think the more memory channels supported by your CPU, the better. For example, an 8-core system with 8 memory channels will perform better than one with only 2 or 4 memory channels. Dual-socket (dual-CPU) systems might offer better CPU-memory bandwidth than a single-CPU system.
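The memory-channel argument above can be illustrated with a rough estimate of theoretical peak memory bandwidth per CPU. This is only a sketch: the function name is hypothetical, the 64-bit channel width and the DDR4-3200 transfer rate are illustrative assumptions rather than values from the thread, and real FDTD throughput will be lower than the theoretical peak.

```python
def peak_memory_bandwidth_gbs(channels: int,
                              transfer_rate_mts: float,
                              bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for one CPU socket.

    channels          -- number of populated memory channels
    transfer_rate_mts -- memory transfer rate in MT/s (assumed value)
    bus_width_bits    -- data width of one channel (64 bits assumed)
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Compare the channel counts mentioned in the reply (2, 4 and 8),
    # using an assumed transfer rate of 3200 MT/s for every case.
    for channels in (2, 4, 8):
        bandwidth = peak_memory_bandwidth_gbs(channels, 3200)
        print(f"{channels} channels: {bandwidth:.1f} GB/s peak")
```

With the same memory speed, the estimate grows in proportion to the channel count, which is why a CPU with more memory channels (or a second socket with its own channels) can feed a bandwidth-limited solver faster.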
    • xyjxyj1992
      Subscriber
      Hi Lito, I actually have another question regarding the benchmark scores on this website: /forum/discussion/25782/lumerical-fdtd-simulation-benchmarks. Is the m5.24xlarge result of 1130 Mnodes/s obtained with two CPUs running simultaneously or with just a single CPU?
    • Lito
      Ansys Employee
      As shown in the table, the instance has 2 Intel Xeon Platinum 8175 CPUs with a total of 96 threads/processes.
      The FDTD solver speed for the sample simulation using 96 threads is 1130 Mnodes/s


    • xyjxyj1992
      Subscriber
      I see, thanks for your reply. Does that mean that with a single 8175 CPU you can only reach 565 Mnodes/s?
    • Lito
      Ansys Employee
      I don't think that would be the case. The difference in performance is not linear in the number of CPUs/cores, as seen in the results from the article. We would have to test on 1 CPU to get the FDTD solver speed for the same simulation file used in the article. Speed can vary depending on the simulation and the machine.
    • xyjxyj1992
      Subscriber
      Thanks for your reply, I understand the last part. But what I really want to know is: do 2 CPUs give you twice the speed, or less, compared to a single CPU under the same configuration and with the same simulation file? Thank you!
    • Lito
      Ansys Employee
      Yes, machines with 2 CPUs/processors have better performance, but the FDTD solver rate/speed does not scale linearly, i.e. doubling the bandwidth does not mean that the FDTD solver rate will double. As seen in our example, the FDTD solver rate does not increase linearly with the number of processes/threads used to run the simulation: going from 8 to 16 to 36 processes does not provide a linear performance increase.
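One way to quantify the "not linear" statement is to compute the parallel efficiency between two measured solver rates. The sketch below is an illustration only: the function name is hypothetical, the 1130 Mnodes/s figure and the 96 threads come from the thread, the 48-thread count assumes the threads are split evenly across the two CPUs, and the single-CPU rate of 700 Mnodes/s is a made-up placeholder because no single-CPU measurement is reported here.

```python
def parallel_efficiency(base_procs: int, base_rate: float,
                        procs: int, rate: float) -> float:
    """Ratio of measured speedup to ideal (linear) speedup.

    1.0 means perfectly linear scaling; values below 1.0 mean the
    solver rate grew more slowly than the number of processes.
    """
    measured_speedup = rate / base_rate
    ideal_speedup = procs / base_procs
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    dual_cpu_rate = 1130.0  # Mnodes/s with 96 threads (from the thread)

    # Linear assumption from the question: one CPU gives exactly half.
    print(parallel_efficiency(48, 565.0, 96, dual_cpu_rate))

    # HYPOTHETICAL single-CPU rate, not a measurement: if one CPU
    # already reached 700 Mnodes/s, the second CPU adds less than 2x.
    print(parallel_efficiency(48, 700.0, 96, dual_cpu_rate))
```

Replacing the placeholder with a real single-CPU measurement of the same simulation file would give the actual efficiency for that machine.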


    • 16arnoldk
      Subscriber
      Hi Lito, this line of conversation and the supporting documentation have been very helpful for understanding the differences in performance between various configurations. It seems that a dual-processor configuration almost always achieves better performance than a single processor (with double the specs). However, this brings me to a new issue which I have not yet been able to find an answer to:
      Is there a way to specify whether a single process is applied half each to the two CPUs in a dual setup vs. only on one of the CPUs in a dual setup?
      This becomes important when considering the allocation of solvers (which we have a finite supply of). If running a gradient / particle swarm optimization or a sweep with several simulations in parallel, I notice that dual processor machines will take up solvers following: #solvers = 2*#simulations running concurrently. Thus, for a modest number of particles in one generation, we quickly add up to our limit.
      While dividing tasks across both processors is advantageous for a one-off simulation, I would argue that this benefit is reduced when considering many simulations at once. There should be very little performance difference between (for an example of running 8 simulations in parallel with access to dual quad core processors) running half of the 8 simulations across 8 cores (4 from each processor) vs. running 4 simulations across 4 cores of one processor and 4 across 4 for the other. So, is this something that I can control in the FDTD resource manager? Or perhaps elsewhere?
      Thanks!
      KP
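The license arithmetic described in this post can be sketched as follows. The factor of 2 is the poster's own observation for dual-processor machines, not a documented licensing rule, and the function names and the license limit of 10 are hypothetical.

```python
def solver_licenses_needed(concurrent_sims: int,
                           licenses_per_sim: int = 2) -> int:
    """Solver licenses consumed when each simulation takes a fixed number.

    licenses_per_sim defaults to 2, matching the observed
    '#solvers = 2 * #simulations' on a dual-processor machine.
    """
    return concurrent_sims * licenses_per_sim


def max_concurrent_sims(license_limit: int,
                        licenses_per_sim: int = 2) -> int:
    """Largest number of simulations that fits within the license limit."""
    return license_limit // licenses_per_sim


if __name__ == "__main__":
    # Eight simulations in parallel, as in the example from the post.
    print(solver_licenses_needed(8))

    # With a HYPOTHETICAL limit of 10 solver licenses, compare how many
    # simulations fit when each one takes 2 licenses versus 1 license.
    print(max_concurrent_sims(10, licenses_per_sim=2))
    print(max_concurrent_sims(10, licenses_per_sim=1))
```

If each simulation could be confined to one processor and counted once, the same license pool would allow twice as many concurrent simulations, which is the motivation behind the question.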
    • Lito
      Ansys Employee
      I think you are talking about pinning the process to a specific CPU/thread. You can use Intel MPI for thread/process pinning. Add the "-env" options in the FDTD advanced configuration on Windows.
      Best Lito
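As a rough illustration of the pinning suggestion, the helper below builds a string of "-env" options using two Intel MPI environment variables, I_MPI_PIN and I_MPI_PIN_PROCESSOR_LIST. The helper name is hypothetical, and the core numbering (cores 0-3 on the first CPU, 4-7 on the second) is an assumption for a dual quad-core machine that should be checked on the actual system. Exactly where and how these options are entered in the FDTD advanced configuration is not shown in the thread.

```python
from typing import Iterable


def pinning_options(core_ids: Iterable[int]) -> str:
    """Build Intel MPI '-env' options that pin processes to given cores.

    core_ids -- logical core numbers to use (numbering is machine specific)
    """
    cpu_list = ",".join(str(core) for core in core_ids)
    return f"-env I_MPI_PIN 1 -env I_MPI_PIN_PROCESSOR_LIST {cpu_list}"


if __name__ == "__main__":
    # ASSUMED layout for a dual quad-core machine: verify before use.
    first_cpu_cores = range(0, 4)
    second_cpu_cores = range(4, 8)

    # One option string per resource, each confined to a single CPU.
    print(pinning_options(first_cpu_cores))
    print(pinning_options(second_cpu_cores))
```

Each string would be used for a separate resource entry, so that one group of simulations stays on the first processor and the other group on the second.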

    • 16arnoldk
      Subscriber
      Hi Lito, thanks a lot! This looks like exactly the solution!
      Best KP
    • Lito
      Ansys Employee
      You are welcome.
      Best Lito
  • The topic ‘Computer suitable for Lumerical simulation’ is closed to new replies.