May 6, 2021 at 5:04 pm xyjxyj1992 Subscriber
Hello, we are looking to get a more powerful PC for running Lumerical simulations. Does anyone have recommendations for CPUs or RAM that can help accelerate the simulations, or should we just go for higher capacity and a faster clock rate? Is there any setting that can also help increase calculation efficiency (like letting the GPU come into play)? Thank you so much!!
May 6, 2021 at 6:03 pm Lito Ansys Employee
Lumerical does not support GPU acceleration. The best option is a multi-processor computer. These systems provide good performance because each processor has its own memory bus connection to the RAM. Simulation speeds are typically limited by the memory bandwidth between RAM and processor, so having one memory bus per processor means the simulation speed tends to scale well with the number of processors. See this post for more information. This post on our FDTD tests/benchmarks might come in handy.
Best Lito
May 7, 2021 at 2:25 am xyjxyj1992 Subscriber
Thank you so much for your reply! Do you mean the more cores (or threads) the better, or do you mean that more than one CPU in a computer can help accelerate the simulation?
May 8, 2021 at 12:59 am Lito Ansys Employee
From the article, I think the more memory channels supported by your CPU, the better. For example, an 8-core system with 8 memory channels will perform better than one with only 2 or 4 memory channels. Dual-socket (dual-CPU) systems might offer better CPU-memory bandwidth than a single-CPU system.
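To make the memory-channel point concrete, here is a minimal Python sketch of the usual back-of-the-envelope estimate of theoretical peak memory bandwidth: channels × transfer rate × bytes per transfer. The function name and the DDR4-3200 figure are illustrative assumptions, not values from this thread; the 64-bit data width per channel is the standard DDR4 channel width. Real FDTD solver rates will sit below this ceiling and depend on the simulation.

```python
def peak_memory_bandwidth_gbs(channels, mega_transfers_per_sec, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    channels: number of populated memory channels on the CPU
    mega_transfers_per_sec: memory speed in MT/s (e.g. 3200 for DDR4-3200)
    bus_width_bits: data width of one channel (64 bits for DDR4)
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# Compare the 2-, 4-, and 8-channel cases mentioned in the reply,
# assuming (hypothetically) DDR4-3200 memory in every channel.
for channels in (2, 4, 8):
    bandwidth = peak_memory_bandwidth_gbs(channels, 3200)
    print(f"{channels} channels: {bandwidth:.1f} GB/s theoretical peak")
```

With the same core count, the 8-channel configuration has four times the theoretical bandwidth of the 2-channel one, which is why a bandwidth-bound solver is expected to benefit from it.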
July 28, 2021 at 5:17 pm xyjxyj1992 Subscriber
Hi Lito, I actually have another question regarding the benchmark scores on this website: /forum/discussion/25782/lumerical-fdtd-simulation-benchmarks. Is the M5.24xlarge result of 1130 Mnodes/s obtained with two CPUs running simultaneously or with just a single CPU?
July 28, 2021 at 7:11 pm
July 29, 2021 at 4:25 pm xyjxyj1992 Subscriber
I see, thanks for your reply. Does that mean that with a single 8175 CPU, you can only reach 565 Mnodes/s?
July 29, 2021 at 4:39 pm Lito Ansys Employee
I don't think that would be the case. The difference in performance is not linear in the number of CPUs/cores, as seen in the results from the article. We will have to test on 1 CPU to get the FDTD solver speed for the same simulation file used in the article. Speed could vary depending on the simulation and machine.
July 30, 2021 at 4:22 pm xyjxyj1992 Subscriber
Thanks for your reply, I understand the last part. But what I really want to know is: do 2 CPUs give you twice the speed, or less, compared to a single CPU under the same configuration and with the same simulation file? Thank you!
July 30, 2021 at 4:34 pm Lito Ansys Employee
Yes, 2-CPU/processor machines have better performance, but the FDTD solver rate/speed does not scale linearly, i.e. doubling the bandwidth does not mean that the FDTD solver rate will double. As seen in our example, the FDTD solver rate does not increase linearly with the number of processes/threads used to run the simulation: going from 8 to 16 to 36 processes does not provide a linear performance increase.
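One way to quantify "not linear" is to compute speedup and parallel efficiency from measured solver rates. The Python sketch below does that. The function name is hypothetical and the rates in the example are made-up placeholders chosen only to show the calculation; they are not the benchmark results from the article. Substitute the Mnodes/s values measured on your own machine for the same simulation file.

```python
def scaling_table(rates):
    """Compute speedup and parallel efficiency from measured solver rates.

    rates: dict mapping process count -> measured FDTD solver rate (Mnodes/s).
    Returns a list of (processes, speedup, efficiency) tuples, where the
    smallest process count is the baseline and efficiency is the measured
    speedup divided by the ideal (linear) speedup.
    """
    base_n = min(rates)
    base_rate = rates[base_n]
    rows = []
    for n in sorted(rates):
        speedup = rates[n] / base_rate
        ideal_speedup = n / base_n
        rows.append((n, speedup, speedup / ideal_speedup))
    return rows


# PLACEHOLDER numbers for illustration only (not from the benchmark article).
example_rates = {8: 100.0, 16: 180.0, 36: 300.0}

for n, speedup, efficiency in scaling_table(example_rates):
    print(f"{n:>2} processes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency of 100% at every row would correspond to the linear assumption behind the 565 Mnodes/s guess (1130 / 2); efficiencies below 100% mean that halving the hardware costs less than half the solver rate.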
September 22, 2021 at 5:20 pm 16arnoldk Subscriber
Hi Lito, this line of conversation and the supporting documentation have been very helpful for understanding the differences in performance between various configurations. It seems that a dual-processor configuration would almost always achieve better performance than a single processor (with double the specs). However, this brings me to a new issue that I have not yet been able to find an answer to:
Is there a way to specify whether a single process is applied half each to the two CPUs in a dual setup vs. only on one of the CPUs in a dual setup?
This becomes important when considering the allocation of solvers (which we have a finite supply of). If running a gradient / particle swarm optimization or a sweep with several simulations in parallel, I notice that dual-processor machines take up solvers according to: #solvers = 2 * #simulations running concurrently. Thus, for a modest number of particles in one generation, we quickly reach our limit.
While dividing tasks across both processors is advantageous for a one-off simulation, I would argue that this benefit is reduced when considering many simulations at once. For example, when running 8 simulations in parallel with access to dual quad-core processors, there should be very little performance difference between running half of the 8 simulations across 8 cores (4 from each processor) and running 4 simulations across the 4 cores of one processor and 4 across the 4 cores of the other. So, is this something that I can control in the FDTD resource manager? Or perhaps elsewhere?
Thanks!
KP
September 22, 2021 at 8:41 pm Lito Ansys Employee
I think you are talking about pinning the process to a specific CPU/thread. You can use Intel MPI for thread/process pinning. Add the "-env" options in the FDTD advanced configuration on Windows.
Best Lito
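As a rough illustration of the pinning idea, the Python sketch below builds one "-env" option string per concurrent simulation, giving each job its own disjoint range of cores through Intel MPI's I_MPI_PIN_PROCESSOR_LIST variable. Several things here are assumptions rather than facts from this thread: the helper name is hypothetical, each job is assumed to run one MPI process per listed core, and logical CPUs 0-3 are assumed to sit on the first socket and 4-7 on the second. Core numbering differs between machines, so check the actual topology before relying on these ranges, and confirm the exact option syntax against the Intel MPI documentation for your version.

```python
def pinning_env_options(num_jobs, cores_per_job, first_core=0):
    """Return one Intel MPI '-env' option string per job.

    Each job gets a disjoint, contiguous range of logical CPU IDs.
    ASSUMPTION: contiguous IDs belong to the same socket on this machine.
    """
    options = []
    for job in range(num_jobs):
        start = first_core + job * cores_per_job
        end = start + cores_per_job - 1
        # A single core is written as "5", a range as "0-3".
        core_list = f"{start}-{end}" if end > start else str(start)
        options.append(f"-env I_MPI_PIN_PROCESSOR_LIST {core_list}")
    return options


# Two concurrent simulations on a (hypothetical) dual quad-core machine:
# one simulation per socket, four cores each.
for job, option in enumerate(pinning_env_options(num_jobs=2, cores_per_job=4)):
    print(f"simulation {job}: {option}")
```

Each printed string would go into the advanced options of the resource used for that simulation, so that one job stays on one socket instead of being split across both.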
September 23, 2021 at 2:13 pm 16arnoldk Subscriber
Hi Lito, thanks a lot! This looks like the exact solution!
Best KP
September 23, 2021 at 10:49 pm Lito Ansys Employee
You are welcome.
Best Lito
The topic ‘Computer suitable for Lumerical simulation’ is closed to new replies.