{"id":182748,"date":"2024-07-11T13:56:29","date_gmt":"2024-07-11T13:56:29","guid":{"rendered":"https:\/\/innovationspace.ansys.com\/knowledge\/?post_type=topic&#038;p=182748"},"modified":"2026-05-05T15:16:19","modified_gmt":"2026-05-05T15:16:19","slug":"rocky-gpu-buying-guide","status":"publish","type":"topic","link":"https:\/\/innovationspace.ansys.com\/knowledge\/forums\/topic\/rocky-gpu-buying-guide\/","title":{"rendered":"Rocky GPU Buying Guide"},"content":{"rendered":"<p style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198445 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide.png\" alt=\"\" width=\"1920\" height=\"650\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide.png 1920w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-300x102.png 300w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-1024x347.png 1024w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-768x260.png 768w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-1536x520.png 1536w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-24x8.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-36x12.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Rocky-GPU-Buying-Guide-48x16.png 48w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/p>\n<p>&nbsp;<\/p>\n<blockquote>\n<p style=\"text-align: center\"><em>With <strong>Ansys Rocky\u2122 <\/strong><\/em>particle dynamics simulation 
software,<em> you can use one or more <strong>Graphics Processing Units (GPUs)<\/strong> to process your simulations.\u00a0<\/em><em>Before investing in new hardware, see the FAQs below for guidelines and recommendations.<\/em><\/p>\n<h3 class=\"attachment\" style=\"background-color: #fedb8d;padding: 1%;font-weight: 300;display: flex;border-radius: 15px;margin-bottom: 20px;height: auto !important\"><b><a href=\"https:\/\/www.ansys.com\/blog\/mastering-multi-gpu-ansys-rocky-software-enhancing-its-performance\">Mastering Multi-GPU in Ansys Rocky Software and Enhancing Its Performance<\/a><br \/>\n<\/b><\/h3>\n<p>&nbsp;<\/p><\/blockquote>\n<h3  id=\"ROCKY-GPU-PERFORMANCE-BENCHMARK\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span> Rocky GPU Performance Benchmark<\/strong><\/h3>\n<ol>\n<li><a href=\"#Rockyperfbench\">Rocky GPU Performance Benchmark<\/a><\/li>\n<li><a href=\"#RockyGPU1\">The benefits of GPU<\/a><\/li>\n<li><a href=\"#RockyGPU2\">Performance Benchmark<\/a><\/li>\n<li><a href=\"#RockyGPU3\">Benchmark results for Ansys Rocky 2026 R1<\/a><\/li>\n<li><a href=\"#RockyGPU4\">Relevant conclusions on simulation performance<\/a><\/li>\n<li><a href=\"#RockyGPU6\">Rocky GPU Performance Benchmark \u2013 DEM case (Distributed parallel computing)<\/a><\/li>\n<li><a href=\"#RockyGPU7\">Rocky GPU Performance Benchmark \u2013 CFD-DEM case<\/a><\/li>\n<li><a href=\"#RockyGPU8\">Rocky GPU Performance Benchmark \u2013 SPH case<\/a><\/li>\n<\/ol>\n<h3  id=\"ROCKY-GPU-FAQS\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span> Rocky GPU FAQs<\/strong><\/h3>\n<p><strong><a href=\"#RockyFAQs\">Rocky GPU FAQs<\/a><\/strong><\/p>\n<ol>\n<li><strong><a href=\"#FAQ1\">Which license is required to run Rocky on GPUs?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ2\">Which GPU cards are recommended for use with Rocky?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ3\">What are the minimum 
requirements for GPU cards that will be used for running Rocky?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ4\">What cards are best for running only spherical particles? What about cases using shaped particles?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ5\">Which cards are best for running SPH?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ6\">Can you provide some examples for comparison?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ7\">There are a lot of cards on that list! How do I choose the one that is right for me?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ8\">I have only a mid-range budget. Can you recommend a card for me?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ9\">If you had to recommend one, all-around best card for most situations, which would it be?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ10\">Won\u2019t the (non-recommended) card I already have work just as well as a recommended one?<\/a><\/strong><\/li>\n<li><strong><a href=\"#FAQ11\">Assuming I use a recommended GPU card, how much faster can I expect my simulations to run?<\/a><\/strong><\/li>\n<\/ol>\n<div id=\"RockyFAQs\"><\/div>\n<div>\n<hr \/>\n<\/div>\n<div><\/div>\n<h2  id=\"ROCKY-GPU-FAQS\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span> Rocky GPU FAQs<\/strong><\/h2>\n<div id=\"FAQ1\"><\/div>\n<h3  id=\"1-WHICH-LICENSE-IS-REQUIRED-TO-RUN-ROCKY-ON-GPUS\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>1. Which license is required to run Rocky on GPUs?<\/strong><\/h3>\n<p>The Ansys Rocky base product license allows you to run a single job on up to 112 GPU Streaming Multiprocessors (SMs). It makes no difference whether these SMs are on a single GPU card or spread across multiple GPU cards.<\/p>\n<p>For example, if you have one A100 card (108 SMs), you can run your Rocky simulation without needing any additional HPC license. 
Similarly, if you have four RTX 3060 cards (28 SMs each), you can run on multi-GPU without an additional HPC license, because the total SM count in this case is 112.*<\/p>\n<p>To run any job with more than 112 SMs, you need to add Ansys Rocky HPC licensing. Each Ansys Rocky HPC task enables 14 SMs; for example, one Ansys Rocky HPC-8 license includes 8 tasks, which enable 112 additional SMs.<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"287\">\n<p style=\"text-align: center\"><strong>SMs<\/strong><\/p>\n<\/td>\n<td width=\"288\">\n<p style=\"text-align: center\"><strong>Rocky HPC tasks<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td width=\"287\">1 \u2013 112<\/td>\n<td width=\"288\">0<\/td>\n<\/tr>\n<tr>\n<td width=\"287\">113 \u2013 224<\/td>\n<td width=\"288\">1 \u2013 8<\/td>\n<\/tr>\n<tr>\n<td width=\"287\">225 \u2013 336<\/td>\n<td width=\"288\">9 \u2013 16<\/td>\n<\/tr>\n<tr>\n<td width=\"287\">337 \u2013 560<\/td>\n<td width=\"288\">17 \u2013 32<\/td>\n<\/tr>\n<tr>\n<td width=\"287\">561 \u2013 1008<\/td>\n<td width=\"288\">33 \u2013 64<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Now consider another situation, in which you have one RTX 4090 card (128 SMs) or five RTX 3060 cards (140 SMs). 
In both cases you will need to invest in 1 Rocky HPC-8 License (see the table below).<\/p>\n<h3 style=\"text-align: center\" style=\"text-align: center\"  id=\"HPC-FEATURES-REQUIRED-ACCORDING-TO-THE-CARDS-SM-COUNT\">HPC features required according to the card(s) SM count<\/h3>\n<table width=\"674\">\n<tbody>\n<tr>\n<td colspan=\"3\" width=\"319\"><strong>RTX 3060<\/strong><\/td>\n<td colspan=\"3\" width=\"349\"><strong>RTX 4090<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"76\"><strong>Cards<\/strong><\/td>\n<td width=\"95\"><strong>SM Count<\/strong><\/td>\n<td width=\"144\"><strong>Rocky HPC-8<\/strong><\/td>\n<td width=\"77\"><strong>Cards<\/strong><\/td>\n<td width=\"95\"><strong>SM Count<\/strong><\/td>\n<td width=\"173\"><strong>Rocky HPC-8<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"76\">1<\/td>\n<td width=\"95\">28<\/td>\n<td width=\"144\">0<\/td>\n<td width=\"77\">1<\/td>\n<td width=\"95\">128<\/td>\n<td width=\"173\">1<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">2<\/td>\n<td width=\"95\">56<\/td>\n<td width=\"144\">0<\/td>\n<td width=\"77\">2<\/td>\n<td width=\"95\">256<\/td>\n<td width=\"173\">2<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">3<\/td>\n<td width=\"95\">84<\/td>\n<td width=\"144\">0<\/td>\n<td width=\"77\">3<\/td>\n<td width=\"95\">384<\/td>\n<td width=\"173\">3<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">4<\/td>\n<td width=\"95\">112<\/td>\n<td width=\"144\">0<\/td>\n<td width=\"77\">4<\/td>\n<td width=\"95\">512<\/td>\n<td width=\"173\">4<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">5<\/td>\n<td width=\"95\">140<\/td>\n<td width=\"144\">1<\/td>\n<td width=\"77\">5<\/td>\n<td width=\"95\">640<\/td>\n<td width=\"173\">5<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">6<\/td>\n<td width=\"95\">168<\/td>\n<td width=\"144\">1<\/td>\n<td width=\"77\">6<\/td>\n<td width=\"95\">768<\/td>\n<td width=\"173\">6<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-size: 9.0pt\">*For more information about SMs, refer to the APPENDIX 
section.<\/span><\/p>\n<p><strong><em>Notes:<\/em><\/strong><\/p>\n<ul>\n<li><em>When using multiple GPUs, licensing is based on the total number of SMs across all GPUs, irrespective of the number of GPUs.<\/em><\/li>\n<li><em>All available SMs on a GPU card are used. It is not possible to restrict usage to a subset of SMs.<\/em><\/li>\n<li><em>Since 2026 R1, Rocky supports distributed GPU computing (Linux only, for spherical particles with no coupling). However, this is a Beta feature and work is ongoing.<\/em><\/li>\n<li><em>We advise that all GPU cards reside on a single server, because Ansys Rocky does not fully support distributed GPU computing.<\/em><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<div id=\"FAQ2\"><\/div>\n<h3  id=\"2-WHICH-GPU-CARDS-ARE-RECOMMENDED-FOR-USE-WITH-ROCKY\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>2. Which GPU cards are recommended for use with Rocky?<\/strong><\/h3>\n<p>The Rocky team suggests considering the following NVIDIA GPU cards for the Rocky solver:<\/p>\n<ul>\n<li><strong>Server<\/strong>: A30, A100, L40, H100, H100 NVL and H200.\n<ul>\n<li><strong>PROS<\/strong>: Faster when using spherical and\/or shaped particles and\/or SPH elements<\/li>\n<li><strong>CONS: <\/strong>More expensive; must be installed on a server rack; no video output<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li><strong>Workstation<\/strong>: Quadro RTX A6000, RTX A2000, RTX A4000, RTX A5000\n<ul>\n<li><strong>PROS:<\/strong> Faster when using only spherical particles and\/or SPH elements; can be installed on individual workstations; has video output<\/li>\n<li><strong>CONS:<\/strong> Cost is still high<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li><strong>Gaming<\/strong>: RTX 3060, RTX 3070, RTX 4060, RTX 4090 and RTX 5060\n<ul>\n<li><strong>PROS<\/strong>: Faster when using only spherical particles and\/or SPH elements; inexpensive; can be installed on individual 
workstations; has video output<\/li>\n<li><strong>CONS:<\/strong> Slower when using shaped particles<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>For best results, use only the recommended GPU cards listed above for Rocky processing.<\/p>\n<p>Gaming cards can perform well when running small cases with spherical particles and\/or SPH elements but may not be the best choice for simulations with shaped particles.<\/p>\n<div id=\"FAQ3\"><\/div>\n<h3  id=\"3-WHAT-ARE-THE-MINIMUM-REQUIREMENTS-FOR-GPU-CARDS-THAT-WILL-BE-USED-FOR-RUNNING-ROCKY\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>3. What are the minimum requirements for GPU cards that will be used for running Rocky?<\/strong><\/h3>\n<p>There are some minimum requirements for GPU or multi-GPU processing. You must choose one or more NVIDIA GPU cards (computing or gaming) that meet the following criteria:<\/p>\n<ul>\n<li>At least 4 GB of memory.<\/li>\n<li>Fast double-precision processing capabilities.<\/li>\n<li>A CUDA compute capability of 6.0 or higher.<\/li>\n<li>A graphics driver version that supports the CUDA toolkit version 12.8 or higher.<\/li>\n<\/ul>\n<p><em>(See the NVIDIA website for a CUDA driver table listing which driver version supports which toolkit version.)<\/em><\/p>\n<div id=\"FAQ4\"><\/div>\n<h3  id=\"4-WHAT-CARDS-ARE-BEST-FOR-RUNNING-ONLY-SPHERICAL-PARTICLES-WHAT-ABOUT-CASES-USING-SHAPED-PARTICLES\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>4. What cards are best for running only spherical particles? 
What about cases using shaped particles?<\/strong><\/h3>\n<p>Regarding particle shapes, here are some guidelines:<\/p>\n<p>When running cases with shaped particles, choosing GPUs with higher <strong>double-precision performance<\/strong> should be your primary focus.<\/p>\n<p>When running cases using only spherical particles, choosing GPUs with<strong> higher memory bandwidth<\/strong> will give you faster processing.<\/p>\n<p>If you intend to run very large cases, with millions of particles, you should consider GPUs with<strong> larger memory size.<\/strong><\/p>\n<p>Some cards have poor double-precision performance but considerable memory bandwidth. Such cards perform very well when simulating only spherical particles, but very poorly with shaped particles. This is a critical point when you are deciding which card to purchase.<\/p>\n<div id=\"FAQ5\"><\/div>\n<h3  id=\"5-WHICH-CARDS-ARE-BEST-FOR-RUNNING-SPH\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>5. Which cards are best for running SPH?<\/strong><\/h3>\n<p>For simulations with only SPH elements, choose a GPU with high single-precision performance and high memory bandwidth to speed up your simulations. GPUs with larger memory allow you to run bigger cases with millions of SPH elements, so keep this in mind when selecting the hardware.<\/p>\n<p>If you are going to run simulations with both SPH elements and DEM particles together, you must also take the guidelines from the previous section into account, since the performance bottleneck can be either the SPH or the DEM, depending on the number of elements and particles and on the particle shape.<\/p>\n<div id=\"FAQ6\"><\/div>\n<h3  id=\"6-CAN-YOU-PROVIDE-SOME-EXAMPLES-FOR-COMPARISON\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>6. 
Can you provide some examples for comparison?<\/strong><\/h3>\n<p>The table in FAQ 7 below shows that the RTX 5060 Ti is about 23% faster than the RTX 5060, owing to its higher double-precision performance, and the Ti model has twice the memory. Comparing the H100 and the H100 NVL, the NVL version offers almost twice the memory bandwidth for about 20% more investment. The NVL version also delivers almost 20% more double-precision performance (Gflops).<\/p>\n<p>Comparing the RTX 3090 with the RTX 3090 Ti, both have the same memory size, and the Ti version is 12% faster. Although the performance gain is modest, the cost difference is probably not significant, so in this case the Ti version would be the better choice. If performance is a bottleneck, the RTX 4090 could be considered as an option, as it is about 2x faster than the RTX 3090 with the same memory size. In this case, an assessment of the pros and cons is required, as the RTX 4090 has a higher cost and requires an HPC license due to its SM count.<\/p>\n<div id=\"FAQ7\"><\/div>\n<h3  id=\"7-THERE-ARE-A-LOT-OF-CARDS-ON-THAT-LIST-HOW-DO-I-CHOOSE-THE-ONE-THAT-IS-RIGHT-FOR-ME\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>7. There are a lot of cards on that list! How do I choose the one that is right for me?<\/strong><\/h3>\n<p>Choosing the card that will work best for you depends upon the type of simulations you will be running, how fast you need those simulations to complete, and the budget available to spend on your hardware.<\/p>\n<p>The table below provides a quick comparison of the most common workstation, server and gaming cards.<\/p>\n<p><em>*Last update March 2026. 
Prices are estimated and can vary from region to region, market demand and other reasons.<\/em><\/p>\n<table style=\"height: 1571px\" width=\"880\">\n<tbody>\n<tr>\n<td width=\"115\"><\/td>\n<td width=\"76\"><strong>Card Name<\/strong><\/td>\n<td width=\"83\"><strong>Memory Size (GB)<\/strong><\/td>\n<td width=\"101\"><strong>Memory Bandwidth (GB\/s)<\/strong><\/td>\n<td width=\"48\"><strong>SMs<\/strong><\/td>\n<td width=\"85\"><strong>Single Precision (Tflops)<\/strong><\/td>\n<td width=\"89\"><strong>Double Precision (Gflops)<\/strong><\/td>\n<td width=\"124\"><strong>Estimated Purchase Price* (USD)<\/strong><\/td>\n<\/tr>\n<tr>\n<td rowspan=\"7\" width=\"115\"><strong>Workstation Cards<\/strong><\/p>\n<p>&nbsp;<\/td>\n<td width=\"76\">RTX A6000<\/td>\n<td width=\"83\">48<\/td>\n<td width=\"101\">768<\/td>\n<td width=\"48\">84<\/td>\n<td width=\"85\">38.71<\/td>\n<td width=\"89\">605<\/td>\n<td width=\"124\">4,600 \u2013 5,200<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 6000 Ada<\/td>\n<td width=\"83\">48<\/td>\n<td width=\"101\">960<\/td>\n<td width=\"48\">142<\/td>\n<td width=\"85\">91<\/td>\n<td width=\"89\">1423<\/td>\n<td width=\"124\">6,800 \u2013 7,500<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX A2000<\/td>\n<td width=\"83\">12<\/td>\n<td width=\"101\">288<\/td>\n<td width=\"48\">26<\/td>\n<td width=\"85\">7.9<\/td>\n<td width=\"89\">124.8<\/td>\n<td width=\"124\">450 \u2013 600<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX A4000<\/td>\n<td width=\"83\">16<\/td>\n<td width=\"101\">448<\/td>\n<td width=\"48\">48<\/td>\n<td width=\"85\">19.2<\/td>\n<td width=\"89\">299.5<\/td>\n<td width=\"124\">800 \u2013 1,100<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX A5000<\/td>\n<td width=\"83\">24<\/td>\n<td width=\"101\">768<\/td>\n<td width=\"48\">64<\/td>\n<td width=\"85\">27.8<\/td>\n<td width=\"89\">433.9<\/td>\n<td width=\"124\">2,200 \u2013 2,500<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX PRO 2000<\/td>\n<td width=\"83\">16<\/td>\n<td width=\"101\">288<\/td>\n<td 
width=\"48\">34<\/td>\n<td width=\"85\">17.03<\/td>\n<td width=\"89\">266.2<\/td>\n<td width=\"124\">800 &#8211; 840<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX PRO 4000<\/td>\n<td width=\"83\">24<\/td>\n<td width=\"101\">432<\/td>\n<td width=\"48\">70<\/td>\n<td width=\"85\">24.05<\/td>\n<td width=\"89\">375.8<\/td>\n<td width=\"124\">1700 &#8211; 2000<\/td>\n<\/tr>\n<tr>\n<td colspan=\"6\" width=\"518\"><\/td>\n<td width=\"89\"><\/td>\n<td width=\"124\"><\/td>\n<\/tr>\n<tr>\n<td rowspan=\"7\" width=\"115\"><strong>Server Cards<\/strong><\/td>\n<td width=\"76\">A30<\/td>\n<td width=\"83\">24<\/td>\n<td width=\"101\">930<\/td>\n<td width=\"48\">56<\/td>\n<td width=\"85\">10.3<\/td>\n<td width=\"89\">5161<\/td>\n<td width=\"124\">3,500 \u2013 5,000<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">A100<\/td>\n<td width=\"83\">40<\/td>\n<td width=\"101\">1555<\/td>\n<td width=\"48\">108<\/td>\n<td width=\"85\">19.5<\/td>\n<td width=\"89\">9746<\/td>\n<td width=\"124\">8,000 \u2013 11,000<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">A100 80GB<\/td>\n<td width=\"83\">80<\/td>\n<td width=\"101\">1935<\/td>\n<td width=\"48\">108<\/td>\n<td width=\"85\">19.5<\/td>\n<td width=\"89\">9746<\/td>\n<td width=\"124\">12,000 \u2013 17,000<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">H100<\/td>\n<td width=\"83\">80<\/td>\n<td width=\"101\">2039<\/td>\n<td width=\"48\">114<\/td>\n<td width=\"85\">51.22<\/td>\n<td width=\"89\">25610<\/td>\n<td width=\"124\">25,000 \u2013 32,000<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">H100 NVL<\/td>\n<td width=\"83\">94<\/td>\n<td width=\"101\">3940<\/td>\n<td width=\"48\">132<\/td>\n<td width=\"85\">60.32<\/td>\n<td width=\"89\">30160<\/td>\n<td width=\"124\">30,000 \u2013 38,000<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">L40<\/td>\n<td width=\"83\">48<\/td>\n<td width=\"101\">864<\/td>\n<td width=\"48\">142<\/td>\n<td width=\"85\">90.52<\/td>\n<td width=\"89\">1414<\/td>\n<td width=\"124\">9,500 \u2013 11,000<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">H200<\/td>\n<td 
width=\"83\">141<\/td>\n<td width=\"101\">4800<\/td>\n<td width=\"48\">132<\/td>\n<td width=\"85\">60.32<\/td>\n<td width=\"89\">30160<\/td>\n<td width=\"124\">31,000 \u2013 42,000<\/td>\n<\/tr>\n<tr>\n<td colspan=\"6\" width=\"518\"><\/td>\n<td width=\"89\"><\/td>\n<td width=\"124\"><\/td>\n<\/tr>\n<tr>\n<td rowspan=\"11\" width=\"115\"><strong>Gaming Cards<\/strong><\/td>\n<td width=\"76\">RTX 3060 Ti<\/td>\n<td width=\"83\">8<\/td>\n<td width=\"101\">448<\/td>\n<td width=\"48\">38<\/td>\n<td width=\"85\">16.2<\/td>\n<td width=\"89\">253.1<\/td>\n<td width=\"124\">250 \u2013 300 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 3070<\/td>\n<td width=\"83\">8<\/td>\n<td width=\"101\">448<\/td>\n<td width=\"48\">46<\/td>\n<td width=\"85\">20.31<\/td>\n<td width=\"89\">317.4<\/td>\n<td width=\"124\">300 \u2013 400 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 3070 Ti<\/td>\n<td width=\"83\">8<\/td>\n<td width=\"101\">608.3<\/td>\n<td width=\"48\">48<\/td>\n<td width=\"85\">21.75<\/td>\n<td width=\"89\">339.8<\/td>\n<td width=\"124\">300 \u2013 400 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 3080<\/td>\n<td width=\"83\">10<\/td>\n<td width=\"101\">760<\/td>\n<td width=\"48\">68<\/td>\n<td width=\"85\">29.77<\/td>\n<td width=\"89\">465.1<\/td>\n<td width=\"124\">450 \u2013 600 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 3080 Ti<\/td>\n<td width=\"83\">12<\/td>\n<td width=\"101\">912.4<\/td>\n<td width=\"48\">80<\/td>\n<td width=\"85\">34.1<\/td>\n<td width=\"89\">532.8<\/td>\n<td width=\"124\">450 \u2013 600 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 3090<\/td>\n<td width=\"83\">24<\/td>\n<td width=\"101\">936.2<\/td>\n<td width=\"48\">82<\/td>\n<td width=\"85\">35.58<\/td>\n<td width=\"89\">556<\/td>\n<td width=\"124\">700 \u2013 900 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 3090 Ti<\/td>\n<td width=\"83\">24<\/td>\n<td width=\"101\">1008<\/td>\n<td width=\"48\">84<\/td>\n<td width=\"85\">40<\/td>\n<td width=\"89\">625<\/td>\n<td 
width=\"124\">700 \u2013 900 (Used)<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 4090<\/td>\n<td width=\"83\">24<\/td>\n<td width=\"101\">1008<\/td>\n<td width=\"48\">128<\/td>\n<td width=\"85\">82.58<\/td>\n<td width=\"89\">1290<\/td>\n<td width=\"124\">1,600 \u2013 1,900<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 5060<\/td>\n<td width=\"83\">8<\/td>\n<td width=\"101\">448<\/td>\n<td width=\"48\">30<\/td>\n<td width=\"85\">19.18<\/td>\n<td width=\"89\">299.6<\/td>\n<td width=\"124\">320 \u2013 380<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 5060 Ti<\/td>\n<td width=\"83\">16<\/td>\n<td width=\"101\">448<\/td>\n<td width=\"48\">36<\/td>\n<td width=\"85\">23.7<\/td>\n<td width=\"89\">370.4<\/td>\n<td width=\"124\">450 \u2013 550<\/td>\n<\/tr>\n<tr>\n<td width=\"76\">RTX 5090<\/td>\n<td width=\"83\">32<\/td>\n<td width=\"101\">1792<\/td>\n<td width=\"48\">170<\/td>\n<td width=\"85\">104.8<\/td>\n<td width=\"89\">1637<\/td>\n<td width=\"124\">1,999 \u2013 2,500<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div id=\"FAQ8\"><\/div>\n<h3  id=\"8-I-HAVE-ONLY-A-MID-RANGE-BUDGET-CAN-YOU-RECOMMEND-A-CARD-FOR-ME\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>8. I have only a mid-range budget. Can you recommend a card for me?<\/strong><\/h3>\n<p>In the mid-price range, there are two generations of RTX workstation cards: the RTX A6000 and the RTX 6000 Ada. Both have the same memory size (48 GB) and poor double-precision performance compared to server cards. On the other hand, the A30 has half the memory (24 GB) but blazing-fast double precision. 
A good option for a mid-range budget could be the A100, which offers high double-precision performance, a good amount of memory (40 GB) and great memory bandwidth (1555 GB\/s).<\/p>\n<p>Thus, you need to decide which matters more to you: larger memory (better for running larger cases with only spherical particles or with SPH elements) or faster double precision (better for running cases with shaped particles).<\/p>\n<p>Another interesting GPU card to look at is the L40 (good memory and double precision). Its cost is higher than that of the A6000; however, the performance could compensate for the extra investment.<\/p>\n<div id=\"FAQ9\"><\/div>\n<h3  id=\"9-IF-YOU-HAD-TO-RECOMMEND-ONE-ALL-AROUND-BEST-CARD-FOR-MOST-SITUATIONS-WHICH-WOULD-IT-BE\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>9. If you had to recommend one, all-around best card for most situations, which would it be?<\/strong><\/h3>\n<p>All in all, the A100 is by far the Rocky team\u2019s preferred choice. It has a good amount of memory, blazing-fast double precision, and it delivers the most in terms of processing capacity given its cost.<\/p>\n<p>And if it turns out your simulation does not fit onto a single GPU, you can always use Rocky\u2019s multi-GPU support to combine the memory of several GPUs.<\/p>\n<div id=\"FAQ10\"><\/div>\n<h3  id=\"10-WONT-THE-NON-RECOMMENDED-CARD-I-ALREADY-HAVE-WORK-JUST-AS-WELL-AS-A-RECOMMENDED-ONE\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>10. Won\u2019t the (non-recommended) card I already have work just as well as a recommended one?<\/strong><\/h3>\n<p>GPU cards can differ in performance by an order of magnitude, which is why we have recommended only the cards that will have the best performance with Rocky. The fact that Rocky appears to run fine on a non-recommended GPU card does not mean that the card is improving processing performance. 
And if it is not improving performance, then there is no point in running your simulations on GPUs.<\/p>\n<p>To see the huge range of performance differences for yourself, visit the NVIDIA website and review the processing power (single precision and double precision) of the GPU cards.<\/p>\n<div id=\"FAQ11\"><\/div>\n<h3  id=\"11-ASSUMING-I-USE-A-RECOMMENDED-GPU-CARD-HOW-MUCH-FASTER-CAN-I-EXPECT-MY-SIMULATIONS-TO-RUN\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>11. Assuming I use a recommended GPU card, how much faster can I expect my simulations to run?<\/strong><\/h3>\n<p>Compared to a CPU with 8 cores, adding even one GTX 980 has been shown to speed up processing fivefold; add in three P100s and what was once a 3-day simulation can be completed in just over an hour. But it all depends upon what you are simulating, how large your case is, and how much budget you have.<\/p>\n<p>&nbsp;<\/p>\n<h3 content_id=\"section1\" content_id=\"section1\"  id=\"APPENDIX-WHAT-ARE-STREAMING-MULTIPROCESSORS-SMS\">Appendix: What are Streaming Multiprocessors (SMs)?<\/h3>\n<p>Streaming Multiprocessors (SMs) are the key components of NVIDIA GPUs responsible for executing parallel computations, including rendering tasks and other general-purpose computing. 
A SM consists of multiple CUDA cores and more powerful GPU cards typically contain more SM\u2019s.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-182937\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-300x109.png\" alt=\"\" width=\"881\" height=\"320\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-300x109.png 300w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-1024x373.png 1024w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-768x279.png 768w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-1536x559.png 1536w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-2048x745.png 2048w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-24x9.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-36x13.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpu-guide-48x17.png 48w\" sizes=\"auto, (max-width: 881px) 100vw, 881px\" \/><\/p>\n<p style=\"text-align: left\"><em>GH100 Full GPU architecture with 144 SMs <\/em><\/p>\n<div id=\"Rockyperfbench\"><\/div>\n<h2  id=\"ROCKY-GPU-PERFORMANCE-BENCHMARK\"><strong><span style=\"font-size: 50px;font-weight: 900;color: #fedb8d\">\/<\/span>Rocky GPU Performance Benchmark\u00a0<\/strong><\/h2>\n<p>In the past, DEM simulations were restricted to relatively small problems that used, for example, only thousands of larger particles that were mostly spherical in shape.<\/p>\n<p>Continual improvements in both DEM codes and computational power have enabled closer-to-reality particle simulations. 
Users today can expect to simulate problems using the real particle shape and the actual particle size distribution (PSD), creating DEM simulations with many millions of particles.<\/p>\n<p>However, these enhancements in simulation accuracy have come at the cost of increased computational loads in both processing time and memory requirements. Within Rocky, these loads can be offset considerably by using GPU processing, which lets users obtain results in a more practical time frame.<\/p>\n<div id=\"RockyGPU1\"><\/div>\n<h3 content_id=\"section1\" content_id=\"section1\"  id=\"THE-BENEFITS-OF-GPU\">The benefits of GPU<\/h3>\n<p>The addition of GPU processing has helped to make DEM a practical tool for engineering design. For example, the speed-up experienced by processing a simulation with even an inexpensive gaming GPU is remarkable when compared to a standard 32-core CPU machine working alone.<\/p>\n<p>Since release 4 of Rocky, users have been able to use multi-GPU capabilities, which make it possible to tackle large-scale and\/or complicated problems that were previously out of reach due to memory limitations. By combining the memory of multiple GPU cards, users have been able to overcome these limitations and achieve a substantial performance increase by aggregating the cards\u2019 computing power.<\/p>\n<p>From an investment perspective, there are many benefits to multi-GPU processing. The hardware cost of running cases with several million particles using multiple GPUs is much lower than the cost of an equivalent CPU-based machine. 
The energy consumption is also less with GPUs, and GPU-based machines are also easier to upgrade by adding more cards or buying newer ones.<\/p>\n<p>Moreover, in a world where we push multi-physics simulations ever farther, Rocky GPU and multi-GPU processing enables you to free-up all your CPUs for coupled simulations, avoiding hardware competition.<\/p>\n<div id=\"RockyGPU2\"><\/div>\n<h3 content_id=\"section1\" content_id=\"section1\"  id=\"PERFORMANCE-BENCHMARK\">Performance Benchmark<\/h3>\n<p>To better illustrate the gains in processing speed that are possible for common applications, a performance benchmark of a rotating drum (Figure 1) was developed. Multiple runs using different criteria were evaluated as explained below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-183441\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-300x188.png\" alt=\"\" width=\"648\" height=\"406\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-300x188.png 300w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-1024x641.png 1024w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-768x481.png 768w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-24x15.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-36x23.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1-48x30.png 48w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture1-1.png 1108w\" sizes=\"auto, (max-width: 648px) 100vw, 648px\" \/><\/p>\n<p><em>Figure 1 \u2013 Rotating drum benchmark case.<\/em><\/p>\n<p><strong>Criteria 1: Particle shape<\/strong><\/p>\n<p>Two different 
particle shapes were evaluated at the same equivalent size (Figure 2):<\/p>\n<ul>\n<li>Spheres<\/li>\n<li>Polyhedrons (shaped from 16 triangles)<\/li>\n<\/ul>\n<p>Drum geometry was lengthened as the number of particles increased to keep the material cross-section consistent across the various runs.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-183442\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-300x163.png\" alt=\"\" width=\"609\" height=\"331\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-300x163.png 300w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-1024x557.png 1024w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-768x418.png 768w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-24x13.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-36x20.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2-48x26.png 48w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/Picture2.png 1112w\" sizes=\"auto, (max-width: 609px) 100vw, 609px\" \/><\/p>\n<p><em>Figure 2 \u2013 Sphere (left) and 16-triangle polyhedron (right) particle shapes used in the benchmark case.<\/em><\/p>\n<p><strong>Criteria 2: Processing type<\/strong><\/p>\n<p>Four different processing combinations were evaluated:<\/p>\n<ul>\n<li>CPU: Intel(R) Xeon(R) Gold 6542Y @ 2.90 GHz on 48 cores<\/li>\n<li>1 GPU: NVIDIA H100, NVIDIA A100, NVIDIA L40<\/li>\n<li>2 GPUs: NVIDIA H100, NVIDIA A100, NVIDIA L40<\/li>\n<li>4 GPUs: NVIDIA L40<\/li>\n<\/ul>\n<p><strong>Criteria 3: Performance measurement<\/strong><\/p>\n<p>Two measurements were taken at steady state to evaluate 
performance:<\/p>\n<ul>\n<li><strong>Simulation Pace (speed-up)<\/strong>, which is the amount of hardware processing time (duration) required to advance the simulation one second. In general, a lower simulation pace indicates faster processing. The speed-up metric is computed using the CPU pace as the reference.<\/li>\n<li><strong>GPU Memory Usage<\/strong>, which is the amount of memory being used on the GPU while processing the simulation. In general, a lower memory usage allows for more particles to be processed, and\/or more calculations to be performed.<\/li>\n<\/ul>\n<p><strong>\u00a0<\/strong><\/p>\n<h3 content_id=\"section1\" content_id=\"section1\"  id=\"BENCHMARK-RESULTS-FOR-ANSYS-ROCKY-2026-R1\">Benchmark results for Ansys Rocky 2026 R1<\/h3>\n<h4 class=\"attachment\" style=\"background-color: #fedb8d;padding: 1%;font-weight: 300;height: auto!important;display: flex;border-radius: 15px;margin-bottom: 20px\"><b><a href=\"https:\/\/www.ansys.com\/content\/dam\/it-solutions\/platform-support\/2026-r1\/ansys-2026-r1-gpu-compute-capabilities.pdf\">GPUs Tested in Ansys Rocky 2026 R1<\/a><br \/>\n<\/b><\/h4>\n<div id=\"RockyGPU4\"><\/div>\n<p><strong>Relevant conclusions on simulation performance<\/strong><\/p>\n<p>The following plots, from Figure 3 to Figure 6, show the performance gains for spheres and polyhedrons for different numbers of particles, using different models and numbers of GPUs.<\/p>\n<ul>\n<li>Rocky\u2019s latest CPU simulation enhancements significantly reduced execution times when running on standard processors. Compared to previous releases, the GPU speed-up is lower, but still substantial.<\/li>\n<li>The performance gain from running on a GPU or on multiple GPUs is significant. A multi-GPU simulation can reach the same result up to 41 times faster than a 48-core CPU. 
Using one GPU<\/li>\n<li>Excellent scalability and efficiency were observed for all GPU cards tested.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-198423 size-full alignnone\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1.svg\" alt=\"\" width=\"778\" height=\"371\" \/><\/a><\/p>\n<p><em>Figure 3 \u2013 GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved using 16 million spheres.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_2.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198424 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_2.svg\" alt=\"\" width=\"778\" height=\"371\" \/><\/a><\/p>\n<p><em>Figure 4 \u2013 GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved using 32 million spheres.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_3.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198425 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_3.svg\" alt=\"\" width=\"778\" height=\"371\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><em>Figure 5 \u2013 <\/em><em>GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved using 16 million polyhedrons.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_4.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198426 size-full\" 
src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_4.svg\" alt=\"\" width=\"791\" height=\"373\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><em>Figure 6 \u2013 <\/em><em>GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved using 32 million polyhedrons.<\/em><\/p>\n<div id=\"RockyGPU5\"><\/div>\n<p><strong>Relevant conclusions on GPU memory consumption<\/strong><\/p>\n<p>Figure 7 to Figure 10 show the GPU memory usage for spheres and polyhedrons for different numbers of particles, using different models and numbers of GPUs.<\/p>\n<ul>\n<li>For the case with 16 million sphere particles, total memory consumption is approximately between 23GB and 26GB. For 16 million polyhedrons, the GPU memory used ranges from 43GB to 46GB.<\/li>\n<li>For the simulation with 32 million sphere particles, the GPU memory usage is approximately 47GB to 50GB. For 32 million polyhedrons, the total memory consumption is up to 90GB.<\/li>\n<li>Ansys Rocky can simulate 32 million polyhedrons using 90GB of GPU memory; in other words, the same 90GB could be used to simulate about 185 million sphere particles.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_5.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198427 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_5.svg\" alt=\"\" width=\"803\" height=\"374\" \/><\/a><\/p>\n<p><em>Figure 7 \u2013 <\/em><em>Total GPU memory consumption using 16 million spheres.<\/em><\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_6.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198428 size-full\" 
src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_6.svg\" alt=\"\" width=\"803\" height=\"374\" \/><\/a><\/p>\n<p><em>Figure 8 \u2013 <\/em><em>Total GPU memory consumption using 32 million spheres.<\/em><\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_7.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198429 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_7.svg\" alt=\"\" width=\"803\" height=\"374\" \/><\/a><\/p>\n<p><em>Figure 9 \u2013 Total GPU memory consumption using 16 million polyhedrons.<\/em><\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_8.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198430 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_8.svg\" alt=\"\" width=\"803\" height=\"374\" \/><\/a><\/p>\n<p><em>Figure 10 \u2013Total GPU memory consumption using 32 million polyhedrons.<\/em><\/p>\n<p>&nbsp;<\/p>\n<div id=\"RockyGPU6\">\n<h3  id=\"ROCKY-GPU-PERFORMANCE-BENCHMARK-DEM-CASE-DISTRIBUTED-PARALLEL-COMPUTING\"><strong>Rocky GPU Performance Benchmark \u2013 DEM case (Distributed parallel computing)<\/strong><\/h3>\n<\/div>\n<h4 class=\"attachment\" style=\"background-color: #fedb8d;padding: 1%;font-weight: 300;height: auto!important;display: flex;border-radius: 15px;margin-bottom: 20px\" class=\"attachment\" style=\"background-color: #fedb8d;padding: 1%;font-weight: 300;height: auto!important;display: flex;border-radius: 15px;margin-bottom: 20px\"  
id=\"IMPORTANT-NOTE-THAT-BETA-FEATURES-HAVE-NOT-BEEN-FULLY-TESTED-AND-VALIDATED-ANSYS-INC-MAKES-NO-COMMITMENT-TO-RESOLVE-DEFECTS-REPORTED-AGAINST-THESE-PROTOTYPE-FEATURES-HOWEVER-YOUR-FEED\">Important:\u00a0\u00a0Note that beta features have not been fully tested and validated. Ansys, Inc. makes no commitment to resolve defects reported against these prototype features. However, your feedback will help us improve the overall quality of the product. We will not guarantee that the projects using this beta feature will run successfully when the feature is finally released so you may, therefore, need to modify the projects.<\/h4>\n<p>Since the Rocky 2026 R1 release, it is possible to run Rocky on multi-node clusters. Currently, this feature has a series of limitations. The two most significant concern the particle shapes that can be simulated and the platforms that are supported. First, only sphere particles are available. Second, multi-node GPU runs are only possible on Linux distributions.<\/p>\n<p><strong>Relevant conclusions on simulation performance<\/strong><\/p>\n<p>Figure 11 and Figure 12 show the speed-up for 16 million and 32 million sphere particles.<strong> The results for 4x GPUs and for 6x GPUs were run on two and three nodes, respectively. <\/strong><\/p>\n<ul>\n<li>For the A100 card, scalability and efficiency are excellent. For the H100, results are good, and performance can be improved by simulating a problem with more particles, so that the solver uses more GPU memory and memory bandwidth. In other words, for a 16 million particle simulation on 4 H100 cards, the CUDA cores spend too much time exchanging information over the memory bus.<\/li>\n<li>The speed-up for 32 million sphere particles shows an improvement in H100 efficiency. As stated in the previous item, more particles mean more GPU memory and more memory bandwidth in use. 
As a consequence, when running with 4 H100 cards the memory bus is less of a bottleneck, the CUDA cores spend less time communicating, and efficiency improves.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_9.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198432 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_9.svg\" alt=\"\" width=\"778\" height=\"371\" \/><\/a><\/p>\n<p><em>Figure 11 \u2013 GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved using 16 million spheres.<\/em><\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_10.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198433 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_10.svg\" alt=\"\" width=\"777\" height=\"370\" \/><\/a><\/p>\n<p><em>Figure 12 \u2013 GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved using 32 million spheres.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Relevant conclusions on GPU memory consumption<\/strong><\/p>\n<p>Figure 13 and Figure 14 show the memory consumption for the cases with 16 and 32 million sphere particles.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198434 size-large\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-1024x477.png\" alt=\"\" width=\"640\" height=\"298\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-1024x477.png 1024w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-300x140.png 300w, 
https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-768x358.png 768w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-1536x716.png 1536w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-24x11.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-36x17.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1-48x22.png 48w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage1.png 1823w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<p><em>Figure 13 &#8211; Total GPU memory consumption using 16 million spheres.<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198435 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3.png\" alt=\"\" width=\"1027\" height=\"478\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3.png 1027w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3-300x140.png 300w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3-1024x477.png 1024w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3-768x357.png 768w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3-24x11.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3-36x17.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/gpuimage3-48x22.png 48w\" sizes=\"auto, (max-width: 1027px) 100vw, 1027px\" \/><\/p>\n<p lang=\"EN-US\" 
xml:lang=\"EN-US\"><em>Figure 14 &#8211; <span data-teams=\"true\">Total GPU memory consumption using 32 million spheres.<\/span><\/em><\/p>\n<div id=\"RockyGPU7\">\n<h3  id=\"ROCKY-GPU-PERFORMANCE-BENCHMARK-CFD-DEM-CASE\"><strong>Rocky GPU Performance Benchmark \u2013 CFD-DEM case<\/strong><\/h3>\n<\/div>\n<p>Coupling CFD (Computational Fluid Dynamics) with DEM (Discrete Element Method) is fundamental for computing the discrete phase of systems that mix fluid and granular particles.<\/p>\n<p>For many industrial applications, a CFD-only simulation is physically insufficient because it often neglects particle-particle interactions. Consequently, this approach is generally restricted to dilute flows. By integrating DEM, we can account for collisions and enduring contacts, allowing for the study of dense-phase flows. In this guide we will explore just the 1-way approach.<\/p>\n<p><strong>The benefits of GPU<\/strong><\/p>\n<p>Simulating an engineering device by coupling CFD and DEM is one of the most computationally expensive tasks. Many physical problems are infeasible or extremely expensive to solve on CPUs.<\/p>\n<p>Traditionally, solving these systems on CPUs was either cost-prohibitive or too slow for industrial design cycles. Nowadays, thanks to GPU advancements, complex coupled simulations with CFD-DEM solvers are becoming feasible. Consequently, simulation engineers can tackle relevant industrial problems using GPUs.<\/p>\n<p><strong>Performance Benchmark<\/strong><\/p>\n<p>To analyze the performance of the CFD-DEM coupling, a cyclone separator problem was developed (Figure 15). As stated in the previous section, the coupling evaluated is a 1-way coupling. 
Here, the Ansys Rocky solver plays a more significant role in overall simulation performance.<\/p>\n<p>Similar to the DEM benchmark, multiple runs were evaluated to gather information to assess Ansys Rocky.<\/p>\n<p lang=\"EN-US\" xml:lang=\"EN-US\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198436 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/cfdimage1.png\" alt=\"\" width=\"447\" height=\"662\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/cfdimage1.png 447w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/cfdimage1-203x300.png 203w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/cfdimage1-16x24.png 16w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/cfdimage1-24x36.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/cfdimage1-32x48.png 32w\" sizes=\"auto, (max-width: 447px) 100vw, 447px\" \/><\/p>\n<p><em>Figure 15 \u2013 Cyclone benchmark case.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Criteria 1: Particle details<\/strong><\/p>\n<p>In the present problem, the DEM phase is modeled with sphere particles. The discrete phase has about 185 million sphere particles. 
Particle sizes follow the Particle Size Distribution (PSD) below:<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"119\"><strong>Diameter (d)<\/strong><\/td>\n<td width=\"139\"><strong>Cumulative (%)<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"119\">5.0&#215;10<sup>-5<\/sup>m<\/td>\n<td width=\"139\">100%<\/td>\n<\/tr>\n<tr>\n<td width=\"119\">4.1&#215;10<sup>-5<\/sup>m<\/td>\n<td width=\"139\">97.5%<\/td>\n<\/tr>\n<tr>\n<td width=\"119\">3.2&#215;10<sup>-5<\/sup>m<\/td>\n<td width=\"139\">91.75%<\/td>\n<\/tr>\n<tr>\n<td width=\"119\">2.3&#215;10<sup>-5<\/sup>m<\/td>\n<td width=\"139\">77.52%<\/td>\n<\/tr>\n<tr>\n<td width=\"119\">1.4&#215;10<sup>-5<\/sup>m<\/td>\n<td width=\"139\">44.48%<\/td>\n<\/tr>\n<tr>\n<td width=\"119\">5.0&#215;10<sup>-6<\/sup>m<\/td>\n<td width=\"139\">30%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Criteria 2: Processing type<\/strong><\/p>\n<p>Four different processing combinations were evaluated:<\/p>\n<ul>\n<li>CPU: Intel(R) Xeon(R) Gold 6542Y CPU @ 2.90 GHz on 48 cores<\/li>\n<li>1 GPU: NVIDIA H100, NVIDIA A100<\/li>\n<li>2 GPUs: NVIDIA H100, NVIDIA A100<\/li>\n<li>4 GPUs: NVIDIA L40<\/li>\n<\/ul>\n<p><strong>Criteria 3: Performance measurement<\/strong><\/p>\n<p>Two measurements were taken at steady state to evaluate performance:<\/p>\n<ul>\n<li><strong>Simulation Pace (speed-up)<\/strong>, which is the amount of hardware processing time (duration) required to advance the simulation 0.2 seconds. In general, a lower simulation pace indicates faster processing. The speed-up metric is computed using the CPU pace as the reference.<\/li>\n<li><strong>GPU Memory Usage<\/strong>, which is the amount of memory being used on the GPU while processing the simulation. 
In general, a lower memory usage allows for more particles to be processed, and\/or more calculations to be performed.<\/li>\n<\/ul>\n<p><strong>Benchmark results for Ansys Rocky 2026 R1 \u2013 CFD-DEM Case<\/strong><\/p>\n<p><strong>Relevant conclusions on simulation performance<\/strong><\/p>\n<p>Figure 16 shows the speed-up for the cyclone separator simulation using different combinations of GPU models and numbers.<\/p>\n<ul>\n<li>The results show a substantial performance gain on GPU. A single-GPU run can achieve a speed-up of about 18 times over a CPU run with 48 cores.<\/li>\n<li>With multiple GPUs, the same result can be achieved 31 times faster than with 48 CPU cores.<\/li>\n<li>The results show a great speed-up and good scalability and efficiency for a CFD-DEM simulation. Note that this CFD-DEM simulation is processed with information from a solver external to Rocky (Ansys Fluent). Therefore, in this case data from Fluent needs to be passed to Rocky, which diminishes the scalability and speed-up of Rocky.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_11.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198438 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_11.svg\" alt=\"\" width=\"778\" height=\"371\" \/><\/a><\/p>\n<p><em>Figure 16 \u2013 GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) for CFD-DEM simulation.<\/em><\/p>\n<p><strong>Relevant conclusions on GPU memory consumption<\/strong><\/p>\n<p>The cyclone separator analysis has 185 million sphere particles. Here, the sphere particles follow a PSD, as explained before. Figure 17 shows the GPU memory usage for the present case.<\/p>\n<ul>\n<li>The case uses data from Fluent to couple with Rocky. 
However, the memory usage shown here mostly concerns Rocky.<\/li>\n<li>As you can see, the memory usage for multi-GPU (around 92 GB) is 17% higher than for a single GPU (around 77 GB).<\/li>\n<\/ul>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_12.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198439 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_12.svg\" alt=\"\" width=\"804\" height=\"374\" \/><\/a><\/p>\n<p><em>Figure 17 \u2013 Total GPU memory consumption for the CFD-DEM case.<\/em><\/p>\n<p>&nbsp;<\/p>\n<div id=\"RockyGPU8\">\n<h3  id=\"ROCKY-GPU-PERFORMANCE-BENCHMARK-SPH-CASE\"><strong>Rocky GPU Performance Benchmark \u2013 SPH case<\/strong><\/h3>\n<\/div>\n<p>Smoothed Particle Hydrodynamics (SPH) is a meshless, Lagrangian computational method used to simulate the dynamics of continuum media, such as liquids and gases. Unlike traditional grid-based (Eulerian) methods that look at fluid passing through a fixed point, SPH follows the individual &#8220;elements&#8221; of the fluid as they move through space.<\/p>\n<p>It is well-suited to evaluating free surface flows, such as fluid sloshing, dam breaks, tire aquaplaning and other similar phenomena. Its Lagrangian nature allows tracking fluid interfaces without complex pre-processing tasks or other techniques such as mesh-deformation algorithms.<\/p>\n<p><strong>The benefits of GPU<\/strong><\/p>\n<p>In any SPH simulation the fluid is discretized into millions of elements. On CPU hardware, computations are processed by a limited number of high-performance cores, far fewer than the parallel units a GPU provides. In other words, a GPU can perform thousands of mathematical operations concurrently, whereas a CPU handles a few hundred. 
Consequently, the computation time tends to be significantly reduced, allowing the SPH solver to scale more efficiently on GPU hardware due to the SPH algorithm&#8217;s inherently parallel nature.<\/p>\n<p>Memory bandwidth plays an important role in SPH simulations. The SPH elements move and gather information from their neighbors at every time step. This leads to frequent, irregular memory access patterns. CPUs can reduce memory wait times using caches (L1, L2 or L3), but they lack the memory bandwidth to handle millions of SPH elements efficiently. GPUs, on the other hand, are designed for heavy workloads and offer massive memory bandwidth. For example, comparing the Intel Xeon Gold 6542Y and the NVIDIA H100, the memory bandwidth of the GPU is about 9 times that of the CPU, and this ratio can be even higher. This superior data-transfer capability makes GPUs the natural choice for large-scale SPH simulations.<\/p>\n<p><strong>Performance Benchmark<\/strong><\/p>\n<p><strong>Criteria 1: SPH elements<\/strong><\/p>\n<p>To assess the performance of a Rocky SPH simulation, we use a vehicle driving through a water puddle (car wading). 
The fluid is modeled using 16 million SPH elements, and the geometries have a total of about 32 million triangles.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198440 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/sphimage1.png\" alt=\"\" width=\"619\" height=\"324\" srcset=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/sphimage1.png 619w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/sphimage1-300x157.png 300w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/sphimage1-24x13.png 24w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/sphimage1-36x19.png 36w, https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/sphimage1-48x25.png 48w\" sizes=\"auto, (max-width: 619px) 100vw, 619px\" \/><\/p>\n<p><em>Figure 18 \u2013 SPH car wading simulation.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Criteria 2: Processing type<\/strong><\/p>\n<p>Three different processing combinations were evaluated:<\/p>\n<ul>\n<li>CPU: Intel(R) Xeon(R) Gold 6542Y CPU @ 2.90 GHz on 48 cores<\/li>\n<li>1 GPU: NVIDIA H100, NVIDIA A100, NVIDIA L40<\/li>\n<li>2 GPUs: NVIDIA H100, NVIDIA A100, NVIDIA L40<\/li>\n<\/ul>\n<p><strong>Criteria 3: Performance measurement<\/strong><\/p>\n<p>Two measurements were taken at steady state to evaluate performance:<\/p>\n<ul>\n<li><strong>Simulation Pace (speed-up)<\/strong>, which is the amount of hardware processing time (duration) required to advance the simulation two seconds. In general, a lower simulation pace indicates faster processing. The speed-up metric is computed using the CPU pace as the reference.<\/li>\n<li><strong>GPU Memory Usage<\/strong>, which is the amount of memory being used on the GPU while processing the simulation. 
In general, a lower memory usage allows for more particles to be processed, and\/or more calculations to be performed.<\/li>\n<\/ul>\n<p><strong>Benchmark results for Ansys Rocky 2026 R1 \u2013 SPH Case<\/strong><\/p>\n<p><strong>Relevant conclusions on simulation performance<\/strong><\/p>\n<p>Figure 19 shows the performance speed-up for the IISPH solver in Ansys Rocky.<\/p>\n<ul>\n<li>The results show a large performance gain from running an SPH simulation on a GPU. In the worst-case scenario, one GPU reduces the simulation pace by a factor of approximately 23 compared with a run on 48 CPU cores.<\/li>\n<li>The multi-GPU results also highlight the benefits of running SPH on GPUs: with two GPUs, you can achieve a time reduction of about 65 times compared with 48 CPU cores.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_13.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198441 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_13.svg\" alt=\"\" width=\"778\" height=\"371\" \/><\/a><\/p>\n<p><em>Figure 19 &#8211; GPU speed-up based upon Simulation Pace (compared with CPU 48x cores) achieved for the SPH simulation.<\/em><\/p>\n<p><strong>Relevant conclusions on GPU memory consumption<\/strong><\/p>\n<p>Figure 20 shows the GPU memory usage for the SPH simulation within Ansys Rocky.<\/p>\n<ul>\n<li>An SPH simulation with 16 million SPH elements can be run on just one GPU. Theoretically, a case with around 35 million SPH elements could run on an 80 GB card, such as the NVIDIA H100 NVL and the NVIDIA A100 80GB PCIe.<\/li>\n<li>The maximum GPU memory consumption is about 27 GB on two GPUs. 
This result indicates that the SPH solver inside Rocky can be employed to analyze many complex problems.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_14.svg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-198442 size-full\" src=\"https:\/\/innovationspace.ansys.com\/knowledge\/wp-content\/uploads\/sites\/4\/2024\/07\/rocky_gpu_26r1_14.svg\" alt=\"\" width=\"804\" height=\"374\" \/><\/a><\/p>\n<p><em>Figure 20 \u2013 Total GPU memory consumption for the SPH simulation.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p class=\"Paragraph SCXW234597759 BCX0\" lang=\"EN-US\" style=\"text-align: right\" xml:lang=\"EN-US\"><span class=\"EOP SCXW234597759 BCX0\" data-ccp-props=\"{\">Ansys Rocky\u2122 particle dynamics simulation software<\/span><\/p>\n<p class=\"Paragraph SCXW234597759 BCX0\" lang=\"EN-US\" style=\"text-align: right\" xml:lang=\"EN-US\"><span class=\"EOP SCXW234597759 BCX0\" data-ccp-props=\"{\">Learn more about Rocky software in the <a href=\"https:\/\/innovationspace.ansys.com\/ais-rocky\/\" rel=\"nofollow external noopener noreferrer\" data-wpel-link=\"external\">Ansys Rocky\u00a0 Innovation 
Space<\/a>.<\/span><\/p>\n","protected":false},"template":"","class_list":["post-182748","topic","type-topic","status-publish","hentry"],"aioseo_notices":[],"acf":[],"custom_fields":[{"0":{"_edit_lock":["1777996519:17114"],"_edit_last":["17114"],"application_name":[""],"_application_name":["field_64a80903c8e15"],"filter_by_optics_product":["Lumerical"],"_filter_by_optics_product":["field_64fb192ba3121"],"family":["Fluids"],"_family":["field_64a809229a857"],"siebel_km_number":[""],"_siebel_km_number":["field_63ecbffce60db"],"salesforce_km_number":[""],"_salesforce_km_number":["field_63ecc018e60dc"],"km_published_date":[""],"_km_published_date":["field_64c77704499dd"],"product_version":[""],"_product_version":["field_64c776cb4fd2e"],"_bbp_forum_id":["180025"],"_bbp_topic_id":["198472"],"_bbp_author_ip":["192.104.24.225"],"_bbp_last_reply_id":["0"],"_bbp_last_active_id":["182751"],"_bbp_last_active_time":["2024-07-11 14:03:34"],"_bbp_reply_count":["0"],"_bbp_reply_count_hidden":["0"],"_bbp_voice_count":["0"],"_yoast_wpseo_content_score":["90"],"_yoast_wpseo_estimated-reading-time-minutes":["14"],"_btv_view_count":["27613"],"_yoast_wpseo_wordproof_timestamp":[""],"_aioseo_title":[null],"_aioseo_description":[null],"_aioseo_keywords":["a:0:{}"],"_aioseo_og_title":[null],"_aioseo_og_description":[null],"_aioseo_og_article_section":[""],"_aioseo_og_article_tags":["a:0:{}"],"_aioseo_twitter_title":[null],"_aioseo_twitter_description":[null],"_bbp_likes_count":["2"]},"test":"articlesansys-com"}],"_links":{"self":[{"href":"https:\/\/innovationspace.ansys.com\/knowledge\/wp-json\/wp\/v2\/topics\/182748","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/innovationspace.ansys.com\/knowledge\/wp-json\/wp\/v2\/topics"}],"about":[{"href":"https:\/\/innovationspace.ansys.com\/knowledge\/wp-json\/wp\/v2\/types\/topic"}],"version-history":[{"count":81,"href":"https:\/\/innovationspace.ansys.com\/knowledge\/wp-json\/wp\/v2\/topics\/182748\/revisions"}],"predecessor-ver
sion":[{"id":198472,"href":"https:\/\/innovationspace.ansys.com\/knowledge\/wp-json\/wp\/v2\/topics\/182748\/revisions\/198472"}],"wp:attachment":[{"href":"https:\/\/innovationspace.ansys.com\/knowledge\/wp-json\/wp\/v2\/media?parent=182748"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}