GPUs You Might Consider for Molecular Dynamics
Below we list the GPUs you might consider for MD simulation workloads in AMBER, GROMACS, and NAMD. We'll go over performance, cost, and the limitations and considerations for selecting the right GPU. The prices listed are conservative estimates; you may find these cards cheaper (or more expensive) depending on where you look.
GPU Model | FP32 Perf. | CUDA Cores | Price (est.) | $ per TFLOPS | $ per Core |
---|---|---|---|---|---|
NVIDIA H100 | 51.20 TFLOPS | 14,592 | $35,000 | $683.59 | $2.40 |
NVIDIA L40S | 91.61 TFLOPS | 18,176 | $8,500 | $92.78 | $0.47 |
RTX 6000 Ada | 91.06 TFLOPS | 18,176 | $7,500 | $82.36 | $0.41 |
RTX 4090 | 82.58 TFLOPS | 16,384 | $2,000 | $24.22 | $0.12 |
RTX 5000 Ada | 65.28 TFLOPS | 12,800 | $4,500 | $68.93 | $0.35 |
RTX 4500 Ada | 39.63 TFLOPS | 7,680 | $3,000 | $75.70 | $0.39 |
FP32 (single precision) is the standard floating-point format used by most molecular dynamics suites for the bulk of the computation in an MD simulation. We take each GPU's FP32 throughput and CUDA core count into consideration because these hard numbers are good indicators of the GPU's parallel performance when running simulations. Read to the end for our top pick.
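The price-efficiency columns in the table are simple ratios. Here is a short Python sketch that reproduces them for a few of the cards; the specs are public, and the prices are the same rough estimates used above, not quotes:

```python
# Price-efficiency metrics for MD GPUs.
# name: (fp32_tflops, cuda_cores, estimated_price_usd)
gpus = {
    "NVIDIA H100":  (51.20, 14_592, 35_000),
    "RTX 6000 Ada": (91.06, 18_176, 7_500),
    "RTX 4090":     (82.58, 16_384, 2_000),
    "RTX 5000 Ada": (65.28, 12_800, 4_500),
}

for name, (tflops, cores, price) in gpus.items():
    # Lower is better for both ratios: dollars per TFLOPS, dollars per core.
    print(f"{name:14s}  ${price / tflops:7.2f}/TFLOPS  ${price / cores:.2f}/core")
```

Running this makes the gap obvious: the RTX 4090 sits around $24/TFLOPS while the H100 is over $680/TFLOPS.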
NVIDIA H100 For Molecular Dynamics?
The short answer is no: the NVIDIA H100 is not necessary to get the best performance out of your molecular dynamics simulations. The H100 is NVIDIA's top-of-the-line GPU for AI and HPC workloads. It is extremely expensive because, for enterprises that can harness it, it delivers performance far beyond the rest of the product stack. It is a halo product, a hypercar of sorts. But do you need a hypercar for molecular dynamics in AMBER and GROMACS?
A single H100 GPU costs more than a fully built workstation or server. Although marketed for HPC, the H100 is not the optimal GPU for running molecular dynamics simulations, just as a hypercar is not the best car for carving a canyon. Despite being the flagship, the H100 does not even have the best FP32 performance in the product stack.
The H100 is purpose-built for AI, with acceleration for FP16 and mixed FP8 precision. AI training and inference can be sped up with these lower-precision floating-point formats, since training speed matters more than a slight loss in accuracy. Some accuracy can be traded away for faster performance and responsiveness, especially in AI models you interact with.
However, if your calculations demand the extra accuracy of FP64 (double precision), note that of the GPUs listed, only the H100 has native FP64 acceleration; the others execute double precision at a small fraction of their FP32 rate. Even so, we probably still wouldn't pick the H100 for MD workloads alone.
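To see what the FP32/FP64 precision gap actually means, here is a small pure-Python sketch. Python floats are FP64, so the example emulates FP32 by round-tripping values through `struct`; the increment and loop count are arbitrary illustrative numbers, not anything from a real MD engine:

```python
import struct

def fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# FP64 carries ~15-16 significant decimal digits, FP32 only ~7.
coord = 123.4567890123456
print("FP64:", repr(coord))
print("FP32:", repr(fp32(coord)))

# Naively accumulating a tiny increment a million times drifts
# noticeably in FP32 but stays essentially exact in FP64.
inc32 = fp32(0.0001)
acc64 = acc32 = 0.0
for _ in range(1_000_000):
    acc64 += 0.0001
    acc32 = fp32(acc32 + inc32)
print("FP64 sum:", acc64)  # essentially 100.0
print("FP32 sum:", acc32)  # noticeably off from 100.0
```

This is why MD suites do the bulk of the arithmetic in FP32 for speed but use higher-precision or fixed-point accumulation where sums like this matter.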
RTX 4090 for Molecular Dynamics?
Looking at our table, the RTX 4090 has the lowest cost per unit of FP32 throughput of any GPU listed, and the lowest price per core as well. The RTX 4090 is a high-performance consumer GPU that is great for MD simulation! It is a powerhouse in gaming and productivity workloads, and many industries use the RTX 4090 as their workstation graphics card of choice for its lower cost and superb performance.
The RTX 4090's drawback, however, is scalability. It is hard to deploy a multi-GPU configuration with RTX 4090s without major modifications like water cooling, a custom chassis, or PCIe risers. A full-tower workstation should fit two RTX 4090s, which would let you run two simulations in parallel. The RTX 4090 is a great choice for individual researchers who store their data locally on the workstation, but other GPUs scale further.
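Running one independent simulation per GPU usually comes down to pinning each job to a device with the `CUDA_VISIBLE_DEVICES` environment variable. Here is a minimal Python sketch of that pattern; the `gmx mdrun` command line and file name are placeholders, so substitute your real AMBER/GROMACS/NAMD invocation:

```python
import os

def launch_plan(n_gpus: int, base_cmd: list) -> list:
    """Build one launch spec per GPU, each seeing exactly one device."""
    plan = []
    for gpu in range(n_gpus):
        # Copy the current environment, restricting CUDA to a single GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        plan.append({"gpu": gpu, "cmd": base_cmd, "env": env})
    return plan

# Hypothetical GROMACS run; launch each spec with
# subprocess.Popen(spec["cmd"], env=spec["env"]).
specs = launch_plan(2, ["gmx", "mdrun", "-deffnm", "run"])
for spec in specs:
    print("GPU", spec["gpu"], "->", spec["env"]["CUDA_VISIBLE_DEVICES"])
```

Each simulation then sees only "its" GPU as device 0, so two (or eight) runs can proceed side by side without contending for the same card.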
Best GPU Performance & Scalability - RTX 6000 Ada & RTX 5000 Ada
If you want the fastest throughput with no compromises, we suggest the RTX 6000 Ada. Yes, the L40S is technically faster, but only by a hair that won't necessarily show up in real-world performance. The RTX 6000 Ada is also more flexible, able to be slotted into a workstation or a server, whereas the L40S is passively cooled and only works in a server chassis that provides its airflow.
However, our top pick is the RTX 5000 Ada. The price is lower, the performance is comparable, and 32GB of VRAM per GPU is plenty. While the RTX 5000 Ada is not quite as powerful as the RTX 4090 or the RTX 6000 Ada, the option to deploy your GPUs in a server greatly improves flexibility and scalability. With multiple GPUs, you can run a different simulation on each one.
Everything said about the RTX 5000 Ada also applies to the RTX 6000 Ada, except that the RTX 6000 Ada is pricier and delivers the fastest throughput. If your research is predicated on how quickly computations complete, the RTX 6000 Ada will get you to the finish line fastest.
Conclusions
Figuring out hardware can get complicated, which is why we post articles detailing the hardware choices and considerations for configuring your next dream system.
The RTX 5000 Ada is the best bang-for-your-buck GPU for MD simulations, with great price-to-performance. You can slot four GPUs into a workstation or eight into a server. If you're ever unsure about your computing requirements, contact our team with your workload, budget, and what you're looking for, and we can help.