
Choosing the Best GPU for Deep Learning in 2020

State-of-the-art (SOTA) deep learning models have massive memory footprints. Many GPUs don’t have enough VRAM to train them. In this post, we determine which GPUs can train state-of-the-art networks without throwing memory errors. We also benchmark each GPU’s training performance.

TLDR:

The following GPUs can train all SOTA models as of February 2020:

  • RTX 8000: 48 GB of memory, $5,500 MSRP.
  • RTX 6000: 24 GB of memory, $4,000 MSRP.
  • Titan RTX: 24 GB of memory, $2,500 MSRP.

The following GPUs can train most (but not all) SOTA models:

  • RTX 2080 Ti: 11 GB of memory, $1,150 MSRP. *
  • GTX 1080 Ti: 11 GB of memory, $800 refurbished. *
  • RTX 2080: 8 GB of memory, $720 MSRP. *
  • RTX 2070: 8 GB of memory, $500 MSRP. *

The following GPU is not a good fit for training SOTA models:

  • RTX 2060: 6 GB of memory, $359 MSRP.

* Training on these GPUs requires small batch sizes, so expect somewhat lower model accuracy: with small batches, the optimizer sees a cruder approximation of the model’s energy landscape.
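
If you do train on one of these memory-limited cards, one common mitigation (not part of the original benchmark) is gradient accumulation: run several small forward/backward passes and apply the optimizer step once, so the effective batch size stays large. A minimal PyTorch-style sketch, where model, loader, optimizer, and loss_fn stand in for whatever you already use:

    def train_with_accumulation(model, loader, optimizer, loss_fn, accum_steps=4):
        """Emulate an effective batch of (loader batch size x accum_steps) on a small GPU."""
        model.train()
        optimizer.zero_grad()
        for step, (inputs, targets) in enumerate(loader):
            inputs, targets = inputs.cuda(), targets.cuda()
            loss = loss_fn(model(inputs), targets) / accum_steps  # scale so the update
            loss.backward()                                       # matches one big batch
            if (step + 1) % accum_steps == 0:
                optimizer.step()        # apply the accumulated gradient
                optimizer.zero_grad()

This trades extra wall-clock time for the larger effective batch; it does not remove the memory ceiling on activations for a single sample.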

Image models

[Table omitted: maximum batch size before running out of memory, per image model and GPU. An asterisk marks models the GPU does not have enough memory to run.]

Language models

[Table omitted: maximum batch size before running out of memory, per language model and GPU. An asterisk marks models the GPU does not have enough memory to run.]
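
The exact batch-size numbers in these tables depend on the framework, input resolution, and sequence length, but the same kind of limit can be measured with a simple probe: keep doubling the batch size until the GPU raises an out-of-memory error. A rough PyTorch sketch (build_model and input_shape are placeholder arguments, not the benchmark’s actual code):

    import torch

    def max_batch_size(build_model, input_shape, start=1, limit=1024):
        """Largest power-of-two batch size that survives a forward+backward pass."""
        best, batch = 0, start
        while batch <= limit:
            try:
                model = build_model().cuda()
                x = torch.randn(batch, *input_shape, device="cuda")
                model(x).sum().backward()            # memory usually peaks here
                best, batch = batch, batch * 2
            except RuntimeError as err:              # PyTorch raises RuntimeError on OOM
                if "out of memory" not in str(err):
                    raise
                break
            finally:
                torch.cuda.empty_cache()             # release cached blocks between tries
        return best

Called with, say, a torchvision ResNet constructor and a (3, 224, 224) input shape, this returns a power-of-two estimate comparable to the image-model columns; the model choice and resolution here are assumptions, not the benchmark’s exact setup.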

Figure 2. Training throughput normalized against the Quadro RTX 8000. Left: image models; right: language models.
  • Language models benefit more from extra GPU memory than image models: the right panel of Figure 2 is steeper than the left, which suggests language models are more memory-bound while image models are more compute-bound.
  • GPUs with more VRAM achieve higher throughput because larger batch sizes keep the CUDA cores saturated.
  • GPUs with more VRAM fit proportionally larger batches. A back-of-the-envelope calculation (sketched after the recommendations below) says a 24 GB card fits roughly 3x the batch of an 8 GB card.
  • Language models are disproportionately memory-intensive at long sequence lengths because attention memory grows quadratically with the sequence length.

GPU recommendations

  • RTX 2060 (6 GB): if you want to explore deep learning in your spare time.
  • RTX 2070 or 2080 (8 GB): if you are serious about deep learning but your GPU budget is $600-800. Eight GB of VRAM fits the majority of models.
  • RTX 2080 Ti (11 GB): if you are serious about deep learning and your GPU budget is ~$1,200. The RTX 2080 Ti is ~40% faster than the RTX 2080.
  • Titan RTX and Quadro RTX 6000 (24 GB): if you are working on SOTA models extensively but don’t have the budget for the future-proofing of the RTX 8000.
  • Quadro RTX 8000 (48 GB): if you are investing in the future and might even be lucky enough to research SOTA deep learning in 2020.
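
As a rough, worked version of the two memory observations above, the sketch below shows the scaling only; every constant in it is an illustrative assumption, not a measurement from the benchmark:

    # All constants below are illustrative guesses, not benchmark measurements.

    def max_batch(vram_gb, fixed_gb=1.0, per_sample_gb=0.1):
        """Samples that fit after weights + optimizer state take their fixed share."""
        return round((vram_gb - fixed_gb) / per_sample_gb)

    for vram in (8, 24, 48):
        print(f"{vram:>2} GB card -> batch of roughly {max_batch(vram)}")
    # 8 GB -> ~70, 24 GB -> ~230: about the ~3x gap noted above.

    # Self-attention materializes an L x L score matrix per head and per layer,
    # so a language model's activation memory grows quadratically with sequence length L.
    def attention_gb(seq_len, heads=16, layers=24, bytes_per_elem=2):
        return heads * layers * seq_len ** 2 * bytes_per_elem / 1e9

    for seq_len in (128, 512, 2048):
        print(f"sequence length {seq_len:>4}: ~{attention_gb(seq_len):.2f} GB of attention maps per sample")

The takeaway is the shape of the curves: usable batch size grows roughly linearly with spare VRAM, while attention memory grows with the square of the sequence length.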

Lambda offers GPU laptops and workstations with GPU configurations ranging from a single RTX 2070 up to 4 Quadro RTX 8000s. Additionally, we offer servers supporting up to 10 Quadro RTX 8000s or 16 Tesla V100 GPUs.

Source: lambdalabs.com
