
Choosing the Best GPU for Deep Learning in 2020

State-of-the-art (SOTA) deep learning models have massive memory footprints. Many GPUs don’t have enough VRAM to train them. In this post, we determine which GPUs can train state-of-the-art networks without throwing memory errors. We also benchmark each GPU’s training performance.

TLDR:

The following GPUs can train all SOTA models as of February 2020:

  • RTX 8000: 48 GB of memory, $5,500 MSRP.
  • RTX 6000: 24 GB of memory, $4,000 MSRP.
  • Titan RTX: 24 GB of memory, $2,500 MSRP.

The following GPUs can train most (but not all) SOTA models:

  • RTX 2080 Ti: 11 GB of memory, $1,150 MSRP. *
  • GTX 1080 Ti: 11 GB of memory, $800 refurbished. *
  • RTX 2080: 8 GB of memory, $720 MSRP. *
  • RTX 2070: 8 GB of memory, $500 MSRP. *

The following GPU is not a good fit for training SOTA models:

  • RTX 2060: 6 GB of memory, $359 MSRP.

* Training on these GPUs requires small batch sizes, so expect somewhat lower model accuracy: small batches give a poorer approximation of the model's loss landscape.
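
The exact benchmark harness isn't reproduced here, but the memory results below come down to increasing the batch size until CUDA reports an out-of-memory error. Below is a minimal PyTorch sketch of such a probe; the resnet50 model, input resolution, and doubling search are illustrative assumptions, not the actual benchmark code.

```python
import torch
import torchvision

def fits_in_memory(model, batch_size, image_size=224, device="cuda"):
    """Attempt one forward/backward pass at the given batch size."""
    try:
        model = model.to(device)
        x = torch.randn(batch_size, 3, image_size, image_size, device=device)
        y = torch.randint(0, 1000, (batch_size,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        return True
    except RuntimeError as e:  # CUDA OOM surfaces as a RuntimeError
        if "out of memory" in str(e):
            return False
        raise
    finally:
        torch.cuda.empty_cache()

def max_batch_size(model_fn, start=1, limit=1024):
    """Double the batch size until the first OOM; return the last size that fit."""
    best, bs = 0, start
    while bs <= limit and fits_in_memory(model_fn(), bs):
        best, bs = bs, bs * 2
    return best

if __name__ == "__main__":
    # Hypothetical example model; swap in whichever SOTA network you care about.
    print("max batch size:", max_batch_size(torchvision.models.resnet50))
```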

Image models

Maximum batch size before running out of memory.
*The GPU does not have enough memory to run the model.

Language models

Maximum batch size before running out of memory.
*The GPU does not have enough memory to run the model.

Results normalized by Quadro RTX 8000

Figure 2. Training throughput normalized against the Quadro RTX 8000. Left: image models. Right: language models.
  • Language models benefit more from larger GPU memory than image models. Note how the right chart is steeper than the left: language models are more memory-bound, while image models are more compute-bound.
  • GPUs with more VRAM deliver higher training throughput because larger batch sizes help saturate the CUDA cores.
  • GPUs with more VRAM enable proportionally larger batch sizes. A back-of-the-envelope calculation holds up: GPUs with 24 GB of VRAM can fit roughly 3x larger batches than GPUs with 8 GB of VRAM (see the sketch after this list).
  • Language models are disproportionately memory intensive for long sequences because attention memory grows quadratically with sequence length.
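
A rough worked example of the two memory effects above (the VRAM-to-batch-size ratio and the quadratic attention cost). The fixed overhead, per-sample activation cost, fp16 storage, and head count below are illustrative assumptions, not measurements from the benchmark:

```python
def max_batch_size(vram_gb, fixed_overhead_gb, per_sample_gb):
    # Whatever VRAM is left after weights/optimizer state, divided by the
    # activation memory one sample needs. All inputs are rough estimates.
    return int((vram_gb - fixed_overhead_gb) / per_sample_gb)

def attention_scores_gb(batch, heads, seq_len, bytes_per_elem=2):
    # Memory for the raw attention score matrices alone:
    # batch x heads x seq_len x seq_len values (fp16 = 2 bytes each).
    # Doubling seq_len quadruples this term.
    return batch * heads * seq_len ** 2 * bytes_per_elem / 1024 ** 3

# Assumed numbers: 2 GB of weights/optimizer state, 0.25 GB of activations per sample.
print(max_batch_size(24, 2.0, 0.25))  # 88 on a 24 GB card
print(max_batch_size(8, 2.0, 0.25))   # 24 on an 8 GB card -> roughly a 3-4x gap

# Attention scores for a 16-head model in fp16 grow quadratically with sequence length.
print(attention_scores_gb(batch=8, heads=16, seq_len=512))   # 0.0625 GB
print(attention_scores_gb(batch=8, heads=16, seq_len=4096))  # 4.0 GB
```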

GPU recommendations

  • RTX 2060 (6 GB): if you want to explore deep learning in your spare time.
  • RTX 2070 or 2080 (8 GB): if you are serious about deep learning, but your GPU budget is $600-800. Eight GB of VRAM can fit the majority of models.
  • RTX 2080 Ti (11 GB): if you are serious about deep learning and your GPU budget is ~$1,200. The RTX 2080 Ti is ~40% faster than the RTX 2080.
  • Titan RTX and Quadro RTX 6000 (24 GB): if you are working on SOTA models extensively but don't have the budget for the future-proofing available with the RTX 8000.
  • Quadro RTX 8000 (48 GB): if you are investing in the future and might even be lucky enough to research SOTA deep learning in 2020.

Lambda offers GPU laptops and workstations with configurations ranging from a single RTX 2070 up to four Quadro RTX 8000s. We also offer servers supporting up to 10 Quadro RTX 8000s or 16 Tesla V100 GPUs.

Source: lambdalabs.com
