Selecting the Best Cloud GPU Service as of January 2024

2024-01-16
3 min read

Summary:

  • Vast.ai is the best on-demand service for individual and academic use, provided the user is comfortable working with cloud servers.
  • Google Colab provides a user-friendly web interface (essentially a Jupyter notebook) with a consistently available V100 card at around $0.5/hour.
  • Reserved servers have a high barrier to entry, particularly in pricing and minimum scale.
  • If feasible, purchasing the hardware outright is more cost-effective in the long run; in that case, the RTX 4090 is the best choice.

Cloud GPU Services

The table below lists the pricing for various cloud GPU services. Please note that the availability of some on-demand services is limited, as indicated by the (*) symbol in the table.

| Service | H100 | A100 40GB | 4090 | A6000 | V100 |
| --- | --- | --- | --- | --- | --- |
| Vast.ai (on-demand, hourly) | | $0.8 - 1.0 | $0.4 - 0.6 | $0.4 - 0.6 | $0.2 - 0.3 |
| Vast.ai (reserved, monthly) | | $0.7 - 0.8 | $0.4 - 0.5 | $0.4 - 0.5 | $0.2 - 0.3 |
| Lambda Labs (on-demand, hourly) | $2.5 (*) | $1.3 (*) | | $0.8 (*) | |
| Lambda Labs (1-year contract) | $2.3 (min 64 GPUs) | | | | |
| Lambda Labs (3-year contract) | $1.9 (min 64 GPUs) | | | | |
| Google Colab (on-demand, hourly) | | $1.3 (*) | | | $0.5 |
| Google Cloud (on-demand, hourly) | | | | | $2.5 |
| Microsoft Azure (on-demand, hourly) | | $27.2 for 8 | | | $3.1 |
| Amazon AWS (on-demand, hourly) | $98.3 for 8 | $32.8 for 8 | | | $3.1 |


Comparison of Key Features

The user experience differs significantly among Lambda Labs, Vast.ai, and Google Colab:

| Feature | Lambda Labs | Vast.ai | Google Colab |
| --- | --- | --- | --- |
| Ease of Use | User-friendly | Less straightforward | Very user-friendly |
| Usability | Pre-configured for AI | SSH / Jupyter access | Intuitive web interface |
| Availability | Hard to get any GPU | Diverse GPU options | V100 and T4 only |
| Cost | Affordable | Very affordable | Affordable |
| Stability | High (cloud standard) | Relatively lower | High, but with time limits |

GPU Specifications

The performance of GPUs can differ based on specific workloads. For a general comparison in deep learning tasks, refer to the table below:

| GPU | FP16 (half) | FP32 (single) |
| --- | --- | --- |
| H100 80GB PCIe | 3.3 | 5.3 |
| A100 40GB | 2.3 | 3.5 |
| RTX 4090 | 2.0 | 2.9 |
| RTX A6000 48GB | 1.5 | 2.1 |
| V100 16GB | 1 | 1 |

The numbers represent the speed of each card relative to a V100 16GB (V100 = 1). See the Deep Learning GPU Benchmarks from Lambda Labs.

Surprisingly, the RTX 4090 performs nearly as well as an A100 40GB card. The much higher cost of the V100, A100, and H100 reflects their optimization for data-center use rather than raw single-GPU speed.
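Combining the two tables gives a rough cost-per-performance estimate: hourly price divided by relative FP16 speed. The midpoint prices below are illustrative assumptions taken from the Vast.ai on-demand ranges above; actual marketplace prices fluctuate.

```python
# Rough cost-per-throughput: hourly price / relative FP16 speed
# (V100 16GB = 1.0). Prices are assumed midpoints of the Vast.ai
# on-demand ranges in the pricing table; treat them as illustrative.
relative_fp16 = {"A100 40GB": 2.3, "RTX 4090": 2.0,
                 "RTX A6000": 1.5, "V100 16GB": 1.0}
hourly_price = {"A100 40GB": 0.90, "RTX 4090": 0.50,
                "RTX A6000": 0.50, "V100 16GB": 0.25}

for gpu, speed in relative_fp16.items():
    cost_per_speed = hourly_price[gpu] / speed
    print(f"{gpu}: ${cost_per_speed:.2f} per V100-equivalent hour")
```

By this crude metric the RTX 4090 matches the V100 ($0.25 per V100-equivalent hour) and undercuts the A100, which is why it dominates the price/performance discussion below.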

For a comprehensive discussion, refer to Tim Dettmers’ article: The Best GPUs for Deep Learning - An In-Depth Analysis.

Tim Dettmers’ decision tree for selecting the right GPU for deep learning tasks.

Other Remarks

TPU: Google Colab provides TPU services at $2/hour.

Direct Machine Purchase: When possible, buying and assembling a machine can be an excellent choice.
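A back-of-the-envelope break-even calculation shows why. The figures below are assumptions: roughly $1,600 for an RTX 4090 (launch MSRP), $1,000 for the rest of the machine, and the $0.5/hour cloud rate from the table above; electricity and resale value are ignored.

```python
# Buy-vs-rent break-even for an RTX 4090 machine.
# Assumed: ~$1,600 card (launch MSRP) + ~$1,000 for the rest of the
# build, rented at $0.5/GPU-hour (Vast.ai on-demand range above).
# Electricity and resale value are ignored.
machine_cost = 1600 + 1000      # dollars, total build cost
cloud_rate = 0.5                # dollars per GPU-hour
break_even_hours = machine_cost / cloud_rate
print(f"Break-even after {break_even_hours:,.0f} GPU-hours "
      f"(~{break_even_hours / 24:.0f} days of continuous use)")
```

Under these assumptions the machine pays for itself after about 5,200 GPU-hours, roughly seven months of continuous use, so sustained workloads favor ownership.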

High GPU Memory Device: For training tasks that require substantial GPU memory, the Mac Studio with the M2 Ultra chip (configurable with up to 192 GB of unified memory) is the most cost-effective choice.