
Best GPU Clouds of 2026

Updated · 7 picks · live pricing · affiliate disclosure

BEST OVERALL · 5.7/10 · $2,040/yr more

Together AI

Open-source model inference with 200+ models and pay-per-token pricing plus fine-tuning.

$5 free credits; cancel-anytime

How it stacks up

  • Free $5

    vs Replicate marketplace

  • Tokens $0.10-0.90/1M

    vs Modal serverless

  • H100 $1.49+/hr

    vs Lambda raw GPU

#2 Replicate · 5.6/10 · From $200/mo

#3 Modal · 5.4/10 · From $30/mo

All picks at a glance

| # | Pick | Best for | Starting | Score |
|---|------|----------|----------|-------|
| 1 | Together AI | Best open-source model inference with 200+ models and pay-per-token | $200.00/mo | 5.7/10 |
| 2 | Replicate | Best model marketplace with Cog deployment framework | $200.00/mo | 5.6/10 |
| 3 | Modal | Best serverless GPU functions with auto-scaling cold-start | $30.00/mo | 5.4/10 |
| 4 | RunPod | Best community GPU cloud with cheap H100 community spot tier | $5,000.00/mo | 5.2/10 |
| 5 | Lambda Labs | Best overall GPU cloud, mainstream on-demand leader | $25,000.00/mo | 3.6/10 |
| 6 | CoreWeave | Best Kubernetes-native enterprise GPU cloud with NVLink and InfiniBand | $100,000.00/mo | 3.4/10 |
| 7 | Vast.ai | Best cheapest decentralized GPU marketplace for budget-conscious researchers | — | 3.2/10 |

Quick pick by use case

If you only have thirty seconds, find your situation below and skip to that pick.

Compare all 7 picks

| # | Pick | Score | Monthly | Annual | $/yr more | Free tier / top spec |
|---|------|-------|---------|--------|-----------|----------------------|
| 1 | Together AI | 5.7/10 | $200.00/mo | — | $2,040/yr | Free $5 |
| 2 | Replicate | 5.6/10 | $2,500.00/mo | $30,000.00/yr | $29,640/yr | Free credits |
| 3 | Modal | 5.4/10 | $250.00/mo | — | $2,640/yr | Free $30 credits |
| 4 | RunPod | 5.2/10 | $5,000.00/mo | $60,000.00/yr | $59,640/yr | Free serverless |
| 5 | Lambda Labs | 3.6/10 | $50,000.00/mo | $600,000.00/yr | $599,640/yr | A100 $1.29/hr |
| 6 | CoreWeave | 3.4/10 | $250,000.00/mo | $3,000,000.00/yr | $2,999,640/yr | A100 $2.39/hr |
| 7 | Vast.ai | 3.2/10 | — | — | — | RTX 4090 $0.18/hr |
#1

Together AI

5.7/10 · $2,040/yr more

Best open-source model inference with 200+ models and pay-per-token

Open-source model inference with 200+ models and pay-per-token pricing plus fine-tuning.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Free | Free | $5 free credits with 200+ open-source models and inference API plus fine-tuning. |
| Pay-as-you-go | Free | $0.10-$0.90 per 1M tokens by model plus GPU instances $1.49+/hr H100. |
| Pro | $200.00/mo | $200 monthly plus usage with priority queue and custom dedicated instances. |
| Enterprise | $5,000.00/mo | Reserved H100 plus Together Cluster with SOC 2 and HIPAA available. |

Together AI is the open-source model inference platform for teams running Llama, Mixtral, Qwen, and other open-source models without managing GPU infrastructure. Founded in 2022 in San Francisco, Together AI positions around the inference-as-a-service shape with 200+ open-source models plus pay-per-token pricing.

Four tiers serve four buyer profiles. Free ships $5 credits with 200+ models plus inference API plus fine-tuning. Pay-as-you-go ships at $0.10-$0.90 per 1M tokens plus H100 instances at $1.49+/hr. Pro ships at $200 monthly plus usage with priority queue. Enterprise ships custom with reserved H100 plus SOC 2 plus HIPAA available.

The load-bearing wedge is the open-source model inference shape. Where Lambda, CoreWeave, and Modal target raw GPU compute, Together AI targets the use case where teams want to run Llama 3 or Mixtral without provisioning GPUs. The pay-per-token pricing makes inference economics simpler than per-hour GPU costs. The catch is the model-specific pricing variance. For teams running open-source models without infrastructure, Together AI Pay-as-you-go covers the use case better than Lambda raw GPU.
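The pay-per-token math above is easy to sanity-check. A minimal sketch, assuming the $0.10-$0.90 per 1M token range quoted above; the function is illustrative, not an official Together AI calculator.

```python
# Rough monthly cost for pay-per-token inference at the rates quoted above.
# Illustrative sketch only; verify current rates on the vendor site.

def token_cost(monthly_tokens: int, price_per_million: float) -> float:
    """Dollars per month for a given token volume at a per-1M-token rate."""
    return monthly_tokens / 1_000_000 * price_per_million

# 100M tokens/month at each end of the quoted $0.10-$0.90/1M range:
low = token_cost(100_000_000, 0.10)   # $10/mo
high = token_cost(100_000_000, 0.90)  # $90/mo
```

At these volumes, per-token billing stays far below even one always-on GPU instance, which is the core of the inference-as-a-service pitch.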

Pros

  • 200+ open-source models available via inference API
  • Pay-per-token pricing simpler than per-hour GPU economics
  • Custom fine-tuning managed
  • H100 instances at $1.49+/hr competitive
  • SOC 2 plus HIPAA on Enterprise

Cons

  • Per-token pricing variance by model complicates budgeting
  • For raw GPU compute, dedicated platforms cover better at high scale
Free $5 · Tokens $0.10-0.90/1M · H100 $1.49+/hr · $5 free credits; cancel-anytime

Best for: Teams running open-source models without managing GPU infrastructure. Free $5 credits; Pay-as-you-go from $0/mo; Pro at $200/mo for priority.

Compliance & residency: 8
GPU availability: 9
Setup complexity: 10
Value: 9
Support: 8
#2

Replicate

5.6/10 · $29,640/yr more

Best model marketplace with Cog deployment framework

Model marketplace plus Cog deployment framework for one-click model deployment.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Free | Free | Free trial credits with pay-per-second model API and public marketplace. |
| Pay-as-you-go | Free | $0.000725 per second on A100 with no monthly minimum and Cog framework. |
| Team | $200.00/mo | $200 monthly plus usage with private models and higher rate limits. |
| Enterprise | $2,500.00/mo | Custom contract with dedicated GPU instances, SOC 2, and audit logs. |

Replicate is the model marketplace plus deployment platform for teams running open-source models without infrastructure overhead. Founded in 2019 in San Francisco and backed by Andreessen Horowitz, Replicate positions around the marketplace shape where developers run models from a public catalog or deploy custom models via Cog framework.

Four tiers serve four buyer profiles. Free ships trial credits with pay-per-second model API. Pay-as-you-go ships at $0.000725 per second on A100 with no monthly minimum plus Cog deployment. Team at $200 monthly with private models. Enterprise ships custom with dedicated GPU instances plus SOC 2.

The load-bearing wedge is the marketplace plus Cog framework. Where Lambda and CoreWeave target raw GPU compute, Replicate targets the model-as-a-service shape where teams deploy and run models without managing GPU infrastructure. The Cog framework simplifies model packaging. The catch is the per-second pricing variance. Heavy usage on Replicate can exceed dedicated GPU costs; for steady production, Lambda or CoreWeave reserved cover better. For teams running open-source models without infrastructure overhead, Replicate Pay-as-you-go is the cheapest entry path.
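The per-second pricing variance has a concrete crossover. A hedged sketch using only rates quoted in this guide ($0.000725/sec Replicate A100, $1.29/hr Lambda A100); the breakeven is illustrative, not vendor guidance.

```python
# Convert Replicate's per-second A100 rate to an hourly equivalent, then find
# the busy-fraction above which a dedicated hourly A100 becomes cheaper.

PER_SECOND = 0.000725     # Replicate A100, quoted above
DEDICATED_HOURLY = 1.29   # Lambda on-demand A100, quoted later in this guide

hourly_equivalent = PER_SECOND * 3600                         # ≈ $2.61/hr fully busy
breakeven_utilization = DEDICATED_HOURLY / hourly_equivalent  # ≈ 0.49
```

Below roughly half-time utilization, per-second billing wins; above it, the "heavy usage exceeds dedicated GPU costs" caveat kicks in.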

Pros

  • Public model marketplace with thousands of models
  • Cog deployment framework simplifies custom models
  • Pay-per-second pricing eliminates idle cost
  • Free trial credits for evaluation
  • SOC 2 plus audit logs on Enterprise

Cons

  • Heavy usage exceeds dedicated GPU cost economics
  • No persistent instance option for steady workloads
Free credits · A100 $0.000725/sec · Team $200/mo · Free trial credits; cancel-anytime

Best for: Teams running open-source models without infrastructure overhead. Free credits; Pay-as-you-go from $0/mo; Team at $200/mo for private models.

Compliance & residency: 8
GPU availability: 9
Setup complexity: 10
Value: 8
Support: 7
#3

Modal

5.4/10 · $2,640/yr more

Best serverless GPU functions with auto-scaling cold-start

Serverless GPU functions with auto-scaling cold-start and $30 monthly free credits.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Free | Free | $30 monthly free credits with serverless GPU functions for evaluation. |
| Starter | $30.00/mo | $30 monthly with $1.10/hr A10G and $2.78/hr A100 80GB plus auto-scaling. |
| Team | $250.00/mo | $250 monthly plus usage with higher concurrency limits and Slack support. |
| Enterprise | $2,000.00/mo | Custom contract with reserved GPU instances and dedicated CSM. |

Modal is the serverless GPU functions platform for developer-friendly auto-scaling workloads. Founded in 2021 in San Francisco by ex-Spotify engineer Erik Bernhardsson, Modal positions around the serverless GPU model where functions scale from zero to thousands of GPUs based on incoming traffic.

Four tiers serve four buyer profiles. The Free tier ships $30 monthly free credits with serverless GPU functions plus pay-as-you-go after credits. The Starter tier ships at $30 monthly with $1.10/hr A10G plus $2.78/hr A100 80GB plus auto-scaling cold-start. The Team tier ships at $250 monthly plus usage with higher concurrency limits plus Slack support. The Enterprise tier ships custom contract with reserved GPU instances plus dedicated CSM.

The load-bearing wedge is the serverless plus dev-friendly shape. Where Lambda, CoreWeave, and RunPod target persistent GPU instances, Modal targets variable bursty workloads where GPUs scale to zero between requests. For inference workloads with variable traffic (RAG, image generation APIs, batch processing), Modal eliminates idle GPU cost entirely. The catch is the cold-start latency. First request to an idle endpoint adds 5-30 seconds; for latency-sensitive workloads, persistent instances (Lambda, RunPod) cover better. For variable bursty inference workloads, Modal Starter at $30/mo plus per-second usage is the cheapest path.
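The scale-to-zero economics can be sketched numerically. Assumptions for illustration: Modal's quoted $2.78/hr A100 billed per second, a $1.29/hr always-on instance, and a 730-hour month.

```python
# Serverless (billed only while busy) vs an always-on hourly instance.
# Rates are the A100 prices quoted in this guide; a sketch, not vendor math.

HOURS_PER_MONTH = 730

def serverless_monthly(busy_fraction: float, hourly_rate: float = 2.78) -> float:
    """Cost when the GPU bills only for the seconds it is actually busy."""
    return busy_fraction * HOURS_PER_MONTH * hourly_rate

def always_on_monthly(hourly_rate: float = 1.29) -> float:
    """Cost of a persistent instance that bills every hour, busy or idle."""
    return HOURS_PER_MONTH * hourly_rate

# At 10% utilization: ~$203/mo serverless vs ~$942/mo always-on.
```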

Pros

  • Serverless GPU functions scale to zero between requests
  • Free $30 monthly credits for evaluation
  • A100 80GB at $2.78/hr competitive with hourly platforms
  • Auto-scaling cold-start handles variable traffic
  • Developer-friendly Python SDK

Cons

  • Cold-start adds 5-30s on first request to idle endpoint
  • Persistent workloads cost more than hourly platforms
Free $30 credits · Starter $30/mo · A100 $2.78/hr · $30 monthly free credits permanent

Best for: Developers running variable bursty GPU workloads. Free $30 credits; Starter $30/mo plus usage; Team for higher concurrency.

Compliance & residency: 8
GPU availability: 9
Setup complexity: 10
Value: 8
Support: 8
#4

RunPod

5.2/10 · $59,640/yr more

Best community GPU cloud with cheap H100 community spot tier

Community GPU cloud with Secure Cloud A100 at $1.89/hr plus cheap Community Cloud H100 spot.

| Plan | Monthly | Annual | What you get |
|------|---------|--------|--------------|
| Community Free | Free | — | Free tier credits with serverless endpoints and community templates. |
| Secure Cloud A100 | Free | — | $1.89/hr A100 80GB with persistent volumes and public network IP option. |
| Community Cloud H100 | Free | — | $1.99/hr H100 80GB community spot tier; less reliable but cheapest H100. |
| Reserved | $5,000.00/mo | $60,000.00/yr | Multi-month reservation with ~30% off on-demand and dedicated GPU pool. |

RunPod is the community GPU cloud platform for teams wanting cheap H100 access plus serverless endpoints. Founded in 2022, RunPod positions between Lambda's mainstream pricing and Vast.ai's bid-based marketplace with a two-tier offering: Secure Cloud (datacenter-grade) plus Community Cloud (lower-cost spot).

Four tiers serve four buyer profiles. The Community Free tier ships free credits with serverless endpoints plus community templates. The Secure Cloud A100 tier ships at $1.89/hr A100 80GB with persistent volumes plus public network IP option. The Community Cloud H100 tier ships at $1.99/hr H100 80GB community spot tier; less reliable than Secure but the cheapest H100 available. The Reserved tier ships custom multi-month with ~30 percent off on-demand.

The load-bearing wedge is the cheap H100 community spot tier. Where Lambda H100 is $2.49/hr and CoreWeave H100 is $3.49/hr, RunPod Community Cloud H100 at $1.99/hr is the cheapest H100 access for fault-tolerant workloads. For teams iterating on LLM fine-tuning or stable diffusion training where checkpoints save work between interruptions, RunPod Community covers the use case at significant cost savings. The catch is the community-tier reliability. For production inference, Secure Cloud A100 at $1.89/hr is competitive with Lambda.
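The spot-tier tradeoff above can be priced out. A hedged sketch: the redo fraction (work lost to interruptions) is an assumption for illustration, not a RunPod figure.

```python
# Effective cost per useful hour on an interruptible tier, when a fraction of
# compute hours is redone after reclaims. The redo fraction is a made-up example.

def effective_spot_cost(spot_hourly: float, redo_fraction: float) -> float:
    """Spot price divided by the fraction of hours that produce kept work."""
    return spot_hourly / (1 - redo_fraction)

# RunPod Community H100 at $1.99/hr, assuming 10% of work is redone:
# ≈ $2.21 per useful hour, still below Lambda on-demand H100 at $2.49/hr.
```

So with checkpointing good enough to keep redone work near 10 percent, the community tier stays cheaper than on-demand.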

Pros

  • Community Cloud H100 at $1.99/hr cheapest H100 in lineup
  • Secure Cloud A100 at $1.89/hr competitive with Lambda
  • Free serverless endpoint tier
  • Reserved 30% off on-demand for steady production
  • Persistent volumes plus public network IP option

Cons

  • Community Cloud less reliable than Secure
  • No SOC 2 compliance; not for regulated workloads
Free serverless · Secure A100 $1.89/hr · Comm H100 $1.99/hr · Community Free permanent; cancel-anytime

Best for: Teams iterating on LLM fine-tuning with checkpoints. Community Free for testing; Secure A100 at $1.89/hr; Community H100 at $1.99/hr.

Compliance & residency: 7
GPU availability: 8
Setup complexity: 8
Value: 9
Support: 7
#5

Lambda Labs

3.6/10 · $599,640/yr more

Best overall GPU cloud, mainstream on-demand leader

Largest mainstream GPU cloud with on-demand A100 at $1.29/hr and the deepest brand recognition.

| Plan | Monthly | Annual | What you get |
|------|---------|--------|--------------|
| On-Demand 1x A100 | Free | — | $1.29/hr A100 40GB on-demand with pay-per-minute billing and storage. |
| On-Demand H100 | Free | — | $2.49/hr H100 80GB SXM5 with cloud and 1-Click Clusters available. |
| 1-Click Cluster | $50,000.00/mo | $600,000.00/yr | Custom contract for 16-1024 GPU clusters with InfiniBand networking. |
| Reserved | $25,000.00/mo | $300,000.00/yr | Multi-year reservation with up to 50% discount and dedicated GPU pool. |

Lambda Labs is the default GPU cloud for most paid ML teams. Founded in 2012 in San Francisco by brothers Stephen and Michael Balaban, Lambda serves the largest mainstream GPU cloud market with the deepest brand recognition for ML training plus inference.

Four tiers serve four buyer profiles. The On-Demand 1x A100 tier ships at $1.29/hr A100 40GB with pay-per-minute billing plus persistent storage. The On-Demand H100 tier ships at $2.49/hr H100 80GB SXM5 with cloud and 1-Click Clusters. The 1-Click Cluster tier ships custom contract for 16-1024 GPU clusters with InfiniBand networking. The Reserved tier ships custom multi-year reservation with up to 50 percent discount versus on-demand.

The load-bearing wedge is mainstream brand recognition plus the historic on-demand baseline. Lambda set the $1.29/hr A100 baseline that competitors (CoreWeave, RunPod) anchor against. The catch is the no-free-tier model. Where Modal, Replicate, and Vast.ai offer free credits or trials, Lambda requires immediate billing. For mainstream production ML teams wanting reliable on-demand A100 plus H100 with reserved discounts at scale, Lambda Labs covers the use case with lower entry friction than CoreWeave and higher reliability than RunPod.

Pros

  • Largest mainstream brand for ML GPU cloud
  • On-Demand A100 at $1.29/hr historic baseline
  • On-Demand H100 at $2.49/hr competitive
  • Pay-per-minute billing for variable use
  • Reserved discounts up to 50 percent for steady production

Cons

  • No free tier; immediate billing required
  • Limited serverless offering compared with Modal
A100 $1.29/hr · H100 $2.49/hr · Reserved -50% · No free tier; pay-per-minute billing

Best for: Mainstream ML teams wanting reliable on-demand A100 plus H100. On-Demand A100 at $1.29/hr; H100 at $2.49/hr; Reserved for production.

Compliance & residency: 9
GPU availability: 9
Setup complexity: 9
Value: 9
Support: 8
#6

CoreWeave

3.4/10 · $2,999,640/yr more

Best Kubernetes-native enterprise GPU cloud with NVLink and InfiniBand

Kubernetes-native enterprise GPU with NVLink and InfiniBand for large-scale training.

| Plan | Monthly | Annual | What you get |
|------|---------|--------|--------------|
| On-Demand A100 | Free | — | $2.39/hr A100 80GB SXM4 with Kubernetes-native object storage and networking. |
| On-Demand H100 | Free | — | $3.49/hr H100 80GB SXM5 with NVLink, InfiniBand, and bare-metal options. |
| Reserved 1-year | $100,000.00/mo | $1,200,000.00/yr | 25-40% off on-demand with reserved A100/H100 pool and dedicated CSM. |
| Enterprise | $250,000.00/mo | $3,000,000.00/yr | Multi-year reserved clusters with NVLink, on-prem hybrid, and private cloud. |

CoreWeave is the Kubernetes-native enterprise GPU cloud for large-scale ML training. Founded in 2017 in New Jersey and IPO'd in March 2025 (the largest tech IPO of 2025), CoreWeave serves the enterprise GPU market with deep Kubernetes integration plus NVLink plus InfiniBand for multi-node distributed training.

Four tiers serve four buyer profiles. The On-Demand A100 tier ships at ~$2.39/hr A100 80GB SXM4 with Kubernetes-native object storage plus networking. The On-Demand H100 tier ships at ~$3.49/hr H100 80GB SXM5 with NVLink plus InfiniBand plus bare-metal options. The Reserved 1-year tier ships ~25-40 percent off on-demand with reserved A100/H100 pool. The Enterprise tier ships custom multi-year with reserved clusters plus on-prem hybrid.

The load-bearing wedge is the Kubernetes-native plus NVLink shape. Where Lambda targets single-node on-demand and Modal targets serverless, CoreWeave targets multi-node distributed training where InfiniBand networking is load-bearing for gradient synchronization at scale. For training large language models or computer vision at scale, CoreWeave plus reserved capacity covers the use case better than other cloud GPU providers. The catch is the institutional pricing. On-demand is more expensive than Lambda; reserved requires multi-year commitment. For enterprise ML teams running multi-node distributed training, CoreWeave is the historic gold standard.
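Why InfiniBand is load-bearing for gradient synchronization can be shown with a back-of-envelope. Assumptions for illustration: fp16 gradients, the roughly 2x-gradient-bytes per-GPU traffic of a ring all-reduce at large GPU counts, and illustrative link speeds.

```python
# Per-GPU traffic and time for one gradient all-reduce step.
# Ring all-reduce moves about 2*(n-1)/n ≈ 2x the gradient bytes per GPU.

def allreduce_seconds(params: float, grad_bytes: int, link_gb_per_s: float) -> float:
    traffic = 2 * params * grad_bytes        # bytes moved per GPU per step
    return traffic / (link_gb_per_s * 1e9)

# 70B fp16 gradients: ~22s over 12.5 GB/s Ethernet vs ~5.6s over 50 GB/s InfiniBand.
eth = allreduce_seconds(70e9, 2, 12.5)
ib = allreduce_seconds(70e9, 2, 50.0)
```

In practice overlap with compute hides some of this cost, but the 4x gap is why multi-node training buyers pay CoreWeave's premium.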

Pros

  • Kubernetes-native deep integration
  • NVLink plus InfiniBand for multi-node distributed training
  • Reserved 1-year tier 25-40 percent off on-demand
  • Bare-metal options for performance-critical workloads
  • On-prem hybrid plus private cloud on Enterprise

Cons

  • On-demand more expensive than Lambda Labs
  • Reserved requires multi-year commitment
A100 $2.39/hr · H100 $3.49/hr · Reserved -25% · No free tier; institutional contract for Reserved

Best for: Enterprise ML teams running multi-node distributed training with NVLink. On-Demand A100 at $2.39/hr; H100 at $3.49/hr; Reserved 25-40% off.

Compliance & residency: 9
GPU availability: 10
Setup complexity: 7
Value: 7
Support: 9
#7

Vast.ai

3.2/10

Best cheapest decentralized GPU marketplace for budget-conscious researchers

Decentralized GPU marketplace with bid-based pricing; cheapest GPU access in lineup.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Interruptible RTX 4090 | Free | $0.18/hr RTX 4090 on the decentralized GPU marketplace; cheapest in lineup. |
| On-Demand A100 | Free | $0.79/hr A100 with bid-based marketplace pricing and persistent volumes. |
| Reserved H100 | Free | $1.65/hr H100 with datacenter-tier hosts and verified-host filtering. |

Vast.ai is the decentralized GPU marketplace for budget-conscious researchers and fault-tolerant workloads. Founded in 2018 in San Francisco, Vast.ai positions around the marketplace shape where individual hosts (datacenter operators, crypto miners pivoting to AI, individual GPU owners) list capacity at bid-based prices.

Three tiers serve three buyer profiles. The Interruptible RTX 4090 tier ships at ~$0.18/hr with the lowest cost in this lineup; instances may be reclaimed if outbid. The On-Demand A100 tier ships at ~$0.79/hr with bid-based marketplace pricing plus persistent volume option. The Reserved H100 tier ships at ~$1.65/hr with datacenter-tier hosts plus verified-host filtering.

The load-bearing wedge is the cheapest GPU access. Where Lambda, CoreWeave, and Modal price A100 at $1.29-$2.78/hr, Vast.ai prices A100 at ~$0.79/hr (40-70 percent cheaper). For fault-tolerant research workloads (model fine-tuning with checkpoints, hyperparameter sweeps, batch inference), Vast.ai cuts costs dramatically. The catch is the reliability variance. Hosts vary in quality; verified-host filtering helps but does not eliminate risk. For budget-conscious researchers running fault-tolerant workloads, Vast.ai Interruptible at $0.18/hr is the cheapest path; for higher reliability needs, RunPod Secure or Lambda cover better.

Pros

  • Cheapest GPU access in lineup (RTX 4090 at $0.18/hr)
  • A100 at ~$0.79/hr 40-70% cheaper than mainstream
  • Bid-based marketplace pricing competitive across hosts
  • Verified-host filtering for datacenter-tier hosts
  • Persistent volume option for stateful workloads

Cons

  • Reliability varies; interruptible instances may be reclaimed
  • No SOC 2 compliance; not for regulated workloads
RTX 4090 $0.18/hr · A100 $0.79/hr · H100 $1.65/hr · No free tier; pay-as-you-go bid-based

Best for: Budget-conscious researchers running fault-tolerant workloads. Interruptible RTX 4090 at $0.18/hr; On-Demand A100 at $0.79/hr.

Compliance & residency: 5
GPU availability: 7
Setup complexity: 6
Value: 10
Support: 5

How we picked

Each pick gets a transparent composite score from price, features, free-tier availability, and editor fit. Pricing flows from our live database, so when a vendor changes prices the score updates here too.

We weight price 40 percent, features 30, free tier 15, and fit 15. Utilization rate determines cost more than hourly rate; pay-as-you-go and serverless eliminate idle cost. Spot tiers cut costs in half for fault-tolerant workloads. Reserved multi-year contracts save 25-50 percent for steady production.
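The weighted formula reads directly as code. A minimal sketch, assuming each factor is a 0-10 subscore; the mapping from raw pricing data to subscores is not published on this page.

```python
# Composite score with the weights stated above: price 40%, features 30%,
# free tier 15%, fit 15%. Subscores assumed to be on a 0-10 scale.

def composite(price: float, features: float, free_tier: float, fit: float) -> float:
    return 0.40 * price + 0.30 * features + 0.15 * free_tier + 0.15 * fit

composite(10, 10, 10, 10)  # 10.0, a perfect pick
```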

We don't claim "30,000 hours of testing." Our methodology is the formula above plus the editor's published verdict for each pick. Verifiable, auditable, and updated when the underlying data changes.

Why trust Subrupt

We're a subscription tracker first, a buying guide second. Every claim on this page is something you can check.

By use case

Best overall GPU cloud

Lambda Labs

Read the full review →

Best serverless GPU functions

Modal

Read the full review →

Best model marketplace plus deployment

Replicate

Read the full review →

Best Kubernetes-native enterprise GPU

CoreWeave

Read the full review →

Best cheapest decentralized GPU marketplace

Vast.ai

Read the full review →

Didn't make the list

Vast.ai — already in picks (seventh) but worth flagging the cheapest GPU access; RTX 4090 at $0.18/hr and A100 at $0.79/hr cut GPU costs 40-70 percent vs mainstream for fault-tolerant workloads.

RunPod — already in picks (fourth) but worth flagging the cheap H100 community tier; Community Cloud H100 at $1.99/hr is the cheapest H100 in lineup vs Lambda $2.49/hr or CoreWeave $3.49/hr.

Modal — already in picks (third) but worth flagging the serverless model; it eliminates idle GPU cost entirely for variable bursty workloads and is hard to beat for inference APIs with unpredictable traffic.

Together AI — already in picks (first) but worth flagging the model-as-a-service economics; for under 5B monthly tokens on open-source models, pay-per-token is simpler and cheaper than self-hosting.

How to choose your GPU Cloud

Seven product shapes compete for one head term

The 'best GPU cloud' search covers seven shapes:

  • Mainstream GPU cloud (Lambda Labs) targets ML teams wanting reliable on-demand A100 plus H100 with brand recognition.
  • Serverless GPU functions (Modal) targets variable bursty workloads.
  • Model marketplace (Replicate) targets teams running open-source models without infrastructure.
  • Open-source model inference (Together AI) targets teams running Llama and Mixtral via API.
  • Kubernetes-native enterprise (CoreWeave) targets multi-node distributed training.
  • Community GPU cloud (RunPod) targets fault-tolerant training with cheap H100 spot.
  • Decentralized marketplace (Vast.ai) targets budget-conscious researchers.

The honest framework: identify your workload pattern before subscribing. Persistent steady production uses Lambda or CoreWeave Reserved; variable bursty uses Modal serverless; model-as-a-service uses Replicate or Together AI; multi-node training uses CoreWeave; fault-tolerant cheap iteration uses Vast.ai or RunPod Community.

Utilization rate matters more than hourly rate

GPU utilization rate determines total cost more than hourly pricing. An idle GPU at $1.29/hr costs the same as a fully-used one; a $1.29/hr A100 used 25 percent of the time costs $5.16 per useful hour. The honest framework: estimate your monthly utilization rate before picking a vendor. For under 25 percent utilization (variable bursty workloads), serverless or pay-as-you-go (Modal, Replicate, Together AI) eliminates idle cost. For 25-75 percent utilization (steady but not always-on), hourly on-demand (Lambda, CoreWeave) is the realistic baseline. For over 75 percent utilization (steady production), reserved discounts (Lambda 50 percent off, CoreWeave 25-40 percent off) pay back within months. Quarterly utilization audit: track actual GPU-hours versus paid GPU-hours; if utilization dropped below 25 percent, evaluate switching to serverless or pay-as-you-go.
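The audit above is two one-liners. A sketch using the guide's own thresholds; they are a rule of thumb, not a law.

```python
# Cost per hour of *useful* GPU work, plus the utilization-based pricing-tier
# rule of thumb from this section (<25% serverless, 25-75% hourly, >75% reserved).

def cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    return hourly_rate / utilization

def pricing_tier(utilization: float) -> str:
    if utilization < 0.25:
        return "serverless or pay-as-you-go"
    if utilization <= 0.75:
        return "hourly on-demand"
    return "reserved"

# $1.29/hr A100 at 25% utilization → $5.16 per useful hour, as stated above.
```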

Spot tiers: when fault tolerance enables half-price GPU

Spot and interruptible tiers (Vast.ai Interruptible, RunPod Community Cloud) cut GPU costs in half for fault-tolerant workloads. Spot pricing works because providers monetize otherwise-idle capacity that may need to be reclaimed. The honest framework: use spot when (1) workload is fault-tolerant (model training with checkpoints, batch inference with retry, hyperparameter sweeps), (2) you can save state between interruptions, (3) cost savings outweigh the 5-20 percent productivity loss from interruptions. Avoid spot when (1) workload is latency-critical (production inference, real-time training), (2) you cannot save state efficiently, (3) reliability is load-bearing. For LLM fine-tuning with checkpointing, spot tiers reliably save 40-60 percent vs on-demand. RunPod Community H100 at $1.99/hr vs Lambda H100 at $2.49/hr saves 20 percent; Vast.ai A100 at $0.79/hr vs Lambda A100 at $1.29/hr saves 39 percent.
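The quoted savings figures check out arithmetically. A quick sketch; prices are the ones cited in this paragraph.

```python
# Percentage saved by a spot tier relative to on-demand, rounded like the text.

def savings_pct(spot: float, on_demand: float) -> int:
    return round((1 - spot / on_demand) * 100)

savings_pct(1.99, 2.49)  # 20 (RunPod Community H100 vs Lambda H100)
savings_pct(0.79, 1.29)  # 39 (Vast.ai A100 vs Lambda A100)
```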

Serverless vs hourly vs reserved: matching to workload pattern

Pricing model selection matters more than per-hour rate. Serverless GPU functions (Modal) bill by GPU-second with auto-scaling to zero; great for variable bursty workloads, expensive for steady. Pay-as-you-go (Replicate, Together AI) bills by per-second or per-token; great for unknown workloads, surprise risk at scale. Hourly on-demand (Lambda, CoreWeave, RunPod Secure) bills per minute with persistent instances; great for development and steady inference. Reserved (Lambda, CoreWeave) bills monthly or yearly with deep discounts; great for production with known capacity needs. The honest framework: match pricing model to workload. Variable bursty inference uses serverless; development uses hourly on-demand; steady production training uses reserved. Most teams use multiple models simultaneously: hourly for development, reserved for production, serverless for inference APIs.
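Matching billing model to workload is a comparison of three simple cost curves. A hedged sketch with illustrative rates from this guide (Modal $2.78/hr serverless, Lambda $1.29/hr hourly, a Lambda-style 50 percent reserved discount); real quotes will differ, and reserved also requires a multi-month commitment.

```python
# Monthly cost of one workload under the three billing models described above.
# "hourly" assumes the instance stays up all month; all rates are illustrative.

HOURS_PER_MONTH = 730

def monthly_costs(busy_hours: float, serverless_rate: float = 2.78,
                  hourly_rate: float = 1.29, reserved_discount: float = 0.50) -> dict:
    return {
        "serverless": busy_hours * serverless_rate,
        "hourly": HOURS_PER_MONTH * hourly_rate,
        "reserved": HOURS_PER_MONTH * hourly_rate * (1 - reserved_discount),
    }

# 50 busy hours/mo → serverless wins; near-always-on → reserved wins.
```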

Multi-node distributed training: when InfiniBand matters

InfiniBand networking matters when training models large enough to require multi-node distributed training. Single-node training (single A100 or single 8x H100 server) covers most fine-tuning and small-model training. Multi-node distributed training (16+ GPUs across multiple servers) requires high-bandwidth low-latency networking for gradient synchronization; standard Ethernet networking adds significant overhead at scale. The honest framework: InfiniBand matters when (1) model size exceeds single-server GPU memory (weights plus optimizer state over ~640GB on an 8x80GB server, roughly 70B+ dense parameters), (2) you train from scratch rather than fine-tune, (3) training time is critical. CoreWeave NVLink plus InfiniBand and Lambda 1-Click Cluster InfiniBand are the two production options. For most fine-tuning and inference workloads, InfiniBand is overkill; single-node A100 or H100 covers the use case at lower cost.
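The single-server threshold can be estimated from a standard rule of thumb: Adam mixed-precision training costs roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments), before activations. The 16-byte figure is an approximation, not a vendor number.

```python
# Does a dense model's training state exceed one 8x80GB server?
# ~16 bytes/param for Adam mixed precision is a common approximation.

def training_bytes(params: float, bytes_per_param: int = 16) -> float:
    return params * bytes_per_param

SERVER_BYTES = 8 * 80e9  # one 8x H100/A100-80GB server

training_bytes(70e9) > SERVER_BYTES  # True: ~1.12TB vs 640GB, multi-node territory
training_bytes(7e9) > SERVER_BYTES   # False: ~112GB fits on a single server
```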

Open-source model inference: when Together AI beats raw GPU

Open-source model inference platforms (Together AI, Replicate) beat raw GPU cloud for teams running Llama, Mixtral, Qwen via API rather than self-hosting. The math: running Llama 3 70B inference on Together AI at $0.88 per 1M tokens output processes 100M tokens monthly for $88. Self-hosting Llama 3 70B on Lambda H100 at $2.49/hr requires ~24/7 instance ($1800/mo) for moderate throughput. The honest framework: open-source inference platforms pay off when (1) your monthly token volume is under 5B tokens, (2) you do not need custom model modifications, (3) you want pay-per-token pricing simplicity. For higher token volumes or custom models, dedicated GPU instances (Lambda, CoreWeave) cover better. For 5-10B monthly tokens, evaluate Together AI Pay-as-you-go vs reserved GPU instances; the breakeven shifts based on model size and reservation discounts.
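The breakeven above follows from fixed GPU cost divided by the per-token rate. A hedged sketch: the GPU count needed to serve a 70B model is an assumption for illustration (fp16 weights alone are ~140GB, so at least two 80GB cards).

```python
# Monthly token volume (in millions) where self-hosting matches pay-per-token.
# Rates: Lambda H100 $2.49/hr, Together AI $0.88 per 1M output tokens (quoted above).

def breakeven_tokens_millions(gpu_count: int, gpu_hourly: float = 2.49,
                              price_per_million: float = 0.88) -> float:
    monthly_gpu_cost = gpu_count * gpu_hourly * 730
    return monthly_gpu_cost / price_per_million

breakeven_tokens_millions(1)  # ≈ 2066M (~2.1B tokens)
breakeven_tokens_millions(2)  # ≈ 4131M (~4.1B), near the 5B rule of thumb above
```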

Frequently asked questions

Are these prices guaranteed not to change?

Vendor pricing changes regularly. Rates here are what each vendor advertises in May 2026. Lambda Labs A100 at $1.29/hr stable. Modal A100 at $2.78/hr stable. Replicate A100 at $0.000725/sec stable. Together AI tokens at $0.10-$0.90/1M stable. CoreWeave A100 at $2.39/hr stable. RunPod Secure A100 at $1.89/hr stable. Vast.ai bid-based pricing varies; A100 ~$0.79/hr typical. Verify current rates on the vendor site.

Does Subrupt earn a commission from any of these picks?

We track which picks have approved affiliate programs in our database, and the FTC disclosure block at the top of every guide names which ones currently have a click-tracking partnership. Affiliate revenue does not change ranking. The composite math runs against the same weights for every pick regardless of partnership.

Why does Lambda Labs win "best overall" instead of the cheapest pick, Vast.ai?

Lambda wins both the mainstream brand-recognition consensus across TechCrunch, Latent Space, and AI engineering newsletters and the uniquely-true mainstream-GPU-cloud flag in our composite math. Vast.ai is composite-cheapest at $0.18/hr RTX 4090 and wins the cheapest-decentralized wedge, but reliability variance makes it unsuitable for production. The ranked picks order instead leads with Together AI, whose value and free-tier subscores push it to the highest composite (5.7/10).

Should I use spot or on-demand GPU?

Spot for fault-tolerant workloads with checkpoints; on-demand for production. Spot tiers (Vast.ai Interruptible, RunPod Community) cut costs 40-60 percent but instances may be reclaimed. For LLM fine-tuning with checkpointing, batch inference with retry, or hyperparameter sweeps, spot reliably saves money. For latency-critical production inference or training without checkpoints, on-demand is required. Most teams use both: spot for development and experiments; on-demand for production.

When does serverless GPU beat hourly?

When utilization rate is under 25 percent. Modal serverless GPU functions scale to zero between requests; you pay only for GPU-seconds actually used. For variable bursty inference workloads (RAG APIs, image generation, batch processing), serverless eliminates idle GPU cost. For steady production (always-on inference, ongoing training), hourly on-demand or reserved cover better. The cost crossover is around 25 percent utilization; below that, serverless wins; above, hourly wins.

Should I run Llama 3 on Together AI or self-host on Lambda?

Together AI for under 5B monthly tokens; self-host for higher volumes. Math: Llama 3 70B inference at $0.88 per 1M output tokens processes 100M tokens monthly for $88. Self-hosting on Lambda H100 ($2.49/hr) requires ~24/7 instance ($1800/mo) for moderate throughput. Together AI economics break around 5-10B monthly tokens depending on model size. For lower volumes or pay-per-token simplicity, Together AI; for higher volumes or custom model modifications, self-host.

When does CoreWeave beat Lambda for enterprise?

When you need multi-node distributed training with InfiniBand. Lambda 1-Click Cluster ships InfiniBand for 16-1024 GPU clusters; CoreWeave ships Kubernetes-native NVLink plus InfiniBand for multi-node training as the default. For training large models from scratch (roughly 70B+ dense parameters, beyond a single server's memory), CoreWeave Kubernetes-native deployment plus NVLink is the historic gold standard. For single-node training and inference, Lambda is competitive at lower entry friction.

How do I cancel a GPU cloud subscription?

Hourly platforms (Lambda, CoreWeave, RunPod Secure) cancel by stopping instances; persistent storage continues billing until deleted. Pay-as-you-go (Replicate, Modal, Together AI, Vast.ai) cancel by stopping API usage; storage continues billing. Reserved contracts require negotiation through enterprise procurement; many include early termination clauses. Always export training checkpoints and model weights before cancellation; some platforms purge data 30-90 days after cancellation.

When does this guide get updated?

We aim to refresh /best/ guides quarterly when there are no major shifts, and immediately when there are. Major triggers: vendor pricing changes (rates stable through 2025-2026 with H100 generation), new entrants (Crusoe, FluidStack gaining adoption), new GPU SKUs (B200 availability shifts pricing), and major customer migrations. The lastReviewed date at the top reflects the most recent editorial sweep.

Subrupt Editorial

The team behind subrupt.com. We track subscriptions, surface cheaper alternatives, and publish buying guides where the score formula is on the page so you can recompute it yourself. We do not claim 30,000 hours of testing. What we claim is live pricing from our database, a transparent composite score, and honest savings math against a category baseline.


Affiliate disclosure: Subrupt earns a commission when you switch to a service through our recommendation links. This never changes the price you pay. We only recommend services where there's a real cost or feature advantage for you, and our picks are based on the data on this page, not on which programs pay the most.

Related buying guides

Track your subscriptions on Subrupt

Add the GPU Cloud you pay for and see how much you'd save by switching.

Open dashboard

More buying guides

Independent rankings for the subscriptions worth paying for.

See all guides