
Best GPU Clouds of 2026

Updated · 7 picks · live pricing · affiliate disclosure

BEST OVERALL · 5.7/10 · $2,040/yr more

Together AI

Open-source model inference with 200+ models and pay-per-token pricing plus fine-tuning.

$5 free credits; cancel-anytime

How it stacks up

  • Free $5

    vs Replicate marketplace

  • Tokens $0.10-0.90/1M

    vs Modal serverless

  • H100 $1.49+/hr

    vs Lambda raw GPU

#2 Replicate · 5.6/10 · From $200/mo

#3 Modal · 5.4/10 · From $30/mo

All picks at a glance

| # | Pick | Best for | Starting | Score |
|---|------|----------|----------|-------|
| 1 | Together AI | Best open-source model inference with 200+ models and pay-per-token | $200.00/mo | 5.7/10 |
| 2 | Replicate | Best model marketplace with Cog deployment framework | $200.00/mo | 5.6/10 |
| 3 | Modal | Best serverless GPU functions with auto-scaling cold-start | $30.00/mo | 5.4/10 |
| 4 | RunPod | Best community GPU cloud with cheap H100 community spot tier | $5,000.00/mo | 5.2/10 |
| 5 | Lambda Labs | Best overall GPU cloud, mainstream on-demand leader | $25,000.00/mo | 3.6/10 |
| 6 | CoreWeave | Best Kubernetes-native enterprise GPU cloud with NVLink and InfiniBand | $100,000.00/mo | 3.4/10 |
| 7 | Vast.ai | Best cheapest decentralized GPU marketplace for budget-conscious researchers | — | 3.2/10 |

Quick pick by use case

If you only have thirty seconds, find your situation below and skip to that pick.

Compare all 7 picks

| # | Pick | Score | Monthly | Annual | $/yr more | Free tier / top spec |
|---|------|-------|---------|--------|-----------|----------------------|
| 1 | Together AI | 5.7/10 | $200.00/mo | — | $2,040/yr | Free $5 |
| 2 | Replicate | 5.6/10 | $2,500.00/mo | $30,000.00/yr | $29,640/yr | Free credits |
| 3 | Modal | 5.4/10 | $250.00/mo | — | $2,640/yr | Free $30 credits |
| 4 | RunPod | 5.2/10 | $5,000.00/mo | $60,000.00/yr | $59,640/yr | Free serverless |
| 5 | Lambda Labs | 3.6/10 | $50,000.00/mo | $600,000.00/yr | $599,640/yr | A100 $1.29/hr |
| 6 | CoreWeave | 3.4/10 | $250,000.00/mo | $3,000,000.00/yr | $2,999,640/yr | A100 $2.39/hr |
| 7 | Vast.ai | 3.2/10 | — | — | — | RTX 4090 $0.18/hr |
#1

Together AI

5.7/10 · $2,040/yr more

Best open-source model inference with 200+ models and pay-per-token

Open-source model inference with 200+ models and pay-per-token pricing plus fine-tuning.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Free | Free | $5 free credits with 200+ open-source models and inference API plus fine-tuning. |
| Pay-as-you-go | Free | $0.10-$0.90 per 1M tokens by model plus GPU instances $1.49+/hr H100. |
| Pro | $200.00/mo | $200 monthly plus usage with priority queue and custom dedicated instances. |
| Enterprise | $5,000.00/mo | Reserved H100 plus Together Cluster with SOC 2 and HIPAA available. |

Together AI is the open-source model inference platform for teams running Llama, Mixtral, Qwen, and other open-source models without managing GPU infrastructure. Founded in 2022 in San Francisco, Together AI positions around the inference-as-a-service shape with 200+ open-source models plus pay-per-token pricing.

Four tiers serve four buyer profiles. Free ships $5 credits with 200+ models plus inference API plus fine-tuning. Pay-as-you-go ships at $0.10-$0.90 per 1M tokens plus H100 instances at $1.49+/hr. Pro ships at $200 monthly plus usage with priority queue. Enterprise ships custom with reserved H100 plus SOC 2 plus HIPAA available.

The load-bearing wedge is the open-source model inference shape. Where Lambda, CoreWeave, and Modal target raw GPU compute, Together AI targets the use case where teams want to run Llama 3 or Mixtral without provisioning GPUs. The pay-per-token pricing makes inference economics simpler than per-hour GPU costs. The catch is the model-specific pricing variance. For teams running open-source models without infrastructure, Together AI Pay-as-you-go covers the use case better than Lambda raw GPU.
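The pay-per-token math above is easy to sanity-check. A minimal sketch, assuming the $0.10-$0.90 per 1M token range quoted above; the function is illustrative, not an official Together AI calculator.

```python
# Rough monthly cost for pay-per-token inference at the rates quoted above.
# Illustrative sketch only; verify current rates on the vendor site.

def token_cost(monthly_tokens: int, price_per_million: float) -> float:
    """Dollars per month for a given token volume at a per-1M-token rate."""
    return monthly_tokens / 1_000_000 * price_per_million

# 100M tokens/month at each end of the quoted $0.10-$0.90/1M range:
low = token_cost(100_000_000, 0.10)   # $10/mo
high = token_cost(100_000_000, 0.90)  # $90/mo
```

At these volumes, per-token billing stays far below even one always-on GPU instance, which is the core of the inference-as-a-service pitch.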

Pros

  • 200+ open-source models available via inference API
  • Pay-per-token pricing simpler than per-hour GPU economics
  • Custom fine-tuning managed
  • H100 instances at $1.49+/hr competitive
  • SOC 2 plus HIPAA on Enterprise

Cons

  • Per-token pricing variance by model complicates budgeting
  • For raw GPU compute, dedicated platforms cover better at high scale
Free $5 · Tokens $0.10-0.90/1M · H100 $1.49+/hr · $5 free credits; cancel-anytime

Best for: Teams running open-source models without managing GPU infrastructure. Free $5 credits; Pay-as-you-go from $0/mo; Pro at $200/mo for priority.

Compliance & residency: 8
GPU availability: 9
Setup complexity: 10
Value: 9
Support: 8
#2

Replicate

5.6/10 · $29,640/yr more

Best model marketplace with Cog deployment framework

Model marketplace plus Cog deployment framework for one-click model deployment.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Free | Free | Free trial credits with pay-per-second model API and public marketplace. |
| Pay-as-you-go | Free | $0.000725 per second on A100 with no monthly minimum and Cog framework. |
| Team | $200.00/mo | $200 monthly plus usage with private models and higher rate limits. |
| Enterprise | $2,500.00/mo | Custom contract with dedicated GPU instances, SOC 2, and audit logs. |

Replicate is the model marketplace plus deployment platform for teams running open-source models without infrastructure overhead. Founded in 2019 in San Francisco and backed by Andreessen Horowitz, Replicate positions around the marketplace shape where developers run models from a public catalog or deploy custom models via Cog framework.

Four tiers serve four buyer profiles. Free ships trial credits with pay-per-second model API. Pay-as-you-go ships at $0.000725 per second on A100 with no monthly minimum plus Cog deployment. Team at $200 monthly with private models. Enterprise ships custom with dedicated GPU instances plus SOC 2.

The load-bearing wedge is the marketplace plus Cog framework. Where Lambda and CoreWeave target raw GPU compute, Replicate targets the model-as-a-service shape where teams deploy and run models without managing GPU infrastructure. The Cog framework simplifies model packaging. The catch is the per-second pricing variance. Heavy usage on Replicate can exceed dedicated GPU costs; for steady production, Lambda or CoreWeave reserved cover better. For teams running open-source models without infrastructure overhead, Replicate Pay-as-you-go is the cheapest entry path.
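The per-second pricing variance has a concrete crossover. A hedged sketch using only rates quoted in this guide ($0.000725/sec Replicate A100, $1.29/hr Lambda A100); the breakeven is illustrative, not vendor guidance.

```python
# Convert Replicate's per-second A100 rate to an hourly equivalent, then find
# the busy-fraction above which a dedicated hourly A100 becomes cheaper.

PER_SECOND = 0.000725     # Replicate A100, quoted above
DEDICATED_HOURLY = 1.29   # Lambda on-demand A100, quoted later in this guide

hourly_equivalent = PER_SECOND * 3600                         # ≈ $2.61/hr fully busy
breakeven_utilization = DEDICATED_HOURLY / hourly_equivalent  # ≈ 0.49
```

Below roughly half-time utilization, per-second billing wins; above it, the "heavy usage exceeds dedicated GPU costs" caveat kicks in.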

Pros

  • Public model marketplace with thousands of models
  • Cog deployment framework simplifies custom models
  • Pay-per-second pricing eliminates idle cost
  • Free trial credits for evaluation
  • SOC 2 plus audit logs on Enterprise

Cons

  • Heavy usage exceeds dedicated GPU cost economics
  • No persistent instance option for steady workloads
Free credits · A100 $0.000725/sec · Team $200/mo · Free trial credits; cancel-anytime

Best for: Teams running open-source models without infrastructure overhead. Free credits; Pay-as-you-go from $0/mo; Team at $200/mo for private models.

Compliance & residency: 8
GPU availability: 9
Setup complexity: 10
Value: 8
Support: 7
#3

Modal

5.4/10 · $2,640/yr more

Best serverless GPU functions with auto-scaling cold-start

Serverless GPU functions with auto-scaling cold-start and $30 monthly free credits.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Free | Free | $30 monthly free credits with serverless GPU functions for evaluation. |
| Starter | $30.00/mo | $30 monthly with $1.10/hr A10G and $2.78/hr A100 80GB plus auto-scaling. |
| Team | $250.00/mo | $250 monthly plus usage with higher concurrency limits and Slack support. |
| Enterprise | $2,000.00/mo | Custom contract with reserved GPU instances and dedicated CSM. |

Modal is the serverless GPU functions platform for developer-friendly auto-scaling workloads. Founded in 2021 in San Francisco by ex-Spotify engineer Erik Bernhardsson, Modal positions around the serverless GPU model where functions scale from zero to thousands of GPUs based on incoming traffic.

Four tiers serve four buyer profiles. The Free tier ships $30 monthly free credits with serverless GPU functions plus pay-as-you-go after credits. The Starter tier ships at $30 monthly with $1.10/hr A10G plus $2.78/hr A100 80GB plus auto-scaling cold-start. The Team tier ships at $250 monthly plus usage with higher concurrency limits plus Slack support. The Enterprise tier ships custom contract with reserved GPU instances plus dedicated CSM.

The load-bearing wedge is the serverless plus dev-friendly shape. Where Lambda, CoreWeave, and RunPod target persistent GPU instances, Modal targets variable bursty workloads where GPUs scale to zero between requests. For inference workloads with variable traffic (RAG, image generation APIs, batch processing), Modal eliminates idle GPU cost entirely. The catch is the cold-start latency. First request to an idle endpoint adds 5-30 seconds; for latency-sensitive workloads, persistent instances (Lambda, RunPod) cover better. For variable bursty inference workloads, Modal Starter at $30/mo plus per-second usage is the cheapest path.
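The scale-to-zero economics can be sketched numerically. Assumptions for illustration: Modal's quoted $2.78/hr A100 billed per second, a $1.29/hr always-on instance, and a 730-hour month.

```python
# Serverless (billed only while busy) vs an always-on hourly instance.
# Rates are the A100 prices quoted in this guide; a sketch, not vendor math.

HOURS_PER_MONTH = 730

def serverless_monthly(busy_fraction: float, hourly_rate: float = 2.78) -> float:
    """Cost when the GPU bills only for the seconds it is actually busy."""
    return busy_fraction * HOURS_PER_MONTH * hourly_rate

def always_on_monthly(hourly_rate: float = 1.29) -> float:
    """Cost of a persistent instance that bills every hour, busy or idle."""
    return HOURS_PER_MONTH * hourly_rate

# At 10% utilization: ~$203/mo serverless vs ~$942/mo always-on.
```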

Pros

  • Serverless GPU functions scale to zero between requests
  • Free $30 monthly credits for evaluation
  • A100 80GB at $2.78/hr competitive with hourly platforms
  • Auto-scaling cold-start handles variable traffic
  • Developer-friendly Python SDK

Cons

  • Cold-start adds 5-30s on first request to idle endpoint
  • Persistent workloads cost more than hourly platforms
Free $30 credits · Starter $30/mo · A100 $2.78/hr · $30 monthly free credits permanent

Best for: Developers running variable bursty GPU workloads. Free $30 credits; Starter $30/mo plus usage; Team for higher concurrency.

Compliance & residency: 8
GPU availability: 9
Setup complexity: 10
Value: 8
Support: 8
#4

RunPod

5.2/10 · $59,640/yr more

Best community GPU cloud with cheap H100 community spot tier

Community GPU cloud with Secure Cloud A100 at $1.89/hr plus cheap Community Cloud H100 spot.

| Plan | Monthly | Annual | What you get |
|------|---------|--------|--------------|
| Community Free | Free | — | Free tier credits with serverless endpoints and community templates. |
| Secure Cloud A100 | Free | — | $1.89/hr A100 80GB with persistent volumes and public network IP option. |
| Community Cloud H100 | Free | — | $1.99/hr H100 80GB community spot tier; less reliable but cheapest H100. |
| Reserved | $5,000.00/mo | $60,000.00/yr | Multi-month reservation with ~30% off on-demand and dedicated GPU pool. |

RunPod is the community GPU cloud platform for teams wanting cheap H100 access plus serverless endpoints. Founded in 2022, RunPod positions between Lambda's mainstream pricing and Vast.ai's bid-based marketplace with a two-tier offering: Secure Cloud (datacenter-grade) plus Community Cloud (lower-cost spot).

Four tiers serve four buyer profiles. The Community Free tier ships free credits with serverless endpoints plus community templates. The Secure Cloud A100 tier ships at $1.89/hr A100 80GB with persistent volumes plus public network IP option. The Community Cloud H100 tier ships at $1.99/hr H100 80GB community spot tier; less reliable than Secure but the cheapest H100 available. The Reserved tier ships custom multi-month with ~30 percent off on-demand.

The load-bearing wedge is the cheap H100 community spot tier. Where Lambda H100 is $2.49/hr and CoreWeave H100 is $3.49/hr, RunPod Community Cloud H100 at $1.99/hr is the cheapest H100 access for fault-tolerant workloads. For teams iterating on LLM fine-tuning or stable diffusion training where checkpoints save work between interruptions, RunPod Community covers the use case at significant cost savings. The catch is the community-tier reliability. For production inference, Secure Cloud A100 at $1.89/hr is competitive with Lambda.
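The spot-tier tradeoff above can be priced out. A hedged sketch: the redo fraction (work lost to interruptions) is an assumption for illustration, not a RunPod figure.

```python
# Effective cost per useful hour on an interruptible tier, when a fraction of
# compute hours is redone after reclaims. The redo fraction is a made-up example.

def effective_spot_cost(spot_hourly: float, redo_fraction: float) -> float:
    """Spot price divided by the fraction of hours that produce kept work."""
    return spot_hourly / (1 - redo_fraction)

# RunPod Community H100 at $1.99/hr, assuming 10% of work is redone:
# ≈ $2.21 per useful hour, still below Lambda on-demand H100 at $2.49/hr.
```

So with checkpointing good enough to keep redone work near 10 percent, the community tier stays cheaper than on-demand.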

Pros

  • Community Cloud H100 at $1.99/hr cheapest H100 in lineup
  • Secure Cloud A100 at $1.89/hr competitive with Lambda
  • Free serverless endpoint tier
  • Reserved 30% off on-demand for steady production
  • Persistent volumes plus public network IP option

Cons

  • Community Cloud less reliable than Secure
  • No SOC 2 compliance; not for regulated workloads
Free serverless · Secure A100 $1.89/hr · Comm H100 $1.99/hr · Community Free permanent; cancel-anytime

Best for: Teams iterating on LLM fine-tuning with checkpoints. Community Free for testing; Secure A100 at $1.89/hr; Community H100 at $1.99/hr.

Compliance & residency: 7
GPU availability: 8
Setup complexity: 8
Value: 9
Support: 7
#5

Lambda Labs

3.6/10 · $599,640/yr more

Best overall GPU cloud, mainstream on-demand leader

Largest mainstream GPU cloud with on-demand A100 at $1.29/hr and the deepest brand recognition.

| Plan | Monthly | Annual | What you get |
|------|---------|--------|--------------|
| On-Demand 1x A100 | Free | — | $1.29/hr A100 40GB on-demand with pay-per-minute billing and storage. |
| On-Demand H100 | Free | — | $2.49/hr H100 80GB SXM5 with cloud and 1-Click Clusters available. |
| 1-Click Cluster | $50,000.00/mo | $600,000.00/yr | Custom contract for 16-1024 GPU clusters with InfiniBand networking. |
| Reserved | $25,000.00/mo | $300,000.00/yr | Multi-year reservation with up to 50% discount and dedicated GPU pool. |

Lambda Labs is the default GPU cloud for most paid ML teams. Founded in 2012 in San Francisco by brothers Stephen and Michael Balaban, Lambda serves the largest mainstream GPU cloud market with the deepest brand recognition for ML training plus inference.

Four tiers serve four buyer profiles. The On-Demand 1x A100 tier ships at $1.29/hr A100 40GB with pay-per-minute billing plus persistent storage. The On-Demand H100 tier ships at $2.49/hr H100 80GB SXM5 with cloud and 1-Click Clusters. The 1-Click Cluster tier ships custom contract for 16-1024 GPU clusters with InfiniBand networking. The Reserved tier ships custom multi-year reservation with up to 50 percent discount versus on-demand.

The load-bearing wedge is mainstream brand recognition plus the historic on-demand baseline. Lambda set the $1.29/hr A100 baseline that competitors (CoreWeave, RunPod) anchor against. The catch is the no-free-tier model. Where Modal, Replicate, and Vast.ai offer free credits or trials, Lambda requires immediate billing. For mainstream production ML teams wanting reliable on-demand A100 plus H100 with reserved discounts at scale, Lambda Labs covers the use case with lower entry friction than CoreWeave and higher reliability than RunPod.

Pros

  • Largest mainstream brand for ML GPU cloud
  • On-Demand A100 at $1.29/hr historic baseline
  • On-Demand H100 at $2.49/hr competitive
  • Pay-per-minute billing for variable use
  • Reserved discounts up to 50 percent for steady production

Cons

  • No free tier; immediate billing required
  • Limited serverless offering compared with Modal
A100 $1.29/hr · H100 $2.49/hr · Reserved -50% · No free tier; pay-per-minute billing

Best for: Mainstream ML teams wanting reliable on-demand A100 plus H100. On-Demand A100 at $1.29/hr; H100 at $2.49/hr; Reserved for production.

Compliance & residency: 9
GPU availability: 9
Setup complexity: 9
Value: 9
Support: 8
#6

CoreWeave

3.4/10 · $2,999,640/yr more

Best Kubernetes-native enterprise GPU cloud with NVLink and InfiniBand

Kubernetes-native enterprise GPU with NVLink and InfiniBand for large-scale training.

| Plan | Monthly | Annual | What you get |
|------|---------|--------|--------------|
| On-Demand A100 | Free | — | $2.39/hr A100 80GB SXM4 with Kubernetes-native object storage and networking. |
| On-Demand H100 | Free | — | $3.49/hr H100 80GB SXM5 with NVLink, InfiniBand, and bare-metal options. |
| Reserved 1-year | $100,000.00/mo | $1,200,000.00/yr | 25-40% off on-demand with reserved A100/H100 pool and dedicated CSM. |
| Enterprise | $250,000.00/mo | $3,000,000.00/yr | Multi-year reserved clusters with NVLink, on-prem hybrid, and private cloud. |

CoreWeave is the Kubernetes-native enterprise GPU cloud for large-scale ML training. Founded in 2017 in New Jersey and IPO'd in March 2025 (the largest tech IPO of 2025), CoreWeave serves the enterprise GPU market with deep Kubernetes integration plus NVLink plus InfiniBand for multi-node distributed training.

Four tiers serve four buyer profiles. The On-Demand A100 tier ships at ~$2.39/hr A100 80GB SXM4 with Kubernetes-native object storage plus networking. The On-Demand H100 tier ships at ~$3.49/hr H100 80GB SXM5 with NVLink plus InfiniBand plus bare-metal options. The Reserved 1-year tier ships ~25-40 percent off on-demand with reserved A100/H100 pool. The Enterprise tier ships custom multi-year with reserved clusters plus on-prem hybrid.

The load-bearing wedge is the Kubernetes-native plus NVLink shape. Where Lambda targets single-node on-demand and Modal targets serverless, CoreWeave targets multi-node distributed training where InfiniBand networking is load-bearing for gradient synchronization at scale. For training large language models or computer vision at scale, CoreWeave plus reserved capacity covers the use case better than other cloud GPU providers. The catch is the institutional pricing. On-demand is more expensive than Lambda; reserved requires multi-year commitment. For enterprise ML teams running multi-node distributed training, CoreWeave is the historic gold standard.
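Why InfiniBand is load-bearing for gradient synchronization can be shown with a back-of-envelope. Assumptions for illustration: fp16 gradients, the roughly 2x-gradient-bytes per-GPU traffic of a ring all-reduce at large GPU counts, and illustrative link speeds.

```python
# Per-GPU traffic and time for one gradient all-reduce step.
# Ring all-reduce moves about 2*(n-1)/n ≈ 2x the gradient bytes per GPU.

def allreduce_seconds(params: float, grad_bytes: int, link_gb_per_s: float) -> float:
    traffic = 2 * params * grad_bytes        # bytes moved per GPU per step
    return traffic / (link_gb_per_s * 1e9)

# 70B fp16 gradients: ~22s over 12.5 GB/s Ethernet vs ~5.6s over 50 GB/s InfiniBand.
eth = allreduce_seconds(70e9, 2, 12.5)
ib = allreduce_seconds(70e9, 2, 50.0)
```

In practice overlap with compute hides some of this cost, but the 4x gap is why multi-node training buyers pay CoreWeave's premium.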

Pros

  • Kubernetes-native deep integration
  • NVLink plus InfiniBand for multi-node distributed training
  • Reserved 1-year tier 25-40 percent off on-demand
  • Bare-metal options for performance-critical workloads
  • On-prem hybrid plus private cloud on Enterprise

Cons

  • On-demand more expensive than Lambda Labs
  • Reserved requires multi-year commitment
A100 $2.39/hr · H100 $3.49/hr · Reserved -25% · No free tier; institutional contract for Reserved

Best for: Enterprise ML teams running multi-node distributed training with NVLink. On-Demand A100 at $2.39/hr; H100 at $3.49/hr; Reserved 25-40% off.

Compliance & residency: 9
GPU availability: 10
Setup complexity: 7
Value: 7
Support: 9
#7

Vast.ai

3.2/10

Best cheapest decentralized GPU marketplace for budget-conscious researchers

Decentralized GPU marketplace with bid-based pricing; cheapest GPU access in lineup.

| Plan | Monthly | What you get |
|------|---------|--------------|
| Interruptible RTX 4090 | Free | $0.18/hr RTX 4090 on the decentralized GPU marketplace; cheapest in lineup. |
| On-Demand A100 | Free | $0.79/hr A100 with bid-based marketplace pricing and persistent volumes. |
| Reserved H100 | Free | $1.65/hr H100 with datacenter-tier hosts and verified-host filtering. |

Vast.ai is the decentralized GPU marketplace for budget-conscious researchers and fault-tolerant workloads. Founded in 2018 in San Francisco, Vast.ai positions around the marketplace shape where individual hosts (datacenter operators, crypto miners pivoting to AI, individual GPU owners) list capacity at bid-based prices.

Three tiers serve three buyer profiles. The Interruptible RTX 4090 tier ships at ~$0.18/hr with the lowest cost in this lineup; instances may be reclaimed if outbid. The On-Demand A100 tier ships at ~$0.79/hr with bid-based marketplace pricing plus persistent volume option. The Reserved H100 tier ships at ~$1.65/hr with datacenter-tier hosts plus verified-host filtering.

The load-bearing wedge is the cheapest GPU access. Where Lambda, CoreWeave, and Modal price A100 at $1.29-$2.78/hr, Vast.ai prices A100 at ~$0.79/hr (40-70 percent cheaper). For fault-tolerant research workloads (model fine-tuning with checkpoints, hyperparameter sweeps, batch inference), Vast.ai cuts costs dramatically. The catch is the reliability variance. Hosts vary in quality; verified-host filtering helps but does not eliminate risk. For budget-conscious researchers running fault-tolerant workloads, Vast.ai Interruptible at $0.18/hr is the cheapest path; for higher reliability needs, RunPod Secure or Lambda cover better.

Pros

  • Cheapest GPU access in lineup (RTX 4090 at $0.18/hr)
  • A100 at ~$0.79/hr 40-70% cheaper than mainstream
  • Bid-based marketplace pricing competitive across hosts
  • Verified-host filtering for datacenter-tier hosts
  • Persistent volume option for stateful workloads

Cons

  • Reliability varies; interruptible instances may be reclaimed
  • No SOC 2 compliance; not for regulated workloads
RTX 4090 $0.18/hr · A100 $0.79/hr · H100 $1.65/hr · No free tier; pay-as-you-go bid-based

Best for: Budget-conscious researchers running fault-tolerant workloads. Interruptible RTX 4090 at $0.18/hr; On-Demand A100 at $0.79/hr.

Compliance & residency: 5
GPU availability: 7
Setup complexity: 6
Value: 10
Support: 5

How we picked

Each pick gets a transparent composite score from price, features, free-tier availability, and editor fit. Pricing flows from our live database, so when a vendor changes prices the score updates here too.

We weight price 40 percent, features 30, free tier 15, and fit 15. Utilization rate determines cost more than hourly rate; pay-as-you-go and serverless eliminate idle cost. Spot tiers cut costs in half for fault-tolerant workloads. Reserved multi-year contracts save 25-50 percent for steady production.
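The weighted formula reads directly as code. A minimal sketch, assuming each factor is a 0-10 subscore; the mapping from raw pricing data to subscores is not published on this page.

```python
# Composite score with the weights stated above: price 40%, features 30%,
# free tier 15%, fit 15%. Subscores assumed to be on a 0-10 scale.

def composite(price: float, features: float, free_tier: float, fit: float) -> float:
    return 0.40 * price + 0.30 * features + 0.15 * free_tier + 0.15 * fit

composite(10, 10, 10, 10)  # 10.0, a perfect pick
```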

We don't claim "30,000 hours of testing." Our methodology is the formula above plus the editor's published verdict for each pick. Verifiable, auditable, and updated when the underlying data changes.

Why trust Subrupt

We're a subscription tracker first, a buying guide second. Every claim on this page is something you can check.

By use case

Best overall GPU cloud

Lambda Labs

Read the full review →

Best serverless GPU functions

Modal

Read the full review →

Best model marketplace plus deployment

Replicate

Read the full review →

Best Kubernetes-native enterprise GPU

CoreWeave

Read the full review →

Best cheapest decentralized GPU marketplace

Vast.ai

Read the full review →

Didn't make the list

Vast.ai — already in picks (seventh) but worth flagging the cheapest GPU access; RTX 4090 at $0.18/hr and A100 at $0.79/hr cut GPU costs 40-70 percent vs mainstream for fault-tolerant workloads.

RunPod — already in picks (fourth) but worth flagging the cheap H100 community tier; Community Cloud H100 at $1.99/hr is the cheapest H100 in lineup vs Lambda $2.49/hr or CoreWeave $3.49/hr.

Modal — already in picks (third) but worth flagging the serverless model; it eliminates idle GPU cost entirely for variable bursty workloads and is hard to beat for inference APIs with unpredictable traffic.

Together AI — already in picks (first) but worth flagging the model-as-a-service economics; for under 5B monthly tokens on open-source models, pay-per-token is simpler and cheaper than self-hosting.

How to choose your GPU Cloud

Seven product shapes compete for one head term

The 'best GPU cloud' search covers seven shapes:

  • Mainstream GPU cloud (Lambda Labs) targets ML teams wanting reliable on-demand A100 plus H100 with brand recognition.
  • Serverless GPU functions (Modal) targets variable bursty workloads.
  • Model marketplace (Replicate) targets teams running open-source models without infrastructure.
  • Open-source model inference (Together AI) targets teams running Llama and Mixtral via API.
  • Kubernetes-native enterprise (CoreWeave) targets multi-node distributed training.
  • Community GPU cloud (RunPod) targets fault-tolerant training with cheap H100 spot.
  • Decentralized marketplace (Vast.ai) targets budget-conscious researchers.

The honest framework: identify your workload pattern before subscribing. Persistent steady production uses Lambda or CoreWeave Reserved; variable bursty uses Modal serverless; model-as-a-service uses Replicate or Together AI; multi-node training uses CoreWeave; fault-tolerant cheap iteration uses Vast.ai or RunPod Community.

Utilization rate matters more than hourly rate

GPU utilization rate determines total cost more than hourly pricing. An idle GPU at $1.29/hr costs the same as a fully-used one; a $1.29/hr A100 used 25 percent of the time costs $5.16 per useful hour. The honest framework: estimate your monthly utilization rate before picking a vendor. For under 25 percent utilization (variable bursty workloads), serverless or pay-as-you-go (Modal, Replicate, Together AI) eliminates idle cost. For 25-75 percent utilization (steady but not always-on), hourly on-demand (Lambda, CoreWeave) is the realistic baseline. For over 75 percent utilization (steady production), reserved discounts (Lambda 50 percent off, CoreWeave 25-40 percent off) pay back within months. Quarterly utilization audit: track actual GPU-hours versus paid GPU-hours; if utilization dropped below 25 percent, evaluate switching to serverless or pay-as-you-go.
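The audit above is two one-liners. A sketch using the guide's own thresholds; they are a rule of thumb, not a law.

```python
# Cost per hour of *useful* GPU work, plus the utilization-based pricing-tier
# rule of thumb from this section (<25% serverless, 25-75% hourly, >75% reserved).

def cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    return hourly_rate / utilization

def pricing_tier(utilization: float) -> str:
    if utilization < 0.25:
        return "serverless or pay-as-you-go"
    if utilization <= 0.75:
        return "hourly on-demand"
    return "reserved"

# $1.29/hr A100 at 25% utilization → $5.16 per useful hour, as stated above.
```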

Spot tiers: when fault tolerance enables half-price GPU

Spot and interruptible tiers (Vast.ai Interruptible, RunPod Community Cloud) cut GPU costs in half for fault-tolerant workloads. Spot pricing works because providers monetize otherwise-idle capacity that may need to be reclaimed. The honest framework: use spot when (1) workload is fault-tolerant (model training with checkpoints, batch inference with retry, hyperparameter sweeps), (2) you can save state between interruptions, (3) cost savings outweigh the 5-20 percent productivity loss from interruptions. Avoid spot when (1) workload is latency-critical (production inference, real-time training), (2) you cannot save state efficiently, (3) reliability is load-bearing. For LLM fine-tuning with checkpointing, spot tiers reliably save 40-60 percent vs on-demand. RunPod Community H100 at $1.99/hr vs Lambda H100 at $2.49/hr saves 20 percent; Vast.ai A100 at $0.79/hr vs Lambda A100 at $1.29/hr saves 39 percent.
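The quoted savings figures check out arithmetically. A quick sketch; prices are the ones cited in this paragraph.

```python
# Percentage saved by a spot tier relative to on-demand, rounded like the text.

def savings_pct(spot: float, on_demand: float) -> int:
    return round((1 - spot / on_demand) * 100)

savings_pct(1.99, 2.49)  # 20 (RunPod Community H100 vs Lambda H100)
savings_pct(0.79, 1.29)  # 39 (Vast.ai A100 vs Lambda A100)
```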

Serverless vs hourly vs reserved: matching to workload pattern

Pricing model selection matters more than per-hour rate. Serverless GPU functions (Modal) bill by GPU-second with auto-scaling to zero; great for variable bursty workloads, expensive for steady. Pay-as-you-go (Replicate, Together AI) bills by per-second or per-token; great for unknown workloads, surprise risk at scale. Hourly on-demand (Lambda, CoreWeave, RunPod Secure) bills per minute with persistent instances; great for development and steady inference. Reserved (Lambda, CoreWeave) bills monthly or yearly with deep discounts; great for production with known capacity needs. The honest framework: match pricing model to workload. Variable bursty inference uses serverless; development uses hourly on-demand; steady production training uses reserved. Most teams use multiple models simultaneously: hourly for development, reserved for production, serverless for inference APIs.
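Matching billing model to workload is a comparison of three simple cost curves. A hedged sketch with illustrative rates from this guide (Modal $2.78/hr serverless, Lambda $1.29/hr hourly, a Lambda-style 50 percent reserved discount); real quotes will differ, and reserved also requires a multi-month commitment.

```python
# Monthly cost of one workload under the three billing models described above.
# "hourly" assumes the instance stays up all month; all rates are illustrative.

HOURS_PER_MONTH = 730

def monthly_costs(busy_hours: float, serverless_rate: float = 2.78,
                  hourly_rate: float = 1.29, reserved_discount: float = 0.50) -> dict:
    return {
        "serverless": busy_hours * serverless_rate,
        "hourly": HOURS_PER_MONTH * hourly_rate,
        "reserved": HOURS_PER_MONTH * hourly_rate * (1 - reserved_discount),
    }

# 50 busy hours/mo → serverless wins; near-always-on → reserved wins.
```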

Multi-node distributed training: when InfiniBand matters

InfiniBand networking matters when training models large enough to require multi-node distributed training. Single-node training (single A100 or single 8x H100 server) covers most fine-tuning and small-model training. Multi-node distributed training (16+ GPUs across multiple servers) requires high-bandwidth low-latency networking for gradient synchronization; standard Ethernet networking adds significant overhead at scale. The honest framework: InfiniBand matters when (1) model size exceeds single-server GPU memory (weights plus optimizer state over ~640GB on an 8x80GB server, roughly 70B+ dense parameters), (2) you train from scratch rather than fine-tune, (3) training time is critical. CoreWeave NVLink plus InfiniBand and Lambda 1-Click Cluster InfiniBand are the two production options. For most fine-tuning and inference workloads, InfiniBand is overkill; single-node A100 or H100 covers the use case at lower cost.
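The single-server threshold can be estimated from a standard rule of thumb: Adam mixed-precision training costs roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments), before activations. The 16-byte figure is an approximation, not a vendor number.

```python
# Does a dense model's training state exceed one 8x80GB server?
# ~16 bytes/param for Adam mixed precision is a common approximation.

def training_bytes(params: float, bytes_per_param: int = 16) -> float:
    return params * bytes_per_param

SERVER_BYTES = 8 * 80e9  # one 8x H100/A100-80GB server

training_bytes(70e9) > SERVER_BYTES  # True: ~1.12TB vs 640GB, multi-node territory
training_bytes(7e9) > SERVER_BYTES   # False: ~112GB fits on a single server
```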

Open-source model inference: when Together AI beats raw GPU

Open-source model inference platforms (Together AI, Replicate) beat raw GPU cloud for teams running Llama, Mixtral, Qwen via API rather than self-hosting. The math: running Llama 3 70B inference on Together AI at $0.88 per 1M tokens output processes 100M tokens monthly for $88. Self-hosting Llama 3 70B on Lambda H100 at $2.49/hr requires ~24/7 instance ($1800/mo) for moderate throughput. The honest framework: open-source inference platforms pay off when (1) your monthly token volume is under 5B tokens, (2) you do not need custom model modifications, (3) you want pay-per-token pricing simplicity. For higher token volumes or custom models, dedicated GPU instances (Lambda, CoreWeave) cover better. For 5-10B monthly tokens, evaluate Together AI Pay-as-you-go vs reserved GPU instances; the breakeven shifts based on model size and reservation discounts.
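The breakeven above follows from fixed GPU cost divided by the per-token rate. A hedged sketch: the GPU count needed to serve a 70B model is an assumption for illustration (fp16 weights alone are ~140GB, so at least two 80GB cards).

```python
# Monthly token volume (in millions) where self-hosting matches pay-per-token.
# Rates: Lambda H100 $2.49/hr, Together AI $0.88 per 1M output tokens (quoted above).

def breakeven_tokens_millions(gpu_count: int, gpu_hourly: float = 2.49,
                              price_per_million: float = 0.88) -> float:
    monthly_gpu_cost = gpu_count * gpu_hourly * 730
    return monthly_gpu_cost / price_per_million

breakeven_tokens_millions(1)  # ≈ 2066M (~2.1B tokens)
breakeven_tokens_millions(2)  # ≈ 4131M (~4.1B), near the 5B rule of thumb above
```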

Frequently asked questions

Are these prices guaranteed not to change?

Vendor pricing changes regularly. Rates here are what each vendor advertises in May 2026. Lambda Labs A100 at $1.29/hr stable. Modal A100 at $2.78/hr stable. Replicate A100 at $0.000725/sec stable. Together AI tokens at $0.10-$0.90/1M stable. CoreWeave A100 at $2.39/hr stable. RunPod Secure A100 at $1.89/hr stable. Vast.ai bid-based pricing varies; A100 ~$0.79/hr typical. Verify current rates on the vendor site.

Does Subrupt earn a commission from any of these picks?

We track which picks have approved affiliate programs in our database, and the FTC disclosure block at the top of every guide names which ones currently have a click-tracking partnership. Affiliate revenue does not change ranking. The composite math runs against the same weights for every pick regardless of partnership.

Why does Lambda Labs win "best overall" instead of the cheapest pick, Vast.ai?

Lambda wins both the mainstream brand-recognition consensus across TechCrunch, Latent Space, and AI engineering newsletters and the uniquely-true mainstream-GPU-cloud flag in our composite math. Vast.ai is composite-cheapest at $0.18/hr RTX 4090 and wins the cheapest-decentralized wedge, but reliability variance makes it unsuitable for production. The ranked picks order instead leads with Together AI, whose value and free-tier subscores push it to the highest composite (5.7/10).

Should I use spot or on-demand GPU?

Spot for fault-tolerant workloads with checkpoints; on-demand for production. Spot tiers (Vast.ai Interruptible, RunPod Community) cut costs 40-60 percent but instances may be reclaimed. For LLM fine-tuning with checkpointing, batch inference with retry, or hyperparameter sweeps, spot reliably saves money. For latency-critical production inference or training without checkpoints, on-demand is required. Most teams use both: spot for development and experiments; on-demand for production.

When does serverless GPU beat hourly?

When utilization rate is under 25 percent. Modal serverless GPU functions scale to zero between requests; you pay only for GPU-seconds actually used. For variable bursty inference workloads (RAG APIs, image generation, batch processing), serverless eliminates idle GPU cost. For steady production (always-on inference, ongoing training), hourly on-demand or reserved cover better. The cost crossover is around 25 percent utilization; below that, serverless wins; above, hourly wins.

Should I run Llama 3 on Together AI or self-host on Lambda?

Together AI for under 5B monthly tokens; self-host for higher volumes. Math: Llama 3 70B inference at $0.88 per 1M output tokens processes 100M tokens monthly for $88. Self-hosting on Lambda H100 ($2.49/hr) requires ~24/7 instance ($1800/mo) for moderate throughput. Together AI economics break around 5-10B monthly tokens depending on model size. For lower volumes or pay-per-token simplicity, Together AI; for higher volumes or custom model modifications, self-host.

When does CoreWeave beat Lambda for enterprise?

When you need multi-node distributed training with InfiniBand. Lambda 1-Click Cluster ships InfiniBand for 16-1024 GPU clusters; CoreWeave ships Kubernetes-native NVLink plus InfiniBand for multi-node training as the default. For training large models from scratch (roughly 70B+ dense parameters, beyond a single server's memory), CoreWeave Kubernetes-native deployment plus NVLink is the historic gold standard. For single-node training and inference, Lambda is competitive at lower entry friction.

How do I cancel a GPU cloud subscription?

Hourly platforms (Lambda, CoreWeave, RunPod Secure) cancel by stopping instances; persistent storage continues billing until deleted. Pay-as-you-go (Replicate, Modal, Together AI, Vast.ai) cancel by stopping API usage; storage continues billing. Reserved contracts require negotiation through enterprise procurement; many include early termination clauses. Always export training checkpoints and model weights before cancellation; some platforms purge data 30-90 days after cancellation.

When does this guide get updated?

We aim to refresh /best/ guides quarterly when there are no major shifts, and immediately when there are. Major triggers: vendor pricing changes (rates stable through 2025-2026 with H100 generation), new entrants (Crusoe, FluidStack gaining adoption), new GPU SKUs (B200 availability shifts pricing), and major customer migrations. The lastReviewed date at the top reflects the most recent editorial sweep.

Subrupt Editorial

The team behind subrupt.com. We track subscriptions, surface cheaper alternatives, and publish buying guides where the score formula is on the page so you can recompute it yourself. We do not claim 30,000 hours of testing. What we claim is live pricing from our database, a transparent composite score, and honest savings math against a category baseline.


Affiliate disclosure: Subrupt earns a commission when you switch to a service through our recommendation links. This never changes the price you pay. We only recommend services where there's a real cost or feature advantage for you, and our picks are based on the data on this page, not on which programs pay the most.

Related buying guides

Track your subscriptions on Subrupt

Add the GPU Cloud you pay for and see how much you'd save by switching.

Open dashboard

More buying guides

Independent rankings for the subscriptions worth paying for.

See all guides