Together AI
5.7/10 · $2,040/yr
Open-source model inference with 200+ models and pay-per-token pricing plus fine-tuning.
| Plan | Monthly | What you get |
|---|---|---|
| Free | Free | $5 free credits with 200+ open-source models and inference API plus fine-tuning. |
| Pay-as-you-go | Free | $0.10-$0.90 per 1M tokens, varying by model, plus H100 GPU instances from $1.49/hr. |
| Pro | $200.00/mo | $200 monthly plus usage with priority queue and custom dedicated instances. |
| Enterprise | $5,000.00/mo | Reserved H100 plus Together Cluster with SOC 2 and HIPAA available. |
Together AI is the open-source model inference platform for teams running Llama, Mixtral, Qwen, and other open-source models without managing GPU infrastructure. Founded in 2022 in San Francisco, Together AI positions itself as inference-as-a-service, offering 200+ open-source models with pay-per-token pricing.
Four tiers serve four buyer profiles. Free includes $5 in credits with access to 200+ models, the inference API, and fine-tuning. Pay-as-you-go runs $0.10-$0.90 per 1M tokens, with H100 instances from $1.49/hr. Pro is $200 monthly plus usage and adds a priority queue. Enterprise is custom, with reserved H100 capacity and SOC 2 and HIPAA available.
The load-bearing wedge is open-source model inference. Where Lambda, CoreWeave, and Modal sell raw GPU compute, Together AI targets teams that want to run Llama 3 or Mixtral without provisioning GPUs. Pay-per-token pricing makes inference economics simpler to reason about than per-hour GPU costs; the catch is that per-token rates vary by model. For teams running open-source models without infrastructure, Together AI's Pay-as-you-go covers the use case better than Lambda's raw GPU offering.
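The per-token vs per-hour trade-off can be sanity-checked with back-of-envelope arithmetic using the figures quoted above ($0.10-$0.90 per 1M tokens, H100 from $1.49/hr). This is an illustrative sketch, not Together AI's pricing calculator, and any throughput you plug in is your own assumption:

```python
# Break-even check: how many tokens must one H100-hour serve before the
# hourly instance beats pay-per-token pricing? Rates are from the review;
# actual sustained throughput per GPU is workload-dependent.

def breakeven_tokens(gpu_hourly_usd: float, per_million_usd: float) -> float:
    """Tokens that must be served in one GPU-hour for the hourly
    instance to match the pay-per-token cost."""
    return gpu_hourly_usd / per_million_usd * 1_000_000

# Cheapest listed token rate ($0.10/1M): ~14.9M tokens per hour to break even.
print(f"{breakeven_tokens(1.49, 0.10):,.0f}")  # 14,900,000
# Priciest listed rate ($0.90/1M): roughly 1.66M tokens per hour.
print(f"{breakeven_tokens(1.49, 0.90):,.0f}")
```

The spread explains the budgeting caveat: whether hourly GPUs or per-token pricing wins depends heavily on which model's rate applies and how well you can saturate the instance.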
Pros
- 200+ open-source models available via inference API
- Pay-per-token pricing simpler than per-hour GPU economics
- Managed fine-tuning for custom models
- Competitive H100 instances from $1.49/hr
- SOC 2 and HIPAA compliance on Enterprise
Cons
- Per-token pricing variance by model complicates budgeting
- For raw GPU compute at high scale, dedicated GPU platforms are a better fit
Best for: Teams running open-source models without managing GPU infrastructure. Free tier includes $5 in credits; Pay-as-you-go has no monthly fee; Pro at $200/mo for priority access.
| Criterion | Score |
|---|---|
| Compliance & residency | 8 |
| GPU availability | 9 |
| Setup complexity | 10 |
| Value | 9 |
| Support | 8 |