Reserved Deployments provide dedicated GPU clusters tailored to your specific infrastructure needs, with guaranteed capacity, consistent performance, and no resource contention. Navigate to Deploy → Reserved to submit a request.

When To Use Reserved

High-Volume Production

Workloads that require consistent throughput at a scale where serverless costs exceed a flat cluster rate.

Latency SLAs

Applications with strict latency requirements that need dedicated, non-shared GPU resources.

Enterprise Compliance

Deployments that require data isolation, custom networking, or specific compliance guarantees.

Tailored Pricing

A flat cluster rate replaces variable per-token billing — easier to budget at scale and priced to your specific workload and capacity requirements.
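
As a rough illustration of the High-Volume Production case, the volume at which per-token spend matches a flat rate can be estimated with simple arithmetic. The Python sketch below is not an official calculator: the function name and the $2.00 per million tokens price are hypothetical placeholders, and the flat rate reuses the $250k starting figure listed under Pricing. Actual Reserved pricing is quoted per workload.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def breakeven_tokens_per_year(flat_rate_usd: Decimal,
                              price_per_million_tokens_usd: Decimal) -> Decimal:
    """Annual token volume at which per-token billing equals the flat rate."""
    if price_per_million_tokens_usd <= 0:
        raise ValueError("price per million tokens must be positive")
    return flat_rate_usd / price_per_million_tokens_usd * TOKENS_PER_MILLION


# Assumption: flat rate taken from the "start from $250k / year" figure.
flat_rate = Decimal("250000")
# Assumption: hypothetical serverless price, not a published rate.
serverless_price = Decimal("2.00")

threshold = breakeven_tokens_per_year(flat_rate, serverless_price)
print(f"Break-even volume: {threshold:,.0f} tokens per year")
```

Above the printed volume, the flat rate is the cheaper option under these assumed inputs; below it, per-token billing costs less.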

Pricing

Reserved deployments start from $250k / year. A 20% deposit is required upfront to reserve your nodes, and pricing is locked for the full contract term.
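
The deposit arithmetic can be sketched in a few lines of Python. The text above does not say whether the 20% applies to the annual price or to the total contract value, so this sketch takes the base amount as a parameter; the function name is hypothetical.

```python
from decimal import Decimal

# 20% upfront deposit, as stated in the pricing text.
DEPOSIT_RATE = Decimal("0.20")


def upfront_deposit(base_amount_usd: Decimal) -> Decimal:
    """Return the deposit owed on a given base amount, rounded to cents.

    Which base amount applies (annual price or total contract value)
    is an assumption the caller has to supply.
    """
    if base_amount_usd < 0:
        raise ValueError("base amount cannot be negative")
    return (base_amount_usd * DEPOSIT_RATE).quantize(Decimal("0.01"))


# Example using the $250k starting price as the base amount.
print(upfront_deposit(Decimal("250000")))
```

With the $250k starting price as the base, the deposit comes to $50,000.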

Dedicated Nodes

Nodes are allocated exclusively to your account and not shared with other tenants.

Rate Lock

Pricing is fixed for the duration of your contract term.

SLA-Backed

Uptime is guaranteed under a formal SLA.

Requesting a Cluster

Submit a request through the form at Deploy → Reserved.

Contact Info

Full Name: Your name (pre-filled from your profile)
Work Email: Your business email address
Company: Your organization name
Job Title: Your role (optional)

Cluster Requirements

Preferred GPU: Choose NVIDIA H200 or NVIDIA B200
Total GPUs Needed: Capacity is reserved in full nodes of 8 GPUs each; select 8 (1 node), 16 (2 nodes), or 24 (3 nodes)
Contract Length: Choose a 1-year (12 months) or 2-year (24 months) term; pricing is locked for the full duration
Workload / Use Case: Select the primary workload type: Inference / Serving, RAG / Retrieval-Augmented Apps, Agentic / Tool-Using Systems, or Other
Use the Networking / Storage field to describe bandwidth targets, storage needs, or interconnect preferences. Use Other Notes for compliance requirements, SLA expectations, or anything else the team should know.
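
The form options above can be checked before you submit. The Python sketch below is only a local pre-check, not an API: requests go through the web form, and the class, function, and field names here are hypothetical. The allowed values are copied from the form description.

```python
from dataclasses import dataclass

GPUS_PER_NODE = 8
ALLOWED_GPUS = {"NVIDIA H200", "NVIDIA B200"}
ALLOWED_GPU_COUNTS = {8, 16, 24}
ALLOWED_TERMS_MONTHS = {12, 24}
ALLOWED_WORKLOADS = {
    "Inference / Serving",
    "RAG / Retrieval-Augmented Apps",
    "Agentic / Tool-Using Systems",
    "Other",
}


@dataclass(frozen=True)
class ReservedRequest:
    """Hypothetical container for the Cluster Requirements fields."""
    preferred_gpu: str
    total_gpus: int
    contract_months: int
    workload: str


def node_count(total_gpus: int) -> int:
    """Convert a GPU total into full 8-GPU nodes."""
    if total_gpus <= 0 or total_gpus % GPUS_PER_NODE != 0:
        raise ValueError("GPUs are reserved in full nodes of 8")
    return total_gpus // GPUS_PER_NODE


def validate(req: ReservedRequest) -> list[str]:
    """Return a list of problems; an empty list means the values match the form."""
    problems = []
    if req.preferred_gpu not in ALLOWED_GPUS:
        problems.append(f"unsupported GPU: {req.preferred_gpu}")
    if req.total_gpus not in ALLOWED_GPU_COUNTS:
        problems.append(f"total GPUs must be one of {sorted(ALLOWED_GPU_COUNTS)}")
    if req.contract_months not in ALLOWED_TERMS_MONTHS:
        problems.append("contract length must be 12 or 24 months")
    if req.workload not in ALLOWED_WORKLOADS:
        problems.append(f"unknown workload type: {req.workload}")
    return problems


request = ReservedRequest("NVIDIA H200", 16, 24, "Inference / Serving")
print(validate(request), node_count(request.total_gpus))
```

Returning a list of problems rather than raising on the first one lets you see every mismatched field at once.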

What Happens Next

1. Submit Your Request

Fill out the form with your GPU and workload requirements and click Submit Request.

2. We Reach Out

Our team reviews your needs and follows up within 1 business day with a custom proposal.

3. Nodes Provisioned

Once the contract is signed and the deposit is received, we spin up your dedicated cluster.

Not Ready for Reserved?

Start with Serverless Inference: instant access, pay-per-token pricing, and no setup required. Serverless is suitable for most development and production API workloads. Move to Reserved when your volume or latency requirements exceed what Serverless offers.
You can also reach us directly via Management → Contact Us to discuss your capacity needs before submitting a formal request.