Tensormesh simplifies the deployment process by integrating GPU resource selection directly into your workflow. You can transition from configuration to a live API endpoint in minutes.

Deployment Tiers

On-Demand

Flexible, pay-as-you-go GPU resources for dynamic workloads.

Reserved

Dedicated GPU clusters for consistent, high-volume workloads.

Serverless

Auto-scaling deployments with no infrastructure management (Coming Soon).

Deploying On-Demand

Navigate to Deployments → On-Demand from the sidebar to begin.
Before You Deploy: You must have at least one valid card on file to deploy GPU resources. Add your payment method in Management → Billing → Payment Methods before attempting to provision infrastructure.
1

Cloud Configuration

Configure your hardware environment to match your model’s requirements.
Cloud Provider — Select your preferred provider
Region — Choose a geographic location for your deployment
GPU Type — Select the specific GPU model
GPUs per Replica — Specify the number of GPUs assigned to each instance
Number of Replicas — Set how many parallel instances to run to handle traffic volume
Deployment Name — Provide a custom name or leave empty for an auto-generated ID (optional)
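The fields above can be thought of as a single configuration object. The sketch below is purely illustrative — the key names and values are assumptions for this example, not a documented Tensormesh schema or API:

```python
# Hypothetical configuration mirroring the On-Demand form fields.
# Keys and values are illustrative only; this is not a Tensormesh API.
deployment_config = {
    "cloud_provider": "aws",      # preferred cloud provider
    "region": "us-east-1",        # geographic location for the deployment
    "gpu_type": "H100",           # specific GPU model
    "gpus_per_replica": 4,        # GPUs assigned to each instance
    "replicas": 2,                # parallel instances to handle traffic volume
    "deployment_name": None,      # None -> auto-generated ID
}

# Total GPUs provisioned = GPUs per replica x number of replicas.
total_gpus = deployment_config["gpus_per_replica"] * deployment_config["replicas"]
print(total_gpus)  # 8
```

Note that replicas scale horizontally (more parallel instances for traffic), while GPUs per replica scale each instance vertically (more memory and compute for a single model).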
2

Model Source

Tensormesh supports multiple sources for your models.
Model Library — Choose from a curated, pre-optimized model collection
HuggingFace — Deploy directly from the HuggingFace Hub (provide repository ID and token for private models)
S3 Cloud Storage — Deploy custom or fine-tuned models from your private storage (Coming Soon)
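As a rough sketch of what these options capture, the descriptors below show the information each source needs. The field names are hypothetical, not a documented Tensormesh format; the repository-ID shape (`owner/name`) and token usage follow standard HuggingFace Hub conventions:

```python
import os

# Hypothetical source descriptors for the model-source options above.
library_source = {
    "source": "model_library",
    "model": "llama-3-8b",                    # a model from the curated collection (name illustrative)
}
huggingface_source = {
    "source": "huggingface",
    "repo_id": "meta-llama/Meta-Llama-3-8B",  # HuggingFace Hub repository ID (owner/name)
    "token": os.environ.get("HF_TOKEN"),      # access token, needed only for private repositories
}
```

Keeping the token in an environment variable rather than in the configuration itself avoids committing credentials alongside deployment settings.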
3

Advanced Configuration

Optimize your inference engine with specialized caching and offloading settings.

CPU Offloading

Default: Enabled
Offloads the KV cache to CPU memory to manage larger contexts.

Storage Offloading

(Coming Soon)
Will allow offloading the KV cache to external storage.
4

Review System Specifications

Review the hardware capabilities of your selection in the side panel.
Architecture — The underlying GPU architecture
Memory — Total HBM available
Bandwidth — Data transfer rate
Interconnect — GPU-to-GPU communication technology
5

Cost Summary & Launch

The Cost Summary panel provides real-time estimates based on your configuration.
Hourly Cost — The current per-hour rate for your configuration
Monthly Estimate — Projected cost for a full month of continuous operation
Once all required configuration steps are complete, click Deploy Model to initiate provisioning.
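As a sanity check on the Cost Summary panel, the monthly estimate is simply the hourly cost extrapolated over continuous operation. The rate below is a made-up placeholder, not actual Tensormesh pricing, and the panel's own figures are authoritative:

```python
# Illustrative cost arithmetic; the per-GPU rate is a placeholder, not a real price.
gpu_hourly_rate = 2.50                    # USD per GPU-hour (hypothetical)
gpus_per_replica = 4
replicas = 2

hourly_cost = gpu_hourly_rate * gpus_per_replica * replicas
monthly_estimate = hourly_cost * 24 * 30  # assuming a 30-day month of continuous operation

print(f"${hourly_cost:.2f}/hr, ~${monthly_estimate:,.2f}/month")  # $20.00/hr, ~$14,400.00/month
```

Because the estimate assumes continuous operation, deployments you stop during idle periods will cost proportionally less than the monthly figure shown.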

Reserved Deployments

For large-scale workloads and enterprise-grade performance, Reserved Deployments provide dedicated GPU clusters tailored to your infrastructure needs. Navigate to Deployments → Reserved to request a custom capacity plan.

Requesting a Cluster

To initiate a reserved deployment, provide your cluster specifications through the request form.
GPU Selection — Choose from the available high-compute options
Cluster Size — Define the total number of GPUs required for your workload
Timeline — Specify your deployment window
Use Case — Describe the intended workload for the cluster
Use the “Additional Requirements” field to specify custom networking needs, storage bandwidth targets, or specific compliance and SLA requirements.

Review and Launch

Once a request is submitted, the Tensormesh team reviews your requirements:
1

Consultation

Our team contacts you within one business day to discuss your requirements.
2

Capacity Planning

We provide a tailored pricing and hardware roadmap based on your specifications.
3

Provisioning

Dedicated instances are assigned to your organization’s private mesh.