Model Deployment - Tensormesh User Documentation

Tensormesh simplifies the deployment process by integrating GPU resource selection directly into your workflow. You can transition from configuration to a live API endpoint in minutes.

Deployment Tiers

On-Demand

Flexible, pay-as-you-go GPU resources for dynamic workloads.

Reserved

Dedicated GPU clusters for consistent, high-volume workloads.

Serverless

Pay-per-token API access with no infrastructure to manage.

Deploying On-Demand

Navigate to Deploy → On-Demand from the sidebar to begin.

Before You Deploy: You must have at least one valid card on file to deploy GPU resources. Add your payment method in Management → Billing → Payment Methods before attempting to provision infrastructure.

Cloud Configuration

Configure your hardware environment to match your model’s requirements.Cloud Provider — Select your preferred provider
Region — Choose a geographic location for your deployment
GPU Type — Select the specific GPU model
GPUs per Replica — Specify the number of GPUs assigned to each instance
Scaling Mode — Choose between Fixed (set a static replica count) or Auto Scaling (define min/max replicas and let the platform scale based on demand)
Deployment Name — Provide a custom name or leave empty for an auto-generated ID (optional)

Beta Feature: Auto Scaling is currently in beta. Behavior and configuration options may change as the feature matures.

When Auto Scaling is enabled, configure:Min Replicas — Minimum number of replicas (can be 0 to scale to zero)
Max Replicas — Maximum number of replicas to scale up to (currently up to 2)
Scale to Zero Delay — Seconds to wait before scaling down to zero
Scale Up Stabilization — Seconds to wait before scaling up after a spike
Scale Down Stabilization — Seconds to wait before scaling down after load drops

Model Source

Tensormesh supports multiple sources for your models.Model Library — Choose from a curated, pre-optimized model collection
HuggingFace — Deploy directly from the HuggingFace Hub (provide repository ID and token for private models)
S3 Cloud Storage — Deploy custom or fine-tuned models from your private storage (Coming Soon)

Advanced Configuration

Optimize your inference engine with specialized caching and offloading settings.

CPU Offloading

Default: EnabledOffloads the KV cache to CPU memory to manage larger contexts.

Storage Offloading

(Coming Soon)Will allow offloading the KV cache to external storage.

Review System Specifications

Review the hardware capabilities of your selection in the side panel.Architecture — The underlying GPU architecture
Memory — Total HBM available
Bandwidth — Data transfer rate
Interconnect — GPU-to-GPU communication technology

Cost Summary & Launch

The Cost Summary panel provides real-time estimates based on your configuration.Hourly Cost — The real-time rate per hour
Monthly Estimate — Projected cost for a full month of continuous operationOnce all required configuration steps are complete, click Deploy Model to initiate provisioning.

Reserved Deployments

For large-scale workloads and enterprise-grade performance, Reserved Deployments provide dedicated GPU clusters tailored to your specific infrastructure needs. Navigate to Deploy → Reserved to request a tailored capacity plan.

Requesting a Cluster

To initiate a reserved deployment, provide your cluster specifications through the request form. GPU Selection — Choose between high-compute options
Cluster Size — Define the total number of GPUs required for your workload
Timeline — Specify your deployment window
Use Case — Define the intention for the cluster

Use the “Additional Requirements” field to specify custom networking needs, storage bandwidth targets, or specific compliance and SLA requirements.

Review and Launch

Once a request is submitted, the Tensormesh team reviews your requirements:

Consultation

Our team contacts you within 1 business day to discuss your requirements.

Capacity Planning

We provide a tailored pricing and hardware roadmap based on your specifications.

Provisioning

Dedicated instances are assigned to your organization’s private mesh.

Documentation Index

​Deployment Tiers

On-Demand

Reserved

Serverless

​Deploying On-Demand

CPU Offloading

Storage Offloading

​Reserved Deployments

​Requesting a Cluster

​Review and Launch

Deployment Tiers

Deploying On-Demand

Reserved Deployments

Requesting a Cluster

Review and Launch