POST https://api.tensormesh.ai/v1/models
Deploy Model
curl --request POST \
  --url https://api.tensormesh.ai/v1/models \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "modelName": "<string>",
  "userId": "<string>",
  "description": "<string>",
  "infra": {
    "cloudProvider": "CLOUD_PROVIDER_UNSPECIFIED",
    "nebiusRegion": "NEBIUS_REGION_UNSPECIFIED",
    "lambdaRegion": "LAMBDA_REGION_UNSPECIFIED",
    "onpremRegion": "<string>"
  },
  "modelPath": "<string>",
  "gpuCount": 123,
  "gpuType": "GPU_TYPE_UNSPECIFIED",
  "modelSpec": {},
  "apiKey": "<string>",
  "hfToken": "<string>",
  "kvCacheEnabled": true,
  "cpuOffloadingEnabled": true,
  "nodeId": "<string>"
}
'
{
  "model": {
    "modelId": "<string>",
    "deploymentId": "<string>",
    "userId": "<string>",
    "description": "<string>",
    "modelPath": "<string>",
    "modelName": "<string>",
    "status": "MODEL_STATUS_UNSPECIFIED",
    "events": [
      {
        "createdAt": "2023-11-07T05:31:56Z",
        "log": "<string>",
        "eventType": "EVENT_TYPE_UNSPECIFIED"
      }
    ],
    "createdAt": "2023-11-07T05:31:56Z",
    "updatedAt": "2023-11-07T05:31:56Z",
    "modelSpec": {},
    "infra": {
      "cloudProvider": "CLOUD_PROVIDER_UNSPECIFIED",
      "nebiusRegion": "NEBIUS_REGION_UNSPECIFIED",
      "lambdaRegion": "LAMBDA_REGION_UNSPECIFIED",
      "onpremRegion": "<string>"
    },
    "gpuCount": 123,
    "gpuType": "GPU_TYPE_UNSPECIFIED",
    "replicas": 123,
    "endpoint": "<string>",
    "apiKey": "<string>"
  }
}
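The curl request above can also be issued from Python. Below is a minimal sketch using only the standard library; the endpoint and field names mirror the request body shown above, while the concrete values (model name, user ID, model path) are placeholders, not real identifiers:

```python
import json
import urllib.request

API_URL = "https://api.tensormesh.ai/v1/models"

def build_deploy_payload(model_name, user_id, model_path, gpu_count, gpu_type):
    """Assemble the JSON body for POST /v1/models (fields from the schema above)."""
    return {
        "modelName": model_name,
        "userId": user_id,
        "modelPath": model_path,
        "gpuCount": gpu_count,
        "gpuType": gpu_type,
        "kvCacheEnabled": True,
        "cpuOffloadingEnabled": False,
    }

def deploy_model(token, payload):
    """Send the deploy request with bearer auth; return the parsed response body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder values for illustration only.
payload = build_deploy_payload(
    "my-llama", "123e4567-e89b-12d3-a456-426614174000",
    "meta-llama/Meta-Llama-3-8B", 1, "GPU_TYPE_H100",
)
# deploy_model("<token>", payload)  # requires a valid access token
print(json.dumps(payload, indent=2))
```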

Authorizations

Authorization
string
header
required

Bearer authentication using an access token. Format: Bearer <access_token>

Body

application/json
modelName
string

Model nickname. Must be unique per user.

userId
string

User ID who owns this model. Must be a valid UUID.

description
string

Optional description of the model.

infra
object

Infra specifies the infrastructure configuration for deploying and running models.

This message defines where a model deployment should run by specifying both the cloud provider and the specific region. It uses a oneof for region selection to ensure type-safe region specification based on the chosen provider.

See also: tensormesh/common/v1/cloud_provider.proto for provider and region enum definitions
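The oneof semantics can be illustrated with a client-side pre-check that exactly one region field is set, and that it matches the chosen provider. This is an illustrative sketch, not part of the API: the provider enum values used here are assumptions (see cloud_provider.proto for the authoritative names), and the server performs its own validation regardless:

```python
# Assumed provider enum values -- check cloud_provider.proto for the real names.
REGION_FIELD_BY_PROVIDER = {
    "CLOUD_PROVIDER_NEBIUS": "nebiusRegion",
    "CLOUD_PROVIDER_LAMBDA": "lambdaRegion",
    "CLOUD_PROVIDER_ONPREM": "onpremRegion",
}

REGION_FIELDS = ("nebiusRegion", "lambdaRegion", "onpremRegion")

def validate_infra(infra: dict) -> None:
    """Raise ValueError unless exactly the region field for the chosen provider is set."""
    provider = infra.get("cloudProvider", "CLOUD_PROVIDER_UNSPECIFIED")
    expected = REGION_FIELD_BY_PROVIDER.get(provider)
    if expected is None:
        raise ValueError(f"unknown or unspecified provider: {provider}")
    set_regions = [f for f in REGION_FIELDS if infra.get(f)]
    if set_regions != [expected]:
        raise ValueError(
            f"provider {provider} requires exactly the {expected} field, got {set_regions}"
        )
```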

modelPath
string

Model path (e.g., HuggingFace model ID).

gpuCount
integer<int64>

Number of GPUs to allocate for this model.

gpuType
enum<string>
default:GPU_TYPE_UNSPECIFIED

GPUType specifies the type of GPU to use for a model deployment.

This enum defines the supported GPU types for model deployments. It allows clients to specify the exact GPU hardware they need for their models.


Available options:
GPU_TYPE_UNSPECIFIED,
GPU_TYPE_A100,
GPU_TYPE_H100,
GPU_TYPE_H200,
GPU_TYPE_B200
modelSpec
object

Additional model-specific configuration.

apiKey
string
hfToken
string

Hugging Face access token, used when modelPath refers to a gated or private Hugging Face model.
kvCacheEnabled
boolean

Enable KV cache.

cpuOffloadingEnabled
boolean

Enable CPU offloading.

nodeId
string

Response

A successful response.

model
object

Model represents a model instance created by a user.
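A deploy response shaped like the sample near the top of this page can be unpacked client-side. A minimal sketch follows; the field names come from the response schema, while the sample values are placeholders:

```python
import json

# Sample response shaped like the schema above (values are placeholders).
raw = json.dumps({
    "model": {
        "modelId": "m-1",
        "status": "MODEL_STATUS_UNSPECIFIED",
        "events": [
            {"createdAt": "2023-11-07T05:31:56Z",
             "log": "deployment created",
             "eventType": "EVENT_TYPE_UNSPECIFIED"},
        ],
        "endpoint": "https://example.invalid/v1",
    }
})

def summarize(response_text: str) -> str:
    """Return a one-line summary: model ID, status, and the most recent event log."""
    model = json.loads(response_text)["model"]
    events = model.get("events", [])
    latest = events[-1]["log"] if events else "no events"
    return f'{model["modelId"]}: {model["status"]} ({latest})'

print(summarize(raw))
```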