POST https://api.tensormesh.ai/v1/models
Deploy Model
curl --request POST \
  --url https://api.tensormesh.ai/v1/models \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "modelName": "<string>",
  "userId": "<string>",
  "description": "<string>",
  "infra": {
    "cloudProvider": "CLOUD_PROVIDER_UNSPECIFIED",
    "nebiusRegion": "NEBIUS_REGION_UNSPECIFIED",
    "lambdaRegion": "LAMBDA_REGION_UNSPECIFIED",
    "onpremRegion": "<string>"
  },
  "modelPath": "<string>",
  "gpuCount": 123,
  "gpuType": "GPU_TYPE_UNSPECIFIED",
  "modelSpec": {},
  "apiKey": "<string>",
  "hfToken": "<string>",
  "kvCacheEnabled": true,
  "cpuOffloadingEnabled": true,
  "nodeId": "<string>"
}
'
{
  "model": {
    "modelId": "<string>",
    "deploymentId": "<string>",
    "userId": "<string>",
    "description": "<string>",
    "modelPath": "<string>",
    "modelName": "<string>",
    "status": "MODEL_STATUS_UNSPECIFIED",
    "events": [
      {
        "createdAt": "2023-11-07T05:31:56Z",
        "log": "<string>",
        "eventType": "EVENT_TYPE_UNSPECIFIED"
      }
    ],
    "createdAt": "2023-11-07T05:31:56Z",
    "updatedAt": "2023-11-07T05:31:56Z",
    "modelSpec": {},
    "infra": {
      "cloudProvider": "CLOUD_PROVIDER_UNSPECIFIED",
      "nebiusRegion": "NEBIUS_REGION_UNSPECIFIED",
      "lambdaRegion": "LAMBDA_REGION_UNSPECIFIED",
      "onpremRegion": "<string>"
    },
    "gpuCount": 123,
    "gpuType": "GPU_TYPE_UNSPECIFIED",
    "replicas": 123,
    "endpoint": "<string>",
    "apiKey": "<string>"
  }
}
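The curl request above can also be issued from Python. Below is a minimal sketch using only the standard library; the endpoint and field names mirror the request body shown above, while the concrete values (model name, user ID, model path) are placeholders, not real identifiers:

```python
import json
import urllib.request

API_URL = "https://api.tensormesh.ai/v1/models"

def build_deploy_payload(model_name, user_id, model_path, gpu_count, gpu_type):
    """Assemble the JSON body for POST /v1/models (fields from the schema above)."""
    return {
        "modelName": model_name,
        "userId": user_id,
        "modelPath": model_path,
        "gpuCount": gpu_count,
        "gpuType": gpu_type,
        "kvCacheEnabled": True,
        "cpuOffloadingEnabled": False,
    }

def deploy_model(token, payload):
    """Send the deploy request with bearer auth; return the parsed response body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder values for illustration only.
payload = build_deploy_payload(
    "my-llama", "123e4567-e89b-12d3-a456-426614174000",
    "meta-llama/Meta-Llama-3-8B", 1, "GPU_TYPE_H100",
)
# deploy_model("<token>", payload)  # requires a valid access token
print(json.dumps(payload, indent=2))
```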

Authorizations

Authorization
string
header
required

Bearer authentication using an access token. Format: Bearer <access_token>

Body

application/json
modelName
string

Model nickname. Must be unique per user.

userId
string

User ID who owns this model. Must be a valid UUID.

description
string

Optional description of the model.

infra
object

Infra specifies the infrastructure configuration for deploying and running models.

This message defines where a model deployment should run by specifying both the cloud provider and the specific region. It uses a oneof for region selection to ensure type-safe region specification based on the chosen provider.

See also: tensormesh/common/v1/cloud_provider.proto for provider and region enum definitions
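The oneof semantics can be illustrated with a client-side pre-check that exactly one region field is set, and that it matches the chosen provider. This is an illustrative sketch, not part of the API: the provider enum values used here are assumptions (see cloud_provider.proto for the authoritative names), and the server performs its own validation regardless:

```python
# Assumed provider enum values -- check cloud_provider.proto for the real names.
REGION_FIELD_BY_PROVIDER = {
    "CLOUD_PROVIDER_NEBIUS": "nebiusRegion",
    "CLOUD_PROVIDER_LAMBDA": "lambdaRegion",
    "CLOUD_PROVIDER_ONPREM": "onpremRegion",
}

REGION_FIELDS = ("nebiusRegion", "lambdaRegion", "onpremRegion")

def validate_infra(infra: dict) -> None:
    """Raise ValueError unless exactly the region field for the chosen provider is set."""
    provider = infra.get("cloudProvider", "CLOUD_PROVIDER_UNSPECIFIED")
    expected = REGION_FIELD_BY_PROVIDER.get(provider)
    if expected is None:
        raise ValueError(f"unknown or unspecified provider: {provider}")
    set_regions = [f for f in REGION_FIELDS if infra.get(f)]
    if set_regions != [expected]:
        raise ValueError(
            f"provider {provider} requires exactly the {expected} field, got {set_regions}"
        )
```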

modelPath
string

Model path (e.g., HuggingFace model ID).

gpuCount
integer<int64>

Number of GPUs to allocate for this model.

gpuType
enum<string>
default:GPU_TYPE_UNSPECIFIED

GPUType specifies the type of GPU to use for a model deployment.

This enum defines the supported GPU types for model deployments. It allows clients to specify the exact GPU hardware they need for their models.


Available options:
GPU_TYPE_UNSPECIFIED,
GPU_TYPE_A100,
GPU_TYPE_H100,
GPU_TYPE_H200,
GPU_TYPE_B200
modelSpec
object

Additional model-specific configuration.

apiKey
string
hfToken
string

Hugging Face access token, used when modelPath refers to a gated or private Hugging Face model.
kvCacheEnabled
boolean

Enable KV cache.

cpuOffloadingEnabled
boolean

Enable CPU offloading.

nodeId
string

Response

A successful response.

model
object

Model represents a model instance created by a user.
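A deploy response shaped like the sample near the top of this page can be unpacked client-side. A minimal sketch follows; the field names come from the response schema, while the sample values are placeholders:

```python
import json

# Sample response shaped like the schema above (values are placeholders).
raw = json.dumps({
    "model": {
        "modelId": "m-1",
        "status": "MODEL_STATUS_UNSPECIFIED",
        "events": [
            {"createdAt": "2023-11-07T05:31:56Z",
             "log": "deployment created",
             "eventType": "EVENT_TYPE_UNSPECIFIED"},
        ],
        "endpoint": "https://example.invalid/v1",
    }
})

def summarize(response_text: str) -> str:
    """Return a one-line summary: model ID, status, and the most recent event log."""
    model = json.loads(response_text)["model"]
    events = model.get("events", [])
    latest = events[-1]["log"] if events else "no events"
    return f'{model["modelId"]}: {model["status"]} ({latest})'

print(summarize(raw))
```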