This page covers the fastest raw-API paths in Tensormesh. For direct HTTP callers, treat a 429 response as rate limiting, honor the Retry-After header when present, and be conservative about automatically retrying non-idempotent POST requests. Use this page when you want to:
  • make a first successful Inference API request with curl from explicit environment variables
  • make a first successful Control Plane request with curl
  • optionally derive local operator values from the supported CLI login flow

1. Choose The Surface

  • Control Plane: management APIs such as users, models, billing, tickets, logs, and metrics
  • Inference API:
    • Serverless: OpenAI-compatible POST /v1/chat/completions
    • On-Demand: near-compatible POST /v1/chat/completions, plus routed /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version, all with a required X-User-Id header
You can use both from the same machine, but they authenticate differently:
  • Control Plane uses Authorization: Bearer <access_token>
  • Inference API uses:
    • Serverless: Authorization: Bearer <API_KEY> for POST routes; the public host also serves GET /v1/models, GET /health, and GET /version without auth
    • On-Demand: Authorization: Bearer <API_KEY> plus X-User-Id: <uuid>

2. Fastest Standalone Inference Request

If you already have explicit inference credentials, you do not need the CLI for a first raw inference request.

On-Demand

Use the provider-specific Tensormesh host, your inference API key, your user id, and the served gateway model name:
GATEWAY_BASE="https://external.nebius.tensormesh.ai"
GATEWAY_API_KEY="YOUR_INFERENCE_API_KEY"
GATEWAY_USER_ID="00000000-0000-0000-0000-000000000000"
GATEWAY_MODEL_NAME="YOUR_SERVED_GATEWAY_MODEL_NAME"

curl -sS \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "X-User-Id: $GATEWAY_USER_ID" \
  -H "Content-Type: application/json" \
  "$GATEWAY_BASE/v1/chat/completions" \
  -d '{
    "model": "'"$GATEWAY_MODEL_NAME"'",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'
Use the served gateway model name here, not the Control Plane modelId UUID. Other On-Demand routes on the routed host are /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version. Use the dedicated pages under On-Demand API Reference when you need those request and response shapes.
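Since passing the Control Plane modelId UUID instead of the served gateway model name is a common mistake, a quick local sanity check can catch it before the request is sent. This is a hypothetical helper, not part of any Tensormesh tooling:

```python
import uuid


def looks_like_control_plane_id(model: str) -> bool:
    """Return True if the value parses as a UUID, i.e. it is probably a
    Control Plane modelId rather than a served gateway model name."""
    try:
        uuid.UUID(model)
        return True
    except ValueError:
        return False
```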

Serverless

Serverless requests do not include X-User-Id:
SERVERLESS_BASE="https://serverless.tensormesh.ai"
SERVERLESS_API_KEY="YOUR_INFERENCE_API_KEY"
SERVERLESS_MODEL_NAME="YOUR_SERVERLESS_MODEL_NAME"

curl -sS \
  -H "Authorization: Bearer $SERVERLESS_API_KEY" \
  -H "Content-Type: application/json" \
  "$SERVERLESS_BASE/v1/chat/completions" \
  -d '{
    "model": "'"$SERVERLESS_MODEL_NAME"'",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'
Replace YOUR_SERVERLESS_MODEL_NAME with a serverless model name that is available on your target host. Other verified serverless routes on this host are /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version. Use the dedicated pages under Serverless API Reference when you need those request and response shapes. If you have Control Plane access for the same Tensormesh environment, discover published serverless models with tm billing pricing serverless list and use the returned pricing[].model value in the request body. If you only have inference credentials, or you are targeting a different serverless host override, ask your operator or admin for the exact serverless model string for that host before sending the request. Read Choose A Serverless Model Name if you need the full decision flow.
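If you go the Control Plane discovery route, the pricing[].model values can be pulled from the JSON output of tm billing pricing serverless list. The sketch below assumes that output is a JSON object with a top-level pricing array of objects carrying a model key; verify the shape against your CLI version before relying on it:

```python
import json


def serverless_model_names(pricing_json: str) -> list[str]:
    """Extract pricing[].model values from assumed `tm billing pricing
    serverless list` JSON output (the shape is an assumption)."""
    data = json.loads(pricing_json)
    return [entry["model"] for entry in data.get("pricing", [])]


# Stand-in payload for illustration only:
sample = '{"pricing": [{"model": "example-model-a"}, {"model": "example-model-b"}]}'
```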

Streaming Example

Serverless SSE example:
curl -N \
  -H "Authorization: Bearer $SERVERLESS_API_KEY" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  "$SERVERLESS_BASE/v1/chat/completions" \
  -d '{
    "model": "'"$SERVERLESS_MODEL_NAME"'",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Reply with two short tokens."}
    ]
  }'
On-Demand streaming uses the same request body shape plus X-User-Id: $GATEWAY_USER_ID. The same SSE contract also applies to POST /v1/completions and POST /v1/responses when the request body includes "stream": true. In both cases the stream is emitted as data-only SSE and terminates with data: [DONE].
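The data-only SSE contract described above (chunks arrive on data: lines and the stream ends with data: [DONE]) can be consumed with a few lines of Python. This is a minimal parsing sketch over an already-received stream body, not an SDK:

```python
import json


def parse_sse_chunks(raw_stream: str) -> list[dict]:
    """Parse data-only SSE lines into JSON chunks, stopping at `data: [DONE]`."""
    chunks = []
    for line in raw_stream.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks


# Stand-in stream text for illustration:
raw = 'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\ndata: [DONE]\n'
```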

3. Get A Control Plane Bearer Token

If you already have a Control Plane bearer token, export it directly:
TOKEN="YOUR_CONTROL_PLANE_TOKEN"
If you are using the standard CLI login flow instead, log in first:
tm auth login
tm auth whoami
If you need to target a different Control Plane host for this session, set it explicitly before login:
tm --controlplane-base https://api.gcpstaging.tensormesh.ai auth login
If you are using the standard local operator flow for that environment, pass the same explicit --controlplane-base to tm init --sync once as well. That setup persists controlplane_base into the active config.toml, so later Control Plane-assisted flows such as --model @latest keep using the same environment:
tm --controlplane-base https://api.gcpstaging.tensormesh.ai init --sync
Then, in a controlled shell, capture the current bearer token:
TOKEN="$(tm auth print-token --yes-i-know)"
tm auth whoami and the request below both use GET /auth/profile, which is the stable bearer-token validation endpoint for the Control Plane.

4. First Control Plane Request

Use the current default Control Plane base URL, or replace it with an explicit override for your environment. If you are already using the CLI flow, the current default Control Plane host is https://api.tensormesh.ai; confirm whether you are still on that host or on an environment-specific override by inspecting the resolved controlplane_base first:
tm --output json config show --sources
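The --sources output can also be parsed programmatically. This sketch assumes the JSON carries values.controlplane_base (the resolved host) and sources.controlplane_base (where it came from); that shape is an assumption to check against your CLI version:

```python
import json


def resolved_controlplane(sources_json: str) -> tuple[str, str]:
    """Return (host, source) from assumed `tm --output json config show
    --sources` output with values.* and sources.* maps."""
    data = json.loads(sources_json)
    return data["values"]["controlplane_base"], data["sources"]["controlplane_base"]


# Stand-in payload for illustration only:
sample = (
    '{"values": {"controlplane_base": "https://api.tensormesh.ai"},'
    ' "sources": {"controlplane_base": "default"}}'
)
```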
If you are using the CLI flow, export the currently resolved host before you run curl:
CONTROLPLANE_BASE="$(tm --output json config show | python3 -c 'import json, sys; print(json.load(sys.stdin)["controlplane_base"])')"
Look at controlplane_base in plain tm --output json config show, or use values.controlplane_base and sources.controlplane_base from the --sources form when you need both the resolved host and its source. If you are not using the CLI flow, set the environment-specific host explicitly instead:
CONTROLPLANE_BASE="https://YOUR_CONTROLPLANE_BASE"
Use the Control Plane host for your environment here. If you are not relying on the CLI-managed value, ask your operator or admin for the correct host before sending the request. Validate the token directly:
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  "${CONTROLPLANE_BASE}/auth/profile"
Then fetch a common resource:
curl -sS \
  -H "Authorization: Bearer $TOKEN" \
  "${CONTROLPLANE_BASE}/v1/models?size=10"

5. Optional CLI-Assisted Inference Request

If you are using the standard local operator flow, sync the managed gateway values first:
tm auth login
tm init --sync
If you are targeting a different Control Plane host, pass the same --controlplane-base value here so the active config.toml persists that host for later @latest and Control Plane-assisted flows:
tm --controlplane-base https://api.gcpstaging.tensormesh.ai auth login
tm --controlplane-base https://api.gcpstaging.tensormesh.ai init --sync
Then export the synced values for a raw curl call:
eval "$(
python3 - <<'PY'
# Requires Python 3.11+ for the stdlib tomllib module.
from pathlib import Path
import os
import shlex
import tomllib

# Resolve the active config file, honoring a TM_CONFIG_HOME override.
config_root = Path(
    os.environ.get("TM_CONFIG_HOME", "~/.config/tensormesh")
).expanduser()
config_path = config_root / "config.toml"
data = tomllib.loads(config_path.read_text(encoding="utf-8"))
managed = data["managed"]
# Emit shell-safe export lines for eval in the enclosing shell.
for env_name, key in (
    ("GATEWAY_API_KEY", "gateway_api_key"),
    ("GATEWAY_USER_ID", "gateway_user_id"),
    ("GATEWAY_MODEL_NAME", "gateway_model_id"),
):
    print(f"export {env_name}={shlex.quote(str(managed[key]))}")
PY
)"
GATEWAY_BASE="$(
tm --output json config show | python3 -c '
import json
import sys

print(json.load(sys.stdin)["gateway_base"])
'
)"
tm init --sync stores the served gateway model name under [managed].gateway_model_id. gateway_model_id is the config key name; its value is the served gateway model name string you send as model. The shell variable in this example is called GATEWAY_MODEL_NAME to make that meaning explicit. Then call the chat endpoint directly:
curl -sS \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "X-User-Id: $GATEWAY_USER_ID" \
  -H "Content-Type: application/json" \
  "$GATEWAY_BASE/v1/chat/completions" \
  -d '{
    "model": "'"$GATEWAY_MODEL_NAME"'",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'

6. What Is Public Versus CLI-Flow Internal

  • GET /auth/profile is a stable bearer-token endpoint and is published in the Control Plane API reference.
  • /auth/cli/start, /auth/cli/exchange, and /auth/cli/refresh are used by the CLI browser-login flow. They are documented in the CLI auth guide, but they are not the stable raw-API integration surface for external clients.

7. If Something Fails

  • 401 on Control Plane:
    • run tm auth whoami again
    • refresh with tm auth refresh
  • 401 on Gateway:
    • check the explicit API key you passed, or [managed].gateway_api_key if you are using the CLI-assisted flow
  • 404 or routing failures on Gateway:
    • check X-User-Id
    • confirm the served gateway model name, not the Control Plane modelId
  • not sure which credentials are loaded:
    • run tm auth status --exit-status
    • run tm infer doctor --exit-status