Treat HTTP 429 as rate limiting: honor the Retry-After header when present, and be conservative about retrying non-idempotent POST requests automatically.
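For idempotent GET requests, one way to follow that guidance is curl's built-in retry support, which treats 429 as transient and honors the Retry-After header (curl 7.66.0 or newer). This is a sketch; the `TM_API_KEY` and `TM_INFERENCE_HOST` variable names are placeholders, not names the product requires:

```shell
# Retry an idempotent GET on 429/5xx; curl sleeps per Retry-After when the
# server sends it. Do NOT add --retry to non-idempotent POST requests.
curl -sS --retry 5 --retry-max-time 120 \
  -H "Authorization: Bearer $TM_API_KEY" \
  "$TM_INFERENCE_HOST/v1/models"
```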
Use it when you want to:
- make a first successful Inference API request with `curl` from explicit environment variables
- make a first successful Control Plane request with `curl`
- optionally derive local operator values from the supported CLI login flow
1. Choose The Surface
- Control Plane: management APIs such as users, models, billing, tickets, logs, and metrics
- Inference API:
  - Serverless: OpenAI-compatible `POST /v1/chat/completions`
  - On-Demand: near-compatible `POST /v1/chat/completions` plus routed `models`, `completions`, `responses`, `tokenize`, `detokenize`, `health`, and `version` with required `X-User-Id`

Authentication differs by surface:
- Control Plane uses `Authorization: Bearer <access_token>`
- Inference API uses:
  - Serverless: `Authorization: Bearer <API_KEY>` for POST routes; the public host also serves `GET /v1/models`, `GET /health`, and `GET /version` without auth
  - On-Demand: `Authorization: Bearer <API_KEY>` plus `X-User-Id: <uuid>`
2. Fastest Standalone Inference Request
If you already have explicit inference credentials, you do not need the CLI for a first raw inference request.

On-Demand
Use the provider-specific Tensormesh host, your inference API key, your user id, and the served gateway model name, not the Control Plane modelId UUID:
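A minimal sketch of that On-Demand request; the `TM_ONDEMAND_HOST` and `TM_API_KEY` variable names are assumptions for this example, while `GATEWAY_USER_ID` and `GATEWAY_MODEL_NAME` carry the values described above:

```shell
curl -sS -X POST "$TM_ONDEMAND_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $TM_API_KEY" \
  -H "X-User-Id: $GATEWAY_USER_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$GATEWAY_MODEL_NAME"'",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```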
Other On-Demand routes on the routed host are /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version. Use the dedicated pages under On-Demand API Reference when you need those request and response shapes.
Serverless
Serverless does not send `X-User-Id`:
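A minimal sketch of the serverless request; the `TM_SERVERLESS_HOST` and `TM_API_KEY` variable names are placeholders for your environment's host and key:

```shell
curl -sS -X POST "$TM_SERVERLESS_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $TM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_SERVERLESS_MODEL_NAME",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```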
Replace `YOUR_SERVERLESS_MODEL_NAME` with a serverless model name that is available on your target host.
Other verified serverless routes on this host are /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version. Use the dedicated pages under Serverless API Reference when you need those request and response shapes.
If you have Control Plane access for the same Tensormesh environment, discover published serverless models with `tm billing pricing serverless list` and use the returned `pricing[].model` value in the request body. If you only have inference credentials, or you are targeting a different serverless host override, ask your operator or admin for the exact serverless model string for that host before sending the request. Read Choose A Serverless Model Name if you need the full decision flow.
Streaming Example
Serverless SSE example: set `"stream": true` in the request body. For On-Demand, send the same request plus the `X-User-Id: $GATEWAY_USER_ID` header.
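A streaming sketch, reusing the placeholder `TM_SERVERLESS_HOST` and `TM_API_KEY` variable names; `-N` disables curl's output buffering so events print as they arrive:

```shell
# Identical to the non-streaming serverless request, plus "stream": true.
curl -sS -N -X POST "$TM_SERVERLESS_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $TM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_SERVERLESS_MODEL_NAME",
    "messages": [{"role": "user", "content": "Count to three."}],
    "stream": true
  }'
```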
The same SSE contract also applies to `POST /v1/completions` and `POST /v1/responses` when the request body includes `"stream": true`. In both cases the stream is emitted as data-only SSE and terminates with `data: [DONE]`.
3. Get A Control Plane Bearer Token
If you already have a Control Plane bearer token, export it directly. If you obtained it through the CLI login flow, run `tm init --sync` once as well. That setup persists `controlplane_base` into the active `config.toml`, so later Control Plane-assisted flows such as `--model @latest` keep using the same environment.
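A minimal export sketch; `TM_ACCESS_TOKEN` is an assumed variable name used by the curl examples in this guide, not a name the CLI requires:

```shell
# Paste your Control Plane bearer token; the variable name is only a
# convention for the examples here.
export TM_ACCESS_TOKEN="paste-your-access-token-here"
```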
tm auth whoami and the request below both use GET /auth/profile, which is the stable bearer-token validation endpoint for the Control Plane.
4. First Control Plane Request
Use the current default Control Plane base URL, or replace it with an explicit override for your environment. If you are already using the CLI flow, the current default Control Plane host is https://api.tensormesh.ai, and you can confirm whether you are still on that host or on an environment-specific override by inspecting the resolved `controlplane_base` first: read `controlplane_base` in plain `tm --output json config show`, or use `values.controlplane_base` and `sources.controlplane_base` from the `--sources` form when you need both the resolved host and its source. If you are not using the CLI flow, set the environment-specific host explicitly instead:
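A sketch of the explicit setup and the first request; `TM_CONTROLPLANE_BASE` and `TM_ACCESS_TOKEN` are assumed variable names for this example:

```shell
# Explicit host; replace with your environment's Control Plane base URL.
export TM_CONTROLPLANE_BASE="https://api.tensormesh.ai"

# GET /auth/profile is the stable bearer-token validation endpoint.
curl -sS "$TM_CONTROLPLANE_BASE/auth/profile" \
  -H "Authorization: Bearer $TM_ACCESS_TOKEN"
```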
5. Optional CLI-Assisted Inference Request
If you are using the standard local operator flow, sync the managed gateway values first. If your environment uses a Control Plane override, pass the same `--controlplane-base` value here so the active `config.toml` persists that host for later `@latest` and Control Plane-assisted flows:
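A sketch of the sync step; the override URL is a placeholder for your environment:

```shell
# Sync managed gateway values into the active config.toml.
tm init --sync

# Or, with an environment-specific Control Plane override:
tm init --sync --controlplane-base "https://api.tensormesh.ai"
```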
Export the synced managed values for the curl call:
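A sketch of the exports; the values come from the `[managed]` section of the active `config.toml`, and the variable names (including `GATEWAY_HOST`, an assumption here) are just the convention used in this guide:

```shell
# Paste the values from [managed] in the active config.toml, or read them
# with your preferred TOML tooling.
export GATEWAY_HOST="https://your-gateway-host.example"
export GATEWAY_API_KEY="paste-gateway-api-key-here"
export GATEWAY_USER_ID="paste-user-uuid-here"
export GATEWAY_MODEL_NAME="paste-served-gateway-model-name-here"
```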
tm init --sync stores the served gateway model name under [managed].gateway_model_id. gateway_model_id is the config key name; its value is the served gateway model name string you send as model. The shell variable in this example is called GATEWAY_MODEL_NAME to make that meaning explicit.
Then call the chat endpoint directly:
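A minimal sketch of that call, using the `GATEWAY_*` variable names as placeholders for the synced managed values:

```shell
curl -sS -X POST "$GATEWAY_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "X-User-Id: $GATEWAY_USER_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$GATEWAY_MODEL_NAME"'",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```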
6. What Is Public Versus CLI-Flow Internal
- `GET /auth/profile` is a stable bearer-token endpoint and is published in the Control Plane API reference.
- `/auth/cli/start`, `/auth/cli/exchange`, and `/auth/cli/refresh` are used by the CLI browser-login flow. They are documented in the CLI auth guide, but they are not the stable raw-API integration surface for external clients.
7. If Something Fails
- `401` on Control Plane:
  - run `tm auth whoami` again
  - refresh with `tm auth refresh`
- `401` on Gateway:
  - check the explicit API key you passed, or `[managed].gateway_api_key` if you are using the CLI-assisted flow
- `404` or routing failures on Gateway:
  - check `X-User-Id`
  - confirm the served gateway model name, not the Control Plane `modelId`
- not sure which credentials are loaded:
  - run `tm auth status --exit-status`
  - run `tm infer doctor --exit-status`

