Treat HTTP 429 as rate limiting: honor the Retry-After header when present, and be conservative about retrying non-idempotent POST requests automatically.
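For idempotent GET requests, one way to follow that guidance is curl's built-in retry support, which treats 429 as transient and honors the Retry-After header (curl 7.66.0 or newer). This is a sketch; the `TM_API_KEY` and `TM_INFERENCE_HOST` variable names are placeholders, not names the product requires:

```shell
# Retry an idempotent GET on 429/5xx; curl sleeps per Retry-After when the
# server sends it. Do NOT add --retry to non-idempotent POST requests.
curl -sS --retry 5 --retry-max-time 120 \
  -H "Authorization: Bearer $TM_API_KEY" \
  "$TM_INFERENCE_HOST/v1/models"
```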
Use it when you want to:
- make a first successful Inference API request with `curl` from explicit environment variables
- make a first successful Control Plane request with `curl`
- optionally derive local operator values from the supported CLI login flow
1. Choose The Surface
- Control Plane: management APIs such as users, models, billing, tickets, logs, and metrics
- Inference API:
  - Serverless: OpenAI-compatible `POST /v1/chat/completions`
  - On-Demand: near-compatible `POST /v1/chat/completions` plus routed `models`, `completions`, `responses`, `tokenize`, `detokenize`, `health`, and `version` with required `X-User-Id`

Authentication differs by surface:
- Control Plane uses `Authorization: Bearer <access_token>`
- Inference API uses:
  - Serverless: `Authorization: Bearer <API_KEY>` for POST routes; the public host also serves `GET /v1/models`, `GET /health`, and `GET /version` without auth
  - On-Demand: `Authorization: Bearer <API_KEY>` plus `X-User-Id: <uuid>`
2. Fastest Standalone Inference Request
If you already have explicit inference credentials, you do not need the CLI for a first raw inference request.

On-Demand
Use the provider-specific Tensormesh host, your inference API key, your user id, and the served gateway model name, not the Control Plane modelId UUID:
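A minimal sketch of that On-Demand request; the `TM_ONDEMAND_HOST` and `TM_API_KEY` variable names are assumptions for this example, while `GATEWAY_USER_ID` and `GATEWAY_MODEL_NAME` carry the values described above:

```shell
curl -sS -X POST "$TM_ONDEMAND_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $TM_API_KEY" \
  -H "X-User-Id: $GATEWAY_USER_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$GATEWAY_MODEL_NAME"'",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```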
Other On-Demand routes on the routed host are /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version. Use the dedicated pages under On-Demand API Reference when you need those request and response shapes.
Serverless
Serverless does not send `X-User-Id`:
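A minimal sketch of the serverless request; the `TM_SERVERLESS_HOST` and `TM_API_KEY` variable names are placeholders for your environment's host and key:

```shell
curl -sS -X POST "$TM_SERVERLESS_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $TM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_SERVERLESS_MODEL_NAME",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```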
Replace `YOUR_SERVERLESS_MODEL_NAME` with a serverless model name that is available on your target host.
Other verified serverless routes on this host are /v1/models, /v1/completions, /v1/responses, /tokenize, /detokenize, /health, and /version. Use the dedicated pages under Serverless API Reference when you need those request and response shapes.
If you have Control Plane access for the same Tensormesh environment, discover published serverless models with `tm billing pricing serverless list` and use the returned `pricing[].model` value in the request body. If you only have inference credentials, or you are targeting a different serverless host override, ask your operator or admin for the exact serverless model string for that host before sending the request. Read Choose A Serverless Model Name if you need the full decision flow.
Streaming Example
Serverless SSE example: set `"stream": true` in the request body. For On-Demand, send the same request plus the `X-User-Id: $GATEWAY_USER_ID` header.
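A streaming sketch, reusing the placeholder `TM_SERVERLESS_HOST` and `TM_API_KEY` variable names; `-N` disables curl's output buffering so events print as they arrive:

```shell
# Identical to the non-streaming serverless request, plus "stream": true.
curl -sS -N -X POST "$TM_SERVERLESS_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $TM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_SERVERLESS_MODEL_NAME",
    "messages": [{"role": "user", "content": "Count to three."}],
    "stream": true
  }'
```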
The same SSE contract also applies to `POST /v1/completions` and `POST /v1/responses` when the request body includes `"stream": true`. In both cases the stream is emitted as data-only SSE and terminates with `data: [DONE]`.
3. Get A Control Plane Bearer Token
If you already have a Control Plane bearer token, export it directly. If you obtained it through the CLI login flow, run `tm init --sync` once as well. That setup persists `controlplane_base` into the active `config.toml`, so later Control Plane-assisted flows such as `--model @latest` keep using the same environment.
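A minimal export sketch; `TM_ACCESS_TOKEN` is an assumed variable name used by the curl examples in this guide, not a name the CLI requires:

```shell
# Paste your Control Plane bearer token; the variable name is only a
# convention for the examples here.
export TM_ACCESS_TOKEN="paste-your-access-token-here"
```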
tm auth whoami and the request below both use GET /auth/profile, which is the stable bearer-token validation endpoint for the Control Plane.
4. First Control Plane Request
Use the current default Control Plane base URL, or replace it with an explicit override for your environment. If you are already using the CLI flow, the current default Control Plane host is https://api.tensormesh.ai, and you can confirm whether you are still on that host or on an environment-specific override by inspecting the resolved `controlplane_base` first: read `controlplane_base` in plain `tm --output json config show`, or use `values.controlplane_base` and `sources.controlplane_base` from the `--sources` form when you need both the resolved host and its source. If you are not using the CLI flow, set the environment-specific host explicitly instead:
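A sketch of the explicit setup and the first request; `TM_CONTROLPLANE_BASE` and `TM_ACCESS_TOKEN` are assumed variable names for this example:

```shell
# Explicit host; replace with your environment's Control Plane base URL.
export TM_CONTROLPLANE_BASE="https://api.tensormesh.ai"

# GET /auth/profile is the stable bearer-token validation endpoint.
curl -sS "$TM_CONTROLPLANE_BASE/auth/profile" \
  -H "Authorization: Bearer $TM_ACCESS_TOKEN"
```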
5. Optional CLI-Assisted Inference Request
If you are using the standard local operator flow, sync the managed gateway values first. If your environment uses a Control Plane override, pass the same `--controlplane-base` value here so the active `config.toml` persists that host for later `@latest` and Control Plane-assisted flows:
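A sketch of the sync step; the override URL is a placeholder for your environment:

```shell
# Sync managed gateway values into the active config.toml.
tm init --sync

# Or, with an environment-specific Control Plane override:
tm init --sync --controlplane-base "https://api.tensormesh.ai"
```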
Export the synced managed values for the curl call:
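A sketch of the exports; the values come from the `[managed]` section of the active `config.toml`, and the variable names (including `GATEWAY_HOST`, an assumption here) are just the convention used in this guide:

```shell
# Paste the values from [managed] in the active config.toml, or read them
# with your preferred TOML tooling.
export GATEWAY_HOST="https://your-gateway-host.example"
export GATEWAY_API_KEY="paste-gateway-api-key-here"
export GATEWAY_USER_ID="paste-user-uuid-here"
export GATEWAY_MODEL_NAME="paste-served-gateway-model-name-here"
```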
tm init --sync stores the served gateway model name under [managed].gateway_model_id. gateway_model_id is the config key name; its value is the served gateway model name string you send as model. The shell variable in this example is called GATEWAY_MODEL_NAME to make that meaning explicit.
Then call the chat endpoint directly:
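A minimal sketch of that call, using the `GATEWAY_*` variable names as placeholders for the synced managed values:

```shell
curl -sS -X POST "$GATEWAY_HOST/v1/chat/completions" \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "X-User-Id: $GATEWAY_USER_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$GATEWAY_MODEL_NAME"'",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```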
6. What Is Public Versus CLI-Flow Internal
- `GET /auth/profile` is a stable bearer-token endpoint and is published in the Control Plane API reference.
- `/auth/cli/start`, `/auth/cli/exchange`, and `/auth/cli/refresh` are used by the CLI browser-login flow. They are documented in the CLI auth guide, but they are not the stable raw-API integration surface for external clients.
7. If Something Fails
- `401` on Control Plane:
  - run `tm auth whoami` again
  - refresh with `tm auth refresh`
- `401` on Gateway:
  - check the explicit API key you passed, or `[managed].gateway_api_key` if you are using the CLI-assisted flow
- `404` or routing failures on Gateway:
  - check `X-User-Id`
  - confirm the served gateway model name, not the Control Plane `modelId`
- not sure which credentials are loaded:
  - run `tm auth status --exit-status`
  - run `tm infer doctor --exit-status`

